en:background:vocabulary
Differences
This shows you the differences between two versions of the page.
en:background:vocabulary [2022-06-19 13:02] – [How are words made into candidates?] christian | en:background:vocabulary [2022-11-08 12:04] (current) – [Custom selection of a word to add] Restore original spelling of link christian | ||
---|---|---|---|
Line 40: | Line 40: | ||
One common reason is that words are needed in order to express other words that were automatically selected for addition by the algorithm. For example, the concept //today (on the current day)// is expressed as **si den** in Lugamun, so the words **si** 'this, these' and **den** ' | One common reason is that words are needed in order to express other words that were automatically selected for addition by the algorithm. For example, the concept //today (on the current day)// is expressed as **si den** in Lugamun, so the words **si** 'this, these' and **den** ' | ||
- | The other common reason is that words are needed in order to express some specific content in Lugamun. For example, to translate the fable [[trans:Fen Norte wa Sol|The North Wind and the Sun]], the words **norte** ' | + | The other common reason is that words are needed in order to express some specific content in Lugamun. For example, to translate the fable [[trans:Fen Norte va Sol|The North Wind and the Sun]], the words **norte** ' |
===== Can a word in Lugamun represent several concepts? ===== | ===== Can a word in Lugamun represent several concepts? ===== | ||
Line 50: | Line 50: | ||
Lugamun largely relies on an algorithm that converts the corresponding words from all our [[source languages]] into a form that fits our [[en: grammar: | Lugamun largely relies on an algorithm that converts the corresponding words from all our [[source languages]] into a form that fits our [[en: grammar: | ||
- | XXX Integrate that article into the wiki and update it as needed. Explain that words are now ranked first by the number of related candidates, only then by penalty. | + | XXX Integrate that article into the wiki and update it as needed |
This algorithmic order is only a proposal; the ultimate decision on which word to add is made by a human. Often, but not always, the word chosen is indeed the best candidate as determined by the algorithm. The rankings determined by the algorithm and the ultimate choice made are always documented in the [[https:// | This algorithmic order is only a proposal; the ultimate decision on which word to add is made by a human. Often, but not always, the word chosen is indeed the best candidate as determined by the algorithm. The rankings determined by the algorithm and the ultimate choice made are always documented in the [[https:// | ||
Line 86: | Line 86: | ||
* In Swahili, the infinitive prefix //ku-// is stripped, except in the case of short verbs with just two syllables, since these tend to preserve the prefix in many conjugated forms. Hence //kula// remains unchanged, but //kuona// yields the candidate //ona//. | * In Swahili, the infinitive prefix //ku-// is stripped, except in the case of short verbs with just two syllables, since these tend to preserve the prefix in many conjugated forms. Hence //kula// remains unchanged, but //kuona// yields the candidate //ona//. | ||
* Japanese verbs generally end in //-u//, so this final vowel contributes little to making the word recognizable. Hence we strip it from long verbs with three (or more) syllables if the result is allowed by Lugamun' | * Japanese verbs generally end in //-u//, so this final vowel contributes little to making the word recognizable. Hence we strip it from long verbs with three (or more) syllables if the result is allowed by Lugamun' | ||
+ | |||
+ | ===== How are distortion penalties calculated? ===== | ||
+ | |||
+ | A distortion penalty is applied when Lugamun' | ||
+ | |||
+ | Generally, certain cases result in "one penalty" | ||
+ | |||
+ | * Changing a single vowel into Lugamun' | ||
+ | * Reducing a diphthong to a single vowel incurs a penalty; e.g. English //name//, pronounced /neɪm/, becomes //nem// with a penalty. However, there are certain cases where the original vowel may typically be perceived as a single vowel rather than a diphthong by the speakers of the source language. In such cases, no penalty applies. One example is English /oʊ/ as in //goat//, which becomes //o// without a penalty. | ||
+ | * Nasalized vowels are converted to the nearest vowel in Lugamun followed by //n//, without a penalty. | ||
+ | * No penalty is applied when converting one consonant to another, as long as both are derived from the same basic letter in IPA. For example, the IPA representations of all rhotic consonants are derived from the letter //r//, so they are all converted to //r// without a penalty. | ||
+ | * Otherwise a penalty is applied when a consonantal sound doesn' | ||
+ | * Usually /ŋ/ is changed to //n// without a penalty, following the rules described above. However, in the case of Mandarin, a penalty is applied, since /n/ and /ŋ/ are essentially the only consonantal endings allowed there, which makes merging them a fairly severe change. | ||
+ | * A penalty is applied when dropping a consonant or adding a vowel (always //e//) for phonotactic reasons. | ||
+ | |||
+ | To be considered eligible for selection, a word can have at most //one// penalty applied to it. Words with two or more penalties are automatically skipped when sorting candidates. Originally it was just a rule of thumb that these more severely distorted words would be skipped, but it has since become an inherent part of the candidate generation process. | ||
===== How are compounds and derived words chosen? ===== | ===== How are compounds and derived words chosen? ===== | ||
XXX Explain. | XXX Explain. |
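
As an illustration of the rules added under "How are distortion penalties calculated?", here is a minimal Python sketch of the penalty count and the "at most one penalty" eligibility check. It is not the actual Lugamun tooling: the `Change` record, the change-kind names, and the functions are hypothetical, and it assumes that every penalized case counts as exactly one penalty. Only rules that are fully stated in the text are modelled.

```python
from dataclasses import dataclass

# Hypothetical change kinds (illustrative names, not taken from the Lugamun tooling).
# Assumption: each penalized change counts as exactly one penalty.
PENALIZED = {
    "diphthong_reduced",   # e.g. English /neɪm/ -> nem
    "consonant_dropped",   # dropped for phonotactic reasons
    "vowel_added",         # an e added for phonotactic reasons
}
FREE = {
    "perceived_monophthong",  # e.g. English /oʊ/ -> o
    "nasal_vowel_expanded",   # nasalized vowel -> vowel + n
    "same_base_consonant",    # e.g. any rhotic -> r
}


@dataclass(frozen=True)
class Change:
    kind: str
    source_lang: str = ""


def penalty_for(change: Change) -> int:
    """Return the penalty (0 or 1) for a single change."""
    # /ŋ/ -> n is normally free, but penalized when the source is Mandarin.
    if change.kind == "ng_to_n":
        return 1 if change.source_lang == "Mandarin" else 0
    if change.kind in PENALIZED:
        return 1
    if change.kind in FREE:
        return 0
    raise ValueError(f"unknown change kind: {change.kind}")


def total_penalty(changes) -> int:
    """Sum the penalties over all changes applied to one candidate."""
    return sum(penalty_for(c) for c in changes)


def is_eligible(changes, max_penalties: int = 1) -> bool:
    """A candidate stays eligible with at most `max_penalties` penalties."""
    return total_penalty(changes) <= max_penalties


# English "name" -> nem: one penalty, still eligible.
print(is_eligible([Change("diphthong_reduced", "English")]))
# A candidate needing two penalized changes would be skipped.
print(is_eligible([Change("diphthong_reduced"), Change("vowel_added")]))
```

Modelling each change as a small record keeps the per-rule decision (`penalty_for`) separate from the eligibility threshold, so the threshold could be adjusted without touching the individual rules.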
en/background/vocabulary.txt · Last modified: 2022-11-08 12:04 by christian