en:background:vocabulary
Differences
This shows you the differences between two versions of the page.
en:background:vocabulary [2022-06-18 10:56] – [How are words made into candidates?] christian | en:background:vocabulary [2022-11-08 12:04] (current) – [Custom selection of a word to add] Restore original spelling of link christian | ||
---|---|---|---|
Line 40: | Line 40: | ||
One common reason is that words are needed in order to express other words that were automatically selected for addition by the algorithm. For example, the concept //today (on the current day)// is expressed as **si den** in Lugamun, so the words **si** 'this, these' and **den** ' | One common reason is that words are needed in order to express other words that were automatically selected for addition by the algorithm. For example, the concept //today (on the current day)// is expressed as **si den** in Lugamun, so the words **si** 'this, these' and **den** ' | ||
- | The other common reason is that words are needed in order to express some specific content in Lugamun. For example, to translate the fable [[trans:Fen Norte wa Sol|The North Wind and the Sun]], the words **norte** ' | + | The other common reason is that words are needed in order to express some specific content in Lugamun. For example, to translate the fable [[trans:Fen Norte va Sol|The North Wind and the Sun]], the words **norte** ' |
===== Can a word in Lugamun represent several concepts? ===== | ===== Can a word in Lugamun represent several concepts? ===== | ||
Line 50: | Line 50: | ||
Lugamun largely relies on an algorithm that converts the corresponding words from all our [[source languages]] into a form that fits our [[en: grammar: | Lugamun largely relies on an algorithm that converts the corresponding words from all our [[source languages]] into a form that fits our [[en: grammar: | ||
- | XXX Integrate that article into the wiki and update it as needed. Explain that words are now ranked first by the number of related candidates, only then by penalty. | + | XXX Integrate that article into the wiki and update it as needed |
This algorithmic order is only a proposal; the ultimate decision on which word to add is made by a human. Often, but not always, the chosen word is indeed the best candidate as determined by the algorithm. The rankings determined by the algorithm and the ultimate choice made are always documented in the [[https:// | This algorithmic order is only a proposal; the ultimate decision on which word to add is made by a human. Often, but not always, the chosen word is indeed the best candidate as determined by the algorithm. The rankings determined by the algorithm and the ultimate choice made are always documented in the [[https:// | ||
Line 63: | Line 63: | ||
To show a few examples for one of Lugamun' | To show a few examples for one of Lugamun' | ||
- | * The Arabic male form أَبْيَض (ʾabyaḍ) becomes //abyade//. The final //-e// is needed because Lugamun syllables are not allowed to end in //d//. The female form بَيْضَاء (bayḍāʾ) becomes //baida//. | + | * The Arabic male form أَبْيَض (ʾabyaḍ) becomes //abyade//. The final //-e// is needed because Lugamun' |
* The Chinese word 白 (bái) becomes //bai//. This was the best-ranked candidate and actually chosen for integration into Lugamun. | * The Chinese word 白 (bái) becomes //bai//. This was the best-ranked candidate and actually chosen for integration into Lugamun. | ||
* The English word //white// (IPA: /waɪt/) becomes //wait//. | * The English word //white// (IPA: /waɪt/) becomes //wait//. | ||
- | * French //blanc// becomes //blanke// – again an //-e// is added because Lugamun' | + | * French //blanc// becomes //blanke// – again an //-e// is added because Lugamun' |
* Indonesian //putih// becomes //puti// – syllable-final //-h// is considered less essential for the sound of a word, therefore it is dropped instead of adding //-e// after it. | * Indonesian //putih// becomes //puti// – syllable-final //-h// is considered less essential for the sound of a word, therefore it is dropped instead of adding //-e// after it. | ||
* Russian бе́лый (bélyj) becomes //beli//. | * Russian бе́лый (bélyj) becomes //beli//. | ||
Line 86: | Line 86: | ||
* In Swahili, the infinitive prefix //ku-// is stripped, except in the case of short verbs with just two syllables, since these tend to preserve the prefix in many conjugated forms. Hence //kula// remains unchanged, but //kuona// yields the candidate //ona//. | * In Swahili, the infinitive prefix //ku-// is stripped, except in the case of short verbs with just two syllables, since these tend to preserve the prefix in many conjugated forms. Hence //kula// remains unchanged, but //kuona// yields the candidate //ona//. | ||
* Japanese verbs generally end in //-u//, therefore this final vowel contributes little to making the word recognizable. Hence we strip it from long verbs with three (or more) syllables if the result is allowed by Lugamun' | * Japanese verbs generally end in //-u//, therefore this final vowel contributes little to making the word recognizable. Hence we strip it from long verbs with three (or more) syllables if the result is allowed by Lugamun' | ||
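The Swahili rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not the actual Lugamun tooling: the function names are hypothetical, and syllables are approximated by counting vowel letters, which is an assumption that works for the two examples given but ignores cases such as syllabic nasals.

```python
VOWELS = "aeiou"


def count_syllables(word: str) -> int:
    """Rough approximation (an assumption): one syllable per vowel letter."""
    return sum(ch in VOWELS for ch in word.lower())


def strip_swahili_infinitive(verb: str) -> str:
    """Strip the infinitive prefix ku- unless the verb has just two syllables."""
    if verb.startswith("ku") and count_syllables(verb) > 2:
        return verb[2:]
    return verb


print(strip_swahili_infinitive("kula"))   # kula (two syllables, prefix kept)
print(strip_swahili_infinitive("kuona"))  # ona (three syllables, prefix stripped)
```

Counting vowel letters keeps the sketch short; a real implementation would presumably work on a proper syllabification of the source word.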
+ | |||
+ | ===== How are distortion penalties calculated? ===== | ||
+ | |||
+ | A distortion penalty is applied when Lugamun' | ||
+ | |||
+ | Generally, certain cases result in "one penalty" | ||
+ | |||
+ | * Changing a single vowel into Lugamun' | ||
+ | * Reducing a diphthong to a single vowel incurs a penalty; e.g. English //name//, pronounced /neɪm/, becomes //nem// with a penalty. However, there are certain cases where the original vowel may typically be perceived as a single vowel rather than a diphthong by the speakers of the source language. In such cases, no penalty applies. One example is English /oʊ/ as in //goat//, which becomes //o// without a penalty. | ||
+ | * Nasalized vowels are converted to the nearest vowel in Lugamun followed by //n//, without a penalty. | ||
+ | * No penalty is applied when converting a consonant to another, as long as both are derived from the same basic letter in IPA. For example, the IPA representations of all rhotic consonants are derived from the letter //r//, therefore they are all converted to //r// without a penalty. | ||
+ | * Otherwise a penalty is applied when a consonantal sound doesn' | ||
+ | * Usually /ŋ/ is changed to //n// without a penalty, following the rules described above. However, in the case of Mandarin, a penalty is applied: /n/ and /ŋ/ are essentially the only consonantal endings allowed there, so merging them is a fairly severe change. | ||
+ | * A penalty is applied when dropping a consonant or adding a vowel (always //e//) for phonotactic reasons. | ||
+ | |||
+ | To be considered eligible for selection, a word can have at most //one// penalty applied to it. Those with two or more penalties are automatically skipped when sorting candidates. Originally it was just a rule of thumb that such severely distorted words would be skipped, but it has since become an inherent part of the candidate generation process. | ||
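The eligibility rule described above (at most one penalty per candidate) could be expressed roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the ''Candidate'' class, the ''eligible'' function and the penalty labels are hypothetical, and only //nem// (from English //name//, one penalty for the reduced diphthong) is taken from the text; the other two entries are made-up placeholders.

```python
from dataclasses import dataclass, field

MAX_PENALTIES = 1  # from the text: a word can have at most one penalty


@dataclass
class Candidate:
    form: str  # candidate word in Lugamun spelling
    penalties: list[str] = field(default_factory=list)  # one label per distortion

    @property
    def penalty_count(self) -> int:
        return len(self.penalties)


def eligible(candidates: list[Candidate]) -> list[Candidate]:
    """Skip candidates with two or more penalties, keeping the original order."""
    return [c for c in candidates if c.penalty_count <= MAX_PENALTIES]


candidates = [
    Candidate("nem", ["diphthong reduced"]),  # English "name", as in the text
    Candidate("example_a"),  # hypothetical, no distortion
    Candidate("example_b", ["consonant dropped", "vowel -e added"]),  # hypothetical
]

print([c.form for c in eligible(candidates)])  # ['nem', 'example_a']
```

Storing one label per distortion, rather than a bare count, makes it easy to see why a given candidate was skipped.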
===== How are compounds and derived words chosen? ===== | ===== How are compounds and derived words chosen? ===== | ||
XXX Explain. | XXX Explain. |
en/background/vocabulary.txt · Last modified: 2022-11-08 12:04 by christian