en:background:vocabulary
Differences
This shows you the differences between two versions of the page.
| en:background:vocabulary [2022-06-18 10:56] – [How are words made into candidates?] christian | en:background:vocabulary [2022-11-08 12:04] (current) – [Custom selection of a word to add] Restore original spelling of link christian | ||
|---|---|---|---|
| Line 40: | Line 40: | ||
| One common reason is that words are needed in order to express other words that were automatically selected for addition by the algorithm. For example, the concept //today (on the current day)// is expressed as **si den** in Lugamun, so the words **si** 'this, these' and **den** ' | One common reason is that words are needed in order to express other words that were automatically selected for addition by the algorithm. For example, the concept //today (on the current day)// is expressed as **si den** in Lugamun, so the words **si** 'this, these' and **den** ' | ||
| - | The other common reason is that words are needed in order to express some specific content in Lugamun. For example, to translate the fable [[trans:Fen Norte wa Sol|The North Wind and the Sun]], the words **norte** ' | + | The other common reason is that words are needed in order to express some specific content in Lugamun. For example, to translate the fable [[trans:Fen Norte va Sol|The North Wind and the Sun]], the words **norte** ' |
| ===== Can a word in Lugamun represent several concepts? ===== | ===== Can a word in Lugamun represent several concepts? ===== | ||
| Line 50: | Line 50: | ||
| Lugamun largely relies on an algorithm that converts the corresponding words from all our [[source languages]] into a form that fits our [[en: grammar: | Lugamun largely relies on an algorithm that converts the corresponding words from all our [[source languages]] into a form that fits our [[en: grammar: | ||
| - | XXX Integrate that article into the wiki and update it as needed. Explain that words are now ranked first by the number of related candidates, only then by penalty. | + | XXX Integrate that article into the wiki and update it as needed |
| This algorithmic order is only a proposal; the ultimate decision on which word to add is made by a human. Often, but not always, the word chosen is indeed the best candidate as determined by the algorithm. The rankings determined by the algorithm and the ultimate choice made are always documented in the [[https:// | This algorithmic order is only a proposal; the ultimate decision on which word to add is made by a human. Often, but not always, the word chosen is indeed the best candidate as determined by the algorithm. The rankings determined by the algorithm and the ultimate choice made are always documented in the [[https:// | ||
| Line 63: | Line 63: | ||
| To show a few examples for one of Lugamun' | To show a few examples for one of Lugamun' | ||
| - | * The Arabic male form أَبْيَض (ʾabyaḍ) becomes //abyade//. The final //-e// is needed because Lugamun syllables are not allowed to end in //d//. The female form بَيْضَاء (bayḍāʾ) becomes //baida//. | + | * The Arabic male form أَبْيَض (ʾabyaḍ) becomes //abyade//. The final //-e// is needed because Lugamun' |
| * The Chinese word 白 (bái) becomes //bai//. This was the best-ranked candidate and actually chosen for integration into Lugamun. | * The Chinese word 白 (bái) becomes //bai//. This was the best-ranked candidate and actually chosen for integration into Lugamun. | ||
| * The English word //white// (IPA: /waɪt/) becomes //wait//. | * The English word //white// (IPA: /waɪt/) becomes //wait//. | ||
| - | * French //blanc// becomes //blanke// – again an //-e// is added because Lugamun' | + | * French //blanc// becomes //blanke// – again an //-e// is added because Lugamun' |
| * Indonesian //putih// becomes //puti// – syllable-final //-h// is considered less essential for the sound of a word, so it is dropped instead of adding //-e// after it. | * Indonesian //putih// becomes //puti// – syllable-final //-h// is considered less essential for the sound of a word, so it is dropped instead of adding //-e// after it. | ||
| * Russian бе́лый (bélyj) becomes //beli//. | * Russian бе́лый (bélyj) becomes //beli//. | ||
| Line 86: | Line 86: | ||
| * In Swahili, the infinitive prefix //ku-// is stripped, except in the case of short verbs with just two syllables, since these tend to preserve the prefix in many conjugated forms. Hence //kula// remains unchanged, but //kuona// yields the candidate //ona//. | * In Swahili, the infinitive prefix //ku-// is stripped, except in the case of short verbs with just two syllables, since these tend to preserve the prefix in many conjugated forms. Hence //kula// remains unchanged, but //kuona// yields the candidate //ona//. | ||
| * Japanese verbs generally end in //-u//, therefore this final vowel contributes little to making the word recognizable. Hence we strip it from long verbs with three (or more) syllables if the result is allowed by Lugamun' | * Japanese verbs generally end in //-u//, therefore this final vowel contributes little to making the word recognizable. Hence we strip it from long verbs with three (or more) syllables if the result is allowed by Lugamun' | ||
| + | |||
| + | ===== How are distortion penalties calculated? ===== | ||
| + | |||
| + | A distortion penalty is applied when Lugamun' | ||
| + | |||
| + | Generally, certain cases result in "one penalty" | ||
| + | |||
| + | * Changing a single vowel into Lugamun' | ||
| + | * Reducing a diphthong to a single vowel incurs a penalty; e.g. English //name//, pronounced /neɪm/, becomes //nem// with a penalty. However, in certain cases speakers of the source language typically perceive the original sound as a single vowel rather than a diphthong. In such cases, no penalty applies. One example is English /oʊ/ as in //goat//, which becomes //o// without a penalty. | ||
| + | * Nasalized vowels are converted to the nearest vowel in Lugamun followed by //n//, without a penalty. | ||
| + | * No penalty is applied when converting a consonant to another, as long as both are derived from the same basic letter in IPA. For example, the IPA representations of all rhotic consonants are derived from the letter //r//, therefore they are all converted to //r// without a penalty. | ||
| + | * Otherwise a penalty is applied when a consonantal sound doesn' | ||
| + | * Usually /ŋ/ is changed to //n// without a penalty, following the rules described above. However, in the case of Mandarin, a penalty is applied, since /n/ and /ŋ/ are essentially the only consonantal endings allowed, making this a pretty severe change. | ||
| + | * A penalty is applied when dropping a consonant or adding a vowel (always //e//) for phonotactic reasons. | ||
| + | |||
| + | To be considered eligible for selection, a word can have at most //one// penalty applied to it. Words with two or more penalties are automatically skipped when sorting candidates. Originally it was just a rule of thumb that such severely distorted words would be skipped, but it has since become an inherent part of the candidate generation process. | ||
| ===== How are compounds and derived words chosen? ===== | ===== How are compounds and derived words chosen? ===== | ||
| XXX Explain. | XXX Explain. | ||
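
Two of the rules described in the revised text, the Swahili //ku-// rule and the limit of one distortion penalty per eligible candidate, can be sketched in Python as follows. This is only an illustration under stated assumptions, not the actual Lugamun tooling: the names `count_syllables`, `strip_swahili_infinitive`, `Candidate`, `eligible_candidates` and `MAX_PENALTIES` are hypothetical, syllables are approximated by counting vowel letters, and the penalty count of each candidate is assumed to have been computed elsewhere.

```python
from dataclasses import dataclass

SWAHILI_VOWELS = set("aeiou")

# Assumption: the eligibility limit described in the text (at most one penalty).
MAX_PENALTIES = 1


def count_syllables(word: str) -> int:
    """Approximate the syllable count as the number of vowel letters.

    Assumption: every vowel letter forms its own syllable. This ignores
    syllabic nasals and other details of real Swahili syllabification.
    """
    return sum(1 for ch in word.lower() if ch in SWAHILI_VOWELS)


def strip_swahili_infinitive(verb: str) -> str:
    """Drop the infinitive prefix ku- unless the verb has just two syllables."""
    if verb.lower().startswith("ku") and count_syllables(verb) > 2:
        return verb[2:]
    return verb


@dataclass
class Candidate:
    form: str        # candidate word after conversion
    penalties: int   # number of distortion penalties (assumed precomputed)


def eligible_candidates(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only candidates with at most MAX_PENALTIES penalties."""
    return [c for c in candidates if c.penalties <= MAX_PENALTIES]


if __name__ == "__main__":
    for verb in ("kula", "kuona"):
        print(verb, "->", strip_swahili_infinitive(verb))

    # Placeholder forms with made-up penalty counts, for illustration only.
    sample = [Candidate("x", 0), Candidate("y", 1), Candidate("z", 2)]
    print([c.form for c in eligible_candidates(sample)])
```

Under these assumptions, `strip_swahili_infinitive` leaves //kula// unchanged and turns //kuona// into //ona//, matching the examples in the text, and a candidate carrying two penalties is dropped before any ranking takes place.
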
en/background/vocabulary.1655542595.txt.gz · Last modified: by christian
