en:background:vocabulary
Differences
This shows you the differences between two versions of the page.
en:background:vocabulary [2022-04-20 12:52] – [Automatic selection of the next word to add] christian | en:background:vocabulary [2022-11-08 12:04] (current) – [Custom selection of a word to add] Restore original spelling of link christian | ||
---|---|---|---|
Line 36: | Line 36: | ||
==== Custom selection of a word to add ==== | ==== Custom selection of a word to add ==== | ||
- | The alternative is that a manually selected | + | The alternative is that a manually selected |
- | One common reason is that words are required | + | One common reason is that words are needed in order to express other words that were automatically selected |
- | The other common reason is that words are required | + | The other common reason is that words are needed in order to express some specific content in Lugamun. For example, to translate the fable [[trans:Fen Norte va Sol|The North Wind and the Sun]], the words **norte** ' |
===== Can a word in Lugamun represent several concepts? ===== | ===== Can a word in Lugamun represent several concepts? ===== | ||
- | Yes. XXX Explain | + | Yes. XXX Explain |
+ | |||
+ | ===== How does the word selection algorithm work? ===== | ||
+ | |||
+ | Lugamun largely relies on an algorithm that converts the corresponding words from all our [[source languages]] into a form that fits our [[en: grammar: | ||
+ | |||
+ | XXX Integrate that article into the wiki and update it as needed (see also file reddit/ | ||
+ | |||
+ | This algorithmic order is only a proposal; the ultimate decision on which word to add is made by a human. Often, but not always, the chosen word is indeed the best candidate as determined by the algorithm. The rankings determined by the algorithm and the ultimate choice made are always documented in the [[https:// | ||
+ | |||
+ | > Candidate #3 " | ||
+ | > Selection rationale: We prefer this form over ' | ||
+ | |||
+ | ===== How are words made into candidates? ===== | ||
+ | |||
+ | Before the words from the different source languages can be compared with each other and ranked, they are first converted into " | ||
+ | |||
+ | To show a few examples for one of Lugamun' | ||
+ | |||
+ | * The Arabic male form أَبْيَض (ʾabyaḍ) becomes //abyade//. The final //-e// is needed because Lugamun' | ||
+ | * The Chinese word 白 (bái) becomes //bai//. This was the best-ranked candidate and actually chosen for integration into Lugamun. | ||
+ | * The English word //white// (IPA: /waɪt/) becomes //wait//. | ||
+ | * French //blanc// becomes //blanke// – again an //-e// is added because Lugamun' | ||
+ | * Indonesian //putih// becomes //puti// – syllable-final //-h// is considered less essential for the sound of a word, so it is dropped rather than followed by an added //-e//. | ||
+ | * Russian бе́лый (bélyj) becomes //beli//. | ||
+ | * The words from the other source languages (Spanish, Hindi, Japanese, Swahili) are converted into candidates in a similar manner. | ||
+ | |||
+ | The candidate form of any word is automatically created by Lugamun' | ||
+ | |||
+ | Generally speaking, the idea is to create candidate words that are close to the original pronunciation, | ||
+ | |||
+ | * Generally, nasalized vowels are handled by adding //-n// after the vowel. But in the case of French, we use either //-n// or //-m//, following the written form. In the case of /ɑ̃/, we also use //e// instead of //a// if that corresponds to the original spelling. Hence //temps// becomes //tem// rather than //tan//. This results in a greater similarity to the written form and also to the corresponding forms in other languages, which often are closer to the written form (such as Spanish //tiempo//, Portuguese //tempo//, English //time// etc.). | ||
+ | * Though Japanese has strictly speaking no diphthongs, we convert the vowel combinations //ai, au, oi// to our diphthongs. Phonetically, | ||
+ | * For Mandarin, the plosives (stops) are adapted following their pinyin spellings, hence pinyin //b, d, g// are preserved as such instead of changing them to the voiceless equivalents //p, t, k//. In this way, Mandarin' | ||
+ | |||
+ | When generating candidates for verbs, we use the infinitive as the starting point if one exists, otherwise the customary " | ||
+ | |||
+ | * Arabic verbs in their dictionary form usually end in //-a//. Especially in long verbs with three (or more) syllables, this vowel often changes in conjugated forms, therefore it contributes little to making the word recognizable. Hence we strip it in such cases if the result is allowed by Lugamun' | ||
+ | * Final //-r// is stripped from Spanish and French verbs: Spanish //comer// yields the candidate //kome//; French //écrire// yields //ekri//. In the case of French verbs ending in //-er//, the //e// is stripped as well, if Lugamun' | ||
+ | * The prefix //meng-// and its variants are stripped from Indonesian verbs. | ||
+ | * In Swahili, the infinitive prefix //ku-// is stripped, except in the case of short verbs with just two syllables, since these tend to preserve the prefix in many conjugated forms. Hence //kula// remains unchanged, but //kuona// yields the candidate //ona//. | ||
+ | * Japanese verbs generally end in //-u//, therefore this final vowel contributes little to making the word recognizable. Hence we strip it from long verbs with three (or more) syllables if the result is allowed by Lugamun' | ||
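
Two of the verb rules above (Swahili //ku-// and final //-r// in Spanish and French) can be sketched in Python. This is only a rough illustration, not Lugamun's actual implementation: the function names are hypothetical, syllables are approximated by counting vowel letters, and the intermediate transcriptions ''komer'' and ''ekrir'' are assumed forms of //comer// and //écrire// before the final consonant is stripped.

```python
VOWELS = set("aeiou")


def count_syllables(word: str) -> int:
    """Approximate the syllable count as the number of vowel letters.

    Assumption: every vowel letter forms its own syllable, which is a
    reasonable heuristic for Swahili but not a general rule.
    """
    return sum(1 for ch in word if ch in VOWELS)


def strip_swahili_infinitive(verb: str) -> str:
    """Strip the infinitive prefix ku- unless the verb has just two syllables."""
    if verb.startswith("ku") and count_syllables(verb) > 2:
        return verb[2:]
    return verb


def strip_final_r(transcribed: str) -> str:
    """Strip a final -r from an already transcribed Spanish or French verb."""
    if transcribed.endswith("r"):
        return transcribed[:-1]
    return transcribed


if __name__ == "__main__":
    print(strip_swahili_infinitive("kula"))   # short verb, prefix kept
    print(strip_swahili_infinitive("kuona"))  # longer verb, prefix stripped
    print(strip_final_r("komer"))             # assumed transcription of "comer"
    print(strip_final_r("ekrir"))             # assumed transcription of "écrire"
```

The sketch deliberately leaves out the rules that also depend on Lugamun's phonotactics (Arabic //-a//, French //-er//, Japanese //-u//), since those need a check of which word endings are allowed.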
+ | |||
+ | ===== How are distortion penalties calculated? ===== | ||
+ | |||
+ | A distortion penalty is applied when Lugamun' | ||
+ | |||
+ | Generally, certain cases result in "one penalty" | ||
+ | |||
+ | * Changing a single vowel into Lugamun' | ||
+ | * Reducing a diphthong to a single vowel incurs a penalty; e.g. English //name//, pronounced /neɪm/, becomes //nem// with a penalty. However, there are certain cases where the original vowel may typically be perceived as a single vowel rather than a diphthong by the speakers of the source language. In such cases, no penalty applies. One example is English /oʊ/ as in //goat//, which becomes //o// without a penalty. | ||
+ | * Nasalized vowels are converted to the nearest vowel in Lugamun followed by //n//, without a penalty. | ||
+ | * No penalty is applied when converting a consonant to another, as long as both are derived from the same basic letter in IPA. For example, the IPA representations of all rhotic consonants are derived from the letter //r//, therefore they are all converted to //r// without a penalty. | ||
+ | * Otherwise a penalty is applied when a consonantal sound doesn' | ||
+ | * Usually /ŋ/ is changed to //n// without a penalty, following the rules described above. However, in the case of Mandarin, a penalty is applied, since these are essentially the only consonantal endings allowed, making this a pretty severe change. | ||
+ | * A penalty is applied when dropping a consonant or adding a vowel (always //e//) for phonotactic reasons. | ||
+ | |||
+ | To be considered eligible for selection, a word can have at most //one// penalty applied to it. Those with two or more penalties are automatically skipped when sorting candidates. Originally it was just a rule of thumb that such more severely distorted words would be skipped, but it has since become an inherent part of the candidate generation process. | ||
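
The eligibility rule (at most one distortion penalty per candidate) can be illustrated with a small Python sketch. The ''Candidate'' class, its field names, and the penalty labels are hypothetical and are not taken from Lugamun's tooling; the second candidate is an invented placeholder used only to show a rejected case.

```python
from dataclasses import dataclass, field

# A candidate may carry at most this many distortion penalties.
MAX_PENALTIES = 1


@dataclass
class Candidate:
    """A candidate word together with the penalties it accumulated."""
    form: str
    source: str
    penalties: list = field(default_factory=list)

    @property
    def eligible(self) -> bool:
        """True if the candidate has no more than MAX_PENALTIES penalties."""
        return len(self.penalties) <= MAX_PENALTIES


def eligible_candidates(candidates):
    """Keep only the candidates that may take part in the ranking."""
    return [c for c in candidates if c.eligible]


if __name__ == "__main__":
    candidates = [
        # English "name" /neɪm/ becomes "nem" with one penalty.
        Candidate("nem", "English", ["diphthong reduced"]),
        # Invented placeholder with two penalties, hence skipped.
        Candidate("placeholder", "hypothetical",
                  ["consonant dropped", "vowel added"]),
    ]
    for c in eligible_candidates(candidates):
        print(c.form, len(c.penalties))
```

Storing the penalties as a list of labels rather than a bare counter keeps the reason for each penalty available for documentation, which fits the practice of recording the rankings and the selection rationale.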
+ | |||
+ | ===== How are compounds and derived words chosen? ===== | ||
+ | |||
+ | XXX Explain. |
en/background/vocabulary.txt · Last modified: 2022-11-08 12:04 by christian