Differences

This shows you the differences between two versions of the page.

--- en:background:vocabulary [2022-06-18 10:56] – [How are words made into candidates?] christian
+++ en:background:vocabulary [2022-11-08 12:04] (current) – [Custom selection of a word to add] Restore original spelling of link christian
@@ Line 40: / Line 40: @@
 One common reason is that words are needed in order to express other words that were automatically selected for addition by the algorithm. For example, the concept //today (on the current day)// is expressed as **si den** in Lugamun, so the words **si** 'this, these' and **den** 'day' had to be found and added first before it could be expressed and added as well.
-The other common reason is that words are needed in order to express some specific content in Lugamun. For example, to translate the fable [[trans:Fen Norte wa Sol|The North Wind and the Sun]], the words **norte** 'north, northern' and many others had first to be found and added to the dictionary.
+The other common reason is that words are needed in order to express some specific content in Lugamun. For example, to translate the fable [[trans:Fen Norte va Sol|The North Wind and the Sun]], the words **norte** 'north, northern' and many others had first to be found and added to the dictionary.
 ===== Can a word in Lugamun represent several concepts? =====
@@ Line 50: / Line 50: @@
 Lugamun largely relies on an algorithm that converts the corresponding words from all our [[source languages]] into a form that fits our [[en: grammar:phonology and spelling|phonology]] and then ranks them into an order from "best fit" to "worst fit". For the details of how that ordering works, see [[https://www.reddit.com/r/auxlangs/comments/mlf8h8/vocabulary_selection_for_a_worldlang/|Vocabulary selection for a worldlang]].
-XXX Integrate that article into the wiki and update it as needed. Explain that words are now ranked first by the number of related candidates, only then by penalty.
+XXX Integrate that article into the wiki and update it as needed (see also file reddit/vocabulary.md). Explain that words are now ranked first by the number of related candidates, only then by penalty.
 This algorithmic order is only a proposal; the ultimate decision on which word to add is made by a human. Often, but not always, the chosen word is indeed the best candidate as determined by the algorithm. The rankings determined by the algorithm and the ultimate choice made are always documented in the [[https://gitlab.com/ChristianSi/lugamun/-/blob/main/data/selectionlog.txt?expanded=true&viewer=simple|selection log]]. If the chosen word was **not** the candidate ranked first, then the rationale for that choice is always stated in the selection log. For example, for the word **un** 'one, first', the log states:
@@ Line 63: / Line 63: @@
 Here are a few examples for one of Lugamun's oldest words, which represents the concept //[[https://en.wiktionary.org/wiki/white/translations#Adjective|white (bright and colourless)]]//:
-  * The Arabic male form أَبْيَض‎ (ʾabyaḍ) becomes //abyade//. The final //-e// is needed because Lugamun syllables are not allowed to end in //d//. The female form بَيْضَاء‎ (bayḍāʾ) becomes //baida//.
+  * The Arabic male form أَبْيَض‎ (ʾabyaḍ) becomes //abyade//. The final //-e// is needed because Lugamun's syllables are not allowed to end in //d//. The female form بَيْضَاء‎ (bayḍāʾ) becomes //baida//.
   * The Chinese word 白 (bái) becomes //bai//. This was the best-ranked candidate and actually chosen for integration into Lugamun.
   * The English word //white// (IPA: /waɪt/) becomes //wait//.
-  * French //blanc// becomes //blanke// – again an //-e// is added because Lugamun's phonology doesn't allow syllable-final clusters.
+  * French //blanc// becomes //blanke// – again an //-e// is added because Lugamun's phonology doesn't allow syllable-final consonant clusters.
   * Indonesian //putih// becomes //puti// – syllable-final //-h// is considered less essential for the sound of a word, therefore it is dropped instead of adding //-e// behind it.
   * Russian бе́лый (bélyj) becomes //beli//.
@@ Line 86: / Line 86: @@
   * In Swahili, the infinitive prefix //ku-// is stripped, except in the case of short verbs with just two syllables, since these tend to preserve the prefix in many conjugated forms. Hence //kula// remains unchanged, but //kuona// yields the candidate //ona//.
   * Japanese verbs generally end in //-u//, therefore this final vowel contributes little to making the word recognizable. Hence we strip it from long verbs with three (or more) syllables if the result is allowed by Lugamun's phonology. Thus 腐る (kusaru) becomes //kusar//. In shorter verbs, the final vowel is preserved, since the resulting word would otherwise have only a single syllable, and preserving the second vowel reduces the number of collisions with other words.
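The two verb rules above can be sketched roughly as follows. This is only an illustration, not the code actually used for Lugamun: the function names are made up, syllables are approximated by counting vowel letters in the romanized form, and the phonology check is a hypothetical callback that accepts everything by default.

```python
VOWELS = set("aeiou")


def count_syllables(word):
    # Assumption: one syllable per vowel letter in the romanized form.
    return sum(1 for ch in word if ch in VOWELS)


def strip_swahili_infinitive(verb):
    # Strip the infinitive prefix ku-, except in short two-syllable verbs.
    if verb.startswith("ku") and count_syllables(verb) > 2:
        return verb[2:]
    return verb


def strip_japanese_final_u(verb, is_allowed=lambda word: True):
    # Strip final -u from verbs with three or more syllables, but only if
    # the result passes the (hypothetical) phonology check is_allowed.
    if verb.endswith("u") and count_syllables(verb) >= 3:
        stripped = verb[:-1]
        if is_allowed(stripped):
            return stripped
    return verb


print(strip_swahili_infinitive("kula"))
print(strip_swahili_infinitive("kuona"))
print(strip_japanese_final_u("kusaru"))
```

With the examples from the text, //kula// stays unchanged, //kuona// yields //ona// and //kusaru// yields //kusar//.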
+===== How are distortion penalties calculated? =====
+A distortion penalty is applied when the pronunciation of a candidate word has to be changed significantly compared to the original word, because a sound used in the original doesn't exist in Lugamun or a certain sound combination is not allowed by Lugamun's phonology.
+Generally, certain cases result in "one penalty" or "one penalty point" being applied, while others are not considered sufficiently distorting to deserve a penalty. More specifically,
+  * Changing a single vowel into Lugamun's nearest vowel sound doesn't incur a penalty; e.g. /y/ in French words becomes **u** without a penalty.
+  * Reducing a diphthong to a single vowel incurs a penalty; e.g. English //name//, pronounced /neɪm/, becomes //nem// with a penalty. However, there are certain cases where the original vowel may typically be perceived as a single vowel rather than a diphthong by the speakers of the source language. In such cases, no penalty applies. One example is English /oʊ/ as in //goat//, which becomes //o// without a penalty.
+  * Nasalized vowels are converted to the nearest vowel in Lugamun followed by //n//, without a penalty.
+  * No penalty is applied when converting a consonant to another, as long as both are derived from the same basic letter in IPA. For example, the IPA representations of all rhotic consonants are derived from the letter //r//, therefore they are all converted to //r// without a penalty.
+  * Otherwise a penalty is applied when a consonantal sound doesn't exist in Lugamun and must be converted to the nearest consonant that does exist.
+  * Usually /ŋ/ is changed to //n// without a penalty, following the rules described above. However, in the case of Mandarin, a penalty is applied: /n/ and /ŋ/ are essentially the only consonantal endings Mandarin allows, so merging them is a fairly severe change.
+  * A penalty is applied when dropping a consonant or adding a vowel (always //e//) for phonotactic reasons.
+To be considered eligible for selection, a word can have at most //one// penalty applied to it. Those with two or more penalties are automatically skipped when sorting candidates. Originally it was just a rule of thumb that such more severely distorted words would be skipped, but it has since become an inherent part of the candidate generation process.
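As a rough illustration of the rules in this section, the penalty count and the eligibility check might be modelled like this. It is a sketch under assumptions, not the actual implementation: the ''Change'' categories, the function names and the idea of passing a ready-made list of changes are all invented for the example, and detecting which changes a given word needs is left out entirely.

```python
from enum import Enum, auto


class Change(Enum):
    # Hypothetical labels for the kinds of changes listed in this section.
    VOWEL_TO_NEAREST = auto()
    DIPHTHONG_REDUCED = auto()
    DIPHTHONG_HEARD_AS_SINGLE_VOWEL = auto()
    NASAL_VOWEL_TO_VOWEL_PLUS_N = auto()
    CONSONANT_SAME_IPA_BASE = auto()
    CONSONANT_TO_NEAREST = auto()
    NG_TO_N = auto()
    CONSONANT_DROPPED = auto()
    VOWEL_ADDED = auto()


# Changes that always cost one penalty point.
PENALIZED = {
    Change.DIPHTHONG_REDUCED,
    Change.CONSONANT_TO_NEAREST,
    Change.CONSONANT_DROPPED,
    Change.VOWEL_ADDED,
}

MAX_PENALTIES = 1  # candidates with two or more penalties are skipped


def penalty_points(changes, source_language=""):
    total = 0
    for change in changes:
        if change in PENALIZED:
            total += 1
        elif change is Change.NG_TO_N and source_language == "Mandarin":
            # Special case: /ŋ/ -> n is penalized only for Mandarin.
            total += 1
    return total


def is_eligible(changes, source_language=""):
    return penalty_points(changes, source_language) <= MAX_PENALTIES


# English "name" /neɪm/ -> nem: one reduced diphthong, still eligible.
print(penalty_points([Change.DIPHTHONG_REDUCED]))
print(is_eligible([Change.CONSONANT_DROPPED, Change.VOWEL_ADDED]))
```

Keeping the penalized categories in a single set makes the Mandarin special case the only place where the source language matters.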
 ===== How are compounds and derived words chosen? =====
 XXX Explain.