Lugamun

An easy and fair language for global communication


en:background:vocabulary

Differences

This shows you the differences between two versions of the page.

en:background:vocabulary [2022-07-29 15:34] – [How are distortion penalties calculated?] christian
en:background:vocabulary [2022-11-08 12:04] (current) – [Custom selection of a word to add] Restore original spelling of link christian
Line 40:
 One common reason is that words are needed in order to express other words that were automatically selected for addition by the algorithm. For example, the concept //today (on the current day)// is expressed as **si den** in Lugamun, so the words **si** 'this, these' and **den** 'day' had to be found and added first before it could be expressed and added as well.

-The other common reason is that words are needed in order to express some specific content in Lugamun. For example, to translate the fable [[trans:Fen Norte wa Sol|The North Wind and the Sun]], the words **norte** 'north, northern' and many others had first to be found and added to the dictionary.
+The other common reason is that words are needed in order to express some specific content in Lugamun. For example, to translate the fable [[trans:Fen Norte va Sol|The North Wind and the Sun]], the words **norte** 'north, northern' and many others had first to be found and added to the dictionary.

 ===== Can a word in Lugamun represent several concepts? =====
Line 50:
 Lugamun largely relies on an algorithm that converts the corresponding words from all our [[source languages]] into a form that fits our [[en: grammar:phonology and spelling|phonology]] and then ranks them into an order from "best fit" to "worst fit". For the details of how that ordering works, see [[https://www.reddit.com/r/auxlangs/comments/mlf8h8/vocabulary_selection_for_a_worldlang/|Vocabulary selection for a worldlang]].

-XXX Integrate that article into the wiki and update it as needed. Explain that words are now ranked first by the number of related candidates, only then by penalty.
+XXX Integrate that article into the wiki and update it as needed (see also file reddit/vocabulary.md). Explain that words are now ranked first by the number of related candidates, only then by penalty.

 This algorithmic order is only a proposal; the ultimate decision on which word to add is made by a human. Often, but not always, the chosen word is indeed the best candidate as determined by the algorithm. The rankings determined by the algorithm and the ultimate choice made are always documented in the [[https://gitlab.com/ChristianSi/lugamun/-/blob/main/data/selectionlog.txt?expanded=true&viewer=simple|selection log]]. If the chosen word was **not** the candidate ranked first, then the rationale for that choice is always stated in the selection log. For example, for the word **un** 'one, first', the log states:
Line 99:
   * Otherwise a penalty is applied when a consonantal sound doesn't exist in Lugamun and must be converted to the nearest consonant that does exist.
   * Usually /ŋ/ is changed to //n// without a penalty, following the rules described above. However, in the case of Mandarin, a penalty is applied, since /n/ and /ŋ/ are essentially the only consonantal endings Mandarin allows, making this a pretty severe change.
-  * A penalty is applied when dropping a sound or adding a vowel (always //e//) for phonotactic reasons.
+  * A penalty is applied when dropping a consonant or adding a vowel (always //e//) for phonotactic reasons.

 To be considered eligible for selection, a word can have at most //one// penalty applied to it. Those with two or more penalties are automatically skipped when sorting candidates. Originally it was just a rule of thumb that such more severely distorted words would be skipped, but it has since become an inherent part of the candidate generation process.
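
The eligibility rule and the ranking order mentioned in this comparison (first by number of related candidates, only then by penalty) can be illustrated with a minimal Python sketch. It is not the actual Lugamun selection code. The class `Candidate`, its field names, the numeric `penalty_score`, and the sample words are all hypothetical, and the sketch assumes that more related candidates rank higher and that a lower penalty score is better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    word: str             # candidate form after adaptation (made-up sample data)
    related: int          # number of related candidates (hypothetical field name)
    penalties: int        # how many distortion penalties were applied
    penalty_score: float  # summed penalty weight (assumed; weights are not given here)


# A word may carry at most one penalty to remain eligible.
MAX_PENALTIES = 1


def rank_candidates(candidates):
    """Drop candidates with two or more penalties, then sort the rest.

    Assumed ordering: more related candidates first, then lower penalty score.
    """
    eligible = [c for c in candidates if c.penalties <= MAX_PENALTIES]
    return sorted(eligible, key=lambda c: (-c.related, c.penalty_score))


# Made-up candidates, not real Lugamun data.
sample = [
    Candidate("alfa", related=3, penalties=1, penalty_score=0.5),
    Candidate("beta", related=5, penalties=2, penalty_score=1.0),
    Candidate("gama", related=3, penalties=0, penalty_score=0.0),
    Candidate("delta", related=4, penalties=1, penalty_score=0.5),
]

ranked = rank_candidates(sample)
print([c.word for c in ranked])  # ['delta', 'gama', 'alfa']
```

Here "beta" is skipped despite having the most related candidates, because it carries two penalties. The resulting order is only a proposal; as described above, a human makes the final choice.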
en/background/vocabulary.1659101641.txt.gz · Last modified: 2022-07-29 15:34 by christian

Except where otherwise noted, content on this wiki is licensed under the following license: CC0 1.0 Universal