Lugamun

An easy and fair language for global communication


en:background:source_languages

Differences

This shows you the differences between two versions of the page.

en:background:source_languages [2022-07-29 15:39] – Why do you consider Hindustani a single language? christian
en:background:source_languages [2022-11-14 21:59] (current) – w -> v christian
Line 24:
   * Wikipedia articles on language families and individual languages
   * Worldometer: [[https://www.worldometers.info/world-population/|Current World Population]]
 +
 +===== Are different source languages treated differently? =====
 +
 +Yes. A distinction is made between the five most widely spoken languages (the "top 5") and the other five source languages (the "next 5"). A candidate word must be from one of the top 5 languages //or// it must have a related candidate in another language to be eligible for selection.
 +
 +This means that words from the "next 5" (French, Russian, Indonesian/Malay, Japanese, and Swahili) are not considered candidate words unless they have a related word (a true or false cognate) in any of the other nine source languages. For example, the word **to** 'that' is based on Japanese と (to) and related to Russian что (što) – without this related candidate, it would not have been eligible for selection and hence could not have made it into the dictionary.
 +
 +On the other hand, candidates from the "top 5" (English, Mandarin Chinese, Hindustani, Arabic, and Spanish) are eligible for selection even if they don't have any related candidate. For example, **tvi** 'leg' is from Chinese 腿 (tuǐ); there are no related (similar) words in any of the other source languages.
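 +
 +The eligibility rule described above can be sketched in a few lines of Python. This is only an illustration, not Lugamun's actual selection code: the function name ''is_eligible'', the ''related_count'' parameter, and the language labels are assumptions made for the example.

```python
# Hypothetical sketch of the eligibility rule; names are illustrative only.
TOP_5 = {"English", "Mandarin", "Hindustani", "Arabic", "Spanish"}
NEXT_5 = {"French", "Russian", "Indonesian/Malay", "Japanese", "Swahili"}


def is_eligible(language: str, related_count: int) -> bool:
    """Return True if a candidate word may be selected.

    A candidate is eligible if it comes from a top 5 language OR if it has
    at least one related candidate (true or false cognate) in another
    source language.
    """
    if language not in TOP_5 | NEXT_5:
        raise ValueError(f"not a source language: {language}")
    return language in TOP_5 or related_count > 0


# 'to' (Japanese, related to Russian что) is eligible thanks to its relative.
print(is_eligible("Japanese", 1))
# 'tvi' (Mandarin, no related words) is eligible as a top 5 word.
print(is_eligible("Mandarin", 0))
# A Japanese word without any related candidate is not eligible.
print(is_eligible("Japanese", 0))
```

 +The exceptional cases recorded in the selection log are deliberately left out of this sketch.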
 +
 +All candidate words are sorted first by the number of related candidates and only then by their total penalty, which means that words that have at least one related candidate will always be preferred over those that have none. Hence the candidates from the "top 5" languages without any related candidates will be placed at the end of the candidate list, after all candidates that do have related candidates. They can therefore be regarded as "choices of last resort" that are considered only if no candidate word has (true or false) cognates in other source languages.
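 +
 +The two-level ordering can be illustrated with a small Python sketch. It assumes that a lower total penalty is better and that more related candidates rank higher; the ''Candidate'' class, its field names, and the sample words and numbers are all invented for the example and are not taken from Lugamun's real data.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    word: str            # hypothetical candidate form
    language: str        # source language of the candidate
    related_count: int   # number of related candidates in other languages
    penalty: float       # total penalty (assumed: lower is better)


def rank_candidates(candidates):
    """Sort by number of related candidates (descending), then by penalty
    (ascending). Candidates without related words therefore land at the
    end of the list, whatever their penalty."""
    return sorted(candidates, key=lambda c: (-c.related_count, c.penalty))


# Invented sample data, for illustration only.
sample = [
    Candidate("aaa", "English", 0, 0.1),   # top 5 word without relatives
    Candidate("bbb", "Japanese", 1, 0.9),
    Candidate("ccc", "Spanish", 2, 0.5),
    Candidate("ddd", "French", 1, 0.3),
]
ranked = rank_candidates(sample)
print([c.word for c in ranked])  # ['ccc', 'ddd', 'bbb', 'aaa']
```

 +Note that "aaa" ends up last despite having the lowest penalty, which is what makes unrelated top 5 words "choices of last resort."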
 +
 +The reason for limiting these "choices of last resort" to the top 5 languages is that it helps to ensure that all of Lugamun's words will be recognizable to a considerable number of people. Even without related candidates, many people will recognize words from English or Mandarin – many more than would recognize a word from Japanese or Swahili. This is why only the former, but not the latter, become "choices of last resort."
 +
 +In cases where no word has any related candidates in other languages, words from the top 5 languages will therefore always be chosen, since only they are eligible. In all other cases, words from any of the source languages may be chosen, based on their overall penalty.
 +
 +Nevertheless, the fact that the top 5 languages are sometimes privileged gives them a visible advantage in the [[en:statistics#influence distribution|influence statistics]], where the top 5 (plus French) all have a higher influence than the other 4. French, though not a top 5 language and so never yielding "choices of last resort," manages to retain a very high influence because its words are often related to the words used in other source languages (especially Spanish and English, but also Russian and even Indonesian). On the other hand, Mandarin quite rarely shares words with any other source language, and since Lugamun's selection algorithm always prefers shared words, its influence is therefore lower than that of any other top 5 language.
 +
 +(In a few exceptional cases, words from the next 5 languages may be chosen even without the support of a related word. But this happens only in the rare situation that //none// of the normally eligible candidates is suitable, so that the usual selection criteria need to be relaxed. In such cases, the exceptional choice is always explained and justified in the [[https://gitlab.com/ChristianSi/lugamun/-/raw/main/data/selectionlog.txt|selection log]]. One example is the optional object preposition **o** (from Japanese), which was accepted because none of the top 5 languages has an object marker/preposition that could replace it.)
  
 ===== Why do you consider Hindustani a single language? =====
en/background/source_languages.1659101966.txt.gz · Last modified: 2022-07-29 15:39 by christian

Except where otherwise noted, content on this wiki is licensed under the following license: CC0 1.0 Universal