en:background:source_languages
Differences
This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
en:background:source_languages [2022-03-21 13:17] – created christian | en:background:source_languages [2022-11-14 21:59] (current) – w -> v christian | ||
---|---|---|---|
Line 3: | Line 3: | ||
====== Source languages ====== | ====== Source languages ====== | ||
- | (Coming soon.) | + | ===== What are Lugamun' |
+ | |||
+ | Lugamun has ten source languages: Arabic, English, French, Hindustani | ||
+ | |||
+ | The exact method for selecting these languages was as follows: | ||
+ | |||
+ | * For the Indo-European languages – by far the most widely spoken language family in the world – we select the biggest language from each subfamily (or branch), provided that this language has at least 100 (or 50, it doesn' | ||
+ | * For each of the four next biggest language families (all of which have more than 300 million speakers in total), we use the most widely spoken language: Mandarin Chinese (Sino-Tibetan family), Swahili (Niger-Congo family), Standard Arabic (Afroasiatic family), and Indonesian (Austronesian family). | ||
+ | * We also add French (the second most widely spoken Italic language), since it is one of the official languages of the United Nations – the only official language not yet in our list. French vies with Bengali to be the most widely spoken language not yet in our list – but it is arguably more international, | ||
+ | * To avoid having more Indo-European than other languages and to increase diversity, we also add the most widely spoken language from a family not yet represented: | ||
+ | |||
+ | This choice was made in July 2021. | ||
+ | |||
+ | Sources: | ||
+ | |||
+ | * Ethnologue: [[https:// | ||
+ | * Reddit: [[https:// | ||
+ | * Wikipedia: [[https:// | ||
+ | * Wikipedia: [[https:// | ||
+ | * Wikipedia articles on language families and individual languages | ||
+ | * Worldometer: | ||
+ | |||
+ | ===== Are different source languages treated differently? | ||
+ | |||
+ | Yes. A distinction is made between the five most widely spoken languages (the "top 5") and the other five source languages (the "next 5"). A candidate word must be from one of the top 5 languages //or// it must have a related candidate in another language to be eligible for selection. | ||
+ | |||
+ | This means that words from the "next 5" (French, Russian, Indonesian/ | ||
+ | |||
+ | On the other hand, candidates from the "top 5" (English, Mandarin Chinese, Hindustani, Arabic, and Spanish) are eligible for selection even if they don't have any related candidate. For example, **tvi** ' | ||
+ | |||
+ | All candidate words are sorted first by the number of related candidates and only then by their total penalty, which means that words that have at least one related candidate will always be preferred over those that have none. Hence the candidates from the "top 5" languages without any related candidates will be placed at the end of the candidate list, after all candidates that do have related candidates. So they can be considered as " | ||
+ | |||
+ | The reason for limiting these " | ||
+ | |||
+ | In cases where no word has any related candidates in other languages, words from the top 5 languages will therefore always be chosen, since only they are eligible. In all other cases, words from any of the source languages may be chosen, based on their overall penalty. | ||
+ | |||
+ | Nevertheless, | ||
+ | |||
+ | (In a few exceptional cases, words from the next 5 languages may be chosen even without the support of a related word. This happens only in the rare situation where //none// of the normally eligible candidates is suitable, so that the usual selection criteria need to be relaxed. In such cases, the exceptional choice is always explained and justified in the [[https:// | ||
+ | |||
+ | ===== Why do you consider Hindustani a single language? ===== | ||
+ | |||
+ | Because most linguists do so. Hindi and Urdu are listed as separate languages in the dictionary because they use different writing systems, but that doesn' |
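
The eligibility and ordering rules described under "Are different source languages treated differently?" can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the ''Candidate'' class, its field names, the placeholder words, and the penalty values are all hypothetical. The sketch also assumes that a higher number of related candidates sorts first and that a lower total penalty is better; the page does not say how penalties are computed.

```python
from dataclasses import dataclass

# The "top 5" source languages as listed on this page.
TOP_5 = {"English", "Mandarin Chinese", "Hindustani", "Arabic", "Spanish"}


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one candidate word."""
    word: str        # placeholder label, not a real Lugamun word
    language: str    # source language of the candidate
    related: int     # number of related candidates in other languages
    penalty: float   # total penalty (assumed: lower is better)


def is_eligible(c: Candidate) -> bool:
    # A candidate must come from a top 5 language OR have at least
    # one related candidate in another language.
    return c.language in TOP_5 or c.related > 0


def rank(candidates: list[Candidate]) -> list[Candidate]:
    # Sort first by number of related candidates (assumed: more first),
    # and only then by total penalty.
    eligible = [c for c in candidates if is_eligible(c)]
    return sorted(eligible, key=lambda c: (-c.related, c.penalty))


candidates = [
    Candidate("a", "French", 0, 1.0),   # next 5, no support: not eligible
    Candidate("b", "English", 0, 0.5),  # top 5, no support: fallback only
    Candidate("c", "Russian", 1, 3.0),
    Candidate("d", "Spanish", 2, 4.0),
    Candidate("e", "French", 1, 2.0),
]
ranked = rank(candidates)
print([c.word for c in ranked])
```

Note how candidate "b" ends up last despite having the lowest penalty: without any related candidate it acts only as a fallback, while the unsupported French candidate "a" is dropped entirely.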
en/background/source_languages.1647865027.txt.gz · Last modified: 2022-03-21 13:17 by christian