Dictionary statistics
This file is updated automatically once a day, provided the dictionary has been modified since the last update. Do not edit it manually, since any changes will soon be overwritten. Last update: 2023-03-16.
Influence distribution
- Spanish: 14.5%
- English: 13.5%
- French: 13.4%
- Hindi/Urdu: 10.6%
- Arabic: 9.7%
- Russian: 7.7%
- Indonesian/Malay: 7.6%
- Swahili: 7.2%
- Mandarin Chinese: 6.9%
- Japanese: 6.9%
- Others: 2.0%
986 of 1605 entries directly derived from source languages.
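As a quick sanity check on the figures above, the listed shares should add up to 100%, and the directly derived entries should come to roughly 61% of all entries. The following is only a minimal sketch; the variable names are illustrative and are not taken from the tooling that generates this page.

```python
# Influence shares copied from the list above (percent).
influence = {
    "Spanish": 14.5, "English": 13.5, "French": 13.4, "Hindi/Urdu": 10.6,
    "Arabic": 9.7, "Russian": 7.7, "Indonesian/Malay": 7.6, "Swahili": 7.2,
    "Mandarin Chinese": 6.9, "Japanese": 6.9, "Others": 2.0,
}

# Round to one decimal to absorb floating-point noise.
total_share = round(sum(influence.values()), 1)

# Share of entries directly derived from source languages.
direct_share = round(100 * 986 / 1605, 1)

print(total_share)   # 100.0
print(direct_share)  # 61.4
```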
Related vocabulary percentages
- Spanish: 53.9%
- French: 51.3%
- English: 50.1%
- Russian: 32.7%
- Hindi/Urdu: 31.5%
- Indonesian/Malay: 30.5%
- Arabic: 29.7%
- Swahili: 26.8%
- Japanese: 24.9%
- Mandarin Chinese: 16.8%
For an explanation of how influences and related vocabulary are measured and why the latter can never be lower than the former, see the section “Influence distribution and similarity ratios” in this article.
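The claim that related vocabulary can never be lower than influence can be checked against the published figures. The sketch below only compares the two lists on this page; it does not reproduce how either measure is computed, and the names used are illustrative.

```python
# Percentages copied from the two lists above.
influence = {
    "Spanish": 14.5, "English": 13.5, "French": 13.4, "Hindi/Urdu": 10.6,
    "Arabic": 9.7, "Russian": 7.7, "Indonesian/Malay": 7.6, "Swahili": 7.2,
    "Mandarin Chinese": 6.9, "Japanese": 6.9,
}
related = {
    "Spanish": 53.9, "French": 51.3, "English": 50.1, "Russian": 32.7,
    "Hindi/Urdu": 31.5, "Indonesian/Malay": 30.5, "Arabic": 29.7,
    "Swahili": 26.8, "Japanese": 24.9, "Mandarin Chinese": 16.8,
}

# Gap between related vocabulary and influence, per language.
gaps = {lang: round(related[lang] - influence[lang], 1) for lang in influence}

# Every gap should be non-negative.
all_consistent = all(gap >= 0 for gap in gaps.values())

# Language with the smallest gap.
smallest = min(gaps, key=gaps.get)
print(all_consistent)            # True
print(smallest, gaps[smallest])  # Mandarin Chinese 9.9
```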
Most common source language combinations
Combinations involving three languages:
- English/Spanish/French: 37.0%
- Spanish/French/Russian: 23.0%
- English/Spanish/Russian: 22.0%
- English/French/Russian: 21.7%
- Spanish/French/Indonesian: 16.7%
- English/French/Indonesian: 16.2%
- English/Spanish/Indonesian: 16.0%
- Spanish/Indonesian/Russian: 14.5%
- English/Indonesian/Russian: 14.4%
- French/Indonesian/Russian: 14.2%
- Spanish/French/Hindi: 13.2%
- English/French/Hindi: 12.3%
- English/French/Japanese: 12.1%
- English/Spanish/Japanese: 12.0%
- English/Spanish/Hindi: 11.9%
- …
- Spanish/French/Swahili: 11.9%
- …
- Arabic/Spanish/French: 11.2%
- …
- Mandarin/Spanish/French: 5.5%
Combinations involving two languages:
- Spanish/French: 45.0%
- English/French: 39.8%
- English/Spanish: 39.6%
- Spanish/Russian: 25.9%
- English/Russian: 24.6%
- French/Russian: 24.5%
- Spanish/Indonesian: 19.0%
- French/Indonesian: 18.7%
- English/Indonesian: 18.3%
- Arabic/Swahili: 17.3%
- Indonesian/Russian: 15.8%
- English/Japanese: 15.6%
- Spanish/Hindi: 15.5%
- Arabic/Hindi: 15.1%
- French/Hindi: 15.0%
- …
- Mandarin/Japanese: 7.8%
Each percentage gives the share of entries directly derived from source languages that were derived from the given combination (possibly together with other source languages).
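The counting rule for combinations can be sketched as follows. This is an illustration under stated assumptions, not the script that actually produces these figures: the function name `combination_shares`, the data layout (one set of source languages per entry), and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def combination_shares(entries, size):
    """Return {combination: percentage} for all language combinations of `size`.

    `entries` is a list of sets, one per directly derived entry, holding the
    source languages that entry was derived from (assumed data layout).
    """
    counts = Counter()
    for languages in entries:
        # An entry derived from more than `size` languages counts toward
        # every combination of that size it contains.
        for combo in combinations(sorted(languages), size):
            counts[combo] += 1
    total = len(entries)
    return {combo: 100.0 * n / total for combo, n in counts.items()}

# Toy data, not taken from the dictionary.
sample = [
    {"English", "Spanish", "French"},
    {"Spanish", "French"},
    {"Arabic", "Swahili"},
    {"English", "Spanish", "French", "Russian"},
]

pairs = combination_shares(sample, 2)
triples = combination_shares(sample, 3)
print(pairs[("French", "Spanish")])               # 75.0
print(triples[("English", "French", "Spanish")])  # 50.0
```

Because one entry can contribute to several combinations, the percentages within a list are not expected to sum to 100%.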
Word class distribution
- 806 Nouns
- 351 Adjectives
- 313 Verbs
- 78 Proper nouns
- 46 Adverbs
- 39 Numbers
- 35 Prepositions
- 25 Pronouns
- 23 Suffixes
- 19 Quantifiers
- 17 Conjunctions
- 17 Interjections
- 16 Prefixes
- 12 Particles
- 9 Auxiliary verbs
- 9 Selectors
- 8 Phrases
- 8 Prepositional phrases
1623 lemmas in total, 18 of which are synonyms of another lemma. 223 entries belong to more than one class.
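The totals above can be cross-checked with a little arithmetic. The class counts add up to more than the lemma total, which is what one would expect if an entry in several classes is counted once per class; that reading is an assumption, as are the variable names in this sketch.

```python
# Class counts copied from the list above, in the same order.
class_counts = [806, 351, 313, 78, 46, 39, 35, 25, 23,
                19, 17, 17, 16, 12, 9, 9, 8, 8]
lemmas, synonyms = 1623, 18

# Total class memberships across all 18 classes.
memberships = sum(class_counts)

# Lemmas that are not synonyms of another lemma.
distinct = lemmas - synonyms

print(memberships)  # 1831
print(distinct)     # 1605
```

The second figure matches the 1605 entries reported under "Influence distribution".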
Sound frequency distribution
- a: 14.10%
- i: 10.92%
- e: 9.17%
- n: 7.87%
- s: 7.20%
- r: 7.11%
- t: 5.25%
- o: 5.17%
- u: 4.45%
- k: 4.11%
- m: 4.03%
- l: 3.90%
- d: 2.91%
- b: 2.37%
- p: 1.88%
- v: 1.53%
- g: 1.47%
- j: 1.27%
- f: 1.13%
- h: 1.07%
- y: 0.91%
- x: 0.69%
- c: 0.63%
- ai: 0.40%
- au: 0.37%
- oi: 0.07%
Multi-word expressions are ignored.
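A sound count of this kind could be sketched as below. The greedy left-to-right treatment of ai, au, and oi as single sounds is an assumption made for illustration (the real tool may distinguish diphthongs from vowel sequences differently), and the function names are hypothetical.

```python
from collections import Counter

# Listed as single sounds in the table above.
DIPHTHONGS = ("ai", "au", "oi")

def split_sounds(word):
    """Split a word into sounds, greedily treating ai/au/oi as one unit.

    Assumption: every occurrence of these letter pairs is a diphthong.
    """
    word = word.lower()
    sounds = []
    i = 0
    while i < len(word):
        if word[i:i + 2] in DIPHTHONGS:
            sounds.append(word[i:i + 2])
            i += 2
        else:
            # Skip non-letters such as the hyphen in affixes.
            if word[i].isalpha():
                sounds.append(word[i])
            i += 1
    return sounds

def sound_frequencies(words):
    """Return {sound: percentage}, ignoring multi-word expressions."""
    counts = Counter()
    for word in words:
        if " " in word:  # multi-word expressions are ignored
            continue
        counts.update(split_sounds(word))
    total = sum(counts.values())
    return {sound: 100.0 * n / total for sound, n in counts.items()}

print(split_sounds("Paragvai"))  # ['p', 'a', 'r', 'a', 'g', 'v', 'ai']
```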
Trivia
Average length of root words: 4.81 letters.
Longest roots (10 letters): demokrasia, horisontal, koresponde, papyamentu, Sakartvelo, Slovenesko.
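The average and the longest roots can be computed along these lines. The sketch uses a handful of roots mentioned on this page as sample input, so its average says nothing about the full dictionary; `root_length_stats` is a hypothetical helper name.

```python
def root_length_stats(roots):
    """Return (average length, maximum length, sorted list of longest roots)."""
    lengths = [len(root) for root in roots]
    longest = max(lengths)
    average = sum(lengths) / len(lengths)
    return average, longest, sorted(r for r in roots if len(r) == longest)

# Small sample of roots quoted on this page, not the full root list.
sample_roots = ["demokrasia", "horisontal", "amen", "han", "Buda", "jaket"]

average, longest, longest_roots = root_length_stats(sample_roots)
print(average)        # 6.0
print(longest)        # 10
print(longest_roots)  # ['demokrasia', 'horisontal']
```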
Roots with maximum number of relations in the source languages (10 or more): -istan, afegan, amen, Amerika, Andora, arabi, Argentina, Asturie, Balgaria, Belarus, Belgie, Bolivia, Brasil, Buda, Cile, Danmarke, Ekvador, Galisia, han, hindi, Indonesia, islam, Israel, Italya, jaket, Kanada, katalan, Katalunya, latin, Mexiko, Mormon, muslim, papyamentu, Paragvai, Serbia, Turkie, urdu.
(A word can have more relations than there are source languages if it is derived from a language that is not normally among our sources. For example, Italya ‘Italy’ is derived from Italian.)