Lugamun

An easy and fair language for global communication



Vocabulary

In which order are words added?

There are essentially two ways in which Lugamun’s vocabulary is growing:

  1. New concepts are chosen for addition by the algorithm.
  2. New concepts are added because they are needed for some specific reason, say to translate a text, to express a sentence, or to expand the grammar.

(1) was used exclusively for the first 100 words of the language. Since then, (1) and (2) have been used together: sometimes the algorithm selects the concepts to add, and sometimes they are specified explicitly in order to fill a particular gap in the dictionary.

What do we mean by concept here? Essentially, a specific unit of meaning that can become a word in Lugamun. Wiktionary, which provides much of the data used to create Lugamun’s vocabulary, distinguishes concepts by first grouping the different meanings of a word by word class (adjective, noun, verb etc.). For each word/class pair it then lists the specific meanings (often more than one) and collects translations separately for each meaning.

For example, the English word water is used both as a noun and as a verb. For the noun, Wiktionary gives several definitions – among them “clear liquid H₂O” and “body of water, or specific part of it” – and then lists translations into other languages for each of these meanings. By concept we mean such a combination of an English word with one specific sense definition, for example water (clear liquid H₂O).
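The notion of a concept described above can be sketched as a small data structure. This is only an illustration, not Lugamun's actual code: the class name `Concept`, its fields, and the `translations` mapping are hypothetical, and the two translations shown are just sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Hypothetical model: an English word plus one specific sense."""
    word: str        # English headword, e.g. "water"
    word_class: str  # e.g. "noun", "verb"
    sense: str       # one sense definition

    def label(self) -> str:
        # Written the way concepts are cited in the text: word (sense)
        return f"{self.word} ({self.sense})"


# Two senses of the same noun are two separate concepts.
liquid = Concept("water", "noun", "clear liquid H₂O")
body = Concept("water", "noun", "body of water, or specific part of it")

# Translations are collected per concept, not per word (sample data only).
translations = {liquid: {"es": "agua", "fr": "eau"}, body: {}}

print(liquid.label())
print(liquid == body)
```

Because the class is frozen, each concept can serve as a dictionary key, which mirrors how translations are collected separately for each meaning.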

Automatic selection of the next word to add

The algorithm Lugamun uses to select words is somewhat state-dependent: words from a source language whose current influence is low get a higher chance of being selected, and vice versa. Therefore the order in which words are selected matters to some degree.

But where to start – which words to add first? Intuitively, it makes sense to start with words that are particularly fundamental and widespread. But how to formalize this?

Since Lugamun’s algorithm relies on translations listed in Wiktionary, the initial idea was to start with concepts for which Wiktionary documents translations into a large number of languages. So, before proposing a sorted list of candidate words for a given concept, the algorithm first decides which concept should be added next, starting with the concepts that have the highest number of translations in Wiktionary.

The concept with the highest number of translations is water (clear liquid H₂O), for which Wiktionary lists translations into more than 3000 languages. This was indeed the first word added to Lugamun, though the chosen word was later revised and replaced.

One problem with following only translation counts, however, is that most of the words with a very high number of translations are nouns. To avoid creating a core vocabulary made up of lots of nouns and not much else, the words in Wiktionary are sorted into three groups:

  1. nouns
  2. adjectives and adverbs
  3. verbs and all other word classes (numerals, pronouns etc.)

The word selection algorithm proceeds in such a way that these three groups are equally represented in the dictionary. Since the first word added was a noun, the second word had to come from group (2) or (3). Among these, the numeral un ‘one’ has the highest number of translations, so it was added second – it is now the oldest word in Lugamun’s dictionary that has survived without later changes. This word belongs to group (3), hence the third word had to be an adjective or adverb – among these, bai ‘white’ had the highest number of translations and was added next. After that, the algorithm was again free to add a word from any of the groups, since all three were now evenly represented. As the process continues, the algorithm always ensures that one third of the core vocabulary comes from each of the three groups.
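The balancing rule described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Lugamun's actual implementation: it assumes that only the groups with the fewest words so far are eligible, which matches the water, un, bai sequence described in the text. The function names, the group labels, and all translation counts in `CANDIDATES` are made up for illustration.

```python
from collections import Counter

GROUPS = ("noun", "adj_adv", "other")  # hypothetical labels for groups (1)-(3)


def group_of(word_class):
    """Map a word class to one of the three balancing groups."""
    if word_class == "noun":
        return "noun"
    if word_class in ("adjective", "adverb"):
        return "adj_adv"
    return "other"  # verbs, numerals, pronouns etc.


def select_next(candidates, group_counts):
    """Pick the candidate with the most translations among the groups
    that currently have the fewest words (assumed balancing rule)."""
    fewest = min(group_counts[g] for g in GROUPS)
    allowed = {g for g in GROUPS if group_counts[g] == fewest}
    eligible = [c for c in candidates if group_of(c[1]) in allowed]
    if not eligible:  # no candidate left in the lagging groups
        eligible = candidates
    return max(eligible, key=lambda c: c[2])


def selection_order(candidates, n):
    """Return the first n concepts in the order they would be added."""
    remaining = list(candidates)
    counts = Counter()  # missing groups count as 0
    order = []
    for _ in range(min(n, len(remaining))):
        chosen = select_next(remaining, counts)
        remaining.remove(chosen)
        counts[group_of(chosen[1])] += 1
        order.append(chosen[0])
    return order


# (concept, word class, translation count); the counts are illustrative only.
CANDIDATES = [
    ("water", "noun", 3000),
    ("sun", "noun", 900),
    ("one", "numeral", 800),
    ("white", "adjective", 600),
    ("eat", "verb", 500),
]

print(selection_order(CANDIDATES, 5))
```

With these sample counts, sun is skipped in the second and third rounds even though it has more translations than one and white, because the noun group is already ahead. It becomes eligible again once all three groups are level.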

Custom selection of a word to add

The alternative is that a manually selected concept is added to the language because it’s needed for some particular reason. If this is the case, the reason for adding that concept at that time is always documented in the selection log. Entries for such words always start with “Processing entry … as requested, rationale:” – and then the reason for the addition follows.

One common reason is that words are needed in order to express other words that were automatically selected for addition by the algorithm. For example, the concept today (on the current day) is expressed as si den in Lugamun, so the words si ‘this, these’ and den ‘day’ had to be found and added first before it could be expressed and added as well.
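The "prerequisites first" ordering in the today example can be illustrated with a short sketch. It assumes a hypothetical `requires` mapping from a concept to the concepts needed to express it. Nothing here is taken from Lugamun's real tooling, and concepts are named by their English glosses for simplicity.

```python
def addition_order(concept, requires, already_added=()):
    """Return the concepts to add, prerequisites first, ending with `concept`.

    `requires` maps a concept to the concepts needed to express it
    (a hypothetical structure used only for this illustration).
    """
    order = []
    seen = set(already_added)

    def visit(c):
        if c in seen:
            return
        seen.add(c)  # mark before recursing so cycles cannot loop forever
        for dep in requires.get(c, []):
            visit(dep)
        order.append(c)

    visit(concept)
    return order


# today is expressed as si den, so "this" and "day" must exist first.
REQUIRES = {"today": ["this", "day"]}

print(addition_order("today", REQUIRES))
print(addition_order("today", REQUIRES, already_added=["day"]))
```

If a prerequisite is already in the dictionary, it is simply left out of the resulting list.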

The other common reason is that words are needed in order to express some specific content in Lugamun. For example, to translate the fable The North Wind and the Sun, the word norte ‘north, northern’ and many others first had to be found and added to the dictionary.

Can a word in Lugamun represent several concepts?

Yes. XXX Explain how the polysemy check works and how it is used. Also explain that Lugamun’s word formation rules ensure that many words represent several concepts, e.g. all verbs can also be used as nouns.

en/background/vocabulary.1651657992.txt.gz · Last modified: 2022-05-04 11:53 by christian

Except where otherwise noted, content on this wiki is licensed under the following license: CC0 1.0 Universal