How to build a language

What do Klingon, Sindarin and Dothraki have in common?  They are all conlangs – languages which have been constructed specifically for use in books, TV and films.  The task of building a language is more difficult than it might first appear.  It is not just a case of coining nonsense words and stringing them together to form sentences.  Conlangers are often professional linguists who have studied a number of foreign languages.  Their aim is to create conlangs which match the complexity and dynamism of natural languages.

In order to create a convincing conlang, we must consider what the ingredients of natural languages are.  Clearly all languages are different on the level of vocabulary, but they do share fundamental properties in other areas.  In this article, I give a list of decisions that you would have to make about your conlang.  By reading this article, you will also gain (i) an appreciation of the complexity of natural language, as well as (ii) an accessible introduction to the field of linguistics.

Step 1: Phoneme inventory

Let’s assume that we are dealing with a spoken language (although sign languages are equally valid and complicated).  Consider the physical properties of the human vocal tract.  As these properties are broadly the same for all of us, there is a limited range of sounds which can be produced in any of the world’s languages.  Each language selects a subset of these sounds as its phonemes (i.e. the individual speech sounds which it uses to tell one word from another) – and this set is known as its phoneme inventory.  There are languages with as few as 11 distinct phonemes, as well as others with as many as 80.  English is somewhere in the middle with roughly 40 to 45, depending on the dialect and the analysis.

The consonants of any given language can be represented on a grid and described with reference to two criteria: (i) place of articulation, and (ii) manner of articulation.  Place describes the location within the vocal tract where the sound is produced (e.g. at the lips, or on the soft palate).  Manner refers to the way in which the sound is produced (i.e. by obstructing airflow, wholly or partly).  Your first task, when creating a conlang, will be to compile its phoneme inventory and to describe each phoneme in terms of place and manner of articulation.
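If you are comfortable with a little programming, the grid described above can be sketched in Python.  This is only an illustration: the six consonants below are an invented inventory for an imaginary conlang, and the names INVENTORY and build_grid are my own.

```python
from collections import defaultdict

# Hypothetical inventory for an invented conlang: phoneme -> (place, manner).
INVENTORY = {
    "p": ("bilabial", "plosive"),
    "t": ("alveolar", "plosive"),
    "k": ("velar", "plosive"),
    "m": ("bilabial", "nasal"),
    "n": ("alveolar", "nasal"),
    "s": ("alveolar", "fricative"),
}

def build_grid(inventory):
    """Group phonemes into a manner -> place -> [phonemes] grid."""
    grid = defaultdict(lambda: defaultdict(list))
    for phoneme, (place, manner) in inventory.items():
        grid[manner][place].append(phoneme)
    return grid

grid = build_grid(INVENTORY)
# Print one row per manner of articulation, one cell per place.
for manner, row in grid.items():
    cells = ", ".join(f"{place}: {' '.join(ps)}" for place, ps in row.items())
    print(f"{manner:10} {cells}")
```

Storing place and manner alongside each phoneme makes it easy to spot gaps in the grid (for instance, a conlang with nasals at only two places of articulation).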

Example language: Mandarin Chinese

Mandarin Chinese has a relatively small set of phonemes and a restricted set of possible syllables.  This means that many words in Chinese would be homophones (i.e. they would sound the same).  You can imagine how this situation, if not remedied, would result in comprehension difficulties.  Chinese eases this problem by using tones.  Each possible syllable in Mandarin can (theoretically) be pronounced in four tones (i.e. pitch patterns), plus a neutral tone.  In this way, Chinese gets by without many phonemes – as the tones serve to augment the phoneme inventory.

Step 2: Syllables

We make a fundamental distinction between consonants (where airflow is obstructed) and vowels (where airflow is unobstructed).  Languages put consonants and vowels together to form syllables – the unit of language which carries stress.  A syllable has three parts: onset, nucleus and coda (the nucleus and coda together are known as the rime).  We represent possible syllable structures in this way – C(C)V(C), where brackets mark optional sounds.  Here, the onset must contain at least one consonant; the nucleus must be a vowel; and the coda is optional.  When creating a conlang, you must decide what your syllable structure is.  You must also decide which syllables will carry stress (i.e. emphasis).
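A syllable template such as C(C)V(C) translates almost directly into a regular expression.  The Python sketch below is a minimal illustration, assuming an invented conlang in which every phoneme is written with a single letter; the sound classes and the function name is_valid_syllable are hypothetical.

```python
import re

# Hypothetical sound classes for an invented conlang (one letter per phoneme).
CONSONANTS = "ptkmns"
VOWELS = "aiu"

# C(C)V(C): one or two onset consonants, one vowel, an optional coda consonant.
SYLLABLE = re.compile(f"[{CONSONANTS}]{{1,2}}[{VOWELS}][{CONSONANTS}]?")

def is_valid_syllable(text):
    """Return True if the whole string is one legal C(C)V(C) syllable."""
    return SYLLABLE.fullmatch(text) is not None

for candidate in ["ta", "sta", "stan", "a", "tast"]:
    print(candidate, is_valid_syllable(candidate))
```

Under this template ‘ta’, ‘sta’ and ‘stan’ are accepted, while ‘a’ (no onset) and ‘tast’ (two coda consonants) are rejected.  Changing the template is a one-line edit, which makes it easy to experiment with how strict your conlang should be.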

Linguists make a distinction between stress-timed and syllable-timed languages.  In syllable-timed languages (like French), each syllable takes up roughly the same length of time.  In stress-timed languages (like English), the time spent on each syllable depends on whether it is stressed or unstressed.  Again, this is a choice you must make when designing a conlang, as it will greatly influence its rhythm and overall sound.

Example language: English

English is a stress-timed language in the sense that syllables can be lengthened or shortened depending on stress.  In casual speech, many syllables are swallowed (i.e. they pretty much disappear).  Moreover, the stress-carrying syllable varies as a function of word-class.  Say the phrases ‘to invite’ and ‘an invite’.  Notice how the stress shifts on the word ‘invite’ depending on whether it is being used as a verb (as in the first example) or as a noun (as in the second example).

Step 3: Morphological classification

A morpheme is the smallest chunk of language which carries meaning.  In other words, we cannot break a morpheme down into smaller meaningful parts.  Some words (e.g. happy or sad) comprise a single morpheme.  Others (e.g. happily or sadly) comprise more than one.  Moreover, not all morphemes correspond to our everyday definition of word.  Morphemes such as ‘-ing’ and ‘anti-’ are clearly not words and cannot exist on their own.  Languages manipulate their morphemes in a huge variety of ways and linguists classify them accordingly.  You must decide how the morphemes in your conlang behave.

For example, languages are broadly either analytic or synthetic.  Analytic (or isolating) languages (e.g. Malay) use stand-alone morphemes, while synthetic languages (e.g. Latin) put morphemes together, based on requirements of the grammar.  In other words, there is no verbal inflection or noun declension in languages like Malay, although there is a lot of both in languages like Latin.  Most synthetic languages attach morphemes to the front or to the end of their words.  In addition, synthetic languages are either fusional or agglutinative.  Morphemes in fusional languages can fulfill different grammatical functions simultaneously, whereas those in agglutinative languages cannot.  For instance, the single ending ‘-o’ in Latin amo (= I love) marks person, number and tense all at once, whereas Turkish evlerde (= in the houses) strings together ev (= house), ‘-ler’ (plural) and ‘-de’ (= in), one function per morpheme.

Example language: Arabic

Arabic (in common with other Semitic languages like Hebrew) has an unusual kind of synthetic morphology known as non-concatenative morphology.  Content words in Arabic are built from tri-consonantal roots (i.e. a kind of skeleton consisting of three consonants).  Vowels or further consonants are then added around and in between these original consonants in order to form related words.  For example, the root k-t-b yields kitab (= book), maktab (= office) and maktaba (= library).  Note that these words all belong to the same semantic field (i.e. their meanings are related).
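Root-and-pattern morphology of this kind can be imitated with a simple template system.  In the Python sketch below, the digits 1, 2 and 3 in a pattern stand for the three root consonants; the notation and the function name apply_pattern are my own invention, and the transliteration is simplified (vowel length is ignored).

```python
def apply_pattern(root, pattern):
    """Fill the slots 1, 2, 3 of a pattern with the consonants of a root."""
    consonants = root.split("-")
    if len(consonants) != 3:
        raise ValueError("expected a tri-consonantal root such as 'k-t-b'")
    # Digits are replaced by root consonants; every other character is kept.
    return "".join(
        consonants[int(ch) - 1] if ch in "123" else ch
        for ch in pattern
    )

# Two patterns taken from the Arabic examples above (simplified spelling).
PATTERNS = {"1i2a3": "book", "ma12a3": "office"}
for pattern, gloss in PATTERNS.items():
    print(apply_pattern("k-t-b", pattern), "=", gloss)
```

If you want this kind of morphology in your conlang, you would define a list of roots and a list of patterns separately, then combine them to generate families of related words.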

Step 4: Word order

Each language has a canonical (i.e. default) word-order.  In English, this is subject-verb-object (SVO) – where the subject performs an action and the object receives the effect of the action.  This basic word-order may change when it comes to forming questions, or when turning an active sentence into its passive equivalent.  There is often a link between syntax (the study of sentences) and morphology (the study of words) – so a change in word-order may trigger a change in morpheme.

All languages exist along a continuum depending on how fixed (= strict) or free (= lenient) their word-order rules are.  Here we see the relationship between syntax and morphology.  For example, word-order tends to be more important for analytic languages than it is for synthetic languages.  The reason is that analytic languages lack the noun declensions and verbal inflections which would offer additional clues as to meaning.  In a highly synthetic language, free word-order is possible: the morphology alone helps the listener to identify roles such as subject and object.
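To see why rich morphology permits free word-order, consider the toy Python sketch below.  Everything in it is invented for illustration: the suffixes ‘-sa’ (subject) and ‘-ko’ (object), the words mira, tolu and vek, and the function name parse_roles.  Any word without a suffix is simply assumed to be the verb.

```python
# Hypothetical case suffixes for an invented conlang.
SUFFIX_ROLES = {"sa": "subject", "ko": "object"}

def parse_roles(sentence):
    """Identify subject, verb and object from suffixes, ignoring word order."""
    roles = {}
    for word in sentence.split():
        # partition returns an empty suffix when the word has no hyphen.
        stem, _, suffix = word.partition("-")
        role = SUFFIX_ROLES.get(suffix, "verb")
        roles[role] = stem
    return roles

print(parse_roles("mira-sa vek tolu-ko"))
print(parse_roles("tolu-ko mira-sa vek"))
```

Both sentences yield the same analysis, because the suffixes (not the positions) tell us who is doing what to whom.  Remove the suffixes and the only remaining clue is word order, which is the situation analytic languages find themselves in.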

Example language: German

Many German verbs are formed by joining prefixes to more basic verbs.  For example, leben (= to live) becomes überleben (= to survive).  Many (but not all) of these prefixes can detach and move independently of their verbs.  Consider the example sentences below, which use the verb anrufen (= to call).  In single-verb sentences, the prefix moves to the end of the sentence; in two-verb sentences, the entire verb (prefix included) moves to the end.  In German, subordinating conjunctions (e.g. weil = because, obwohl = although) also send the verb to the end of their clause.

Ich rufe dich an (= I call you)

Darf ich dich an+rufen? (= May I call you?)
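The two example sentences can be generated by a pair of very simple rules.  The Python sketch below is a toy illustration only: it does not conjugate verbs or handle other clause types, and the function names main_clause and modal_question are hypothetical.

```python
def main_clause(subject, finite_verb, obj, prefix):
    """Single-verb clause: the verb comes second, the prefix goes last."""
    return f"{subject} {finite_verb} {obj} {prefix}"

def modal_question(modal, subject, obj, prefix, infinitive):
    """Two-verb question: prefix and infinitive rejoin at the very end."""
    return f"{modal} {subject} {obj} {prefix}{infinitive}?"

print(main_clause("Ich", "rufe", "dich", "an"))
print(modal_question("Darf", "ich", "dich", "an", "rufen"))
```

Notice that the prefix is stored separately from the verb in both functions.  A conlang with detachable prefixes would need its dictionary to record, for each verb, whether the prefix is allowed to move.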

Step 5: Writing system

We are so familiar with our Latin alphabet that we may fall into the trap of thinking all languages have alphabets.  In an alphabet, each phoneme is (in principle) represented by a separate symbol.  But there are several alternatives to this system.  In an abjad (as used in Hebrew), only the consonants have their own symbols (vowels are implied, or added as diacritics).  In an abugida (as used in various Indian languages), each consonant symbol carries an inherent vowel, and other vowels are shown by modifying that symbol.  In a syllabary (as used in Japanese Kana), each syllable has its own symbol.  In a logographic system (as used in Chinese), each character stands for a word or morpheme rather than a sound.  So, in Chinese, characters which look completely different may share the same pronunciation.

Example language: Korean

You may think Korean script (Hangul) looks like Chinese Hanzi (a logography) or Japanese Kana (a syllabary).  In fact, Korean uses an alphabet like English does.  The difference is that Korean has what is known as a featural alphabet.  Consider the English letters ‘f’ and ‘v’.  The phonemes represented by these letters are very similar (differing only with respect to voicing).  And yet there is no physical similarity between the symbols.  Contrast this with Hangul, where the physical appearance of each letter serves as a guide to its pronunciation. 
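You can check for yourself that Hangul is an alphabet.  Each square block is a syllable made up of individual letters, and Python’s standard unicodedata module can split a block back into those letters.  The sketch below is a small illustration; the function name letters_in_block is my own.

```python
import unicodedata

def letters_in_block(block):
    """Split Hangul syllable blocks into the Unicode names of their letters."""
    # NFD normalisation decomposes each syllable block into its letters (jamo).
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", block)]

# The word 'Hangul' itself is written with two syllable blocks.
for block in "한글":
    print(block, letters_in_block(block))
```

In the Unicode names, CHOSEONG, JUNGSEONG and JONGSEONG label the initial consonant, the vowel and the final consonant of the block respectively.  A syllabary or logography could not be taken apart in this way, because its symbols are not built from smaller letters.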

Conclusion

The linguists who create conlangs (constructed languages) for TV shows and films perform a task which is both highly skilled and overlooked.  A convincing conlang must boast the complexity and flexibility of a natural language – and this is nothing short of mind-boggling.  In this article, I have attempted to give you a glimpse of what happens beneath the hood of a natural language.  My aim was partly to raise awareness of the work of conlangers, but also to introduce you to the fascinating subject of linguistics.
