[Michael] Ventris was able to discover among the bewildering variety of the mysterious signs, patterns and regularities which betrayed the underlying structure. It is this quality, the power of seeing order in apparent confusion, that has marked the work of all great men.
– John Chadwick, The Decipherment of Linear B, 1958
In ordinary conversation, to decipher someone’s ‘indecipherable’ handwriting means to make sense of the meaning; it does not imply that one can read every single word. In its more technical sense, as applied to ancient scripts, ‘deciphered’ means different things to different scholars. At one extreme, everyone agrees that the Egyptian hieroglyphs have been deciphered – because every trained Egyptologist would make the same sense of virtually every word of a given hieroglyphic inscription (though their individual translations would still differ, as do all independent translations of the same work from one language into another). At the other extreme, almost everyone agrees that the scripts of the Indus Valley civilisation and Easter Island (Rongorongo) are undeciphered – because no scholar can make sense of their inscriptions to the satisfaction of the majority of other specialists. Between these extremes lies a vast spectrum of opinion. In the case of the Maya glyphs, for example, most scholars agree that a high proportion, as much as 85 per cent, of the inscriptions can be meaningfully read, and yet there remain large numbers of individual glyphs that are contentious or obscure.
In other words, no shibboleth exists by which we judge a script to be ‘deciphered’ or ‘undeciphered’; we should instead speak of degrees of decipherment. The most useful criterion is that a proposed decipherment can generate consistent readings from new samples of the script, preferably produced by persons other than the original decipherer, so as to avoid bias. In this sense, the Egyptian hieroglyphs were deciphered in the 1820s by Jean-François Champollion and others; Babylonian cuneiform in the 1850s by Henry Creswicke Rawlinson and others; Linear B in 1952–1953 by Michael Ventris and John Chadwick; the Maya glyphs in the 1950s and after by Yuri Knorosov and others; and the Hittite (Luvian) hieroglyphs of Anatolia during the 20th century by a series of scholars – to name only the most important of the generally accepted decipherments.
This leaves a number of significant undeciphered languages/scripts. They fall into three basic categories: an unknown script writing a known language; a known script writing an unknown language; and an unknown script writing an unknown language.
The Maya glyphs were, until the 1950s, an example of the first category, since the Maya languages are still spoken in Central America. The Zapotec script may be, too, if it writes a language related to the modern Zapotec languages of Mexico. Even Rongorongo may belong to this first category, since it almost certainly writes a Polynesian language related to the Tahitian-influenced Polynesian language spoken today on Easter Island. Etruscan writing exemplifies the second category, since the Etruscan script is basically the same as the Greek alphabet, while the Etruscan language is not related to Indo-European or to any other known language family. The Indus script belongs to the third category, since the signs on the seals and other inscriptions bear no resemblance to any other script, and the language of the Indus Valley civilisation does not appear to have survived – unless, as many scholars have speculated, it is an ancestor of the Dravidian languages such as Tamil and Brahui, spoken predominantly in south India but also in parts of Pakistan.
First steps to code cracking
Michael Ventris, perhaps the greatest of the decipherers, summarised the decipherment process masterfully, as follows:
Each operation needs to be planned in three phases: an exhaustive analysis of the signs, words and contexts in all the available inscriptions, designed to extract every possible clue as to the spelling system, meaning and language structure; an experimental substitution of phonetic values to give possible words and inflections in a known or postulated language; and a decisive check, preferably with the aid of virgin material, to ensure that the apparent results are not due to fantasy, coincidence or circular reasoning.
Although successful decipherments do not simply follow this sequence, they always involve three processes: analysis, substitution and check.
What are the minimum conditions for a high degree of decipherment to be feasible? According to Ventris again:
Prerequisites are that the material should be large enough for the analysis to yield usable results, and (in the case of an unreadable script without bilinguals or identifiable proper names) that the concealed language should be related to one which we already know.
Lack of material means that without further discoveries there is, at present, no prospect of deciphering the Olmec and Isthmian scripts from Mexico, the Phaistos Disc from Crete and the Byblos ‘pseudo-hieroglyphic’ script from Lebanon, among those mentioned in the table of undeciphered scripts. Linear B was decipherable – despite lacking a ‘Rosetta Stone’ bilingual with identifiable proper names – because the concealed language was discovered (by Ventris) to be archaic Greek.
Two elements of an unknown script usually yield up their secrets without too much effort. The first is the direction of the writing: from left to right or from right to left, from top to bottom or from bottom to top. Clues to the direction include the position of unfilled space in the text, the way in which characters sometimes crowd (on the left or on the right), and the direction in which pictographic signs face (as in Egyptian hieroglyphic). However, there are certain scripts that are written boustrophedon, a term from the Greek for ‘as the ox turns’ when ploughing: in other words, first from left to right (say), then from right to left, then again from left to right, and so on. There are even reverse-boustrophedon scripts, in which the writer turned the original document through 180 degrees at the end of each line; Rongorongo is an example of this.
The second element is the system of counting. Numerals frequently stand out graphically from the rest of the text, especially if they are used for calculations (which helpfully suggests that the non-numerical signs next to the numerals are likely to stand for counted objects or people). Easily visible numerals are a particular feature of the Linear B and Maya scripts and, among the undeciphered scripts, of the proto-Elamite script. A numerical system is obvious in the Etruscan script, Linear A and the Zapotec and Isthmian scripts, and fairly clear in the Indus script; but it seems to be largely absent from the Meroitic script and Rongorongo, and not at all evident in the Phaistos Disc. Of course, in working out a system of ancient numerals, decipherers have to be aware that it may differ radically from our decimal system. The Babylonians, for instance, used a sexagesimal (base-60) system without a zero, from which we inherit 60 seconds in a minute and 360 degrees in a circle; the Maya had a vigesimal system, increasing in multiples of 20, with a shell symbol for zero.
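The arithmetic behind such place-value systems can be sketched in a few lines of Python. This is only an illustration of the principle, not a model of the actual ancient notations (Babylonian cuneiform built its base-60 ‘digits’ from repeated wedges, and the Maya calendrical count modified the second place to run in eighteens):

```python
def to_base(n, base):
    """Decompose a non-negative integer into its digits in the given
    base, most significant digit first – e.g. Babylonian base 60 or
    Maya base 20."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        digits.append(n % base)
        n //= base
    return digits[::-1]

# 4,000 in sexagesimal: 1,6,40  (1 x 3600 + 6 x 60 + 40)
print(to_base(4000, 60))   # [1, 6, 40]
# 4,000 in vigesimal: 10,0,0  (10 x 400 + 0 x 20 + 0)
print(to_base(4000, 20))   # [10, 0, 0]
```

The same number thus looks entirely different in each system, which is why a decipherer must work out the base before the counted quantities in an inscription can be read.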
Analysing the sign system
More challenging than the direction of writing or numerals is the analysis of the sign system as a whole. Suppose you were unfamiliar with the Roman alphabet. If you were to take a typical chapter of an ordinary novel printed in English, it would be a fairly straightforward matter, by careful study and comparison of the thousands of characters in the text, to work out that they could be classified into a set of signs – 26 lower-case ones and the same number of upper-case signs (though you might wonder whether letters with ascenders like b, d, f, h, k should be classified with the lower-case or with the upper-case letters), plus sundry other signs such as punctuation marks, numerals and logograms like @ and £. Now imagine that the same text is handwritten. Immediately, the task of isolating the signs becomes far harder, because the letters are joined up and different writers form the same letter in different ways – often quite unlike its printed equivalent – and not always distinctly.
The same sign written in a variant form is known in epigraphy as an allograph. A key challenge for the epigrapher/decipherer – who naturally cannot be sure in advance that different-looking signs are in fact allographs of only one sign – is how to distinguish signs that are genuinely different, such as ‘l’ and ‘I’, from signs that are probably allographs, such as printed ‘a’ and handwritten ‘a’ (not to mention ‘A’). Judging by deciphered scripts, an undeciphered script may easily contain three or four allographs of the same basic sign.
Unless epigraphers can distinguish allographs with a fair degree of confidence, generally by comparing their contexts in many very similar inscriptions, they cannot classify the phonetic signs in a script (its signary) correctly, nor can they establish the total number of signs in the signary. Classification is self-evidently crucial to decipherment, but the number of signs is almost as important. Alphabets like English and consonantal scripts like Arabic mostly number between 20 and about 40 signs: Hebrew has 22 signs, English 26, Arabic 28, and Cyrillic 43 signs, 33 of which are used in modern Russian. (Some consonant-rich languages of the northern Caucasus have more than 40 alphabetic signs.) Essentially syllabic scripts, in which the signs stand for syllables rather than individual vowels and consonants, number between 40 and about 85–90 basic signs: Old Persian cuneiform has about 40 signs, Japanese around 50 syllabic kana, and Linear B 60 basic signs. More complex scripts, which mix a relatively small set of phonetic signs with large numbers of logograms – such as Egyptian and Maya hieroglyphic, and Babylonian cuneiform – number many hundreds of signs, or even several thousand, as in Chinese characters and the Japanese kanji borrowed from Chinese.
Once we know the size of an undeciphered script’s signary, we can then get a fair idea of whether it is an alphabetic/consonantal script, a syllabary, or a mixture of syllables and logograms, i.e. a logosyllabic script, without having any idea of the phonetic values of the signs. This broad system of classifying scripts was first recognised in the 1870s and was taken up by decipherers in the 20th century. For instance, the decipherers of Ugaritic cuneiform quickly realised that with a signary of only 30, Ugaritic could not be a logosyllabic script like Babylonian cuneiform. Ventris, from the size of the Linear B signary, convinced himself that Linear B was a syllabic script, not an alphabet or a logosyllabic script, which was an important step in the direction of decipherment.
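The reasoning from signary size can be caricatured as a simple classifier. The thresholds below are only the approximate bands described above, not sharp boundaries, and a real decipherer would of course weigh far more evidence than a single count:

```python
def classify_signary(size):
    """Rough guess at script type from the number of distinct signs.
    The band boundaries (40, 90) are approximations, not hard rules."""
    if size < 20:
        return "uncertain: very few signs"
    if size <= 40:
        return "probably alphabetic or consonantal"
    if size <= 90:
        return "probably syllabic"
    return "probably logosyllabic or logographic"

for name, n in [("Ugaritic", 30), ("Linear B", 60), ("Maya", 800)]:
    print(f"{name} ({n} signs) -> {classify_signary(n)}")
```

Run on the deciphered scripts mentioned above, the heuristic reproduces the historical inferences: Ugaritic’s 30 signs pointed away from a logosyllabic system, and Linear B’s 60 basic signs pointed Ventris towards a syllabary.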
A similar line of argument has been useful in narrowing the range of possibilities for the still-undeciphered scripts: there appear to be about 60 phonetic signs in Linear A, and perhaps 55 in Rongorongo, which, if true, would imply that both scripts are syllabaries.
Computers versus humans
If the signs of an undeciphered script can be correctly classified, with the allographs accurately identified – a challenging condition, it has to be said – each sign can be given a number and each inscription written as a sequence of numbers instead of the usual graphic symbols. The inscriptions can also be classified by computer in a concordance – that is, a catalogue organised by sign (not by inscription) which, under each sign, lists every inscription containing that sign. (Literary concordances are used by scholars to research every instance of a particular word in, say, the entire works of Shakespeare.)
Concordances offer important possibilities for analysing the distribution of signs. Once all of the text data has been computerised in a concordance, one can ask the computer to calculate the relative sign frequencies (for instance, which is the most common sign, and which is the least common?), or to list all the inscriptions in which a particular combination of signs occurs. If one suspects this combination of representing, say, a certain word or proper name, one can then analyse exactly where (at the beginning of inscriptions, in the middle of words, next to which other signs?) the combination occurs within every inscription in a corpus.
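A minimal sketch of such a computerised concordance and frequency analysis follows. The inscriptions and sign numbers here are entirely invented for illustration; a real corpus would contain thousands of signs:

```python
from collections import Counter, defaultdict

# Hypothetical corpus: each inscription is a sequence of sign numbers
inscriptions = {
    "T1": [5, 12, 3, 7],
    "T2": [12, 3, 9],
    "T3": [5, 12, 3, 12],
}

# Concordance: for each sign, every inscription that contains it
concordance = defaultdict(set)
for text_id, signs in inscriptions.items():
    for sign in signs:
        concordance[sign].add(text_id)

# Relative sign frequencies across the whole corpus
freq = Counter(sign for signs in inscriptions.values() for sign in signs)

# Find every inscription in which the pair 12-3 occurs in sequence,
# as one might do for a suspected word or proper name
def contains_pair(signs, pair):
    return any(signs[i:i + 2] == list(pair) for i in range(len(signs) - 1))

hits = sorted(t for t, s in inscriptions.items() if contains_pair(s, (12, 3)))
print(freq.most_common(1))  # [(12, 4)] – sign 12 is the most common
print(hits)                 # ['T1', 'T2', 'T3']
```

In this toy corpus the pair 12-3 turns up in every inscription, which is exactly the kind of distributional fact a decipherer would seize on before any phonetic value had been assigned.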
Although such frequency analysis has been done by computer in the case of the Linear A, Meroitic and Indus script corpuses, the truth is that computers have made little impact on archaeological decipherment. Electronic computers came along more or less too late for Ventris (who, anyway, does not appear to have been interested in computing), yet none of the decipherers of recent decades has found computers as useful as they had hoped. One reason is the difficulty of discriminating between signs and their allographs, which is still a matter of human judgement; another is the great graphical complexity of, say, the Maya script, which does not lend itself to the black-and-white, discrete nature of numerical classification; yet another reason, more general, is that there is not really enough text available in the undeciphered scripts for computerised statistical techniques to prove decisive. On the whole, successful decipherment has turned out to require a synthesis of logic and intuition based on wide linguistic, archaeological and cultural knowledge that computers do not (and presumably cannot) possess. Hence, the human factor remains key to the future unlocking of the remaining undeciphered scripts. Where is the next Ventris or Champollion?
Given the discovery of new material, some of the undeciphered scripts will probably yield up their secrets. The most likely candidate, and the most important, is the script of the Indus Valley where we can surely expect the discovery of more inscriptions, given the large area of Pakistan/India in which the civilisation flourished (about a quarter the size of Europe). But even in the case of the Easter Island script, where there is virtually zero chance of finding more of the wooden tablets, since they will have rotted away, progress is still possible.
Yet another new book proposing a decipherment of Rongorongo has just reached me. The undeciphered scripts will continue to tantalise us. Who would not like to be the first to ‘speak’ with the Indus Valley dwellers of 4,000 years ago, or the sculptors of those mysterious Easter Island moai?
Andrew Robinson is the author of over 15 books and is a visiting fellow of Wolfson College, Cambridge.
His book Lost Languages: The Enigma of the World’s Undeciphered Scripts was published by Thames & Hudson in February 2009; his allied handbook Writing and Script: A Very Short Introduction, was published by Oxford University Press in August 2009.