A Plausible Language Identification for the Voynich Manuscript


Although I have shown how meaningless text very similar to that of the Voynich Manuscript could have been generated by means of a simple algorithm, I'm still open to the possibility that the text might have meaning, and would be quite happy to be proven wrong. In fact, I'd prefer this to the current limbo state where my solution isn't accepted but no one else can offer anything better. For a meaningful text solution to be acceptable, however, it would have to be accompanied by a translation of a significant quantity of the text into a modern language, the translation should make sense, and any decryption step prior to translation should not add information. (For example, while transposition ciphers are acceptable, anagrams generally aren't.)

So far, despite an increasing frequency of attempts, no one has produced any such solution, and there isn't even any consensus on which language the manuscript could have been written in.

Here, I'll arrive at a plausible language, even if – or perhaps especially if – it proves I was wrong in concluding that the manuscript is meaningless gibberish. In Principal Component Analysis of Glyphs in Various Languages, I used a standard statistical technique to compare text in several different languages to the Voynich Manuscript text. It was shown that different languages have different PCA plots, that this can be used to identify the language in which a text is written, and that it's even immune to simple forms of encipherment, which is important as all text is really enciphered speech. In fact, each PCA glyph plot contains within it valuable information about the underlying language and can be used to identify the sounds to some extent.
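To make the idea concrete, here is a minimal sketch of a glyph PCA in Python. It assumes that each glyph is described by the relative frequencies of the glyphs that immediately follow it; this feature choice, and the function name glyph_pca, are illustrative assumptions and may not match the features used for the plots discussed here.

```python
import numpy as np


def glyph_pca(text, n_components=2):
    """Project each glyph onto the leading principal components of its
    successor-frequency profile (an assumed feature set, for illustration)."""
    glyphs = sorted(set(text))
    index = {g: i for i, g in enumerate(glyphs)}

    # counts[i, j] = number of times glyph j immediately follows glyph i.
    counts = np.zeros((len(glyphs), len(glyphs)))
    for a, b in zip(text, text[1:]):
        counts[index[a], index[b]] += 1

    # Normalise each row to a distribution; rows with no successors stay zero.
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1
    profiles = counts / row_sums

    # PCA via singular value decomposition of the column-centred profiles.
    centred = profiles - profiles.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    coords = u[:, :n_components] * s[:n_components]
    return glyphs, coords


glyphs, coords = glyph_pca("the cat sat on the mat and the rat sat on the cat ")
for g, (x, y) in zip(glyphs, coords):
    print(repr(g), round(x, 3), round(y, 3))
```

Under this assumption, a simple substitution cipher only relabels the rows and columns of the count matrix, so the shape of the plot is unchanged apart from the labels and possibly the sign of an axis.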

The main difficulty in matching the Voynich Manuscript glyphs with sounds is that, unlike the known languages I have already plotted, there is no clear vowel branch. (Vowels appear on PCA glyph plots on a separate branch from consonants, with the space, or word separator, at the root.) Almost all languages have at least five vowel phonemes. For example, Modern Greek, Spanish, Georgian, and Japanese each have exactly five. Russian has six. English has more, with the exact number depending on the speaker's accent. (Mine has ten.) Standard French has 15, including three nasalized vowels.

There are, however, a few languages with as few as two vowel phonemes, and given the apparent lack of vowels in the Voynich Manuscript glyph PCA plot, these are worth investigating. After excluding from consideration phonetically similar languages spoken in parts of the world not yet reached by early 15th century Europeans, only the Northwest Caucasian languages remain. These form a distinct group, unrelated to any other languages. As we shall see later, one, Abkhaz, has phonemes which match the glyphs in Voynich Manuscript text quite well. Abkhaz has only two distinct vowel phonemes, а and ы, but has as many as 58 consonantal phonemes.

While Abkhaz is well worth checking, there are a few practical problems with this identification. Firstly, I'm comparing against modern Abkhaz, which contains loan words from Russian. I have observed that the shape of the PCA plot varies depending on the amount of Russian present in the text, and that it is also, more than for other languages (probably because of the size of its phonemic inventory), dependent on the quantity of text. Secondly, apart from a few phrases recorded in the 17th century by a Turkish traveller, nothing is known to have been written in Abkhaz until the late 19th century. Originally, literate Abkhazians wrote in Greek, and later in Georgian. Even today, there is very limited material in Abkhaz on the World Wide Web. (Bibles are very useful for this kind of analysis, as they're long and you know what the text means, but although most Abkhazians are Christian, I cannot find an online Abkhaz Bible.) Thirdly, Abkhaz has few speakers, and Abkhazia is a republic which recently seceded from Georgia and is recognized by few countries other than the Russian Federation, so access to anyone with knowledge of Abkhaz will be difficult, and the language is even unsupported by the preferred online resource of Canadian computational linguists, Google Translate. Fourthly, Voynichese has a distinct word structure which Abkhaz appears to lack. And finally, as the numbers of glyphs in Voynich EVA and Abkhaz differ so much, a method would have to be found either to write Abkhaz with fewer glyphs, or to form new glyphs from Voynich di- or trigraphs.
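One way the last point might be explored is to fuse the most frequent adjacent glyph pairs into composite glyphs, in the manner of byte-pair encoding. The sketch below is only an illustration of that general technique, not a method I have applied to the manuscript; the function name merge_top_digraphs and the sample EVA-like words are hypothetical.

```python
from collections import Counter


def merge_top_digraphs(words, n_merges):
    """Repeatedly fuse the most frequent adjacent glyph pair into one
    composite glyph (byte-pair-encoding style; illustrative only)."""
    tokenised = [list(w) for w in words]
    for _ in range(n_merges):
        # Count adjacent pairs within words only, never across word breaks.
        pairs = Counter()
        for w in tokenised:
            pairs.update(zip(w, w[1:]))
        if not pairs:
            break
        # Most frequent pair; ties are broken by the pair itself for determinism.
        (a, b), _count = max(pairs.items(), key=lambda kv: (kv[1], kv[0]))
        merged = a + b
        for i, w in enumerate(tokenised):
            out, j = [], 0
            while j < len(w):
                if j + 1 < len(w) and w[j] == a and w[j + 1] == b:
                    out.append(merged)
                    j += 2
                else:
                    out.append(w[j])
                    j += 1
            tokenised[i] = out
    return tokenised


# Hypothetical EVA-like words: the pair c, h is fused into a single glyph "ch".
print(merge_top_digraphs(["chy", "chy", "cho"], 1))
```

Each merge increases the size of the glyph inventory by at most one, so the number of merges gives direct control over how closely the inventory approaches that of Abkhaz.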

Abkhaz is written in a modified Cyrillic alphabet. Text in Abkhaz can be found at

  3. 100 Texts in the Abkhaz language
Most of the parenthesized text in the PDF document (3.) is in Russian.

Below are the PCA glyph plots for Abkhaz text, followed by the Voynich Manuscript plot, and the plot for the text of The Darling by Chekhov in standard Russian. As you might expect, the Russian plot resembles that of other European languages. Interestingly, the closest match to the Voynich Manuscript plot (Figure 3) is Figure 2. Assuming the language of the Voynich Manuscript is Abkhaz, it's likely that EVA o is Abkhaz а, and EVA a is Abkhaz ы. Matching the consonants in the two scripts is more difficult, and would involve trial pairings and comparison of Voynichese and Abkhaz text. Cryptography isn't something I can claim expertise in, but there must be a standard way of doing this, though it would be complicated by the different numbers of glyphs in Abkhaz and EVA.
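A common first step in attacking a simple substitution is to pair glyphs by frequency rank, and a first set of trial pairings could be generated that way. The following is a minimal sketch under that assumption; the function names are hypothetical, and the pairings it produces are only a starting point to be refined by comparing the two texts.

```python
from collections import Counter


def rank_by_frequency(text, ignore=" \n"):
    """Return the glyphs of text, most frequent first (ties alphabetical)."""
    counts = Counter(ch for ch in text if ch not in ignore)
    return [g for g, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def trial_pairing(cipher_text, reference_text):
    """Pair the nth most frequent cipher glyph with the nth most frequent
    reference glyph. Glyphs left over in the larger inventory stay unpaired."""
    cipher_ranked = rank_by_frequency(cipher_text)
    reference_ranked = rank_by_frequency(reference_text)
    return dict(zip(cipher_ranked, reference_ranked))


# Toy example with made-up strings, not real Voynichese or Abkhaz.
print(trial_pairing("xxxyyz", "aaabbc"))
```

Because the two inventories differ in size, the surplus glyphs of the larger one are simply left unpaired here; handling them properly is exactly the complication mentioned above.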

glyph PCA from text in Abkhaz

Figure 1: PCA of Abkhaz glyphs from sites 1. and 2.

glyph PCA from text in Abkhaz annotated in Russian

Figure 2: PCA of glyphs from site 3., which is in Abkhaz with annotations in Russian.

glyph PCA from Voynich Manuscript text

Figure 3: PCA of Voynich Manuscript glyphs.

glyph PCA from Chekhov's The Darling

Figure 4: PCA of Russian glyphs from Chekhov's The Darling.

It's premature to identify Abkhaz as the language of the Voynich Manuscript. The Abkhaz of today didn't exist in the early 15th century. However, further investigation by someone familiar with Abkhaz would be worthwhile.


© Copyright Donald Fisk 2018