Zipf's law states that, in a corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. If the law holds, a plot of log frequency against log rank is approximately a straight line with a slope close to -1. What We Know About The Voynich Manuscript states: "The word frequency distribution follows Zipf's law, which is a necessary (though not sufficient) test of linguistic plausibility."
Figure 1 shows that the Voynich Manuscript text follows Zipf's law, but only approximately. Figure 2 shows that text generated using state transition tables has an almost identical distribution to that of the real Voynich Manuscript.
Figure 1: log_e frequency vs log_e rank for Voynich Manuscript
Figure 2: log_e frequency vs log_e rank for generated Voynich Manuscript
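The rank-frequency comparison plotted in Figures 1 and 2 can be reproduced with a short script. The following is a minimal Python sketch, not the code used to produce the figures. It assumes the transcription has already been split into a list of word strings. The function names `rank_frequency` and `zipf_slope` and the toy tokens are hypothetical. A least-squares fit is only one way to summarise how closely the points follow a straight line.

```python
import math
from collections import Counter


def rank_frequency(words):
    """Return word frequencies sorted from most to least common.

    The frequency at index i belongs to the word of rank i + 1.
    """
    counts = Counter(words)
    return sorted(counts.values(), reverse=True)


def zipf_slope(frequencies):
    """Least-squares slope of log_e(frequency) against log_e(rank).

    Needs at least two ranks. A slope near -1 is what Zipf's law predicts.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Toy corpus (an assumption, not manuscript data): frequencies are 12 / rank.
toy_words = ["w1"] * 12 + ["w2"] * 6 + ["w3"] * 4 + ["w4"] * 3
toy_frequencies = rank_frequency(toy_words)
toy_slope = zipf_slope(toy_frequencies)
```

Running the same two functions on the real and the generated word lists gives two slopes that can be compared directly, alongside the plots.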
The distribution of word lengths in the generated manuscript also resembles that of the original manuscript, though not exactly. The real manuscript has more long words, and the generated manuscript has more single-glyph words. This might be remedied by using state transition tables with more feedback.
Figure 3: Word length distribution for Voynich Manuscript
Figure 4: Word length distribution for generated Voynich Manuscript
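The word length distributions in Figures 3 and 4 can be tabulated in a similar way. This sketch assumes each word is a string in which one character stands for one glyph. That may not hold for every transcription alphabet, so the glyph-splitting step may need adjusting. The function names are hypothetical.

```python
from collections import Counter


def word_length_distribution(words):
    """Map each word length to the fraction of words having that length.

    Assumes one character per glyph (an assumption about the transcription).
    """
    counts = Counter(len(word) for word in words)
    total = sum(counts.values())
    return {length: counts[length] / total for length in sorted(counts)}


def distribution_difference(real, generated):
    """Per-length difference (real minus generated) between two distributions."""
    lengths = sorted(set(real) | set(generated))
    return {n: real.get(n, 0.0) - generated.get(n, 0.0) for n in lengths}


# Toy word lists (assumptions, not manuscript data).
toy_real = word_length_distribution(["a", "bb", "cc", "ddd"])
toy_generated = word_length_distribution(["a", "b", "cc", "dd"])
toy_difference = distribution_difference(toy_real, toy_generated)
```

A positive entry in the difference means the real text has proportionally more words of that length, and a negative entry means the generated text does.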
Table 1 shows the number of occurrences of the commonest words in the real and the generated manuscripts.
Table 1: Word counts for generated and real Voynich Manuscript
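A table of this kind can be built by counting words in both texts and listing the counts side by side. The sketch below is one possible layout, ranked by the counts in the real manuscript. That ordering, the function name `common_word_table`, and the `top_n` parameter are all assumptions made for illustration.

```python
from collections import Counter


def common_word_table(real_words, generated_words, top_n=10):
    """Return (word, real_count, generated_count) rows for the top_n
    most common words in the real text.

    Words absent from the generated text get a count of 0.
    """
    real_counts = Counter(real_words)
    generated_counts = Counter(generated_words)
    return [
        (word, real_count, generated_counts[word])
        for word, real_count in real_counts.most_common(top_n)
    ]


# Toy word lists (assumptions, not manuscript data).
toy_rows = common_word_table(
    ["x", "x", "x", "y", "y", "z"],
    ["x", "y", "y", "y"],
    top_n=2,
)
```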
© Copyright Donald Fisk 2017