Generating the Voynich Manuscript

2017-03-29

It is possible to generate most of the words in the manuscript by adding extra states and transitions to the state transition table in Voynich State Transitions. Without much effort, I was able to generate 33,123 words out of a total of 37,123, i.e. just under 90%, using the state transition table shown in Table 1 below. It may be possible to generate almost all of the words, using a larger table or different states and transitions. Any words remaining (e.g. those containing rare glyphs) could have been improvised by the scribe.

 start qo d1 r1 s1 o1 a1 l1 ch1 sh1 y1 f k p t cfh ckh cph cth e1 e2 e3 o3 d2 ch2 sh2 y2 a2 o2 l2 r2 s2 im iim in iin iiin ir iir m finish
start  150941333205293716089433311325ε5313ε112ε  42518           ε
qoqo  246   485ε 859731214ε131551061ε  1104      ε    6
d1d     51210 623424        38342  15738134           28
r1r        4316         11ε322  4418464           613
s1s     169158 5116         37ε44  2122087           259
o1o  75 19  28995 92603424218210456     11 10           
a1a  7518   461   ε8ε4εεεε                      
l1l  41  43  10944 7177617    3326    4930           465
ch1ch     180    6ε1627124313271582901872  76425742         3
sh1sh     142    4ε13ε5ε21ε12402173434236  4929393ε         7
y1y  60 10  1014854 1334945302            9             
ff        52350         6εε25ε  65171106           53
kk        11222         48234119141  7130962           8
pp        59257         εεε2719  34128121           20
tt        17532         29168117282  7827388           9
cfhcfh                   ε111130241111  2969319            
ckhckh                   56522410338  516417            
cphcph                   1313218223338  25812619            
cthcth                   36514924032  4237414            
e1e               3113556 823                    
e2e                     541448                 11
e3e           442618        47713437037   29         
o3o           ε34318        26862   25512649        239
d2d                          78214914           55
ch2ch                     388 178  426             8
sh2sh                     349 181  446             24
y2y                       7                993
a2a                             1731806421553531043966 
o2o                          23  26258911εε55192541 
l2l                       442091083220           768
r2r                       681376923           857
s2s                                        1000
imim                                        1000
iimiim                                        1000
inin                                        1000
iiniin                                        1000
iiiniiin                                        1000
irir                                        1000
iiriir                                        1000
mm                                        1000
finish 1000                                        

Table 1: State transition table accounting for almost 90% of the words in the Voynich Manuscript
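Generating a word from a table of this kind amounts to a random walk from start to finish, emitting one glyph per state visited. The following is only a minimal sketch of that idea, not the program actually used. It assumes that each row holds per-mille weights for the successor states, and it uses a small made-up table (TOY_TABLE) rather than the values in Table 1. The ε entries are not modelled. The names TOY_TABLE, glyph and generate_word are illustrative.

```python
import random

# Hypothetical toy table, NOT the values from Table 1: each row maps a state
# to its successor states with per-mille weights (each row sums to 1000).
TOY_TABLE = {
    "start": {"qo": 400, "d1": 300, "o1": 300},
    "qo": {"k": 1000},
    "o1": {"k": 600, "d1": 400},
    "k": {"e1": 500, "a2": 500},
    "e1": {"d2": 1000},
    "d1": {"a2": 700, "y2": 300},
    "d2": {"y2": 1000},
    "a2": {"iin": 600, "r2": 400},
    "y2": {"finish": 1000},
    "iin": {"finish": 1000},
    "r2": {"finish": 1000},
}


def glyph(state):
    """Drop the numeric suffix that tells apart states emitting the same glyph."""
    return state.rstrip("0123456789")


def generate_word(table, rng):
    """Walk from 'start' to 'finish', emitting the glyph of each state entered."""
    state, glyphs = "start", []
    while True:
        successors = table[state]
        state = rng.choices(list(successors), weights=list(successors.values()))[0]
        if state == "finish":
            return "".join(glyphs)
        glyphs.append(glyph(state))


rng = random.Random(0)  # fixed seed so that runs are repeatable
words = [generate_word(TOY_TABLE, rng) for _ in range(8)]
print(" ".join(words))
```

With this toy table the only possible words are short Voynich-like strings such as qokedy, okaiin and dar. A real table would simply have more rows and more successors per row.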

It would have been impossible to generate all of the manuscript using a single table such as Table 1, however. To account for the differences between sections, each manuscript section requires its own table, which could have been composed by tweaking entries in a previously produced table.

Measuring transition probabilities using the method described in Transition Probabilities, with the expanded state transition table (Table 1) as a template, resulted on average in a significantly higher error rate (the change ranged from slightly lower to more than double). I therefore decided to adjust the transition probabilities, by finding words that the state transition table generated more or less often than they occur in the manuscript, and adjusting the counts of their transitions downwards or upwards respectively. This improved the results, but some discrepancies remained. It might have been possible to improve the results further, e.g. by doing gradient descent in a multi-dimensional space (one dimension for each non-zero entry in the state transition table), but this would be very computationally intensive.
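One way such an adjustment could be mechanised is sketched below. This is an assumption about the general shape of the procedure, not a record of what was actually done: the size of the adjustment (step), the renormalisation of each row to 1000, and the names adjust_counts and path are all hypothetical.

```python
def adjust_counts(table, path, generated, actual, step=0.1):
    """Nudge the counts along one word's path towards its manuscript frequency.

    table     -- {state: {next_state: count}}, each row summing to 1000
    path      -- the (state, next_state) transitions that spell the word
    generated -- how often the table produced the word
    actual    -- how often the word occurs in the manuscript
    step      -- hypothetical relative size of one adjustment
    """
    if generated == actual:
        return
    # Over-generated words have their transitions scaled down, and vice versa.
    factor = 1 - step if generated > actual else 1 + step
    for state, next_state in path:
        table[state][next_state] *= factor
    # Rescale every touched row so that it sums to 1000 again.
    for state in {s for s, _ in path}:
        total = sum(table[state].values())
        for next_state in table[state]:
            table[state][next_state] *= 1000 / total


# Toy table with made-up counts: the word "d" is over-generated (60 vs 40).
table = {
    "start": {"d1": 500, "o1": 500},
    "d1": {"finish": 1000},
    "o1": {"finish": 1000},
}
adjust_counts(table, [("start", "d1"), ("d1", "finish")], generated=60, actual=40)
print(table["start"])
```

Because a transition is usually shared by many words, each nudge also changes the frequencies of other words, which is one reason why discrepancies can persist after several rounds.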

Figure 1 shows the word counts for text generated ten times using the state transition table for the red herbal pages (Currier B) of the manuscript (Table 2), plotted against the actual word counts in the Voynich Manuscript. Ideally, the points would lie on a straight line of gradient 1.0, but any process with an element of randomness will result in points being spread about that line. Here, there is an additional error caused by an imperfect state transition table. Similar plots for some other parts of the manuscript are shown in Figure 2, Figure 3, and Figure 4.

Figure 1: Generated vs actual word counts for red herbal pages
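The data behind a plot of this kind can be sketched as follows. The sketch assumes that the ten generated texts are averaged word by word before being compared with the manuscript, which may not be how the figures were actually produced. The helper names paired_counts and gradient_through_origin are illustrative, and the word lists are toy data.

```python
from collections import Counter


def paired_counts(generated_runs, actual_words):
    """Return {word: (mean generated count, actual count)} for every word seen."""
    actual = Counter(actual_words)
    total = Counter()
    for run in generated_runs:
        total.update(run)
    runs = len(generated_runs)
    return {w: (total[w] / runs, actual[w]) for w in set(total) | set(actual)}


def gradient_through_origin(pairs):
    """Least-squares gradient of generated count (y) against actual count (x)."""
    sum_xy = sum(generated * actual for generated, actual in pairs.values())
    sum_xx = sum(actual * actual for _, actual in pairs.values())
    return sum_xy / sum_xx


# Toy data: two generated runs that happen to match the "manuscript" exactly.
manuscript = ["daiin"] * 4 + ["dy"] * 2
runs = [["daiin"] * 4 + ["dy"] * 2, ["dy"] * 2 + ["daiin"] * 4]
pairs = paired_counts(runs, manuscript)
print(gradient_through_origin(pairs))
```

A gradient close to 1.0 only says that the counts agree on average. The scatter of individual points about the line is what the RMS error measures.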

 start qo d1 r1 s1 o1 a1 l1 ch1 sh1 y1 f k p t cfh ckh cph cth e1 e2 e3 o3 d2 ch2 sh2 y2 a2 o2 l2 r2 s2 im iim in iin iiin ir iir m finish
start  119100721211221915379636471639ε734212εε  34433           ε
qoqo  343   113ε 2666026198ε8ε8ε38εε  ε55      ε    3
d1d     32170 383244        2822ε  15245732           28
r1r        9ε         9εε5217  781559           672
s1s     37110 3712         ε24εε12  2446324           256
o1o  124 16  18932 18371232091916219     9 7           
a1a  6577   393   ε18εεεεε6                      
l1l  125  78  9544 20176731    3εεε    8537           298
ch1ch     130    6ε29211280215291183008174  561515εε         6
sh1sh     160    εε20εεε26ε1547215334684  49920εε         15
y1y  66 4  ε6626 3144570284            9             
ff        45513         26εεεε  6523478           130
kk        10524         22145165162  10035355           13
pp        56460         9εε5151  5113734           43
tt        16542         610919644ε  7728660           14
cfhcfh                   ε167167333167  167εε            
ckhckh                   ε4215033100  63342ε            
cphcph                   ε33333383ε  250εε            
cthcth                   ε223486587  43543ε            
e1e               13125ε75 788                    
e2e                     461514                 26
e3e           201501524        560841818   30         
o3o           326319        511103   1607064        131
d2d                          78115013           55
ch2ch                     400 450  150             ε
sh2sh                     222 333  444             ε
y2y                       10                990
a2a                             14322085ε753925451493 
o2o                          11  25768818εεε74εε14 
l2l                       633071331720           730
r2r                       192423468           897
s2s                                        1000
imim                                        1000
iimiim                                         
inin                                        1000
iiniin                                        1000
iiiniiin                                        1000
irir                                        1000
iiriir                                        1000
mm                                        1000
finish 1000                                        

Table 2: State transition table for the red herbal pages

Figure 2: Generated vs actual word counts for blue herbal pages

Figure 3: Generated vs actual word counts for astronomical/astrological/cosmological pages

Figure 4: Generated vs actual word counts for blue biological pages

The recomputed RMS errors, for the augmented state transition tables, are shown in Table 3.

Section        RMS error
blue herbal    3.93
black herbal   3.70
red herbal     4.81
green herbal   3.22
astro          4.03
pharma         3.45
blue text      3.71
black text     7.81
red text       4.13
blue bio       3.13
black bio      3.59
red bio        3.41

Table 3: RMS errors for non-zero values of predicted word counts
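For reference, an RMS error restricted to non-zero predictions can be computed as in the sketch below. It uses the textbook definition (the square root of the mean squared difference), which is assumed rather than confirmed to match the method described in Transition Probabilities. The function name rms_error and the counts are made up for illustration.

```python
import math


def rms_error(predicted, actual):
    """RMS difference between predicted and actual word counts.

    Only words with a non-zero predicted count are included, following the
    caption of Table 3. Words missing from `actual` count as zero.
    """
    words = [w for w, count in predicted.items() if count > 0]
    squared = sum((predicted[w] - actual.get(w, 0)) ** 2 for w in words)
    return math.sqrt(squared / len(words))


# Made-up counts: "okar" is excluded because its predicted count is zero.
predicted = {"daiin": 10.0, "dy": 6.0, "okar": 0.0}
actual = {"daiin": 13, "dy": 2, "okar": 5}
print(round(rms_error(predicted, actual), 2))
```

Excluding words with a predicted count of zero means that words the table cannot generate at all do not contribute to the error.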

Each section of the manuscript was generated using its own state transition table. A paragraph break was inserted every time a p or f was generated at the start of a word, and a line break was inserted once a line reached a certain length. There are probably better ways of determining paragraph breaks, but I didn't investigate this.
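The layout rule can be sketched roughly as follows. The maximum line length is not stated, so max_line_length=40 is a placeholder, and measuring it in characters rather than words is also an assumption. The function name lay_out is illustrative.

```python
def lay_out(words, max_line_length=40):
    """Group words into paragraphs, each of which is a list of lines.

    A new paragraph starts before any word beginning with p or f. A new line
    starts when adding the next word would exceed max_line_length characters.
    """
    paragraphs = []
    lines, line = [], ""
    for word in words:
        if word[:1] in ("p", "f") and (lines or line):
            # Close the current paragraph before placing this word.
            if line:
                lines.append(line)
            paragraphs.append(lines)
            lines, line = [], ""
        candidate = f"{line} {word}" if line else word
        if line and len(candidate) > max_line_length:
            lines.append(line)
            line = word
        else:
            line = candidate
    if line:
        lines.append(line)
    if lines:
        paragraphs.append(lines)
    return paragraphs


sample = ["pol", "daiin", "chedy", "qokar", "fchedy", "dy"]
for paragraph in lay_out(sample, max_line_length=12):
    print("\n".join(paragraph))
    print()
```

Words containing the gallows-in-bench glyphs cph and cfh begin with c, so they do not trigger a paragraph break under this rule.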

The resulting text is contained in Generated Voynich Manuscript. I may convert it to HTML and add some illustrations later. Note that two of the characteristics of the text, word repetition and similar words in close proximity, occur naturally as a result of randomness.

The real Voynich Manuscript, converted into EVA, is here.

© Copyright Donald Fisk 2017