Voynich State Transitions

2017-03-22

One of the most surprising things about the Voynich Manuscript is that the glyphs within words follow a clearly discernible grammar, yet the words in sentences don't. (In fact, as there's no punctuation, where sentences begin and end, or whether they exist at all, is unclear.)

All the prefixes in Table 1 and Table 2 in The Structure of the Words in the Voynich Manuscript can be generated by traversing this state transition diagram:

simplified prefix states

Figure 1: State transition diagram for common prefixes

All the suffixes in Table 2, Table 4, and Table 6 can be generated by traversing this one:

simplified suffix states

Figure 2: State transition diagram for common suffixes

NB * isn't really a state in either diagram. Ideally, the two diagrams would be merged into a single diagram, in which every state in Figure 1 with a transition shown to * has transitions to all of the states in Figure 2 with transitions shown from *.

q and o are normally considered separate glyphs, but q is only very rarely followed by anything other than o, so I have merged the two glyphs into a single glyph, qo.

The diagrams are simplified in a number of ways. States with the same precursor and successor states are merged, e.g. k/t is really two states, k and t, and a/o is a and o2. Numbered states, e.g. y1 and y2, both output the same letter but occur more than once. Finally, iX indicates endings such as iin and ir.

For example, daiin can be generated by traversing the states

    start → d1 → a (one of a/o) → iin (one of iX) → finish

and qokeedy by traversing the states

    start → qo → k (one of k/t) → e1 → e2 → d2 → y2 → finish
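As a rough check on the two traversals above, the following sketch spells out the word written by a path of states. The function name spell and the rule of stripping the trailing number from a state name are illustrative assumptions, based only on the remark that numbered states output the same letter.

```python
import re

def spell(path):
    """Join the glyphs written by each state on a path, skipping start and finish."""
    glyphs = []
    for state in path:
        if state in ("start", "finish"):
            continue
        # Assumption: a numbered state such as d1 or y2 writes the glyph
        # named by the state with its trailing number removed.
        glyphs.append(re.sub(r"\d+$", "", state))
    return "".join(glyphs)

print(spell(["start", "d1", "a", "iin", "finish"]))                    # daiin
print(spell(["start", "qo", "k", "e1", "e2", "d2", "y2", "finish"]))   # qokeedy
```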

How the transition from any state to the following one is chosen isn't at all obvious, but one possibility that should be considered is that it is random, possibly with a different probability attached to each transition. This would mean that the text is meaningless, though it is possible that the choice of transition merely appears to be random, e.g. as a result of a previous encryption step, and it would be difficult to tell the two possibilities apart. However, either possibility would render untenable the ideas that the manuscript is either plaintext in an unidentified language or non-verbose cyphertext.

Transition Probabilities

The transition probabilities can be reverse engineered from the text. For each word, begin at the start state, increment the count for the next transition leading to a state which writes the following glyph, and move to that state. Whenever no such state is found, backtrack (decrementing the count) and follow an alternative path from the last decision point. When this has been repeated for the entire text, the probability of a transition from a state is the number of times that transition has been traversed divided by the number of times any transition from that state has been traversed. As word frequencies vary throughout the manuscript, this has to be done section by section.
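A minimal sketch of this counting procedure follows. The successor lists in SUCCESSORS are a small hypothetical fragment, not the full diagrams, and the names find_path and estimate are illustrative. Instead of incrementing and decrementing counts during the search, the sketch counts transitions only once a complete path has been found, which should give the same totals.

```python
import re
from collections import Counter, defaultdict

# Hypothetical fragment of the state graph; the real successor lists
# would be read off Figures 1 and 2.
SUCCESSORS = {
    "start": ["d1", "qo", "k"],
    "qo": ["k", "d1"],
    "d1": ["a", "y2"],
    "k": ["a", "e1", "y2"],
    "e1": ["e2"],
    "e2": ["d2", "y2"],
    "d2": ["y2"],
    "a": ["iin", "r"],
    "y2": ["finish"], "iin": ["finish"], "r": ["finish"],
}

def glyph(state):
    """Glyph written by a state: the state name without its trailing number."""
    return re.sub(r"\d+$", "", state)

def find_path(word, state="start", pos=0):
    """Return a list of states from `state` to finish writing word[pos:], or None."""
    for nxt in SUCCESSORS.get(state, []):
        if nxt == "finish":
            if pos == len(word):
                return [state, "finish"]
            continue
        g = glyph(nxt)
        if word.startswith(g, pos):
            rest = find_path(word, nxt, pos + len(g))
            if rest is not None:
                return [state] + rest
            # Dead end: fall through and try the next successor (backtracking).
    return None

def estimate(words):
    """Per mille transition probabilities estimated from a list of words."""
    counts = defaultdict(Counter)
    for word in words:
        path = find_path(word)
        if path is None:
            continue  # word is not covered by this fragment of the graph
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: round(1000 * n / sum(c.values())) for dst, n in c.items()}
        for src, c in counts.items()
    }

print(estimate(["daiin", "daiin", "dar", "qokeedy", "dy"])["start"])
```

Words that cannot be parsed are simply skipped here, so the estimates cover only the words the graph can generate.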

For the blue herbal pages, the number of occurrences of words with the common prefixes and suffixes is shown in Table 1. Note that one word, dy, occurs twice, as -dy and again as d-y.

--al-aly-aim-ain-aiin-aiiin-air-aiir-am-ar-ary-as-ol-oly-oim-oin-oiin-oiiin-oir-oiir-om-or-ory-os-y-ey-eey-eeey-dy-ed-edy-eed-eedy
-020021000137001320030000200423010290000
d-111000257458071810120012101113002910000000
od-0100080000100000000000100101000000
qod-0000010101200000000001000300000000
chod-00201701012000000000000001110000000
shod-2000110000010000000000000500000000
yd-0200130100001000000000000000000000
k-1000190100010510000000601315100000
ok-040031300212101000020001500512100000
qok-0000040001200300020011200314000000
chok-2000210000000000010000100310000000
shok-1000010000000000000000000201000000
yk-0100030000000100000000100102000000
t-0000030000000200010000200311000000
ot-1200190011100900000000500612000000
qot-10000200000004000000003001012100000
chot-2000100000100200000001100810000000
shot-1000010000000100000000000100000000
yt-0000110000000000000000100200000000
ld-0000000000000000000000000000000000
old-0001110000000000000000000100000000
qold-0000000000000000000000000000000000
chold-0000010000100000000000000200000000
shold-0000000000000000000000000100000000
yld-0000000000000000000000000000000000
lk-0000000000000000000000000000000000
olk-0000000000000000000000000000000000
qolk-0000000000000000000000000000000000
cholk-0000000000000000000000000000000000
sholk-0000000000000000000000000000000000
ylk-0000000000000000000000000000000000
lt-0000000000000000000000000000000000
olt-0000000000000000000000000000000000
qolt-0000000000000000000000000000000000
cholt-0000010000000000000000000000000000
sholt-0000000000000000000000000000000000
ylt-0000000000000000000000000000000000

Table 1: Word counts for the blue herbal pages

The state transition table derived from the blue herbal pages, and limited to common prefixes and suffixes, is shown in Table 2. The first column is the state ID. The second is the glyph to be printed when the state is reached, and the subsequent columns are the per mille probabilities of the next state. Very small probabilities are shown as ε, and zero probabilities are not shown.

Note that Table 2 is a single table, without any boundary between prefixes and suffixes.

To generate a word, begin at the row for the start state. Make a weighted random choice from among the entries, and then move to the row for the corresponding state, and repeat until the finish state has been reached.
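The generation procedure can be sketched as below. The weights in TABLE are hypothetical per mille values for a small fragment of the states, not the values in Table 2, and generate_word is an illustrative name.

```python
import random
import re

# Hypothetical per mille weights for a few states; not taken from Table 2.
TABLE = {
    "start": {"d1": 600, "qo": 400},
    "qo": {"k": 1000},
    "d1": {"a": 700, "y2": 300},
    "k": {"a": 500, "e1": 500},
    "e1": {"e2": 1000},
    "e2": {"d2": 1000},
    "d2": {"y2": 1000},
    "a": {"iin": 800, "r": 200},
    "y2": {"finish": 1000}, "iin": {"finish": 1000}, "r": {"finish": 1000},
}

def generate_word(table, rng):
    """Walk from start to finish, making a weighted random choice in each row."""
    state, glyphs = "start", []
    while True:
        row = table[state]
        state = rng.choices(list(row), weights=list(row.values()))[0]
        if state == "finish":
            return "".join(glyphs)
        # Numbered states write the glyph named without the trailing number.
        glyphs.append(re.sub(r"\d+$", "", state))

rng = random.Random(0)  # fixed seed so that runs are repeatable
print([generate_word(TABLE, rng) for _ in range(5)])
```

With this fragment the only words that can appear are daiin, dar, dy, qokaiin, qokar and qokeedy.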

Making a weighted random choice is easy to do on a computer, but how could it have been done quickly in the early 15th Century? Dice are unsuitable. The scribe would have had to throw at least two, and then do some mental arithmetic before being able to select the next state. A much easier way would be to shuffle a deck of playing cards and draw each card in turn. This gives 52 possibilities if the suits are ordered. If the probabilities in each row of the state transition table were entered cumulatively as particular cards in the deck, the scribe could search along the row until reaching an entry with a card value at least as high as the card drawn, and choose the next state that way, which would be quite fast.
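The card method might work roughly as follows. This is only a sketch under assumptions: the cards are numbered 1 to 52, each row's cumulative probabilities are rounded to the nearest card, and the example row is hypothetical. The names card_row and next_state are illustrative.

```python
import random
from itertools import accumulate

DECK_SIZE = 52  # assumes the suits are ordered, giving card values 1..52

def card_row(row):
    """Turn the weights in one row into cumulative card thresholds (1..52)."""
    states = list(row)
    cumulative = list(accumulate(row.values()))
    total = cumulative[-1]
    thresholds = [round(DECK_SIZE * c / total) for c in cumulative]
    return list(zip(states, thresholds))

def next_state(cards, card):
    """Scan along the row and stop at the first threshold not below the drawn card."""
    for state, threshold in cards:
        if card <= threshold:
            return state
    raise ValueError("card value outside 1..52")

cards = card_row({"k": 500, "t": 250, "y2": 250})  # hypothetical row
print(cards)  # [('k', 26), ('t', 39), ('y2', 52)]

deck = list(range(1, DECK_SIZE + 1))
random.Random(0).shuffle(deck)  # shuffle once, then draw each card in turn
print([next_state(cards, card) for card in deck[:5]])
```

One consequence of rounding to 52 cards is that a transition with a very small probability can end up with no card of its own, in which case it is never chosen.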

According to Early references to Playing Cards, playing cards were introduced into Europe in the late 14th century, and became popular over the next few decades. Tarot cards appeared by 1440, and might have been used instead.

 startqod1ol1chshy1kte1e2e3d2y2ao2l2rsiminiiniiiniriirmfinish
start  91352174ε9432355721ε2εε374067          ε
qoqo  158 ε   421421                  
d1d          ε373179652111          44
oo  261 53   367319                  
l1l  900     ε100                  
chch   1000                        
shsh   1000                        
y1y  364 ε   409227                  
kk          1410129ε123384319          29
tt          104939ε291252311          49
e1e           1000                
e2e            1000ε              
e3e              1000             
d2d              ε            1000
y2y                           1000
aa                 811383313551917401351 
o2o                 43040433ε7737ε1333 
l2l              56            944
rr              39            961
ss                           1000
imim                           1000
inin                           1000
iiniin                           1000
iiiniiin                           1000
irir                           1000
iiriir                           1000
mm                           1000
finish 1000                           

Table 2: State transition table for the blue herbal pages

The state transition table in Table 2 was then used to predict the number of occurrences of each word, resulting in Table 3.

--al-aly-aim-ain-aiin-aiiin-air-aiir-am-ar-ary-as-ol-oly-oim-oin-oiin-oiiin-oir-oiir-om-or-ory-os-y-ey-eey-eeey-dy-ed-edy-eed-eedy
-020031301013001710030011161123010400000
d-101110197526271910101002000110014011000000
od-11002100101200100000000100500000000
qod-0000130000100000000000000200000000
chod-1100150001100100000000100300000000
shod-0000020000000000000000000100000000
yd-0000130000100000000000000100000000
k-1100270101200500010000400414100000
ok-1100280101200500010000500514100000
qok-1100150000100300010000300312000000
chok-1100140000100300010000300312000000
shok-0000010000000100000000100101000000
yk-0000020000000100000000100101000000
t-1000020000000200000000200411000000
ot-21001500001004000100004001012000000
qot-1000130000100300010000300711000000
chot-1000120000100200000000200511000000
shot-0000010000000100000000100200000000
yt-0000010000000100000000100100000000
ld-0000000000000000000000000000000000
old-0000020000000000000000000100000000
qold-0000000000000000000000000000000000
chold-0000010000000000000000000100000000
shold-0000000000000000000000000000000000
yld-0000000000000000000000000000000000
lk-0000000000000000000000000000000000
olk-0000000000000000000000000000000000
qolk-0000000000000000000000000000000000
cholk-0000000000000000000000000000000000
sholk-0000000000000000000000000000000000
ylk-0000000000000000000000000000000000
lt-0000000000000000000000000000000000
olt-0000000000000000000000000000000000
qolt-0000000000000000000000000000000000
cholt-0000000000000000000000000000000000
sholt-0000000000000000000000000000000000
ylt-0000000000000000000000000000000000

Table 3: Predicted word counts for the blue herbal pages

The RMS error over the cells that are non-zero in either Table 1 or Table 3 is 1.933788. A different state transition table was generated for each section, and the RMS error of the predicted word counts was calculated in the same way. The results are shown in Table 4.

Section         RMS error
blue herbal     1.93
black herbal    2.77
red herbal      3.85
green herbal    2.13
astro           2.09
pharma          2.85
blue text       4.19
black text      5.02
red text        5.18
blue bio        2.30
black bio       3.44
red bio         3.16

Table 4: RMS errors for non-zero values of predicted word counts
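For reference, an RMS error of this kind could be computed as in the sketch below. It assumes that a cell is included whenever its count is non-zero in either the observed or the predicted table, and that both tables are given as lists of rows of equal shape. The function name rms_error and the small example tables are illustrative, not data from the manuscript.

```python
import math

def rms_error(observed, predicted):
    """RMS difference over cells that are non-zero in either table."""
    diffs = [
        o - p
        for row_o, row_p in zip(observed, predicted)
        for o, p in zip(row_o, row_p)
        if o != 0 or p != 0  # assumption: skip cells that are zero in both
    ]
    if not diffs:
        return 0.0
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Made-up 2 x 3 tables: four cells are counted, with differences 1, 2, -2, 0.
observed = [[0, 2, 5], [0, 0, 1]]
predicted = [[0, 1, 3], [0, 2, 1]]
print(rms_error(observed, predicted))  # 1.5
```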

Table 5 and Table 6 show the state transition tables for two further sections, red text and blue biological.

 startqod1ol1chshy1kte1e2e3d2y2ao2l2rsiminiiniiiniriirmfinish
start  2837028571126153323εεεε514949          ε
qoqo  10 3   831156                  
d1d          11εεε12983911          11
oo  40 61   511389                  
l1l  10     94941                  
chch   1000                        
shsh   1000                        
y1y  ε ε   813188                  
kk          29239351ε192287          9
tt          21637534ε3432714          ε
e1e           1000                
e2e            384616              
e3e              1000             
d2d              971            29
y2y                           1000
aa                 215176ε141832951432ε71 
o2o                 557344εε1633εεε49 
l2l              16            984
rr              51            949
ss                            
imim                           1000
inin                           1000
iiniin                           1000
iiiniiin                           1000
irir                           1000
iiriir                            
mm                           1000
finish 1000                           

Table 5: State transition table for the red text pages

 startqod1ol1chshy1kte1e2e3d2y2ao2l2rsiminiiniiiniriirmfinish
start  27615225991ε3943281εεε633154          ε
qoqo  ε 16   837147                  
d1d          7εεε155669155          14
oo  9 199   476316                  
l1l  69     828103                  
chch   1000                        
shsh                            
y1y  29 ε   571400                  
kk          1663755525230940          ε
tt          974486564525378          6
e1e           1000                
e2e            134866              
e3e              1000             
d2d              974            26
y2y                           1000
aa                 242232εε22525977ε27 
o2o                 755218εε51111εεε 
l2l              94            906
rr              55            945
ss                            
imim                            
inin                           1000
iiniin                           1000
iiiniiin                           1000
irir                           1000
iiriir                            
mm                           1000
finish 1000                           

Table 6: State transition table for the blue biological pages

With the existing state transition tables, less than 40% of the words can be generated, but it is possible to extend them to handle most if not all of the text.

© Copyright Donald Fisk 2017