Voynich State Transitions

2017-03-22

One of the most surprising things about the Voynich Manuscript is that the glyphs within words follow a clearly discernible grammar, yet the words in sentences don't. (In fact, as there's no punctuation, where sentences begin and end, or whether they exist at all, is unclear.)

All the prefixes in Table 1 and Table 2 in The Structure of the Words in the Voynich Manuscript can be generated by traversing this state transition diagram:

Figure 1: State transition diagram for common prefixes

All the suffixes in Table 2, Table 4, and Table 6 can be generated by traversing this one:

Figure 2: State transition diagram for common suffixes

NB * isn't really a state in either diagram. Ideally, the two diagrams would be merged into a single diagram, in which any state in Figure 1 with a transition shown to * has transitions to all of the states in Figure 2 with transitions shown from *.

q and o are normally considered separate glyphs, but q is only very rarely followed by anything other than o, so I have merged the two glyphs into a single glyph, qo.

The diagrams are simplified in a number of ways. States with the same precursor and successor states are merged, e.g. k/t is really two states, k and t, and a/o is really a and o2. Numbered states, e.g. y1 and y2, output the same letter, which occurs at more than one place in the diagrams. Finally, iX indicates endings such as iin and ir.

For example, daiin can be generated by traversing the states

start → d1 → a (one of a/o) → iin (one of iX) → finish

and qokeedy by traversing the states

start → qo → k (one of k/t) → e1 → e2 → d2 → y2 → finish
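
The two traversals above can be checked mechanically. The following is a minimal Python sketch, assuming each state writes the glyph given by its label without the numeric suffix, and that start and finish write nothing. The names GLYPH and spell are illustrative and not part of the original analysis.

```python
# Glyph written by each state used in the two examples above.
# The start and finish states write nothing. Names are illustrative.
GLYPH = {
    "start": "", "finish": "",
    "qo": "qo", "d1": "d", "d2": "d", "k": "k",
    "e1": "e", "e2": "e", "y2": "y", "a": "a", "iin": "iin",
}

def spell(path):
    """Concatenate the glyphs written along a sequence of states."""
    return "".join(GLYPH[state] for state in path)

print(spell(["start", "d1", "a", "iin", "finish"]))
print(spell(["start", "qo", "k", "e1", "e2", "d2", "y2", "finish"]))
```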

How it is decided which transition to take from any state to the following one isn't at all obvious, but one possibility that should be considered is that it is random, possibly with different probabilities attached to each transition. This would mean that the text is meaningless, though it is possible that following a transition merely appears to be random, e.g. as a result of a previous encryption step, and it would be difficult to tell the two possibilities apart. However, in either case it would render untenable the ideas that the manuscript is either plaintext in an unidentified language or non-verbose cyphertext.

Transition Probabilities

From any state, the transition probabilities can be reverse engineered by walking each word through the diagram: increment the count for the next transition leading to a state which writes the following glyph, then move to that state. Whenever no such state is found, backtrack (decrementing the count) and follow an alternative path from the last decision point. When this has been repeated for the entire text, the probability of a transition from a state is the number of times that transition has been traversed divided by the number of times any transition from that state has been traversed. As word frequencies vary throughout the manuscript, this has to be done section by section.
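
The counting procedure can be sketched in Python as a depth-first search with backtracking. This is only an illustration under stated assumptions: SUCC is a hand-picked subset of the possible transitions, the word list is made up, and the names find_path and estimate are hypothetical. Instead of incrementing and decrementing counts during the search, the sketch counts transitions only once a complete path has been found, which should give the same totals.

```python
from collections import Counter

# Glyph written on reaching each state (finish writes nothing).
GLYPH = {"qo": "qo", "d1": "d", "o": "o", "l1": "l", "k": "k", "t": "t",
         "y2": "y", "a": "a", "o2": "o", "l2": "l", "r": "r",
         "in": "in", "iin": "iin", "finish": ""}

# Assumed subset of allowed transitions, in the order they are tried.
SUCC = {
    "start": ["qo", "d1", "o", "k", "t", "a", "o2"],
    "qo": ["d1", "k", "t"], "o": ["d1", "l1", "k", "t"], "l1": ["d1", "k", "t"],
    "d1": ["y2", "a", "o2", "finish"], "k": ["y2", "a", "o2", "finish"],
    "t": ["y2", "a", "o2", "finish"], "y2": ["finish"],
    "a": ["l2", "r", "in", "iin"], "o2": ["l2", "r"],
    "l2": ["y2", "finish"], "r": ["y2", "finish"],
    "in": ["finish"], "iin": ["finish"],
}

def find_path(word, state="start"):
    """Return a list of states from `state` that spells `word`, or None."""
    if state == "finish":
        return ["finish"] if word == "" else None
    for nxt in SUCC.get(state, []):
        glyph = GLYPH[nxt]
        if word.startswith(glyph):
            tail = find_path(word[len(glyph):], nxt)
            if tail is not None:      # otherwise backtrack and try the next state
                return [state] + tail
    return None

def estimate(words):
    """Estimate transition probabilities from the paths of the given words."""
    counts = Counter()
    for word in words:
        path = find_path(word)
        if path is not None:          # words the diagram cannot spell are skipped
            counts.update(zip(path, path[1:]))
    totals = Counter()
    for (src, _dst), n in counts.items():
        totals[src] += n
    return {(src, dst): n / totals[src] for (src, dst), n in counts.items()}

probs = estimate(["daiin", "daiin", "dar", "ol"])   # made-up word list
print(find_path("ol"))
print(probs[("start", "d1")], probs[("a", "iin")])
```

The word ol shows the backtracking: the search first tries start → o → l1, finds no way to finish from l1, and falls back to start → o2 → l2 → finish.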

For the blue herbal pages, the number of occurrences of words with the common prefixes and suffixes is shown in Table 1. Note that one word, dy, occurs twice, as -dy and again as d-y.

	-	-al	-aly	-aim	-ain	-aiin	-aiiin	-air	-aiir	-am	-ar	-ary	-as	-ol	-oly	-oim	-oin	-oiin	-oiiin	-oir	-oiir	-om	-or	-ory	-os	-y	-ey	-eey	-eeey	-dy	-ed	-edy	-eed	-eedy
-	0	2	0	0	2	10	0	0	1	3	7	0	0	13	2	0	0	3	0	0	0	0	20	0	4	23	0	1	0	29	0	0	0	0
d-	11	10	0	0	25	74	5	8	0	7	18	1	0	12	0	0	1	2	1	0	1	1	13	0	0	29	1	0	0	0	0	0	0	0
od-	0	1	0	0	0	8	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	1	0	0	1	0	1	0	0	0	0	0	0
qod-	0	0	0	0	0	1	0	1	0	1	2	0	0	0	0	0	0	0	0	0	0	1	0	0	0	3	0	0	0	0	0	0	0	0
chod-	0	0	2	0	1	7	0	1	0	1	2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	11	1	0	0	0	0	0	0	0
shod-	2	0	0	0	1	1	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	5	0	0	0	0	0	0	0	0
yd-	0	2	0	0	1	3	0	1	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
k-	1	0	0	0	1	9	0	1	0	0	0	1	0	5	1	0	0	0	0	0	0	0	6	0	1	3	1	5	1	0	0	0	0	0
ok-	0	4	0	0	3	13	0	0	2	1	2	1	0	10	0	0	0	2	0	0	0	1	5	0	0	5	1	2	1	0	0	0	0	0
qok-	0	0	0	0	0	4	0	0	0	1	2	0	0	3	0	0	0	2	0	0	1	1	2	0	0	3	1	4	0	0	0	0	0	0
chok-	2	0	0	0	2	1	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	1	0	0	3	1	0	0	0	0	0	0	0
shok-	1	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	2	0	1	0	0	0	0	0	0
yk-	0	1	0	0	0	3	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	1	0	0	1	0	2	0	0	0	0	0	0
t-	0	0	0	0	0	3	0	0	0	0	0	0	0	2	0	0	0	1	0	0	0	0	2	0	0	3	1	1	0	0	0	0	0	0
ot-	1	2	0	0	1	9	0	0	1	1	1	0	0	9	0	0	0	0	0	0	0	0	5	0	0	6	1	2	0	0	0	0	0	0
qot-	1	0	0	0	0	2	0	0	0	0	0	0	0	4	0	0	0	0	0	0	0	0	3	0	0	10	1	2	1	0	0	0	0	0
chot-	2	0	0	0	1	0	0	0	0	0	1	0	0	2	0	0	0	0	0	0	0	1	1	0	0	8	1	0	0	0	0	0	0	0
shot-	1	0	0	0	0	1	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0
yt-	0	0	0	0	1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	2	0	0	0	0	0	0	0	0
ld-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
old-	0	0	0	1	1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0
qold-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
chold-	0	0	0	0	0	1	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	2	0	0	0	0	0	0	0	0
shold-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0
yld-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
lk-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
olk-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
qolk-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
cholk-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
sholk-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
ylk-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
lt-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
olt-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
qolt-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
cholt-	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
sholt-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
ylt-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0

Table 1: Word counts for the blue herbal pages

The state transition table derived from the blue herbal pages, and limited to common prefixes and suffixes, is shown in Table 2. The first column is the state ID. The second is the glyph to be printed when the state is reached, and the subsequent columns are the per mille probabilities of the next state. Very small probabilities are shown as ε, and zero probabilities are not shown.

Note that Table 2 is a single table, without any boundary between prefixes and suffixes.

To generate a word, begin at the row for the start state. Make a weighted random choice from among the entries, and then move to the row for the corresponding state, and repeat until the finish state has been reached.
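
As a rough illustration, the generation procedure might look like this in Python. The weights are a subset of the per mille values in Table 2, omitting the e states, the rarer suffixes and ε entries, so the output is only indicative. TABLE, GLYPH and generate are illustrative names; random.choices rescales the weights in each row, so they need not sum to 1000.

```python
import random

# Glyph written on reaching each state (finish writes nothing).
GLYPH = {"qo": "qo", "d1": "d", "k": "k", "t": "t", "y2": "y", "a": "a",
         "o2": "o", "l2": "l", "r": "r", "in": "in", "iin": "iin", "finish": ""}

# Per mille weights for a subset of the transitions in Table 2.
TABLE = {
    "start": {"qo": 91, "d1": 352, "k": 57, "t": 21},
    "qo": {"d1": 158, "k": 421, "t": 421},
    "d1": {"y2": 179, "a": 652, "o2": 111, "finish": 44},
    "k": {"y2": 123, "a": 384, "o2": 319, "finish": 29},
    "t": {"y2": 291, "a": 252, "o2": 311, "finish": 49},
    "y2": {"finish": 1000},
    "a": {"l2": 81, "r": 138, "in": 135, "iin": 519},
    "o2": {"l2": 430, "r": 404},
    "l2": {"y2": 56, "finish": 944},
    "r": {"y2": 39, "finish": 961},
    "in": {"finish": 1000},
    "iin": {"finish": 1000},
}

def generate(rng):
    """Walk from start to finish, making a weighted random choice in each row."""
    state, glyphs = "start", []
    while state != "finish":
        row = TABLE[state]
        state = rng.choices(list(row), weights=list(row.values()))[0]
        glyphs.append(GLYPH[state])
    return "".join(glyphs)

rng = random.Random(2017)   # fixed seed so that a run can be repeated
print([generate(rng) for _ in range(10)])
```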

Making a weighted random choice is easy to do on a computer, but how could it have been done quickly in the early 15th century? Dice are unsuitable: the scribe would have had to throw at least two, and then do some mental arithmetic before being able to select the next state. A much easier way would be to shuffle a deck of playing cards and draw each card in turn. This gives 52 possibilities if the suits are ordered. If the probabilities in each state transition table were entered cumulatively as particular cards in the deck, our scribe could search along the row until an entry with a higher card value was found, and choose the next state that way, which would be quite fast.
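
A sketch of how such a card lookup could work, in Python. It assumes the 52 cards are numbered 1 to 52 in a fixed order, that each row entry is stored as the highest card number which selects it, and that the drawn card selects the first entry whose stored number is at least as high. The row is the d1 row of Table 2 without its rarest transitions. The names card_row and next_state are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

DECK_SIZE = 52

def card_row(row):
    """Convert per mille weights into cumulative card numbers (1..52)."""
    total = sum(row.values())
    states, limits, running = [], [], 0
    for state, weight in row.items():
        running += weight
        states.append(state)
        limits.append(round(running * DECK_SIZE / total))
    return states, limits

def next_state(card, states, limits):
    """Pick the first state whose cumulative card number is >= the drawn card."""
    return states[bisect_left(limits, card)]

# The d1 row of Table 2, ignoring the rare transitions to e2, e3 and d2.
D1_ROW = {"y2": 179, "a": 652, "o2": 111, "finish": 44}
states, limits = card_row(D1_ROW)

# How many of the 52 cards lead to each next state.
tally = Counter(next_state(card, states, limits)
                for card in range(1, DECK_SIZE + 1))
print(limits, dict(tally))
```

One consequence of this scheme is that a transition much rarer than about 1 in 52 may receive no card at all, so very small probabilities could not be reproduced with a single draw.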

According to Early references to Playing Cards, playing cards were introduced into Europe in the late 14th century, and became popular over the next few decades. Tarot cards appeared by 1440, and might have been used instead.

		start	qo	d1	o	l1	ch	sh	y1	k	t	e1	e2	e3	d2	y2	a	o2	l2	r	s	im	in	iin	iiin	ir	iir	m	finish
start			91	352	174	ε	94	32	35	57	21	ε	2	ε	ε	37	40	67											ε
qo	qo			158		ε				421	421
d1	d											ε	3	7	3	179	652	111											44
o	o			261		53				367	319
l1	l			900						ε	100
ch	ch				1000
sh	sh				1000
y1	y			364		ε				409	227
k	k											14	101	29	ε	123	384	319											29
t	t											10	49	39	ε	291	252	311											49
e1	e												1000
e2	e													1000	ε
e3	e															1000
d2	d															ε													1000
y2	y																												1000
a	a																		81	138	3	3	135	519	17	40	13	51
o2	o																		430	404	33	ε	7	73	7	ε	13	33
l2	l															56													944
r	r															39													961
s	s																												1000
im	im																												1000
in	in																												1000
iin	iin																												1000
iiin	iiin																												1000
ir	ir																												1000
iir	iir																												1000
m	m																												1000
finish		1000

Table 2: State transition table for the blue herbal pages

The state transition table in Table 2 was then used to predict the number of occurrences of words, resulting in Table 3.
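
One way to obtain such predictions, sketched here in Python, is to multiply the transition probabilities along a word's path and scale by the number of words in the section. This is an assumption about the method rather than a description of it. The per mille values are taken from Table 2, n_words is a hypothetical round figure, and the function names are illustrative.

```python
from math import prod

# Per mille transition values from Table 2 for the paths of daiin and dain.
PER_MILLE = {
    ("start", "d1"): 352,
    ("d1", "a"): 652,
    ("a", "iin"): 519,
    ("a", "in"): 135,
    ("iin", "finish"): 1000,
    ("in", "finish"): 1000,
}

def path_probability(path):
    """Probability of generating the word spelled by this sequence of states."""
    return prod(PER_MILLE[(src, dst)] / 1000 for src, dst in zip(path, path[1:]))

def expected_count(path, n_words):
    """Expected occurrences of the word among n_words generated words."""
    return n_words * path_probability(path)

daiin = ["start", "d1", "a", "iin", "finish"]
dain = ["start", "d1", "a", "in", "finish"]

# n_words=1000 is a hypothetical section size, not a figure from the text.
print(path_probability(daiin))
print(expected_count(daiin, n_words=1000), expected_count(dain, n_words=1000))
```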

	-	-al	-aly	-aim	-ain	-aiin	-aiiin	-air	-aiir	-am	-ar	-ary	-as	-ol	-oly	-oim	-oin	-oiin	-oiiin	-oir	-oiir	-om	-or	-ory	-os	-y	-ey	-eey	-eeey	-dy	-ed	-edy	-eed	-eedy
-	0	2	0	0	3	13	0	1	0	1	3	0	0	17	1	0	0	3	0	0	1	1	16	1	1	23	0	1	0	40	0	0	0	0
d-	10	11	1	0	19	75	2	6	2	7	19	1	0	10	1	0	0	2	0	0	0	1	10	0	1	40	1	1	0	0	0	0	0	0
od-	1	1	0	0	2	10	0	1	0	1	2	0	0	1	0	0	0	0	0	0	0	0	1	0	0	5	0	0	0	0	0	0	0	0
qod-	0	0	0	0	1	3	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	2	0	0	0	0	0	0	0	0
chod-	1	1	0	0	1	5	0	0	0	1	1	0	0	1	0	0	0	0	0	0	0	0	1	0	0	3	0	0	0	0	0	0	0	0
shod-	0	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0
yd-	0	0	0	0	1	3	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0
k-	1	1	0	0	2	7	0	1	0	1	2	0	0	5	0	0	0	1	0	0	0	0	4	0	0	4	1	4	1	0	0	0	0	0
ok-	1	1	0	0	2	8	0	1	0	1	2	0	0	5	0	0	0	1	0	0	0	0	5	0	0	5	1	4	1	0	0	0	0	0
qok-	1	1	0	0	1	5	0	0	0	0	1	0	0	3	0	0	0	1	0	0	0	0	3	0	0	3	1	2	0	0	0	0	0	0
chok-	1	1	0	0	1	4	0	0	0	0	1	0	0	3	0	0	0	1	0	0	0	0	3	0	0	3	1	2	0	0	0	0	0	0
shok-	0	0	0	0	0	1	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	1	0	0	1	0	1	0	0	0	0	0	0
yk-	0	0	0	0	0	2	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	1	0	0	1	0	1	0	0	0	0	0	0
t-	1	0	0	0	0	2	0	0	0	0	0	0	0	2	0	0	0	0	0	0	0	0	2	0	0	4	1	1	0	0	0	0	0	0
ot-	2	1	0	0	1	5	0	0	0	0	1	0	0	4	0	0	0	1	0	0	0	0	4	0	0	10	1	2	0	0	0	0	0	0
qot-	1	0	0	0	1	3	0	0	0	0	1	0	0	3	0	0	0	1	0	0	0	0	3	0	0	7	1	1	0	0	0	0	0	0
chot-	1	0	0	0	1	2	0	0	0	0	1	0	0	2	0	0	0	0	0	0	0	0	2	0	0	5	1	1	0	0	0	0	0	0
shot-	0	0	0	0	0	1	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	1	0	0	2	0	0	0	0	0	0	0	0
yt-	0	0	0	0	0	1	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	1	0	0	1	0	0	0	0	0	0	0	0
ld-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
old-	0	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0
qold-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
chold-	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0
shold-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
yld-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
lk-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
olk-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
qolk-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
cholk-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
sholk-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
ylk-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
lt-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
olt-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
qolt-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
cholt-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
sholt-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
ylt-	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0

Table 3: Predicted word counts for the blue herbal pages

The RMS error on the values that are non-zero in either Table 1 or Table 3 is 1.933788. A different state transition table was generated for each section, and the RMS errors of the predicted word counts were calculated as before. The results are shown in Table 4.
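
The error measure can be sketched as follows in Python, reading it as the root mean square of the differences over every cell that is non-zero in at least one of the two tables. That reading is an assumption, and the function name rms_nonzero is illustrative. The sample data are the first eight cells of the d- rows of Table 1 and Table 3, so the printed value is not the figure for the whole table.

```python
from math import sqrt

def rms_nonzero(observed, predicted):
    """RMS difference over cells that are non-zero in either sequence."""
    pairs = [(o, p) for o, p in zip(observed, predicted) if o != 0 or p != 0]
    if not pairs:
        return 0.0
    return sqrt(sum((o - p) ** 2 for o, p in pairs) / len(pairs))

# First eight cells of the d- row in Table 1 (observed) and Table 3 (predicted).
observed = [11, 10, 0, 0, 25, 74, 5, 8]
predicted = [10, 11, 1, 0, 19, 75, 2, 6]

print(rms_nonzero(observed, predicted))
```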

	blue herbal	black herbal	red herbal	green herbal	astro	pharma	blue text	black text	red text	blue bio	black bio	red bio
Error	1.93	2.77	3.85	2.13	2.09	2.85	4.19	5.02	5.18	2.30	3.44	3.16

Table 4: RMS errors for non-zero values of predicted word counts

Table 5 and Table 6 show the state transition tables for two further sections, red text and blue biological.

		start	qo	d1	o	l1	ch	sh	y1	k	t	e1	e2	e3	d2	y2	a	o2	l2	r	s	im	in	iin	iiin	ir	iir	m	finish
start			283	70	285	71	12	6	15	33	23	ε	ε	ε	ε	5	149	49											ε
qo	qo			10		3				831	156
d1	d											11	ε	ε	ε	129	839	11											11
o	o			40		61				511	389
l1	l			10						949	41
ch	ch				1000
sh	sh				1000
y1	y			ε		ε				813	188
k	k											292	393	51	ε	19	228	7											9
t	t											216	375	34	ε	34	327	14											ε
e1	e												1000
e2	e													384	616
e3	e															1000
d2	d															971													29
y2	y																												1000
a	a																		215	176	ε	14	183	295	14	32	ε	71
o2	o																		557	344	ε	ε	16	33	ε	ε	ε	49
l2	l															16													984
r	r															51													949
s	s
im	im																												1000
in	in																												1000
iin	iin																												1000
iiin	iiin																												1000
ir	ir																												1000
iir	iir
m	m																												1000
finish		1000

Table 5: State transition table for the red text pages

		start	qo	d1	o	l1	ch	sh	y1	k	t	e1	e2	e3	d2	y2	a	o2	l2	r	s	im	in	iin	iiin	ir	iir	m	finish
start			276	152	259	9	1	ε	39	43	28	1	ε	ε	ε	6	33	154											ε
qo	qo			ε		16				837	147
d1	d											7	ε	ε	ε	155	669	155											14
o	o			9		199				476	316
l1	l			69						828	103
ch	ch				1000
sh	sh
y1	y			29		ε				571	400
k	k											166	375	55	2	52	309	40											ε
t	t											97	448	65	6	45	253	78											6
e1	e												1000
e2	e													134	866
e3	e															1000
d2	d															974													26
y2	y																												1000
a	a																		242	232	ε	ε	225	259	7	7	ε	27
o2	o																		755	218	ε	ε	5	11	11	ε	ε	ε
l2	l															94													906
r	r															55													945
s	s
im	im
in	in																												1000
iin	iin																												1000
iiin	iiin																												1000
ir	ir																												1000
iir	iir
m	m																												1000
finish		1000

Table 6: State transition table for the blue biological pages

With the existing state transition tables, less than 40% of the words can be generated, but it is possible to extend them to handle most if not all of the text.