2017-03-22
One of the most surprising things about the Voynich Manuscript is that the glyphs within words follow a clearly discernible grammar, yet the words within sentences don't. (In fact, as there is no punctuation, it is unclear where sentences begin and end, or whether they exist at all.)

All the prefixes in Table 1 and Table 2 in The Structure of the Words in the Voynich Manuscript can be generated by traversing this state transition diagram:

Figure 1: State transition diagram for common prefixes

All the suffixes in Table 2, Table 4, and Table 6 can be generated by traversing this one:

Figure 2: State transition diagram for common suffixes

NB The diagrams are simplified in a number of ways. States with the same precursor and successor states are merged, as with a/o, k/t, and iX in the paths below.

For example, two possible paths are
start → d1 → a (one of a/o) → iin (one of iX) → finish
and
start → qo → k (one of k/t) → e1 → e2 → d2 → y2 → finish
How it is decided which transition to take from any state to the following one isn't at all obvious, but one possibility that should be considered is that the choice is random, possibly with a different probability attached to each transition. This would mean that the text is meaningless, though it is possible that the choice merely appears to be random, e.g. as a result of a previous encryption step, and it would be difficult to tell the two possibilities apart. However, it would render untenable the ideas that the manuscript is either plaintext in an unidentified language or non-verbose cyphertext.

Transition Probabilities

The transition probabilities from any state can be reverse engineered from the text. Reading a word glyph by glyph, increment the count for the transition leading to a state which writes the next glyph, and move to that state. Whenever no such state is found, backtrack (decrementing the count) and follow an alternative path from the last decision point. When this has been repeated for the entire text, the probability of a transition from a state is the number of times that transition has been traversed divided by the number of times any transition from that state has been traversed.

As word frequencies vary throughout the manuscript, this has to be done section by section. For the blue herbal pages, the number of occurrences of words with the common prefixes and suffixes is shown in Table 1.
Note that one word,
Prefix | - | -al | -aly | -aim | -ain | -aiin | -aiiin | -air | -aiir | -am | -ar | -ary | -as | -ol | -oly | -oim | -oin | -oiin | -oiiin | -oir | -oiir | -om | -or | -ory | -os | -y | -ey | -eey | -eeey | -dy | -ed | -edy | -eed | -eedy |
- | 0 | 2 | 0 | 0 | 2 | 10 | 0 | 0 | 1 | 3 | 7 | 0 | 0 | 13 | 2 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 20 | 0 | 4 | 23 | 0 | 1 | 0 | 29 | 0 | 0 | 0 | 0 |
d- | 11 | 10 | 0 | 0 | 25 | 74 | 5 | 8 | 0 | 7 | 18 | 1 | 0 | 12 | 0 | 0 | 1 | 2 | 1 | 0 | 1 | 1 | 13 | 0 | 0 | 29 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
od- | 0 | 1 | 0 | 0 | 0 | 8 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
qod- | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
chod- | 0 | 0 | 2 | 0 | 1 | 7 | 0 | 1 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 11 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
shod- | 2 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
yd- | 0 | 2 | 0 | 0 | 1 | 3 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
k- | 1 | 0 | 0 | 0 | 1 | 9 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 5 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 1 | 3 | 1 | 5 | 1 | 0 | 0 | 0 | 0 | 0 |
ok- | 0 | 4 | 0 | 0 | 3 | 13 | 0 | 0 | 2 | 1 | 2 | 1 | 0 | 10 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 5 | 0 | 0 | 5 | 1 | 2 | 1 | 0 | 0 | 0 | 0 | 0 |
qok- | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 1 | 2 | 0 | 0 | 3 | 0 | 0 | 0 | 2 | 0 | 0 | 1 | 1 | 2 | 0 | 0 | 3 | 1 | 4 | 0 | 0 | 0 | 0 | 0 | 0 |
chok- | 2 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
shok- | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
yk- | 0 | 1 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
t- | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 3 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
ot- | 1 | 2 | 0 | 0 | 1 | 9 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 6 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
qot- | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 10 | 1 | 2 | 1 | 0 | 0 | 0 | 0 | 0 |
chot- | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 8 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
shot- | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
yt- | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
ld- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
old- | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
qold- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
chold- | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
shold- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
yld- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
lk- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
olk- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
qolk- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
cholk- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
sholk- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
ylk- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
lt- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
olt- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
qolt- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
cholt- | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
sholt- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
ylt- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Table 1: Word counts for the blue herbal pages
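The counting procedure described under Transition Probabilities can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the `STATES` automaton is a hypothetical toy with a handful of states, not the full diagrams, and the names `find_path`, `count_transitions` and `per_mille` are invented for the example. Instead of incrementing and decrementing counts while searching, the sketch first finds a complete path by backtracking and only then counts its transitions, which should give the same totals.

```python
from collections import defaultdict

# Hypothetical toy automaton (NOT the full diagrams from the article):
# state -> (glyph written on entering the state, next states in the order tried)
STATES = {
    "start":  ("",    ["qo", "d1", "o", "k", "a", "o2"]),
    "qo":     ("qo",  ["k"]),
    "d1":     ("d",   ["a", "o2"]),
    "o":      ("o",   ["k"]),
    "k":      ("k",   ["a", "o2", "y2"]),
    "a":      ("a",   ["iin", "r"]),
    "o2":     ("o",   ["r"]),
    "iin":    ("iin", ["finish"]),
    "r":      ("r",   ["y2", "finish"]),
    "y2":     ("y",   ["finish"]),
    "finish": ("",    []),
}

def find_path(word, state="start", pos=0):
    """Depth-first search for a state path that writes `word`; None if there is none."""
    if state == "finish":
        return [state] if pos == len(word) else None
    for nxt in STATES[state][1]:
        glyph = STATES[nxt][0]
        if word.startswith(glyph, pos):
            rest = find_path(word, nxt, pos + len(glyph))
            if rest is not None:
                return [state] + rest
            # Dead end: backtrack and try the next alternative from this state.
    return None

def count_transitions(words):
    """Count how often each transition is traversed over all parsable words."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in words:
        path = find_path(word)
        if path is None:
            continue  # word is not covered by the automaton
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    return counts

def per_mille(counts):
    """Convert raw counts into per mille probabilities for each state."""
    table = {}
    for src, row in counts.items():
        total = sum(row.values())
        table[src] = {dst: round(1000 * n / total) for dst, n in row.items()}
    return table

words = ["daiin", "daiin", "okar", "or", "qoky"]
print(per_mille(count_transitions(words)))
```

In this toy, the word "or" shows the backtracking: the prefix state o is tried first, fails because no k follows, and the search falls back to the suffix state o2.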
The state transition table derived from the blue herbal pages, limited to common prefixes and suffixes, is shown in Table 2. The first column is the state ID. The second is the glyph to be printed when the state is reached, and the subsequent columns are the per mille probabilities of each next state. Very small probabilities are shown as ε, and zero probabilities are not shown. Note that Table 2 is a single table, without any boundary between prefixes and suffixes.

To generate a word, begin at the row for the start state. Make a weighted random choice from among the entries, move to the row for the corresponding state, and repeat until the finish state has been reached.

Making a weighted random choice is easy to do on a computer, but how could it have been done quickly in the early 15th century? Dice are unsuitable: the scribe would have had to throw at least two, and then do some mental arithmetic before being able to select the next state. A much easier way would be to shuffle a deck of playing cards and draw each card in turn. This gives 52 possibilities if the suits are ordered. If the probabilities in each row of the state transition table were entered cumulatively as particular cards in the deck, our scribe could search along the row until an entry with a higher card value was found, and choose the next state that way, which would be quite fast.

According to Early references to Playing Cards, playing cards were introduced into Europe in the late 14th century, and became popular over the next few decades. Tarot cards appeared by 1440, and might have been used instead.

State | Glyph | start | qo | d1 | o | l1 | ch | sh | y1 | k | t | e1 | e2 | e3 | d2 | y2 | a | o2 | l2 | r | s | im | in | iin | iiin | ir | iir | m | finish |
start | 91 | 352 | 174 | ε | 94 | 32 | 35 | 57 | 21 | ε | 2 | ε | ε | 37 | 40 | 67 | ε | ||||||||||||
qo | qo | 158 | ε | 421 | 421 | ||||||||||||||||||||||||
d1 | d | ε | 3 | 7 | 3 | 179 | 652 | 111 | 44 | ||||||||||||||||||||
o | o | 261 | 53 | 367 | 319 | ||||||||||||||||||||||||
l1 | l | 900 | ε | 100 | |||||||||||||||||||||||||
ch | ch | 1000 | |||||||||||||||||||||||||||
sh | sh | 1000 | |||||||||||||||||||||||||||
y1 | y | 364 | ε | 409 | 227 | ||||||||||||||||||||||||
k | k | 14 | 101 | 29 | ε | 123 | 384 | 319 | 29 | ||||||||||||||||||||
t | t | 10 | 49 | 39 | ε | 291 | 252 | 311 | 49 | ||||||||||||||||||||
e1 | e | 1000 | |||||||||||||||||||||||||||
e2 | e | 1000 | ε | ||||||||||||||||||||||||||
e3 | e | 1000 | |||||||||||||||||||||||||||
d2 | d | ε | 1000 | ||||||||||||||||||||||||||
y2 | y | 1000 | |||||||||||||||||||||||||||
a | a | 81 | 138 | 3 | 3 | 135 | 519 | 17 | 40 | 13 | 51 | ||||||||||||||||||
o2 | o | 430 | 404 | 33 | ε | 7 | 73 | 7 | ε | 13 | 33 | ||||||||||||||||||
l2 | l | 56 | 944 | ||||||||||||||||||||||||||
r | r | 39 | 961 | ||||||||||||||||||||||||||
s | s | 1000 | |||||||||||||||||||||||||||
im | im | 1000 | |||||||||||||||||||||||||||
in | in | 1000 | |||||||||||||||||||||||||||
iin | iin | 1000 | |||||||||||||||||||||||||||
iiin | iiin | 1000 | |||||||||||||||||||||||||||
ir | ir | 1000 | |||||||||||||||||||||||||||
iir | iir | 1000 | |||||||||||||||||||||||||||
m | m | 1000 | |||||||||||||||||||||||||||
finish | 1000 |
Table 2: State transition table for the blue herbal pages
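The word generation procedure and the playing card lookup described above Table 2 might be sketched as follows. The rows in `TABLE` are hypothetical per mille values for a toy automaton and are not taken from Table 2; the helper names (`card_row`, `next_state`, `draw_cards`, `generate_word`) are likewise invented for illustration. Cards are numbered 0 to 51 in rank order, and reshuffling when the deck runs out is an assumption of this sketch.

```python
import random

# Hypothetical per mille rows for a toy automaton; the values are
# illustrative and are NOT taken from Table 2.
TABLE = {
    "start": {"d1": 500, "k": 300, "a": 200},
    "d1":    {"a": 1000},
    "k":     {"a": 600, "y2": 400},
    "a":     {"iin": 700, "r": 300},
    "iin":   {"finish": 1000},
    "r":     {"finish": 1000},
    "y2":    {"finish": 1000},
}
GLYPH = {"start": "", "d1": "d", "k": "k", "a": "a",
         "iin": "iin", "r": "r", "y2": "y", "finish": ""}
DECK_SIZE = 52

def card_row(row, deck_size=DECK_SIZE):
    """Convert per mille probabilities into cumulative card thresholds."""
    thresholds, running = [], 0
    for state, per_mille in row.items():
        running += per_mille
        thresholds.append((round(running * deck_size / 1000), state))
    return thresholds

def next_state(row, card):
    """Search along the row for the first threshold higher than the drawn card."""
    for threshold, state in card_row(row):
        if card < threshold:
            return state
    raise ValueError(f"card {card} is outside the deck")

def draw_cards(rng, deck_size=DECK_SIZE):
    """Yield cards 0..deck_size-1 from a shuffled deck, reshuffling when it is used up."""
    while True:
        deck = list(range(deck_size))
        rng.shuffle(deck)
        yield from deck

def generate_word(cards):
    """Walk from start to finish, drawing one card per transition."""
    state, glyphs = "start", []
    while state != "finish":
        state = next_state(TABLE[state], next(cards))
        glyphs.append(GLYPH[state])
    return "".join(glyphs)

cards = draw_cards(random.Random(2017))
print([generate_word(cards) for _ in range(5)])
```

With only 52 cards, a transition whose probability is much below 1/52 (about 19 per mille) rounds to a zero-width band and is never chosen, so a card-based scheme of this kind could only approximate the ε entries.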
The state transition table in Table 2 was then used to predict the number of occurrences of each word, resulting in Table 3.
Prefix | - | -al | -aly | -aim | -ain | -aiin | -aiiin | -air | -aiir | -am | -ar | -ary | -as | -ol | -oly | -oim | -oin | -oiin | -oiiin | -oir | -oiir | -om | -or | -ory | -os | -y | -ey | -eey | -eeey | -dy | -ed | -edy | -eed | -eedy |
- | 0 | 2 | 0 | 0 | 3 | 13 | 0 | 1 | 0 | 1 | 3 | 0 | 0 | 17 | 1 | 0 | 0 | 3 | 0 | 0 | 1 | 1 | 16 | 1 | 1 | 23 | 0 | 1 | 0 | 40 | 0 | 0 | 0 | 0 |
d- | 10 | 11 | 1 | 0 | 19 | 75 | 2 | 6 | 2 | 7 | 19 | 1 | 0 | 10 | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 10 | 0 | 1 | 40 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
od- | 1 | 1 | 0 | 0 | 2 | 10 | 0 | 1 | 0 | 1 | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
qod- | 0 | 0 | 0 | 0 | 1 | 3 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
chod- | 1 | 1 | 0 | 0 | 1 | 5 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
shod- | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
yd- | 0 | 0 | 0 | 0 | 1 | 3 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
k- | 1 | 1 | 0 | 0 | 2 | 7 | 0 | 1 | 0 | 1 | 2 | 0 | 0 | 5 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 4 | 1 | 4 | 1 | 0 | 0 | 0 | 0 | 0 |
ok- | 1 | 1 | 0 | 0 | 2 | 8 | 0 | 1 | 0 | 1 | 2 | 0 | 0 | 5 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 5 | 1 | 4 | 1 | 0 | 0 | 0 | 0 | 0 |
qok- | 1 | 1 | 0 | 0 | 1 | 5 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 3 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 3 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
chok- | 1 | 1 | 0 | 0 | 1 | 4 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 3 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 3 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
shok- | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
yk- | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
t- | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 4 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
ot- | 2 | 1 | 0 | 0 | 1 | 5 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 4 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 10 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
qot- | 1 | 0 | 0 | 0 | 1 | 3 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 3 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 7 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
chot- | 1 | 0 | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 5 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
shot- | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
yt- | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
ld- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
old- | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
qold- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
chold- | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
shold- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
yld- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
lk- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
olk- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
qolk- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
cholk- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
sholk- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
ylk- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
lt- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
olt- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
qolt- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
cholt- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
sholt- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
ylt- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Table 3: Predicted word counts for the blue herbal pages
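The article does not say exactly how the predicted counts were computed. One plausible approach, sketched here as an assumption, is to multiply the transition probabilities along a word's path and scale by the total number of words. The `TABLE` values, the total of 200 words, and the function names are all hypothetical.

```python
from math import prod

# Hypothetical per mille rows (illustrative values, not taken from Table 2).
TABLE = {
    "start": {"d1": 500, "k": 300, "a": 200},
    "d1":    {"a": 1000},
    "k":     {"a": 600, "y2": 400},
    "a":     {"iin": 700, "r": 300},
    "iin":   {"finish": 1000},
    "r":     {"finish": 1000},
    "y2":    {"finish": 1000},
}

def path_probability(path, table):
    """Probability of a word: the product of the probabilities along its state path."""
    return prod(table[src][dst] / 1000 for src, dst in zip(path, path[1:]))

def predicted_count(path, table, total_words):
    """Expected number of occurrences among total_words generated words."""
    return round(total_words * path_probability(path, table))

daiin = ["start", "d1", "a", "iin", "finish"]
print(predicted_count(daiin, TABLE, total_words=200))
```

If a word could be produced by more than one path, the probabilities of all its paths would need to be summed; the toy table above has only one path per word.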
The RMS error over the cells that are non-zero in either Table 1 or Table 3 is 1.933788. A different state transition table was generated for each section, and the RMS errors of the predicted word counts were calculated in the same way. The results are shown in Table 4.
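As a rough sketch of the error measure, the following assumes that "non-zero in either table" means a cell is included whenever the observed or the predicted count is non-zero. The function name `rms_error` is invented here, and the data is just the first six cells of the d- row from Table 1 and Table 3, so the result is not the whole-table figure.

```python
from math import sqrt

def rms_error(observed, predicted):
    """RMS difference over cells where at least one of the two counts is non-zero."""
    pairs = [(o, p)
             for row_o, row_p in zip(observed, predicted)
             for o, p in zip(row_o, row_p)
             if o != 0 or p != 0]
    if not pairs:
        return 0.0
    return sqrt(sum((o - p) ** 2 for o, p in pairs) / len(pairs))

# First six cells of the d- row: observed (Table 1) and predicted (Table 3).
observed = [[11, 10, 0, 0, 25, 74]]
predicted = [[10, 11, 1, 0, 19, 75]]
print(rms_error(observed, predicted))
```

Cells that are zero in both tables are skipped so that the many empty prefix and suffix combinations do not dilute the error.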
Table 4: RMS errors for non-zero values of predicted word counts

Table 5 and Table 6 show the state transition tables for two further sections, red text and blue biological.

State | Glyph | start | qo | d1 | o | l1 | ch | sh | y1 | k | t | e1 | e2 | e3 | d2 | y2 | a | o2 | l2 | r | s | im | in | iin | iiin | ir | iir | m | finish |
start | 283 | 70 | 285 | 71 | 12 | 6 | 15 | 33 | 23 | ε | ε | ε | ε | 5 | 149 | 49 | ε | ||||||||||||
qo | qo | 10 | 3 | 831 | 156 | ||||||||||||||||||||||||
d1 | d | 11 | ε | ε | ε | 129 | 839 | 11 | 11 | ||||||||||||||||||||
o | o | 40 | 61 | 511 | 389 | ||||||||||||||||||||||||
l1 | l | 10 | 949 | 41 | |||||||||||||||||||||||||
ch | ch | 1000 | |||||||||||||||||||||||||||
sh | sh | 1000 | |||||||||||||||||||||||||||
y1 | y | ε | ε | 813 | 188 | ||||||||||||||||||||||||
k | k | 292 | 393 | 51 | ε | 19 | 228 | 7 | 9 | ||||||||||||||||||||
t | t | 216 | 375 | 34 | ε | 34 | 327 | 14 | ε | ||||||||||||||||||||
e1 | e | 1000 | |||||||||||||||||||||||||||
e2 | e | 384 | 616 | ||||||||||||||||||||||||||
e3 | e | 1000 | |||||||||||||||||||||||||||
d2 | d | 971 | 29 | ||||||||||||||||||||||||||
y2 | y | 1000 | |||||||||||||||||||||||||||
a | a | 215 | 176 | ε | 14 | 183 | 295 | 14 | 32 | ε | 71 | ||||||||||||||||||
o2 | o | 557 | 344 | ε | ε | 16 | 33 | ε | ε | ε | 49 | ||||||||||||||||||
l2 | l | 16 | 984 | ||||||||||||||||||||||||||
r | r | 51 | 949 | ||||||||||||||||||||||||||
s | s | ||||||||||||||||||||||||||||
im | im | 1000 | |||||||||||||||||||||||||||
in | in | 1000 | |||||||||||||||||||||||||||
iin | iin | 1000 | |||||||||||||||||||||||||||
iiin | iiin | 1000 | |||||||||||||||||||||||||||
ir | ir | 1000 | |||||||||||||||||||||||||||
iir | iir | ||||||||||||||||||||||||||||
m | m | 1000 | |||||||||||||||||||||||||||
finish | 1000 |
Table 5: State transition table for the red text pages
State | Glyph | start | qo | d1 | o | l1 | ch | sh | y1 | k | t | e1 | e2 | e3 | d2 | y2 | a | o2 | l2 | r | s | im | in | iin | iiin | ir | iir | m | finish |
start | 276 | 152 | 259 | 9 | 1 | ε | 39 | 43 | 28 | 1 | ε | ε | ε | 6 | 33 | 154 | ε | ||||||||||||
qo | qo | ε | 16 | 837 | 147 | ||||||||||||||||||||||||
d1 | d | 7 | ε | ε | ε | 155 | 669 | 155 | 14 | ||||||||||||||||||||
o | o | 9 | 199 | 476 | 316 | ||||||||||||||||||||||||
l1 | l | 69 | 828 | 103 | |||||||||||||||||||||||||
ch | ch | 1000 | |||||||||||||||||||||||||||
sh | sh | ||||||||||||||||||||||||||||
y1 | y | 29 | ε | 571 | 400 | ||||||||||||||||||||||||
k | k | 166 | 375 | 55 | 2 | 52 | 309 | 40 | ε | ||||||||||||||||||||
t | t | 97 | 448 | 65 | 6 | 45 | 253 | 78 | 6 | ||||||||||||||||||||
e1 | e | 1000 | |||||||||||||||||||||||||||
e2 | e | 134 | 866 | ||||||||||||||||||||||||||
e3 | e | 1000 | |||||||||||||||||||||||||||
d2 | d | 974 | 26 | ||||||||||||||||||||||||||
y2 | y | 1000 | |||||||||||||||||||||||||||
a | a | 242 | 232 | ε | ε | 225 | 259 | 7 | 7 | ε | 27 | ||||||||||||||||||
o2 | o | 755 | 218 | ε | ε | 5 | 11 | 11 | ε | ε | ε | ||||||||||||||||||
l2 | l | 94 | 906 | ||||||||||||||||||||||||||
r | r | 55 | 945 | ||||||||||||||||||||||||||
s | s | ||||||||||||||||||||||||||||
im | im | ||||||||||||||||||||||||||||
in | in | 1000 | |||||||||||||||||||||||||||
iin | iin | 1000 | |||||||||||||||||||||||||||
iiin | iiin | 1000 | |||||||||||||||||||||||||||
ir | ir | 1000 | |||||||||||||||||||||||||||
iir | iir | ||||||||||||||||||||||||||||
m | m | 1000 | |||||||||||||||||||||||||||
finish | 1000 |
Table 6: State transition table for the blue biological pages
With the existing state transition tables, fewer than 40% of the words can be generated, but it is possible to extend them to handle most if not all of the text.
© Copyright Donald Fisk 2017