As far as I know, no one has discerned any sentence structure within the Voynich manuscript. I wanted to check whether the words in it are randomly distributed, or whether there are associations between them. This time, I applied principal component analysis to the words, using a method very similar to word2vec.
Words in the manuscript were represented as bags of neighbouring words (five on either side, though the exact number isn't important). These bags were converted to vectors, with each distinct word in a bag determining an index, and its multiplicity determining the value at that index.
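The construction described above can be sketched in Python as follows. This is only an illustration, not the code used for the plots: the function names, the toy token list, and the choice to merge all occurrences of a word into a single bag are assumptions, and the PCA step is a plain SVD of the mean-centred matrix using NumPy.

```python
from collections import Counter

import numpy as np


def neighbour_bags(tokens, window=5):
    """Map each distinct word to a bag (Counter) of the words found
    within `window` positions on either side of its occurrences.
    Assumption: all occurrences of a word share one bag."""
    bags = {}
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        neighbours = tokens[lo:i] + tokens[i + 1:i + 1 + window]
        bags.setdefault(word, Counter()).update(neighbours)
    return bags


def bags_to_matrix(bags):
    """One row per word, one column per neighbouring word;
    each entry is the multiplicity of that neighbour in the bag."""
    vocab = sorted({w for bag in bags.values() for w in bag})
    index = {w: j for j, w in enumerate(vocab)}
    words = sorted(bags)
    X = np.zeros((len(words), len(vocab)))
    for r, word in enumerate(words):
        for neighbour, count in bags[word].items():
            X[r, index[neighbour]] = count
    return words, vocab, X


def pca(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    centred = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ Vt[:n_components].T


# Hypothetical placeholder tokens, not manuscript text.
tokens = ["a", "b", "a", "c", "b", "a"]
bags = neighbour_bags(tokens, window=1)
words, vocab, X = bags_to_matrix(bags)
coords = pca(X)  # one 2-D point per word, suitable for a scatter plot
```

With a real transcription, `tokens` would be the manuscript's word sequence and `window` would be 5; restricting the rows to words within a frequency range would give plots comparable in kind to the figures.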
For example, the bag
Figure 1: words occurring 50-200 times
Figure 2: words occurring 20-50 times
I noticed that words ending in
Further analysis has shown that other words are common in Currier A pages and rare in Currier B pages, or vice versa, so I examined their positions on the word plots. This is best shown on the plot below, which uses two words on either side of each word for its bag.
Figure 3: words occurring 100 times or more
The words in red occur more frequently in Currier B pages, and those in blue in Currier A pages.
These can be used to define a function of pages:
(count(Currier B words) - count(Currier A words)) / count(words)
which can be applied to the page PCA plot.
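As a minimal sketch, the page function above could be computed like this. The function name, the word sets, and the sample page are hypothetical placeholders; returning 0 for an empty page is also an assumption, since the formula is undefined there.

```python
def currier_score(page_words, a_words, b_words):
    """(count(Currier B words) - count(Currier A words)) / count(words).

    Ranges from -1 (only A-typical words) to +1 (only B-typical words).
    """
    if not page_words:
        return 0.0  # assumption: treat an empty page as neutral
    n_b = sum(1 for w in page_words if w in b_words)
    n_a = sum(1 for w in page_words if w in a_words)
    return (n_b - n_a) / len(page_words)


# Hypothetical word sets and page, not taken from the manuscript.
a_words = {"x"}
b_words = {"y"}
page = ["x", "y", "y", "z"]
score = currier_score(page, a_words, b_words)  # (2 - 1) / 4
```

The score can then be mapped to a colour scale running from blue (negative) to red (positive) when plotting each page.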
Figure 4: PCA plot of pages. Blue = Currier A, red = Currier B.
This shows that there is no clear dividing line between Currier A and B. The transition from A to B is gradual.
The position of a page on the PCA plot of pages is determined by the frequency of certain words within the page, but apparently not by their order.
© Copyright Donald Fisk 2017