The Voynich manuscript is an illustrated codex, hand-written in an unknown script referred to as Voynichese. The vellum on which it is written has been carbon-dated to the early 15th century (1404–1438). Stylistic analysis has indicated the manuscript may have been composed in Italy during the Italian Renaissance. The origins, authorship, and purpose of the manuscript are still debated, and scholars currently lack the translation and context needed to either properly entertain or eliminate any of the possibilities. Hypotheses include a script for a natural or constructed language; an unreadable code, cipher, or other form of cryptography; a hoax; a reference work (i.e. folkloric index or compendium); glossolalia; or a work of fiction (e.g. science fantasy, mythopoeia, metafiction, or speculative fiction).
The first confirmed owner was Georg Baresch, a 17th-century alchemist from Prague. The manuscript is named after Wilfrid Voynich, a Polish book dealer who purchased it in 1912. The manuscript consists of around 240 pages, but there is evidence that some of the pages are missing. The text is written from left to right, and some pages are foldable sheets of varying sizes. Most of the pages have fantastical illustrations and diagrams, some crudely coloured, with sections of the manuscript showing people, unidentified plants and astrological symbols. Since 1969, it has been held in Yale University's Beinecke Rare Book and Manuscript Library. In 2020, Yale University published the entire manuscript online in its digital library.
The Voynich manuscript has been studied by both professional and amateur cryptographers, including American and British codebreakers from both World War I and World War II. Codebreakers Prescott Currier, William Friedman, Elizebeth Friedman, and John Tiltman were all unsuccessful in deciphering it. The manuscript has never been demonstrably deciphered, and none of the proposed hypotheses have been independently verified. The mystery of its meaning and origin has excited speculation and provoked study.
Sources & references
Reference material for this entry is drawn from the open encyclopedic record, including Wikipedia, available under the CC BY-SA 4.0 license. Images are credited individually beside each photo.
Description
Codicology
The codicology of the manuscript, that is, its physical characteristics, has been studied by many researchers. The manuscript measures 23.5 by 16.2 by 5 cm (9.3 by 6.4 by 2.0 in), with hundreds of vellum pages collected into 18 quires. The total number of pages is around 240, but the exact number depends on how the manuscript's unusual foldouts are counted. The quires have been numbered from 1 to 20 in various locations, using a style of numerals consistent with those used in the 15th century, and the top right corner of each recto (right-hand) page has been numbered from 1 to 116, using a style of numerals that originated at a later date. From the various numbering gaps in the quires and pages, it seems likely that the manuscript once had at least 272 pages in 20 quires, some of which were already missing when Wilfrid Voynich acquired the manuscript in 1912. There is strong evidence that many of the book's bifolios were reordered at various points in the book's history, and that its pages were originally in a different order from the one they are in today.
Parchment, covers, and binding
Samples from various parts of the manuscript were radiocarbon dated at the University of Arizona in 2009. The results were consistent for all samples tested and indicated a date for the parchment between 1404 and 1438. Protein testing in 2014 revealed that the parchment was made from calfskin, and multispectral analysis showed that it had not been written on before the manuscript was created (i.e., it is not a palimpsest). The parchment is of average quality and has deficiencies, such as holes and tears, that are common in parchment codices, but it was prepared with so much care that the skin side is largely indistinguishable from the flesh side. The parchment is made from "at least fourteen or fifteen entire calfskins".
Some folios (such as 42 and 47) are thicker than the usual parchment. The goatskin binding and covers are not original to the book, but date to its possession by the Collegio Romano. Insect holes are present on the first and last folios of the manuscript in the current order and suggest that a wooden cover was present before the later covers. Discolouring on the edges points to a tanned leather inside cover.
Medievalist Lisa Fagin Davis describes the parchment as soft—a texture found in books that have been "heavily thumbed". This indicates the manuscript was handled or paged through a great deal and likely served some routine function such as a medical manual or celestial almanac. Its heavy usage also suggests it had a workmanlike role rather than anything sacred or ceremonial. The holes in the parchment from scabs, wounds, or insect bites, and the lack of any luxurious touches such as gold leaf, support this interpretation.
Ink
Many pages contain substantial drawings or charts which are coloured with paint. Analysis using polarized light microscopy (PLM) has determined that a quill pen and iron gall ink were used for the text and figure outlines. The inks of the drawings, text, and page and quire numbers have similar microscopic characteristics. In 2009, energy-dispersive X-ray spectroscopy (EDS) revealed that the inks contained major amounts of carbon, iron, sulphur, potassium and calcium, with trace amounts of copper and occasionally zinc. EDS did not show the presence of lead, while X-ray diffraction (XRD) identified potassium lead oxide, potassium hydrogen sulphate, and syngenite in one of the samples tested. The similarity between the drawing inks and text inks suggested a contemporaneous origin.
Paint
Coloured paint was applied (somewhat crudely) to the ink-outlined figures, possibly at a later date. The blue, white, red-brown, and green paints of the manuscript have been analysed using PLM, XRD, EDS, and scanning electron microscopy (SEM).
The blue paint proved to be ground azurite with minor traces of the copper oxide cuprite.
The white paint is likely a mixture of egg-white and calcium carbonate.
The green paint is tentatively characterised by copper and copper-chlorine resinate; the crystalline material might be atacamite or some other copper-chlorine compound.
Analysis of the red-brown paint indicated a red ochre with the crystal phases hematite and iron sulphide. Minor amounts of lead sulphide and palmierite are possibly present in the red-brown paint.
The pigments used were deemed inexpensive.
Retouching
Computer scientist Jorge Stolfi of the University of Campinas highlighted that parts of the text and drawings have been modified, using darker ink over a fainter, earlier script. Evidence for this is visible in various folios, for example f1r, f3v, f26v, f57v, f67r2, f71r, f72v1, f72v3 and f73r.
Text
The manuscript has roughly 38,000 words, of which 9,000 are unique.
Every page in the manuscript contains text, mostly in an unidentified language, but some have extraneous writing in Latin script. The bulk of the text in the 240-page manuscript is written in an unknown script, running left to right. Most of the characters are composed of one or two simple pen strokes. There exists some dispute as to whether certain characters are distinct, but a script of 20–25 characters would account for virtually all of the text; the exceptions are a few dozen rarer characters that occur only once or twice each. There is no obvious punctuation.
Much of the text is written in a single column in the body of a page, with a slightly ragged right margin and paragraph divisions and sometimes with stars in the left margin. Other text occurs in charts or as labels associated with illustrations. The ductus flows smoothly, giving the impression that the symbols were not enciphered; there is no delay between characters, as would normally be expected in written encoded text.
Only a few of the words in the manuscript are thought not to have been written in the unknown script:
f1r: A sequence of Latin letters in the right margin parallel with characters from the unknown script; also the now-unreadable signature of "Jacobj à Tepenecz" is found in the bottom margin.
f17r: A line of writing in Latin script in the top margin.
f66r: A small number of words in the bottom left corner near a drawing of a nude man have been read as der Mussteil, a High German phrase for 'a widow's share'.
f70v–f73v: The astrological series of diagrams in the astronomical section has the names of ten of the months (from March to December) written in Latin script, with spelling suggestive of the medieval languages of France, northwest Italy, or the Iberian Peninsula.
f116v: Four lines written in rather distorted Latin script, referred to as Michitonese, except for two words in the unknown script. The words in Latin script appear to be distorted with characteristics of the unknown language. The lettering resembles European alphabets of the late 14th and 15th centuries, but the words do not seem to make sense in any language. Whether these bits of Latin script were part of the original text or were added later is not known.
Transcription
Various transcription alphabets, such as the Extensible (originally: European) Voynich Alphabet (EVA), have been created to encode Voynich characters as Latin characters to help with cryptanalysis. The first major one was created by the "First Study Group", led by cryptographer William F. Friedman in the 1940s, in which each line of the manuscript was transcribed to an IBM punch card to make it machine readable.
Statistical patterns
The text consists of over 170,000 characters, with spaces dividing the text into about 35,000 groups of varying length, usually referred to as words or word tokens (37,919); 8,114 of those words are considered unique word types. The structure of these words seems to follow phonological or orthographic laws of some sort; for example, certain characters must appear in each word (like English vowels), some characters never follow others, and some may be doubled or tripled while others may not. The distribution of letters within words is also rather peculiar: some characters occur only at the beginning of a word, some only at the end (like Greek ς), and some always in the middle section.
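The counts quoted above imply a ratio of word types to word tokens of roughly 0.21. The following Python sketch shows one way such a ratio, and the positions that characters occupy within words, could be tabulated from a transcription. The function names are hypothetical, and the sample words are invented stand-ins in Latin letters, not actual Voynich transcription.

```python
from collections import defaultdict

def type_token_ratio(tokens):
    """Ratio of distinct word types to total word tokens."""
    return len(set(tokens)) / len(tokens)

def positional_profile(tokens):
    """Map each character to the set of positions it occupies in words:
    'initial', 'medial', or 'final'. A one-character word counts as
    both initial and final."""
    profile = defaultdict(set)
    for word in tokens:
        last = len(word) - 1
        for i, ch in enumerate(word):
            if i == 0:
                profile[ch].add("initial")
            if i == last:
                profile[ch].add("final")
            if 0 < i < last:
                profile[ch].add("medial")
    return dict(profile)

# Figures quoted in the text: 8,114 types among 37,919 tokens.
print(round(8114 / 37919, 3))  # 0.214

# Invented sample words (assumption: NOT real transcription data).
sample = ["dabo", "dako", "tabo", "dabo"]
print(type_token_ratio(sample))  # 0.75
for ch, positions in sorted(positional_profile(sample).items()):
    print(ch, sorted(positions))
```

In this toy sample, "d" and "t" occur only word-initially and "o" only word-finally, which is the kind of positional restriction the paragraph describes.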
There are a number of patterns of word length, frequency, etc. found in the manuscript that are consistent with natural language (i.e. language that occurs organically in a human community). The word found most frequently in the manuscript appears roughly twice as often as the second-most-common word and three times as often as the third-most-common, following Zipf's law. The mix of word lengths and the ratio of unique words to total words are similar to those of languages found around the world. Certain words seem to follow one another in predictable order, as if following rules of grammar.
Just as one would expect in a normal book whose chapters focused on different subjects, the different sections of the manuscript (based on the drawings of plants, stars, bathing women, etc., accompanying them) have different sets of overrepresented words. The language of the manuscript differs from other known human languages in spelling, with "too many ... letters repeated in the same order, both within words and across neighboring words".
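The Zipf relationship mentioned above (a word's frequency being roughly inversely proportional to its rank) can be checked with a short Python sketch. The function names are hypothetical and the toy corpus is made up so that the counts follow the pattern exactly; real text only approximates it.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, word, count) tuples, most frequent word first."""
    counts = Counter(tokens).most_common()
    return [(rank, word, n) for rank, (word, n) in enumerate(counts, start=1)]

def zipf_products(tokens, top=3):
    """Under an ideal Zipf distribution, rank * count is roughly constant."""
    return [rank * n for rank, _, n in rank_frequency(tokens)[:top]]

# Toy corpus of invented words with counts 6, 3 and 2.
toy = ["aa"] * 6 + ["bb"] * 3 + ["cc"] * 2
print(rank_frequency(toy))
print(zipf_products(toy))  # [6, 6, 6]
```

Here the most common word occurs twice as often as the second and three times as often as the third, so the product of rank and count stays constant.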
Many researchers have commented upon the highly regular structure of the words. Professor Gonzalo Rubio, an expert in ancient languages at Pennsylvania State University, stated:
The things we know as grammatical markers – things that occur commonly at the beginning or end of words, such as 's' or 'd' in our language, and that are used to express grammar, never appear in the middle of 'words' in the Voynich manuscript. That's unheard of for any Indo-European, Hungarian, or Finnish language.
Stephan Vonfelt studied statistical properties of the distribution of letters and their correlations (properties which can be vaguely characterised as rhythmic resonance, alliteration, or assonance) and found that in that respect Voynichese is more similar to the Mandarin Chinese pinyin text of the Records of the Grand Historian than to the text of works from European languages, although the numerical differences between Voynichese and pinyin look larger than those between pinyin and European languages.
Practically no words have fewer than two letters or more than ten. Some words occur in only certain sections, or in only a few pages; others occur throughout the manuscript. Few repetitions occur among the thousand or so labels attached to the illustrations. There are instances where the same common word appears up to five times in a row (see Zipf's law). Words that differ by only one letter also repeat with unusual frequency, causing single-substitution alphabet decipherings to yield babble-like text. In 1962, cryptanalyst Elizebeth Friedman described such statistical analyses as "doomed to utter frustration".
In 2014, a team led by Diego Amancio of the University of São Paulo published a study using statistical methods to analyse the relationships of the words in the text. Instead of trying to find the meaning, Amancio's team looked for connections and clusters of words. By measuring the frequency and intermittence of words, Amancio claimed to identify the text's keywords and produced three-dimensional models of the text's structure and word frequencies. The team concluded that, in 90% of cases, the Voynich systems are similar to those of other known books, indicating that the text is in an actual language, not random gibberish.
The use of the framework was exemplified with the analysis of the Voynich manuscript, with the final conclusion that it differs from a random sequence of words, being compatible with natural languages. Even though our approach is not aimed at deciphering Voynich, it was capable of providing keywords that could be helpful for decipherers in the future.
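The idea of word intermittence can be illustrated with a short Python sketch. It assumes, for illustration only, that intermittence is measured as the coefficient of variation of the gaps between successive occurrences of a word; whether this matches the exact measure used in the study is an assumption, and the function name is hypothetical.

```python
import statistics

def intermittence(tokens, word):
    """Coefficient of variation of the gaps between successive
    occurrences of `word`. Returns None when the word occurs fewer
    than three times (fewer than two gaps)."""
    positions = [i for i, t in enumerate(tokens) if t == word]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if len(gaps) < 2:
        return None
    return statistics.pstdev(gaps) / statistics.mean(gaps)

# Evenly spaced word: every gap is identical, so the value is zero.
even = ["x", "y"] * 5
print(intermittence(even, "x"))  # 0.0

# Bursty word: occurrences cluster at the start and end of the text.
bursty = ["k", "k", "k"] + ["f"] * 10 + ["k", "k"]
print(intermittence(bursty, "k"))
```

Under this assumption, words that cluster in particular passages score higher than words spread evenly through the text, which is why such a measure can point to candidate keywords.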
Linguists Claire Bowern and Luke Lindemann have applied statistical methods to the Voynich manuscript, comparing it to other languages and encodings of languages, and have found both similarities and differences in statistical properties. Character sequences in languages are measured using a metric called h2, or second-order conditional entropy. Natural languages tend to have an h2 between 3 and 4, but Voynichese has much more predictable character sequences, with an h2 around 2. At higher levels of organisation, the Voynich manuscript displays properties similar to those of natural languages. Based on this, Bowern dismisses theories that the manuscript is gibberish and argues that it is more likely to be an encoded natural language or a constructed language. Bowern also concludes that the statistical properties of the Voynich manuscript are not consistent with the use of a substitution cipher or polyalphabetic cipher.
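A minimal Python sketch of the h2 metric follows, assuming h2 denotes the entropy, in bits, of a character given the character immediately before it. The function name is hypothetical, and choices such as whether word spaces count as characters are left to the caller.

```python
import math
from collections import Counter

def conditional_entropy(text):
    """Estimate H(next char | previous char) in bits from the
    adjacent character pairs of `text`."""
    pairs = Counter(zip(text, text[1:]))   # counts of (previous, next)
    firsts = Counter(text[:-1])            # counts of 'previous' characters
    total = sum(pairs.values())
    h = 0.0
    for (prev, _nxt), n in pairs.items():
        p_pair = n / total                 # P(previous, next)
        p_next_given_prev = n / firsts[prev]
        h -= p_pair * math.log2(p_next_given_prev)
    return h

# Fully predictable sequence: each character determines the next.
print(conditional_entropy("abababab"))  # 0.0

# Less predictable sequence: the value is greater than zero.
print(conditional_entropy("aabbaabb"))
```

The lower the value, the more predictable each character is from its predecessor, which is the sense in which Voynichese is described as having unusually predictable character sequences.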
As noted in Bowern's review, the writer or writers of the manuscript may have used two methods of encoding at least one natural language. The "language" Voynich A appears in the herbal and pharmaceutical parts of the manuscript. The "language" known as Voynich B appears in the balneological section, some parts of the medicinal and herbal sections, and the astrological section. The most common vocabulary items of Voynich A and Voynich B are substantially different. Topic modeling of the manuscript suggests that pages identified as written by a particular scribe may relate to a different topic.
In terms of morphology, if visual spaces in the manuscript are assumed to indicate word breaks, there are consistent patterns that suggest a three-part word structure of prefix, root or midfix, and suffix. Certain characters and character combinations are more likely to appear in particular fields. There are minor variations between Voynich A and Voynich B. The predictability of certain letters in a relatively small number of combinations in certain parts of words appears to explain the low entropy (h2) of Voynichese. In the absence of obvious punctuation, some variants of the same word appear to be specific to typographical positions, such as the beginning of a paragraph, line, or sentence.
The Voynich word frequencies of both variants appear to conform to a Zipfian distribution, supporting the idea that the text has linguistic meaning. This has implications for the encoding methods most likely to have been used, since some forms of encoding interfere with the Zipfian distribution. Measures of the proportional frequency of the ten most common words are similar to those of the Semitic, Iranian, and Germanic languages. Another measure of morphological complexity, lexical diversity, is similar to that of the Iranian, Germanic, and Romance languages.
Handwriting
According to journalist Ariel Sabar, the handwriting of the manuscript has "the easy flow of a long-established script". Sabar and scholar Lisa Fagin Davis note that using the handwriting style, letterforms, abbreviations, and punctuation of the manuscript to determine its place or date is not feasible because there is "nothing in history to compare it to". However, Davis looked for and found "small, stylistic tells" in the manuscript's writing, indicating it was highly likely that more than one scribe had worked on the manuscript. While the handwriting is consistent, Davis found, in tracking particular letters, slight variations in style ("larger or smaller loops, straighter or curvier crossbars, longer or shorter feet"). These variations occurred between sections of the book but not within them, suggesting that the differences came from "different scribes rather than of one scribe writing the same letter in different ways". Davis believes the manuscript was the work of five different scribes, suggesting that it was likely the product of a community rather than the work of an unwell mind, a hoax, or another concoction by a single person. Bowern's review also notes that multiple scribes may have written the manuscript.
Illustrations
Because the text cannot be read, the manuscript is conventionally divided into sections based on its illustrations. Most of the manuscript forms six different sections, each typified by illustrations with different styles and supposed subject matter except for the last section, in which the only drawings are small stars in the margin. The conventional sections are:
Herbal, 126 pages: Each page displays one or two plants and a few paragraphs of text, a format typical of European herbals of the time. Some parts of these drawings are larger and cleaner copies of sketches seen in the "pharmaceutical" section. None of the plants depicted are unambiguously identifiable.
Astronomical, 17 pages: Contains circular diagrams suggestive of astronomy or astrology, some of them with suns, moons, and stars. One series of 12 diagrams depicts conventional symbols for the zodiacal constellations (two fish for Pisces, a bull for Taurus, a hunter with crossbow for Sagittarius, etc.). Each of these has 30 female figures arranged in two or more concentric bands. Most of the females are at least partly nude, and each holds what appears to be a labelled star or is shown with the star attached to either arm by what could be a tether or cord of some kind. The last two pages of this section were lost (Aquarius and Capricornus, roughly February and January), while Aries and Taurus are split into four paired diagrams with 15 women and 15 stars each. Some of these diagrams are on fold-out pages.
Balneological, 20 pages: A dense, continuous text interspersed with drawings, mostly showing small nude women, some wearing crowns, bathing in pools or tubs connected by an elaborate network of pipes. One bifolio, consisting of folios 78 (verso) and 81 (recto), forms an integrated design, with water flowing from one folio to the other.
Cosmological, 14 pages: More circular diagrams, but they are of an obscure nature. This section also has foldouts; one of them spans six pages, commonly called the Rosettes folio, and contains a map or diagram with nine "islands" or "rosettes" connected by "causeways" and containing castles, as well as what might be a volcano.
Pharmaceutical, 16 pages: Many labelled drawings of isolated plant parts (roots, leaves, etc.), objects resembling apothecary jars, ranging in style from the mundane to the fantastical, and a few text paragraphs.
Recipes, 25 pages: Full pages of text broken into many short paragraphs, each marked with a star in the left margin.
Eight pages contain only text, and therefore cannot be classified in any of those categories. At least 14 folios (28 pages) are missing from the manuscript.
Purpose
The overall impression given by the surviving leaves of the manuscript is that it was meant to serve as a pharmacopoeia or to address topics in medieval or early modern medicine; however, the puzzling details of the illustrations have fuelled many theories about the book's origin, the contents of its text, and the purpose for which it was intended.