Spice Letter Counts

Audit report on the total number of Paleo letters in the Spice Bible.

Natural Letter Frequency

We use the Spice Bible (at Paleo.In) as a control for understanding a large block of Paleo text. This is NOT purely inspired text; it is mostly text written by men. But its language structure is similar to that of inspired text, so it gives a reasonable control for statistical analysis.

This page displays the results of an audit tool, written in JavaScript, which counts the letters in the Spice Bible text.

We can of course easily count the frequency of letters in the text. But perhaps more interesting is the spread of each letter: the distance between one occurrence of a letter and the next occurrence of the same letter in the running text.

Data gathered from the Spice Bible gives us a general pattern of what to expect when we look for similar text in the genome.

The most important letter, as shown below, is the Dot. It is placed between every pair of words in Paleo text.

As we attempt to find Paleo text in the genome, we will first look for codon distributions in which some codon follows the same pattern as the Dot.

Dots are like the ends of hammocks that words sit within. No hammocks, no words.
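The search described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the reading frame (starting at position 0, non-overlapping triplets), the function names toCodons, spreadProfiles and rankDotLike, and the simple distance score are all hypothetical and are not taken from the audit tool. The Dot's target values (most common spread 5, average spread 5.49) come from the tables on this page.

```javascript
// Sketch under assumptions: frame 0, non-overlapping codons, and an
// ad hoc score. None of this is the audit tool's actual method.

// Dot profile as reported in the tables on this page.
const DOT_PROFILE = { bestSpread: 5, avgSpread: 5.49 };

// Split a DNA string into non-overlapping codons; a trailing partial codon is dropped.
function toCodons(dna) {
  const codons = [];
  for (let i = 0; i + 3 <= dna.length; i += 3) {
    codons.push(dna.slice(i, i + 3));
  }
  return codons;
}

// For each distinct codon, compute the most common and the average distance
// between consecutive occurrences. Codons seen only once get no profile.
function spreadProfiles(codons) {
  const lastSeen = new Map(); // codon -> index of its previous occurrence
  const spreads = new Map();  // codon -> list of spreads
  codons.forEach((codon, i) => {
    if (lastSeen.has(codon)) {
      if (!spreads.has(codon)) spreads.set(codon, []);
      spreads.get(codon).push(i - lastSeen.get(codon));
    }
    lastSeen.set(codon, i);
  });

  const profiles = [];
  for (const [codon, list] of spreads) {
    const tally = new Map();
    for (const s of list) tally.set(s, (tally.get(s) || 0) + 1);
    let best = 0;
    let bestCount = 0;
    for (const [s, n] of tally) {
      if (n > bestCount) { best = s; bestCount = n; }
    }
    const avg = list.reduce((a, b) => a + b, 0) / list.length;
    profiles.push({ codon, bestSpread: best, avgSpread: avg });
  }
  return profiles;
}

// Rank codons by closeness to the Dot profile (smaller score = more Dot-like).
function rankDotLike(profiles, target = DOT_PROFILE) {
  return profiles
    .map((p) => ({
      ...p,
      score: Math.abs(p.bestSpread - target.bestSpread) +
             Math.abs(p.avgSpread - target.avgSpread),
    }))
    .sort((a, b) => a.score - b.score);
}

// Toy input only, not genome data: "GGC" recurs every 5 codons.
const toyDna = "GGCACGACGACGACG".repeat(3);
console.log(rankDotLike(spreadProfiles(toCodons(toyDna)))[0].codon); // GGC in this toy input
```

A real search would also have to try the other reading frames and decide how close a codon's profile must be before it counts as Dot-like; both choices are left open here.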

Vocabulary

The following tables use the word "spread" to indicate the distance between two consecutive occurrences of the same letter, measured down the running text.

A spread of 1 means the letter is 1 unit away from the previous occurrence of the same letter, so there are NO other letters between them.

The best spread in the following reports is the most common distance between occurrences of a letter, as counted within individual Bible books.
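The definitions above can be made concrete with a small sketch. The function names spreadsOf, bestSpread and averageSpread are illustrative assumptions, not the audit tool's own code, and the sample string is a toy in which "." stands in for the Dot.

```javascript
// Sketch only: names and structure are assumptions, not the audit tool's code.

// Distances between consecutive occurrences of `letter` in `text`.
// Array.from splits by code point, so glyphs outside the basic plane
// are still treated as single letters.
function spreadsOf(text, letter) {
  const letters = Array.from(text);
  const spreads = [];
  let previous = -1; // index of the previous occurrence, -1 if none yet
  for (let i = 0; i < letters.length; i++) {
    if (letters[i] !== letter) continue;
    if (previous >= 0) spreads.push(i - previous);
    previous = i;
  }
  return spreads;
}

// Most common spread (the "best spread"); ties go to the smaller spread.
function bestSpread(spreads) {
  const tally = new Map();
  for (const s of spreads) tally.set(s, (tally.get(s) || 0) + 1);
  let best = null;
  let bestCount = 0;
  for (const [spread, count] of tally) {
    if (count > bestCount || (count === bestCount && spread < best)) {
      best = spread;
      bestCount = count;
    }
  }
  return best;
}

// Arithmetic mean of the spreads, or null when there are none.
function averageSpread(spreads) {
  if (spreads.length === 0) return null;
  return spreads.reduce((sum, s) => sum + s, 0) / spreads.length;
}

// Toy running text, not Spice Bible data.
const sample = "ab.ab.abc.ab.a";
const dotSpreads = spreadsOf(sample, ".");
// Spreads are 3, 4, 3: best spread 3, average about 3.33.
console.log(dotSpreads, bestSpread(dotSpreads), averageSpread(dotSpreads));
```

Since the reports count spreads within individual Bible books, one way to mirror that would be to call spreadsOf once per book and pool the resulting lists before taking the best and average spread. Whether the audit tool does exactly this is not stated here.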

Letter Counts Alphabetically

This form is useful for finding the count for a specific letter.

Letter Frequency
Sorted by letter
Letter  Count  Best Spread  Avg Spread
437,522 5 5.49
254,169 3 9.45
96,400 4 24.90
16,735 2 142.53
113,070 3 21.23
90,608 5 26.48
203,511 4 11.80
12,534 2 189.84
40,227 6 59.61
17,830 6 133.66
166,833 5 14.39
60,111 6 39.89
141,232 4 17.00
129,300 6 18.57
158,982 3 15.10
25,430 7 94.14
58,857 5 40.78
28,579 5 83.79
7,638 6 309.57
33,148 6 72.16
102,974 7 23.30
49,231 5 48.75
98,565 7 24.34
57,180 7 42.01
1,191 1828 2031.94

Letter Counts Most Frequent

This form is useful for seeing which letters are most heavily used.

Letter Frequency
Sorted by count
Letter  Count  Best Spread  Avg Spread
437,522 5 5.49
254,169 3 9.45
203,511 4 11.80
166,833 5 14.39
158,982 3 15.10
141,232 4 17.00
129,300 6 18.57
113,070 3 21.23
102,974 7 23.30
98,565 7 24.34
96,400 4 24.90
90,608 5 26.48
60,111 6 39.89
58,857 5 40.78
57,180 7 42.01
49,231 5 48.75
40,227 6 59.61
33,148 6 72.16
28,579 5 83.79
25,430 7 94.14
17,830 6 133.66
16,735 2 142.53
12,534 2 189.84
7,638 6 309.57
1,191 1828 2031.94

Discussion

The most common letter, by far, is the Dot. This is a punctuation mark, and it occurs between words in running text.

A spread of 5 indicates that there are 4 letters between Dots in the most common case. An average spread of 5.49 means that, on average, there are about 4.5 letters in Paleo vocabulary words as used in the Spice Bible.
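The arithmetic behind these two figures is just the spread minus one, because the spread counts the step onto the closing Dot. A minimal sketch, using the Dot's values from the tables (the function name lettersPerWord is illustrative):

```javascript
// The spread between two Dots includes the closing Dot itself,
// so the word between them holds spread - 1 letters.
function lettersPerWord(dotSpread) {
  return dotSpread - 1;
}

const dotBestSpread = 5;   // "Best Spread" for the Dot in the tables
const dotAvgSpread = 5.49; // "Avg Spread" for the Dot in the tables

console.log(lettersPerWord(dotBestSpread));           // 4
console.log(lettersPerWord(dotAvgSpread).toFixed(2)); // 4.49, about 4.5
```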

The text passed to us by history has no single-letter words, but the grammar would allow for them. What appear as prefixes in the Spice Bible may really have been single-letter words. If so, then text in the genome may have Dot spreads even lower than those seen above.

Only about 1/5th of the text is thought to be inspired. The other 4/5ths was written later by men, and this may skew the results in ways we cannot yet understand.

So, for example, excessive use of the letter Wa might be skewing the results in these tables.

Notice that for the other letters the most common spread is nowhere near the average spread. This divergence comes from the structure of language itself. If the two were similar, the language would be without meaning: white noise, if you will.
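One way to see this divergence is to divide the average spread by the most common spread for each row of the count-sorted table; a ratio near 1 would mean the two coincide. The sketch below copies the table's numbers as they appear on this page. Rows are referred to by position because only the first row (the Dot) is identified in the text, and the name spreadRatio is illustrative.

```javascript
// Rows copied from the "Sorted by count" table: [count, bestSpread, avgSpread].
// Only row 1 is identified in the text (the Dot); the rest are left unnamed.
const rows = [
  [437522, 5, 5.49],
  [254169, 3, 9.45],
  [203511, 4, 11.80],
  [166833, 5, 14.39],
  [158982, 3, 15.10],
  [141232, 4, 17.00],
  [129300, 6, 18.57],
  [113070, 3, 21.23],
  [102974, 7, 23.30],
  [98565, 7, 24.34],
  [96400, 4, 24.90],
  [90608, 5, 26.48],
  [60111, 6, 39.89],
  [58857, 5, 40.78],
  [57180, 7, 42.01],
  [49231, 5, 48.75],
  [40227, 6, 59.61],
  [33148, 6, 72.16],
  [28579, 5, 83.79],
  [25430, 7, 94.14],
  [17830, 6, 133.66],
  [16735, 2, 142.53],
  [12534, 2, 189.84],
  [7638, 6, 309.57],
  [1191, 1828, 2031.94],
];

// Average spread divided by most common spread; 1.0 would mean they coincide.
function spreadRatio([, best, avg]) {
  return avg / best;
}

rows.forEach((row, i) => {
  console.log(`row ${i + 1}: count=${row[0]} ratio=${spreadRatio(row).toFixed(2)}`);
});
```

In this data the ratio stays near 1 only for the first row (the Dot) and for the last row, which has just 1,191 occurrences. For every row in between, the average spread is nearly three times the most common spread or more.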