6

Is there a location I can download character distributions for frequency analysis used in decryption attempt validation? I am specifically interested in ASCII value [32, 126] frequency distribution for plain English language text. This would imply case-sensitivity and include punctuation. I'm not concerned with data formats.

Patrick Hoefler
recursion.ninja

2 Answers

6

Second try: Google Ngram Viewer provides raw counts of 1-, 2-, ...-grams of text, retrieved from its book scanning endeavor. The 1-grams section contains counts of the occurrence of letters, numbers and even punctuation. They are provided as tab-separated value files, so the frequencies should be derivable with modest scripting effort.
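As a rough illustration of that scripting step, here is a minimal Python sketch. It assumes each row has the layout `ngram<TAB>year<TAB>match_count<TAB>...`; that column order is an assumption on my part and should be checked against the files you actually download. The function name `char_frequencies` and the sample rows are made up for the example.

```python
from collections import Counter


def char_frequencies(lines, lo=32, hi=126):
    """Return {char: relative frequency} for characters with codes in [lo, hi].

    Assumed row layout (verify against the real files):
    ngram <TAB> year <TAB> match_count <TAB> further columns...
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip rows that do not match the assumed layout
        ngram, match_count = fields[0], fields[2]
        try:
            weight = int(match_count)
        except ValueError:
            continue  # skip rows whose count column is not an integer
        # Every character of the token occurs `weight` times in the corpus.
        for ch in ngram:
            if lo <= ord(ch) <= hi:
                counts[ch] += weight
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: n / total for ch, n in counts.items()}


# Hypothetical sample rows; for a real file, pass an open file object instead.
sample = [
    "The\t1990\t3\t2\t1\n",
    "the\t1990\t7\t5\t3\n",
    "end.\t1991\t2\t1\t1\n",
]
freqs = char_frequencies(sample)
for ch, f in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(repr(ch), round(f, 4))
```

Because a 1-gram is a single token, the space character (ASCII 32) will most likely be absent from the result and would have to be estimated separately, for example as roughly one space per token.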

Found via Wikipedia article Text corpus.

ojdo
1

What about the Letter frequency article on Wikipedia, which is the first link in the frequency analysis article? It lists letter distributions for English and other languages, all properly sourced:

Letter   Relfreq
----------------
e        12.702%    
t         9.056%    
a         8.167%    
o         7.507%    
i         6.966%    
n         6.749%    
s         6.327%    
...
ojdo
  • I was looking for all ASCII values [0,127] or at least printable ASCII values [32,126]. – recursion.ninja Sep 26 '13 at 13:52
  • @awashburn Why can't you manually convert them? – Kermit Sep 26 '13 at 18:31
  • @FreshPrinceOfSO These tables only deal with a case-insensitive alphabet, while the question asks for ASCII frequencies, so at least lower-/uppercase letters, numbers and punctuation should be included. This is what I tried to address in my second answer. – ojdo Sep 26 '13 at 19:51