08 January 2013

ETAOIN SHRDLU is now ETAOIN SRHLDCU

Many are familiar with ETAOIN SHRDLU, the nonsense string that used to appear in print because of the keyboard layout of early-20th-century Linotype typesetting machines and now serves as shorthand for the most frequently used letters in English.

Now Google’s director of research Peter Norvig has used the vast data from the Google Books corpus – over 743 billion words – to produce updated word- and letter-frequency tables.
As a Scrabble player, I find it interesting that the letter "H" is relatively overvalued, and the "B" undervalued.
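
For anyone who wants to try this on a text of their own, here is a minimal sketch of how a letter ranking like ETAOIN SRHLDCU can be derived. It is not Norvig's code or method; the function name `letter_order` and the sample sentence are my own illustrative assumptions, and a real analysis would need a corpus vastly larger than one sentence.

```python
from collections import Counter

def letter_order(text):
    """Return the letters A-Z found in `text`, most frequent first.

    Hypothetical helper for illustration only; not taken from Norvig's work.
    """
    # Count only the 26 unaccented letters, ignoring case, digits and punctuation.
    counts = Counter(ch for ch in text.upper() if "A" <= ch <= "Z")
    # Sort by descending count; break ties alphabetically so the result is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return "".join(letter for letter, _ in ranked)

sample = "the quick brown fox jumps over the lazy dog"
print(letter_order(sample))
```

A sample this small will not reproduce the ETAOIN ordering; the ranking only settles down with a large body of text.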

Image and text from Sentence First, citing Norvig's work, which includes a wealth of data deserving of a separate post (later)(sigh).

2 comments:

  1. From this observation, it's a good place to start if you want to Huffman encode text.

  2. I'd be curious if they've done analyses on changes in frequency over time (of publication)
