The team suggests that a work by an unknown author could therefore be compared to prior works, with the curve acting as a linguistic "fingerprint". For an interesting comparison piece, see the post I wrote about the aging-related changes in the vocabulary of Agatha Christie.
"It doesn't matter if I pull out 10,000 words from a book of 100,000 or from a book of 200,000, I get the same behaviour; you always simply pull a piece out of your very, very big 'meta book', which is just a representation of your style," said Sebastian Bernhardsson, who led the work.
And on a tangentially related matter, one year ago I tested the "readability level" of this blog, and the results suggested the readership would be quite well educated. That particular test is no longer available, so I tried a different one this morning and got the results below. The Gunning fog index of 14 is a "rough measure of how many years of schooling it would take someone to understand the content" of the blog. The test apparently just sampled the front page (last 25 posts) of TYWKIWDBI, so the number would change from time to time (and I rather suspect it also samples the sidebar, which would greatly skew the results downward).
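For the curious, here is a rough Python sketch of how a Gunning fog score can be computed. It uses the commonly given formula, 0.4 × (average words per sentence + percentage of words with three or more syllables). I don't know how the online test actually does it; the function names and the vowel-group syllable counter are my own assumptions, and the sketch ignores the usual refinements (such as excluding proper nouns and not counting common suffixes as syllables), so its numbers will not match the test's exactly.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not a dictionary lookup):
    count runs of consecutive vowels as syllables."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing silent 'e' as non-syllabic ("make"), but keep "-le" ("table").
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def gunning_fog(text: str) -> float:
    """Approximate Gunning fog index of a piece of English text."""
    # Sentences are split naively on ., ! and ? (abbreviations will fool this).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    # "Complex" words are approximated as those with three or more syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]

    avg_sentence_length = len(words) / len(sentences)
    percent_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_length + percent_complex)
```

Calling `gunning_fog(page_text)` on the text of a page returns a number to read as approximate years of schooling. Short sentences made of short words score low, which is why a sidebar full of one- and two-word links could pull a blog's result down.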
Got a blog? Test your blog's readability here.