Google Ngram Games

Google have just opened up their text-mining project, a vast and ambitious undertaking that allows anyone to search their digital library for the frequency of words and phrases. It’s an astonishing resource, not only for its research potential but also for its ludic possibilities, not to mention its time-frittering capabilities.

It’s easy to play. Just go to http://ngrams.googlelabs.com/, put in some words or phrases, separating multiples with commas, adjust the settings as you wish, and press the button. Up comes a graph showing distribution across your chosen time period. Thrill to the peaks! Gasp at the troughs! Wonder at the abundances! Curse the absences!
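For repeat players, a query can also be written down as a link rather than typed into the form each time. The sketch below is only that, a sketch: the `/graph` path and the parameter names (`content`, `year_start`, `year_end`, `corpus`, `smoothing`) are assumptions about how the viewer reads its query string, and `ngram_url` is a made-up helper name.

```python
from urllib.parse import urlencode

# Assumption: the viewer reads its query from a /graph path on the same host.
BASE_URL = "http://ngrams.googlelabs.com/graph"


def ngram_url(phrases, year_start=1800, year_end=2000, corpus=0, smoothing=3):
    """Build a viewer link for several phrases (parameter names are assumed)."""
    query = urlencode({
        "content": ",".join(phrases),  # multiples are separated with commas
        "year_start": year_start,
        "year_end": year_end,
        "corpus": corpus,              # assumed: numeric id of the corpus
        "smoothing": smoothing,        # assumed: years of smoothing applied
    })
    return f"{BASE_URL}?{query}"


print(ngram_url(["Paris", "London", "Berlin", "New York"], 1800, 1900))
```

If the link does not behave as hoped, the form on the page remains the authority.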

As with all good games, there’s much wisdom to be gained from playing. For one thing, it tests the sources, both the original works and their translation into digital format. (Some of Google’s metadata is bizarrely inaccurate.) It makes one think about possible reasons: whatever the results of a query are, they never explain why. It questions the technologies of language, whether printed or digital, including orthography, typeface and grammar. (Note that it only covers written language, not language spoken or sung, with its rhythm, accents and inflections.) In all, it indicates the ambiguity and instability of language, as against any needs or claims of clarity and transparency of meaning.

Below are four jests: testing a grandiose claim, a revealing anachronism, typographical obscenity and the deleterious effects of popular culture on the Queen’s English.

Benjamin was right!

By searching on four major Western cities, we can see that Walter Benjamin was right to consider Paris ‘the capital of the nineteenth century’.

Google Ngram for Paris, London, Berlin, New York, 1800-1900

And never mind population, trade, dominions and suchlike. But note that had I included ‘Rome’, Benjamin would have been refuted by the number of works on ancient history.

Surrealism: A Victorian Creation?

Although the word ‘surrealism’ was originally coined by Apollinaire in 1917, and given substance in the 1920s by André Breton, Google finds it in mid-Victorian English:

Google Ngram for 'surrealism.'

There’s an interesting error behind this. In parsing The London Review, volume 15, 1860-1, Google’s OCR software has run together the last, hyphenated word of one page and the first word of the next, which belongs to the chapter heading rather than to the continuation of the text proper. Such bugs, the ‘revealing errors’ of logic, can be considered a manifestation of the surrealist spirit.
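The mechanism is easy to reproduce in miniature. The following is a guess at the kind of naive dehyphenation that would produce such a word, not a description of Google’s actual software; the function name and both page fragments are invented for illustration and are not the text of The London Review.

```python
def join_across_page_break(page_end, next_page_start):
    """Glue a trailing hyphenated fragment to whatever opens the next page."""
    head = page_end.rstrip()
    tail = next_page_start.lstrip()
    if head.endswith("-"):
        # Hyphen dropped and fragments fused, with no check that the next
        # page actually continues the sentence.
        return head[:-1] + tail
    return head + " " + tail


# Invented fragments: a page ending mid-word, then a chapter heading.
page_end = "a mere sur-"
next_page_start = "REALISM IN MODERN ART"

joined = join_across_page_break(page_end, next_page_start)
print(joined.lower().split()[2])
```

A more careful joiner would first ask whether the next page opens with a heading, which is exactly the question the error reveals nobody asked.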

FFS!

Early modern printing is notorious for its long ‘s’ (‘ſ’), which looks very like, and is readily misread as, an ‘f.’ Oh the comedic potential, as is borne out by this graph:

Google Ngram for 'fuck.'

The alternative explanation is, of course, that we were a foulmouthed bunch until the Victorians, and only since the 1950s have we begun to throw off those moral shackles.

Star Trek and the split infinitive

That most controversial of grammatical issues had a historical turn in the 1980s, when “to boldly go” overtook “to go boldly.” A way of interrogating the corpus for the frequency of split infinitives isn’t obvious, but the results would be very interesting.

Google Ngram: 'to go boldly' and 'to boldly go.'

See also: Wikipedia on split infinitives.
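As for interrogating the corpus more generally, one crude approach would be to work from the downloadable 3-gram files rather than the viewer. The sketch below assumes each line is tab-separated with the phrase, the year and the match count as its first three fields; that layout, the function name, the ‘-ly’ test for adverbs and the small exclusion list are all assumptions made for illustration. It will miss adverbs such as ‘never’ and does not check that the third word is a verb.

```python
from collections import Counter

# Words ending in '-ly' that are not adverbs; a deliberately incomplete list.
NOT_ADVERBS = {"rely", "apply", "reply", "supply", "comply", "imply", "fly"}


def split_infinitive_counts(lines):
    """Sum match counts per year for 3-grams shaped like 'to <adverb> <word>'.

    Assumes tab-separated fields: ngram, year, match_count, then anything else.
    """
    totals = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed lines
        words = fields[0].lower().split(" ")
        if len(words) != 3 or words[0] != "to":
            continue
        adverb = words[1]
        if not adverb.endswith("ly") or adverb in NOT_ADVERBS:
            continue
        totals[int(fields[1])] += int(fields[2])
    return totals


# Invented sample rows, for illustration only (not real corpus figures).
sample = [
    "to boldly go\t1985\t12\t10\t8",
    "to go boldly\t1985\t9\t9\t7",
    "to really understand\t1985\t30\t28\t20",
    "to rely on\t1985\t500\t400\t300",
]
print(split_infinitive_counts(sample))
```

Dividing each year’s total by the count of all ‘to …’ 3-grams for that year would give a rate rather than a raw tally, which is what a fair comparison across decades would need.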

Google have provided some basic, but literate, documentation. And the datasets are freely available under a Creative Commons license. A Guardian article serves as a decent introduction, although it exaggerates the originality of the techniques. See also the New York Times. And don’t ever, ever, use the pseudo-word ‘culturomics.’
