Public unveiling of Google’s Ngram Viewer
Take 15 million books and 4 billion words, use a simple interface to search for a few words or phrases, and presto – you have Google’s Ngram Viewer.
The Ngram Viewer creates charts that show how often words or phrases have occurred in books since 1800. Researchers can use it to trace the rise and decline of particular words, which can offer useful clues. Or it can be used just for fun: try comparing "Red Sox" and "Yankees."
Jon Orwant, leader of Google’s Digital Humanities effort and one of the three co-creators of the Ngram Viewer, spoke to about 75 people at a meeting of the Boston chapter of Hacks/Hackers at Google in Cambridge on Feb. 9.
Orwant, a former publisher, described the Google Books project and how the Ngram Viewer can be used. The Books project has now scanned about 15 million books, or more than 10 percent of the estimated 129 million books “printed since Gutenberg,” he said.
(To try out: http://ngrams.googlelabs.com/)
Users can see how often words or phrases (up to 5 words) have appeared in print since 1800.
The Viewer can be a lot of fun, but it is also a scholarly tool. Still, Orwant, who grew up in Fitchburg, where his father was a reporter for the local paper, cautioned that he sees it more as a tool for discovering which questions need to be asked, “not an oracle.”
For instance, try “nursery school,” “kindergarten,” and “child care.” “Kindergarten” is in heavy use from the 1860s on, peaking in the 1920s and 1930s, followed by a gradual decline. “Nursery school” is big in the 1940s but has been in long decline, too.
However, use of the term “child care” has exploded since the 1970s, far eclipsing use of the other terms. For a researcher interested in education or child rearing, the chart raises interesting questions.
The other two developers of the Ngram Viewer are William Brockman (also in attendance) and Matthew Gray.
Orwant is an engineering manager at Google, where he works on Book Search, Patent Search, visualizations, and the digital humanities. He’s the author or co-author of several books on programming, including the bestselling Programming Perl, and once published an independent computer magazine. Before joining Google he was the CTO of O’Reilly & Associates and Director of Research for France Telecom. He received his doctorate from MIT’s Electronic Publishing Group in 1999.