A Seattle connection to the massive Google Books project
Peter Leonard, a doctoral student in Scandinavian studies at the University of Washington, and UCLA professor Tim Tangherlini have received $45,000 from Google to create tools for large-scale literary analysis through Google Books, an ongoing project by Google to scan most of the books on the world's library shelves and place them in an accessible online archive.
Seattle Times book editor
Lit life
Like many people who grew up in the pre-Internet world, I regard Google with a combination of fear and awe. How does it do what it does? What happens to the information it collects? And what is the point, really, of that I'M FEELING LUCKY button?
So I was heartened to hear that Google is attending to something I can understand and appreciate — advancing the cause of literary research.
Peter Leonard is a doctoral student in Scandinavian studies at the University of Washington. He's bookish but is equally at home in the computer world (he has been the Webmaster at the UW's Simpson Center for the Humanities). He and a partner, UCLA professor Tim Tangherlini, have just received $45,000 from Google to create tools for large-scale literary analysis through Google Books, part of nearly $1 million Google has committed to support digital humanities research over the next two years.
Their subject will be 160,000 Swedish, Danish and Norwegian texts that are part of the 12-million-volume Google Books collection, an assemblage the blog Tech.Blorge called "a grand world library, a Library of Alexandria on Steroids."
I talked to Leonard last week about how he plans to use the immense online library known as Google Books.
A word about Google Books: It's an ongoing project by Google to scan most of the books on the world's library shelves and place them in an accessible online archive. As Leonard explains, "Google has partnered with many of the largest research libraries in the world" (Oxford, Stanford, Columbia University, Harvard, the New York Public Library, among others) to digitally scan the books in their collections (generally out-of-print books and those without copyright restrictions).
The project is controversial: there are ongoing battles over copyright protection and author rights. There are also questions about how the scanning is done. "There's an enormous amount of secrecy around how Google scans books," Leonard said. "Even the libraries are not allowed to see the scanners; it is clear that it's nondestructive scanning."
But there's no denying that it's an amazing research tool. You can go to Google, hit "more" and then click "books" in the drop-down menu. You can type in the most obscure word imaginable and get hundreds of references: I used dvergr, a precursor to the word "dwarf," and 4,450 citations popped up from books such as "An Analytic Dictionary of English Etymology" and "Elves in Anglo-Saxon England," all with the word "dvergr" conveniently highlighted in yellow. (If you love words, this is a bigger time-sucker than Facebook!)
Leonard, 35, a Berkeley, Calif., native, took to Scandinavian languages when he was an undergraduate at the University of Chicago. Up to now, he has done his research on recent Swedish fiction the old-fashioned way: studying a few books intensively. Now he and his partner propose to move from microanalysis to macroanalysis, sifting through thousands of books for clues to how people of a certain time and place thought and lived.
"We might ask: What kinds of adjectives were used near female characters in 19th-century novels?" says Leonard. "What words were used to describe nature? You might be able to find interesting things about how people talked about the city, or the country. You can do this only if you have computers that can count the words and do mathematical calculations." Once the relevant books are identified, they can be read intensively for more clues.
A test project for Leonard and Tangherlini: analyzing books to show how folklore spread through 19th-century Scandinavian literature, a subject in which Tangherlini already has expertise. While Scandinavian-language books are a small fraction of the Google Books corpus, the two hope to develop strategies that will be "germane and applicable to someone who's studying Italian literature, or the American literature of the South."
"It's interesting what an equalizer technology is," says Leonard. Two experts in languages spoken by a fraction of the world's people can suddenly access most of the world's books in those languages. For free. Who knows where the scary magic of search engines will lead next?
Mary Ann Gwinn: 206-464-2357 or email@example.com. Mary Ann Gwinn appears on Classical KING-FM's Arts Channel.