Jobs with real authority: working on Microsoft's spell-checker
Seattle Times technology reporter
In the last two years, few names have become as recognizable as Barack Obama's, a rise that continues tonight as he accepts the Democratic Party's presidential nomination in Denver.
But until spring 2007, "Obama" was unknown to Microsoft's spell-checker. The suggested correction was "Osama," a name that differs by a single letter, but carries lots of baggage.
Even though Osama is a common name around the world, it is inextricably linked to terrorism and Osama bin Laden. And the association in the spell-checker was fodder for the ongoing, false rumors surrounding the candidate's religion. (He is Christian.)
This example highlights the challenge Mike Calcagno and his team face in keeping up with the evolution of language.
The Microsoft Natural Language Group's aim is to build tools that help people improve their writing and avoid embarrassing mistakes. The job is increasingly complicated as more writing is done electronically and people blindly trust the judgment of the spell-checker.
"The speller is looked at sometimes as an arbiter of language," he said. "Once a word is in the Microsoft spell-checker, there's a notion out there that that word is now official and the word is now important. We don't take that stance internally, but that attitude exists in the world, and we have to take it into account."
Spell-checkers use logic to suggest corrections. In the case of Obama-Osama, the "edit distance" that separates the words is only one letter. Without the political context, Osama would be a reasonable suggestion for Obama.
"There's no amount of logic that we would ever build into the speller that would suggest that we wouldn't do that," Calcagno said.
Microsoft added "Obama" to the spell-checkers in both Office 2003 and 2007 in spring of 2007, and any word can be added to an individual's custom dictionary. But the Obama-Osama suggestion persists as people encounter it on computers that have not been updated.
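To make the "edit distance" idea concrete, here is a minimal sketch of the standard Levenshtein distance, which counts the single-letter insertions, deletions and substitutions needed to turn one word into another. It is an illustration only, not Microsoft's implementation: the function names `edit_distance` and `suggest`, the three-word dictionary and the `max_distance` cutoff are all assumptions made for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-letter edits to turn a into b."""
    # prev[j] holds the distance between the first i-1 letters of a
    # and the first j letters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning i letters of a into an empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a letter from a
                curr[j - 1] + 1,     # insert a letter from b
                prev[j - 1] + cost,  # substitute (or keep a matching letter)
            ))
        prev = curr
    return prev[-1]


def suggest(word: str, dictionary: list[str], max_distance: int = 1) -> list[str]:
    """Return dictionary words within max_distance edits of word (hypothetical helper)."""
    return sorted(
        w for w in dictionary
        if edit_distance(word.lower(), w.lower()) <= max_distance
    )


# Hypothetical dictionary that does not yet contain "Obama".
words = ["Osama", "Alabama", "Oklahoma"]
print(edit_distance("Obama", "Osama"))
print(suggest("Obama", words))
```

A checker built only on this measure has no notion of political context: any dictionary word one substitution away is a candidate, which is why adding the missing name to the dictionary is the practical fix.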
The Cupertino effect
Spell-checkers have come a long way, correcting countless misspellings. But the number of errors they've introduced is also substantial, and well-documented.
The phenomenon is known as the Cupertino effect. Ben Zimmer, executive producer of the Visual Thesaurus, has written extensively about it on the University of Pennsylvania's online Language Log.
Writers and translators working for the European Union came up with the name after they discovered that older spell-checkers did not recognize the correctly spelled word "cooperation," without a hyphen, Zimmer said.
The suggested correction, which made it into several documents that can still be found online, was Cupertino, the town in California. That particular problem has long since been corrected, but the phenomenon persists.
"Most of the time the Cupertino effect happens because of proper names, which are really difficult for a dictionary to handle," Zimmer said. "[Microsoft] could have made the decision early on that we're just not going to include any proper names and that might have spared them a lot of grief."
The squiggly's power
Calcagno has other reasons why the spell-checker should not be looked to as the authority on language. For one thing, correctly spelled words are often left out on purpose.
Take "calender," for example. From the Merriam-Webster Online Dictionary, calender, with an -er, means "to press (as cloth, rubber, or paper) between rollers or plates in order to smooth and glaze or to thin into sheets."
"You can find it in any dictionary," Calcagno said.
But, the team asked itself, should "calender" be flagged, or squiggled — have the red squiggly underline that indicates a misspelling? Yes, because letting it go through as correct "more often masks the really common spelling error that people make for calendar."
"We basically ask that question across dozens of languages on a massive scale," Calcagno said. "There are thousands and thousands of words that aren't yet in our speller, which are infrequently used."
So what impact has the spell-checker had on the evolution of language?
Jerrold Zar, a biologist and statistician at Northern Illinois University, wrote a poem in 1992, "Candidate for a Pullet Surprise," to highlight how easily homonyms slip through the spell-checker's net. He sees the proliferation of word-processing, e-mailing and spell-checking accelerating changes in language, and not always for the better.
"Electronic writing has a tendency to value speed over accuracy, consistency, and clarity," he wrote in an e-mail.
Zimmer would agree.
"We just have to be vigilant ... and continue to cast the human eye over it and make sure we're not being too trusting of spell-checkers," he said.
Benjamin J. Romano: 206-464-2149 or firstname.lastname@example.org
Information in this article, originally published Aug. 28, 2008, was corrected Aug. 28, 2008. A previous version of this story misspelled the name of dictionary publisher Merriam-Webster.
Copyright © 2008 The Seattle Times Company