The Seattle Times Company

Originally published August 28, 2008 at 12:00 AM | Page modified August 28, 2008 at 1:23 AM

How Microsoft's spell-check gatekeepers select words to add

Microsoft's Natural Language Group is in an ongoing race to keep up with the evolution of the dozens of languages for which it produces spell-checkers and other writing tools.

Here's how the group selects words to add:

The first step is finding possible candidates for inclusion in the spell-checker lexicon. When Mike Calcagno started at Microsoft in 1998, that was done ad hoc, with candidate words or changes sent to someone high enough on the corporate ladder to get attention.

"The number of issues that we would see at that time was so small that we could keep track of it on a single Excel spreadsheet," he said.

Now, the company uses software to monitor actual language usage across its vast properties.

"When you add a word to your custom dictionary, either in Word itself or in Hotmail, that word comes to us," Calcagno said. When a word is added hundreds of times, it becomes part of the candidate list. Words still come in on an ad hoc basis, too.

The lists are filtered with software to eliminate words the team has already considered.

Then the words are sorted by frequency and sent to outside editors who evaluate each one against a set of guidelines Microsoft has created, such as whether a new word has appeared in a major dictionary.
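The candidate-gathering steps described so far (count custom-dictionary additions, keep words added often enough, drop words already reviewed, sort by frequency) can be sketched in a few lines. This is a hypothetical illustration, not Microsoft's actual tooling: the function name `build_candidate_list`, the data shapes, and the cutoff of 200 are assumptions, since the article says only that a word must be added "hundreds of times."

```python
from collections import Counter

# Hypothetical cutoff; the article only says "hundreds of times".
CANDIDATE_THRESHOLD = 200


def build_candidate_list(custom_dictionary_adds, already_considered,
                         threshold=CANDIDATE_THRESHOLD):
    """Return (word, count) candidates, most frequently added first.

    custom_dictionary_adds: iterable of words users added to their
        custom dictionaries (hypothetical input format).
    already_considered: set of words the team has reviewed before.
    """
    # Count how many times each word was added by users.
    counts = Counter(custom_dictionary_adds)

    # Keep words above the threshold that have not been considered yet.
    candidates = [
        (word, n) for word, n in counts.items()
        if n >= threshold and word not in already_considered
    ]

    # Sort by frequency, highest first; break ties alphabetically.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates


# Made-up example data: "wiki" has already been reviewed, "typo" is too rare.
adds = ["blog"] * 300 + ["podcast"] * 250 + ["typo"] * 5 + ["wiki"] * 400
print(build_candidate_list(adds, already_considered={"wiki"}))
# [('blog', 300), ('podcast', 250)]
```

The sorted list stands in for what would go to the outside editors; their judgment against Microsoft's guidelines is a human step and is not modeled here.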

Occasionally, editors can't decide whether a word should be added, and it is sent back to the Natural Language Group for debate. The group's roughly 50 software engineers, computational linguists, machine-learning experts and other specialists come from around the world.

With occasional exceptions, the words to be added, often tens of thousands of new ones, are shipped to users in the next release of Office, which is used by hundreds of millions of people around the world.

"Everybody's speller gets updated and few people notice," he said.

— Benjamin J. Romano

Copyright © 2008 The Seattle Times Company