Spell Check Dictionary Improvements

Wednesday, February 11, 2009

If you're anything like us, you're spending more and more of your time working online. The spellchecker built into Chromium can be a big help in keeping your blog, email, documents, and forum postings spelled correctly and easy to read. Chromium integrates the popular open source library Hunspell with WebKit's built-in spellchecking infrastructure to check words and to provide suggestions in 27 different languages.

The Hunspell dictionary maintainers have done a great job creating high-quality dictionaries that anybody can use, but one of the problems with any dictionary is that there are inevitably omissions, especially as new words appear or proper nouns come into common use. We at Google are in a good position to use our knowledge of the internet to identify and fix some of these omissions. The Google translation team used their language models to generate a list of the most popular words in each language, sorted by popularity. Each list was cross-checked against the corresponding Hunspell dictionary to produce the top 1000 words not present in that dictionary. The resulting lists include many popular words, but also common misspellings. To remove the misspellings, each list was reviewed by a specialist in that language. Generally, we tried to keep proper nouns and even foreign words as long as they were in common usage.
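The cross-check described above can be sketched roughly as follows. This is only an illustration, not the tooling the team actually used: the function name `top_missing_words` is hypothetical, the dictionary is assumed to be a plain set of already-expanded word forms (real Hunspell dictionaries pair a word list with affix rules), and the case-insensitive comparison is a simplification.

```python
from typing import Iterable, List, Set


def top_missing_words(ranked_words: Iterable[str],
                      dictionary_words: Set[str],
                      limit: int = 1000) -> List[str]:
    """Return up to `limit` of the most popular words absent from the dictionary.

    `ranked_words` is assumed to be sorted from most to least popular.
    `dictionary_words` is assumed to hold fully expanded word forms.
    """
    # Assumption: compare case-insensitively, which is simpler than
    # Hunspell's real capitalization handling.
    known = {w.lower() for w in dictionary_words}
    missing: List[str] = []
    seen: Set[str] = set()
    for word in ranked_words:
        if len(missing) >= limit:
            break
        key = word.lower()
        if key in known or key in seen:
            continue  # already in the dictionary, or already collected
        seen.add(key)
        missing.append(word)
    return missing


if __name__ == "__main__":
    ranked = ["the", "webcam", "teh", "Obama", "and", "anime"]
    dictionary = {"the", "and"}
    print(top_missing_words(ranked, dictionary, limit=3))
    # prints ['webcam', 'teh', 'Obama']
```

Note that the misspelling "teh" survives the automatic step, which is why a review by a language specialist is still needed afterwards.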

We hope that by using the existing GPL/LGPL/MPL tri-license for our additions, our work can be picked up by other users of Hunspell. We also hope to make more improvements in the future, both by covering additional languages like Turkish and by refining the word lists we already have. If you're passionate about your language, you can help out by writing affix rules for the added words or reviewing more word lists.

The recent dev-channel release of Google Chrome (2.0.160.0) has the additional words we generated for 19 of the languages. Hopefully, you'll see fewer common words marked as misspelled. For example, the English dictionary now includes "antivirus," "anime," "screensaver," and "webcam," and commonly used names such as "BibTeX," "Mozilla," "Obama," and "Wikipedia." For our scientific users, we even have "gastroenterology," "oligonucleotide," and "Saccharomyces"! We'd like to give special thanks to the great help we got from the translation team who generated the words and the language search specialists who reviewed the lists.

28 comments:

Adam said...

Hooray for BibTex!

yukuku said...

I hope the Indonesian dictionary is edited professionally. It has a lot of false-alerts on conjugated words, although root words are mostly okay.

Stefan said...

As a horrible speller, I lean heavily on browser spell checks. I love Chrome, but I often miss FF's spell check. I keep hoping you will somehow integrate "did you mean" into the spell check, as "did you mean" is the best spell check in the world.

Christin said...

Hooray! It had been bothering me for months that Schwarzenegger was in the Chrome dictionary, but not Obama. Good call!

Olaf Lederer said...

I have to write email, posts and comments in three different languages. In FF it is possible to switch between the dictionaries (right mouse click...). Since this feature is not available in Chrome, I use that browser only for Gmail and Analytics :(

VicMatson said...

I hope the spell check gets as good as the one in Google Toolbar someday.

sidchat said...

@Olaf Lederer

The recent developer version of Chrome (2.0.162.0) now has the ability to change spell check dictionary language by right clicking on the text field.

You can switch your current Chrome build to the developer channel by following instructions in
http://dev.chromium.org/getting-involved/dev-channel

المعلم حمادة said...

I've lived in America without a cent to my name since 2000. I don't do conscious barter, either, because conscious barter is just money in bulkier form.
a website I just got finished:
http://ebdaa.yoo7.com

barnabasnagy said...

Hey, thanks for the post! I was googling this because I wrote on the same topic (http://is.gd/jXH5) and then I found you. Keep up the good work!

Dicollecte said...

Hi,
Some dictionaries seem to be very old ones.
For example, the French dictionary is from 2002. This one contains a lot of mistakes. And it is an old myspell affixes structure.
A lot of improvements have been made since 2002.

You should have a look here:
http://wiki.services.openoffice.org/wiki/Dictionaries

The French dictionaries:
http://dicollecte.free.fr/download.php?prj=fr

Brett Wilson said...

Dicollecte:

Thanks for the report, I filed this as a bug:
http://code.google.com/p/chromium/issues/detail?id=7966

Mark said...

Can we use Google Toolbar in Google Chrome? Does the spell check offer fix-it pop-ups to help spell the word correctly? When I am in a hurry, I don't have time to respell the word. Please help me.

Pavel said...

The biggest flaw of Chrome's spellchecker is the inability to change language on the fly. For all of us who use more than one language on a daily basis, the inability to switch languages makes this feature almost worthless.

sidchat said...

@Pavel

The developer version of Chrome has the ability to change spell check language on the fly.

You can switch your current Chrome build to the developer channel by following instructions in
http://dev.chromium.org/getting-involved/dev-channel

Tom said...

Hunspell is available for the .NET Framework too. NHunspell is a .NET version of Hunspell built with managed C++, so you can use the Hunspell dictionaries in your own .NET applications if you like. NHunspell is a free, LGPL-licensed spell checker.

JustLocal said...

Hi,

I'm the creator and maintainer of the Australian English spellcheck dictionary files which can be found at www.dictionary.JustLocal.com.au.

It would be greatly appreciated if you could add Australian English as an option so I don't have to fudge the system anymore. Then I and others could just copy the required dictionary files into the appropriate folders.

Thanks in advance.

Kelvin Eldridge

sidchat said...

@JustLocal

Thank you very much for the information. A bug has been filed on this:

http://code.google.com/p/chromium/issues/detail?id=8934

Steve said...

Spell check in Hotmail does not work when using Google Chrome. The Spell checker in Chrome is turned on. Spell check works when using Gmail and Yahoo Mail but does not in Hotmail.

Fish & Chips said...

Hi everyone,
Congratulations on Chrome.

The Portuguese language (Brazil) was changed last month. Will those new changes be implemented?

Best regards,
Rafael Peixe

John said...

Can we suggest words to add? How?

Nick Demou said...

Could Google publish the list of words that Google apps (Gmail etc.) mark as spelling errors but users choose to ignore? If so, that would be great!

(I see how you help a lot when you can so I thought I could drop a thought)

Jason said...

Not sure if it's a KDE issue or a Chromium issue, but when KDE highlights a word inside Chromium (e.g. "definilty") as misspelled, and I right-click and choose the corrected word, the program splits the old word, leaves its pieces in place, and inserts the new word between them (e.g. "defini definitely tly").

thanks,
Jason

Jason said...

Oops, sorry, I forgot to mention my specific configuration:
I'm using openSUSE 10.2 and chromium installed from the google yast repository.

I've seen this issue in both gMail and Zimbra.

Bay area shirts said...

The Infoplease spelling checker combines spelling help with our dictionary and thesaurus helps a lot.

Vince said...

The spell checker S***s in Chrome. It will say the word is spelled wrong and not give me any options to fix it, so I have to go to Google really quickly and paste it in, and the word comes up spelled correctly there. What is up with that? It makes no sense to me.

James C. Smith said...

It's nice to hear the word list was improved but the real problem with Chrome's spell checker is the suggestions it has for correcting a misspelled word. 30 times per day I end up copy/pasting my misspellings into a Google search to get a useful suggestion for how to correct my mistake. Chrome accurately finds all my mistakes but is hardly ever helpful when it comes to correcting them.

mikeqw said...

I'm the creator and maintainer of the Australian English spellcheck dictionary files which can be found at auto insurance quotes, adipex without prescription, cheap auto insurance It would be greatly appreciated if you could add Australian English as an option so I don't have to fudge the system anymore. Then I and others could just copy the required dictionary files into the appropriate folders.

M. C. Battilana said...

You wrote "We hope that by using the the existing GPL/LGPL/MPL tri-license for our addition". What about releasing Google's own wordlists (the ones used to determine the top 1000 missing words) under the same license?

As it is now, Google's 1000 extra words are under the tri-license, but the base dictionaries are not always (some appear to be GPL3 only), making legal use under the other licenses (including GPL2) dubious.