Thursday, April 21, 2011

Contextual Spell Checker Evaluation

Conventional spell checkers fix spelling mistakes (i.e. non-word errors) essentially by looking each word up in a dictionary, and fix grammar errors by applying a set of grammatical rules. Contextual spell checkers are a new class of software that goes beyond this: they use context (i.e. a number of words surrounding the word, most typically the sentence) to identify and correct misused words, also known as real word errors.
While researching the contextual spell check feature first introduced in the 2007 edition of Microsoft Word, I discovered an evaluation of four contextual spell checkers: MS Word 2007 (on Windows), MS Word 2008 (on Mac OS X), the Mac OS X Spell and Grammar Checker, and After the Deadline.
The evaluation was conducted by Raphael Mudge, the developer of After the Deadline, an open-source grammar, style, and misused word checker that can be added as an extension to various applications, e.g. the Firefox web browser or a word processor. You can read his April 9, 2010 blog post about it, "Measuring the Real Word Error Corrector". Mudge has made the data set and programs available so that the evaluation can be run on other such tools. The data consists of 673 sentences containing 834 errors (of which Mudge determined 97.8% are real word errors) that Dr. Jennifer Pedler collected from writers with dyslexia for her PhD thesis. Pedler annotated the errors along with the expected corrections, and Mudge wrote a program that compares a corrected version of Dr. Pedler's error corpus to the original corpus with errors. The program outputs two results, each as a percentage:
  • recall: the share of annotated errors that were found and changed to something
  • precision: the share of those changes that matched the expected correction
Note: Mudge's program does not measure the number of words outside the annotated errors that were changed, whether correctly or incorrectly. This is unfortunate because I noticed many false positives (i.e. correct words that were identified as errors) when using Ghotit.
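The two measures described above can be sketched in a few lines of Python. This is a minimal illustration, not Mudge's actual program: the AnnotatedError record, the recall_precision function, and the sample words are all hypothetical, and the sketch assumes each annotated error has already been paired with whatever the spell checker left in its place.

```python
from typing import List, NamedTuple, Tuple


class AnnotatedError(NamedTuple):
    """Hypothetical record for one annotated error in the corpus."""
    original: str  # the misused word as the writer typed it
    expected: str  # the annotated correction
    output: str    # what the spell checker left in that position


def recall_precision(errors: List[AnnotatedError]) -> Tuple[float, float]:
    """Return (recall %, precision %) over the annotated errors only."""
    # An error counts as "found" if the checker changed it to anything.
    changed = [e for e in errors if e.output != e.original]
    # A change counts as "correct" if it matches the annotated correction.
    correct = [e for e in changed if e.output == e.expected]
    recall = 100.0 * len(changed) / len(errors) if errors else 0.0
    precision = 100.0 * len(correct) / len(changed) if changed else 0.0
    return recall, precision


# Made-up sample data, for illustration only.
sample = [
    AnnotatedError("there", "their", "their"),  # found and fixed
    AnnotatedError("form", "from", "farm"),     # found, but wrong fix
    AnnotatedError("were", "where", "were"),    # missed
    AnnotatedError("no", "know", "know"),       # found and fixed
]
recall, precision = recall_precision(sample)
print(f"recall {recall:.1f}%, precision {precision:.1f}%")
# prints: recall 75.0%, precision 66.7%
```

Because only annotated errors are passed in, a checker that changes many correct words is not penalised by either number, which is the limitation noted above.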
I used Pedler's data and Mudge's script to evaluate Word 2007 (to attempt to replicate Mudge's results), Word 2010 (to compare it to Word 2007), and two relatively new online contextual spell checkers: Ginger Software and Ghotit. In each case I accepted the first suggestion offered by the checker. For Ginger, I ran the test twice. First, I approved the corrected sentence exactly as offered. Then, on a fresh copy of the error corpus, I explicitly selected the first suggestion for those corrections annotated with the ? icon, as Ginger usually leaves these words unchanged by default. Selecting the first suggestion shows that Ginger did identify a potential error (improving recall) and may also correct it (potentially improving precision).
Spell Checker | Recall % (identifies word) | Precision % (corrects identified word)
--- | --- | ---
Ghotit | 72.2 | 77.9
Ginger Software v1.16.1 (select first suggestion for corrections labelled "?") | 56.6 | 88.8
Ginger Software v1.16.1 (approve default sentence) | 50.9 | 89.6
MS Word 2007 | 40.6 | 90.3
MS Word 2010 | 38.6 | 89.4
After the Deadline | 28 | 88
My results show that Ghotit scores best at identifying errors, with a recall of 72.2%. This is well ahead of the second highest recall of 56.6%, achieved by Ginger when explicitly selecting the first suggestion; approving Ginger's corrected sentence exactly as offered identifies 50.9% of errors. The fourth highest recall belongs to MS Word 2007, which identifies 40.6% of the errors.
But as I said above, I noticed that Ghotit generated a lot of false positives (the way its Microsoft Word plugin interface is designed required me to select the first suggestion from a submenu for each highlighted error). This seems to have increased recall, but notice that its precision score of 77.9% is significantly lower than the 88-90% that all of the other spell checkers achieved.
The error corpus uses UK English, and I was able to specify that (or it was already the default) in all the spell checkers I evaluated.
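A false-positive count of the kind I was missing could be sketched as follows. This is only an illustration under stated assumptions, not part of Mudge's script: the count_false_positives function and the example sentence are hypothetical, and the sketch assumes the checker's output has the same number of words as the original sentence (no words split or merged).

```python
from typing import List, Set


def count_false_positives(
    original: List[str], output: List[str], error_positions: Set[int]
) -> int:
    """Count words changed by the checker that were NOT annotated as errors."""
    if len(original) != len(output):
        # Assumption of this sketch: sentences stay word-aligned.
        raise ValueError("original and output must have the same length")
    return sum(
        1
        for i, (before, after) in enumerate(zip(original, output))
        if i not in error_positions and before != after
    )


# Made-up sentence: "too" (index 2) and "sum" (index 7) are the annotated errors.
original = "I went too the shop and bought sum bread".split()
output = "I went to the store and bought some bread".split()
error_positions = {2, 7}

false_positives = count_false_positives(original, output, error_positions)
print(false_positives)  # prints: 1 ("shop" was changed to "store")
```

Reporting this count alongside recall and precision would show when a checker reaches a high recall by flagging correct words as well.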
Note that the results for Ginger and Ghotit are subject to change as their algorithms and data sets are continuously being improved.