Checking in on grammar checking
‘Checking in on Grammar Checking’ by Robert Dale is the latest Industry Watch column to be published in the journal Natural Language Engineering.
Looking back to 2004, industry expert Robert Dale reminds us of a time when Microsoft Word was the dominant software used for grammar checking. Bringing us up to date in 2016, Dale discusses the evolution, capabilities and current marketplace for grammar checking and its diverse range of users, from academics and men on dating websites to the fifty top celebrities on Twitter.
Below is an extract from the article, which is available to read in full here.
An appropriate time to reflect
I am writing this piece on a very special day. It’s National Grammar Day, ‘observed’ (to use Wikipedia’s crowdsourced choice of words) in the US on March 4th. The word ‘observed’ makes me think of citizens across the land going about their business throughout the day quietly and with a certain reverence; determined, on this day of all days, to ensure that their subjects agree with their verbs, to not their infinitives split, and to avoid using prepositions to end their sentences with. I can’t see it, really. I suspect that, for most people, National Grammar Day ranks some distance behind National Hug Day (January 21st) and National Cat Day (October 29th). And, at least in Poland and Lithuania, it has to compete with St Casimir’s Day, also celebrated on March 4th. I suppose we could do a study to see whether Polish and Lithuanian speakers have poorer grammar than Americans on that day, but I doubt we’d find a significant difference.
So National Grammar Day might not mean all that much to most people, but it does feel like an appropriate time to take stock of where the grammar checking industry has got to. I last wrote a piece on commercial grammar checkers for the Industry Watch column over 10 years ago (Dale 2004). At the time, there really was no alternative to the grammar checker in Microsoft Word. What’s changed in the interim? And does anyone really need a grammar checker when so much content these days consists of generated-on-a-whim tweets and SMS messages?
The evolution of grammar checking
Grammar checking software has evolved through three distinct paradigms. First-generation tools were based on simple pattern matching and string replacement, using tables of suspect strings and their corresponding corrections. For example, we might search a text for any occurrences of the string isnt and suggest replacing them by isn’t. The basic technology here was pioneered by Bell Labs in the UNIX Writer’s Workbench tools (Macdonald 1983) in the late 1970s and early 1980s, and was widely used in a range of more or less derivative commercial software products that appeared on the market in the early ’80s. Anyone who can remember that far back might dimly recall using programs like RightWriter on the PC and Grammatik on the Mac.
Second-generation tools embodied real syntactic processing. IBM’s Epistle (Heidorn et al. 1982) was the first really visible foray into this space, and key members of the team that built that application went on to develop the grammar checker that, to this day, resides inside Microsoft Word (Heidorn 2000). These systems rely on large rule-based descriptions of permissible syntax, in combination with a variety of techniques for detecting ungrammatical elements and posing potential corrections for those errors.
Perhaps not surprisingly, the third generation of grammar-checking software is represented by solutions that make use of statistical language models in one way or another. The most impressive of these is Google’s context-aware spell checker (Whitelaw et al. 2009); when you start taking context into account, the boundary between spell checking and grammar checking gets a bit fuzzy. Google’s entrance into a marketplace is enough to make anyone go weak at the knees, but there are other third-party developers brave enough to explore what’s possible in this space. A recent attempt that looks interesting is Deep Grammar (www.deepgrammar.com).
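To make the first-generation idea concrete, here is a minimal sketch of table lookup in Python. It is an illustration only, not a reconstruction of Writer's Workbench or any commercial product: the table entries (apart from isnt, which comes from the example above) and the function name suggest_corrections are assumptions made for this sketch.

```python
import re

# Hypothetical table of suspect strings and their corrections.
# Only "isnt" comes from the article; the other entries are invented.
SUSPECT_STRINGS = {
    "isnt": "isn't",
    "dont": "don't",
    "could of": "could have",
}


def suggest_corrections(text):
    """Return sorted (offset, suspect, suggestion) tuples for each table hit."""
    suggestions = []
    for suspect, correction in SUSPECT_STRINGS.items():
        # Word boundaries stop us flagging a suspect string inside a longer word.
        pattern = re.compile(r"\b" + re.escape(suspect) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            suggestions.append((match.start(), match.group(0), correction))
    return sorted(suggestions)


if __name__ == "__main__":
    for offset, suspect, suggestion in suggest_corrections(
        "This isnt right, and I dont care."
    ):
        print(f"offset {offset}: replace {suspect!r} with {suggestion!r}")
```

The sketch only suggests replacements rather than applying them, which mirrors the "suggest replacing" wording of the example; a lookup like this knows nothing about syntax or context, which is the limitation the later generations address.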
We might expect to find that modern grammar checkers draw on techniques from each of these three paradigms. You can get a long way using simple table lookup for common errors, so it would be daft to ignore that fact, but each generation adds the potential for further coverage and capability.
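The statistical, context-sensitive idea behind third-generation tools can be sketched in the same spirit. The example below is a toy, and it is not the method of Whitelaw et al. or of any product named above: the bigram counts are invented, the confusion set is hypothetical, and a real system would estimate its language model from a very large corpus.

```python
# Invented bigram counts standing in for a real statistical language model.
BIGRAM_COUNTS = {
    ("over", "there"): 50,
    ("over", "their"): 1,
    ("their", "house"): 40,
    ("there", "house"): 1,
}

# Hypothetical sets of words that are easily confused with one another.
CONFUSION_SETS = [{"their", "there"}]


def score(prev_word, word, next_word):
    """Score a word in context; add-one keeps unseen bigrams above zero."""
    left = BIGRAM_COUNTS.get((prev_word, word), 0) + 1
    right = BIGRAM_COUNTS.get((word, next_word), 0) + 1
    return left * right


def check_confusables(text):
    """Return (index, word, suggestion) where an alternative fits the context better."""
    words = text.lower().split()
    suggestions = []
    for i, word in enumerate(words):
        for confusion_set in CONFUSION_SETS:
            if word not in confusion_set:
                continue
            prev_word = words[i - 1] if i > 0 else "<s>"
            next_word = words[i + 1] if i + 1 < len(words) else "</s>"
            best = max(
                sorted(confusion_set),
                key=lambda w: score(prev_word, w, next_word),
            )
            # Only suggest a change when the alternative scores strictly higher.
            if score(prev_word, best, next_word) > score(prev_word, word, next_word):
                suggestions.append((i, word, best))
    return suggestions


if __name__ == "__main__":
    print(check_confusables("we went over their yesterday"))
    print(check_confusables("we saw their house"))
```

Both their and there are correctly spelled words, so a table of suspect strings cannot flag either; it is the surrounding words that decide, which is why the boundary between spell checking and grammar checking blurs once context is taken into account.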
The remainder of the article discusses the following:
- Today’s grammar-checking marketplace
- Capabilities
- Who needs a grammar checker?
‘Checking in on grammar checking’ is an Open Access article. You may also be interested in complimentary access to a collection of related articles about grammar published in Natural Language Engineering. These papers are fully available until 30th June 2016.
Other recent Industry Watch articles by Robert Dale: