Identifying off-topic student essays without topic-specific training data

  • D. Higgins, J. Burstein and Y. Attali

Educational assessment applications, like other natural-language interfaces, need some mechanism for validating user responses. If the input provided to the system is infelicitous or uncooperative, the proper response may be to simply reject it, to route it to a bin for special processing, or to ask the user to modify it. If problematic user input is instead handled as if it were the system's normal input, this may degrade users' confidence in the software, or suggest ways in which they might try to "game" the system. Our specific task in this domain is the identification of student essays which are "off-topic", i.e., not written to the test question topic. Identification of off-topic essays is of great importance for the commercial essay evaluation system Criterion^SM. The previous methods used for this task required 200–300 human-scored essays for training purposes. However, there are situations in which no essays are available for training, such as when users (teachers) wish to spontaneously write a new topic for their students. For these cases, we need a system that works reliably without training data. This paper describes an algorithm that detects when a student's essay is off-topic without requiring a set of topic-specific essays for training. This new system is comparable in performance to previous models that require topic-specific essays for training, and provides more detailed information about the way in which an essay diverges from the requested essay topic.
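The abstract does not spell out the detection mechanism, but one plausible way to flag off-topic essays without topic-specific training essays is to compare the essay's lexical similarity to its assigned prompt against its similarity to a pool of unrelated reference prompts. The sketch below illustrates that idea with simple term-count cosine similarity; the function names, the tokenization, and the decision rule are all illustrative assumptions, not the authors' published method.

```python
# Hedged sketch: flag an essay as off-topic when it is no more similar to
# its assigned prompt than to unrelated reference prompts. This is an
# illustration of the general idea, NOT the algorithm from the paper.
from collections import Counter
import math


def tokens(text: str) -> Counter:
    """Naive whitespace tokenizer with lowercasing (illustrative only)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def is_off_topic(essay: str, target_prompt: str,
                 reference_prompts: list[str], margin: float = 0.0) -> bool:
    """Return True if the essay looks no closer to its assigned prompt
    than to the best-matching unrelated reference prompt."""
    target_sim = cosine(tokens(essay), tokens(target_prompt))
    ref_sims = [cosine(tokens(essay), tokens(p)) for p in reference_prompts]
    return target_sim <= max(ref_sims) + margin
```

Because the comparison set consists of prompts rather than scored essays, no topic-specific essay data is needed; only the new prompt text and a bank of existing prompts.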

The authors would like to thank Chi Lu and Slava Andreyev for their help in carrying out the experiments described in this paper.
Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering