Supervised approach to recognise Polish temporal expressions and rule-based interpretation of timexes

Jan Kocoń and Michał Marcińczuk
Abstract

A key challenge of Information Extraction in Natural Language Processing is recognising and classifying temporal expressions (timexes). Timexes are a crucial source of information about when something happens, how often it occurs or how long it lasts. Timexes extracted automatically from text play a major role in many Information Extraction systems, such as question answering or event recognition. We prepared a broad specification of Polish timexes, PLIMEX. It is based on the state-of-the-art annotation guidelines for English, mainly TIMEX2 and TIMEX3 (a part of TimeML, the Markup Language for Temporal and Event Expressions). We extended our specification with a description of the local meaning of timexes, based on the LTIMEX annotation guidelines for English. Temporal description supports further event identification and extends the event description model, focussing on anchoring events in time, ordering events and reasoning about the persistence of events. The specification is designed to address these issues, and we used it to annotate all documents in the Polish Corpus of Wroclaw University of Technology (KPWr). We also adapted our Liner2 machine learning system to recognise Polish timexes, and we propose a two-phase method for selecting a subset of features for the Conditional Random Fields (CRF) sequence labelling method. This article presents the whole process: corpus annotation, evaluation of inter-annotator agreement, extension of the Liner2 system with new features, and evaluation of the recognition models before and after feature selection, with an analysis of the statistical significance of the differences. Liner2, together with the presented models, is available as open-source software under the GNU General Public License.
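The abstract mentions evaluating both inter-annotator agreement and recognition quality, but it does not state which measures are used. A common choice for span annotations such as timexes is exact-match precision, recall and F1; when F1 is computed between two annotators it coincides with positive specific agreement and does not depend on which annotator is treated as the reference. The sketch below assumes that measure. The `Timex` record, its fields and the `span_agreement` function are hypothetical names chosen for illustration; they are not part of Liner2 or of the article.

```python
from typing import Iterable, NamedTuple


class Timex(NamedTuple):
    """Hypothetical annotation record: a token span plus a TIMEX3-style type."""
    doc_id: str
    start: int  # index of the first token in the span (inclusive)
    end: int    # index of the last token in the span (inclusive)
    ttype: str  # e.g. "DATE", "TIME", "DURATION" or "SET"


def span_agreement(reference: Iterable[Timex], other: Iterable[Timex],
                   match_type: bool = True) -> dict:
    """Exact-match precision, recall and F1 of `other` against `reference`.

    Two annotations match only if they cover exactly the same tokens of the
    same document (and, when `match_type` is True, carry the same type).
    F1 simplifies to 2|A & B| / (|A| + |B|), i.e. the positive specific
    agreement of the two sets, so it is symmetric in the two annotators.
    Duplicate annotations collapse because the inputs are turned into sets.
    """
    def key(t: Timex) -> tuple:
        base = (t.doc_id, t.start, t.end)
        return base + (t.ttype,) if match_type else base

    ref = {key(t) for t in reference}
    oth = {key(t) for t in other}
    common = len(ref & oth)
    precision = common / len(oth) if oth else 0.0
    recall = common / len(ref) if ref else 0.0
    total = len(ref) + len(oth)
    f1 = 2 * common / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Two hypothetical annotators marking timexes in two toy documents.
    annotator_a = [Timex("d1", 3, 4, "DATE"), Timex("d1", 10, 10, "TIME"),
                   Timex("d2", 0, 1, "DURATION")]
    annotator_b = [Timex("d1", 3, 4, "DATE"), Timex("d1", 10, 10, "DATE"),
                   Timex("d2", 5, 5, "SET")]
    print(span_agreement(annotator_a, annotator_b))                    # spans and types must match
    print(span_agreement(annotator_a, annotator_b, match_type=False))  # spans only
```

With these toy inputs, one of the three annotations per annotator matches when types are compared (F1 = 1/3) and two match when only spans are compared (F1 = 2/3). Exact matching is strict; relaxed variants that also credit overlapping spans are common as well, and which variant the article applies is not stated in the abstract.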

Footnotes

Work financed as part of the investment in the CLARIN-PL research infrastructure funded by the Polish Ministry of Science and Higher Education.

Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering