A method based on rules and machine learning for logic form identification in Spanish†

F. MARTÍNEZ-SANTIAGO; M. C. DÍAZ-GALIANO; M. Á. GARCÍA-CUMBRERAS; A. MONTEJO-RÁEZ

doi:10.1017/S1351324915000297


Published online by Cambridge University Press: 24 August 2015


Affiliation (all authors): Department of Computer Science, Universidad de Jaén, Paraje Las Lagunillas, s/n, 23071, Jaén, Spain
e-mail: dofer@ujaen.es, mcdiaz@ujaen.es, magc@ujaen.es, amontejo@ujaen.es


Abstract

Logic Forms (LF) are simple, first-order logic knowledge representations of natural language sentences in which each noun, verb, adjective, adverb, pronoun, preposition and conjunction generates a predicate. LF systems usually identify syntactic functions by means of syntactic rules, but this approach is difficult to apply to languages with highly flexible and ambiguous syntax, such as Spanish. In this study, we present a mixed method for deriving the LF of Spanish sentences that combines hard-coded rules with a classifier inspired by semantic role labeling. The main novelty of our proposal is the way the classifier is applied to generate the predicates of the verbs, while rules are used to translate the remaining predicates, which are more straightforward and less ambiguous than the verbal ones. The supervised classifier integrates syntactic and semantic information in order to help overcome the inherent ambiguity of Spanish syntax, and it approaches the task in a way similar to semantic role labeling. The classifier is trained on properties extracted from the AnCora-ES corpus. A rule-based system obtains the LF of the rest of the phrase; its rules are obtained by exploring the syntactic tree of the phrase and encoding the syntactic production rules. The LF algorithm has been evaluated using shallow parsing on a set of straightforward Spanish phrases. The verb argument labeling task achieves 84% precision, and the proposed mixed logic form identification (LFi) method outperforms a system based only on rules by 11%.
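The division of labor described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `lemma:pos(args)` predicate notation, the `Token` type, the one-letter tag set, and the `toy_labeler` function are all assumptions made for illustration. In particular, `toy_labeler` is a naive word-order heuristic standing in for the trained classifier, whose features and learning algorithm are not given on this page.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Token:
    lemma: str
    pos: str             # hypothetical tag set: "n" noun, "v" verb, "a" adjective
    head_noun: int = -1  # for adjectives: index of the noun they modify


# Hypothetical interface for the verb-argument component: given the sentence
# and a verb position, return the indices of the verb's argument tokens.
ArgLabeler = Callable[[List[Token], int], List[int]]


def toy_labeler(tokens: List[Token], verb_idx: int) -> List[int]:
    """Naive placeholder for a trained classifier (assumption, not the paper's
    method): nearest noun before the verb, then nearest noun after it."""
    before = [j for j in range(verb_idx) if tokens[j].pos == "n"]
    after = [j for j in range(verb_idx + 1, len(tokens)) if tokens[j].pos == "n"]
    return before[-1:] + after[:1]


def logic_form(tokens: List[Token], label_args: ArgLabeler) -> str:
    # Rule-based part: every noun introduces a fresh entity variable x1, x2, ...
    var = {}
    for i, tok in enumerate(tokens):
        if tok.pos == "n":
            var[i] = f"x{len(var) + 1}"

    preds = []
    events = 0
    for i, tok in enumerate(tokens):
        if tok.pos == "n":
            preds.append(f"{tok.lemma}:n({var[i]})")
        elif tok.pos == "a":
            # Rule-based part: an adjective shares the variable of its noun.
            preds.append(f"{tok.lemma}:a({var[tok.head_noun]})")
        elif tok.pos == "v":
            # Verb predicates delegate argument selection to the labeler.
            events += 1
            args = [f"e{events}"] + [var[j] for j in label_args(tokens, i)]
            preds.append(f"{tok.lemma}:v({', '.join(args)})")
    return " ".join(preds)


# "El perro negro persigue al gato" with function words omitted for brevity.
example = [
    Token("perro", "n"),
    Token("negro", "a", head_noun=0),
    Token("perseguir", "v"),
    Token("gato", "n"),
]
print(logic_form(example, toy_labeler))
# prints: perro:n(x1) negro:a(x1) perseguir:v(e1, x1, x2) gato:n(x2)
```

A word-order heuristic like `toy_labeler` assigns the wrong arguments as soon as the object precedes the verb, which Spanish allows freely. That is the kind of ambiguity the abstract gives as the motivation for replacing hand-written rules with a classifier for verb predicates only.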

Information

Type: Articles
Information: Natural Language Engineering, Volume 23, Issue 1, January 2017, pp. 131-153

DOI: https://doi.org/10.1017/S1351324915000297
Copyright: Copyright © Cambridge University Press 2015


Footnotes

† This work has been partially funded by the ATTOS project (TIN2012-38536-C03-01) from the Spanish Government and the AORESCU project (TIC 07684) from the Andalucía Government.

