A method based on rules and machine learning for logic form identification in Spanish


Logic Forms (LF) are simple, first-order logic knowledge representations of natural language sentences. Each noun, verb, adjective, adverb, pronoun, preposition and conjunction generates a predicate. LF systems usually identify the syntactic function by means of syntactic rules, but this approach is difficult to apply to languages with highly flexible and ambiguous syntax, such as Spanish. In this study, we present a mixed method for deriving the LF of Spanish sentences that combines hard-coded rules with a classifier inspired by semantic role labeling. The main novelty of our proposal is the way the classifier is applied to generate the predicates of the verbs, while rules are used to translate the remaining predicates, which are more straightforward and less ambiguous than the verbal ones. The proposed mixed system uses a supervised classifier to integrate syntactic and semantic information in order to help overcome the inherent ambiguity of Spanish syntax. This task is carried out in a way similar to semantic role labeling. We train the classifier on properties extracted from the AnCora-ES corpus. A rule-based system is used to obtain the LF of the rest of the phrase. The rules are obtained by exploring the syntactic tree of the phrase and encoding the syntactic production rules. The LF algorithm has been evaluated using shallow parsing on a set of straightforward Spanish phrases. The verb argument labeling task achieves 84% precision, and the proposed mixed LFi method outperforms a purely rule-based system by 11%.
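As a rough illustration of what "one predicate per content word" means, the following minimal Python sketch prints a logic form for a toy Spanish sentence. It is not the system described above: the `Token` class, the `to_logic_form` function, the coarse tags and the `lemma:TAG(args)` notation are all assumptions made for this example, and the argument variables are assigned by hand. In the proposed method the verb arguments would come from the classifier and the remaining predicates from the rules.

```python
from dataclasses import dataclass

# Hypothetical coarse tags for the categories that generate a predicate:
# noun, verb, adjective, adverb, pronoun, preposition, conjunction.
PREDICATE_TAGS = {"NN", "VB", "JJ", "RB", "PRP", "IN", "CC"}


@dataclass
class Token:
    lemma: str
    tag: str      # coarse part-of-speech tag (assumed tag set)
    args: tuple   # argument variables, assigned by hand in this sketch


def to_logic_form(tokens):
    """Emit one predicate per token whose tag is in PREDICATE_TAGS."""
    predicates = [
        f"{t.lemma}:{t.tag}({', '.join(t.args)})"
        for t in tokens
        if t.tag in PREDICATE_TAGS
    ]
    return " ".join(predicates)


# "El perro come carne" ("The dog eats meat"); the determiner yields no predicate.
sentence = [
    Token("el", "DT", ()),
    Token("perro", "NN", ("x1",)),
    Token("comer", "VB", ("e1", "x1", "x2")),
    Token("carne", "NN", ("x2",)),
]

print(to_logic_form(sentence))
```

In this toy output the verb predicate shares `x1` and `x2` with the two nouns, which is how the subject and object are linked to the event `e1`. Choosing those shared variables correctly is the ambiguous step that the abstract assigns to the classifier.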

This work has been partially funded by the ATTOS project (TIN2012-38536-C03-01) from the Spanish Government and the AORESCU project (TIC 07684) from the Andalucía Government.

Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering