InferPortOIE: A Portuguese Open Information Extraction system with inferences

Cleiton Fernando Lima Sena and Daniela Barreiro Claro
Abstract

The amount of digital data keeps growing. On the Web alone, a vast collection of heterogeneous content is generated every day, and a significant portion of it is written in natural language. Open Information Extraction (Open IE) enables the extraction of facts from large quantities of natural-language text. In this work, we propose an Open IE method to extract facts from texts written in Portuguese. We developed two new rules that generalize inference by transitivity and by symmetry, which increases the number of implicit facts extracted from a sentence. Our novel symmetric inference approach is based on a list of symmetric features. Our results show that our method outperforms closely related systems in both precision and number of valid extractions. In terms of the number of minimal facts, our approach is on par with the most relevant methods in the literature.
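The general idea behind inference by symmetry and by transitivity can be illustrated with a minimal Python sketch. It assumes that extracted facts are `(arg1, relation, arg2)` triples and that relations are flagged as symmetric or transitive through small hand-written lists. The `Fact` and `infer` names, the example relations (`casou com`, "married"; `fica em`, "is located in") and the fixed-point loop are illustrative assumptions only; they do not reproduce InferPortOIE's actual rules or its list of symmetric features.

```python
from __future__ import annotations

from typing import NamedTuple


class Fact(NamedTuple):
    """An Open IE extraction: (argument 1, relation phrase, argument 2)."""
    arg1: str
    rel: str
    arg2: str


# Hypothetical relation lists, for illustration only (not the paper's lists).
SYMMETRIC_RELS = {"casou com", "conversou com"}  # "married", "talked with"
TRANSITIVE_RELS = {"fica em"}                    # "is located in"


def infer(facts: set[Fact]) -> set[Fact]:
    """Return `facts` closed under the symmetry and transitivity rules."""
    closed = set(facts)
    while True:
        new: set[Fact] = set()
        for f in closed:
            # Symmetry: (A, r, B) with r symmetric yields (B, r, A).
            if f.rel in SYMMETRIC_RELS:
                new.add(Fact(f.arg2, f.rel, f.arg1))
            # Transitivity: (A, r, B) and (B, r, C) with r transitive yield (A, r, C).
            if f.rel in TRANSITIVE_RELS:
                for g in closed:
                    if g.rel == f.rel and g.arg1 == f.arg2 and g.arg2 != f.arg1:
                        new.add(Fact(f.arg1, f.rel, g.arg2))
        new -= closed
        if not new:  # fixed point reached: no rule produces anything new
            return closed
        closed |= new


if __name__ == "__main__":
    facts = {
        Fact("Salvador", "fica em", "Bahia"),
        Fact("Bahia", "fica em", "Brasil"),
        Fact("Maria", "casou com", "João"),
    }
    for fact in sorted(infer(facts) - facts):
        print(fact)
```

Running the rules to a fixed point keeps the sketch simple; a real system would also need to normalize relation phrases (for example, "fica na" and "fica no" versus "fica em") before two facts can be chained.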

*Corresponding author. Email: dclaro@ufba.br
Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110