
Neural architectures for open-type relation argument extraction

Published online by Cambridge University Press:  07 December 2018

Benjamin Roth* (Center for Information and Language Processing, Ludwig Maximilian University of Munich, München, Germany)
Costanza Conforti (Language Technology Laboratory, University of Cambridge, Cambridge, UK)
Nina Poerner (Center for Information and Language Processing, Ludwig Maximilian University of Munich, München, Germany)
Sanjeev Kumar Karn (Center for Information and Language Processing, Ludwig Maximilian University of Munich, München, Germany)
Hinrich Schütze (Center for Information and Language Processing, Ludwig Maximilian University of Munich, München, Germany)

*Corresponding author. Email: beroth@cis.uni-muenchen.de

Abstract

In this work, we focus on the task of open-type relation argument extraction (ORAE): given a corpus, a query entity Q, and a knowledge base relation (e.g., “Q authored notable work with title X”), the model has to extract from the corpus an argument of non-standard entity type, that is, an argument that cannot be extracted by a standard named entity tagger (for example, X: the title of a book or a work of art). We develop and compare a wide range of neural models for this task, yielding large improvements over a strong baseline obtained with a neural question answering system. The impact of different sentence encoding architectures and answer extraction methods is systematically compared. An encoder based on gated recurrent units combined with a conditional random field tagger yields the best results. We release a data set to train and evaluate ORAE, based on Wikidata and obtained by distant supervision.
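To make the setup concrete, below is a minimal, hypothetical sketch (in PyTorch) of the kind of architecture the abstract describes: a bidirectional GRU sentence encoder conditioned on the query relation, with a per-token BIO tagging head that marks the argument span. All names, hyperparameters, and the way the relation is fed to the encoder are illustrative assumptions, and a simple greedy softmax decoder stands in for the conditional random field tagger reported as best in the paper.

# Hypothetical sketch: bidirectional GRU encoder + per-token BIO tagging head
# for open-type relation argument extraction. The relation (the "query") is
# embedded and concatenated to every token so the tagger knows which argument
# it is looking for. A softmax tagger replaces the paper's CRF for brevity.
import torch
import torch.nn as nn

class GruArgumentTagger(nn.Module):
    def __init__(self, vocab_size, num_relations, emb_dim=100, hidden_dim=128,
                 num_tags=3):  # tags: O, B-ARG, I-ARG
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.rel_emb = nn.Embedding(num_relations, emb_dim)
        self.encoder = nn.GRU(2 * emb_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        self.tag_scores = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids, relation_id):
        # token_ids: (batch, seq_len); relation_id: (batch,)
        tokens = self.word_emb(token_ids)
        rel = self.rel_emb(relation_id).unsqueeze(1).expand(-1, token_ids.size(1), -1)
        encoded, _ = self.encoder(torch.cat([tokens, rel], dim=-1))
        return self.tag_scores(encoded)  # (batch, seq_len, num_tags)

# Toy usage: score BIO tags for one sentence mentioning the query entity.
model = GruArgumentTagger(vocab_size=1000, num_relations=10)
token_ids = torch.randint(0, 1000, (1, 12))
relation_id = torch.tensor([3])
emissions = model(token_ids, relation_id)
predicted_tags = emissions.argmax(dim=-1)  # greedy decoding instead of Viterbi/CRF
print(predicted_tags.shape)  # torch.Size([1, 12])

In the paper's best-performing configuration, a conditional random field layer would replace the per-token argmax with Viterbi decoding over tag transitions, which enforces well-formed B/I argument spans.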

Type: Article
Copyright: © Cambridge University Press 2018

