
Evaluate Similarity of Requirements with Multilingual Natural Language Processing

Published online by Cambridge University Press:  26 May 2022

U. Bisang
Affiliation:
Fraunhofer IPK, Germany
J. Brünnhäußer*
Affiliation:
Fraunhofer IPK, Germany
P. Lünnemann
Affiliation:
Fraunhofer IPK, Germany
L. Kirsch
Affiliation:
CONTACT Software GmbH, Germany
K. Lindow
Affiliation:
Fraunhofer IPK, Germany

Abstract


Finding redundant requirements or semantically similar ones in previous projects is a very time-consuming task in engineering design, especially with multilingual data. Modern NLP makes it possible to automate such tasks. In this paper we compared different multilingual embedding models to determine which of them is the most suitable for finding similar requirements in English and German. The comparison was carried out on both in-domain data (requirement pairs) and out-of-domain data (general sentence pairs). The most suitable model was a sentence-embedding model trained with multilingual knowledge distillation.
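The comparison described above can be reproduced in outline with multilingual sentence embeddings. The following is a minimal sketch, assuming the open-source sentence-transformers library and its knowledge-distilled multilingual checkpoint paraphrase-multilingual-MiniLM-L12-v2; the example requirements and the similarity threshold are illustrative and not taken from the paper.

```python
# Minimal sketch: detect candidate duplicate requirements across English and
# German using a knowledge-distilled multilingual sentence-embedding model.
# Model choice and threshold are illustrative assumptions, not from the paper.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# English and German requirements that should be recognised as similar.
requirements = [
    "The system shall log every failed login attempt.",
    "Das System muss jeden fehlgeschlagenen Anmeldeversuch protokollieren.",
    "The pump shall operate between -20 and 60 degrees Celsius.",
]

# Encode all requirements into a shared multilingual embedding space.
embeddings = model.encode(requirements, convert_to_tensor=True)

# Pairwise cosine similarities; high scores indicate candidate duplicates.
scores = util.cos_sim(embeddings, embeddings)

THRESHOLD = 0.8  # illustrative cut-off
for i in range(len(requirements)):
    for j in range(i + 1, len(requirements)):
        if scores[i][j] >= THRESHOLD:
            print(f"Possible duplicate ({scores[i][j]:.2f}):")
            print(f"  {requirements[i]}")
            print(f"  {requirements[j]}")
```

In this setup the cross-lingual English–German pair scores high because the distilled model maps translations close together in the shared embedding space, while unrelated requirements fall below the threshold.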

Type
Article
Creative Commons
Licence: CC BY-NC-ND 4.0
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright
The Author(s), 2022.
