
Use case identification of natural language system requirements with graph-based clustering

Published online by Cambridge University Press: 21 July 2025

Simon Schleifer*
Affiliation:
Engineering Design, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Adriana Lungu
Affiliation:
Technical Development, AUDI AG, Ingolstadt, Germany
Benjamin Kruse
Affiliation:
Technical Development, AUDI AG, Ingolstadt, Germany
Sebastiaan van Putten
Affiliation:
Technical Development, AUDI AG, Ingolstadt, Germany
Stefan Goetz
Affiliation:
Engineering Design, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Sandro Wartzack
Affiliation:
Engineering Design, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
*Corresponding author: Simon Schleifer, schleifer@mfk.fau.de

Abstract

Due to the ever-increasing complexity of technical products, the quantity of system requirements, which are typically expressed in natural language, is inevitably rising. Model-based formalization through the application of Model-based Systems Engineering is a common way to cope with this rising complexity. Grouping requirements into use cases forms the first step towards model-based requirements and improves the understanding of the system. To support this manual and subjective task, automation through artificial intelligence and natural language processing methods is needed. This contribution proposes a novel pipeline that derives use cases from natural language requirements while accounting for incomplete manual mappings and for the possibility that one requirement contributes to multiple use cases. The approach uses semi-supervised requirements graph generation followed by overlapping graph clustering. Each identified use case is described by keyphrases to increase accessibility for the user. Industrial requirement sets from the automotive industry are used to evaluate the pipeline in two application scenarios. The proposed pipeline overcomes limitations of prior work in practical application, as underlined by critical discussions with industry experts. It automatically generates proposals for the use cases defined in the requirement set, forming the basis for use case diagrams.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the re-used or adapted article and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Figure 1. Sources of system requirements (a) as a basis for product development (b) highlighting the use case identification on the system level as the focus of this article (c).

Figure 2. Framework for use case diagram derivation according to Schleifer et al. (2024) including Functional Requirements (FRs) and Non-Functional Requirements (NFRs).

Table 1. Extract of the results of a feature without extensive ground truth data

Figure 3. Overview of the proposed pipeline for grouping natural language requirements to system use cases.

Figure 4. Detailed view of the pre-processing steps.

Figure 5. Formalization of a requirements specification with a requirements graph.

Figure 6. Detailed view of the use case identification.

Figure 7. Overview of the conducted experiments (Exp).

Figure 8. Metrics for a feature with extensive ground truth data for two different scenarios: $r_{\mathrm{sim}} \approx 20\%$ and $r_{\mathrm{sim}} \approx 50\%$ simulated semi-supervision ratio.