1. Introduction
Bolted joints, the most versatile machinery element (Roloff et al., 2023), are essential in technical devices and everyday applications, functioning as detachable connections between components (Kloos and Thomala, 2007). These joints facilitate load transmission between connected components (VDI 2230, 2015) by creating a preload force, achieved through the application of torque to either the head of the bolt or the nut (Oberg, 2013). In the design process of bolted joints subjected to tensile loads, the preload force is a critical parameter influencing their functional behavior and reliability (Bickford, 1995). The friction coefficients in the head and thread regions of the bolt are equally significant, as the majority of the tightening torque is expended in overcoming friction in these areas (VDI 2230, 2024). This friction not only contributes to the generation of preload force but also provides a self-locking effect, ensuring the joint remains secure under working conditions. As external loads are applied, the joint transitions through elastic and plastic deformation stages, potentially culminating in failure if critical load thresholds are exceeded (Steinhilper et al., 2012). Poorly designed bolts may experience excessive stresses during tightening, increasing the risk of joint failure, equipment malfunction, or operational downtime (Kloos and Thomala, 2007). Thus, understanding the relationships between tightening variables such as preload force and functional behavior parameters, namely load-bearing capacity and friction coefficients, accurately and efficiently is essential for achieving a reliable design.
1.1. Related work
Bolted joint behavior has traditionally been analyzed using a variety of modeling approaches. Analytical, numerical, empirical, and data-driven methods have all been extensively applied (VDI 2230, 2015; Chiang and Chiang, 2023), each offering distinct strengths and facing inherent limitations. This section provides an overview of these approaches, highlighting their contributions and challenges, and identifies the research gap that motivates the proposed hybrid methodology. Analytical methods approximate bolted joint behavior through simplified mechanical representations, focusing on specific aspects of bolted joints. For instance, Lee et al. (2022) evaluated the tightening torque-clamping force relationship and friction coefficients, validating theoretical calculations through experimental measurements. Zeng et al. (2020) proposed a mathematical equation, derived from mechanical behavior, to predict the load-bearing capacity of beam-to-column joints in slim-floor composite frames. While analytical methods provide valuable insights into fundamental mechanics, they often lack the ability to capture the complex, non-linear interactions inherent in bolted joints.
To address these limitations, numerical methods such as finite element analysis (FEA) enable detailed representation of nonlinear material behavior, constraints, and complex geometries (VDI 2230, 2014). They have been widely applied to study the mechanical behavior of bolted joints. Fukuoka and Takaki (1998) focused on tension and torque characteristics during and after tightening through experimental and numerical methods. Yu et al. (2015) explored how factors like friction, pitch, elastic modulus, and strain-hardening impact the tightening torque and initial load in bolted connections, using FEA to model these influences. Similarly, Liang et al. (2024) examined load capacity and fracture behavior in flange bolts under various load cases, providing insights into joint performance under different conditions. Although numerical approaches like FEA excel at modeling detailed mechanical behaviors, they often focus on idealized conditions, demand significant computational resources, and rely heavily on the quality of the model and the expertise of the user to achieve accurate predictions (VDI 2230, 2014).
To bridge the gap between simulation and real-world performance, empirical methods such as Design of Experiments (DoE) have been employed to systematically investigate factors influencing bolted joint behavior. Nassar et al. (2005, 2007) studied the effects of friction coefficients, tightening speed, and coating on the torque-tension relationship and wear patterns, while Holch et al. (2023) examined load-bearing behavior in bolts and lockbolts under combined loading. While DoE offers a systematic approach to studying the effects of multiple factors on bolted joint behavior, and is capable of accounting for non-linearities to some degree, its implementation is not without challenges. The method requires carefully controlled experimental conditions to ensure repeatability and reliability of results (Chiang and Chiang, 2023). As the number of variables increases, the experimental design becomes increasingly complex, often demanding significant time and resources to execute (Selvamuthu and Das, 2024). Additionally, external factors can introduce variability, reducing the robustness of the findings. The dependency of results on specific factors further restricts their applicability across different configurations (Wettstein and Matthiesen, 2020). Despite their widespread use, empirical methods therefore face inherent challenges in accurately predicting the functional behavior variables of bolted joints. Nonetheless, DoE remains a valuable method when used with other approaches to validate findings and enhance model accuracy (Afifi et al., 2024).
Furthermore, data-driven predictive modeling has emerged as a promising alternative to overcome these limitations. By leveraging relationships between input features and output variables, data-driven models can capture complex nonlinear behaviors and make predictions about unseen scenarios, enabling more accurate and efficient predictions and optimizations of bolted joint performance (Montáns et al., 2019). For instance, Fernandez-Ceniceros et al. (2012) used a neural network ensemble with FEA data to predict load capacity, while Zhong et al. (2021) combined FEA and neural networks for bearing capacity optimization in aluminum joints. Fei et al. (2016) modeled bolt force in flanges, analyzing the effects of bending and shear, and Coelho et al. (2024) used Machine Learning (ML) to classify and quantify torque loss due to vibration. Other studies, such as those by Yildirim et al. (2019), Ren and Sun (2023), and Li et al. (2024), employed neural networks and ML algorithms to predict load capacity and other performance metrics in varied joint contexts, leveraging FEA and experimental data. Chen et al. (2020) and Olejnik and Ayankoso (2023) further integrated optimization algorithms with neural networks for enhanced predictive precision, and Atta et al. (2019) focused on predicting failure stages in bolted joints with a neural network.
These approaches not only enhance the ability to predict critical parameters but also offer significant advantages in terms of time efficiency, accuracy, and ease of implementation. Yet they require high-quality datasets for accurate training and validation. Recent research highlights the potential of combining empirical data with data-driven methods to address these challenges (Afifi et al., 2024). This study aims to explore this synergy, offering a framework that integrates experimental data with a feed-forward neural network model to enhance predictions of bolted joint performance.
1.2. Problem formulation and task description
The reviewed studies illustrate the potential of data-driven modeling to advance the analysis of bolted joints. While these approaches offer accurate predictions of critical parameters, their integration with empirical data remains underexplored. A significant gap exists in methodologies that effectively combine the strengths of empirical experimentation and data-driven techniques to simultaneously predict load capacity and friction coefficients while capturing the nonlinear behavior of bolted joints.
This research aims to address this gap by leveraging the complementary advantages of empirical data and data-driven predictive modeling to accurately and efficiently predict critical functional behavior parameters. It focuses on bolted joint behavior under tensile loading conditions and a torque-controlled tightening process, employing a supervised feed-forward neural network trained on key parameters, including bolt size, strength grade, tightening torque, head torque, thread torque, and preload force. This network models nonlinear relationships between inputs and outputs in a computationally efficient manner. The approach addresses the limitations of traditional methods like analytical models, DoE, and FEA, which are hindered by idealized assumptions, scalability issues, and high computational demands. By integrating empirical data with predictive modeling, the study enhances accuracy, efficiency, and reliability, contributing to practical design applications.
2. Feedforward neural network background
Data-driven predictive modeling aims to uncover relationships between input and output features, enabling accurate predictions on unseen data (Montáns et al., 2019). Within this field, ML focuses on developing algorithms capable of learning patterns from data to make informed predictions, with supervised ML specifically predicting target variables by analyzing input-output pairs (Zhou, 2021; LeCun et al., 2015). Regression, as a key type of supervised learning problem, focuses on estimating numerical values based on input features (Bengio et al., 2016). Linear regression handles simple relationships, whereas feed-forward neural networks handle complex patterns by processing information in a unidirectional flow, mapping inputs to outputs without cycles (Yadav et al., 2015). These networks feature a layered structure comprising an input layer, one or more hidden layers, and an output layer (Zhou, 2021). Each layer consists of interconnected units called neurons, which represent input and output features (Santry, 2024; Wythoff, 1993). The number of neurons in the hidden layers is determined by the network design and task complexity (Bengio et al., 2016), enabling the processing of signals as outputs from one layer become inputs for the next (Santry, 2024; Apicella et al., 2021). Each layer's output is computed as the weighted sum of inputs plus a bias, with an activation function introducing non-linearity to enable complex mappings (Kuhn, 2013).
Commonly used activation functions in neural networks include the sigmoid and Rectified Linear Unit (ReLU), while weight initialization methods often involve random or Xavier initialization (Santry, 2024).
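The layer computation and the initialization schemes named above can be illustrated with a minimal sketch in plain Python. The layer width, the input values, and the helper names (`dense`, `xavier_uniform`) are assumptions chosen for illustration and are not taken from the study's implementation.

```python
import math
import random

def sigmoid(z):
    """Logistic sigmoid: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Rectified Linear Unit: zero for negative inputs, identity otherwise."""
    return max(0.0, z)

def xavier_uniform(fan_in, fan_out, rng):
    """Xavier (Glorot) uniform initialization: weights drawn from
    U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]

def dense(inputs, weights, biases, activation):
    """One layer: activation(weighted sum of inputs + bias) for each neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

rng = random.Random(0)                 # fixed seed, for reproducibility only
W = xavier_uniform(fan_in=6, fan_out=3, rng=rng)
b = [0.0, 0.0, 0.0]                    # biases initialized to zero
x = [0.2, 0.8, 0.5, 0.4, 0.6, 0.3]     # hypothetical, already scaled inputs
y = dense(x, W, b, sigmoid)            # three outputs, each inside (0, 1)
```

Because the sigmoid bounds every output to (0, 1), target variables have to be scaled into that range before such a layer can reproduce them.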
In the context of optimization, the primary objective is to identify the optimal parameters of the network that minimize the loss function, which is a metric used to quantify the cost between the predicted values generated by the model and the actual target values (Montesinos Lopez et al., 2022). Gradient descent is one of the most commonly used algorithms for optimization. It works by iteratively updating the model's parameters in the direction opposite to the gradient until reaching a local or global minimum (Ruder, 2016). Adam, a gradient-based algorithm for stochastic objective functions, adapts the learning rate by computing individual scaling factors for different parameters based on exponentially moving averages of the first and second moments of the gradients (Kingma and Ba, 2014). The Huber loss represents one of the frequently utilized loss functions (Sadouk et al., 2020). The principal objective of training a neural network is to fit its predicted outputs with the target outputs (Santry, 2024). The testing phase evaluates the trained model's accuracy by generating predictions from the input data and calculating the error between the predicted and target outputs (Santry, 2024; Bengio et al., 2016). Overfitting arises when a model performs exceptionally well on the training dataset but fails to generalize to unseen data, leading to poor performance on the test set. Conversely, underfitting occurs when a model performs poorly even on the training data set, suggesting that it has failed to capture the essential patterns in the data (Müller and Guido, 2016).
Generalization is reached when the error has been reduced to an adequately low level, thereby indicating that the model has acquired the capacity to make accurate predictions (Santry, 2024).
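As a hedged illustration of the Huber loss and the Adam update rule described above, the following sketch fits a single scalar parameter to a hypothetical target of 0.5. The threshold `delta`, the decay rates `beta1` and `beta2`, and `eps` are commonly used defaults assumed here; only the learning rate of 0.01 corresponds to a value reported later in this paper.

```python
import math

def huber(pred, target, delta=1.0):
    """Huber loss: quadratic for small errors, linear for large ones."""
    err = pred - target
    if abs(err) <= delta:
        return 0.5 * err * err
    return delta * (abs(err) - 0.5 * delta)

def huber_grad(pred, target, delta=1.0):
    """Derivative of the Huber loss w.r.t. pred: the error clipped to +-delta."""
    return max(-delta, min(delta, pred - target))

def adam_step(theta, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (t counts steps from 1)."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment moving average
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment moving average
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Toy problem: move theta from 0 toward a hypothetical target of 0.5.
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    g = huber_grad(theta, 0.5)
    theta, m, v = adam_step(theta, g, m, v, t)
```

The clipped gradient is what makes the Huber loss less sensitive to outliers than a purely quadratic loss: a single large error cannot contribute more than `delta` to the update direction.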
3. Methodology
This section outlines the methodology for predicting key parameters in bolted joint design. Experimental data was processed and essential features were selected. A feed-forward neural network was implemented to model the complex relationships in bolted joint behavior. The fundamental methodology is illustrated in Figure 1.

Figure 1. Task Description
3.1. Data description and feature selection
To establish a rationale for the chosen data, it is essential to consider the factors influencing bolted joint performance. The friction at the bolt surface and the load distribution are influenced by the bolt size, with larger bolts promoting a more uniform load distribution and greater friction generation (VDI 2230, 2015). Additionally, the strength grade of a bolt plays a crucial role in determining its load capacity and how stress is distributed throughout the material (Kloos and Thomala, 2007). Furthermore, head torque and thread torque both depend directly on the respective friction coefficients. The remaining tightening torque contributes to the generation of preload force, with the applied torque magnitude directly affecting the load introduced to the bolt (VDI 2230, 2024). Based on these considerations, the selected input parameters are bolt size, strength grade, preload force, tightening torque, head torque, and thread torque, while the output parameters are those influencing the functional behavior of the bolted joint, specifically load capacity and head and thread friction coefficients.
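The link between the measured torques and the friction coefficients can be made concrete with the widely used simplified torque relations for metric threads with a 60° flank angle, as found in ISO 16047 and in VDI 2230. The sketch below is an assumption about how such an estimate can be obtained; the source does not state which formula was used for its empirical estimation. The numerical values, including the bearing diameter `D_km`, are illustrative and are not measurements from the study.

```python
import math

def thread_friction(thread_torque, preload, pitch, pitch_diameter):
    """Thread friction coefficient from
    T_thread = F * (P / (2*pi) + mu * d2 / (2 * cos(30 deg)))."""
    flank = pitch_diameter / (2.0 * math.cos(math.radians(30.0)))
    return (thread_torque / preload - pitch / (2.0 * math.pi)) / flank

def head_friction(head_torque, preload, bearing_diameter):
    """Head friction coefficient from T_head = F * mu * D_km / 2."""
    return 2.0 * head_torque / (preload * bearing_diameter)

# Illustrative M10 values in N, mm and N*mm (not data from the study).
F = 25_000.0            # preload force
P, d2 = 1.5, 9.026      # coarse pitch and pitch diameter of an M10 thread
D_km = 13.0             # hypothetical mean bearing diameter under the head

# Build torques from assumed coefficients, then recover the coefficients.
T_thread = F * (P / (2 * math.pi) + 0.12 * d2 / (2 * math.cos(math.radians(30))))
T_head = F * 0.10 * D_km / 2
mu_thread = thread_friction(T_thread, F, P, d2)
mu_head = head_friction(T_head, F, D_km)
```

Under these relations the head torque is proportional to the head friction coefficient, whereas the thread torque additionally contains a pitch term, `F * P / (2*pi)`, that generates preload and does not depend on friction.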
The experimental data were obtained from two sets of controls using the experimental setup described in Wettstein et al. (2020), each involving the continuous tightening of a bolted joint consisting of a bolt, two connected plates, a washer, and a nut. The initial dataset was gathered for an M10 8.8 ISO 4014 bolted connection, as described by Wettstein and Matthiesen (2020). The second dataset involved an M6 8.8 ISO 4017 bolted joint, obtained with the same methodology. Time series data consisting of preload force, tightening torque, head torque, and thread torque were measured. For the M6 bolt, a preload force of 8 kN was used, with 20 samples collected. For the M10 bolt, 9 samples were available at a preload force of 12.5 kN and 5 samples at a preload force of 25 kN. From these values, the friction coefficients of the head and threads and the remaining load capacity were further empirically estimated. The final dataset comprised 34 samples for training and testing the neural network.
3.2. Data preprocessing
The data was presented in tabular format, representing time series measurements of the variables. The dataset was divided into two subsets: a training (80%) and a test (20%) set. Input and output features were subsequently extracted according to predefined feature specifications. Normalization was applied as a scaling method to transform the input features, minimizing the influence of skewness and outliers while ensuring that the data remained within an appropriate range for the model. The scale derived from the training set was retained to ensure consistent application during subsequent stages of analysis. The preprocessing procedures applied to the test set mirrored those of the training set, with a critical distinction: any scaling of the test data was performed using the parameters obtained from the training set. This approach ensured that the model operated on data with a consistent scale, thereby preserving the validity of the training and evaluation processes.
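The following sketch illustrates the split-then-scale order described above. Min-max scaling, the random shuffle before splitting, and the three feature columns are assumptions made for illustration; the source only states that normalization was applied and that the training-set scale was reused for the test set.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

def fit_min_max(rows):
    """Column-wise minima and maxima, computed on the training set only."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def apply_min_max(rows, lows, highs):
    """Scale each column with previously fitted minima and maxima."""
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, lo, hi in zip(row, lows, highs)]
            for row in rows]

# Hypothetical rows: (bolt size in mm, preload in kN, tightening torque in N*m).
rows = ([[6.0, 8.0, 10.0 + i] for i in range(20)]
        + [[10.0, 12.5, 30.0 + i] for i in range(9)]
        + [[10.0, 25.0, 60.0 + i] for i in range(5)])

train, test = train_test_split(rows)
lows, highs = fit_min_max(train)                  # statistics from training data
train_scaled = apply_min_max(train, lows, highs)
test_scaled = apply_min_max(test, lows, highs)    # same statistics reused
```

Fitting the scale on the training set alone keeps information about the test set out of the preprocessing; as a consequence, scaled test values may fall slightly outside the interval [0, 1].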
3.3. Model architecture
The input layer contained six nodes, representing bolt size, strength grade, tightening torque, head torque, thread torque, and preload force. The neural network included two hidden layers, with neuron counts matching the input and output layers. The network's output layer had three nodes, corresponding to the output features: load capacity, head friction coefficient, and thread friction coefficient (Figure 2).
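To give a sense of the model size, the sketch below counts the learnable parameters of such a network. The hidden-layer widths of six and three neurons are one possible reading of "matching the input and output layers" and are an assumption; the source does not list them explicitly.

```python
# Layer widths: input, hidden 1, hidden 2, output (hidden widths are assumed).
layer_sizes = [6, 6, 3, 3]

def count_parameters(sizes):
    """Weights (n_in * n_out) plus biases (n_out) for each pair of layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

n_params = count_parameters(layer_sizes)   # 42 + 21 + 12
```

Under this assumption the network has 75 learnable parameters, which is small in absolute terms but not small relative to a dataset of 34 samples.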

Figure 2. Visualization of the Used Network Architecture and the Input and Output Features
A total of 136 models were trained throughout the process. Four representative models were selected to illustrate the learning and optimization process. These models were evaluated with variations in key hyperparameters, including the activation function, weight initialization method, number of epochs, scaling method, units for preload force and load capacity, and the number of samples used (Table 1). The activation function determined the scale of outputs, while the weight initialization method affected training speed, convergence, and final accuracy. The number of epochs ensured sufficient training to generalize without overfitting. The scaling method aligned the data within an optimal range, reducing outliers and improving convergence. Choosing appropriate units for preload force and load capacity ensured output compatibility, and a larger dataset size enhanced pattern recognition.
Table 1. Hyperparameters of Chosen Models

The initialization stage involved configuring all hyperparameters prior to the training process, establishing the foundation for learning. These included the activation function, initialization method for weights and biases, network width and depth, proportion of data used for training and testing, number of epochs, batch size, loss function, optimization algorithm, and learning rate. The neural network structure was created by defining the total number of layers and the number of nodes per layer.
The training process commenced following the initialization of the neural network. The training dataset was loaded and shuffled to ensure varied data processing during each iteration, improving efficiency. The loss function, optimization algorithm, and learning rate were then configured. Over a specified number of epochs, the training loop iterated through the data with a defined batch size. The network generated outputs from the input data in each iteration, calculated the loss, and updated learnable parameters based on the chosen optimization algorithm. Mean accuracy and loss values were computed and plotted after each epoch to track progress and evaluate performance throughout training. This process is described in Figure 3 by the blue and black path and by Algorithm 1.
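The structure of such a training loop (shuffle, iterate over mini-batches, compute the loss, update the parameters, record the mean loss per epoch) can be sketched as follows. To keep the example short, it fits a hypothetical one-input linear model with plain gradient descent on the mean squared error instead of the study's network, Huber loss, and Adam optimizer; the data, learning rate, and epoch count are illustrative assumptions.

```python
import random

def iterate_minibatches(samples, batch_size, rng):
    """Yield shuffled mini-batches that cover every sample once per epoch."""
    order = samples[:]
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]

def train(samples, epochs=200, batch_size=4, lr=0.1, seed=0):
    """Fit y = w * x + b by mini-batch gradient descent on the MSE."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    history = []                                  # mean loss per epoch
    for _ in range(epochs):
        epoch_loss, n_batches = 0.0, 0
        for batch in iterate_minibatches(samples, batch_size, rng):
            grad_w = grad_b = loss = 0.0
            for x, y in batch:
                err = (w * x + b) - y             # forward pass and error
                loss += err * err / len(batch)
                grad_w += 2.0 * err * x / len(batch)
                grad_b += 2.0 * err / len(batch)
            w -= lr * grad_w                      # parameter update
            b -= lr * grad_b
            epoch_loss += loss
            n_batches += 1
        history.append(epoch_loss / n_batches)
    return w, b, history

# Toy data on the line y = 0.5 * x + 0.1, with 27 samples and x in [0, 1].
data = [(i / 26.0, 0.5 * i / 26.0 + 0.1) for i in range(27)]
w, b, history = train(data)
```

With a batch size of 4, 27 training samples yield seven parameter updates per epoch, the last one on a batch of three samples.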

Figure 3. Visualization of the Implemented Workflow
The testing process involved loading the unshuffled test dataset for evaluation, limited to a single epoch and batch due to the smaller size of the test set, ensuring computational efficiency. The neural network computed outputs based on the input data and model accuracy was assessed for each data point, with the mean accuracy calculated for the overall model accuracy. Predictions were deemed accurate if they deviated by no more than 5% from the target values. To evaluate performance, error metrics such as Mean Absolute Error (MAE), Mean Square Error (MSE), and Root Mean Square Error (RMSE) were computed for each data point. A scatter plot was also generated, displaying predicted outputs against target values, where accurate predictions aligned along a linear function through the origin. This process is described in Figure 3 by the green and black path and by Algorithm 2.
Algorithm 1 Training Framework

Algorithm 2 Testing Framework

The evaluation and hyperparameter tuning process involved analyzing the accuracy and loss curves from training, as well as the error metrics and accuracy curves from testing, to assess the model's alignment with the data. Hyperparameters were iteratively adjusted until satisfactory accuracy was achieved. Once optimized, the trained model parameters were saved for future use.
4. Results
As the hyperparameter tuning process progressed, the accuracy of each model demonstrated an incremental improvement, accompanied by an increase in the elapsed time required for completing all steps, from preprocessing to training. The final accuracy attained for Model 4 was 95.24%, with a generation time of 90 seconds (Table 2).
Table 2. Results of Chosen Models

The models were trained using Adam with a learning rate of 0.01, aiming to optimize convergence speed and minimize loss. The Huber loss function was applied to balance sensitivity to outliers and improve stability. The training was conducted with a batch size of 4, enabling incremental updates to the model's parameters. The Xavier weight initialization method was used to assign unique weights within a specific range, influencing the learning process by optimizing training speed, convergence rate, final accuracy, and overall model efficiency. Normalization was applied as a scaling method to minimize the influence of skewness and outliers, ensuring that input data remained within an appropriate range for the model. This preprocessing step optimized the activation function's performance, facilitating faster convergence and improving control over the data fed into the model. The bias was initialized to a constant value of zero for each neuron. After the hyperparameter tuning, the training curves followed the typical pattern for the loss and accuracy, with the loss decreasing logarithmically almost to zero and the accuracy increasing exponentially to a maximum of around 100% over the epochs (Figure 4). The loss between the predicted and actual values was nearly negligible for all metrics (MAE, MSE, RMSE), except for one of the head friction coefficient values, which exhibited a considerable deviation from the other values, as shown in Figure 5. Furthermore, in the scatter plot of predicted against target values in Figure 6, the predicted values for all output parameters aligned closely with the reference line through the origin. The load capacity and thread friction coefficients achieved 100% accuracy, while the head friction coefficients reached 85.71%, with only one value deviating from the line by more than 5%. All other values are within the 5% range of the blue line.
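The reported accuracies are mutually consistent under the assumption of a test set of seven samples (20% of 34 samples, rounded), with a single head friction prediction outside the 5% band. The test set size is an inference and is not stated explicitly in the source. The arithmetic can be checked as follows:

```python
# Assumed test set size: 20 % of 34 samples, rounded to 7.
n_test = 7
hits_per_output = {"load capacity": 7, "head friction": 6, "thread friction": 7}

# Per-output accuracy in percent and the mean over the three outputs.
per_output = {name: 100.0 * hits / n_test
              for name, hits in hits_per_output.items()}
overall = sum(per_output.values()) / len(per_output)
```

Here 6/7 gives 85.71% for the head friction coefficient, and the mean of 100%, 85.71%, and 100% gives 95.24%, which matches the overall accuracy reported for Model 4.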

Figure 4. Results of Model 4: Training (a) Loss per Epoch (b) Accuracy per Epoch

Figure 5. Results of Model 4: Loss (a) MAE Loss (b) MSE Loss (c) RMSE Loss
5. Discussion
This study achieves significant improvements in predicting load capacity and friction coefficients in bolted joints by addressing challenges in model architecture and preprocessing. A primary challenge was the scale discrepancy between the output variables, specifically load capacity and friction coefficients. Rescaling the load capacity to MN effectively aligned the scales, minimized noise in the loss function, and enhanced the stability of the training process, which ultimately led to the accuracy achieved. Furthermore, the adoption of the sigmoid activation function, appropriately matched to the range of the output variables, facilitated balanced prediction accuracy across all parameters. These results underscore the importance of systematic preprocessing and model optimization in developing robust predictive tools for complex engineering systems.

Figure 6. Results of Model 4: Test (a) Scatter of the Load Capacity (b) Scatter of the Head Friction Coefficients (c) Scatter of the Thread Friction Coefficients
The dataset used in this study, consisting of only 34 samples, presents limitations in terms of generalizability and the stability of results across different bolt configurations. Since the model was trained using only two bolt sizes and three preload force cases, its direct transferability to other configurations is not ensured. While the model demonstrated high accuracy for certain cases, the limited diversity of the dataset raises concerns about overfitting, and the performance should, therefore, be interpreted with caution. To improve the generalizability of the model, future work will focus on expanding the data set to include a wider variety of bolt configurations and operational conditions, ensuring that the model can make reliable predictions for bolts in active use. Furthermore, we plan to explore techniques to generate synthetic data and augment the dataset through rule-based formulations of bolt behavior like VDI 2230 (2015) or FEA (VDI 2230, 2014). This data-driven approach could prove valuable, as the acquisition of empirical data is both costly and time-consuming.
Finally, we believe that the accessibility of the model's design enables users to achieve accurate predictions without requiring advanced mechanical expertise. With the methods described in Bengio et al. (2016) and Montesinos Lopez et al. (2022), the parameter evolution remains traceable, incorporating knowledge of the initial parameters. Hence, a mathematical model is established for each individual node, ensuring a structured representation. The final plausibility of the results can be verified using standardized reference tables, like VDI 2230 (2015). This versatility, coupled with the model's potential for time-efficient prediction of output variables, underscores its practical application. Nevertheless, expanding the dataset remains a critical next step to improve prediction consistency, generalizability, and robustness across a broader range of bolted joint configurations.
6. Conclusion
This study presented a hybrid approach integrating empirical data with a supervised feed-forward neural network to predict critical performance metrics of bolted joints, specifically load capacity and head and thread friction coefficients. The proposed methodology addresses key limitations of traditional approaches such as analytical simplifications, computational demands of numerical models, and the scalability challenges of empirical methods. By leveraging empirical data and a computationally efficient neural network architecture, the model captured the nonlinear relationships between input parameters, such as tightening torque and preload force, and output parameters, namely remaining load-bearing capacity and friction coefficients, with a predictive accuracy of 95.24%. The findings highlight the potential of data-driven modeling as a transformative tool for bolted joint design, particularly in its ability to combine experimental rigor with advanced predictive capabilities. However, limitations such as the small dataset size and the reliance on simplified loading conditions pose challenges to the model's generalizability across diverse scenarios. These limitations suggest the need for future research to expand the dataset, incorporate varied load cases (e.g., shear and combined loading), and explore alternative neural network architectures or hybrid machine learning models to enhance robustness and accuracy. This research bridges empirical experimentation and predictive modeling, offering a novel approach to bolted joint analysis. It integrates advanced computational techniques into engineering design, enabling more reliable and efficient optimization of bolted joint performance.
Acknowledgements
This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - "Functional modeling of bolted joints under uncertain product conditions for remanufacturing" - with the project number 525034540.