Segmentation method of U-net sheet metal engineering drawing based on CBAM attention mechanism

Published online by Cambridge University Press:  28 May 2025

ZhiWei Song
Affiliation:
Ocean College, Zhejiang University, Zhoushan, Zhejiang, 316021, China
Hui Yao
Affiliation:
School of Mechatronic Engineering, Xi’an Technological University, 710032, China
Dan Tian
Affiliation:
School of Mechatronic Engineering, Xi’an Technological University, 710032, China
Gaohui Zhan
Affiliation:
School of Mechatronic Engineering, Xi’an Technological University, 710032, China
Yajing Gu*
Affiliation:
Ocean College, Zhejiang University, Zhoushan, Zhejiang, 316021, China
*Corresponding author: Yajing Gu; Email: guyj90@zju.edu.cn

Abstract

This paper proposes an improved U-net model for segmenting welding engineering drawings. The model targets the automatic segmentation and extraction of sheet metal engineering drawings during mechanical manufacturing, with the goal of improving the cutting efficiency of sheet metal parts. To build a high-precision segmentation model, this paper adds an attention mechanism based on the Convolutional Block Attention Module (CBAM) to the U-net skip (jump) structure. This paper also designs an encoder skip structure with vertical double pooling convolution, which fuses the features obtained by max pooling and convolution in the high-dimensional encoder with the features obtained by average pooling and convolution in the low-dimensional encoder. The method improves the model's ability to extract global semantic features and reduces the dimensionality difference between the low-dimensional encoder and the high-dimensional decoder. With Vgg16 as the backbone network, experiments on the welding engineering drawing segmentation task show that the method achieves IoU, mAP, and Accu values of 84.72%, 86.84%, and 99.42%, respectively. These are 22.10, 19.09, and 0.05 percentage points higher than the traditional U-net model, which indicates good practical value for engineering applications.
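The reported gains imply baseline scores for the traditional U-net. The short sketch below only subtracts the stated gains from the stated scores; the variable names are illustrative and the derived baseline values are arithmetic consequences of the abstract's figures, not numbers quoted from the results tables.

```python
# Scores of the proposed method (percent) and gains over the traditional
# U-net (percentage points), both as stated in the abstract.
proposed = {"IoU": 84.72, "mAP": 86.84, "Accu": 99.42}
gain = {"IoU": 22.10, "mAP": 19.09, "Accu": 0.05}

# Implied baseline = proposed score minus the reported gain.
baseline = {name: round(proposed[name] - gain[name], 2) for name in proposed}

for name in proposed:
    print(f"{name}: U-net {baseline[name]:.2f}% -> proposed {proposed[name]:.2f}%")
```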

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided that no alterations are made and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use and/or adaptation of the article.
Copyright
© The Author(s), 2025. Published by Cambridge University Press
Figure 1. Comparison of our method (d) with the skip structure schemes of other models: (a) skip connection structure of the original U-net model; (b) U-net model with a residual network used as the skip connection; (c) U-net model with recurrent structures added to both the encoder and decoder. Dashed lines denote skip connections.

Figure 2. CBAM architecture. The module consists of a channel attention module followed by a spatial attention module. The encoder feeds the double-pooled and convolved features into this module, and the CBAM generates global features carrying channel and spatial location information. (n) and (n + 1) denote the encoders at different vertical layers of the U-net.

Figure 3. Overall architecture of the improved model proposed in this paper (input image size 512 × 512 pixels). Green squares represent average pooling followed by two convolution operations (abbreviated as ‘Ave’ in the ablation experiments), orange squares represent the CBAM attention mechanism module, and indigo squares represent feature cluster integration. Arrows of different colors indicate different operations.

Figure 4. Principle of the attention mechanism. X is fed to the multi-layer perceptron (MLP), and the feature X’ carrying channel attention information is generated through feature cluster multiplication and a Softmax operation. The spatial attention module then applies a similar operation to X’ and outputs the feature X”, which carries both channel and spatial information.
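The two-stage flow X → X’ → X” can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it follows the commonly used CBAM formulation (average- and max-pooled descriptors, a shared two-layer MLP, a 7 × 7 spatial convolution) and uses a sigmoid gate, whereas the caption mentions Softmax. The reduction ratio, tensor sizes, and random weights are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """x: (C, H, W). w1: (C//r, C) and w2: (C, C//r) form the shared MLP."""
    avg_desc = x.mean(axis=(1, 2))                 # (C,) average-pooled descriptor
    max_desc = x.max(axis=(1, 2))                  # (C,) max-pooled descriptor
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)   # shared MLP with ReLU
    weights = sigmoid(mlp(avg_desc) + mlp(max_desc))
    return x * weights[:, None, None]              # X' = channel weights * X

def spatial_attention(x, kernel):
    """x: (C, H, W). kernel: (2, k, k) convolution weights, k odd."""
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])   # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    h, w = x.shape[1:]
    attn = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            attn[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return x * sigmoid(attn)[None, :, :]           # X'' = spatial weights * X'

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 16, 4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1

x_prime = channel_attention(x, w1, w2)
x_double_prime = spatial_attention(x_prime, kernel)
print(x_double_prime.shape)
```

Because both gates lie strictly between 0 and 1, the module only rescales the input features and leaves the tensor shape unchanged, which is what lets it sit on a skip connection without altering the decoder interface.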

Table 1. Training cost analysis and model mean accuracy comparison between Vgg16 and ResNet50 with different model structures

Figure 5. Mechanism by which the improved U-net model supports cutting of specific sheet metal contours. Segmentation extracts the specific units of the welding engineering drawing, and the cutting device, relying on vision, automatically cuts the corresponding parts from the whole sheet.

Table 2. Sources of the welding engineering datasets and dataset sizes after data augmentation

Table 3. Hyperparameter values used for all training runs

Table 4. Performance comparison of the loss functions used by CBAM-U-net to handle imbalanced datasets

Figure 6. K-fold cross-validation process used for the welding engineering drawing dataset. The ratio of training set to validation set is 4:1, with K = 5.
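With K = 5, each fold holds out one fifth of the data for validation and trains on the remaining four fifths, which gives the 4:1 ratio. The sketch below shows a plain (non-stratified) split; the function name, the sample count of 100, and the fixed seed are hypothetical, and any stratification used by the authors is not reproduced here.

```python
import random

def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, val_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val

# Hypothetical dataset of 100 drawings.
splits = list(k_fold_splits(100, k=5))
for fold_id, (train, val) in enumerate(splits):
    print(f"fold {fold_id}: {len(train)} training / {len(val)} validation samples")
```

Every sample appears in exactly one validation fold, so averaging the five validation scores uses each drawing once for evaluation.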

Table 5. Technical requirements detail sheet

Figure 7. Loss curves during training. The model reaches convergence by the 50th epoch. At epoch 50, the network starts to unfreeze the evaluation model; the model is reloaded from its original form, so the resulting fluctuations have no effect.

Figure 8. Visual comparison of segmentation results from different methods. The original input is a welded structure drawing, and the second column is the ground truth mask. ‘Ave’ denotes the average pooling and convolution operations (green squares in Figure 3), and CBAM denotes the attention module (orange squares in Figure 3).

Table 6. Comparison of welding engineering drawing segmentation by different methods. ‘Ave’ denotes the average pooling and convolution operations, and CBAM denotes the attention module

Figure 9. Loss curve of the continuous-convolution CNN. The model reaches convergence after 45 epochs of training.

Figure 10. Visualization of segmentation results for successive convolution operations. ‘Ave’ denotes the average pooling and convolution operations (green squares in Figure 3), and CBAM denotes the attention module (orange squares in Figure 3).

Table 7. Comparison of different methods on the segmentation of specific welding engineering units. ‘Ave’ denotes the average pooling and convolution operations, and CBAM denotes the attention mechanism

Table 8. Experimental comparison of the method of this paper with current state-of-the-art (SOTA) segmentation techniques on different sets of sheet metal welding engineering drawings

Figure 11. Comparison of confusion matrix results between models: (a) U-net, (b) CBAM-U-net (ours), (c) CNN, (d) CBAM-CNN.
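For a binary mask, IoU and pixel accuracy can both be read off the four confusion matrix counts. The sketch below uses the standard definitions; the pixel counts are hypothetical and chosen only to show that accuracy can stay very high when background pixels dominate, even though IoU is much lower. Whether this accounts for the small Accu differences reported in this paper is not stated in the source.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (IoU, pixel accuracy) for the foreground class."""
    iou = tp / (tp + fp + fn)                   # overlap / union of foreground
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # fraction of correct pixels
    return iou, accuracy

# Hypothetical counts for a mostly-background drawing of 100,000 pixels.
tp, fp, fn, tn = 600, 200, 200, 99000
iou, accuracy = binary_metrics(tp, fp, fn, tn)
print(f"IoU = {iou:.3f}, accuracy = {accuracy:.3f}")
```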