MVFD-Net: multi-view fusion detection network for occluded underwater dam cracks

Yukai Wu; Xiaochen Qin; Lei Cai

doi:10.1017/S0263574725101835

MVFD-Net: multi-view fusion detection network for occluded underwater dam cracks

Published online by Cambridge University Press: 24 June 2025

and

Yukai Wu: Affiliation:
School of Mechanical and Electrical Engineering, Henan University of Technology, ZHengzhou, PR China
Xiaochen Qin: Affiliation:
School of Mechanical and Electrical Engineering, Henan University of Technology, ZHengzhou, PR China
Lei Cai*: Affiliation:
School of Artificial Intelligence, Henan Institute of Science and Technology, Xinxiang, PR China
*: Corresponding author: Lei Cai; Email: cailei2014@126.com

Article contents

Abstract
References

Get access

Rights & Permissions

Abstract

Detecting cracks in underwater dams is crucial for ensuring the quality and safety of the dam. However, underwater dam cracks are easily obscured by aquatic plants. Traditional single-view visual inspection methods cannot effectively extract the feature information of the occluded cracks, while multi-view crack images can extract the occluded target features through feature fusion. At the same time, underwater turbulence leads to nonuniform diffusion of suspended sediments, resulting in nonuniform flooding of image feature noise from multiple viewpoints affecting the fusion effect. To address these issues, this paper proposes a multi-view fusion network (MVFD-Net) for crack detection in occluded underwater dams. First, we propose a feature reconstruction interaction encoder (FRI-Encoder), which interacts the multi-scale local features extracted by the convolutional neural network with the global features extracted by the transformer encoder and performs the feature reconstruction at the end of the encoder to enhance the feature extraction capability and at the same time in order to suppress the interference of the nonuniform scattering noise. Subsequently, a multi-scale gated adaptive fusion module is introduced between the encoder and the decoder for feature gated fusion, which further complements and recovers the noise flooding detail information. Additionally, this paper designs a multi-view feature fusion module to fuse multi-view image features to restore the occluded crack features and achieve the detection of occluded cracks. Through extensive experimental evaluations, the MVFD-Net algorithm achieves excellent performance when compared with current mainstream algorithms.

Keywords

semantic segmentation multi-view fusion network feature nonuniform scattering noise reconstruction and fusion gated adaptive fusion

Information

Type: Research Article
Information: Robotica , Volume 43 , Issue 7 , July 2025 , pp. 2480 - 2503

DOI: https://doi.org/10.1017/S0263574725101835 [Opens in a new window]
Copyright: © The Author(s), 2025. Published by Cambridge University Press

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Article purchase

Temporarily unavailable

References

Mucolli, L., Krupinski, S., Maurelli, F., Mehdi, S. A. and Mazhar, S., “Detecting cracks in underwater concrete structures: An unsupervised learning approach based on local feature clustering,” MTS/IEEE Seattle 23(1), 1–8 (2019).Google Scholar

Chen, D., Huang, B. and Kang, F., “A review of detection technologies for underwater cracks on concrete dam surfaces,” Appl. Sci. 13(6), 3564–3578 (2023).CrossRef Google Scholar

Jian, M., Yang, N., Tao, C., Zhi, H. and Luo, H., “Underwater object detection and datasets: A survey,” Intelligent Marine Technology and Systems 2(1), 9–22 (2024).CrossRef Google Scholar

Wu, Y., Li, S., Zhang, J., Li, Y., Li, Y. and Zhang, Y., “Dual attention transformer network for pixel-level concrete crack segmentation considering camera placement,” Autom. Constr. 157(3), 105166–105178 (2024).CrossRef Google Scholar

Beyene, D. A., Tran, D. Q., Maru, M. B., Kim, T. and Park, S., “Unsupervised domain adaptation-based crack segmentation using transformer network,” J. Build. Eng. 15(2), 107889–107990 (2023).CrossRef Google Scholar

Zhang, X., Bai, C. and Kpalma, K., “OMCBIR: Offline mobile content-based image retrieval with lightweight CNN optimization,” Displays 76(6), 102355–102368 (2021).CrossRef Google Scholar

Yang, Z., Zhu, L., Wu, Y. and Yang, Y., “Gated Channel Transformation for Visual Recognition,” 33st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, Seattle, WA, USA, 2020), pp. 11791–11800.Google Scholar

Wang, J., Zeng, Z., Sharma, P. K., Alfarraj, O., Tolba, A., Zhang, J. and Wang, L., “Dual-path network combining CNN and transformer for pavement crack segmentation,” Autom. Constr. 158(64), 105217–105239 (2024).CrossRef Google Scholar

Zhu, Z., Huang, S., Xie, J., Meng, Y., Wang, C. and Zhou, F., “A refined robotic grasp detection network based on coarse-to-fine feature and residual attention,” Robotica. 43(2), 1–18 (2024).Google Scholar

Ke, L., Tai, Y. W. and Tang, C. K., “Deep Occlusion-Aware Instance Segmentation with Overlapping Bilayers,” 34st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, Nashville, TN, USA, 2021), pp. 4018–4027 Google Scholar

Yuan, X., Kortylewski, A., Sun, Y. and Yuille, A., “Robust Instance Segmentation Through Reasoning About Multi-Object Occlusion,” 34st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, Nashville, TN, USA, 2021), pp. 11141–11150.Google Scholar

Zhang, T., Dai, J., Song, W., Zhao, R. and Zhang, B., “OSLPNet: A neural network model for street lamp post extraction from street view imagery,” Expert Syst. Appl. 231(25), 120764–120782 (2023).CrossRef Google Scholar

Gan, H., Menegon, F., Sun, A., Scollo, A., Jiang, Q., Xue, Y. and Norton, T., “Peeking into the unseen: Occlusion-resistant segmentation for preweaning piglets under crushing events,” Comput. Electron. Agric. 219(23), 108683–108693 (2024).CrossRef Google Scholar

Wang, H., Zhu, S., Chen, L., Li, Y. and Cai, Y., “OccludedInst: An efficient instance segmentation network for automatic driving occlusion scenes,” IEEE Trans. Emerg. Top. Comput. Intell. 10(12), 3414948–3414962 (2024).Google Scholar

Yan, X., Wang, F., Liu, W., Yu, Y., He, S. and Pan, J., “Visualizing the Invisible: Occluded Vehicle Segmentation and Recovery,” 32st IEEE/CVF International Conference on Computer Vision (ICCV) (IEEE, Seoul, Korea, 2019), pp. 7617–7626.Google Scholar

Guo, Y., Yu, H., Xie, S., Ma, L., Cao, X. and Luo, X., “DSCA: A dual semantic correlation alignment method for domain adaptation object detection,” Pattern Recognit 150(12), 110329–110345 (2024).CrossRef Google Scholar

Dong, N., Yan, S., Tang, H., Tang, J. and Zhang, L., “Multi-view information integration and propagation for occluded person re-identification,” Inf. Fusion 104(22), 102201–102221 (2024).CrossRef Google Scholar

Xia, Z., Liao, M., Di, S., Zhao, Y., Liang, W. and Xiong, N., “Automatic liver segmentation from CT volumes based on multiview information fusion and condition random fields,” Opt. Laser Technol. 179(65), 111298–111320 (2024).CrossRef Google Scholar

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S. and Bengio, Y., “Generative Adversarial Nets,” 27th International Conference on Neural Information Processing Systems-Volume 2 (NIPS’14) (NIPS, Cambridge, MA, USA, 2014), pp. 2672–2680.Google Scholar

Isola, P., Zhu, J. Y., Zhou, T. and Efros, A., “Image-to-Image Translation with Conditional Adversarial Networks,” 30st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, Honolulu, HI, USA, 2017), pp. 5967–5976.Google Scholar

Zhou, T., Tulsiani, S., Sun, W., Malik, J. and Efros, A., “View Synthesis by Appearance Flow,” Computer Vision–ECCV 2016 (Springer, Amsterdam, The Netherlands, 2016), pp. 286–301.CrossRef Google Scholar

Yang, C., Wang, K., Wang, Y., Dou, Q., Yang, X. and Shen, W., “Efficient deformable tissue reconstruction via orthogonal neural plane,” IEEE Trans. Med. Imaging 43(11), 3211–3223 (2024).CrossRef Google Scholar PubMed

Choy, C. B., Xu, D., Gwak, J., Chen, K. and Savarese, S., “3D-R2N2: A Unified Approach for Single and Multi-View 3D Object Reconstruction,” Computer Vision–ECCV 2016 (Springer, Amsterdam, The Netherlands, 2016), pp. 628–644.CrossRef Google Scholar

Yang, J., Zheng, W. S., Yang, Q., Chen, Y. C. and Tian, Q., “Spatial-temporal graph convolutional network for video-based person re-identification,” 33st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, Seattle, WA, USA, 2022), pp. 3289–3299.Google Scholar

Zhang, X., Zheng, Z., Gao, D., Zhang, B., Yang, Y. and Chua, T., “Multi-view consistent generative adversarial networks for compositional 3D-aware image synthesis,” Int. J. Comput. Vis. 131(26), 2219–2242 (2023).CrossRef Google Scholar

Arooj, S., Altaf, S., Ahmad, S., Mahmoud, H. and Mohamed, A. S. N., “Enhancing sign language recognition using CNN and SIFT: A case study on Pakistan sign language,” J. King Saud Univ.-Comput. Inf. Sci. 36(23), 101934–101962 (2024).CrossRef Google Scholar

Xu, H., Yuan, J. and Ma, J., “MURF: Mutually reinforcing multi-modal image registration and fusion,” IEEE Trans. Pattern Anal. Mach. Intell. 45(23), 12148–12166 (2023).CrossRef Google Scholar PubMed

Hua, Y., Huang, X., Li, H. and Cao, X., “Mobile robot tracking control based on lightweight network,” Robotica, 42(2), 1–19 (2025).Google Scholar

Pan, J., Jia, J. and Cai, L., “Global enhancement network underwater archaeology scene parsing method,” Robotica 39(12), 3541–3564 (2023).CrossRef Google Scholar

Diao, S., Su, J., Yang, C., Zhu, W., Xiang, D., Chen, X. and Shi, F., “Classification and segmentation of OCT images for age-related macular degeneration based on dual guidance networks,” Biomed. Signal Process. Control 84(35), 104810–104830 (2023).CrossRef Google Scholar

Liu, Y., Wang, H., Chen, Z., Huangliang, K. and Zhang, H., “TransUNet+: Redesigning the skip connection to enhance features in medical image segmentation,” Knowl.-Based Syst. 256(23), 109872–109889 (2022).CrossRef Google Scholar

Liu, H., Yang, J., Miao, X., Mertz, C. and Kong, H., “Crackformer network for pavement crack segmentation,” IEEE Trans. Intell. Transp. Syst. 24(14), 9240–9252 (2023).CrossRef Google Scholar

Li, K., Yang, J., Ma, S., Wang, B. and Wang, S., “Rethinking lightweight convolutional neural networks for efficient and high-quality pavement crack detection,” IEEE Trans. Intell. Transp. Syst. 23(16), 237–250 (2024).CrossRef Google Scholar

Badrinarayanan, V., Kendall, A. and Cipolla, R., “Segnet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481–2495 (2017).CrossRef Google Scholar PubMed

He, Z., Cao, L., Luo, J., Xu, X., Tang, J., Xu, J. and Chen, Z., “UISS-Net: Underwater image semantic segmentation network for improving boundary segmentation accuracy of underwater images,” Aquac. Int. 32(12), 5625–5638 (2024).CrossRef Google Scholar

Article contents

MVFD-Net: multi-view fusion detection network for occluded underwater dam cracks

Abstract

Keywords

Information

Access options

Article purchase

Temporarily unavailable

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests