
A deep learning method to predict ankle joint moment during walking at different speeds with ultrasound imaging: A framework for assistive devices control

Published online by Cambridge University Press: 06 September 2022

Qiang Zhang*
Affiliation:
Joint Department of Biomedical Engineering, North Carolina State University, Raleigh, NC, USA; Joint Department of Biomedical Engineering, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Natalie Fragnito
Affiliation:
Joint Department of Biomedical Engineering, North Carolina State University, Raleigh, NC, USA; Joint Department of Biomedical Engineering, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Xuefeng Bao
Affiliation:
Biomedical Engineering Department, University of Wisconsin-Milwaukee, Milwaukee, WI, USA
Nitin Sharma
Affiliation:
Joint Department of Biomedical Engineering, North Carolina State University, Raleigh, NC, USA; Joint Department of Biomedical Engineering, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
*Author for correspondence: Qiang Zhang, Email: qzhang25@ncsu.edu

Abstract

Robotic assistive or rehabilitative devices are promising aids for people with neurological disorders, as they help regain normative functions for both upper and lower limbs. However, it remains challenging to accurately estimate human intent or residual efforts non-invasively when using these robotic devices. In this article, we propose a deep learning approach that uses brightness-mode (B-mode) ultrasound (US) imaging of skeletal muscles to predict the ankle joint net plantarflexion moment during walking. The designed structure of the customized deep convolutional neural networks (CNNs) guarantees the convergence and robustness of the deep learning approach. We investigated the influence of the US imaging region of interest (ROI) on the net plantarflexion moment prediction performance. We also compared the CNN-based moment prediction performance using B-mode US imaging with that using surface electromyography (sEMG) spectrum imaging of the same ROI size. Experimental results from eight young participants walking on a treadmill at multiple speeds verified the improved accuracy of the proposed US imaging + deep learning approach for net joint moment prediction. With the same CNN structure, US imaging significantly reduced the normalized prediction root mean square error by 37.55% ($ p $ < .001) and increased the prediction coefficient of determination by 20.13% ($ p $ < .001) compared to sEMG spectrum imaging. The findings show that the US imaging + deep learning approach personalizes the assessment of human joint voluntary effort, which can be incorporated into assistive or rehabilitative devices to improve clinical performance based on the assist-as-needed control strategy.
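
As a point of reference for the abstract's headline metrics, the normalized root mean square error (N-RMSE) and the coefficient of determination $ {R}^2 $ can be computed as in the following minimal Python sketch. Normalizing the RMSE by the peak plantarflexion moment follows the description accompanying Figure 10; the function names here are illustrative, not the authors'.

```python
import numpy as np

def n_rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """RMSE normalized to the peak of the measured moment (per the
    Figure 10 description; other normalizers, e.g., the range, exist)."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return float(rmse / np.max(np.abs(y_true)))

def r_squared(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination between measurement and prediction."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)
```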

Information

Type
Research Article
Creative Commons
Creative Commons License: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press

Table 1. Anthropometric characteristics (mean and one standard deviation [SD]) of eight young participants


Figure 1. Experimental setup for treadmill walking. (a) Illustration of the treadmill walking experimental setup. Walking was performed at speeds of 0.50, 0.75, 1.00, 1.25, and 1.50 m/s. (1) Instrumented treadmill containing two split belts and in-ground force plates. (2) Participant's lower body with 39 retro-reflective markers attached for kinematics measurements. (3) Three sEMG channels to measure the activity of the lateral gastrocnemius (LGS), medial gastrocnemius (MGS), and soleus (SOL) muscles. (4) An ultrasound transducer imaging both the LGS and SOL muscles within the same plane. (5) The ultrasound imaging machine for collecting the ultra-fast radio frequency data. (6) A computer screen showing brightness-mode (B-mode) US imaging. (7) A computer screen showing live markers and segment links of the participant. (8) Twelve motion capture cameras to track the markers' trajectories. (b) A representative B-mode US image with both the LGS and SOL muscles in the same plane, as indicated by the upper and lower polygons. The lateral direction is the distance from the US transducer's longitudinal center, and the axial direction is the depth from the skin surface. The three red dashed squares represent the three regions of interest, with sizes of 100 × 100, 200 × 200, and 300 × 300 pixels.
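
As a hedged illustration of the ROI cropping described in Figure 1(b), the sketch below extracts square ROIs from a single B-mode frame. Centering the ROIs over the LGS/SOL region is our assumption for illustration; the caption does not state how the red dashed squares were placed.

```python
import numpy as np

def crop_roi(b_mode: np.ndarray, center: tuple, size: int) -> np.ndarray:
    """Crop a square ROI of size x size pixels around center=(row, col).

    b_mode is a 2D array (axial x lateral) of B-mode pixel intensities;
    the centered placement is an assumption, not the authors' procedure.
    """
    row, col = center
    half = size // 2
    return b_mode[row - half : row + half, col - half : col + half]

# Example: three nested ROIs from one frame, as in Figure 1(b).
frame = np.random.rand(600, 600)  # stand-in for one B-mode frame
rois = [crop_roi(frame, (300, 300), s) for s in (100, 200, 300)]
```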


Figure 2. Data collection, pre-processing, and schematic illustration of the proposed deep CNN model calibration. The input to the CNN model is either the time sequence of cropped US images with different regions of interest or the time sequence of sEMG spectrum images. The designed CNN model comprises 31 layers: one image input layer at the beginning, seven sets of intermediate layers, one fully connected layer, and one regression output layer at the end. Each intermediate set contains one 2D convolution layer, one batch normalization layer, one rectified linear unit (ReLU) layer, and one 2D average pooling layer.
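
The layer ordering in this caption maps naturally onto a standard deep learning framework. Below is a minimal PyTorch sketch of the described 31-layer stack; the filter counts and kernel sizes are our assumptions, since the caption specifies only the layer types (the terminology, e.g., "regression output layer," suggests MATLAB's Deep Learning Toolbox, so this is a translation, not the authors' code).

```python
import torch.nn as nn

def make_cnn(in_channels: int = 1, num_blocks: int = 7) -> nn.Sequential:
    """Seven blocks of Conv2d -> BatchNorm2d -> ReLU -> AvgPool2d,
    followed by a fully connected layer producing one scalar (the net
    plantarflexion moment), as listed in the Figure 2 caption.
    Channel counts and kernel sizes are assumed for illustration."""
    layers, channels = [], in_channels
    for i in range(num_blocks):
        out_channels = 8 * 2 ** min(i, 4)  # assumed schedule: 8 -> ... -> 128
        layers += [
            nn.Conv2d(channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=2),
        ]
        channels = out_channels
    layers += [nn.Flatten(), nn.LazyLinear(1)]  # regression output
    return nn.Sequential(*layers)
```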


Figure 3. The architecture of the designed deep CNN for the US image and sEMG spectrum image processing. The output size of each layer is based on the input US image’s ROI size of 300 × 300 pixels.
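
Continuing the sketch above, the per-block output sizes for a 300 × 300 ROI can be traced with a dummy tensor. The printed sizes reflect our assumed convolution and pooling settings, so they need not match Figure 3 exactly.

```python
import torch
import torch.nn as nn

model = make_cnn()  # the sketch model from the Figure 2 example
x = torch.zeros(1, 1, 300, 300)
for layer in model:
    x = layer(x)
    if isinstance(layer, nn.AvgPool2d):
        # spatial size halves (floor) at each pool:
        # 300 -> 150 -> 75 -> 37 -> 18 -> 9 -> 4 -> 2
        print(tuple(x.shape))
```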


Figure 4. Individual US imaging frames in CNN training, validation, and prediction procedures across five walking speeds.
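
A hedged sketch of assembling per-speed training, validation, and prediction sets as depicted in Figure 4. The 70/15/15 split ratio and the shuffling are our assumptions for illustration; the figure reports the actual per-speed frame counts, and the paper held out whole stance cycles for prediction rather than random frames.

```python
import numpy as np

def split_frames_by_speed(frames_per_speed: dict, train=0.7, val=0.15, seed=0):
    """Partition US frames into training/validation/prediction sets
    within each walking speed, so every speed appears in each set."""
    rng = np.random.default_rng(seed)
    splits = {"train": [], "val": [], "test": []}
    for speed, frames in frames_per_speed.items():
        idx = rng.permutation(len(frames))
        n_tr, n_va = int(train * len(idx)), int(val * len(idx))
        splits["train"] += [frames[i] for i in idx[:n_tr]]
        splits["val"] += [frames[i] for i in idx[n_tr:n_tr + n_va]]
        splits["test"] += [frames[i] for i in idx[n_tr + n_va:]]
    return splits
```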


Figure 5. The convergence of the US imaging-based net plantarflexion moment RMSE and loss as the number of iterations increases during the CNN training and validation procedures. The data set shown here is from participant Sub08.
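
The quantities monitored in Figure 5 correspond to a standard regression training loop. A minimal sketch follows, assuming an Adam optimizer with MSE loss; neither choice is stated in the caption.

```python
import torch
import torch.nn as nn

model = make_cnn()                    # the sketch model from the Figure 2 example
model(torch.zeros(1, 1, 300, 300))    # dry run to initialize the lazy layer
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed settings
loss_fn = nn.MSELoss()

def train_step(images: torch.Tensor, moments: torch.Tensor):
    """One iteration: returns the loss and RMSE tracked in Figure 5."""
    optimizer.zero_grad()
    pred = model(images).squeeze(1)   # (batch, 1) -> (batch,)
    loss = loss_fn(pred, moments)
    loss.backward()
    optimizer.step()
    return loss.item(), loss.detach().sqrt().item()
```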


Figure 6. Ankle joint net plantarflexion moment prediction time sequences for each participant, using US images with an ROI of 100 × 100 pixels and the deep learning approach. The red solid and blue dashed curves represent the measurements from inverse dynamics and the predictions from the CNN model, respectively. For each walking speed, three walking stance cycles are included for prediction; therefore, 15 periodic curves are shown for each participant (in the speed order of 0.50, 0.75, 1.00, 1.25, and 1.50 m/s).


Figure 7. Ankle joint net plantarflexion moment prediction as a percentage of the stance cycle (0% at heel-strike and 100% at toe-off), using US images with an ROI of 100 × 100 pixels and the deep learning approach. The red and blue center curves and shaded areas represent the mean and standard deviation values (three stance cycles per curve) of the ground truth and the CNN model-based prediction, respectively. Each row of subplots shows data from one participant, while each column shows one of the five walking speeds.
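
For Figure 7's percent-stance axis, each stance cycle must be resampled to a common length before averaging. A minimal sketch, assuming linear interpolation to 101 points (a common biomechanics convention, not stated in the caption):

```python
import numpy as np

def normalize_stance(moment: np.ndarray, n_points: int = 101) -> np.ndarray:
    """Resample one stance cycle (heel-strike to toe-off) onto a
    0-100% axis so cycles of different durations can be averaged."""
    src = np.linspace(0.0, 100.0, num=len(moment))
    dst = np.linspace(0.0, 100.0, num=n_points)
    return np.interp(dst, src, moment)
```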


Figure 8. Scatter plots between the net plantarflexion moment benchmark and CNN-based prediction from Sub01 by using ROIs of 100 × 100 (left), 200 × 200 (middle), and 300 × 300 (right) pixels.


Table 2. Results of linear regression analysis between the net plantarflexion moment CNN-based prediction and the ground truth with different ROIs, including the mean, standard error (SE), and p-value of the slope and y-intercept coefficients, as well as $ {R}^2 $ values
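
The regression quantities listed in Table 2 map directly onto an ordinary least-squares fit. A sketch using scipy with stand-in data follows; note that scipy's linregress reports the p-value for the slope only, so the intercept's p-value would require a fuller OLS tool such as statsmodels.

```python
import numpy as np
from scipy import stats

ground_truth = np.random.rand(500)                      # stand-in data
prediction = ground_truth + 0.05 * np.random.randn(500)

res = stats.linregress(ground_truth, prediction)
print(res.slope, res.stderr)                # slope and its SE
print(res.intercept, res.intercept_stderr)  # y-intercept and its SE
print(res.rvalue ** 2, res.pvalue)          # R^2 and the slope's p-value
```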


Figure 9. Individual net plantarflexion moment prediction RMSE and normalized RMSE (N-RMSE) values of 15 stance cycles across five walking speeds, using the trained personalized CNN model with different ROIs.


Figure 10. CNN-based net plantarflexion moment prediction results summary across eight participants. Left – prediction RMSE values normalized to the corresponding peak plantarflexion moment; Middle – $ {R}^2 $ values between the net plantarflexion moment prediction and the ground truth observation from inverse dynamics; Right – prediction time cost for each US image frame. Asterisks *, **, and *** represent statistically significant differences at p < .05, p < .01, and p < .001, respectively.
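
As a hedged sketch of the significance markers in Figure 10, the example below runs a paired t-test across eight participants' N-RMSE values. The choice of a paired t-test is our assumption; the caption does not name the statistical test used.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nrmse_us = rng.normal(0.06, 0.01, size=8)     # stand-in per-participant values
nrmse_semg = rng.normal(0.09, 0.015, size=8)  # stand-in comparison condition

t_stat, p_value = stats.ttest_rel(nrmse_us, nrmse_semg)
stars = ("***" if p_value < .001 else
         "**" if p_value < .01 else
         "*" if p_value < .05 else "n.s.")
print(t_stat, p_value, stars)
```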


Figure 11. Comparative results of scatter plots between the net plantarflexion moment benchmark and CNN-based prediction: (a) with sEMG spectrum image on Sub01; (b) with US image on Sub01; (c) with sEMG spectrum image on Sub02; (d) with US image on Sub02.


Figure 12. Comparative results of the net plantarflexion moment prediction N-RMSE and $ {R}^2 $ values using the proposed deep CNN architecture with US images and sEMG spectrum images of the same ROI size.


Table 3. Comparison of prediction N-RMSE values among different studies

Supplementary material: PDF

Zhang et al. supplementary material (PDF, 113.8 KB)