
Precision of artificial intelligence in paediatric cardiology multimodal image interpretation

Published online by Cambridge University Press:  11 November 2024

Michael N. Gritti*
Affiliation:
Division of Cardiology, The Labatt Family Heart Centre, The Hospital for Sick Children, Toronto, Ontario, Canada; Department of Paediatrics, University of Toronto, Toronto, Ontario, Canada; Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
Rahil Prajapati
Affiliation:
Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
Dolev Yissar
Affiliation:
Division of Cardiology, The Labatt Family Heart Centre, The Hospital for Sick Children, Toronto, Ontario, Canada
Conall T. Morgan
Affiliation:
Division of Cardiology, The Labatt Family Heart Centre, The Hospital for Sick Children, Toronto, Ontario, Canada; Department of Paediatrics, University of Toronto, Toronto, Ontario, Canada; Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
Corresponding author: Michael Gritti; Email: michael.gritti@sickkids.ca

Abstract

Multimodal imaging is crucial for diagnosis and treatment in paediatric cardiology. However, the proficiency of artificial intelligence chatbots, such as ChatGPT-4, in interpreting these images has not been assessed. This cross-sectional study evaluates the precision of ChatGPT-4 in interpreting multimodal images for paediatric cardiology knowledge assessment, including echocardiograms, angiograms, X-rays, and electrocardiograms. One hundred multiple-choice questions with accompanying images were randomly selected from the textbook Pediatric Cardiology Board Review. The chatbot was prompted to answer these questions with and without the accompanying images. Statistical analysis was performed using the χ², Fisher's exact, and McNemar tests. ChatGPT-4 answered 41% of questions with images correctly, performing best on those with electrocardiograms (54%) and worst on those with angiograms (29%). Without the images, its performance was similar at 37% (difference = 4%, 95% confidence interval (CI) –9.4% to 17.2%, p = 0.56). The chatbot performed significantly better when provided with an electrocardiogram image than without it (difference = 18%, 95% CI 4.0% to 31.9%, p < 0.04). When answering incorrectly, ChatGPT-4 was more inconsistent with an image than without one (difference = 21%, 95% CI 3.5% to 36.9%, p < 0.02). In conclusion, ChatGPT-4 performed poorly in answering image-based multiple-choice questions in paediatric cardiology. Its accuracy on questions with images was similar to that without, indicating limited multimodal image interpretation capabilities. Substantial training is required before clinical integration can be considered. Further research is needed to assess the clinical reasoning skills and progression of ChatGPT in paediatric cardiology for clinical and academic utility.
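
To make the paired design concrete, the sketch below (Python; not the authors' code) shows how McNemar's exact test compares the same model's answers on the same questions with and without the image. The marginal totals are chosen to match the reported 41% and 37% correct, but the split into concordant and discordant cells is a hypothetical assumption for illustration.

from scipy.stats import binomtest

# Hypothetical paired 2x2 table for the same 100 questions, answered once
# with and once without the image. The marginals (41 correct with the
# image, 37 without) match the abstract; the cell split is assumed.
#                          without image
#                       correct   incorrect
# with image  correct      25         16
#           incorrect      12         47
both, with_only, without_only, neither = 25, 16, 12, 47

# Exact McNemar's test uses only the discordant pairs: under the null
# hypothesis that the image makes no difference, each discordant question
# is equally likely to flip in either direction (p = 0.5).
discordant = with_only + without_only
result = binomtest(with_only, n=discordant, p=0.5)
print(f"discordant pairs = {discordant}, exact McNemar p = {result.pvalue:.3f}")

# The effect size is the difference in marginal proportions correct,
# which reduces to (with_only - without_only) / n.
n = both + with_only + without_only + neither
print(f"difference in proportion correct = {(with_only - without_only) / n:+.1%}")

With these placeholder counts the difference is +4.0%, matching the abstract's headline comparison; the p-value depends on the assumed discordant split, which the abstract does not report.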

Information

Type
Original Article
Creative Commons
CC BY 4.0
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Figure 1. Proportion of questions answered correctly by ChatGPT-4 with and without the accompanying image, stratified by the type of multimodal imaging typically performed in paediatric cardiology.


Table 1. Number of correctly answered questions by ChatGPT-4 when provided the accompanying image, stratified by image type


Table 2. Number of correctly answered questions by ChatGPT-4 with and without providing the accompanying image, stratified by chapter of the Pediatric Cardiology Board Review book [18]


Figure 2. Proportion of questions answered correctly by ChatGPT-4 when provided with the accompanying echocardiogram, stratified by echocardiogram image type.