A geometric and dosimetric comparison of three AI-based autocontouring packages in the head and neck region

Published online by Cambridge University Press: 06 August 2025

Jussi Sillanpaa*
Affiliation:
Dept. of Radiation Oncology, University of Minnesota, 420 Delaware St SE, Minneapolis, Minnesota, MN USA
Amit Sood
Affiliation:
Dept. of Radiation Oncology, University of Minnesota, 420 Delaware St SE, Minneapolis, Minnesota, MN USA
Margaret Reynolds
Affiliation:
Dept. of Radiation Oncology, University of Minnesota, 420 Delaware St SE, Minneapolis, Minnesota, MN USA
*Corresponding author: Jussi Sillanpaa; Email: silla032@umn.edu

Abstract

Introduction:

AI-based autocontouring products claim to segment organs with accuracy comparable to that of humans. We compare the geometric and dosimetric performance of three AI-based autocontouring packages (Autocontour 2.5.6 (“RF”), Annotate 2.3.1 (“TP”) and RT-Mind_AI 1.0 (“MM”)) in the head and neck region.

Methods:

We generated autocontours for 14 organs at risk (OARs) on 13 computed tomography (CT) image sets and compared them with clinical (human-generated) contours. The geometric differences were quantified by calculating Dice coefficients and Hausdorff distances. The autocontours were compared visually with the clinical contours by an expert physician. The autocontour sets were also ranked for accuracy by two physicians. The dosimetric effects were evaluated by recalculating treatment plans on the autocontoured CT sets.
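
As an illustration of the two geometric metrics named above, the following is a minimal sketch, not the authors' implementation. It assumes each contour has already been converted to a set of 3D points with coordinates in mm (voxel centres for Dice, surface points for the Hausdorff distance), and it takes HD95 to be the larger of the two directed 95th-percentile nearest-neighbour distances, which is one common convention; the abstract does not state which definition was used. All function names are hypothetical.

```python
import math


def dice_coefficient(a, b):
    """Dice similarity coefficient of two point sets: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty contours agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def _percentile(values, q):
    """q-th percentile using linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def _directed_distances(src, dst):
    """For every point in src, the distance to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def hd95(a, b):
    """95th-percentile Hausdorff distance (assumed definition: max of both directions)."""
    a, b = list(a), list(b)
    if not a or not b:
        raise ValueError("HD95 is undefined for an empty contour")
    return max(
        _percentile(_directed_distances(a, b), 95),
        _percentile(_directed_distances(b, a), 95),
    )


# Toy example (placeholder geometry, not data from the study):
# two 4-point contours shifted by 1 mm along x.
clinical = {(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)}
auto = {(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0), (4.0, 0.0, 0.0)}
print(dice_coefficient(clinical, auto))  # 3 shared points out of 4 + 4
print(hd95(clinical, auto))
```

The brute-force nearest-neighbour search is quadratic in the number of points, so a real contour comparison would normally use a spatial index or a distance transform instead.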

Results:

RF and TP slightly outperformed MM in geometric metrics (the percentage of OARs having mean Dice coefficients > 0.7 was RF 57.1%, TP 64.3% and MM 50.0%). The physician judged RF and TP contours to be more anatomically accurate, on average, than the manual contours (mean accuracy score, where lower is better: manual 2.49, RF 2.28, MM 3.24, TP 1.93). The mean scores given to the autocontours by the two physicians were better for RF and TP than for MM (RF 1.86, MM 2.36, TP 1.77). The dosimetric differences were similar for all three programs and were not strongly correlated with the geometric differences.
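
The summary statistic quoted for the geometric comparison (percentage of OARs with mean Dice > 0.7) can be sketched as below. This is an illustrative helper with a hypothetical name, and the Dice values are placeholders rather than data from the paper. With 14 OARs, the reported 57.1%, 64.3% and 50.0% are arithmetically consistent with 8, 9 and 7 organs above the threshold.

```python
def percent_above(mean_dice_by_oar, threshold=0.7):
    """Percentage of OARs whose mean Dice coefficient exceeds the threshold."""
    values = list(mean_dice_by_oar.values())
    if not values:
        raise ValueError("no OARs supplied")
    n_above = sum(1 for v in values if v > threshold)
    return 100.0 * n_above / len(values)


# Placeholder input: 14 hypothetical OARs, 8 of them above the threshold.
placeholder = {f"OAR_{i}": (0.8 if i < 8 else 0.6) for i in range(14)}
print(round(percent_above(placeholder), 1))
```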

Conclusions:

The performance of the three autocontouring packages in the head and neck region is similar, with TP and RF slightly outperforming MM. The correlation between geometric and dosimetric metrics is not strong, and dosimetric evaluation is therefore recommended before clinical use of autocontouring software.

Information

Type
Original Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Table 1. The mean Dice similarity coefficient and standard deviation of the autocontours (N = number of patients with the OAR contoured), the best value for each OAR in bold

Table 2. The mean HD95 and standard deviation [mm] of the autocontours (N = number of patients with the OAR contoured), the best value for each organ in bold

Table 3A. Anatomical accuracy of the contours, compared to clinical contours (a lower number denotes higher accuracy, the best value for each organ in bold)

Table 3B. Anatomical accuracy of the autocontours (mean of scores from two physicians). The best value for each organ in bold

Table 4. Mean changes in DVH metrics for clinical treatment plans recalculated on autocontour sets. The smallest absolute mean change is printed in bold

Figure 1. Parotid, oral cavity and spinal cord contours for a sample patient. Clinical contours: green, RF: blue, TP: yellow, MM: orange.

Figure 2. Comparison of DSC and HD95. Top: HD95 and DSC for RF; Middle: DSCs of MM and TP, compared to RF; Bottom: HD95s of MM and TP, compared to RF.

Figure 3. Comparison of absolute relative changes in dosimetry with DSC and HD95. Top: Change in spinal cord D_MAX, compared to DSC; Second from top: Change in spinal cord D_MAX, compared to HD95; Third from top: Change in left parotid D50, compared to DSC; Bottom: Change in left parotid D50, compared to HD95.