The authors proposed a novel perimeter-based index (PBI) for evaluating the accuracy of auto-segmentation software. The time saved by editing auto-segmented contours, rather than contouring manually, was used as a quantitative measure to compare the PBI with two other commonly used indices.
The relationship between the proposed index and the amount of contouring time that could be saved was studied. The performances of two other commonly used similarity indices, namely the Dice similarity coefficient (DSC) and the modified normalised average Hausdorff distance (MNAHD), were also evaluated. Ten nasopharyngeal cases and ten prostate cases previously treated with intensity-modulated radiation therapy were selected as the validation cases. Three observers were invited to contour four structures (bladder, rectum, brain stem and parotid gland) on computed tomography images of the validation cases without any aids. The time taken for contouring was recorded as the manual contouring time. Using atlas-based auto-segmentation software, three sets of contours were generated for each validation case with different library sizes to produce different levels of similarity. The values of the three similarity indices were calculated for the auto-segmented contours. The observers were then asked to edit the auto-segmented contours, and the editing time was recorded.
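As an illustration of one of the comparison indices, the DSC between a manual and an auto-segmented contour can be sketched with its standard definition, DSC = 2|A ∩ B| / (|A| + |B|). The function name, the set-of-pixels representation and the toy contours below are assumptions made for this sketch and are not taken from the study.

```python
def dice_similarity(mask_a, mask_b):
    """Standard Dice similarity coefficient for two sets of pixel coordinates."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        # Convention assumed for this sketch: two empty contours agree fully.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical toy contours: two 4 x 4 pixel squares, one shifted by one column.
manual = {(x, y) for x in range(0, 4) for y in range(0, 4)}
auto = {(x, y) for x in range(1, 5) for y in range(0, 4)}

dsc = dice_similarity(manual, auto)
print(dsc)  # 12 shared pixels out of 16 + 16, so 0.75
```

A value of 1 indicates identical contours and 0 indicates no overlap.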
The correlation between the editing time and the similarity indices was studied. The amount of time saved was calculated by subtracting the editing time from the manual contouring time. The performances of PBI, DSC and MNAHD were evaluated using the Pearson correlation coefficient and receiver operating characteristic (ROC) curve analysis.
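The time-saved calculation and the correlation step can be sketched as follows. Expressing the time saved as a percentage of the manual contouring time is an assumption here (suggested by the percentage levels reported in the results); the function names and all numbers are hypothetical illustration data, not study data.

```python
from math import sqrt


def percent_time_saved(manual_time, editing_time):
    """Time saved (manual minus editing), as a percentage of manual time (assumed form)."""
    return 100.0 * (manual_time - editing_time) / manual_time


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: similarity index per contour set and timings in minutes.
index_values = [0.2, 0.4, 0.6, 0.8]
manual_times = [10.0, 10.0, 10.0, 10.0]
editing_times = [9.0, 7.0, 5.0, 3.0]

saved = [percent_time_saved(m, e) for m, e in zip(manual_times, editing_times)]
r = pearson_r(index_values, saved)
print(saved, r)
```

In this toy data the time saved rises linearly with the index, so the coefficient is 1 up to rounding; real data would scatter around the trend.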
The PBI showed a positive linear relationship with the amount of contouring time saved. The Pearson correlation coefficient ranged from 0·73 to 0·86 for the four structures. The PBI had a stronger correlation than the DSC in bladder and parotid gland, while there was no significant difference between the two indices in rectum and brain stem. The MNAHD had a weaker correlation than the proposed index. For the ROC analysis, the cut-off values for the PBI were 0·549, 0·401 and 0·301 for the three levels of contouring time saved, namely 50, 25 and 0%, respectively. The accuracy of PBI was over 77% and the Youden index was >0·6 for all three levels.
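One common way to obtain an ROC cut-off is to pick the threshold that maximises the Youden index, J = sensitivity + specificity − 1. The abstract does not state how the cut-offs were chosen, so the sketch below is an assumption about the general technique; the function name and the data are hypothetical.

```python
def youden_cutoff(index_values, reached_target):
    """Return (cut_off, J) maximising J = sensitivity + specificity - 1.

    A case is predicted positive when its index value is >= the cut-off.
    reached_target holds booleans: did the case reach the time-saved level?
    """
    positives = sum(1 for y in reached_target if y)
    negatives = len(reached_target) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")

    best_cut, best_j = None, float("-inf")
    for cut in sorted(set(index_values)):
        tp = sum(1 for v, y in zip(index_values, reached_target) if v >= cut and y)
        tn = sum(1 for v, y in zip(index_values, reached_target) if v < cut and not y)
        j = tp / positives + tn / negatives - 1
        if j > best_j:  # on ties, the lowest cut-off is kept
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical data: index values and whether each case saved at least 50% of the time.
values = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
saved_half = [False, False, False, True, False, True, True]

cut, j = youden_cutoff(values, saved_half)
print(cut, j)  # 0.5 0.75
```

Repeating the search with a different target (for example, any time saved at all) would give one cut-off per time-saved level.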
The proposed index showed a strong relationship with the amount of contouring time saved, at least as strong as that of the DSC and stronger than that of the MNAHD. It is a simple tool that could be used to evaluate the performance of different segmentation algorithms.