Introduction
Radiotherapy is used to treat 30% of prostate patients, or ∼16,000 patients/year in the UK. 1 A typical radiotherapy workflow includes acquiring a planning CT (pCT) to delineate organs at risk (OAR) and target volumes, 2 followed by radiotherapy plan creation. The OARs routinely delineated are the bladder, rectum and bowel loops. Reference Cantin, Gingras and Lachance3
Radiotherapy treatments are fractionated with a typical prostate regime being 60Gy in 20 fractions. 4 During the course of treatment, the patient’s anatomy may change due to weight loss and/or bladder/rectal filling differences. Reference Ghilezan, Yan and Martinez5 Changes are detected during routine cone-beam CT (CBCT) imaging, Reference Mayles, Nahum and Rosenwald6 with the CBCTs rigidly registered to the pCT to visualise setup uncertainties. Reference Sonke, Aznar and Rasch7 Occasionally, clinically significant anatomical changes reduce target coverage or over-dose OARs, which could lead to reduced local control or increased toxicity. In this case, the treatment is re-planned; a new pCT is acquired and a new plan is produced. This is called offline adaptive radiotherapy (ART). Reference McNair, Franks and van Herk8 At our centre, for prostate patients a visual assessment of the anatomy on the CBCT is carried out, alongside an assessment of the registration with the pCT, to determine if there is an external contour change above our local tolerance of 15 mm. This relates to the build-up region for dose deposition, which could cause discrepancies in the dose distribution. After this assessment, the CBCT is imported into the treatment planning system (TPS) with the external contour copied to the pCT, the outside anatomy density forced to air, and then the plan recalculated, with a 2% change in the D50% planning target volume (PTV) clinical goal being the threshold for ART. In the event of changes that are not due to weight loss, clinical judgement is made based on various factors, such as proximity to OARs. Multiple studies indicate the benefits of ART for a range of sites. Reference Tan, Tanderup and Kirisits9–Reference Meng, Luo and Xu13 For example, a systematic review by Thörnqvist et al. analysed 1219 prostate patients across 43 clinical studies, concluding that ART improves rectum sparing compared with non-ART, including a study that showed a 19% reduction in rectum V65%. Reference Thörnqvist, Hysing and Tuomikoski14
Limitations of ART include the time and resources required. Deciding whether a patient needs ART requires multi-disciplinary teams of physicists, dosimetrists, radiation therapists and oncologists. If CBCT visual assessment determines potentially significant dosimetric changes, further modelling is performed by approximating tissue changes (e.g. weight loss) using the pCT. Reference Posiewnik and Piotrowski15,Reference Stauch, Zoller and Tedrick16 This is an adaptive assessment. Modelling anatomical changes carries uncertainties that depend on the modelling technique used and the changes it accounts for. Dose calculations cannot be performed accurately on CBCT data because the image intensities do not correlate directly with electron density and the increase in scattered X-rays leads to poorer image quality. Reference Almatani, Hugtenburg and Lewis17 Until recently this prevented direct CBCT use for adaptive assessments. However, technological developments have made it possible to generate dosimetrically accurate CBCT data through synthetic CTs (sCTs) generated from CBCTs. Reference Gao, Xie and Wu18 sCTs are images with CT-like properties that are generated from another imaging modality. Reference Razi, Niknami and Alavi Ghazani19,Reference Eckl, Sarria and Springer20
Various methods exist for generating sCTs (or dosimetrically accurate CBCTs) such as bulk-density assignments (mapping mass densities to specific tissues based on average densities for that tissue on CT), deformable registration (mapping spatial co-ordinates of the CBCT to the pCT, including deforming the tissues in three dimensions) and deep-learning (a subset of machine learning that uses neural networks to learn from training datasets). Reference O’Hara, Bird and Al-Qaisieh21 Four methods were assessed by O’Hara et al. Reference O’Hara, Bird and Al-Qaisieh21 All were found to be dosimetrically acceptable, but deep-learning was preferred due to its speed and ability to be fully automated. Reference O’Hara, Bird and Al-Qaisieh21 Dosimetrically accurate CBCTs have the potential to allow automation of adaptive assessments, without the requirement for the current visual assessment methods.
This study aimed to develop and validate an automated adaptive assessment pipeline for prostate treatments, utilising dosimetrically accurate CBCTs, to accurately determine which patients required ART without operator intervention, for the first time. This proof-of-principle study compares the outcomes of the pipeline with clinical decisions, as well as the real clinical justification for why patients have received ART. It was hypothesised that the pipeline’s benefits are in its ability to streamline the ART patient pathway, reduce resources required and limit inter-user variability.
Materials and Methods
Patient selection and data acquisition
Fifty retrospective patients were identified, who were consecutively treated for prostate cancer at Leeds Cancer Centre (LCC). All patients were prescribed 60Gy in 20 fractions to the prostate ± seminal vesicles without nodal involvement, treated with volumetric modulated arc therapy (VMAT) and planned using RayStation v11A DTK (RaySearch Laboratories AB, Sweden) TPS. All pCT scans were acquired on a Philips Brilliance Big Bore CT (Philips Healthcare, Amsterdam, Netherlands), with acquisition parameters: 120 kVp, 106 mAs and 1·2 × 1·2 × 2·0 mm resolution. CBCT images were acquired on an Elekta XVI scanner (Elekta, Stockholm, Sweden), with acquisition parameters: 120 kVp, 20 mAs, 1·0 × 1·0 × 1·0 mm resolution, with an M20 filter. All patients had 2 PTVs: one expanded 0·5 cm from the 60Gy clinical target volume (CTV), and the other expanded 1 cm in all directions, except for 0·8 cm superiorly, from the 47Gy CTV, which includes the seminal vesicles. A local bladder filling protocol ensures patients have a comfortably full bladder, and rectal filling is regulated using micro-enemas.
Ten patients, who did not receive ART (non-ART) and had limited anatomical change on their CBCTs (visually determined by a clinical scientist), were chosen to develop the pipeline and used for initial testing to ensure the script could be run successfully. These were patients who had not been flagged for anatomical change throughout their full course of treatment. The remaining 40 patients were used to test clinical utility and contour accuracy. The standard CBCT acquisition protocol prioritised image quality in the target and OAR region, rather than including all patient anatomy within the field of view (FoV). Any patients with lateral anatomy >20 cm from the centre of the CBCT (and therefore outside the FoV) were excluded, reducing the patient cohort to 31. At LCC, patients receive daily CBCT for the first 4 fractions and then weekly; however, this is increased if anatomical changes or setup difficulties are observed, with the selected patients receiving up to 12 CBCTs in total.
Pipeline construction
The pipeline script was written in Python, and designed to run within RayStation’s scripting module. It comprised 5 steps: importing the CBCT, converting the CBCT to a dosimetrically accurate sCT, generating contours on the sCT, recalculating the treatment plan and sCT dose evaluation, as shown in Figure 1. A research version of RayStation was used, and the deep-learning sCT generation was performed using a script provided as part of a research agreement with RaySearch; however, other methods of sCT generation are available in the clinical version, which have been validated with comparable dosimetric accuracy. 22

Figure 1. Discrete steps of the script used to generate the pipeline, beginning with the import of cone-beam CTs (CBCTs), converting each CBCT to a synthetic CT (sCT) using the deep-learning model, producing contours on the sCT using deformable registration, recalculating the plan on the sCT by computing dose on additional datasets and then performing a dosimetric assessment along with a corresponding traffic light system. Open-source Python packages included Pydicom, Tkinter and time, and the contours were deformably transferred from the planning CT to the sCT.
For Step 1, the script imported images with DICOM unique identifiers that were not already in RayStation, preventing duplication. Step 2 generated the sCT from the CBCT using a deep-learning sCT generation algorithm provided by RaySearch Laboratories, described by O’Hara et al., Reference O’Hara, Bird and Al-Qaisieh21 trained with prostate patient data and validated for dosimetric accuracy prior to this study (Appendix 1). Previous validation of this method included determination of the mean absolute error in Hounsfield units (HU) and minimum dose gamma index pass rates using head and neck patients. Reference O’Hara, Bird and Al-Qaisieh21 Appendix 1 details the training carried out for prostate patients.
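The duplicate check in Step 1 can be illustrated with a minimal sketch. It assumes the identifiers of candidate and already-imported image series are available as plain strings; how they are read from the DICOM files or queried from the TPS is not shown, and the function name and example UIDs are hypothetical rather than taken from the pipeline script.

```python
def select_new_series(candidate_uids, imported_uids):
    """Return candidate UIDs that are not yet imported, preserving order."""
    seen = set(imported_uids)
    new_uids = []
    for uid in candidate_uids:
        if uid not in seen:
            new_uids.append(uid)
            seen.add(uid)  # also skips repeats within the candidate list
    return new_uids


# Hypothetical identifiers, for illustration only.
imported = ["1.2.840.1", "1.2.840.2"]
candidates = ["1.2.840.2", "1.2.840.3", "1.2.840.3", "1.2.840.4"]

new_uids = select_new_series(candidates, imported)
print(new_uids)
```

Only the series returned by such a filter would be passed on to the import step, so re-running the script on the same folder would not create duplicate examinations.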
Step 3 created a rigid registration between the CBCT and pCT using translational shifts determined at treatment, obtained from the treatment record and applied to match the CBCT position to the pCT. To generate target and OAR contours on the sCT, the pCT was deformably registered to the sCT with contours transferred according to the deformable image registration (DIR) deformation matrix. Boolean algebra ensured bladder and rectum contours did not extend outside the patient externally, in case of any errors associated with the mapping of structures. The quality of the resultant contours was assessed as described in Section Contour assessment. Step 4 recalculated the dose on the sCT using the same beam parameters and dose grid as the clinical plan.
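The Boolean clipping of OAR contours to the patient external can be sketched as a set intersection. This is a simplified illustration that assumes each structure is represented as a set of voxel indices on a common grid; the TPS performs the equivalent operation on its own structure representation, and all names and coordinates below are hypothetical.

```python
def clip_to_external(structure_voxels, external_voxels):
    """Keep only the voxels of a structure that lie inside the external."""
    return structure_voxels & external_voxels


# Hypothetical 2D example: a 5 x 5 external and a rectum contour that
# partly spills outside it after deformable mapping.
external = {(x, y) for x in range(5) for y in range(5)}
rectum = {(3, 3), (4, 4), (5, 4), (4, 5)}

clipped_rectum = clip_to_external(rectum, external)
print(sorted(clipped_rectum))
```

The intersection never adds voxels to a structure, so a contour already inside the external is left unchanged.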
Step 5 undertook the dosimetric assessment, which assessed mandatory clinical goals for the sCT used routinely at LCC. A pass/fail system highlighted whether a patient required ART. If the goal was met in the original plan, but failed after sCT assessment, a ‘red’ failure result was generated. Any goals that were passed or not met in the original plan produced a ‘green’ pass result. All goals were analysed individually, but for each sCT, if at least one ‘red’ mandatory goal existed, the result overall was ‘red’. If the result was ‘green’, this indicated the patient could continue treatment.
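The traffic light rule described above can be written as a short sketch. It assumes each mandatory goal has already been reduced to two booleans (met in the original plan, met on the sCT); the function names and the example pass/fail values are hypothetical.

```python
def goal_colour(met_in_plan, met_on_sct):
    """'red' only if the goal was met in the original plan but fails on the sCT."""
    return "red" if (met_in_plan and not met_on_sct) else "green"


def sct_colour(goals):
    """Overall result for one sCT: 'red' if at least one mandatory goal is red.

    goals maps a goal name to a (met_in_plan, met_on_sct) pair.
    """
    colours = [goal_colour(plan, sct) for plan, sct in goals.values()]
    return "red" if "red" in colours else "green"


# Hypothetical results for two sCTs.
sct_a = {
    "CTV60 D98%": (True, False),   # met in plan, fails on sCT -> red
    "CTV47 D98%": (True, True),    # still met -> green
}
sct_b = {
    "CTV60 D98%": (True, True),    # still met -> green
    "Rectum goal": (False, False), # not met in original plan -> green
}

print(sct_colour(sct_a), sct_colour(sct_b))
```

Note that a goal already failing in the original plan never turns an sCT red under this rule, since the plan was clinically accepted with that failure.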
Mandatory clinical goals assessed are in Table 1. CTVs were used rather than PTVs with the D50% clinical goal accepting a change of 2·5% from the result of the original plan. This accounted for the role of the PTV in ensuring the CTV receives its prescribed dose, accounting for random setup errors. If PTVs were used, in most cases, the clinical goals would fail due to routine setup variations rendering the pipeline ineffective. Bladder and bowel-loop constraints were not included, discussed in Section Contour assessment.
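As an illustration of the D50% check, the sketch below computes a DX% value from a list of voxel doses and tests whether the sCT value lies within 2·5% of the original plan value. It assumes equal-volume voxels and a simple rank-based definition of DX% without interpolation; the TPS may compute dose-volume metrics differently, and the dose values are invented for illustration.

```python
import math


def dose_at_volume(voxel_doses, percent):
    """DX%: minimum dose received by the hottest X% of (equal-volume) voxels."""
    ordered = sorted(voxel_doses, reverse=True)
    k = max(1, math.ceil(percent / 100 * len(ordered)))
    return ordered[k - 1]


def d50_within_tolerance(plan_d50, sct_d50, tol_percent=2.5):
    """True if the sCT D50% differs from the plan D50% by at most tol_percent."""
    change = abs(sct_d50 - plan_d50) / plan_d50 * 100
    return change <= tol_percent


# Hypothetical CTV voxel doses in Gy.
ctv_doses = [58.0, 59.0, 60.0, 61.0, 62.0, 60.5, 59.5, 60.2, 60.8, 59.8]

d50 = dose_at_volume(ctv_doses, 50)
d98 = dose_at_volume(ctv_doses, 98)
d2 = dose_at_volume(ctv_doses, 2)
print(d50, d98, d2)

print(d50_within_tolerance(60.0, 61.2))  # 2.0% change
print(d50_within_tolerance(60.0, 58.0))  # about 3.3% change
```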
Table 1. List of mandatory clinical goals used for analysis, and the frequency at which they failed as part of the pipeline. Developed from the current local clinical protocol, changing planning target volume to clinical target volume (CTV) and D50% to ±2·5%, discussed in Section Conclusions. DX% represents the dose received by an X percentage volume, and VXGy represents the volume receiving XGy of radiation dose

VXGy: volume receiving a specific dose (X) in Gray; XGy: a specific dose (X) in Gray.
Clinical utility
Thirty-one patients were used to assess the pipeline’s clinical utility. Each CBCT acquired throughout treatment was assessed—230 in total—to establish if the pipeline could identify patients requiring ART versus those who did not. All CBCTs up until re-planning were assessed (all CBCTs for non-ART patients). Of 31 patients tested, 6 received ART.
The number of red sCTs was compared for ART patients versus non-ART. A threshold was defined as the number of red sCTs that would trigger a re-plan, and was used to balance the sensitivity and specificity of the pipeline. To determine this threshold, a receiver operator characteristic (ROC) curve was plotted to analyse the pipeline’s predictive power. This is a method of visualising the performance of the pipeline across a range of thresholds. The sensitivity and 1-specificity were plotted for each potential red sCT threshold (0–12). Sensitivity refers to the true positive rate, and 1-specificity is the false positive rate. True positives are patients whose number of red sCTs met the specified threshold and who received ART, and false positives are patients whose number of red sCTs met the threshold but who did not receive ART. Area under curve (AUC) analysis provided quantitative assessment of pipeline accuracy. Sensitivity and specificity were also plotted against all potential thresholds, with the intersection point of the sensitivity and specificity curves determining the optimal threshold.
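The threshold sweep can be sketched as follows. This is a minimal illustration and not the analysis code used in the study: the per-patient red sCT counts and ART labels are invented, a patient is assumed to be flagged when the red sCT count is at least the threshold, the AUC is taken with the trapezoidal rule, and the optimal threshold is approximated by the integer threshold at which sensitivity and specificity are closest, rather than by interpolating the intersection of the two curves.

```python
def roc_points(red_counts, had_art, thresholds):
    """Return (threshold, sensitivity, false positive rate) for each threshold."""
    n_pos = sum(had_art)
    n_neg = len(had_art) - n_pos
    points = []
    for t in thresholds:
        flagged = [count >= t for count in red_counts]
        tp = sum(1 for f, a in zip(flagged, had_art) if f and a)
        fp = sum(1 for f, a in zip(flagged, had_art) if f and not a)
        points.append((t, tp / n_pos, fp / n_neg))
    return points


def trapezoid_auc(points):
    """Area under sensitivity versus false positive rate."""
    xy = {(fpr, sens) for _, sens, fpr in points} | {(0.0, 0.0), (1.0, 1.0)}
    xy = sorted(xy)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(xy, xy[1:]))


# Invented data: red sCT count per patient and whether ART was received.
red_counts = [6, 5, 4, 3, 2, 1, 3, 2, 1, 0, 0, 0, 0, 0]
had_art = [True] * 6 + [False] * 8

points = roc_points(red_counts, had_art, range(0, 13))
area = trapezoid_auc(points)

# Integer threshold where sensitivity and specificity (1 - FPR) are closest.
best_threshold = min(points, key=lambda p: abs(p[1] - (1 - p[2])))[0]
print(round(area, 5), best_threshold)
```

Lowering the threshold in such a sweep trades specificity for sensitivity, which is the balance the threshold is intended to control.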
For ART patients, the CBCT fraction when the decision to re-plan was made was compared with the fraction that was indicated as requiring a re-plan when using the optimal pipeline threshold. The clinical re-plan justification was compared to clinical goal failures.
A timing assessment was performed for 1 CBCT for 10 patients. The time taken for the automated pipeline was compared with performing the pipeline manually, and the current clinical process. For the automated pipeline, the time started when the pipeline started running until results were presented. The current clinical process was a dosimetric assessment of anatomical changes in dose distributions, involving transferring external contours from the CBCT to pCT using rigid registrations, forcing the density outside the new external to air to mimic anatomical changes, and recalculating the dose distribution. This current method assumes internal structures remain the same and, therefore, cannot account for internal changes such as bladder filling or tumour shrinkage.
Unpaired, one-tailed t-tests assessed statistical significance of the number of red sCTs per patient, and the timing assessment, comparing the manual processes with the automated pipeline. Statistical significance was selected as p < 0·05 for the number of red sCTs and p < 0·02 for the timing assessment, accounting for the increased likelihood of significance when comparing multiple datasets.
Contour assessment
CBCT contour accuracy was assessed by running the pipeline on the 10 development cohort patients (59 CBCTs in total). Only 19 sCTs contained bowel-loop contours as they are only contoured if near the target. Two medical physics experts (MPEs) independently assessed the CTV, rectum and bowel-loop contours on the sCTs. A Likert scale was used to grade contours from 1 to 4. Score 1 indicated no contour edits required, 2 indicated small edits required, 3 indicated large edits required and 4 indicated not clinically acceptable. Scores 1 and 2 were considered clinically acceptable without manual editing. Bladders were not assessed as no bladder clinical goals affected the pipeline outcome.
After the clinical utility testing, further assessment determined whether the quality of rectum contours was impacting the final pipeline results. Rectum contours for each sCT were manually corrected by a medical physicist and steps 4 and 5 of the pipeline were applied using the corrected contours. Pipeline outcomes were compared to the original results. CTV structures were not re-assessed as they were deemed clinically acceptable by MPEs.
Results
There was a statistically significant increase in red sCTs per patient for ART patients (74·4%; 32/43) versus non-ART (6·4%; 12/187) (Figure 2).

Figure 2. Box plot indicating the percentage of each patient’s synthetic CTs that resulted in red pipeline results, with the circles representing outliers in results. The orange line represents the mean value, the upper and lower edges of the box represent the interquartile ranges and the upper and lower extents of the lines represent the minimum and maximum values in the data. Outliers were determined to be any results outside of 1·5x the interquartile range.
For red sCTs, 17/44 (38·6%) failed the CTV60 D98%, 20/44 (45·5%) failed the CTV D2%, 3/44 (6·8%) failed the CTV47 D98% and 18/44 (40·9%) failed a rectum clinical goal (Table 1).
Table 2. Summary of contours scores for 2 medical physics experts (MPE) across 3 structures; clinical target volume (CTV), rectum and bowel loops, alongside the percentage of the total 59 synthetic CTs (sCTs) (19 sCTs for bowel loops). Likert scores; 1: no contour edits required, 2: small edits required, 3: large edits required, 4: not clinically acceptable

ROC curve analysis (Figure 3) found the AUC to be 0·98, suggesting the pipeline had a high predictive value. The optimal threshold for indicating patients for ART was 1·8 red sCTs (rounded to 2) as it maximised both sensitivity and specificity (Figure 4). A threshold of 2 was used for the remaining analysis regarding whether the pipeline selects the correct patients for ART.

Figure 3. Receiver operator characteristic curve assessing sensitivity and specificity of the pipeline, with the blue point markers indicating thresholds. The thresholds are the number of red synthetic CTs received by each patient that would require a re-plan and vary from 0 to 12, connected by the blue line (some threshold results overlap, therefore only 8 markers can be seen). Sensitivity is the rate of true positives, and 1-specificity is the rate of false positives.

Figure 4. Sensitivity and specificity for a range of red synthetic CT (sCT) thresholds, indicating an optimum threshold of 1·8 red sCTs for indicating adaptive radiotherapy required.
In 5/6 ART patients, the clinical justification for ART was rectum differences between the CT and CBCT. In 1/6 patients, the reason for re-planning was quoted as due to ‘setup issues and dosimetric uncertainties’. However, the first red sCT in the pipeline was caused by a failure in a CTV goal in 5/6 patients, and rectums in 1/6 patients. Therefore, the clinical justification and cause of pipeline failure matched only for 1 case.
Figure 5 shows all patients with ≥1 red sCT result and the fractions at which they occurred. For a threshold of two red sCTs, patient 2 would not have been indicated for ART, while patients 8, 12 and 13 would have. For the 5 ART patients for whom the pipeline indicated ART, the mean difference in fractions between the pipeline indicating ART and ART being ordered was 2·6 (range 0–6).

Figure 5. Cone-beam CT pipeline results for all patients who had at least 1 red synthetic CT (sCT) to the time point of adaptive radiotherapy (ART) being clinically ordered, where red circles represent red sCTs and green circles indicate green sCTs, as determined by the pipeline. The horizontal green lines indicate the number of fractions completed before ART was clinically ordered (20 for non-ART patients). Patients who had no red sCTs are not shown here.
The mean time for running the automated pipeline (182·5 s; SD: 24·5, range: 91·2) was statistically significantly shorter than for both manual methods (manual pipeline: 486·9 s; SD: 32·8, range: 32·8; current clinical process: 556·4 s; SD: 84·8, range: 251·8).
Contour accuracy was high for CTVs and rectums, with both physicists deeming the contours suitable for clinical use in 57/59 cases for CTVs and 53/59 cases for rectums. In total, 17/19 bowel-loop contours were clinically unacceptable and, therefore, were not utilised for the pipeline, as discussed below.
The corrected rectum contours had an insignificant impact on pipeline outcomes, where the corrected contours changed the sCT pipeline outcome from red to green in 4/230 (1·7%) sCTs and green to red in 2/230 (0·9%) sCTs. No pipeline outcomes were affected for a threshold of 2 red sCTs.
Conclusions
The automated pipeline successfully determined which patients required ART with high sensitivity and specificity. This was validated by the AUC analysis, Reference Obuchowski and Bullen23 demonstrating a high performance at distinguishing ART and non-ART patients. The ROC analysis identified a threshold of 2 red sCTs that would optimise pipeline sensitivity/specificity, referring to any 2 sCTs rather than consecutive sCTs. However, depending on clinical requirements, a lower threshold would increase sensitivity, reducing false negatives and ensuring all cases are identified. Alternatively, the threshold could be raised where the ART workload produced by the pipeline was high.
The quantitative nature of the pipeline makes the adaptive assessment pathway less subjective and it can identify patients that may be missed with qualitative assessments. For example, within the clinical utility testing, patients 8, 12 and 13 were identified for ART despite not having clinically received ART. Further testing could establish whether these patients were missed for ART clinically, as opposed to being false positives of the pipeline.
The pipeline demonstrated potential for identifying patients at an earlier fraction, identifying significant anatomical changes sooner. This earlier intervention could lead to a higher delivered treatment quality, with potential to impact local control and OAR toxicities. Prostate ART has been shown to reduce OAR toxicity, Reference Meyers, Winter and Obeidi24 and this automated method has value in supporting identification of patients who require ART without introducing an inhibiting workload.
Another benefit is its efficiency. It significantly reduces the time required for dosimetric assessment compared with manual processes. This agrees with a study by Almatani et al., who used a multilevel threshold algorithm to perform CBCT-based dose calculations in prostate patients with bilateral hip prostheses, which was shown to reduce resources required from physicists, physicians and radiographers through using an automated pathway. Reference Almatani, Hugtenburg and Lewis17 Furthermore, the clinical time could be reduced to zero if the pipeline were adapted to run in the background without human interaction. This could substantially reduce the time for ART decision-making and the inter-user variability associated with visual assessment, addressing the main limiting factors to ART: time, resources and assessment uncertainties. Reference Posiewnik and Piotrowski15
The clinical reason given for ART was different to the reason for pipeline failure in 5/6 ART patients, where the clinical reasons were predominantly rectum change. Instead, the pipeline identified CTV failures. This demonstrates the subjectivity and qualitative nature of the current process. This pipeline gives an alternative, offering quantitative analysis and improving understanding of the dose distribution delivered.
The D50% is a metric used at our centre to assess clinical change, as it assesses the average of the dose distribution. The D50% goal was given a tolerance of 2·5%. This was justified because D50% clinical goals largely assess the mean dose: if the remaining CTV goals are met, including D2% and D98%, which assess homogeneity, a D50% change of up to 2·5% would be clinically acceptable. Neither the bladder nor the bowel-loop contours were used; bowel-loop contour quality was poor, and bladder goals were deemed not significant for re-planning. However, it is interesting that, without them, the pipeline identified the correct patients for ART, suggesting only rectum and CTV structures were necessary for ART assessment. It may be that extreme cases of bowel-loop change would impact results, requiring further investigation. The contour assessment carried out prior to clinical utility testing aimed to establish the level of contour accuracy. While some inaccuracy was present, it did not impact the accuracy of the pipeline, suggesting contour accuracy was sufficient for the required purposes.
Limitations of the project include its inability to assess patients with anatomy outside of the CBCT FoV, which limits the cohort of patients that could benefit from it. In addition, further work is required to improve the quality of the automated bowel-loop contours, and it is not known how the model would perform if adapted to more complex treatment sites such as head and neck. There are aspects that would need more work to produce a fully realised clinical model; however, these challenges would be addressed by local commissioning teams, and the proof-of-principle has been realised.
One of the main benefits of the pipeline is its adaptability, with a script that can be easily tailored to the needs of a department/treatment site or to changes in the criteria for re-planning. Prostates were chosen to demonstrate feasibility of the pipeline due to their simplicity and large patient numbers for proof-of-concept. Further work will focus on extending the pipeline to other treatment sites in which a larger proportion of patients receive ART, such as head and neck patients, who have more consistent weight loss and tumour shrinkage. Reference McNair, Franks and van Herk8 The ability to accurately calculate dose on CBCTs in this automated manner introduces the possibility of further applications such as accurate automated dose accumulation and automated adaptive treatment planning, which have the potential to unlock significant improvements to patient treatments without excessive departmental workloads.
This study has demonstrated that an automated pipeline can identify patients requiring ART for prostate radiotherapy from dosimetrically accurate sCTs generated from CBCTs. It has high accuracy and has the potential to identify patients who require ART earlier in their treatment. The pipeline reduces assessment subjectivity and time requirements, reducing departmental workload, which has the potential to increase departmental efficiency and personalisation of patient treatments. Wider benefits for the patient include the potential for ART to be carried out earlier in their treatment, which could improve patient outcomes by reducing side effects and toxicity to healthy tissues.
Acknowledgements
None.
Financial support
This work was performed under a research agreement between Leeds Cancer Centre and RaySearch Laboratories, and the work was funded by Cancer Research UK for the Leeds Radiotherapy Research Centre of Excellence (RadNet; C19942/A28832).
Competing interests
The authors declare none.
Appendix 1
The deep-learning model for generating sCTs was trained on 39 retrospective prostate patients and then validated on 5 patients. The training involved importing the plan, adding external contours, performing a deformable registration, creating the deformed CBCT, clipping the deformed CBCT from the external contours and cropping the data to within the FoV, then generating final externals. Results below show that all 5 patients had PTV dose differences that were considered to be sufficiently small; therefore, the model is sufficiently trained and aligns with the results found by O’Hara et al. Reference Eckl, Sarria and Springer20 The table shows the maximum dose differences across the whole PTV, yet the majority of the PTV volumes had dose differences considerably less than the number quoted.