
Authors

Elias Rüfenacht, Robert Poel, Amith Kamath, Ekin Ermis, Stefan Scheib, Michael K. Fix, Mauricio Reyes

Abstract

Deep learning-based image segmentation for radiotherapy is intended to speed up the planning process and yield consistent results. However, most of these segmentation methods rely solely on distribution- and geometry-associated training objectives without considering tumor control and the sparing of healthy tissues. To incorporate dosimetric effects into segmentation models, we propose a new training loss function that extends current state-of-the-art segmentation model training via a dose-based guidance method. We hypothesized that adding such a dose-guidance mechanism improves the robustness of the segmentation with respect to the dose (i.e., resolves distant outliers and focuses on locations of high dose/dose gradient). We demonstrate the effectiveness of the proposed method on Gross Tumor Volume segmentation for glioblastoma treatment. The obtained dosimetry-based results show reduced dose errors relative to the ground truth dose map using the proposed dosimetry-segmentation guidance, outperforming state-of-the-art distribution- and geometry-based segmentation losses.
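The abstract describes extending a standard segmentation loss with a dose-based term computed through a dose predictor. Below is a minimal, framework-free sketch of that idea under stated assumptions: the base loss is BCE + SoftDice (the baseline named in the reviews), the dose term is assumed to be the mean absolute difference between dose maps predicted from the predicted and the ground-truth masks, `lam` is a hypothetical weight for that term, and `toy_dose_predictor` is a hypothetical stand-in for the frozen dose U-Net. It illustrates the mechanism only; the paper's actual DOSELO formulation may differ.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical type: a frozen dose predictor maps a (soft) target mask to a dose map.
DosePredictor = Callable[[Sequence[float]], List[float]]


def bce(pred: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted probabilities and a binary mask."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)


def soft_dice_loss(pred: Sequence[float], target: Sequence[float], eps: float = 1e-6) -> float:
    """1 minus the soft Dice coefficient."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)


def dose_loss(pred: Sequence[float], target: Sequence[float], dose_predictor: DosePredictor) -> float:
    """Assumed dose term: mean |D(pred) - D(target)| over all voxels."""
    d_pred, d_true = dose_predictor(pred), dose_predictor(target)
    return sum(abs(a - b) for a, b in zip(d_pred, d_true)) / len(d_pred)


def combined_loss(pred, target, dose_predictor: DosePredictor, lam: float = 1.0) -> float:
    """BCE + SoftDice + lam * dose term; lam is a hypothetical weighting factor."""
    return bce(pred, target) + soft_dice_loss(pred, target) + lam * dose_loss(pred, target, dose_predictor)


def toy_dose_predictor(mask: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the dose U-Net: 60 Gy times a 3-voxel moving average."""
    n = len(mask)
    out = []
    for i in range(n):
        window = mask[max(0, i - 1): i + 2]
        out.append(60.0 * sum(window) / len(window))
    return out


# 1D toy example: ground-truth GTV, a good prediction, and one with a distant outlier.
gt = [0, 0, 1, 1, 1, 0, 0, 0]
good = [0.1, 0.1, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1]
outlier = good[:-1] + [0.9]  # same prediction plus a distant false positive
print(combined_loss(good, gt, toy_dose_predictor, lam=0.01))
print(combined_loss(outlier, gt, toy_dose_predictor, lam=0.01))
```

In an actual training setup the same expression would be built from tensors so that gradients flow through the frozen dose predictor back into the segmentation network; only the loss arithmetic is shown here.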

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43996-4_50

SharedIt: https://rdcu.be/dnwPv

Link to the code repository

https://github.com/ruefene/doselo

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a clinically relevant, dosimetry-aware loss function for training deep learning segmentation models. A cascaded U-Net model is used for segmentation, followed by dose map prediction. The segmentation model is trained with guidance from the dose prediction error caused by segmentation variations. The proposed model is validated on MRI sequences of post-operative glioblastoma patients.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well-written.

    I found the idea of dose guidance for segmentation interesting and important for radiation therapy. The problem is well motivated.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The dataset is small, and the experimental results are not convincing enough.

    2. Data descriptions are not provided.

    3. Some network and implementation details are either missing or confusing.

    4. The main challenges of the work are the actual evaluation of the dose map predictor and the future scalability of the method.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provide a placeholder link to their GitHub repository, so I assume the training code will be made public upon acceptance. Reproducibility might be possible given the availability of the data and some details of their implementation.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. Since private data are used for the model validation, a complete description of the data collection process would be desirable, such as descriptions of the experimental setup, device(s) used, image acquisition parameters, subjects/objects involved, instructions to annotators, and methods for quality control. Moreover, there is no mention of IRB approval.

    2. The evaluation dataset is too small (only 10 cases). There is no mention of variations in patient anatomy, age, or body size. How could the proposed method be scaled to a large cohort of patients?

    3. The authors mention that the dose prediction model was kept fixed during segmentation training. Does this mean the dose predictor was pretrained and the overall method is two-stage? End-to-end joint training would be interesting.

    4. Fig. 1: The figure could be improved with a better illustration of the model. Is the dose error calculated at the image level or at the organ level? What would be the impact of organ delineation on organ-specific doses [1]?

    5. According to the literature, deep learning-based organ segmentation may actually have little or no dosimetric impact [2][3].

    6. The experimental results are not entirely convincing. In almost 50% of the test cases, the dose error (RMAE) does not improve compared to BCE+SoftDice.

    7. How were the most significant cases determined from an RT point of view?

    8. What value of λ is used for weighting the dose segmentation loss?

    9. Please use either DOSELO or DSL to refer to the new loss function; it is better to be consistent throughout the paper.

    10. In one place, it is mentioned that the model’s input is a normalized CT volume and segmentation masks for the target volumes and OARs. Elsewhere, it is mentioned that the presented results are based on 2D models. This seems conflicting. Also, the 3D extension may not be as straightforward as the authors claim.

    11. How do the imperfect segmentation masks and corresponding dose plans help the dose predictor model? What augmentations were performed?

    12. “Segmentation masks of 13 OARs and the GTV”: it is not clear which organs are included.

    [1] https://link.springer.com/chapter/10.1007/978-3-030-87202-1_47
    [2] https://pubmed.ncbi.nlm.nih.gov/35101068/
    [3] https://www.spiedigitallibrary.org/conference-proceedings-of-spie/11312/113124O/Evaluation-of-deep-learning-segmentation-for-rapid-patient-specific-CT/10.1117/12.2550314.short
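Comment 6 turns on how the per-case dose error is aggregated. The review does not define RMAE, so the sketch below assumes one common form (sum of absolute dose differences divided by the sum of absolute reference doses) and uses made-up per-case values, not results from the paper. It shows how only half of the cases can improve while the mean RMAE still drops substantially, because a few large baseline outliers dominate the average; this is one way a per-case observation and an aggregate reduction figure can both hold.

```python
from typing import Dict, Sequence


def rmae(dose_pred: Sequence[float], dose_ref: Sequence[float]) -> float:
    """Assumed RMAE: sum |D_pred - D_ref| / sum |D_ref| (a ratio of mean absolute values)."""
    num = sum(abs(p - r) for p, r in zip(dose_pred, dose_ref))
    den = sum(abs(r) for r in dose_ref)
    return num / den


def compare_losses(rmae_baseline: Sequence[float], rmae_proposed: Sequence[float]) -> Dict[str, float]:
    """Per-case vs aggregate comparison of two training losses (lower RMAE is better)."""
    n = len(rmae_baseline)
    improved = sum(1 for b, p in zip(rmae_baseline, rmae_proposed) if p < b)
    mean_b = sum(rmae_baseline) / n
    mean_p = sum(rmae_proposed) / n
    return {
        "fraction_improved": improved / n,
        "relative_reduction_of_mean": (mean_b - mean_p) / mean_b,
    }


# Hypothetical per-case RMAE values for 10 test cases (NOT taken from the paper).
baseline = [0.40, 0.30, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10]
proposed = [0.10, 0.10, 0.09, 0.09, 0.09, 0.11, 0.11, 0.11, 0.11, 0.11]
summary = compare_losses(baseline, proposed)
# Half of the cases improve, yet the mean RMAE drops by about 32%,
# because the two large baseline outliers dominate the average.
print(summary)
```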

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Missing details of the dataset, small dataset for validation, generalization and scalability concerns, confusing/conflicting information, and unconvincing results.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    In this paper, the authors use cascaded U-Nets to first predict a tumour segmentation and then use the predicted segmentation to predict a delivered radiotherapy dose distribution. The predicted dose distributions were integrated into a novel loss function used to train the segmentation model. In this way, the segmentation network was encouraged to correctly segment the regions with the most significant clinical impact for radiotherapy (regions of high dose and dose gradient). The dose-aware loss function was shown to outperform a traditional loss in terms of dosimetric impact.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is very clear and well written. The methodology is clearly described and illustrated with a good figure. The dose segmentation loss function (DOSELO), which is the paper's key contribution, is well described with appropriate detail. The authors primarily evaluate the produced segmentations using the dose score, a metric that measures the relative difference between the dose maps produced by the predicted and gold-standard segmentations. This is claimed to be more clinically relevant for radiotherapy than metrics such as the Dice Similarity Coefficient and Hausdorff distance (also reported for reference). The move to clinically appropriate metrics is an important and relevant topic raised in this paper. The proposed method seemingly produces GTV segmentations that reduce the dose error compared to a current standard loss function.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors currently do not explicitly report a statistical test to assess the significance of their approach's improvement over the current standard (BCE + SoftDice). An analysis of the statistical significance of the reported differences in performance between methods would strengthen the paper. The proposed methodology uses vanilla U-Net architectures for segmentation and dose prediction. I am unsure why the authors did not choose the current state-of-the-art (SOTA) nnU-Net, despite justifying the use of the BCE + SoftDice loss as the current SOTA loss function by referencing the authors of the nnU-Net architecture (pg. 5 paragraph 4). It would be valuable to use the best possible segmentation architecture in this study to ensure the result remains relevant, i.e., that simply improving the segmentation model doesn’t nullify the effectiveness of the dose guidance.
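For roughly ten paired test cases, one reasonable choice for the requested significance analysis is a non-parametric paired test on the per-case RMAE values. The choice of test is an assumption, not something stated in the review; the sketch below implements an exact two-sided Wilcoxon signed-rank test in plain Python by enumerating all sign patterns (`scipy.stats.wilcoxon` is a library alternative). It also makes a practical point visible: with n = 10 cases, the smallest attainable two-sided p-value is 2/1024, about 0.002.

```python
from itertools import product
from typing import Sequence, Tuple


def wilcoxon_exact(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Zero differences are dropped and tied |differences| receive average ranks.
    The null distribution is enumerated over all 2**n sign patterns, so this
    is only practical for small n (e.g. n <= 20). Returns (W+, p-value).
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        return 0.0, 1.0
    # Assign average 1-based ranks to |d|, handling ties.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    centre = sum(ranks) / 2  # expected W+ under the null hypothesis
    dev_obs = abs(w_plus - centre)
    # Count sign patterns whose W+ is at least as far from the centre as observed.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - centre) >= dev_obs - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical per-case RMAE values where the proposed loss is better in every case.
base = [0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20]
prop = [0.10] * 10
print(wilcoxon_exact(base, prop))
```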

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper is fairly reproducible. The authors claim they will make their code publicly available post-anonymization. However, the use of an in-house non-public dataset will limit the ability to reproduce these results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    This is a very interesting and well-presented study. Table 1, showing the dose errors for the proposed and standard loss functions, is particularly effective. However, I believe it would also be beneficial to show predicted segmentations and, if possible, highlight examples of regions that benefitted from the dose guidance methodology. Additionally, as previously mentioned, showing the statistical significance of reported differences would strengthen this paper. Explaining the choice of the basic U-Net or, even better, repeating the experiment with a state-of-the-art segmentation model such as nnU-Net would further enhance this paper's relevance. Accurately segmenting the GTV is a challenging task, but simultaneously segmenting the GTV and OARs, mentioned in your possible future work, would be especially interesting.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The dose segmentation loss function (DOSELO), which is the paper's key contribution, is deceptively simple but seemingly effective. The use of a cascaded segmentation-then-dose-prediction architecture to enable a dose-aware segmentation loss function is particularly novel. In recent years, there has been growing awareness of the clinical irrelevance of widely reported auto-segmentation metrics for radiotherapy (particularly DSC). This paper supports this message and, in addition to assessing predicted segmentations with a more clinically relevant measure, attempts to train a segmentation model directly with a loss function connected to the downstream clinical task (i.e., use of the segmented structures to calculate the spatial distribution of radiation dose delivered to the patient). There are a few minor weaknesses, but I believe this paper is of high interest to the MICCAI community.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This is an interesting paper on a topic of growing importance in radiation therapy treatment planning (incorporating dosimetry information into the segmentation model). The main contribution of the paper is a model and a new loss function that consider both of these aspects.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is generally well written; and the methodology and validation seem solid. Releasing the code is great. The combination of dose and geometric information seems relatively novel.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Please see review below

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No details of ethics or patient consent included.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • there are a couple of terms in the first paragraph of the introduction: patient-safe (what is a patient-safe segmentation?) and expert-dependent (maybe the authors mean intra-observer differences?)
    • there are three contributions listed in this paper; however, the second and third are results of the first (not contributions).
    • What were the actual OAR structures for these treatment plans?
    • Were the four MRI sequences mentioned in 2.2 consistently acquired for all patients? Does T1c mean T1 post contrast? It would be better to define these sequences in 2.2 (rather than in 3.1)
    • Eq (4): what hyperparameter ranges were evaluated?
    • There are no details of ethics or informed consent in 3.1
    • I understand why the GTV was used for segmentation comparison, however isn’t the dosimetry usually performed on an expansion (PTV)? Please clarify this in section 3.1.
    • 3.1: needs a sentence about registration (I assume rigid)
    • How were the volumes resampled with 1mm isotropic spacing and was this employed for all volumes (CT as well)?
    • How were the most significant four cases decided in 3.3?
    • Table 1 is probably a Figure? If the four examples in Table 1 are included in Fig 2 it would be clearer to use the same CaseIDs.
    • Table 1: It would be good to see the GTV/PTV for these cases as well. Also, I’d be interested in seeing the actual predicted dose in addition to the difference images (there should be room to include both). Finally, it would be easier to read with a scalar bar provided in this table.
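The questions about the PTV expansion and the 1 mm isotropic resampling are related: on an isotropic grid, a margin in millimetres becomes a radius in voxels. The sketch below shows a generic spherical (isotropic) margin expansion of a binary mask stored as a set of voxel indices. It assumes isotropic spacing, ignores image bounds, and is not the expansion procedure used in the paper or in any particular treatment planning system; the margin values in the example are illustrative only.

```python
from itertools import product
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def expand_margin(mask: Set[Voxel], margin_mm: float, spacing_mm: float = 1.0) -> Set[Voxel]:
    """Isotropic margin expansion (spherical dilation) of a binary mask.

    Assumes isotropic voxel spacing, e.g. after resampling to 1 mm, so the
    radius in voxels is margin_mm / spacing_mm. A voxel is added when its
    centre lies within that radius of the centre of any mask voxel.
    Image bounds are not enforced in this sketch.
    """
    r = margin_mm / spacing_mm
    ri = int(r)  # largest integer offset that can lie inside the sphere
    offsets = [
        (dx, dy, dz)
        for dx, dy, dz in product(range(-ri, ri + 1), repeat=3)
        if dx * dx + dy * dy + dz * dz <= r * r
    ]
    return {(x + dx, y + dy, z + dz) for (x, y, z) in mask for (dx, dy, dz) in offsets}


seed = {(10, 10, 10)}  # a single-voxel "GTV" for illustration
print(len(expand_margin(seed, 1.0)))  # 7: the voxel itself plus its 6 face neighbours
print(len(expand_margin(seed, 2.0)))  # 33 voxels within a 2 mm radius
```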
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Interesting and well written paper which addresses a clinically important topic. Most of my comments are relatively minor.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Overall, the reviews for this paper were quite positive, with all reviewers noting the clarity and well-organized structure of the paper. The proposed idea of dose guidance for improving segmentation in the radiotherapy workflow is a novel concept and is recognized as such by the reviewers. However, the authors are invited to address the comments on the paper’s weaknesses, particularly those from R#1 regarding the limited dataset size, the dataset descriptions, and missing implementation details.




Author Feedback

Many thanks to all the reviewers for their constructive comments and responses. Below we include comments on the main points raised.

  • Small dataset, aim of the study clarification, unconvincing results: We agree these are preliminary results. We are currently curating more cases for evaluation, including statistical testing. We note that the evaluation of the dose map predictor had already been done in a separate study. We agree the approach was not superior to the baseline in all cases, but note the reported overall reduction in dose error of 42.5%.

  • Selection criteria of displayed cases: The most significant cases were determined by the RMAE metric.

  • Use of nnU-Net as backbone: At this initial stage, we did not consider nnU-Net as a baseline because we wanted to demonstrate the feasibility of our method with minimal tuning and without the influence of additional optimizations, such as adaptive learning rate schedules and fingerprinting. Therefore, we decided to use a vanilla U-Net with minor differences from nnU-Net.

  • How to scale to a large cohort: The main challenge here is to curate the datasets needed for evaluation. Inference is fast, so it does not limit scaling.

  • End-to-end dose predictor and segmentation: Indeed, this is a great idea we have discussed. In this first study, we wanted to present its simpler form (fixed dose predictor) to show how dosimetry information can be included for training a U-Net model.

  • Data selection: We selected the patient cohort randomly from the research PACS at our University Hospital. All patients in the dataset suffered from histologically confirmed GBM and underwent maximal-safe surgical tumor resection. We did not include detailed descriptions of the dataset due to the limited number of pages. Further information about the cohort and the experimental setup will be made public in potential upcoming publications.
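The feedback states that the displayed cases were selected using the RMAE metric but does not give the exact rule. The sketch below shows one hypothetical reading: rank cases by the absolute per-case RMAE change between the two losses and keep the top k (k = 4 would match the four cases discussed in the reviews). Both the criterion and the values are assumptions for illustration.

```python
from typing import List, Sequence


def most_significant_cases(case_ids: Sequence[str], rmae_baseline: Sequence[float],
                           rmae_proposed: Sequence[float], k: int = 4) -> List[str]:
    """Return the k case IDs with the largest |RMAE_baseline - RMAE_proposed|.

    Hypothetical selection rule. sorted() is stable even with reverse=True,
    so cases with equal change keep their input order.
    """
    change = {cid: abs(b - p) for cid, b, p in zip(case_ids, rmae_baseline, rmae_proposed)}
    return sorted(change, key=change.__getitem__, reverse=True)[:k]


# Hypothetical values for five cases (not from the paper).
ids = ["A", "B", "C", "D", "E"]
base = [0.40, 0.10, 0.30, 0.10, 0.20]
prop = [0.10, 0.15, 0.20, 0.10, 0.19]
print(most_significant_cases(ids, base, prop, k=3))  # ['A', 'C', 'B']
```

Ranking by the absolute change surfaces cases where the dose guidance helped most as well as cases where it hurt; ranking by the signed difference instead would show only improvements.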


