
Authors

Wen Li, Dan Zhao, Zhi Chen, Zhou Huang, Saikit Lam, Yaoqin Xie, Wenjian Qin, Andy Lai-Yin Cheung, Haonan Xiao, Chenyang Liu, Francis Kar-Ho Lee, Kwok-Hung Au, Victor Ho-Fun Lee, Jing Cai, Tian Li

Abstract

This study aims to investigate the clinical efficacy of AI-generated virtual contrast-enhanced MRI (VCE-MRI) in primary gross-tumor-volume (GTV) delineation for patients with nasopharyngeal carcinoma (NPC). We retrospectively retrieved 303 biopsy-proven NPC patients from three oncology centers. 288 patients were used for model training and 15 patients were used to synthesize VCE-MRI for clinical evaluation. Two board-certified oncologists were invited to evaluate the VCE-MRI in two aspects: image quality and effectiveness in primary tumor delineation. Image quality evaluation of VCE-MRI includes distinguishability between real contrast-enhanced MRI (CE-MRI) and VCE-MRI, clarity of tumor-to-normal tissue interface, veracity of contrast enhancement in tumor invasion risk areas, and efficacy in primary tumor staging. For primary tumor delineation, the GTV was manually delineated by oncologists. Results showed the mean accuracy to distinguish VCE-MRI from CE-MRI was 53.33%; no significant difference was observed in clarity of tumor-to-normal tissue interface between VCE-MRI and CE-MRI; for the veracity of contrast enhancement in tumor invasion risk areas and efficacy in primary tumor staging, a Jaccard Index of 76.04% and accuracy of 86.67% were obtained, respectively. The image quality evaluation suggests that the quality of VCE-MRI approximates that of real CE-MRI. In tumor delineation evaluation, the Dice Similarity Coefficient and Hausdorff Distance of the GTVs delineated from VCE-MRI and CE-MRI were 0.762 (0.673-0.859) and 1.932mm (0.763mm-2.974mm) respectively, which were clinically acceptable according to the experience of the radiation oncologists. This study demonstrated that VCE-MRI is highly promising in replacing the use of gadolinium-based CE-MRI for NPC delineation.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43990-2_51

SharedIt: https://rdcu.be/dnwL6

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    The authors investigate the comparability of virtually contrast-enhanced (CE) facial T1w MRI with conventional CE T1w MRI in patients with biopsy-proven nasopharyngeal carcinoma (NPC). Training (n=288) of a previously described multimodality, synthetic neural network (MMgSN-NET) was performed using images from three different institutions, and the model was then tested on a small collection of held-out patients (n=15) from the same three institutions. Randomized and blinded qualitative comparisons of virtual and conventional test image pairs were made by clinical oncologists, along with a quantitative comparison of gross tumor volume segmentations. The results show highly comparable quality with respect to tumor delineation, involvement of high-risk areas, and staging.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of this work is a rigorous set of evaluation methods for the virtual MRI output images. The authors should be applauded for the randomized, blinded presentation of image pairs to readers, and the apparent presentation of images in 3D as would be expected with a clinical PACS. The results are important and impressive: there appears to be only a small difference in some qualitative outcome measures; however, there is a discrepancy in staging based on virtual images, with an accuracy of only 86%. It would be interesting to investigate these initial findings in a larger, completely held-out test set.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    With the advent of group II gadolinium contrast agents and screening methods, the risk of nephrogenic systemic fibrosis (NSF) is extremely low, even in patients with impaired renal function. Except at institutions where group II gadolinium agents are unavailable, the clinical need for virtual CE MRI is unclear.

    It is relatively uncommon for clinical oncologists to formally interpret neuroradiology imaging or to generate tumor segmentations. It would be helpful to clarify whether the clinical oncologists were radiation oncologists and whether image interpretation and tumor delineation are part of their typical clinical responsibilities.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducibility is limited, as the authors acknowledge in their checklist: code is not available, and there are no details on model parameters, hyperparameters, or the training scheme.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    An additional advantage of virtual CE MRI that could be mentioned is speed: the ability to acquire fewer sequences and forgo IV placement. In the age of group II gadolinium agents and screening methods, NSF is, in practice, an extremely rare concern.

    The most clinically valid outcome measures would be comparison of tumor segmentations to that of board-certified neuroradiologists (at least 2), and assessment of qualitative features also by a neuroradiologist. Contouring for radiotherapy is commonly performed by a radiation oncologist or medical physicist and investigation with these professionals may hold greater clinical generalizability.

    As the authors note, the test set is very small, and it is possible that typical, not uncommon, cases have been omitted from analysis.

    Availability of code and brief mention of model architecture and training parameters would enhance the reproducibility of this work.

    With regard to the clinical meaning of the small discrepancy between virtual and conventional T1w contrast MRI, the authors' inclusion of staging and GTV is important and a strength. Providing further insight into how these discrepancies might change subsequent radiotherapy use, or planning/dosage, would be a next level of clinical rigor to explore.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The outcome measures were thoughtfully developed and rigorous. Training was performed with multi-institutional data, which bodes well for generalizability, though testing on a larger, completely held-out set or institution would strengthen the work further.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    The paper investigates the clinical efficacy of AI-generated virtual contrast-enhanced MRI in gross-tumor volume (GTV) delineation. Two board-certified oncologists evaluated the VCE-MRI in image quality and effectiveness in primary tumor delineation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Clinical evaluation of the effectiveness of synthetic AI-generated MRI is highly important.
    2. The patient data were retrospectively collected by three clinical centers with different imaging protocols.
    3. Image quality assessment of VCE-MRI was considered in various clinical evaluation criteria.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. It is not clear why clinical oncologists, not radiologists, were chosen to assess the image quality of VCE-MRI. The authors need to provide justification for their experience in MRI.
    2. One of the main issues with VCE-MRI is inaccuracy, including hallucination, in the synthesized AI-generated images. This may not be clearly identifiable when contrast-free T1 and T2 contain sufficient information for delineation of the GTV. It was not clear how the primary GTV delineation was determined with consideration of CE- and VCE-MRI.
    3. The GTV evaluation should include three sets: i) contrast-free T1 and T2, ii) contrast-free T1, T2, and CE-MRI, and iii) contrast-free T1, T2, and VCE-MRI. In particular, if T2 contains sufficient information, the GTV could be delineated adequately regardless of the accuracy of VCE-MRI.
    4. The mean-based evaluation would not provide much insight into the hallucination problem, as hallucination can occur in a random fashion. Outlier detection or out-of-distribution analysis would be more suitable here.
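    To illustrate the point about mean-based evaluation, the following is a minimal sketch of per-case outlier flagging using Tukey's interquartile-range rule. It is only one possible approach, not something taken from the paper; the function name `flag_low_outliers`, the factor `k=1.5`, and the per-case scores are all made up for illustration.

```python
import statistics


def flag_low_outliers(scores, k=1.5):
    """Return indices of cases whose score falls below Q1 - k * IQR.

    `scores` is a list of per-case agreement values (e.g. one Dice value
    per patient). The rule and the default k are assumptions made for
    this sketch, not the evaluation used in the paper.
    """
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    lower_fence = q1 - k * (q3 - q1)
    return [i for i, s in enumerate(scores) if s < lower_fence]


# Hypothetical per-case scores: seven similar cases and one clear failure.
toy_scores = [0.80, 0.78, 0.82, 0.79, 0.81, 0.40, 0.77, 0.83]

print("mean score:", round(statistics.mean(toy_scores), 3))
print("flagged case indices:", flag_low_outliers(toy_scores))
```

    In this toy example the mean alone gives no hint that one case failed badly, whereas the per-case rule singles it out for inspection.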
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The evaluation was done with multi-institutional data with two readers. The reproducibility is very high.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. I suggest that the authors include justification for the selections of readers and inter-reader evaluations.
    2. The evaluation should focus on the frequency of out-of-distribution cases rather than on average performance.
    3. The evaluation design needs to include a condition using MRI without any (virtual or real) contrast-enhanced images.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Major modification is required in comparison and evaluation designs.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors proposed a pipeline for clinical evaluation of AI-generated virtual contrast-enhanced (VCE) MRI for primary gross-tumor-volume (GTV) delineation in patients with nasopharyngeal carcinoma (NPC). They used an established GAN-based model for VCE-MRI generation and had oncologists evaluate the image quality and the effectiveness of tumor delineation in VCE-MRI. Image quality evaluation includes 1) distinguishability between real contrast-enhanced MRI (CE-MRI) and VCE-MRI, 2) clarity of tumor-to-normal tissue interface, 3) veracity of contrast enhancement in tumor invasion risk areas, and 4) efficacy in primary tumor staging. For primary tumor delineation, the GTV was manually delineated by oncologists in CE-MRI and VCE-MRI. Dice and Hausdorff distances were computed.
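    For readers unfamiliar with the two delineation metrics, here is a minimal sketch of how a Dice coefficient and a symmetric Hausdorff distance could be computed between two contours. It is not the authors' implementation: contours are assumed to be sets of integer voxel coordinates, the distance is taken over all voxels rather than surface voxels only, and `spacing` (voxel size in mm) is a hypothetical parameter.

```python
import math


def dice(a, b):
    """Dice similarity coefficient between two sets of voxel coordinates."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty contours agree
    return 2.0 * len(a & b) / (len(a) + len(b))


def hausdorff(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance between two non-empty voxel sets.

    Brute-force O(len(a) * len(b)); fine for a toy example, too slow
    for full-resolution volumes.
    """
    def to_mm(p):
        return tuple(c * s for c, s in zip(p, spacing))

    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(math.dist(to_mm(p), to_mm(q)) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))


# Toy contours: two 2x2 squares in one slice, shifted by one voxel.
gtv_ce = {(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)}
gtv_vce = {(1, 0, 0), (1, 1, 0), (2, 0, 0), (2, 1, 0)}

print("Dice:", dice(gtv_ce, gtv_vce))
print("Hausdorff:", hausdorff(gtv_ce, gtv_vce))
```

    Dice summarizes volumetric overlap, while the Hausdorff distance reports the worst-case boundary disagreement, so the two can diverge for the same pair of contours.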

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors proposed two interesting ways to assess image quality: 1) having radiologists examine 25 tumor invasion risk areas and decide whether each area is at risk of being invaded, then comparing the results between VCE-MRI and CE-MRI with the Jaccard index; and 2) comparing the primary tumor staging between VCE-MRI and CE-MRI.
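    The risk-area comparison described above can be sketched as a Jaccard index over two sets of flagged areas. This is an illustrative reading of the procedure, not code from the paper: the areas are represented by hypothetical integer labels 1 to 25, and the handling of the case where no area is flagged on either image is an assumption.

```python
def jaccard(ce_areas, vce_areas):
    """Jaccard index between risk areas flagged on CE-MRI and on VCE-MRI."""
    ce, vce = set(ce_areas), set(vce_areas)
    union = ce | vce
    if not union:
        return 1.0  # assumed convention: nothing flagged on either image
    return len(ce & vce) / len(union)


# Hypothetical reading for one patient: area labels are placeholders.
flagged_on_ce = {1, 2, 3, 4}
flagged_on_vce = {1, 2, 3}

print("Jaccard:", jaccard(flagged_on_ce, flagged_on_vce))
```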

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The number of patients for evaluation is 15, which is quite small.
    2. For the tumor delineation evaluation, the authors claimed the Dice and Hausdorff distance between CE-MRI and VCE-MRI were clinically acceptable. It is not clear how the authors define ‘clinically acceptable’.
    3. For the experiment on ‘Distinguishability between CE-MRI and VCE-MRI’, the oncologists should first be trained on VCE-MRI and CE-MRI examples, with the ground truth known during training.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors should provide more details about how the radiologists viewed the images. For example, did they use any window/level settings? What kind of monitor was used for viewing? How were the viewing conditions calibrated across different monitors?

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. The details of staging of primary tumors should be included.
    2. A better discussion of the results could be provided. The image quality assessment results from institution A are always worse than B and C. Why is this?
    3. More details about how the radiologists viewed the images should be provided. For example, did they use any window/level settings? What kind of monitor was used for viewing? How were the viewing conditions calibrated across different monitors?
    4. Radiologists should be trained to look at CE-MRI and VCE-MRI.
    5. The authors could conduct an inter-observer agreement analysis to further assess the differences between radiologists.
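    One common way to quantify inter-observer agreement on categorical ratings such as tumor stage is Cohen's kappa. The sketch below is a plain-Python illustration under that assumption; the paper does not say which statistic would be used, and the stage labels are invented for the example.

```python
from collections import Counter


def cohen_kappa(rater1, rater2):
    """Cohen's kappa for two raters assigning one label to each case."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("both raters must label the same non-empty case list")
    n = len(rater1)
    # Observed agreement: fraction of cases with identical labels.
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from each rater's marginal label frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical stage assignments from two readers for six cases.
reader_a = ["T1", "T2", "T3", "T3", "T2", "T1"]
reader_b = ["T1", "T2", "T3", "T2", "T2", "T1"]

print("kappa:", round(cohen_kappa(reader_a, reader_b), 3))
```

    Unlike raw percentage agreement, kappa discounts the agreement expected by chance, which matters when one category dominates the ratings.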
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors proposed two interesting ways to assess image quality: 1) having radiologists examine 25 tumor invasion risk areas and decide whether each area is at risk of being invaded, then comparing the results between VCE-MRI and CE-MRI with the Jaccard index; and 2) comparing the primary tumor staging between VCE-MRI and CE-MRI. However, the authors should expand their test set, improve reproducibility, and justify the tumor delineation results.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Investigates the clinical efficacy of AI-generated virtual contrast-enhanced MRI in gross-tumor volume (GTV) delineation in nasopharyngeal cancer

    • Clear summary of the field and motivation for the current clinical evaluation of virtual contrast across centers
    • Should explain why oncologists, rather than radiologists, were used in the study
    • Good rigor in evaluating image quality
    • Sources of error or disagreement should be discussed




Author Feedback

N/A


