
Authors

Xiaodan Xing, Jiahao Huang, Yang Nan, Yinzhe Wu, Chengjia Wang, Zhifan Gao, Simon Walsh, Guang Yang

Abstract

The scarcity of image data and corresponding expert annotations limits the training capacity of AI diagnostic models and potentially inhibits their performance. To address this problem of data and label scarcity, generative models have been developed to augment the training datasets. Previously proposed generative models usually require manually adjusted annotations (e.g., segmentation masks) or need pre-labeling. However, studies have found that these pre-labeling-based methods can induce hallucinated artifacts, which might mislead the downstream clinical tasks, while manual adjustment can be onerous and subjective. To avoid manual adjustment and pre-labeling, in this study we propose a novel controllable and simultaneous synthesizer (dubbed CS2) to generate both realistic images and corresponding annotations at the same time. Our CS2 model is trained and validated using high-resolution CT (HRCT) data collected from COVID-19 patients to realize efficient infection segmentation with minimal human intervention. Our contributions include 1) a conditional image synthesis network that receives both style information from reference CT images and structural information from unsupervised segmentation masks, and 2) a corresponding segmentation mask synthesis network to automatically segment these synthesized images simultaneously. Our experimental studies on HRCT scans collected from COVID-19 patients demonstrate that our CS2 model can lead to realistic synthesized datasets and promising segmentation results for COVID infections compared to the state-of-the-art nnUNet trained and fine-tuned in a fully supervised manner.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_1

SharedIt: https://rdcu.be/cVRYE

Link to the code repository

https://github.com/ayanglab/CS2

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a method to synthesize images and labels for medical image segmentation. The method is technically sound. It is compared with three strong baseline methods on both in-house and public datasets and delivers promising results.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. It is important to explore means to reduce the cost of medical image annotation. The presented method, CS2, is a promising approach. It utilizes 30 labeled images for data synthesis while its baseline counterparts require 1000 labeled images.

    2. The presented method is technically sound. Its key components are clearly explained and figures are provided to understand the method.

    3. This paper presents good experiments. The proposed method and three popular baseline approaches are compared on both in-house and public datasets. Promising quantitative and qualitative results are reported with detailed discussions.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I don’t see any major weakness of this paper.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors promise to release code but not the in-house data. Hyper-parameters for model training are not provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. The mentioned supplementary file is not provided.

    2. Font size in figures is too small to be read.

    3. Is the data split patient-wise?

    4. Why only use one image from each CT volume?

    5. Why are only 10 volumes used to fine-tune nnUNet while 30 are used by the proposed method?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper studies an important problem, presents a technically sound approach, and reports promising results. These are the major factors that led to the positive rating.

  • Number of papers in your stack

    7

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The submitted rebuttal addresses the problems raised by the reviewers, and the authors promise to make corresponding modifications in the revision by 1) adding new visual examples of rare pathologies, and 2) adding more details and justifications of the proposed method.

    The paper was already good and the rebuttal is promising; therefore, I suggest accepting it as a good paper with moderate weaknesses.



Review #2

  • Please describe the contribution of the paper

    The paper describes a novel approach for synthetic CT generation. There are two key ideas. First, it is important to combine both masks and noise vectors in the generation process, which, on the one hand, allows controllable synthesis and, on the other, can learn a large variability of images. Second, by using modified Hounsfield unit (HU) maps as a replacement for traditional binary/class maps, more structural information is encoded into the model. Experiments demonstrate that the proposed method creates more anatomically correct CTs and demonstrates competitive results with respect to semantic segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The key strength of the paper is the clever idea of incorporating HU value maps into an image generation network for CTs. By adding controlled manipulation into the more traditional V2I network, the resulting approach combines the best of both V2I and M2I methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While I found the presented idea quite innovative, it would be very valuable to expand upon the quantitative experiments and discussion. Please also include a thorough description of limitations and failure cases of the approach.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Yes, it should be easy to reproduce the approach from the details included in the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    First, please explain whether the approach is applicable to all CT acquisition methods, or only certain types of CT where HU values are reliable.

    It is not clear if the presented approach improves upon the lack of variability issue in M2I methods. It would be helpful to add an experiment or otherwise discuss this.

    For the downstream segmentation experiment, I would have expected to see results where the model is pre-trained on the synthetic data and fine-tuned on the small amount of real data, as synthetic data is typically used to augment smaller training datasets.

    Section 2.2: in the section that states “MSE between the HU value map and our synthesized CT images”, is the CT thresholded, or otherwise how is it post processed? Some sections of the paper reference details in the supplementary material, which is not included.

    Please also include some sample failure cases of the model. Is it possible that the HU maps and the generated CT do not correspond?

    Small detail: please check grammar in the abstract (“in this study” does not follow sentence flow).

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Certain aspects of the approach were unclear to me, however, the main idea is quite innovative. I hope the authors can address some of the technical questions (please see above) during the rebuttal period.

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors have addressed the main issues raised in the reviews: namely the description of generalization ability and limitations. I think that this paper demonstrates an interesting idea that may be of value to the community.



Review #3

  • Please describe the contribution of the paper

    The authors propose a new medical image synthesis method based on an AdaIN GAN and average HU value assignment. With their contribution, the authors claim that fewer annotated images are required to produce more realistic synthetic images that can aid the final segmentation output.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Novel idea to use less human-labeled data to generate more synthetic images.
    • Comprehensive literature review of existing synthetic image generation methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Weak systematic analysis of the proposed method: It is not clear why the proposed method should be of attention to the readers in the field. Does the method really generalize? If so, why? Does it only apply to certain lung nodules in CT images that have distinctive HU values?

    • Experiments are not enough to justify the proposed method: Lung segmentation is regarded as a relatively easy task that could even be done without any supervised ML method, not to mention the necessity of synthetic images. GGO segmentation, too, is limited in demonstrating the novelty of the proposed method. The experiments do not fully show the strength of the method - there are only box plots comparing the proposed method and M2I for lung and GGO segmentation.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper seems to be reproducible, checking all the boxes.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • Try to do a more systematic analysis of why the proposed method should work. What is the role of the AdaIN GAN? What features do we need from an existing network? Is the method generalizable? Does it depend on HU values?

    • Do more thorough experiments: we need more comprehensive comparison tables showing the strength of the proposed method. Two groups of box plots comparing the proposed method and M2I for lung and GGO segmentation are not enough.

    • More compelling visual examples: visual examples showing why the proposed method generates more viable synthetic images compared to other existing methods would be good - perhaps in a scenario where ground-truth annotations are scarce or hard to obtain.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • Experiments and analysis don’t back the motivation of the paper: Synthetic image generation is beneficial for rare diseases that do not have large numbers of human annotations readily available, or for pathologies that require multi-modal images for proper analysis and are therefore hard to annotate even for human experts. Lung segmentation and GGO segmentation are not such cases.
  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The aim of this paper is to use a small number of labeled data to generate synthetic images for learning-based augmentation. The reviewer recommendations are conflicting. The idea of modeling HU intensity in the synthesis is interesting. However, there are concerns regarding both the rigor of the evaluation and the real clinical value in terms of generalizability and the difficulty of the tasks, which need to be carefully addressed before the paper can be further considered. Please see the reviewer comments for further details. Here are the important points to address in the rebuttal:

    The concerns about the rigor and generalizability beyond lung and GGO segmentation

    The reviewer’s concern about details in methodology

    The reviewer’s concern for the results

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4




Author Feedback

Our CS2 method can synthesize realistic medical images controllably with unsupervised structural guidance. In addition, our method can synthesize accurate segmentation masks using only 30 human-labeled cases by leveraging features in the generative model. All reviewers highlighted the novelty of our study. Via a comprehensive literature review (R3) and experiments (R1), our CS2 method demonstrates promising results (R1, R2) on annotated image synthesis. We noticed reviewers’ concerns about the generalizability (R2, R3) and implementation details (R2, R3) of our method. Here are our responses:

Generalizability and Additional Validation Our work is of significant clinical importance because it can largely reduce the work of manual labeling via image/annotation synthesis. Our method can be generalized to other imaging modalities and diseases. The imaging features (especially HU values) in our model are not modality- or organ-specific, supporting the strong generalizability of our model. We use pixel values to sort unsupervised clusters and inherit the pixel-value pattern from the original images for the unsupervised masks. Thus, our synthesis model is not dependent on or sensitive to any organ-specific HU values. Besides, synthetic images for relatively rare pathologies, such as idiopathic pulmonary fibrosis CT and high-grade glioma MR, are added in the revision to showcase the potential of our method for different pathologies and modalities. The synthetic annotated images are validated by experienced clinicians.
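As an illustrative sketch only (not the authors’ released code), the average-HU-value assignment described above - building an HU value map by letting each unsupervised cluster inherit the mean pixel value of the original image - could look like this; the function name and shapes are assumptions:

```python
import numpy as np

def hu_value_map(ct, cluster_labels):
    """Build an HU value map by replacing each unsupervised cluster
    with the mean HU value of the original CT inside that cluster.

    ct:             2D float array of HU values
    cluster_labels: 2D integer array of unsupervised cluster ids
    """
    out = np.zeros_like(ct, dtype=float)
    for k in np.unique(cluster_labels):
        region = cluster_labels == k
        # Inherit the pixel-value pattern of the original image.
        out[region] = ct[region].mean()
    return out
```

Because the map is derived from the image’s own intensities, nothing in it is tied to a particular organ’s HU range, which is the generalizability argument made above.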

Experiments R1 and R2 agreed that our experiments on lung and GGO segmentation on COVID-19 patients have proven the efficacy of our method. R2 recommended further discussion of any failure cases. R3 may have misunderstood our experimental results as “box plots comparing the proposed method and M2I”; we would like to highlight that Fig. 5(a) demonstrates the comparable performance of our method with respect to transfer-learning-based algorithms, and Fig. 5(b) quantitatively demonstrates the powerful synthesis performance of our unsupervised masks compared to fully supervised masks. As for the limitations of our work (R2), we obtained our segmentation masks from a pixel-wise classifier, so the segmentation masks are fuzzy and have a few (~1%) scattered wrongly labeled pixels. Post-processing algorithms on segmentation masks, such as connected component analysis, can remove these artifacts easily. We have added this description to our revision. Regarding the question about datasets by R3, although our algorithm requires minimal human intervention, we need a large dataset with ground-truth annotations to compare its effectiveness with other fully supervised methods quantitatively. For example, 1000 annotated data were used for the M2I and V2M2I models. Thus, we chose GGO segmentation for our comparison experiments. Regarding the details of our experimental settings, all of our CT images were thresholded to the HU range [-1500, 150]. We compared different numbers of annotated samples (maximum 30) for our method with nnUNet fine-tuned on 10 volumes. According to our clinicians, the time required to label 10 3D volumes is comparable to labeling 30 2.5D CT images.
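A minimal sketch of the two pre-/post-processing steps mentioned above - HU windowing to [-1500, 150] and connected-component cleanup of scattered mislabeled pixels. This is a hedged illustration, not the authors’ implementation; the function names and the `min_size` threshold are assumptions:

```python
import numpy as np
from collections import deque

def preprocess_ct(ct, lo=-1500, hi=150):
    """Clip HU values to the window reported in the rebuttal."""
    return np.clip(ct, lo, hi)

def remove_small_components(mask, min_size=10):
    """Drop 4-connected foreground components smaller than min_size.

    Pure-stdlib BFS flood fill; removes the ~1% of scattered
    wrongly labeled pixels mentioned above.
    """
    mask = mask.astype(bool)
    seen = np.zeros_like(mask)
    out = np.zeros_like(mask)
    h, w = mask.shape
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                comp, queue = [(i, j)], deque([(i, j)])
                seen[i, j] = True
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            comp.append((ny, nx))
                            queue.append((ny, nx))
                if len(comp) >= min_size:  # keep only large components
                    for y, x in comp:
                        out[y, x] = True
    return out
```

In practice a library routine such as `scipy.ndimage.label` would replace the hand-rolled BFS; the stdlib version is shown only to keep the sketch self-contained.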

Network Design The proposed method is technically sound (R1) and “combines the best of both V2I and M2I methods” (R2). R3 suggested further justification of the architecture of the proposed model, especially the AdaIN component. A reference CT image helps the synthesis learn the style of the whole dataset. According to our experiments, unpaired reference CT images can also increase the variability of the synthetic images, which addresses the variability concern raised by R2. More justifications are added in Section 2.2.
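The AdaIN operation discussed here is the standard adaptive instance normalization of Huang and Belongie: the content features are re-styled with the channel-wise mean and standard deviation of the reference (style) features. A minimal NumPy sketch, assuming (C, H, W) feature maps (not the authors’ network code):

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive instance normalization.

    Normalizes each channel of `content` to zero mean / unit std,
    then rescales it to the per-channel mean/std of `style`.
    content, style: float arrays of shape (C, H, W).
    """
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    return s_std * (content - c_mean) / c_std + s_mean
```

This is how a reference CT can inject dataset style: swapping in a different (even unpaired) reference changes the output statistics, which is the variability mechanism the response invokes.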

Minor Issues We apologize that our over-length supplementary file was removed by the Chair after submission. Implementation details (dataset and hyperparameters) are added in the revision. Grammar and writing have been checked thoroughly.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    In the first round of review, two reviewers provided positive recommendations. I think the rebuttal has addressed the concerns about the limited validation cohorts, the rigor of the method, and innovation. In my opinion, this method is a valuable effort on an important, yet long-standing, issue. For these reasons, the recommendation is toward acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    An interesting idea of incorporating HU value maps into an image generation network for CT images to reduce human annotation effort.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The idea of the paper is interesting, and the rebuttal has addressed the main comments of the reviewers. I think the paper would be an interesting contribution to MICCAI 2022.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2


