
Authors

Kun Han, Yifeng Xiong, Chenyu You, Pooya Khosravi, Shanlin Sun, Xiangyi Yan, James S. Duncan, Xiaohui Xie

Abstract

Acquiring and annotating sufficient labeled data is crucial in developing accurate and robust learning-based models, but obtaining such data can be challenging in many medical image segmentation tasks. One promising solution is to synthesize realistic data with ground-truth mask annotations. However, no prior studies have explored generating complete 3D volumetric images with masks. In this paper, we present MedGen3D, a deep generative framework that can generate paired 3D medical images and masks. First, we represent the 3D medical data as 2D sequences and propose the Multi-Condition Diffusion Probabilistic Model (MC-DPM) to generate multi-label mask sequences adhering to anatomical geometry. Then, we use an image sequence generator and semantic diffusion refiner conditioned on the generated mask sequences to produce realistic 3D medical images that align with the generated masks. Our proposed framework guarantees accurate alignment between synthetic images and segmentation maps. Experiments on 3D thoracic CT and brain MRI datasets show that our synthetic data is both diverse and faithful to the original data, and demonstrate the benefits for downstream segmentation tasks. We anticipate that MedGen3D’s ability to synthesize paired 3D medical images and masks will prove valuable in training deep learning models for medical imaging tasks.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43907-0_72

SharedIt: https://rdcu.be/dnwdT

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes the MC-DPM to generate multi-label mask sequences with high fidelity and diversity adhering to anatomical geometry and produces realistic 3D medical images that align with the generated masks. It is the first to address the challenge of synthesizing complete 3D volumetric medical images with their corresponding masks. Experiments on 3D thoracic CT and brain MRI datasets show that the synthetic data is both diverse and faithful to the original data, and demonstrate the benefits for downstream segmentation tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper has high novelty with abundant experiments. The research topic of generating images from masks has strong practical application value.
    • MC-DPM is able to generate mask sequences directly from random noise or by conditioning on existing slices. The relative position of slices is utilized as a condition to guide the MC-DPM in generating subsequences of the target region and to control the length of the generated sequences.
    • The experiments show that the model proposed in the paper achieves better results in both the evaluation of generated image quality and downstream segmentation tasks.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • To my knowledge, LPIPS is used to measure the similarity between two images, where a smaller value indicates a higher similarity. However, this metric seems to be inconsistent with the results presented in the paper.
    • According to Table 1, the proposed method has a slightly lower FID score compared to DDPM, and the authors think this is reasonable because DDPM is trained on 2D images without explicit anatomical constraints. I am doubtful: why does the model perform better when the anatomical constraints are removed?
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Key resources are available and sufficient details are described such that an expert should be able to reproduce the main results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • As mentioned above, I have doubts about the evaluation metric of LPIPS. A smaller LPIPS value indicates a greater similarity between the two images.
    • Please explain why DDPM trained without explicit anatomical constraints performs better than the proposed method.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea is interesting. The paper is written clearly and the experimental results demonstrate the effectiveness and practicality of the proposed model.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper proposes a 3D medical image synthesis algorithm that is able to produce whole volumetric medical images as well as corresponding segmentation masks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    In this paper, a novel approach was taken in which 3D medical images were treated as 2D video sequences. To facilitate the synthesis, a 2D sequence generation algorithm was utilized.

    To generate volumetric segmentation masks, a novel MC-DPM module was proposed.

    In addition, a diffusion fine-tuning module was introduced to further enhance the performance of image synthesis.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While this paper does not have any major weaknesses, the reviewer does have some concerns regarding the justification of the method used. Please refer to the comment section for more details.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors promise to release the code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The synthesis pipeline is well-illustrated in Fig. 1 and treating 3D medical images as videos is a plausible solution for whole volumetric medical image synthesis. The reviewer found the results in the supplementary file to be fascinating. However, it would be beneficial to provide videos from other directions in addition to the axial plane.

    The reviewer is not convinced that the usage of such a complex video synthesis scheme for 3D mask generation is well-justified. Since the organs/objects in the segmentation masks are relatively large, it would be more efficient to synthesize down-sampled segmentation masks and then up-sample them in downstream tasks. Synthesizing whole volumetric yet low-resolution segmentation masks seems like a better approach for segmentation mask generation as straightforward 3D synthesis can guarantee consistency among slices. Thus, it would be valuable to compare the efficiency of the proposed sequential-based method with the straightforward synthesis of down-sampled 3D segmentation masks.

    In the medical image generation process, the final result is the average of 3D images from three different models. It would be valuable if the author could provide synthetic images from SDM-A, SDM-C, and SDM-S separately. If the synthesized images from these three models vary significantly, the final synthesis result could be blurred by the average operation. The reviewer suggests providing descriptions on whether there is a blurring issue and, if there is, how the authors solved the blurring issue.

    Since the synthesis pipeline is trained on a small dataset, the reviewer is concerned about the variety of synthetic images produced.

    The segmentation masks for brain MRI images were obtained from off-the-shelf annotating tools such as FreeSurfer. The reviewer is curious to know what kind of segmentation mask would be obtained if FreeSurfer were applied once again on the synthetic brain MRI images.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The fascinating results presented in the supplementary files and the use of fair methods are two major reasons for acceptance. However, the paper could be further extended with additional justification for their methods and clinical validation experiments as suggested in the recommendation section.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    This work continues to hold value as a robust algorithm for the 3D medical image synthesis task and meets the qualifications for a MICCAI paper.

    However, the reviewer remains unsatisfied with the response regarding the blurring issue. This raises a contradiction: if the images from SDM-A, SDM-C, and SDM-S exhibit significant variation, averaging them can result in severe blurring. However, if the images from SDM-A, SDM-C, and SDM-S do not differ significantly, the inclusion of three blocks may appear redundant.



Review #3

  • Please describe the contribution of the paper

    The authors propose a method called MedGen3D, based on an initial GAN (Vid2Vid), to generate paired 3D medical images and corresponding masks. Their method relies on Vid2Vid initially and then uses a semantic diffusion refiner to improve the quality of the synthetic images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Generation of paired datasets of 3D images and anatomical masks. The authors demonstrate the applicability of the method on two imaging modalities, MRI and CT, and on different anatomical structures/areas, brain and thorax. The method behind MedGen3D seems to be novel, in the way that it combines an existing GAN (Vid2Vid) with a module based on diffusion models, which are known to perform better than GANs.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    State of the art: the text in this section makes some claims which are incorrect/not accurate, and it misses some current state of the art results. For instance, this work reflects the application of diffusion models to generate paired medical data (https://arxiv.org/pdf/2207.08208.pdf). And this work (https://ieeexplore.ieee.org/abstract/document/9893790) disproves the main contribution #1 suggested by the authors, that they are the first to address 3D volumetric image generation with corresponding masks. This reflects that the application of the model is not novel, even though the model itself seems to be.

    2D vs 3D segmentation: it is not very clear why the authors compare results from 2D segmentation models with those from 3D models. Averaging a 2D method over 3 views and comparing it with an actual 3D model is prone to introduce some error into the pipeline, influencing the final generated images, which are then used for segmentation, simply propagating the error throughout the whole pipeline. It is not clear why the authors would compare 2D segmentation models with 3D ones in this work, where they are trying to generate 3D images.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility details could be further explained, namely the type of GPU used and the training time the whole model took.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    My concerns are mostly focused on the following:

    • The segmentation task: it is not clear why the authors compare 2D models with 3D ones.
    • To avoid the usage of an initial 3D GAN (Vid2Vid) and bring more novelty to the paper, why not use a 3D diffusion model from the beginning?
    • It seems that the masks the authors used were also synthetically generated. Why not use masks from anatomical models, as in (https://ieeexplore.ieee.org/abstract/document/9893790) or (https://ieeexplore.ieee.org/document/9324763)?
    • In Figure 3, doesn’t the averaging operation influence the frame consistency? This could, and should, be further explored, since this aspect is relevant to assess the quality of volumetric 3D data.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work doesn’t seem to be well structured. Some of the experiments seem odd, such as comparing 2D with 3D segmentations. If the authors address the main concerns and restructure the manuscript, it would improve readability and also its novelty.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    If the authors address the concern regarding the comparison of the proposed method with a DDM, their model will likely perform slightly worse (as the authors mentioned in their feedback). Therefore, the state-of-the-art results and novelty the authors claim will lose strength.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes a 3D medical image synthesis algorithm that is able to produce whole volumetric medical images as well as corresponding segmentation masks. Experiments on 3D thoracic CT and brain MRI datasets show that the produced synthetic data is relevant, and demonstrate the benefits for downstream segmentation tasks.

    Strengths: high novelty with abundant experiments and strong practical application value.

    Weaknesses:

    • the state of the art put forward in the paper does not seem to be fully accurate and thus may impact the novelty of the proposed approach, as noted by Rev #3,
    • additional justification for certain choices is missing,
    • the comparison of 2D and 3D methods is not clearly justified.

    Regarding the related articles noted by Rev #3: publication https://arxiv.org/pdf/2207.08208.pdf seems to be only on arXiv and not yet peer-reviewed and published.

    Despite the great interest raised by this paper, there are some questions that should be addressed in the rebuttal to strengthen the paper:

    • is there any reason why https://ieeexplore.ieee.org/abstract/document/9893790 is not in the state of the art? Otherwise, please add it if it is an omission
    • comment on the possible blurriness of the final result, which is the average of 3D images
    • why is such a complex video synthesis scheme needed for 3D mask generation? Why not synthesize down-sampled segmentation masks?
    • specify the definition of LPIPS
    • justify why 3D and 2D methods are compared
    • other questions raised by Rev #1 and #2 notably, if space is left.




Author Feedback

We thank the reviewers for their recognition of the novelty (R1, R2, R3, Meta-R) and the comprehensive experiments of our study. Below, we address their major comments.

Q1: Down-sampled 3D Mask Generation with Diffusion. (R2, R3, Meta-R) A1: (1) Our method reduces memory consumption in diffusion models by decomposing 3D volumetric masks into subsequences and generating masks autoregressively under positional guidance. This enables high-quality image synthesis while preserving anatomical structures. (2) We experimented with generating 3D masks (64x64x64) downsampled from 96x320x320 for Thorax and 192x160x160 for Brain, to accommodate NVIDIA A6000 memory constraints. However, large downsampling ratios could damage small organ structures (esophagus, trachea, CSF) and cause disconnected regions in complex brain structures. Additionally, 3D generation is itself a challenging task. As a result, the FID of the synthetic images (78.2 and 80.6) is significantly worse than that of our proposed method (39.6 and 40.3). We will include these new experimental results in the revision.
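The rebuttal's point that large downsampling ratios can destroy small structures can be illustrated with a toy numpy sketch (our illustration, not code from the paper): a label that is only one voxel thick along an axis simply vanishes under naive stride-based downsampling, and no upsampling can recover it.

```python
import numpy as np

# Toy label volume with a structure one voxel thick along axis 0,
# loosely analogous to thin labels such as esophagus or trachea.
mask = np.zeros((16, 16, 16), dtype=np.uint8)
mask[5, :, :] = 1

# Naive 4x nearest-neighbour downsample: only indices 0, 4, 8, 12 are
# sampled along each axis, so the slab at index 5 is never seen.
down = mask[::4, ::4, ::4]

print(int(mask.sum()), int(down.sum()))  # → 256 0
```

The thin structure is entirely absent from the downsampled mask, which matches the rebuttal's claim that down-sampled 3D mask generation risks losing small organs before generation even starts.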

Q2: Comparing 2D and 3D Segmentation Models. (R3, Meta-R) A2: We compared Unet2D and Unet3D to emphasize the superiority of 3D segmentation models and the importance of 3D labeled data, which underscores the value of our 3D generative model. Additionally, more advanced segmentation models further demonstrate the benefits of 3D synthetic data. We will clarify this point.

Q3: Comments on related works. (R3 & Meta-R) A3: R3 pointed out two references regarding the state of the art. (a) For the arXiv preprint, the task is different: our work focuses on labeled 3D image generation, whereas the arXiv paper focuses on 2D image translation between two imaging modalities. (b) The IEEE Access paper also addressed labeled image generation, with several key differences. (1) The referenced work utilized existing anatomical models from [1] and primarily focused on image generation, whereas our work proposes a diffusion module (MC-DPM) to synthesize 3D masks with a learned data distribution and enhanced flexibility, followed by 3D volumetric image generation. (2) The anatomical model used in the referenced work is specifically designed for a single continuous shape (the heart), whereas our work requires learning spatial relationships among separated organs. For the brain, with its multiple complex anatomical structures, LDDMM (large deformation diffeomorphic metric mapping) as used in [1] could not yield accurate templates and mappings in our experiments. (3) Since there is no existing anatomical model that can adequately capture the statistical shapes of complex structures and the relationships among individual masks, we could not generate synthetic masks using anatomical models as in the IEEE reference for this experiment. We acknowledge the contributions of prior works and will reference them in our revision.

Q4: Possible blurriness from the averaging operation. (R2, R3, Meta-R) A4: During the image generation stage, refinement is conducted slice-by-slice to reduce artifacts, which may introduce slight inconsistencies between slices. To mitigate this, we refine volumes from three different directions and then average the refined results. The potential blurriness from averaging is minimal because the refinement steps are small relative to the initial results, as shown in the supplementary video. We will discuss this issue and conduct further research.
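As a hedged illustration of the fusion step described above, here is a minimal numpy sketch; the function name and toy data are hypothetical, not code from the paper. It shows that when the three per-direction refinements stay close to a shared initial volume, their average also stays close to each input, so the averaging introduces little blur.

```python
import numpy as np

def fuse_directional_refinements(vol_a, vol_c, vol_s):
    """Average three independently refined volumes (e.g. axial, coronal,
    sagittal passes). Hypothetical sketch of the fusion described in the
    rebuttal; the authors' actual implementation may differ."""
    assert vol_a.shape == vol_c.shape == vol_s.shape
    return (vol_a + vol_c + vol_s) / 3.0

# Toy example: three small perturbations of one initial volume, standing
# in for three directional refinements that differ only slightly.
rng = np.random.default_rng(0)
base = rng.standard_normal((4, 8, 8))
refined = [base + 0.01 * rng.standard_normal(base.shape) for _ in range(3)]
fused = fuse_directional_refinements(*refined)

# The fused volume deviates from the initial one by at most the (small)
# refinement magnitude, so no strong blurring is introduced.
print(float(np.abs(fused - base).max()))
```

If the three refinements disagreed strongly, the same average would act as a low-pass filter and blur the result, which is exactly the failure mode Reviewer #2 raises.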

Q5: Definition of LPIPS. (R1, Meta-R) A5: LPIPS measures the similarity between two images; a smaller value indicates higher similarity. In our experiments, we used LPIPS to evaluate the diversity of generated images: higher pairwise LPIPS values indicate better diversity [2][3]. We will make this clear.
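The diversity protocol the rebuttal describes (mean pairwise LPIPS over generated samples, higher meaning more diverse) can be sketched as follows. To keep the example self-contained we substitute a plain MSE distance for the learned LPIPS network, so the function name and numbers are illustrative only; with the real `lpips` package one would pass its model's forward call as `dist`.

```python
import itertools
import numpy as np

def pairwise_diversity(samples, dist):
    """Mean pairwise distance over all unordered sample pairs.
    Higher values indicate a more diverse sample set. `dist` is a
    stand-in for a perceptual distance such as LPIPS."""
    pairs = list(itertools.combinations(samples, 2))
    return float(np.mean([dist(a, b) for a, b in pairs]))

def mse(a, b):
    return float(np.mean((a - b) ** 2))

rng = np.random.default_rng(1)
near_duplicates = [np.ones((8, 8)) + 1e-3 * rng.standard_normal((8, 8))
                   for _ in range(4)]
varied = [rng.standard_normal((8, 8)) for _ in range(4)]

# A varied set scores higher than a set of near-duplicates.
print(pairwise_diversity(near_duplicates, mse)
      < pairwise_diversity(varied, mse))  # → True
```

This is why a *higher* LPIPS-based score is desirable in a diversity evaluation, while a *lower* LPIPS is desirable when measuring fidelity to a reference image, resolving Reviewer #1's apparent contradiction.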

Q6: Comparison with 2D DDPM. (R1) A6: Considering the differences from our method, we speculate that DDPM might provide better results because of its implicitly learned anatomical structure and higher flexibility. We will revise the statement.

[1] Linking statistical shape models and simulated function in human heart
[2] Palette: Image-to-Image Diffusion Models
[3] Pluralistic Image Completion




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    N/A



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper is quite borderline, with a wide range of scores. All reviewers have serious concerns but are also intrigued by the paper and results. Overall, the rebuttal helped a tiny bit, for one reviewer going from R to WR while still having concerns. Overall, I think the scores and reviews point to this paper having enough interesting content that it’s worth discussing at MICCAI, so I am leaning towards accept.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This work presents MedGen3D, which is a deep generative framework that generates paired 3D medical images and masks. It can help segmentation tasks by producing sufficient labeled data. The novelty of the work is very good, and this opinion is also shared by the reviewers. However, although the rebuttal has addressed most of the reviewers’ questions very well, I have one remaining concern, which seems quite serious. In particular, the results of DDPM are better than those of MedGen3D, and in the segmentation task, there was no comparison with DDPM. In this case, it is not clear whether MedGen3D makes a substantial contribution to the field. If DDPM cannot generate both image and mask, then a method that can perform such data augmentation should be considered in the experiment. The paper in its current form lacks proper comparison with existing methods, and I feel this serious weakness outweighs the strength.


