Authors

Minh Hieu Phan, Zhibin Liao, Johan W. Verjans, Minh-Son To

Abstract

Medical image synthesis is a challenging task due to the scarcity of paired data. Several methods have applied CycleGAN to leverage unpaired data, but they often generate inaccurate mappings that shift the anatomy. This problem is further exacerbated when the images from the source and target modalities are heavily misaligned. Recent methods have aimed to address this issue by incorporating a supplementary segmentation network. Unfortunately, this strategy requires costly and time-consuming pixel-level annotations. To overcome this problem, this paper proposes MaskGAN, a novel and cost-effective framework that enforces structural consistency by utilizing automatically extracted coarse masks. Our approach employs a mask generator to outline anatomical structures and a content generator to synthesize CT contents that align with these structures. Extensive experiments demonstrate that MaskGAN outperforms state-of-the-art synthesis methods on a challenging pediatric dataset, where MR and CT scans are heavily misaligned due to rapid growth in children. Specifically, MaskGAN excels in preserving anatomical structures without the need for expert annotations. The code for this paper can be found at https://github.com/HieuPhan33/MaskGAN.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43999-5_6

SharedIt: https://rdcu.be/dnwjg

Link to the code repository

https://github.com/HieuPhan33/MaskGAN

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a cost-effective framework that enforces structural consistency by using automatically extracted coarse masks, which consists of a mask generator and a content generator. The experiments show the effectiveness of proposed method on MR-to-CT synthesis.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is very well written and the presentation is clear.
    2. To my knowledge, generating each component of the CT image separately is an interesting idea.
    3. The experimental comparisons are sufficient and comprehensive.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. There is a parameter N in Section 2.1. What is the value of N utilized in experiments? Would this parameter N influence the performance a lot when it is too small or large?

    2. What are the values of the hyper-parameters \lambda_mask and \lambda_shape in this work? I believe they are not defined in the paper, though I may have missed them.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors will provide the code but not the data.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. Add the parameter settings in experiments. Please refer to the weaknesses for details.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well written and interesting. The experiments are comprehensive.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper presents a CycleGAN-based framework for MRI-CT image translation. The generator networks consist of content and mask generation branches. The mask branch predicts multiple masks representing brain structure, while the content branch generates texture guided by these masks. By employing basic image operations, a coarse mask is obtained to guide CycleGAN training, ensuring the synthesis of structurally consistent images without requiring segmentation masks for training. The paper also introduces two loss functions applied to the generated masks to maintain consistency.
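A minimal sketch of how mask-guided composition of this kind could work. It assumes (this is not confirmed by the review text) that the mask branch outputs N logit maps that are softmax-normalized across channels and that the content branch outputs one content map per mask; the function and variable names are hypothetical and do not come from the paper or its repository.

```python
import numpy as np

def compose_from_masks(mask_logits, contents):
    """Blend per-channel content maps using softmax-normalized masks.

    mask_logits: array of shape (N, H, W), raw scores from a mask branch.
    contents:    array of shape (N, H, W), one content map per mask.
    Returns the blended image (H, W) and the normalized masks (N, H, W).
    """
    # Numerically stable softmax over the channel axis, so that the
    # N masks sum to 1 at every pixel (assumption, see lead-in).
    shifted = mask_logits - mask_logits.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)
    # Each pixel is a mask-weighted average of the N content maps.
    blended = (weights * contents).sum(axis=0)
    return blended, weights
```

Because the masks sum to one at each pixel, every output pixel is a convex combination of the content maps, which is one way a mask branch can decide which content channel is responsible for which structure.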

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) This method eliminates the need for ground truth segmentation for structural consistency loss.

    2) The introduced mask and shape consistency losses enhance the overall performance of the method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) In the method section, the authors claim that each channel A_i in the mask tensor focuses on different anatomical structures, but there is no empirical evidence provided to support this statement. This claim should be substantiated with visual samples generated by the network.

    2) The process for generating coarse masks is not well-defined. No hyperparameters related to thresholding, morphology, or normalization are provided. These details are essential for ensuring reproducibility.

    3) The dataset consists of 262 samples, divided into 249/1/12 (train/eval/test) subsets. It appears that these splits remain unchanged throughout the experiments. The caption for Table 2 indicates that the experiments are repeated five times. The authors should explain why k-fold cross-validation is not used, as the test set seems small and may not adequately represent the overall dataset distribution. Employing a k-fold approach could help address this issue.

    4) Table 2 reports high PSNR values and low SSIM values (assuming SSIM values are scaled to the [0, 100] range). The authors should explain this phenomenon, which typically occurs when the network generates overly smooth (blurry) outputs. The visual results suggest a lack of detail. Could this be attributed to pre-processing and/or normalization?
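To make the scale question concrete, here is a small sketch of the two metrics. It assumes images normalized to [0, 1] and uses a single-window (global) SSIM, which is a simplification of the usual locally windowed SSIM; it is not the evaluation code used in the paper, and `ssim_as_percent` is a hypothetical helper that only illustrates the [0, 100] reading assumed above.

```python
import numpy as np

def psnr(ref, img, data_range=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = np.mean((ref - img) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))

def global_ssim(ref, img, data_range=1.0):
    """SSIM computed over the whole image as one window (simplified).

    The result is at most 1, and equals 1 for identical images.
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = ref.mean(), img.mean()
    var_x, var_y = ref.var(), img.var()
    cov = ((ref - mu_x) * (img - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)

def ssim_as_percent(ssim_value):
    """Rescale an SSIM value to a percentage for table display."""
    return 100.0 * ssim_value
```

PSNR depends only on the mean squared error, while SSIM also compares local contrast and structure, so the two metrics can rank the same output quite differently.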

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The manuscript provides sufficient details to implement the proposed architecture. However, the paper does not offer complete information regarding coarse mask generation, which could potentially hinder reproducibility. Nevertheless, the authors have stated that they will release the code after the paper is accepted.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    1) The SSIM values in Table 2 need to be checked. They should be between 0 and 1, but the table shows values around 50. Fig. 1 has examples with the correct SSIM range.

    (see the weaknesses part for more details.)

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper demonstrates promising results, and the method eliminates the need for a segmentation mask to enhance the generation outcomes. However, the evaluation of the method relies on a very small test subset, and the visual results appear to be less than satisfactory.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper develops an unsupervised MaskGAN model for MR-to-CT translation. The data used for model training and testing come from pediatric patients under 2 years of age. Results show that the proposed model outperforms other GAN models, including CycleGAN, AttentionGAN, sc-CycleGAN, and shape-CycleGAN.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strengths of this paper are: (1) This study proposes a new GAN-based method for MR-to-CT translation that takes structure and shape information into consideration. (2) The dataset used in this study has great clinical value. It includes 262 MR and CT pairs from pediatric patients aged 6-24 months. Because of anatomical differences between children and adults, a model trained on adult images may not perform well on pediatric patients; this study trains a model tailored to the needs of pediatric patients.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Despite the advantages of this manuscript, some major concerns remain: (1) The performance of the proposed model and the comparison methods looks unsatisfactory, especially for bone structures near the nose, as shown in Fig. 1 and Fig. 3. Could the authors add some supervised models for comparison, to see whether supervised models show the same unsatisfactory performance? (2) The dataset of pediatric patients is of great clinical value. Do all the patients have tumors? Tumor-related samples are preferable for model training, since the application of this technique is tumor related. The authors should also show the performance of synthetic CT in tumor regions. (3) The authors used 249 patients for model training, 1 patient for validation, and 12 patients for testing. Could the authors justify this split? Why use only 1 patient for validation and 12 patients for testing, and is that enough for model development and evaluation?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Good reproducibility with potential open source code as indicated in abstract.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    In future work, the visual comparisons should also show axial and coronal views. Clinical evaluations such as dose calculation and treatment planning are also recommended to demonstrate the clinical value of this study, because many MR-to-CT translation networks have already been developed. In addition, the clinical value of this study (a model tailored to pediatric patients) could be emphasized further in the introduction.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Novelty, application related.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors addressed my questions and I concur with their response that performing MR-to-CT translation for pediatric patients is a challenging task, mainly due to the difficulty of obtaining paired data. Nevertheless, it is encouraging to see ongoing efforts to explore this area for the benefit of pediatric patients.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Assessment: A structure-consistency-preserving CycleGAN approach for synthesizing CT from brain MRI images. The novelty of the approach is that it doesn’t require any segmentations to guide the synthesis and can be used in an unsupervised way. The presented results indicate slight improvements over compared methods. However, a few details need to be addressed, including presenting evidence for how the different mask tensors focus on different anatomical parts of the brain, providing more information on hyperparameter selection and tuning, and describing the coarse mask generation method. It would also be useful to show the impact of poor-quality mask generation, for example in the presence of MRI artifacts and motion: how much does that limit the accuracy? The manuscript also seems to miss some related work. For example, a similar structure-preserving approach was developed by Jiang et al., “PSIGAN: Joint probabilistic segmentation …” in IEEE Trans Med Imaging. It would be good to contrast the proposed approach with the work of Jiang et al. and explain its improvements. Finally, as pointed out by the reviewers, the paper should explain why the approach achieves higher PSNR but low SSIM. Also, is the approach only applicable to images without tumors, or can it generalize to situations where tumors are present in MRI? This is an important use case for synthetic CT generation in MR-guided radiotherapy workflows. Furthermore, clinically relevant metrics such as dose volume metrics could be included to show how the method works in clinical application.

    To address in rebuttal: Please provide more details of the method, in particular how coarse mask generation is done, how accurate the coarse mask generation is and what is its impact on the accuracy. Please provide details and justification for the hyperparameter selection. Also explain/show why or how the different mask tensors attend to specific anatomical regions. An explanation for why the approach achieves higher PSNR but low SSIM should be discussed. Please discuss the use case and limitations of the approach such as its applicability for images with tumors and include clinically relevant metrics such as dose volume metrics.




Author Feedback

We are grateful for the valuable comments and glad that the AC and reviewers found our experiments comprehensive and sufficient (R1) with promising results (R2), and recognized the paper’s interesting idea (R1) and great clinical value (R3).

  1. Small test set (R2.3, R3.3): We only have 13 paired volumes for evaluation; other 249 volumes are unpaired, precluding k-fold validation. Acquiring pediatric datasets is complex due to sedation requirements and sensitivity to ionizing radiation. Despite collecting data from the largest public imaging provider in a 25M+ populated country, limited MR-CT pairs met the criteria (within 3 months, correct MRI sequence performed). Yet, our test set is still comparable to similar MRI-CT studies: 15 in MICCAI-W [14], TMI2020 [15], 10 in ISBI2019 [4], and 8 in CBM2022 [1*].
  2. Visual results (R2.4, R3.1): Unlike adult datasets in [14,15], pediatric datasets are easily misaligned due to children’s rapid growth between scans. Thus, suboptimal visual results can be expected, especially in unsupervised training. The lack of paired pediatric data also prevents us from adding supervised methods (R3).
  3. Low SSIM (R2.4): Low SSIM scores are not uncommon in challenging unpaired MR-to-CT datasets: ~0.55 in [1] and ~0.25 in [2]. SSIM of 0.58 on our misaligned pediatric data is unsurprising. Yet, transfer learning (TL) can boost SSIM [1*]. We pre-train models on 100 paired T1-to-T2 brain samples from BraTS2018 using both supervised/unsupervised losses. After transfer learning to our dataset, MaskGAN achieves 0.66 SSIM, surpassing the second best, shape-CycleGAN (0.61). Our visual results improve after TL (R3.1).
  4. Mask generation (R2.2): After normalizing into range [0,1], we apply a binary threshold of 0.1 (selected by inspecting the histogram). Binary opening with a 3x3 structuring element is applied to remove small artifacts. After connected component analysis, we fill in small holes with volumes < 1cm^3 (tuned empirically from 0.1-2) in the masks. Our performance is robust to error-prone mask generations (Meta), as shown in Fig. 4.
  5. Anatomical region focus (R2.1): Constrained by rebuttal’s format, we quantify attention scores per region. We roughly divide the image into skull-S (using edge detection with 3-pixel width on binary masks), frontal-F (from midline to leftmost edge) and posterior-P brain regions (from midline to rightmost edge). We average scores in each A_i for 3 regions. Each mask focuses on different regions. A_2 has the highest score in S (0.41), compared with 0.26-F, 0.33-P (normalized to [0,1]). A_1, A_4’s highest are P: 0.65 and 0.57. A_3, A_5’s are F: 0.48, 0.63. Visual results will be included in Suppl.
  6. Tumor performance (R3.2): Our dataset has mixed pathologies (tumor, vascular, trauma), but the scarcity of pediatric neuro-radiologists precludes annotations. Yet, armed with attention mechanisms, MaskGAN is aware of different structures in the image and potentially preserves pathological structures if present. We’ve procured tumor annotations for future evaluations.
  7. Novelty (R3): Though many MR-CT works exist, unpaired MR-CT research is limited. Ours is the first to address structural misalignment in unpaired MR-to-CT without requiring segmentation annotation, a contribution endorsed by R1 and R2.
  8. Clinical evaluation (R3): We acknowledge the importance of clinical evaluations. Yet, it is beyond our current scope, which centers on novel theoretical development (see 7). We are mobilizing clinical teams to facilitate contouring, treatment planning, and clinical evaluations for journal extension.
  9. Parameter selection (R1): lambda parameter tuning was presented in Tables 1 and 2 in Suppl. MAE is insensitive to N when N>4: 22.63 [N=1], 22.45 [N=3], 21.56 [N=5], 21.59 [N=8].

[1] Alaa et al., Paired-unpaired Unsupervised Attention Guided GAN with transfer learning for bidirectional brain MR-CT synthesis, CBM2022.
[2] Jin et al., Deep CT to MR Synthesis Using Paired and Unpaired Data.
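The coarse mask steps listed in item 4 of the feedback (normalize to [0,1], threshold at 0.1, binary opening with a 3x3 structuring element, fill holes smaller than 1 cm^3) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes a 3D volume, applies the 3x3 opening in-plane on each slice, treats a "hole" as a background component that does not touch the volume border, and takes the voxel volume as a hypothetical `voxel_volume_mm3` parameter.

```python
import numpy as np
from scipy import ndimage

def coarse_mask(volume, voxel_volume_mm3=1.0, threshold=0.1, max_hole_cm3=1.0):
    """Return a boolean coarse mask for a 3D volume of shape (D, H, W)."""
    vol = volume.astype(np.float64)
    span = vol.max() - vol.min()
    # Min-max normalization into [0, 1]; a constant volume maps to zeros.
    norm = (vol - vol.min()) / span if span > 0 else np.zeros_like(vol)
    mask = norm > threshold

    # In-plane 3x3 opening removes small isolated artifacts (assumption:
    # the 3x3 element is applied slice by slice).
    structure = np.ones((1, 3, 3), dtype=bool)
    mask = ndimage.binary_opening(mask, structure=structure)

    # Label background components; those touching the border are "outside".
    background, n_labels = ndimage.label(~mask)
    border = np.ones(mask.shape, dtype=bool)
    border[1:-1, 1:-1, 1:-1] = False
    outside = set(np.unique(background[border]).tolist())

    # Fill enclosed holes whose volume is below the limit (1 cm^3 = 1000 mm^3).
    sizes = np.bincount(background.ravel(), minlength=n_labels + 1)
    max_voxels = max_hole_cm3 * 1000.0 / voxel_volume_mm3
    for label in range(1, n_labels + 1):
        if label not in outside and sizes[label] < max_voxels:
            mask[background == label] = True
    return mask
```

Filling only holes below a size limit, rather than all holes, keeps large enclosed cavities out of the mask, which matches the feedback's mention of a tuned volume limit.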




Post-rebuttal Meta-Reviews

Meta-review #1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Post rebuttal: The key contribution is a CycleGAN approach to synthesize CT images from MRI images in pediatric patients without requiring any structure segmentations, allowing for a fully unsupervised approach that is practically applicable to clinical use cases. The presented results indicate slight improvements over compared methods. The authors addressed the reviewers’ critiques. Assuming the authors incorporate their responses, including the explanation of the method and the reasons for the variability in the accuracy metrics requested by the reviewers, and clarify the scope of the paper with respect to the limited clinical evaluation in its current form, the paper should be considered for publication. It addresses an important and challenging problem of handling pediatric images, which do show large variability due to growth-related changes.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The idea of this paper is not new. There were papers 3-4 years ago that solved the same problem using very similar strategies (e.g., https://ieeexplore.ieee.org/document/8759529). The experiments are relatively limited: MR-to-CT synthesis in the brain is not a very challenging task. Other body parts should be considered as well.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper addresses an important challenge in pediatric neuroimaging and presents an unsupervised image synthesis method. The authors have addressed all the comments.


