
Authors

Pinxian Zeng, Luping Zhou, Chen Zu, Xinyi Zeng, Zhengyang Jiao, Xi Wu, Jiliu Zhou, Dinggang Shen, Yan Wang

Abstract

To obtain high-quality positron emission tomography (PET) scans while reducing potential radiation hazards brought to patients, various generative adversarial network (GAN)-based methods have been developed to reconstruct high-quality standard-dose PET (SPET) images from low-dose PET (LPET) images. However, due to the intrinsic locality of the convolution operator, these methods have failed to explore global contexts of the entire 3D PET image. In this paper, we propose a novel 3D convolutional vision transformer GAN framework, named 3D CVT-GAN, for SPET reconstruction using LPET images. Specifically, we innovatively design a generator with a hierarchical structure that uses multiple 3D CVT blocks as the encoder for feature extraction and multiple 3D transposed CVT (TCVT) blocks as the decoder for SPET restoration, capturing both local spatial features and global contexts from different network layers. Different from the vanilla 2D vision transformer that uses linear embedding and projection, our 3D CVT and TCVT blocks employ 3D convolutional embedding and projection instead, allowing the model to overcome the semantic ambiguity problem caused by the attention mechanism and further preserve spatial details. In addition, residual learning and a patch-based discriminator embedded with 3D CVT blocks are added inside and after the generator, facilitating the training process while mining more discriminative feature representations. Validation on the clinical PET datasets shows that our proposed 3D CVT-GAN outperforms the state-of-the-art methods qualitatively and quantitatively with minimal parameters.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16446-0_49

SharedIt: https://rdcu.be/cVRTI

Link to the code repository

https://github.com/Aru321/CVTGAN

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper aims to reconstruct standard-dose PET (SPET) images from low-dose PET (LPET) images, via a 3D Convolutional Vision Transformer GAN (3D CVT-GAN), which extends the 2D Convolutional Vision Transformer under the framework of 3D Transformer-GAN, with the extended 3D CVT and TCVT blocks equipped into both generator and discriminator of GAN.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper extends the 2D CVT to 3D CVT, which is useful and helpful for 3D data.
    2. This work builds a 3D CVT-GAN for reconstructing SPET images from LPET images, and the experimental results are satisfactory.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The writing and organization need to be improved. Specifically, the methodology can be simplified while the visual comparison of ablation study can be appended.
    2. In addition to the number of parameters, FLOPs could be reported to further evaluate model complexity.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This work is reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The expression and figures can be improved for better understanding and clear illustration.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work is novel in extending the 2D CVT to a 3D CVT and building a 3D CVT-GAN to reconstruct SPET images from LPET images, which is helpful for diagnosis. I therefore recommend acceptance.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    The paper proposed a 3D convolutional vision transformer-GAN model for low-dose PET synthesis.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The method is clear and makes sense.
    • The results include comparison with multiple state-of-the-art low-dose PET synthesis methods.
    • The method was evaluated on real low-dose PET instead of simulated low-dose PET.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • There are already many transformer-based methods, so the method can no longer be considered novel.
    • The performance improvement is subtle.
    • The dataset is too small.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Code is not provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • The quality of the low-dose PET in Fig.2 is actually quite good. It leads to the question whether this application is meaningful. This can be proved by either the similar performance on a much higher dose reduction rate (visually worse image quality) or the comparison of the diagnosis (e.g. classification of NC vs MCI) on low-dose and each synthesized standard-dose PET.
    • Even for a patch-based model, a dataset of 16 subjects is still very small.
    • The improvement in Table 1 is very subtle. In Tables 1 and 2, add a statistical test to show significance.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty is the main concern.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    5

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    Although the t-test suggests significant improvements, the visual improvement is still subtle. If the authors can provide sound clinical reading results in the final version, I would suggest weak accept. But given the limited visual difference in the paper, I doubt whether there will be a difference in the reading.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a novel 3D convolutional vision transformer GAN framework to reconstruct standard-dose PET images from low-dose PET images. Specifically, the authors use 3D CVT blocks as the encoder for feature extraction and 3D transposed CVT blocks as the decoder for the final SPET reconstruction. Furthermore, the proposed 3D CVT and TCVT blocks employ 3D convolutional embedding and projection instead of linear embedding and projection, which overcomes the semantic ambiguity problem.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. An ablation study is provided to evaluate the effectiveness of each proposed module, including the patch-based discriminator, 3D CVT blocks, 3D TCVT blocks, and convolutional embedding and projection.
    2. Comparisons with state-of-the-art methods show the superiority of proposed method.
    3. This paper is clearly written and easy to follow.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Although the proposed method outperforms other methods in terms of PSNR, SSIM and NMSE, there is not much difference in the visualization results shown in Fig. 2 relative to the 3D-cGAN Transformer method. It would be better to provide some zoomed-in regions to highlight the better result obtained by the proposed method.
    2. Quantitative metrics such as PSNR and SSIM may not be sufficient to demonstrate reconstruction effectiveness from a clinical perspective. The paper could add a reader study for clinical diagnosis of NC and MCI using the GT SPET and the reconstructed SPET, respectively.
    3. There is no explanation of why the model size of the proposed method is smaller than that of the other methods.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Not sure. If the authors agree to share the code with the community, I would consider changing my rating.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    It would be better to provide more clinical evaluations. Consider using a larger dataset.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Many similar methods have been proposed for reconstructing standard-dose PET from the corresponding low-dose PET images, so improvements in PSNR and SSIM alone are not the key evidence for the effectiveness of the proposed method. The authors should consider how to evaluate the robustness and generalizability of the proposed method on a larger dataset.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors have addressed most of my comments. I maintain my weak accept decision.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The presented work is important to the clinical community. The approach is sound. However, reviewer 2 also raised clear concerns regarding the lack of important details and significance of the obtained results over prior work. I invite the authors to carefully read all the reviewers’ critiques and provide answers to the most important questions during the rebuttal.

    Investigating Table 1 I do not see a statistically significant improvement (PSNR, SSIM, NMSE) over prior work especially compared to [21]. This is a major concern that needs to be addressed during the rebuttal. For example, reported SSIM values are the same. Is this metric not more important than the PSNR or NMSE? Would such a small boost in PSNR/NMSE provide improved diagnostic performance? This ties into the other suggestion by reviewer 2 which is using the reconstructed images as an input to a classification method.

    Small dataset size and its effect on the robustness of the method should also be discussed.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    9




Author Feedback

We thank all the reviewers (R1, R2, R3, Meta-R) for acknowledging our methodological contribution and for their constructive comments, which call for further clarification. We will add the p-values and the GFLOPs reported below to our final paper.

Q1: Performance improvement. (R2&Meta-R) A1: A paired t-test is conducted on the results of our model and the second-best performer (Transformer-GAN) in Table 1. Our model achieves p-values (PSNR/SSIM/NMSE) of 0.006/0.047/0.021 for NC and 0.036/0.005/0.045 for MCI, which are all less than 0.05, showing that our improvement is statistically significant. Moreover, the number of parameters of our method is much smaller than that of the compared methods (Table 1), which is a significant advantage of our model in addition to performance.
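As an illustration of the paired t-test mentioned in A1, the following minimal sketch computes the test statistic from per-subject scores of two methods evaluated on the same subjects. The function name `paired_t_statistic` and the sample values are hypothetical and are not taken from the paper; the p-value would then be read from Student's t distribution with n - 1 degrees of freedom (for example via `scipy.stats.ttest_rel`, if SciPy is available).

```python
import math
from statistics import mean, stdev


def paired_t_statistic(ours, baseline):
    """Return (t, degrees_of_freedom) for a paired t-test on matched scores."""
    if len(ours) != len(baseline) or len(ours) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    # Work on per-subject differences, since both methods see the same subjects.
    diffs = [a - b for a, b in zip(ours, baseline)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    n = len(diffs)
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical per-subject PSNR values in dB, NOT results from the paper.
ours_psnr = [24.9, 25.1, 24.7, 25.3]
baseline_psnr = [24.6, 24.9, 24.6, 24.9]
t_stat, dof = paired_t_statistic(ours_psnr, baseline_psnr)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

A paired test is the natural choice here because each subject yields one score per method, so between-subject variability cancels in the differences.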

Q2: Small dataset size and clinical impact of method. (R2&R3&Meta-R) A2: Please note that the number of subjects (16) in our dataset is comparable to that of similar works in this field, e.g., 9 subjects used in [6] in MICCAI 2020 and 7 subjects used in [15] in MP 2021. More importantly, despite having only 16 subjects in total, we can provide a sufficient number of samples to train a good model by extracting 729 large patches from each whole image. Also, to enhance the stability of the model with limited samples, we used the leave-one(subject)-out cross-validation (LOOCV) strategy and report the averaged performance to avoid potential bias. In this manner, the number of training samples in each fold increases from 15 (whole images) to 10935 (15 × 729 patches), which is sufficient to train our model. It is also noteworthy that, despite its small size, our dataset covers both pathological and healthy subjects. We agree that applying the real SPET and synthesized SPET to disease classification is more meaningful and makes more clinical sense. However, due to the time limitation, we regret that we cannot provide these results in the rebuttal. We will supplement this experiment in the final paper.
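The sample counts in A2 can be checked with a short sketch. The rebuttal states only that 729 patches are extracted per image and that LOOCV is used over 16 subjects; the image size, patch size and stride below are assumptions chosen because they yield 9 window positions per axis (9³ = 729), and the helper names are hypothetical.

```python
def positions_per_axis(image_size, patch_size, stride):
    """Number of sliding-window positions along one axis (no padding)."""
    if patch_size > image_size or stride <= 0:
        raise ValueError("patch must fit in the image and stride must be positive")
    return (image_size - patch_size) // stride + 1


def patches_per_volume(image_size, patch_size, stride):
    """Number of cubic patches extracted from one cubic 3D volume."""
    return positions_per_axis(image_size, patch_size, stride) ** 3


def loocv_folds(subject_ids):
    """Yield (training_subjects, held_out_subject) for leave-one-subject-out CV."""
    for i, held_out in enumerate(subject_ids):
        yield subject_ids[:i] + subject_ids[i + 1:], held_out


# ASSUMPTION: this geometry is illustrative; only the count of 729 is from the rebuttal.
IMAGE_SIZE, PATCH_SIZE, STRIDE = 128, 64, 8

per_volume = patches_per_volume(IMAGE_SIZE, PATCH_SIZE, STRIDE)  # 9 ** 3
subjects = list(range(16))  # 16 subjects, as stated in A2
folds = list(loocv_folds(subjects))
train_patches_per_fold = len(folds[0][0]) * per_volume  # 15 subjects * 729 patches

print(per_volume, len(folds), train_patches_per_fold)
```

Splitting by subject rather than by patch keeps all patches of the held-out subject out of training, which is what makes the averaged LOOCV score a fair estimate.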

Q3: Small difference in Fig. 2. (R1&R2&R3) A3: LPET (4x dose reduction) in Fig. 2 is very noisy compared to SPET, as reflected by the large magnitude in the corresponding error map (second row). The average PSNR of the LPET images is only 20.684 dB for NC subjects and 21.541 dB for MCI subjects, much lower than that of our synthesized SPET images (24.893 dB and 25.275 dB, respectively). To enlarge the visual difference in Fig. 2, we will provide more high-resolution figures and zoom in on critical regions in the final paper.
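For readers unfamiliar with the metric quoted in A3, here is a minimal sketch of PSNR computed over flattened voxel intensities. The choice of peak value (the maximum of the reference image by default) is an assumption; the rebuttal does not state which normalization the authors use, and the function name `psnr` and the toy values are hypothetical.

```python
import math


def psnr(reference, estimate, peak=None):
    """Peak signal-to-noise ratio in dB between two equally sized intensity lists."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    if peak is None:
        peak = max(reference)  # ASSUMPTION: peak taken from the reference image
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy flattened "images" with intensities in [0, 1]; not data from the paper.
spet = [0.0, 1.0, 1.0, 0.0]
noisy = [0.1, 0.9, 1.0, 0.0]
print(f"PSNR = {psnr(spet, noisy):.2f} dB")
```

Because PSNR is logarithmic in the mean squared error, a gain of about 4 dB, as between the LPET and synthesized SPET averages quoted above, corresponds to roughly a 2.5-fold reduction in mean squared error.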

Q4: Novelty of the proposed method. (R2) A4: We would like to highlight our main novelty as follows. First, our method explores an integration strategy for CNN and Transformer in a 3D manner. Different from previous Transformer-based methods, our method embeds the 3D convolution operation into the transformer to form several 3D CVT blocks that tackle the semantic ambiguity problem caused by the attention mechanism of the transformer, thus further preserving local spatial details for PET reconstruction. Moreover, this combination of CNN and Transformer is more efficient and lighter in model complexity. Second, our method also incorporates the transformer in the decoding layers by embedding several TCVT blocks, forcing the decoder to focus on global context when performing pixel-level reconstruction.

Q5: Comparison of FLOPs. (R1) A5: Following the suggestion, we computed the GFLOPs of 3D U-NET, 3D-cGANS, Transformer-GAN and 3D CVT-GAN (proposed) as 40.50, 70.38, 20.78 and 23.80, respectively, demonstrating that our method is computationally efficient.

Q6: Explanation of the number of parameters. (R3) A6: Unlike the comparison methods, our model does not employ extra CNN layers for down/up-sampling, because this operation is incorporated into the embedding stage. The positional encoding and the linear layers for embedding/projection are also removed. Both factors contribute to our model having fewer parameters than the other methods.

Q7: Reproducibility. (R3) A7: Our code will be released at https://github.com/Aru321/CVTGAN.




Post-rebuttal Meta-Reviews

Meta-review #1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I am happy with the strong rebuttal. The final version of the paper should include the important details provided in the rebuttal.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes a 3D transformer-based GAN network for generating standard-dose PET images from low-dose PET images. The reviewers raised major issues regarding the novelty of the paper, improvement of experimental results, and clinical validation. After rebuttal, two reviewers indicated acceptance of the paper. R2 revised the scores but still raised concerns about the clinical validation experiments. AC considers that the feedback letter provides effective responses to queries related to methodological innovation and experimental results, and demonstrates the number and complexity of network parameters. Overall, the strengths of the paper outweigh the weaknesses. AC recommends acceptance of this paper and also hopes that the authors will add relevant clinical trials in the final version.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Overall, the paper is a well-written description of a GAN+Transformer block architecture to reconstruct PET. Improvements are subtle/marginal in many instances, but as the model has orders of magnitude fewer parameters than the comparison models, the authors are presenting a more compact model with results comparable to the state of the art.

    There are some concerns over novelty and the size of the dataset, as well as an interest in an explanation of why the proposed model has fewer parameters. However, I think the rebuttal overall did a decent job of responding to most of the concerns, and the work is scientifically sound with a small overall contribution to the image reconstruction field.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8


