
Authors

Rui Hu, Huafeng Liu

Abstract

Positron emission tomography (PET) image reconstruction is an ill-posed inverse problem and suffers from high levels of noise due to the limited counts received. Recently, deep neural networks, especially convolutional neural networks (CNNs), have been successfully applied to PET image reconstruction. However, the local nature of the convolution operator potentially limits the image quality obtained by current CNN-based PET image reconstruction methods. In this paper, we propose a residual swin-transformer based regularizer (RSTR) to incorporate regularization into the iterative reconstruction framework. Specifically, a convolution layer is first used to extract shallow features; deep feature extraction is then accomplished by the swin-transformer layer. Finally, the deep and shallow features are fused with a residual operation and another convolution layer. Validation on realistic 3D brain simulated low-count data shows that our proposed method outperforms the state-of-the-art methods in both qualitative and quantitative measures.
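For readers who want a concrete picture of the block structure described in the abstract (shallow conv → swin-transformer deep features → conv → residual fusion), here is a toy 1-D sketch of the data flow. The `simple_conv` kernel and the `global_mixer` stage are illustrative stand-ins invented for this example; `global_mixer` only mimics, very loosely, the long-range mixing a swin-transformer layer performs, and none of this is the paper's actual implementation.

```python
# Toy sketch of the RSTR data flow from the abstract:
#   shallow conv -> deep feature extraction -> conv -> residual add.
# The "deep" stage is a crude global-mixing placeholder standing in
# for the swin-transformer layer; it is NOT the paper's implementation.

def simple_conv(signal, kernel=(0.25, 0.5, 0.25)):
    """1-D convolution with zero padding (illustrative shallow feature extractor)."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(signal))]

def global_mixer(features, weight=0.1):
    """Placeholder for the swin-transformer stage: mixes in the global
    mean, loosely mimicking long-range dependency modelling."""
    mean = sum(features) / len(features)
    return [f + weight * (mean - f) for f in features]

def rstr_block(x):
    """RSTR-style block: conv -> 'transformer' -> conv, plus a residual shortcut."""
    shallow = simple_conv(x)
    deep = global_mixer(shallow)
    fused = simple_conv(deep)
    return [f + xi for f, xi in zip(fused, x)]  # residual fusion

out = rstr_block([1.0, 2.0, 3.0, 2.0, 1.0])
```

On a symmetric input the block returns a symmetric output whose every pixel combines local smoothing, a touch of global context, and the residual input.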

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_18

SharedIt: https://rdcu.be/cVRvJ

Link to the code repository

https://github.com/RickHH/TransEM

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

The main contribution of this work is the introduction of the swin transformer into PET reconstruction. The image reconstruction is done in the ML-EM iterative style, following exactly the framework in [11], with the key update equation (9) being the same as equation (6) in [11]. The reconstruction regularisation operates in the image domain and is performed by the swin transformer, a relatively new model that migrated from NLP to computer vision and can efficiently capture long-range dependencies in an image. This is expected to improve the PET image reconstruction results in the proposed work.
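For context, the ML-EM update the review refers to has the standard multiplicative form x_{k+1} = x_k / (Aᵀ1) · Aᵀ(y / A x_k). The sketch below runs it on a tiny hand-made 2×2 system; the regularised fusion step of [11] / the paper (Eq. 9) is deliberately omitted, so this is plain ML-EM, not TransEM.

```python
# Minimal ML-EM iteration on a toy 2-pixel, 2-bin system (illustrative only;
# the paper interleaves this update with a learned regularizer, omitted here).

def mlem(A, y, x0, n_iter=500):
    """x_{k+1}[j] = x_k[j] / sum_i A[i][j] * sum_i A[i][j] * y[i] / (A x_k)[i]"""
    n_pix = len(x0)
    sens = [sum(A[i][j] for i in range(len(A))) for j in range(n_pix)]  # A^T 1
    x = list(x0)
    for _ in range(n_iter):
        proj = [sum(A[i][j] * x[j] for j in range(n_pix)) for i in range(len(A))]
        ratio = [y[i] / proj[i] for i in range(len(A))]
        backproj = [sum(A[i][j] * ratio[i] for i in range(len(A)))
                    for j in range(n_pix)]
        x = [x[j] * backproj[j] / sens[j] for j in range(n_pix)]
    return x

# Noiseless toy data: the true image (2, 3) generates y = A x_true.
A = [[1.0, 0.5],
     [0.5, 1.0]]
x_true = [2.0, 3.0]
y = [sum(A[i][j] * x_true[j] for j in range(2)) for i in range(2)]
x_hat = mlem(A, y, x0=[1.0, 1.0], n_iter=500)
```

With noiseless data and an invertible system matrix, the iterates converge to the true activity.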

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This work assesses the performance of incorporating swin transformers into PET image reconstruction.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. This work uses the swin transformer as the image regularisation model in an existing reconstruction framework. The demonstration of the impact of incorporating the swin transformer needs to be stronger: it is hard to see the advantage given the low image resolution involved in the work. The swin transformer is supposed to capture long-range dependencies in the image; however, this benefit is not clearly shown in the results. Also, since the swin transformer is an image-quality-improving tool, the authors did not justify the need to incorporate it into the reconstruction, given that it could be used post-reconstruction as well.

    2. This work is done in 2D, whereas PET image reconstruction is inherently a 3D problem.

    3. Only simulated data were used to train and assess the proposed model. As the authors mentioned in the discussion, clinical evaluation will greatly strengthen the impact of this work.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have given enough details to reproduce this work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    How is the variance-bias analysis?

    Fig. 2: the bias from the ground truth looks really high. Fig. 4: why does DeepPET show a different slice?

    Typos: Page 4: “and The LayerNorm”, “The Whole process”

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The impact of introducing the swin transformer into PET image reconstruction needs to be strengthened in this work. The authors’ intention of trying a ‘SOTA’ model from another field is understandable, and it can be convincing with appropriate demonstration.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    4

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    The work proposes a novel TransEM method (an image reconstruction method based on ML-EM and a residual swin-transformer) for PET image reconstruction. Compared to traditional convolutional-neural-network-based methods, TransEM has a strong ability to model long-range dependencies in the measurement data. It is able to reduce noise without compromising image details. This is validated with simulated human brain data, and the robustness analysis is also well performed.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The swin-transformer is relatively new in the computer vision field, and it is a good attempt to apply the swin-transformer to the PET reconstruction field. The article is well organized. The authors analyze the robustness of the proposed TransEM on downsampled cases and perform experiments to investigate the generalization ability of the different models, which validates the excellent performance of the proposed method. The experiments demonstrate the claims set out by the authors.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) Limited novelty: the proposed work is a special case of the swin-transformer; the current work has no obvious innovation in network architecture, except the additional shortcut added before and after the convolutional layer. 2) No clinical evaluation: all the experiments are performed with simulated data, so it is hard to check whether the proposed method is applicable in real applications. In addition, for practical use, TransEM may encounter difficulties in collecting abundant clinical raw data for network training.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    NA

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    1) The authors should present a concise explanation of the patch embedding layer. 2) The authors should state the choice of the hyper-parameters more clearly, such as the choice of the patch size M.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It is the first attempt to apply the swin-transformer to PET image reconstruction. The authors combine ML-EM and a vision transformer to reduce noise without compromising image details.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #4

  • Please describe the contribution of the paper

    The paper describes a previously presented approach combining an EM update and a neural network to solve the PET reconstruction inverse problem. The authors change the CNN to a swin-transformer, which is the main contribution of the paper. The presented method performs best among the studied methods (some classical PET reconstruction algorithms and deep-learning-based methods).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The introduction motivates the proposed method well, and the method is well divided into three parts. We understand why the authors want to use transformers in PET reconstruction. Moreover, starting from a recent state-of-the-art idea (FBSEM) and combining it with recently proposed transformers instead of CNNs is a relevant combination of recent methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weaknesses are more about clarity and organization rather than scientific aspect.

    There are many mistakes that catch the eye when reading: misprints (“reconstrution”, page 3; “Possion”, page 3, equation (1)), “the” missing several times, capital letters in the middle of sentences, missing commas that would help the structure of the sentences, and missing spaces between words and their acronyms or after colons.

    There is a lot of imprecision in the chosen values (cf. the detailed constructive comments), and there seems to be confusion between the training set and the validation set when the authors talk about hyperparameters.

    The analysis of the results is light, sometimes unclear (the authors say TransEM does not perform so well, and then that it is the best alongside DeepPET), not well argued, and either not scientific enough (“relative not so good results”) or subjective (“lots of excellent works”).

    The different numbers of counts in the experiments are unclear. The experiment seems to be a “low count” simulation. The network is trained with “high dose” images, which seem to correspond to the “high count” description from the authors. A second count level is presented in the “robustness analysis” part, but a table presenting those results is shown above. In this part, it is not clear whether the amount of data is the same as in the low-count simulation, or whether the data are “downsampled” or not. Moreover, the PSF of the high-count simulation is different from that of the low-count one, and it is not specified for the robustness analysis, whereas the scanner is always the same, which should define a unique PSF. All of this should be clarified.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors made an effort to make their paper reproducible, with many setup values.

    I did not find in the paper or in supplementary material a way to access the code, whereas it seems to be done according to the reproducibility checklist filled in by the authors. Maybe this is not possible at this stage.

    The authors detailed the hyperparameter values of the neural network and of the method, and the initialization of the image, which is very important. However, were several runs made, or was the initialization of the neural-network weights fixed?

    About the PET simulation, some setup details are missing, such as the system-matrix modelling and the number of lines of response.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The introduction structure is good, but be sure to make more links between the different methods you discuss. You could still refine the state of the art / the explanation of the pros and cons of each category to be more accurate:

    • “represented by filtered back-projection (FBP [2]) and maximum-likelihood expectation maximization (ML-EM [3])”: why “represented” by these two methods? What does it mean?
    • “solve this problem well”: what problem? I think you meant the modelling of physical properties, but it is not clear whether it is that or the noise problem.
    • Examples of post-processing: mention the most common one, the Gaussian filter.
    • I do not know this paper: “Machine learning in PET: from photon detection to quantitative image reconstruction”. You may only cite [8] from Reader et al., which is a wide overview of deep learning in PET reconstruction.
    • Gong et al. [10] is not an unrolled network, from what I know; maybe I am wrong.

    Maybe you should skip the part with historical methods and focus more on the DIP learning methods, or be sure to be very clear and accurate.

    The equations in the method part require slight modifications to be very clear: I was a bit confused by equations (4) and (5). I did not know the FBS algorithm, but when reading it for the first time, I understood that, theoretically, it was equivalent to the optimization problem (2), because you say it is “used to split the objective function”. I did not understand directly that it corresponds to the equations of an iterative algorithm.

    “where the goal is to perform the pixel to pixel fusion”: you should replace it by something like “can be viewed as a pixel-to-pixel fusion”.

    “The hyper-parameter α was learned from training data”. Did you mean from the validation set ? Otherwise I do not understand how you can learn it.

    The fact that the number of unrolled blocks is hand-crafted should be stated in the same place where it is said that the other hyperparameters are learned from the training data.

    Ambiguities which need to be rephrased :

    • “Prior” is ambiguous: you can talk about prior information as additional or anatomical information, as used in kernel methods. But a prior in the Bayesian sense is used for maximum a posteriori estimation, which is a category of penalized log-likelihood (PLL) methods. Please do not merge these two different approaches into one called “prior-incorporative methods”. Anatomical information improves the image quality by adding information; PLL methods decrease the noise directly in the optimization process.
    • What do you mean by “ablation study”? Removing RC? “Ablation” is a medical term which should not be used in your case.

    The numbers of counts in your table are smaller than 1.

    Figures 2 and 4 show normalized images, which does not allow the reader to make a fair comparison between the different methods. You should show them with the same contrast.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is a very good idea, based on two recent state-of-the-art methods. It is entirely relevant to try combining them, as both are at the cutting edge of PET reconstruction (unrolled methods) and feature extraction (transformers). The work just needs to be more thorough, with clean writing, no mistakes, and a more detailed discussion of the results. A certain lack of polish is noticeable when reading, but this can be improved for the conference.

  • Number of papers in your stack

    1

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors replied rather well in the rebuttal. The results analysis is clearer and without subjective remarks. I agree with R1 on perceptual image-quality measures and the variance-bias trade-off. The number of counts and the simulation setup are clear now, but I still do not understand why the PSF is changed for the robustness analysis, as the scanner is the same. The α hyperparameter is optimized with the training data; I do not understand how, but the authors are confident in it, and this is the same as in “FBSEM”. The paper becomes acceptable for the conference, but I hope the chosen words will be less ambiguous in the revised version to make the paper easier to read.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper addresses the inverse PET reconstruction problem by combining a state-of-the-art iterative optimization algorithm with a recent swin vision transformer. The three reviewers recognize the idea as sound and potentially interesting. Validation includes experiments on simulated brain data and a robustness analysis. R3 and R4 acknowledge the work as reproducible.

    Points to address in the rebuttal and the revised version are:

    • Novelty: discuss the challenges of incorporating a transformer within the iterative reconstruction and the points that make this integration not straightforward. Can you confirm this is the first paper to attempt such an integration? How is the integration better than using the transformer as a post-processor?
    • A weakness raised by R1 and R3 is the lack of validation with clinical data. Is there a reason why such evaluation is not done?
    • While R1 mentions it is difficult to see the improvements associated with the transformer’s long-range properties, R3 states the proposed method has “excellent performance” and that “the experiments demonstrate the author’s claims”, while R4 mentions the analysis of the results is light. Restate: what are the quantifiable benefits of the proposed approach showing the advantage of transformers over other methods? (Point to tables/results in the paper.)
    • Explain how hyperparameters are chosen / learnt (R3, R4)
    • Clarify the patch embedding layer, the number of counts, the uniqueness/or not of the PSF, and the simulation setup
    • Is code going to be provided?
  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6




Author Feedback

A1 (Novelty): We confirm that this is the first paper to apply a swin-transformer layer (STL) based model to PET image reconstruction. We would like to clarify that this work (TransEM) is not a simple integration of an STL and iterative reconstruction. Directly using an STL as the regularizer causes poor convergence, is data-hungry, and pays too much attention to deep features while ignoring details. In medical imaging, shallow detail is important, and large amounts of clinical training data are difficult to obtain. We therefore proposed adding a conv layer before and after the STL to extract both shallow and deep image features, together with shortcuts for better convergence. This is not a trivial combination but a proven, subtle design: without significantly increasing the number of parameters, the learning ability is greatly improved. Unlike other transformer approaches, our method does not require large amounts of data; only a few hundred slices of training data were used, as mentioned in the paper. Convergence is also greatly improved, with only 23 epochs run in model training. As for post-processing, the final results are sensitive to the pre-reconstruction algorithm. A transformer-based network for post-processing requires a large number of parameters and much training data, and its results are sometimes less reliable and interpretable, while the interpretability of TransEM is strong and it has fewer parameters. TransEM is also expected to require less training data. A2 (Clinical evaluation): This paper mainly explores the potential of using transformers in PET image reconstruction, and easy-to-access simulation data was shown to be effective and reliable in “FBSEM”. At the same time, we have completed the protocols and ethical considerations required to access clinical data and will include clinical results in future work. A3 (Advantage): TransEM shows excellent performance, as shown in Fig. 2/Table 1.
Long-distance dependency (LDD) is reflected in the correlation between different tissues in PET images. By modeling LDD, TransEM achieves better performance in both global structure and detail, which is particularly supported by the SSIM values in Table 1. We first verified TransEM on 1/10 downsampled data (count 5e5); the training label was reconstructed by OSEM from high-count (5e6) data. In the robustness analysis, we verified TransEM on 1/4 (count 1.25e6) and 1/100 (count 5e4) downsampled data with the same training label. Each experiment involves retraining and testing. As shown in Table 1, TransEM beats all comparison methods at the different count levels, except DeepPET in the 1/100 downsampled situation. We would like to emphasize that, although DeepPET appears to obtain good PSNR and MCRC in the ultra-low-count situation, due to the lack of physical constraints its over-fitting is severe and its results are not very reliable, which was confirmed when we trained the three learning-based methods on transverse slices and tested them on sagittal slices. A4 (Hyperparameters): α: to eliminate the hand-crafting of α that is common in traditional iterative frameworks, we made it a network parameter learned from the training data. Window size M: we chose M = 4 based on the image size and the GPU memory, and tested that a larger M loses details while a smaller M loses structure. A5 (Clarifications): Patch embedding (PE): we thank R3 for pointing out this issue. We apologize that Fig. 1 was an earlier version that had not been updated; PE is not necessary when using the shifted-window attention mechanism, and we have corrected this in the revised version. Number of counts: please refer to A3. For the high-count sinogram (used to reconstruct the label image) the PSF is 2.5 mm, while for the three downsampled sinograms it is 4 mm. The system matrix is simulated with the Siddon projector. The number of LORs is 172 (radial) × 252 (angular).
A6(Code): The code and models are available at github.com/RickHH/TransEM
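As a side note on A4: "making α a network parameter learned from the training data" can be illustrated with a toy scalar fit by gradient descent. The convex-combination fusion used below is a generic stand-in chosen for simplicity, not the paper's actual fused update (Eq. 9), and the data are made up.

```python
# Toy illustration of learning a scalar fusion weight alpha from training data.
# Fusion form here is a simple convex combination
#     x_fused = (1 - alpha) * x_em + alpha * x_reg
# chosen for illustration only; the paper's actual fused update differs.

def fuse(x_em, x_reg, alpha):
    return [(1 - alpha) * a + alpha * b for a, b in zip(x_em, x_reg)]

def mse(x, target):
    return sum((a - b) ** 2 for a, b in zip(x, target)) / len(x)

# Made-up "training data": a noisy EM estimate, a regularizer output,
# and the ground-truth label.
x_em   = [1.2, 2.8, 3.1]   # noisy EM update
x_reg  = [1.0, 3.0, 3.0]   # regularizer (denoised) output
target = [1.0, 3.0, 3.0]   # ground-truth label

alpha, lr = 0.5, 1.0
for _ in range(500):
    # analytic gradient of the MSE loss with respect to alpha
    grad = sum(2 * ((1 - alpha) * a + alpha * b - t) * (b - a)
               for a, b, t in zip(x_em, x_reg, target)) / len(x_em)
    alpha -= lr * grad
```

Because the regularizer output equals the label in this toy setup, the learned α converges to 1; with an imperfect regularizer it would settle at an intermediate value.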




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal has clarified points about simulation and number of counts, as well as provided a complementary analysis of the results (which should be also included in the paper in case of final acceptance). Some questions remain about the choice of metrics (Perceptual instead of quantitative) and the clarity of learning alpha and the PSF change. Despite these and the experimental validation not being thorough, I support the acceptance of this paper as I am convinced it will raise interesting discussions within the community, and it will be of interest to those working at the interface of inverse problems and deep learning.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The work proposes a novel TransEM method, an image reconstruction method based on ML-EM and a residual swin-transformer, for PET image reconstruction. Compared to traditional CNN-based methods, TransEM has a strong ability to model long-range dependencies in the measurement data. It can reduce noise without compromising image details. This is validated with simulated human brain data, and the robustness analysis is also well performed. In addition, the authors replied rather well in the rebuttal. As a result, the analysis of the results is more straightforward and without subjective remarks. Hence, I recommend accepting this submission.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes to embed the swin-transformer as a regularizer into the ML-EM iterative framework for standard-dose PET image reconstruction from low-count sinograms. The technical contribution seems to be incremental. The choice of the swin-transformer is not well justified: it is only argued that it exploits the long-range dependency of features, which is common to all transformers, but it is not explained why the swin-transformer is preferred over other types of transformers, such as the conventional ViT. Moreover, the experiments were only conducted on simulated data. The ablation study on RC (claimed to be a non-trivial contribution) is insufficient without any quantitative results.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    12


