
Authors

Huidong Xie, Bo Zhou, Xiongchao Chen, Xueqi Guo, Stephanie Thorn, Yi-Hwa Liu, Ge Wang, Albert Sinusas, Chi Liu

Abstract

Cardiovascular disease (CVD) is the leading cause of death worldwide, and myocardial perfusion imaging using SPECT has been widely used in the diagnosis of CVDs. The GE 530/570c dedicated cardiac SPECT scanners adopt a stationary geometry to simultaneously acquire 19 projections to increase sensitivity and achieve dynamic imaging. However, the limited amount of angular sampling negatively affects image quality. Deep learning methods can be implemented to produce higher-quality images from stationary data. This is essentially a few-view imaging problem. In this work, we propose a novel 3D transformer-based dual-domain network, called TIP-Net, for high-quality 3D cardiac SPECT image reconstructions. Our method aims to first reconstruct 3D cardiac SPECT images directly from projection data without the iterative reconstruction process by proposing a customized projection-to-image domain transformer. Then, given its reconstruction output and the original few-view reconstruction, we further refine the reconstruction using an image-domain reconstruction network. Validated by cardiac catheterization images, diagnostic interpretations from nuclear cardiologists, and defect size quantified by an FDA 510(k)-cleared clinical software, our method produced images with higher cardiac defect contrast on human studies compared with previous baseline methods, potentially enabling high-quality defect visualization using stationary few-view dedicated cardiac SPECT scanners.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43999-5_16

SharedIt: https://rdcu.be/dnwwu

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #5

  • Please describe the contribution of the paper

    In this paper, the authors propose a 3D Transformer-based Dual-domain network (named TIP-Net) to reconstruct few-view cardiac SPECT images using a two-stage process. They first reconstruct 3D images from the projection data using a transformer reconstruction network. They then combine this output with the original few-view reconstruction for further refinement. The model was validated on phantom, porcine, and human data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Myocardial perfusion imaging with SPECT is commonly used to assess the presence and extent of myocardial ischemia.

    • Enhancing SPECT image quality is highly beneficial as it can improve image interpretation and analysis.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The proposed architecture lacks clarity on its novelty and improvement over the state-of-the-art, particularly in utilizing 3D information, as it solely relies on slice-by-slice (2D) data.

    • The authors have made several unverified assumptions without referencing experiments.

    • Additionally, there are concerns about statistical power, as transformer-based methods require large datasets and may perform poorly on small ones. The training dataset consisted of only eight porcine and two physical phantom samples, and it is unclear how the data was split for training, validation, and testing.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper lacks detailed information about the architecture, making it difficult to replicate.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. The paper lacks clarity on the limitations of previously proposed methods and how the current work improves upon them. The authors claim limitations without providing citations or justifications.

    2. The authors should compare their method with state-of-the-art techniques such as Xie et al., AUTOMAP, Würfl et al., etc., and demonstrate how their approach addresses those limitations.

    3. The authors argue that previous methods have limitations in adapting to 3D data, but their own method does not utilize 3D data. Instead, it relies on slice-by-slice (2D) information and iterates 50 times to generate a 3D volume. Therefore, the advantage of the proposed method compared to the state-of-the-art is not clear, as 3D information is not effectively explored.

    4. In each loop of the 50 iterations, the authors use different trainable parameters, resulting in the complete loss of 3D information in the model.

    5. The paper lacks information on how the data was split into training and validation sets. It is unclear how the best model was chosen and how hyperparameters were optimized. Additionally, the stopping criterion for training is not specified. I suggest clearly describing which set of data was used in which part of the network and clearly defining the testing set used for evaluation.

    6. Given the data’s small size and the convergence requirements of transformer models, it is unclear how the authors prevented overfitting and ensured the model learned a useful and generalizable representation.

    7. The results presented in Table 1 appear to include data used during training. It is crucial for the authors to evaluate their method on a separate and complete hidden dataset (testing data) to demonstrate the robustness of their proposed approach.

    8. The authors propose a novel “3D Transformer-based Dual-domain network,” but it is not evident what is truly innovative about the architecture, especially considering that the network is effectively 2D rather than 3D. An explanation from the authors would be helpful.

    9. According to the results in Table 1, both 3D-CNN and Dual-3D-CNN perform as well as TIP-Net. It is unclear what advantages the transformer-based model offers in comparison.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The lack of comparison with state-of-the-art methods, the lack of a description of what is effectively novel in this work, and concerns about statistical power.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #7

  • Please describe the contribution of the paper

    The paper, “Transformer-based Dual-domain Network for Few-view Dedicated Cardiac SPECT Image Reconstructions,” presents a novel dual-domain (projection and image) transformer-based architecture for SPECT reconstruction, demonstrating improved performance compared to previous works.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This work proposes a novel transformer-based architecture for projection-to-image reconstruction, which is smart in the sense of capturing long-range correlation.
    2. The authors present solid experiments and results (both quantitative and visualization) on both phantom and human studies.

    Overall, a solid paper.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The system matrix S is huge. I assume you don’t compute it by naive matrix multiplication; can you elaborate on how you apply it?
    2. This work also incorporates an adversarial loss. What is its contribution: sharpening the image? It is a well-known problem that an adversarial loss can introduce artifacts and hallucinations. Can you elaborate on your findings regarding the adversarial loss?
    3. The transformer architecture is interesting. Have you done ablation studies that replace the transformer with another architecture while keeping the two-stage framework? I would be curious to learn the contribution of the transformer.
    4. [OPEN discussion] This work uses a feed-forward network. Have you considered using an unrolled network to combine the system equation S with deep learning in an iterative manner (since SPECT reconstruction is a linear inverse problem)? This approach has been widely used in MRI reconstruction with great success. Could you elaborate on your thoughts?
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Yes, the authors claimed that they will release the code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Solid work. I would appreciate some analysis of the loss function and network architecture design, e.g., ablation studies.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, a solid paper in my mind, with well-designed experiments and a novel network architecture. I would appreciate some thoughts on the questions I raised.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    In this work, the authors propose a dual-domain network to generate better SPECT images. In-vivo datasets were used to evaluate the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main novelty is the combination of projection-to-image and image-to-image networks. A transformer-based network was used in the projection-to-image step.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Descriptions of the reference methods are missing, so it is unclear whether the comparison with other methods was fair. Based on supplementary Figure 2, the output of the projection-to-image step appears very blurred. It is unclear why the proposed method would outperform the reference methods if information was lost during the projection-to-image step.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reasonable.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Describe the reference methods and ablation study in detail. Further analyze the intermediate output to show the advantage of using the projection-to-image network instead of the MLEM output.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    In-vivo datasets utilized in the work; Combination of projection-to-image and image-to-image networks.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #6

  • Please describe the contribution of the paper

    The authors have proposed a novel 3D transformer-based dual-domain (projection and image) network called TIP-Net to directly reconstruct 3D SPECT volumes from 3D projections obtained from the GE Alcyone scanner. On human studies, validation was done with cath images, interpretation by nuclear cardiologists, and cardiac defect quantification using FDA-cleared software.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Interesting work on directly reconstructing 3D SPECT images from 3D projection data obtained from a GE scanner.
    2. A good overview of the prior literature has been provided.
    3. The pipeline that is proposed in this paper is novel and clinically meaningful.
    4. Validation experiments were conducted on porcine studies, phantoms, and on human studies.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The number of studies used for this task seems limited (20 human, 2 phantom, and 8 porcine studies). Since the human studies were retrospectively acquired, were there any issues with acquiring more than 20 human subjects?
    2. In this work, the input was 1-angle data (19 projections) and the training target was a 4-angle reconstruction (19 projections x 4 = 76 projections). It is not clear whether the pipeline was run 4 times to obtain reconstructions for each of the 4 angles. As the introduction frames the problem as a few-view reconstruction problem, are the authors endeavoring to reconstruct a 3D SPECT volume from 1-angle projection data rather than from all 4-angle projection data? This topic is not referred to again in the results/discussion section, and addressing it would help clarify the role of TIP-Net.
    3. What is the purpose of the 3D CNN 1 and 2 in the pipeline? These networks are simply introduced without their roles being explained.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Without the data (at least phantom/porcine studies), the method may not be reproducible despite the code being publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. The details about the training and testing are scattered through the different sections in the Methods. Consider combining the sentences into a single “Datasets + Implementation” section.
    2. What is the purpose of the 3D CNN 1 and 2 in the study? This has not been described by the authors. For example, in the P-Net block, once the slice-by-slice 3D reconstruction has been generated, what purpose does the 3D CNN 1 serve?
    3. As the introduction frames the problem as a few-view reconstruction problem, are the authors endeavoring to reconstruct a 3D SPECT volume from 1-angle projection data rather than from all 4-angle projection data? This topic is not referred to again in the results/discussion section, and addressing it would help clarify the role of TIP-Net.
    4. While the figures of stresses from the human studies are appreciated, it would have been useful to know the performance of the model in reconstructing a 3D SPECT volume when the patient has no cardiac defects. Consider adding this to the supplementary material section.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes a novel approach to reconstruct 3D SPECT volumes from 1-angle (19 projections) data. The experimental design is sound, despite the lack of data used for training the model.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The reviewers are aligned that the paper presents well-designed experiments and a novel transformer-based network architecture. I support the overall rating of provisional acceptance. The points raised by the reviewers should be incorporated in the final version of the paper.




Author Feedback

Reviewer 2: All the comparison networks were trained on the same dataset with the same training procedure to ensure a fair comparison. Regarding the blurred output from P-net: the proposed method combines information from both P-net and I-net. Even though the output of P-net is blurred, the I-net combines the outputs of P-net and MLEM for a better reconstruction. Because MLEM may also result in a loss of information, the intuition behind the proposed method is to combine a reconstruction from a deep network (i.e., P-net) with a reconstruction from an iterative algorithm (i.e., MLEM) for improved results. Regarding the advantage of the projection-to-image network: the Dual-3D-CNN shares the same structure as the proposed method but has no projection-related input. The results presented in the paper show that the proposed method outperforms Dual-3D-CNN, especially in defect visualization in the human studies.

Reviewer 5: Reviewer 5 mentioned that the proposed method relies solely on slice-by-slice 2D data. We would like to clarify this point. All the 2D slices in the P-net were reconstructed from the entire 3D projection volume. Therefore, the reconstruction of each 2D slice in the P-net relies on 3D information from the volumetric projection data. We could simply map the output of the transformer network to an entire 3D volume (just like AUTOMAP), but this results in a memory issue. The goal of reconstructing images slice by slice in the P-net is to alleviate the memory burden. The I-net combines reconstructions from MLEM (iterative few-view 3D recon) and from P-net (network-based recon using 3D information) to produce the final reconstructions. Regarding the limited amount of training data: we mentioned in the paper that we pretrained the network with 250 volumes of simulated XCAT phantoms. We then fine-tuned the network with the limited real data. During fine-tuning, we used 1 study as the testing set, the other studies for fine-tuning, and a few additional simulated XCAT phantoms (not used in pretraining) for validation. We repeated this process 10 times (8 pigs and 2 physical phantoms) to obtain testing results for all the real studies. By doing so, we made sure that the real study used for testing was never seen by the network. Regarding comparison with other methods such as AUTOMAP: the reason why previously proposed methods cannot be applied directly to our problem is already described in the paper.
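
The fine-tuning protocol described in the response to Reviewer 5 amounts to a leave-one-study-out split over the 10 real studies. The following is a minimal sketch of that split only; the study labels and the helper name `leave_one_study_out` are hypothetical, and the pretraining on simulated XCAT volumes and the XCAT validation set are not modeled.

```python
def leave_one_study_out(studies):
    """Yield (fine_tune_studies, test_study) pairs, holding out one study per fold."""
    for i, held_out in enumerate(studies):
        fine_tune = studies[:i] + studies[i + 1:]
        yield fine_tune, held_out


# Hypothetical labels for the 8 porcine and 2 physical phantom studies.
studies = [f"pig_{k}" for k in range(1, 9)] + ["phantom_1", "phantom_2"]

folds = list(leave_one_study_out(studies))

for fine_tune, test_study in folds:
    # The held-out study is never part of the fine-tuning set for its own fold.
    assert test_study not in fine_tune
```

Each of the 10 folds fine-tunes on 9 studies and tests on the remaining one, so every real study is evaluated exactly once by a network that has not seen it.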

Reviewer 6: Regarding the reconstruction pipeline: the scanner was not designed for multi-angle acquisition. The goal of this paper is to improve stationary reconstruction results using a neural network (1-angle to 4-angle), since multi-angle acquisition is difficult in practice for this scanner. Regarding limited data availability: the use of more patient data for this research needs to be approved by the university/hospital. Regarding the roles of the 3D-CNNs: in the P-net, to alleviate the memory burden, we reconstruct the 3D volume slice by slice (using 3D projection data). We introduced 3D-CNN-1 to remove inconsistency between slices. 3D-CNN-2 was introduced to combine information from the iterative recon (few-view 3D MLEM) and the network recon (P-net). Regarding patients without cardiac defects: for normal patients, we showed that the proposed method maintains uniformity of the myocardium without increasing the measured defect sizes.
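
The data flow described in the responses to Reviewers 5 and 6 can be illustrated with a shape-level sketch. This is not the authors' network: every function below is a hypothetical stand-in (a linear map in place of the transformer, a 3-point average in place of 3D-CNN-1, a plain average in place of 3D-CNN-2), and all array sizes are toy values apart from the 19 projections and the 50 slices mentioned in the reviews. It only shows that each 2D slice is computed from the entire projection volume.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumptions): 19 projections of 8x8 bins, 50 slices of 8x8 voxels.
n_proj, ph, pw = 19, 8, 8
n_slices, h, w = 50, 8, 8

projections = rng.random((n_proj, ph, pw))
mlem_volume = rng.random((n_slices, h, w))  # stand-in for the few-view MLEM recon


def reconstruct_slice(proj_volume, weights):
    """Stand-in for the transformer: one 2D slice from the ENTIRE projection volume."""
    return (weights @ proj_volume.ravel()).reshape(h, w)


def smooth_across_slices(volume):
    """Stand-in for 3D-CNN-1: reduce slice-to-slice inconsistency (3-point average)."""
    padded = np.pad(volume, ((1, 1), (0, 0), (0, 0)), mode="edge")
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0


def fuse(p_net_volume, mlem):
    """Stand-in for 3D-CNN-2: combine network recon and iterative recon."""
    return 0.5 * (p_net_volume + mlem)


# A separate set of parameters per slice; each one sees all projection data.
slice_weights = 0.01 * rng.normal(size=(n_slices, h * w, n_proj * ph * pw))

p_net_slices = np.stack(
    [reconstruct_slice(projections, wgt) for wgt in slice_weights]
)
final_volume = fuse(smooth_across_slices(p_net_slices), mlem_volume)
```

Producing one slice at a time keeps each output small, which is the memory argument made in the rebuttal; the cross-slice step then operates on the stacked volume.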

Reviewer 7: Regarding the system matrix S: because S is a matrix with mostly zeros, we performed sparse matrix multiplication to save time and memory. Regarding the adversarial loss: we found that incorporating it improves the defect contrast, especially on human studies. The intuition behind using a transformer is its global attention mechanism, which lets the network observe information in the entire volumetric 3D projection data. If we replaced the transformer with another architecture, the P-net would have a smaller receptive field, resulting in sub-optimal results.
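
As a small illustration of the sparse-multiplication point in the response to Reviewer 7, the sketch below applies a mostly-zero system matrix to an image vector using SciPy's CSR format. The matrix here is random and the sizes are toy values (assumptions); the rebuttal does not say which library or storage format the authors used.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# Toy sizes (assumptions): 200 projection bins, 500 image voxels.
n_bins, n_voxels = 200, 500

# Random stand-in for the system matrix S with about 2% non-zero entries.
S = sparse.random(n_bins, n_voxels, density=0.02, format="csr",
                  random_state=0, dtype=np.float64)

x = rng.random(n_voxels)   # flattened image volume

y_sparse = S @ x           # forward projection; only stored entries are touched
y_dense = S.toarray() @ x  # dense reference, for checking only

backprojection = S.T @ y_sparse  # transpose product, also sparse

assert np.allclose(y_sparse, y_dense)
```

Storage and multiplication cost scale with the number of stored non-zeros rather than with the full `n_bins * n_voxels` size, which is where the time and memory savings come from.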


