
Authors

Yiqun Lin, Zhongjin Luo, Wei Zhao, Xiaomeng Li

Abstract

Sparse-view cone-beam CT (CBCT) reconstruction is an important direction to reduce radiation dose and benefit clinical applications. Previous voxel-based generation methods represent the CT as discrete voxels, resulting in high memory requirements and limited spatial resolution due to the use of 3D decoders. In this paper, we formulate the CT volume as a continuous intensity field and develop a novel DIF-Net to perform high-quality CBCT reconstruction from extremely sparse (≤10) projection views at an ultrafast speed. The intensity field of a CT can be regarded as a continuous function of 3D spatial points. Therefore, the reconstruction can be reformulated as regressing the intensity value of an arbitrary 3D point from given sparse projections. Specifically, for a point, DIF-Net extracts its view-specific features from different 2D projection views. These features are subsequently aggregated by a fusion module for intensity estimation. Notably, thousands of points can be processed in parallel to improve efficiency during training and testing. In practice, we collect a knee CBCT dataset to train and evaluate DIF-Net. Extensive experiments show that our approach can reconstruct CBCT with high image quality and high spatial resolution from extremely sparse projection views within 1.6 seconds, significantly outperforming state-of-the-art methods.
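
The abstract describes the core pipeline: project a 3D query point into each of the sparse 2D views, sample view-specific features at the projected locations, fuse them across views, and regress the point's intensity with a small network. The following is a minimal PyTorch-style sketch of that idea, not the authors' released implementation; the encoder, the max-pooling fusion, and the project_fn cone-beam projection operator are placeholder assumptions.

```python
# Minimal sketch (not the authors' code) of point-wise intensity regression
# from sparse projection views, assuming a PyTorch-style setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointIntensityRegressor(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        # Hypothetical 2D encoder producing per-view feature maps.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
        )
        # MLP mapping fused per-point features to an intensity value.
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1),
        )

    def forward(self, views, points, project_fn):
        # views: (V, 1, H, W) sparse projections; points: (N, 3) query points.
        # project_fn is a placeholder for the cone-beam geometry: it maps 3D
        # points to normalized 2D coordinates in [-1, 1] for a given view.
        feats = self.encoder(views)                           # (V, C, H, W)
        per_view = []
        for v in range(views.shape[0]):
            uv = project_fn(points, v).view(1, -1, 1, 2)       # (1, N, 1, 2)
            f = F.grid_sample(feats[v:v + 1], uv, align_corners=True)  # (1, C, N, 1)
            per_view.append(f.squeeze(0).squeeze(-1).t())      # (N, C)
        fused = torch.stack(per_view, dim=0).max(dim=0).values  # max-pool over views
        return self.mlp(fused).squeeze(-1)                     # (N,) intensities
```

Because the query points are processed independently, thousands of them can be batched in points, which is what allows parallel training and inference without a memory-heavy 3D decoder.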

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43999-5_2

SharedIt: https://rdcu.be/dnwjc

Link to the code repository

https://github.com/xmed-lab/DIF-Net

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a DIF-Net for extremely sparse-view CBCT reconstruction. The DIF-Net extracts features directly from the projection data and aggregates them according to the CT geometry. Experiments on a knee CBCT dataset show its good performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    – The proposed method operates directly on the 2D projection images, which is efficient and suitable for the extremely sparse-view CBCT reconstruction problem.
    – The authors borrow the idea from NeRF-based methods and model a CT volume as a continuous intensity field. Thus, training and testing can be performed voxel by voxel, which is memory-efficient and flexible. It also allows partial image reconstruction and reconstruction at arbitrary resolution.
    – Experiments showed that reasonable reconstruction results can be obtained with fewer than 10 projection views.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    – The authors overemphasize the relation between the proposed DIF-Net and NeRF-based methods, which could be confusing. In my opinion, DIF-Net can be regarded as a data-driven replacement for the projection filtering operation of the traditional FBP algorithm. The NeRF-based methods (e.g., NAF), on the other hand, resemble an iterative reconstruction algorithm that fits a neural network to a CT volume using data from only one scan. These are two different ways to achieve CT image reconstruction, each with its own advantages. Although the two types of methods share a similar idea of image representation, the differences should also be clarified.
    – Based on the analysis above, DIF-Net is also similar to image deblurring/denoising methods (e.g., FBPConvNet). It would be interesting to see whether DIF-Net outperforms these well-studied methods under the setting of extremely sparse-view scans.
    – The SART method also achieved acceptable results. Why did the authors not include compressed-sensing-based methods for comparison?
    – The experimental results of the PatRecon method are misleading. The authors did not specify the training strategy used for it. In Section 3.2, the authors mention that ‘PatRecon produces meaningless results because it is designed for patient-specific reconstruction and should be trained on several CT scans of the same patient…’. According to the original paper, the PatRecon method is adaptive to anatomy changes to some extent. Did the authors train the model on a suitable dataset and apply the same data augmentation strategy? Otherwise, PatRecon should not be included as a comparison method.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have described their method clearly.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    – The authors should clarify the differences between the proposed method and the NeRF-based methods.
    – The selection of the comparison methods may be confusing. Please refer to the “main weaknesses” section for detailed comments.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method is novel, and it is suitable for the extremely sparse-view CBCT reconstruction problem. The experimental results demonstrate the effectiveness of the method.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors have addressed most of the concerns with additional experiments and explanations. It is a good study but the writing should be improved. I will not change my rating based on the reasons above.



Review #3

  • Please describe the contribution of the paper

    This work presents a Deep Intensity Field Network (DIF-Net) to reconstruct 3D CT volume from extremely sparse 2D projections. On a clinical dataset, the proposed method achieves higher accuracy and speed compared to traditional model-based methods and unsupervised INR-based methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper attempts to address the crucial subject of extremely sparse-view CBCT reconstruction.
    2. The application of INR for CBCT reconstruction is an intriguing approach.
    3. The proposed method outperforms conventional model-based methods and unsupervised INR-based methods.
    4. This paper is well-written and easy to understand.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The projection encoding and view-specific querying modules in the method are not novel. Similar designs were proposed in pixelNeRF [1] in 2021.
    2. The experiments in this paper are not convincing because the proposed method follows a supervised training paradigm, while none of the baselines are supervised. Specifically, FDK and SART are conventional model-based algorithms, NAF is an unsupervised INR-based method, and PatRecon is patient-specific, with all training data coming from the same patient.
    3. The authors claim that the proposed method is superior in reconstruction efficiency and report related results such as test time, parameters, and memory footprint in Table 2. However, the training time is not provided. As a supervised method, the training consumption cannot be ignored.
    4. All experiments are conducted on the same dataset. It is common for supervised methods to suffer from out-of-distribution (OOD) problems. Thus, more validations on various datasets are needed.
    5. The paper’s coverage of the available literature on INR-based CBCT reconstruction is limited. Some works, such as SNAF [2], NeRP [3], and IntraTomo [4], may not have received as much attention in the literature but are relevant. Please note that I am not an author of any of these works.

    [1] Yu, Alex, et al. “pixelNeRF: Neural radiance fields from one or few images.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
    [2] Fang, Yu, et al. “SNAF: Sparse-view CBCT Reconstruction with Neural Attenuation Fields.” arXiv preprint arXiv:2211.17048 (2022).
    [3] Shen, Liyue, John Pauly, and Lei Xing. “NeRP: Implicit neural representation learning with prior embedding for sparsely sampled image reconstruction.” IEEE Transactions on Neural Networks and Learning Systems (2022).
    [4] Zang, Guangming, et al. “IntraTomo: Self-supervised learning-based tomography via sinogram synthesis and prediction.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have provided a detailed outline of their experimental procedure and parameters.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    As mentioned in the weakness section, it would be more convincing if the authors could compare the proposed method with other supervised methods.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper makes a good attempt to apply INR to the CBCT reconstruction problem. However, the experiments are unfair and not convincing.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    3

  • [Post rebuttal] Please justify your decision

    I appreciate the authors’ thorough rebuttal and the inclusion of supplementary experiments. However, in my opinion, the current version of the paper is not yet prepared to be considered for MICCAI. Therefore, I maintain my original score.



Review #4

  • Please describe the contribution of the paper

    Based on ideas similar to NeRF, the study develops a method that models CBCT volumes as implicit representations. Instead of using an iterative scheme to update the neural network weights through projection matching, the study proposes to directly extract feature values from cone-beam projections to regress the values of queried CBCT coordinate points with minimal time overhead.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The idea of using extracted features from cone-beam projections to regress the CBCT voxel values is novel and can greatly accelerate the speed of NeRF-based methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The fusion & regression-based method leads to the loss of CBCT geometry information during inference and is unlikely to be accurate in scenarios where the projection scan angles change between training and testing.
    2. The method is proposed as population-based, but would be more appropriate as a patient-specific model due to the high uncertainty of the fusion & regression step, compounded by the extremely few projections and the varying anatomies among patients. The knee reconstruction example may be over-simplified, and it is unclear whether the model simply memorized information from very similar anatomies.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    N.A.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. For the method to be potentially useful, the study needs to validate that it can work for different training/testing projection scan angles, or to validate that it can truly serve as a population-based model that can handle training/testing anatomy differences.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The study presents a new idea to improve NeRF-based image reconstruction methods, but the robustness and real-world applicability of the method are unclear. Its advantages and accuracy are exaggerated.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This work introduces a Deep Intensity Field Network (DIF-Net) for efficient sparse-view CBCT reconstruction that leverages continuous intensity field modeling. One reviewer supports this work, but several important questions are raised as well. Therefore, the authors should be given a chance to address the major points before this work can be accepted. Specifically, please consider the weaknesses raised by the reviewers, summarized below.

    • Overemphasis on the relationship with NeRF-based methods, causing potential confusion.
    • Comparison with inappropriate methods and lack of important comparisons such as compressed sensing-based methods and other image deblurring/denoising methods.
    • Need for validation on different datasets and potential patient-specific model limitations due to varying anatomies.




Author Feedback

We thank the reviewers for their valuable feedback. Overall, the reviewers consider the proposed method novel (R1, R4), intriguing (R3), and efficient (R1, R4), and also appreciate its good performance (R1, R3) and fast speed (R3, R4). The major concerns are method comparison (Q3), robustness analysis (Q4/Q6), and novelty (Q5).

– Below we first clarify important points summarized by the meta-reviewer.

[Q1-R1] The relationship with NeRF-based methods. Similarity: we share the same data representation of CT (an implicit neural representation). Differences: self-supervised NeRF-based methods (e.g., NAF, NeRP, IntraTomo) are robust to unseen samples, require no additional training data, and perform well with enough (20~50) views; however, they perform worse with limited views and need to be optimized per patient, resulting in inefficient reconstruction. Our DIF-Net, as a supervised data-driven approach, performs well even with limited (<10) views and benefits from fast reconstruction speed. We will clarify this in the revised version for better understanding.

[Q2-R1&R3] Inappropriate comparison with PatRecon. We will remove the comparison with PatRecon as suggested by R1 & R3.

[Q3-R1&R3] Comparison with supervised works on compressed sensing and deblurring/denoising. Due to the increase in dimensionality (2D to 3D), such methods must be equipped with 3D conv/deconv layers for dense prediction when extended to CBCT reconstruction, which leads to extremely high computational costs and low resolution (≤64^3). For a fair comparison (10-view, 256^3 resolution), a possible solution is to use FDK to obtain an initial result and then apply 2D methods for slice-wise denoising. Experiments (PSNR/SSIM/Time) show that DIF-Net (29.3/.92/1.6s) outperforms all compared supervised methods: CSGAN [1] - 25.7/.76/1.1s; CSNet [2] - 27.3/.85/1.2s; FBPConvNet [3] - 26.7/.84/1.7s; conditional DDPM [4] - 25.8/.74/30min.

[1] Task-aware compressed sensing with generative adversarial networks. AAAI 2018.
[2] Image compressed sensing using convolutional neural network. TIP 2019.
[3] Deep convolutional neural network for inverse problems in imaging. TIP 2017.
[4] Denoising diffusion probabilistic models. NeurIPS 2020.
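
A minimal sketch of the FDK-plus-2D-denoising baseline construction described in Q3 above, assuming an initial FDK volume is already available and net_2d is any 2D denoising network (e.g., an FBPConvNet-style U-Net); the function name and batching are hypothetical, not the authors' actual comparison code.

```python
# Hypothetical sketch: refine an initial sparse-view FDK reconstruction
# slice by slice with a 2D denoising network.
import torch

@torch.no_grad()
def denoise_volume(fdk_volume: torch.Tensor, net_2d: torch.nn.Module, batch: int = 8) -> torch.Tensor:
    # fdk_volume: (D, H, W) initial reconstruction from sparse projections.
    slices = fdk_volume.unsqueeze(1)  # (D, 1, H, W), treated as a stack of 2D images
    out = [net_2d(slices[i:i + batch]) for i in range(0, slices.shape[0], batch)]
    return torch.cat(out, dim=0).squeeze(1)  # (D, H, W) denoised volume
```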

[Q4-R3&R4] Experiments on different datasets. The knee dataset is not simple, as the data were collected from patients of different ages/genders and the scanning ranges of the knees vary greatly among patients. To further demonstrate the generalization of our DIF-Net, we conducted experiments (PSNR/SSIM, 10-view, 256^3) on two datasets with more complex anatomies: LUNA (lung CT) - 28.6/.92; MSD-Pancreas - 28.3/.91. For patient-specific reconstruction, our DIF-Net can be pre-trained to learn an anatomy-specific prior and then fine-tuned on data from the patient to learn patient-specific differences and further improve accuracy.

– Then, we address other concerns raised by reviewers.

[Q5-R3] Novelty. Although the querying operation itself is not new, our contribution is a novel technical formulation for this challenging task, which opens the door to effective and efficient CBCT reconstruction and encourages more and better methods in the future.

[Q6-R4] Robustness to varying projection parameters. DIF-Net can be equipped with max-pooling for cross-view fusion and trained with varying projection parameters to improve robustness in real scenarios. The following experiments validate this claim. Training: a random number of views (6~10), a random start angle, and a fixed angle span (30/26/23/20/18 degrees for 6/7/8/9/10 views). Testing a single trained model with different projection parameters [PSNR/SSIM, 256^3, #views - angle_1 | angle_2 | angle_3]: 6-view - 27.1/.89 | 27.1/.89 | 27.0/.88; 8-view - 28.2/.90 | 28.1/.90 | 28.2/.90; 10-view - 29.1/.92 | 29.2/.92 | 29.1/.92.
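
A minimal sketch of the variable-view training setup described in Q6, assuming the quoted degrees are the angular spacing between adjacent views (an interpretation, not stated explicitly in the rebuttal); the names below are hypothetical.

```python
# Hypothetical sketch: sample projection angles for one training iteration,
# with a random view count, a random start angle, and a count-dependent spacing.
import random

ANGLE_STEP_DEG = {6: 30, 7: 26, 8: 23, 9: 20, 10: 18}  # values quoted in the rebuttal

def sample_view_angles() -> list[float]:
    n_views = random.randint(6, 10)          # random number of views per sample
    step = ANGLE_STEP_DEG[n_views]           # spacing tied to the view count
    start = random.uniform(0.0, 360.0)       # random start angle
    return [(start + i * step) % 360.0 for i in range(n_views)]
```

Because max-pooling fusion is permutation-invariant and independent of the number of inputs, a single trained model can then be queried with any of these view configurations at test time, which is the property the reported results are meant to demonstrate.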

[Q7-R3] Other concerns. Training time: 18/15/10 h for training with 10/8/6 views. Literature review: we will cite the suggested papers in the revised version.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors propose a Deep Intensity Field Network (DIF-Net) that extracts features directly from projection data for efficient and accurate reconstruction. There seems to be a consensus among the reviewers about the novelty and potential of the proposed method. Discrepancies exist primarily in the assessment of the paper’s experimental design and the comparisons made. The authors have addressed most of the reviewers’ concerns effectively. They clarified the relationship between their DIF-Net and NeRF-based methods and promised to remove inappropriate comparisons. Additional comparisons with supervised methods were provided, supporting the superiority of DIF-Net. Moreover, they gave more details about their experiments on different datasets and how they plan to improve robustness. However, the point about the need to report training time has not been completely addressed, although the authors provided some information in their rebuttal.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    A comparable approach was introduced in pixelNeRF, which addresses the sparse-view issue, as mentioned by the authors. Although the timing issue was addressed by the authors, notable weaknesses in the experimental section remain unresolved. All experiments are conducted exclusively on a single knee-joint dataset, which is anatomically simple compared with other CBCT datasets. Furthermore, the question of supervised versus unsupervised learning remains unanswered.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This work adapts implicit neural representations for efficient sparse-view CBCT reconstruction. Rather than relying on self-supervised NeRF-based methods, which do not perform well with limited views (<10), the proposed DIF-Net uses a supervised approach to handle limited views with fast reconstruction speed. The authors have addressed all the questions from the meta-reviewer and the reviewers with additional experiments, including comparisons with other supervised approaches, performance on two other datasets (LUNA and MSD-Pancreas), and performance of DIF-Net under varying projection parameters. With all the concerns addressed, I recommend acceptance.


