Authors

Ruyi Zha, Yanhao Zhang, Hongdong Li

Abstract

This paper proposes a novel and fast self-supervised solution for sparse-view Cone Beam Computed Tomography (CBCT) reconstruction that requires no external training data. Specifically, the desired attenuation coefficients are represented as a continuous function of 3D spatial coordinates, parameterized by a fully-connected deep neural network. We synthesize projections discretely and train the network by minimizing the error between real and synthesized projections. A learning-based encoder entailing hash coding is adopted to help the network capture high-frequency details. This encoder offers higher performance and efficiency than the commonly used frequency-domain encoder because it exploits the smoothness and sparsity of human organs. Experiments have been conducted on both human organ and phantom datasets. The proposed method achieves state-of-the-art accuracy with a reasonably short computation time.
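The discrete projection synthesis mentioned in the abstract can be illustrated with a small NumPy sketch. It assumes the Beer-Lambert law, uniform sampling along the ray, and a toy analytic attenuation field standing in for the network; the names `synthesize_projection` and `uniform_ball`, the sample count, and the geometry are hypothetical and are not taken from the paper.

```python
import numpy as np

def synthesize_projection(mu_fn, origin, direction, near, far, n_samples=192):
    """Approximate the line integral of attenuation along one ray and
    return the transmitted intensity fraction I / I0 (Beer-Lambert law)."""
    direction = direction / np.linalg.norm(direction)
    # Split [near, far] into equal segments and sample each at its midpoint.
    edges = np.linspace(near, far, n_samples + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    delta = edges[1:] - edges[:-1]
    points = origin[None, :] + mids[:, None] * direction[None, :]
    mu = mu_fn(points)  # attenuation at each sample, shape (n_samples,)
    line_integral = np.sum(mu * delta)
    return np.exp(-line_integral)

def uniform_ball(points, radius=1.0, mu0=0.5):
    """Toy attenuation field (assumption): mu0 inside a sphere, 0 outside."""
    return np.where(np.linalg.norm(points, axis=1) <= radius, mu0, 0.0)

# One ray through the centre of the ball: chord length 2, so the
# line integral is 2 * 0.5 = 1 and the intensity is close to exp(-1).
origin = np.array([-3.0, 0.0, 0.0])
direction = np.array([1.0, 0.0, 0.0])
intensity = synthesize_projection(uniform_ball, origin, direction,
                                  near=0.0, far=6.0, n_samples=600)
```

In a training loop, `mu_fn` would be the network, and the error between `intensity` and the measured value for the same ray would be minimized; the exact loss and sampling scheme used by the authors are not stated on this page.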

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16446-0_42

SharedIt: https://rdcu.be/cVRTB

Link to the code repository

https://github.com/Ruyi-Zha/naf_cbct

Link to the dataset(s)

https://drive.google.com/drive/folders/1BJYR4a4iHpfFFOAdbEe5O_7Itt1nukJd

Reviews

Review #1

Please describe the contribution of the paper

This work adapts the Neural Radiance Fields idea for 3D reconstruction to the clinically relevant CBCT modality. A key element is a recently proposed network architecture which includes a learning-based encoder. Overall, the strategy resembles an iterative CT reconstruction, where the reconstructed 3D function is represented as a neural network. The method is evaluated on both simulated patient data and measured phantom data. It is compared against multiple state-of-the-art methods and performs competitively.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

To my knowledge, this is the first paper to show a variation of a CT reconstruction algorithm inspired by Neural Radiance Fields on measured data and in the practically relevant CBCT acquisition geometry.

Very good presentation, easy to follow. Also very good graphics to illustrate the method.

Well prepared and fair evaluation against many competitive methods.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The clinical need for low-dose CT using sparse-view acquisition is not very pressing. Currently the motivation for this technology is driven by advances in reconstruction methods, becoming better in dealing with this setup. However, manufacturers currently prefer to use lower intensity measurements because the resulting denoising problem can be handled well with current algorithms. In addition, while radiation dose should be kept as low as possible as a general principle, current clinical applications are rarely hindered by dose considerations.

The simulated CBCT datasets for the Chest and Jaw images seem to suffer from the so-called truncation artifact, which arises when the projections do not cover the entire acquisition volume. The resulting additional artifacts are especially noticeable in the FDK reconstruction. The FDK reconstruction also seems to handle the sparse-view setup badly, leading to a multiplicative constant that distorts the attenuation values. This gives a much worse image impression than, e.g., SART, although the distortion could simply be eliminated by scaling.
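One way to remove such a multiplicative constant is a least-squares rescaling against a reference volume. The sketch below is only an illustration under stated assumptions: it presumes a reference is available (as in simulation), and `best_scale` and the toy arrays are hypothetical rather than part of the reviewed work.

```python
import numpy as np

def best_scale(recon, reference):
    """Least-squares scalar s that minimises ||s * recon - reference||^2."""
    recon = np.asarray(recon, dtype=float).ravel()
    reference = np.asarray(reference, dtype=float).ravel()
    return float(recon @ reference / (recon @ recon))

# Toy example: a reconstruction that is the reference times a constant.
reference = np.array([0.0, 0.2, 0.4, 0.2, 0.0])
fdk_like = 0.25 * reference
s = best_scale(fdk_like, reference)
rescaled = s * fdk_like
```

With real data the reconstruction also contains streaks and noise, so the rescaled volume would match the reference only in overall intensity level, not in detail.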

While the method is interesting and the resulting images show better numerical results, a visual examination suggests that the reduction in conventional artifacts is paid for by the introduction of a new class of artifact, characterized by noisy boundaries and a novel noise appearance exclusive to this method. In addition, no more distinctive features of the dataset become visible than with the classical methods. This is in line with theoretical expectations, since the method does not introduce an explicit prior over the data distribution by incorporating information from multiple CT scans. It rather resembles a novel variant of a reconstruction method, which should theoretically be as limited as any other iterative reconstruction without regularization.
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The method should be very well reproducible with the provided information and the code is promised to be released.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

As described above I assume for the Chest and Jaw datasets projections were truncated left and right. This may highlight another advantage of your method, but is confusing if not explicitly mentioned in the article. I would recommend resimulating the dataset with a wider detector or a different parameter setting to avoid this. Alternatively this could be briefly discussed.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

8
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This work is the first one I have seen which demonstrates the NeRF concept applied to practically relevant CT with a measured dataset. The evaluation is very good and it is very well presented. I admit that I believe low-dose CT using sparse-view acquisition is irrelevant. However, sparse-view CT also arises in motion-compensated CT and is therefore still a very relevant problem.
Number of papers in your stack

4
What is the ranking of this paper in your review stack?

1
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

This paper proposes to learn attenuation coefficients in CBCT via an implicit function parameterized by a fully-connected deep neural network. A learning based hash coding method is utilized to help the network capture high-frequency and edge details of human organs.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The organization and presentation are clear. The proposed neural attenuation field extends the application of implicit functions to CBCT reconstruction. A tailored hash encoder is adopted for position encoding of human organs. Experiments demonstrate promising results in terms of reconstruction quality and computation cost compared with iterative methods and another implicit-function-based baseline.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The loss in eq. (3) is a standard reconstruction loss. The claim of self-supervision is incorrect in this paper. The proposed method is claimed to be useful for sparse-view CBCT reconstruction. However, only one setting of # of views is provided in the experiments. It is necessary to study the performance against different # of views to understand the sparsity of views.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The training of the hash encoder is not clear. Is the hash function trained jointly with the fully-connected deep neural network? The setting of the sampling quantity is not clear.
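The question of what is trained in a hash encoder can be illustrated with one well-known design of this kind, the multiresolution hash encoding from Instant-NGP. There, the hash is a fixed function of integer grid coordinates, and the entries of the feature table it indexes are the parameters optimized together with the network weights. The single-resolution NumPy sketch below assumes that design; whether the reviewed paper follows it exactly is not stated on this page, and `hash_index`, `encode`, the table size, and the resolution are hypothetical choices.

```python
import numpy as np

# Fixed per-axis multipliers of the spatial hash (not learned).
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_index(grid_xyz, table_size):
    """Fixed spatial hash of non-negative integer grid coordinates."""
    g = grid_xyz.astype(np.uint64) * PRIMES
    h = g[..., 0] ^ g[..., 1] ^ g[..., 2]
    return (h % np.uint64(table_size)).astype(np.int64)

def encode(points, table, resolution):
    """Trilinearly interpolate table features at points in [0, 1]^3.
    `table` holds the values that would be learned during training."""
    scaled = points * resolution
    base = np.floor(scaled).astype(np.int64)
    frac = scaled - base
    out = np.zeros((points.shape[0], table.shape[1]))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                offset = np.array([dx, dy, dz])
                # Interpolation weight of this corner of the grid cell.
                w = np.prod(np.where(offset == 1, frac, 1.0 - frac), axis=1)
                idx = hash_index(base + offset, table.shape[0])
                out += w[:, None] * table[idx]
    return out

rng = np.random.default_rng(0)
points = rng.random((4, 3))
table = np.ones((2 ** 10, 2))  # stand-in for a learnable feature table
feats = encode(points, table, resolution=16)
```

Because the eight interpolation weights of a cell sum to one, a table of ones yields features of ones, which is a quick sanity check on the lookup.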
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

see weakness
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper proposes a new application of an implicit function to model the attenuation coefficient in CBCT reconstruction. It also tailors the position encoding with a hash encoder to adapt to the scanned human organs, and it shows promising experimental results.
Number of papers in your stack

4
What is the ranking of this paper in your review stack?

1
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #4

Please describe the contribution of the paper

A self-supervised model for sparse CBCT reconstruction is proposed.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

As claimed by the authors, no external data or priors are required by the proposed self-supervised model.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Comparison results did not significantly demonstrate the superior performance of the proposed method. Network structure and training process are not clearly stated.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors followed the ethical rules.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

1. The authors are expected to provide details on the network architecture.
2. The performance of the method may be further improved by fine tuning the parameters within the network. The current incremental image quality may not support the statement of ‘clinical use’.
3. Computational efficiency is not provided.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

As sparse reconstruction is a popular topic in CT reconstruction, the image quality is the major concern using various schemes. The authors need to demonstrate the superior image quality prior to the complicated network design.
Number of papers in your stack

6
What is the ranking of this paper in your review stack?

5
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The paper has sufficient originality in its use of neural attenuation fields for sparse-view CBCT reconstruction. Reviewers think that the paper is very well presented and easy to follow. The graphics illustrating the method are well done, and the evaluation against many competitive methods is fair and thorough. However, the paper does have some weaknesses: the claim of self-supervision is incorrect, the comparison results do not significantly demonstrate superior performance of the proposed method, and the network structure and training process are not clearly stated. We recommend accepting this paper for publication after considering all input.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

7

Author Feedback

We thank Reviewers R1, R2, R4 and Meta-Reviewer (M) for their valuable feedback. All comments were carefully considered and will be reflected in the final version.

R1: sparse-view CBCT: Our work tries to introduce a new deep learning framework for the computed tomography problem. In this paper, we choose the sparse-view CBCT as a topic (though the clinical need may not be pressing) and we believe that our method can be extended to other tomographic problems as well.

R1: Truncation Artifact. After checking the dataset we did observe minor truncation artefacts in the chest and jaw dataset. We will slightly increase the detector size to solve the problem. The evaluation results and conclusions remain the same.

R1: No regularization. Our method is a novel deep learning variant of the reconstruction framework without regularisation. We would like to prove its ability to handle various types of CT models without any external training data or prior knowledge. It is easy to tailor our method to specific CT applications by adding regularisation or other prior knowledge. Considering the page limit, we will leave it to future work.

M & R2: Self-supervision. We use the term ‘self-supervision’ to highlight that our method does not require external data to pre-train the network such that it learns some prior knowledge. Instead, we only use the X-ray projections of the object itself to supervise the network training. To avoid misunderstanding, we will describe this clearly in our final version.

R2: # Views. In CBCT settings, sparse-view usually means less than 100 views. We set # of views as 50 based on some previous paper. We will add a diagram to illustrate the performance of our method under different # of views.

M, R2, & R4: Details on Network Structure, Hash Encoder, and Training Processes. Considering the page limit, we put the architecture and hyperparameter settings in the supplementary material. We will add more words in the final version to state our training process and network design. We will release our codes as well.

M & R4: Performance. Our method achieves better results without fine-tuning and regularization (2-4dB greater than baseline methods). Since dB is measured in terms of power, 2-4dB can be considered a significant improvement. We believe that the performance can be further improved if we add regularization or pre-train the network with external datasets. Considering the page limit, we would like to leave it to future work. We will show more details about computational efficiency in the final version.
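The size of a 2-4dB gain can be made concrete with a short calculation. The sketch assumes the figures refer to PSNR, defined as 10·log10(MAX²/MSE), so that a gain of g dB corresponds to the mean squared error shrinking by a factor of 10^(g/10); the function name `mse_ratio_from_psnr_gain` is hypothetical.

```python
def mse_ratio_from_psnr_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB,
    assuming PSNR = 10 * log10(MAX**2 / MSE)."""
    return 10.0 ** (gain_db / 10.0)

low = mse_ratio_from_psnr_gain(2.0)   # about 1.58
high = mse_ratio_from_psnr_gain(4.0)  # about 2.51
```

Under that assumption, a 2-4dB gain means the baseline's mean squared error is roughly 1.6 to 2.5 times larger than that of the proposed method.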

NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction