
Authors

Sihwa Park, Seongjun Kim, In-Seok Song, Seung Jun Baek

Abstract

Panoramic radiography is a widely used imaging modality in dental practice and research. However, it only provides flattened 2D images, which limits the detailed assessment of dental structures. In this paper, we propose Occudent, a framework for 3D teeth reconstruction from panoramic radiographs using neural implicit functions, which, to the best of our knowledge, is the first work to do so. For a given point in 3D space, the implicit function estimates whether the point is occupied by a tooth, and thus implicitly determines the boundaries of 3D tooth shapes. Firstly, Occudent applies multi-label segmentation to the input panoramic radiograph. Next, tooth shape embeddings as well as tooth class embeddings are generated from the segmentation outputs, which are fed to the reconstruction network. A novel module called Conditional eXcitation (CX) is proposed in order to effectively incorporate the combined shape and class embeddings into the implicit function. The performance of Occudent is evaluated using both quantitative and qualitative measures. Importantly, Occudent is trained and validated with actual panoramic radiographs as input, distinct from recent works which used synthesized images. Experiments demonstrate the superiority of Occudent over state-of-the-art methods.
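For readers unfamiliar with conditionally gated implicit functions, the minimal sketch below illustrates one plausible way a Conditional eXcitation-style gate could inject a combined tooth shape and class embedding into an occupancy MLP. All module names, layer sizes, and tensor shapes here are illustrative assumptions and do not reflect the authors' actual implementation.

```python
# Hypothetical sketch of occupancy prediction conditioned via a
# Conditional eXcitation (CX)-style gate. Names, sizes, and architecture
# details are assumptions for illustration, not the authors' implementation.
import torch
import torch.nn as nn

class ConditionalExcitation(nn.Module):
    """Scales hidden point features with a gate derived from a condition vector."""
    def __init__(self, cond_dim: int, hidden_dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(cond_dim, hidden_dim), nn.Sigmoid())

    def forward(self, h: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # h: (B, N, hidden_dim) point features, cond: (B, cond_dim) embedding
        return h * self.gate(cond).unsqueeze(1)

class OccupancyMLP(nn.Module):
    """Maps a 3D query point plus a shape/class embedding to an occupancy logit."""
    def __init__(self, cond_dim: int = 256, hidden_dim: int = 128):
        super().__init__()
        self.fc_in = nn.Linear(3, hidden_dim)
        self.cx1 = ConditionalExcitation(cond_dim, hidden_dim)
        self.fc_mid = nn.Linear(hidden_dim, hidden_dim)
        self.cx2 = ConditionalExcitation(cond_dim, hidden_dim)
        self.fc_out = nn.Linear(hidden_dim, 1)

    def forward(self, points: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.cx1(self.fc_in(points), cond))
        h = torch.relu(self.cx2(self.fc_mid(h), cond))
        return self.fc_out(h).squeeze(-1)  # occupancy logits, shape (B, N)

# Example: 4096 query points in the unit cube, one tooth embedding per sample.
model = OccupancyMLP()
pts = torch.rand(2, 4096, 3) - 0.5   # points in [-0.5, 0.5]^3
emb = torch.randn(2, 256)            # combined shape + class embedding
logits = model(pts, emb)             # (2, 4096)
```

The gate multiplies hidden features by a sigmoid-activated projection of the condition vector, which keeps the conditioning separate from any normalization layers.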

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43999-5_36

SharedIt: https://rdcu.be/dnwwP

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    The authors propose to use a neural implicit function for 3D teeth reconstruction and design a novel strategy to fuse tooth class and shape information into the implicit function. The model performance is good.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Applying implicit neural representation for 3D tooth reconstruction is novel.
    2. A segmentation network is used to extract individual teeth information.
    3. The proposed method has potential for real-world application, although its clinical utility cannot be established without further evaluation.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The code is not provided.
    2. The method depends on segmentation labels for training.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code is not provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    (1) Segmentation labels for each tooth are a prerequisite for this method. I would like to see more discussion on this matter, especially on conditions where such training labels are not available. It would also be good to clarify whether the compared methods use segmentation labels.

    (2) Although the proposed method achieves the best results among the compared methods, the IoU is still relatively low (around 0.65). Is it worthwhile to tolerate such inaccuracy in exchange for reduced imaging expense or radiation dose?

    (3) It would be great to provide some discussion on the possible generalization of this method to broader medical imaging applications.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The presentation is clear and easy to follow. The idea is novel. The experimental results are good. However, the model performance is still not suitable for real-world use.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    The paper presents a method to generate 3D teeth shapes from a single panoramic radiograph. The method uses a multi-class segmentation network to segment the teeth into 32 classes and then uses the neural implicit function framework to generate the 3D shapes. The proposed method outperforms other state-of-the-art methods in both subjective and objective metrics.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This is the first work that uses a neural implicit function for 3D teeth reconstruction from a single panoramic radiograph. The proposed framework consists of a segmentation network and a 3D reconstruction network. The segmentation network is the same as the multi-label segmentation network in [17], so its novelty is limited. The 3D teeth reconstruction network is based on the neural implicit representation used in the occupancy network [15], with several modifications: (a) leveraging the segmented tooth patch and tooth class to generate the conditional vector used by the Conditional eXcitation (CX) module, and (b) the CX module itself, which uses the conditional vector to derive an excitation, thus separating conditioning from batch normalization. The proposed network outperforms other state-of-the-art methods in both subjective and objective metrics.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The proposed Occudent network uses 32 classes, based on the maximum number of teeth a human has. However, these 32 teeth belong to four types: premolar, molar, canine, and incisor. What is the performance of the algorithm if only 4 classes are used?

    Given a panoramic radiograph, the four types of teeth can be determined based on their locations. How can one use this spatial information to improve the classification and 3D reconstruction networks?

    How does the 3D reconstruction algorithm perform for overlapping teeth?

    There are 32 teeth, so the size of each tooth is around 256/4 x 768/323/5 = 64 x 24.6 ~ 64 x 16. This is based on the occupancy of the teeth area in the PX image. Is it enough for classification?

    What is the output size? Is it the same as that of CBCT? If not, how to compute the metric?

    Even though reconstructing 3D teeth structures from panoramic radiography is an interesting and challenging research problem, the current results are not good enough for clinical usage such as producing models for retainers or Invisalign treatment.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Dataset is not available which limits the reproducibility of the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    From the ablation study in the supplementary materials, it seems that the tooth class (with vs. without) yields a more significant improvement than CBN vs. CX. There are only four types of teeth: premolar, molar, canine, and incisor. What is the performance of the algorithm if one classifies only into these four types?

    There is no performance analysis on the segmentation network.

    Please see the comments in Section 6 on Weakness of the paper for more detailed comments and suggestions.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall improvement on both objective and subjective metric. The proposed approach yields 3D reconstruction with better details.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes a new approach to 3D tooth reconstruction from a 2D panoramic image.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper uses real panoramic images in the experiments.
    2. The paper shows improvement compared with previous work.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The performance of X2Teeth reported in this paper is much lower than in the original paper, 0.59 vs. 0.68. Although the two works use different datasets, the authors should at least provide some explanation or a visual comparison to explain the gap.

    2. The experiments are mainly conducted on cases with healthy teeth. However, in many clinical applications there could be missing teeth (as in older patients) or extra teeth (as in children), and there could be cases where part of a tooth, such as the root or crown, is damaged. The current method does not seem able to handle these situations.

    3. Oral-3D is the first work to explore 3D tooth reconstruction from panoramic X-rays, and its model is also evaluated on real images. An important reason for using simulated data is that paired images are difficult to collect, as a patient would not take the PX and CBCT at the same time, especially in diseased cases. The authors should add more background information in the introduction and also describe how the reconstructed shapes and the ground truth are aligned, since real panoramic X-ray images are used.

    4. The model is not trained end-to-end; tooth segmentation and reconstruction are learned separately. How much does the performance of the segmentation model affect the reconstruction model? What is the performance of the segmentation model? How is the paired data (tooth class and patch) used in training the reconstruction model?

    5. What are the loss functions used in training these two models?

    6. Could the method reconstruct the density distribution, rather than just the tooth shapes, from the X-ray image? Density details are important in clinical applications.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I think the work could be reproduced given the dataset. Still, it would be better to release the segmentation model.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    I would expect to see the method evaluated on abnormal cases, such as patients with extra or missing teeth.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper has shown a new approach to solve 3D teeth reconstruction and shows improvement against existing methods.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes a deep learning-based method to reconstruct 3D teeth shapes from a single panoramic radiograph using a multi-class segmentation network and neural implicit functions. The proposed method outperforms other state-of-the-art methods in both subjective and objective metrics, making it a promising tool for 3D teeth reconstruction. However, several limitations and questions should be addressed. For instance, the segmentation network is not evaluated, and it is unclear how the performance of the segmentation model affects the reconstruction model. It is also not clear what loss functions are used in training. The evaluation is limited to cases with healthy teeth. Additionally, the method requires segmentation labels for each tooth, which may not be available in all cases. It is also unclear how the method performs for overlapping teeth or cases with missing or damaged teeth. The authors are encouraged to provide clarifications regarding the aforementioned concerns in the camera-ready version.




Author Feedback

Thanks for the valuable comments and suggestions. Below we respond to the comments.

Meta-Reviewer & Reviewer #3

The segmentation network is not evaluated, and it is unclear how the performance of the segmentation model affects the reconstruction model.

A. The intersection over union (IoU) of our segmentation model was 81.46%. Since we used a pre-trained segmentation model for training the reconstruction model, it is difficult to evaluate how the performances of the models are correlated. However, a high-quality segmentation model is indeed essential for achieving accurate reconstruction of 3D tooth shapes. We will report the IoU of the segmentation model in the revision.

Meta-Reviewer & Reviewer #3

It is also not clear what loss functions are used in training.

A. The loss function for the segmentation model is described in Section 3, “Experiments - Implementation Details.” As for the reconstruction model, we utilized a binary cross-entropy (BCE) loss to determine whether a point is occupied (assigned a value of 1) or not (assigned a value of 0).
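As a concrete illustration of this stated objective, a minimal sketch of a BCE occupancy loss is shown below; the variable names and shapes are assumptions for illustration, not the authors' training code.

```python
# Minimal sketch of a binary cross-entropy occupancy objective: each sampled
# 3D point is labeled 1 if it lies inside the tooth and 0 otherwise.
# Shapes and names are illustrative assumptions.
import torch
import torch.nn.functional as F

def occupancy_bce_loss(pred_logits: torch.Tensor, gt_occupancy: torch.Tensor) -> torch.Tensor:
    """pred_logits: (B, N) raw network outputs for N query points per sample.
    gt_occupancy: (B, N) ground-truth occupancy labels in {0, 1}."""
    return F.binary_cross_entropy_with_logits(pred_logits, gt_occupancy.float())

# Example with dummy values
logits = torch.randn(2, 4096)
labels = (torch.rand(2, 4096) > 0.5).float()
loss = occupancy_bce_loss(logits, labels)
```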

Meta-Reviewer & Reviewer #1 & Reviewer #3

It is also unclear how the method performs for overlapping teeth or cases with missing or damaged teeth.

A. Regarding cases of overlapping and missing teeth, the segmentation model is designed to handle such scenarios by classifying the panoramic image into 32 tooth classes. Each channel of the model’s output corresponds to a specific tooth class. Consequently, overlapping teeth do not pose a problem, because a single input image pixel can be classified into multiple tooth classes. Similarly, in cases of missing teeth, if the segmentation output for a particular channel is empty, we can exclude those channels from being input to the reconstruction model. This is why we consider our segmentation task as a multi-label classification with 32 classes. In addition, the training dataset includes patients with metal tooth crowns, and our model exhibits a degree of robustness toward damaged teeth. For future work, we are actively collecting diverse cases to further enhance the model’s performance in handling teeth with various damages and conditions.
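As a rough illustration of the channel-selection logic described above, the sketch below drops near-empty channels of a 32-channel multi-label segmentation output (i.e., missing teeth) before reconstruction; the probability threshold, pixel-count cutoff, and function names are hypothetical and only illustrate the idea.

```python
# Illustrative sketch: scan a 32-channel multi-label segmentation output and
# keep only tooth classes whose masks are non-empty, so that missing teeth are
# excluded from the reconstruction stage. Threshold values are assumptions.
import torch

def select_present_teeth(seg_probs: torch.Tensor, min_pixels: int = 50) -> torch.Tensor:
    """seg_probs: (32, H, W) per-tooth probability maps for one radiograph.
    Returns indices of tooth classes whose binarized masks are non-empty."""
    masks = seg_probs > 0.5                     # binarize each channel
    pixel_counts = masks.flatten(1).sum(dim=1)  # mask size per tooth class
    return torch.nonzero(pixel_counts >= min_pixels).flatten()

seg_probs = torch.rand(32, 256, 768)       # dummy segmentation output
present = select_present_teeth(seg_probs)  # tooth classes passed to reconstruction
```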

Reviewer #1

There are only four types of teeth: Premolar, Molar, Canine, and Incisor. What is the performance of the algorithm if one only classifies to these four types?

A. It is an interesting idea to consider the four types of teeth. Below we provide the IoU (%) results for the four types of teeth.

| Method          | Incisor | Canine | Premolar | Molar |
| X2Teeth         | 52.33   | 57.13  | 59.99    | 64.93 |
| OccNet          | 57.40   | 55.66  | 62.55    | 64.99 |
| Occudent (Ours) | 62.14   | 63.43  | 66.18    | 67.24 |

Reviewer #1

What is the output size? Is it the same as that of CBCT? If not, how to compute the metric?

A. Our model does not have a specific output size, because it estimates an implicit function. However, the training was based on points within a unit cube of size [-0.5, 0.5]^3. Therefore, one could consider the output size to be the unit cube, which differs from the size of the CBCT volume. As mentioned in the paper, we trained all the models based on the unit cube. For instance, we voxelized the unit cube to a size of 128^3 for training the R2N2 model. As a result, all outputs are consistent within the normalized unit cube, allowing us to compute metrics using the same framework.
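To make the shared unit-cube evaluation concrete, the sketch below normalizes point sets to [-0.5, 0.5]^3, voxelizes them on a common 128^3 grid, and computes a volumetric IoU. The helper functions and grid resolution are assumptions for illustration, not the authors' evaluation code.

```python
# Sketch of a unit-cube evaluation: shapes are normalized to [-0.5, 0.5]^3,
# voxelized on a common grid (e.g., 128^3), and compared with volumetric IoU.
# Grid size and function names are illustrative assumptions.
import numpy as np

def normalize_to_unit_cube(points: np.ndarray) -> np.ndarray:
    """Center a point cloud and scale its longest side to fit [-0.5, 0.5]^3."""
    center = (points.max(axis=0) + points.min(axis=0)) / 2.0
    scale = (points.max(axis=0) - points.min(axis=0)).max()
    return (points - center) / scale

def voxelize(points: np.ndarray, resolution: int = 128) -> np.ndarray:
    """Mark the cells of a resolution^3 voxel grid hit by any point."""
    grid = np.zeros((resolution,) * 3, dtype=bool)
    idx = np.clip(((points + 0.5) * resolution).astype(int), 0, resolution - 1)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

def voxel_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Volumetric IoU between two boolean occupancy grids."""
    return (a & b).sum() / max((a | b).sum(), 1)

# Example with two dummy point sets
pred = normalize_to_unit_cube(np.random.rand(10000, 3))
gt = normalize_to_unit_cube(np.random.rand(10000, 3))
iou = voxel_iou(voxelize(pred), voxelize(gt))
```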

Reviewer #3

The performance of X2Teeth listed in this paper is much lower than the original paper, 0.59 vs 0.68. Although these use different datasets, the author should at least show some explanation or visualized comparison to explain the gap.

A. As stated in the paper, we believe this is because X2Teeth used synthesized PX images instead of real-world PX images, which have more artifacts and noise and may exhibit significant variation across patients. X2Teeth has not been tested under these conditions, and perhaps relying solely on an encoder-decoder approach for 3D reconstruction may not adequately capture the variations observed in real PX images.


