
Authors

SaeHyun Kim, In-Seok Song, Seung Jun Baek

Abstract

Accurate segmentation of teeth is crucial for effective treatment planning. Previous approaches attempted to segment a tooth as a whole, which has limitations because most treatments involve internal structures of teeth. In this paper, we propose fully automated segmentation of internal tooth structure, including enamel, dentin, and pulp, which is the first attempt to the best of our knowledge. The task is challenging, because a total of 96 classes of tooth structures need to be identified from a CBCT image. We design a 3-stage process of coarse-to-fine segmentation of tooth structures without compromising the original resolution. We propose Dual-Hierarchy U-Net (DHU-Net) in order to capture hierarchical structures of teeth, and to effectively fuse encoder and decoder features from higher and lower hierarchies. Experiments demonstrate that our method outperforms state-of-the-art methods in both tasks of segmenting the whole tooth and internal tooth structure.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43898-1_67

SharedIt: https://rdcu.be/dnwB1

Link to the code repository

https://github.com/Saeeeae/Internal-Tooth-Segmentation

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    This work proposes a 3-stage process to segment teeth in CBCT images taking the hierarchy of the structures into account: teeth – single tooth – internal structures (enamel, dentin, and pulp). Two out of these three stages are performed using a dual-hierarchy U-Net that fuses encoder and decoder features from higher and lower hierarchies.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method seems well tailored to this specific segmentation task, taking prior knowledge about the organization of the teeth into account in a straightforward way. The proposed method outperforms a classical 3D U-Net and Attention U-Net on all segmented structures.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • This seems like a rather complex method to achieve a small performance gain over the baselines.
    • Design choices are not fully clear: e.g., where exactly is the HFF module used in the DHU-Net design in Figure 3?
    • A double U-Net (hierarchical or not) has many more trainable parameters than a single U-Net, so a direct comparison against 3D U-Net and Attention U-Net in Table 1 is not fair.
    • Applicability of the method is limited to a specific use case: the method is specifically designed to segment teeth in CBCT scans.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method is tested on a single internal dataset of 3D dental CBCT images collected from a single institution. Neither the data nor the code have been made publicly available and it has not been validated on external or public data, which hinders the reproducibility of the method.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • It would be insightful if the authors could comment on the clinical relevance. Which DSC/HD95 are good enough for the intended use? What was the intra- and interobserver variability of the experts who manually labelled the CBCT images? Is there still room for improvement in the methodology or is the segmentation performance limited by the amount of data and the quality of the ground truth labels?
    • Have the authors tried to compare against double 3D U-Nets and double Attention U-Nets in each segmentation stage? It would also be interesting to take other relevant indicators into account such as training & inference time. And have the authors observed any overfitting?
    • DSC and Jaccard are both overlap-based metrics that basically measure the same thing, so it is redundant to report both.
    • What about hyperparameter optimization?
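The comment on DSC and Jaccard can be made concrete: for any pair of masks the two scores are tied by J = D / (2 - D), so one can be computed from the other. The following is a minimal sketch on toy voxel sets. The function names and the example masks are illustrative assumptions, not part of the reviewed method.

```python
def dice(a: set, b: set) -> float:
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # convention: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a: set, b: set) -> float:
    """Jaccard index: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def jaccard_from_dice(d: float) -> float:
    """Convert a Dice score to the equivalent Jaccard index."""
    return d / (2 - d)


# Toy masks given as sets of voxel coordinates (hypothetical data).
pred = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)}
truth = {(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)}

d = dice(pred, truth)
j = jaccard(pred, truth)
j_converted = jaccard_from_dice(d)
```

Because the mapping is monotonic, the two metrics rank any two segmentations of the same case identically, which is why a boundary-based metric such as HD95 adds more information than reporting both.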
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My main reasons for recommending a reject are the arbitrary choices in the proposed U-Net architecture in combination with the rather specific use case and the relatively small improvement in segmentation performance over the benchmarks.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors have answered many of the open questions, and I appreciate the additional comparisons with a double UNet and a double attention Unet. DHU-Net achieved higher DSC, demonstrating the added value of a redesign of the architecture over “simple” cascading of UNets. The reason why I recommend a weak accept and not a strong accept is that the reproducibility and applicability of the method are still limited because of the small internal dataset that is quite particular and will not be published.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a cascaded and staged CBCT tooth segmentation algorithm that can segment the interior of teeth and achieves good performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper proposes a cascaded network architecture and, for the first time, presents an end-to-end internal tooth segmentation algorithm.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    A small dataset may lead to generalization issues. In addition, the evaluation metrics were not reported on a public dataset, which may weaken the objectivity of the evaluation results.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It has strong reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The Hierarchical Feature Fusion (HFF) module lacks ablation experiments. In Formula 1, the hyperparameters should be given concrete values in subsequent publications. The impact of different hyperparameters on the experimental results should also be reported.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper proposes a method for internal tooth segmentation, which is the first attempt to segment the inside of teeth. The method achieves good results and has certain innovation.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    The main contributions of this paper are (1) a novel hierarchical 3-stage method to segment teeth and their internal structures, i.e., enamel, dentin, and pulp, from CBCT scans; (2) a hierarchical feature fusion module that allows for propagation of important channel and spatial features; (3) extensive experiments to show the superiority of the method by comparing with other methods and performing ablation of the individual components.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strengths of this paper are (1) introducing a novel hierarchical 3-stage method to segment teeth and their internal structures, i.e., enamel, dentin, and pulp, from CBCT scans. The first stage extracts the tooth region, the second stage extracts the tooth patch, and the third stage extracts the internal tooth structures. This automated segmentation of internal tooth structures is the first attempt. (2) A hierarchical feature fusion module that fuses features from the parent network’s encoder and the child network’s encoder and decoder features. These combined features are pooled using the method called MLPMixer. (3) Comparison of the two-stage and three-stage baselines and other methods to show the improved results by the proposed methods. (4) Clearly mentioning implementation details (though they are provided in the supplementary details).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The ablation study/implementation details have been provided as part of the supplementary materials. But it would have been nice to have them as part of the paper.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors mention in the reproducibility checklist that code/pretrained models will be made public. The hyper-parameters, batch size, learning rate, epochs, GPU, etc. are mentioned in the supplementary material. Hence it is reasonable to expect that the code will be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Is the standard deviation in Table 2 across different runs or different folds? The authors mention that the method uses more resources, so a comparison would be nice. The ablation study shows that the introduction of the FTM loss slightly decreases the performance in enamel segmentation. It would be nice if the authors could explain this decrease. What are the values of lambda1 and lambda2?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Mainly because the method has good novelty. The implementation details are provided in the supplementary materials. The paper is generally well written.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The work describes a 3-stage approach to segment teeth in CBCT images, from binary segmentation of all teeth at lower resolution, through segmentation of individual teeth, to a final per-tooth segmentation into 3 classes: dentin, enamel, pulp.

    Two reviewers vote in favor of this work, since it is a seemingly novel application area, i.e. segmentation of internal tooth structures, which has clinical relevance in planning of dental procedures. The work is well motivated and the architectural design is reasonable and well tailored to the specific task, however, it is a straight-forward pipeline derived from similar application areas, with no methodological surprises.

    One reviewer in particular mentions a number of weaknesses, which the meta reviewer agrees with and which have to be addressed in the rebuttal:

    • What is the clinical relevance of the reported performance: what performance is needed, and what performance do different raters achieve, to provide context?
    • Confusion about some design choices and about hyperparameters like lambda1 and lambda2.
    • Unfair comparison of the proposed method w.r.t. the baseline methods that have significantly fewer parameters.
    • The applicability is very limited, which is ok according to MICCAI guidelines if there is a very extensive evaluation of the method, however, there is only a single dataset from a single institution, and it is also not reproducible since the dataset is not intended to be published.




Author Feedback

We thank reviewers (R2,R3,R4) and meta-reviewer (MR) for valuable feedback. Below we provide responses to weaknesses (W) and comments (C).

  1. (MR1,R2.C1) Evaluation metrics and clinical relevance: The internal tooth segmentation from CBCT is essential for precise diagnoses of caries, tooth fractures, periapical lesions, and conditions of root canals. The DSC metric is most relevant to clinical uses for identifying the boundaries of internal structures like tooth pulp.

  2. (MR2,R2.C4,R3.C2) How are hyper-parameters tuned? The optimal lambda_1 and lambda_2 in Eq.(1) are 2 and 5, respectively, which were found using grid search with cross-validation. We will include these values in the revision.

  3. (MR3,R2.W3,R4.C2) Is it a fair comparison with U-Net and Attention UNet? We believe performance improvement is more important than model size. Our model has 44M trainable parameters, which is comparable to UNet(33M) and Attention Unet(35.5M). Although we use two cascaded UNets, the model size does not double, because we mixed transposed convolution and interpolation in the decoder part of DHU-Net to reduce model size. Importantly, simply cascading networks does not improve performance: see Response 4.

  4. (R2.C2) Comparisons with double 3D U-Net and double Attention U-Net: We ran experiments, and the DSC of double UNet and double attention UNet were (84.41/86.21/77.1)% and (83.68/86.65/77.43)% respectively for (enamel/dentin/pulp). DHU-Net achieved higher DSC given by (85.65/88.05/78.58)%. This shows that a simple cascading of UNets is not sufficient, but a careful design is needed.

  5. (MR4,R2.W4) Applicability of the method is limited to a specific use case: Our framework is not necessarily limited to tooth segmentation. Human anatomy is hierarchical by nature, and most organs have internal structures. Our 3-stage approach is reasonable: 1) locate the ROI as a whole; 2) segment external structure; 3) segment internal structures, which is applicable to segmenting other organs. Besides, internal tooth segmentation is a challenging problem, e.g., segmentation into 96 classes at the original resolution. Some detailed measures were necessary to deal with challenges specific to the problem.

  6. (MR4) Reproducibility. The model uses a single, internal dataset: To our knowledge, there is no publicly available CBCT dataset with labels on internal tooth structures. All the cited works on CBCT tooth segmentation used internal datasets. It is straightforward to implement and validate our model for a given dataset. For better reproducibility, we will publish our code in the revision.

  7. (MR2,R2.W2) Clarification on model components: There was a typo in Fig. 3(a). In Child Network, MFFM should be HFFM, HFF-Module depicted in Fig. 2 3(b). We apologize for the confusion and promise to correct it in the revision.

  8. (R2.C1) Intra- and inter-observer variability: The intra- and inter-observer variability of two experts for (incisor/canine/premolar/molar) were reasonably high in ICC of (95.0/95.4/94.0/94.0)% and (94.8/95.4/93.7/93.9)% respectively.

  9. (R2.C1) Is segmentation performance limited by the amount of data or quality of the GT labels? Both are important, but our model may benefit from more data samples containing diverse tooth shapes and conditions.

  10. (R2.C2) Training, inference time, and over-fitting: We implemented early stopping to tackle overfitting by monitoring the validation loss. One epoch of training takes 1.5 min. in the 2nd and 1 min. in the 3rd stage. The inference time is 0.12 sec.

  11. (R2.C3) Similarity of DSC and Jaccard as metrics: We agree that they are similar metrics. We included them because most of the prior works reported both. In the revision, we will move HD95 in supplementary to the main table, and Jaccard to supplementary.

  12. (R4.C3) Meaning of std in Table 2: We used 5-fold nested cross-validation repeated 10 times. Thus, the ’std’ represents deviations from repetitions.
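The response on the meaning of 'std' in Table 2 describes 5-fold cross-validation repeated 10 times, with the deviation taken over repetitions. A rough sketch of that bookkeeping is given below. It covers only the outer repeated loop and omits the inner (nested) hyperparameter search. The helper names, the toy `VALUES`, and `toy_evaluate` are hypothetical, and taking the standard deviation of per-repetition means is an assumption about how the authors aggregate.

```python
import random
import statistics
from typing import Callable, List, Tuple

Split = Tuple[List[int], List[int]]


def k_fold_indices(n: int, k: int, seed: int) -> List[Split]:
    """Shuffle n sample indices and return k (train, test) index splits."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, folds[i]))
    return splits


def repeated_cv(n: int, k: int, repeats: int,
                evaluate: Callable[[List[int], List[int]], float]):
    """Return (mean, std) of the per-repetition mean scores."""
    rep_means = []
    for r in range(repeats):
        # A different seed per repetition gives a different partition.
        scores = [evaluate(tr, te) for tr, te in k_fold_indices(n, k, seed=r)]
        rep_means.append(statistics.mean(scores))
    return statistics.mean(rep_means), statistics.stdev(rep_means)


# Stand-in for "train on `train`, report DSC on `test`" (hypothetical).
VALUES = [((i * 37) % 50) / 50 for i in range(50)]


def toy_evaluate(train: List[int], test: List[int]) -> float:
    train_mean = statistics.mean(VALUES[i] for i in train)
    test_mean = statistics.mean(VALUES[i] for i in test)
    return 1.0 - abs(train_mean - test_mean)


splits = k_fold_indices(50, 5, seed=0)
mean_score, std_score = repeated_cv(50, 5, 10, toy_evaluate)
```

With this aggregation, the reported deviation reflects sensitivity to the random partition rather than fold-to-fold variation within a single partition.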




Post-rebuttal Meta-Reviews

Meta-review #1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    After consideration of the author’s rebuttal, a number of unclear aspects were clarified. Clinical relevance of the performance results was put into context with inter-rater segmentation values, some hyperparameters were clarified, additional comparisons with more fairly designed cascaded segmentation methods have shown the benefit of the proposed approach, and constraints on the dataset used have been stated. Overall, reviewers now vote in favor of acceptance of the work, and in my opinion there might also be some interest in the community regarding this work, despite the somewhat limited methodological contribution and restricted evaluation.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The major concern is addressed in the rebuttal, especially the additional comparisons with a double UNet and a double attention Unet to demonstrate the motivation of the proposed method.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I agree with the reviewers’ and the primary AC’s concern about the limited dataset used for testing. However, MICCAI does not have a strict requirement for algorithms to be tested on multiple datasets. In this case, the segmentation of teeth seems to be more complicated than segmentation of lesions, organs, etc. in other medical images. Therefore, it might be challenging to conduct that on many different datasets for a MICCAI paper. Based on my reading of the paper, I think the rebuttal has addressed most of the concerns. Therefore, I recommend an acceptance.


