
Authors

Yizhou Chen, Xiaojun Chen

Abstract

Orthodontic treatment typically lasts for two years, and its outcome cannot be predicted intuitively in advance. In this paper, we propose a semantic-guided and knowledge-based generative framework to predict the visual outcome of orthodontic treatment from a single frontal photo. The framework involves four steps. Firstly, we perform tooth semantic segmentation and mouth cavity segmentation and extract category-specific teeth contours from frontal images. Secondly, we deform the established tooth-row templates to match the projected contours with the detected ones to reconstruct 3D teeth models. Thirdly, we apply a teeth alignment algorithm to simulate the orthodontic treatment. Finally, we train a semantic-guided generative adversarial network to predict the visual outcome of teeth alignment. Quantitative tests are conducted to evaluate the proposed framework, and the results are as follows: the tooth semantic segmentation model achieves a mean intersection over union of 0.834 for the anterior teeth, the average symmetric surface distance error of our 3D teeth reconstruction method is 0.626 mm on the test cases, and the image generation model has an average Fréchet inception distance of 6.847 over all the test images. These evaluation results demonstrate the practicality of our framework in orthodontics.
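As a quick illustration of two of the evaluation metrics quoted above, the sketch below shows one common way to compute mean intersection over union and the average symmetric surface distance; this is illustrative Python, not the authors' evaluation code.

    # Illustrative metric computations (not the authors' evaluation pipeline).
    import numpy as np

    def mean_iou(pred, gt, num_classes):
        """Mean intersection over union across the labelled tooth classes (0 = background)."""
        ious = []
        for c in range(1, num_classes):
            p, g = (pred == c), (gt == c)
            union = np.logical_or(p, g).sum()
            if union > 0:
                ious.append(np.logical_and(p, g).sum() / union)
        return float(np.mean(ious)) if ious else 0.0

    def assd(points_a, points_b):
        """Average symmetric surface distance between two surface point sets (N x 3 and M x 3),
        in the same unit as the inputs (mm for the 3D teeth models)."""
        d = np.linalg.norm(points_a[:, None, :] - points_b[None, :, :], axis=-1)
        return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())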

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43987-2_14

SharedIt: https://rdcu.be/dnwJw

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The author(s) have proposed a deep generative learning method for orthodontic visual outcome preview. It has been compared with [28] in Table 2. Additional analyses, such as an ablation study and multiple evaluation metrics, are also conducted.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Clinical relevance to dentists, with visible impact
    2. Data collection of orthodontic visual outcome images from a partner hospital
    3. Nicely designed deep learning strategy
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Incremental performance gains compared to [28] in Table 2.
    2. A computational scalability analysis is missing, since deep learning methods can be slow.
    3. Statistical significance testing is missing.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Fine; however, the dataset (or a link to the dataset) needed to run the code is not available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. Running time comparisons should be added to assess clinical utility.
    2. Datasets should be released if the work is published at an academic conference.
    3. Statistical testing should be added to ascertain whether the performance gains over previous method(s) are significant.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    (see previous comments)

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    - Proposes a semantic-guided StyleGAN for counterfactual image generation for orthodontic treatment outcome prediction.
    - The proposed method aligns the teeth and requires only one frontal image for generation.
    - A statistical prior (a parametric tooth-row template) is used for the 3D reconstruction.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Uses the segmentation output to guide the image generation.
    • A statistical model is used in the optimization-based reconstruction.
    • StyleGAN is used as the image generation network; the structural information comes from the segmentation, and the style information comes from pSpGAN.
    • Includes a teeth alignment algorithm to simulate orthodontic treatment.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • One of the contributions the authors claim is explainability. However, from the methods and experimental analysis, I do not see that this point has been addressed very well. I suggest the authors clarify this point and its analysis in the rebuttal.

    • The segmentation network contains two parts, and region-boundary feature fusion is used to produce the final results. While the ablation study in Table 1 shows the effectiveness of the proposed fusion module, a comparison to SOTA segmentation models (e.g., nnU-Net) is also suggested.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The dataset is private and the code is not open source.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The proposed framework is shown to be effective in generating the images. Further, I suggest including a causal model in the image generation pipeline. Controlling confounders would make the model more robust and explainable. References include: [1] MICCAI 2021 - A Structural Causal Model for MR Images of Multiple Sclerosis; [2] NeurIPS 2020 - Deep Structural Causal Shape Models; [3] arXiv 2023 - Causal Image Synthesis of Brain MR in 3D.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While this paper could be further improved with a causal model, the merits outweigh the weaknesses, and it is worth publishing at the conference.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes a predictive framework for estimating orthodontic visual outcomes from a single frontal photo. The proposed pipeline involves segmenting the teeth and mouth cavity, reconstructing a 3D tooth model, simulating tooth deformation, and generating an output image. The framework is evaluated using dental scans, intra-oral images, and smiling images, and shows improved performance compared to existing baselines.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • This paper is very well written, with all elements of the pipeline clearly explained. The proposed strategy is logically convincing.
    • The application of the proposed approach to orthodontic treatment is highly interesting, as it is a widely performed procedure that has received relatively little attention in the field (to my knowledge). Moreover, the idea of generating deformed visual outcomes using only a single frontal image is also an interesting problem to tackle.
    • The datasets used for evaluation are relatively large, and the evaluation of each step is carefully carried out, with results thoroughly reported.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • While the application of the proposed approach to orthodontic treatment is interesting, the potential clinical relevance and impact of this work should be further discussed and highlighted in the paper. Additional details on this are included in my comments later.
    • The use of established models without much technical novelty, such as UNets and StyleGANs [Ref. 12, 24], is reasonable given the novelty of the overall application. However, it would be beneficial to clearly state the additional novelty of the proposed approach compared to existing methods for the same application, such as [Ref. 3, 20].
    • For the same reason, a comparison with [Ref. 3, 20] is mentioned in the Introduction, but only the comparison with [Ref. 20] is presented for the generation task. Since the generation task is supposed to be the most important part of this work, the evaluation seems unsound.
    • The proposed framework is logically designed, but is complex due to the involvement of multiple data representations (image, 2D semantic segmentation, 3D point-cloud/mesh) and neural networks.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • The labeling process of the data to obtain ground truth is not clearly explained.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • Clinical motivation: it is standard practice to take dental scans before orthodontic treatment. These scans help orthodontists to diagnose and plan the treatment. It is not clear how this application could potentially reduce the effort of dental scans or improve treatment accuracy. It would be helpful to further elaborate on the clinical relevance and impact of this work beyond informing the patient of visual outcome.
    • I enjoy the presentation. The figures presented in this paper are well-designed and visually clear. They effectively illustrate the methods and results.
    • Details of statistical testing are not presented.
    • Additional comparison with Ref. 3 would make the evaluation more solid.
    • The authors could also investigate the impact of errors in each step on the overall performance of the framework. This type of analysis would help simplify the models by potentially identifying areas of improvement in the pipeline. It could also provide insight into the importance of each step in the pipeline and how errors propagate throughout.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The overall score for this paper is between 4 and 5. While the application is interesting, my major concern is the clinical relevance and impact of the proposed approach, as it is more like a computer vision application than a clinically motivated solution. My second concern is whether there is sufficient novelty compared to existing applications [Ref. 3, 20]. Overall, I would recommend a weak accept with a score of 5, given the interesting application and well-executed work.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes a deep learning method for orthodontic visual outcome prediction using a semantic-guided StyleGAN for counterfactual image generation. The framework involves segmenting teeth and mouth cavities, reconstructing a 3D tooth model, simulating tooth deformation, and generating an output image. The proposed pipeline is compared with existing baselines and shows improved performance. The paper is well-written, and all pipeline elements are clearly explained, making the proposed strategy logically convincing. The application of this approach to orthodontic treatment is highly interesting. However, the potential clinical relevance and impact of the proposed approach should be further discussed and highlighted, especially beyond informing the patient of visual outcomes. The authors claim explainability as a contribution, but it is not well addressed in the methods and experimental analysis. It is suggested that the authors clarify this point and its analysis in the camera-ready version. A causal model in the image generation pipeline is recommended to make the model more robust and explainable by controlling confounders. The novelty is also questionable compared to existing applications [Ref. 3, 20], yet the interesting application outweighs this aspect. It is suggested to clearly articulate the additional novelty of the proposed approach compared to existing methods for the same application, such as [Ref. 3, 20]. The fusion module’s effectiveness is shown in the ablation study, but a comparison to state-of-the-art segmentation models (nnU-Net) is also suggested. Furthermore, a computational scalability analysis and statistical significance testing are missing. The authors are encouraged to address these points, especially those related to clarifications and discussion, in the camera-ready version.




Author Feedback

We thank the reviewers for their valuable feedback. Below, we summarize the main concerns raised and provide our corresponding responses.

Concern: The explainability of the approach is not addressed well. Response: The explainability of our approach lies in our ability to control the entire pipeline, resulting in a reasonable output. In the segmentation stage, each tooth is assigned a semantic label through prediction. In the reconstruction stage, a tooth-row model based on prior knowledge is constructed and deformed to generate a person-specific 3D teeth model. Our orthodontic simulation incorporates an algorithm that considers orthodontists’ experience and enables tooth displacement with collision detection in a 3D space. For image generation, we employ pSpGAN to disentangle the teeth structure and appearance, facilitating more realistic and controllable generation. Additionally, we can manually modify the intermediate outputs of each stage to rectify any undesirable errors in the final output.
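For example, the collision-aware tooth displacement mentioned above could, in its simplest form, look like the following sketch; the axis-aligned bounding-box test is a simplifying assumption for illustration, not the algorithm used in the paper.

    # Toy illustration of collision-checked tooth displacement (assumed, simplified).
    import numpy as np

    def aabb_overlap(box_a, box_b):
        """Each box is a (min_xyz, max_xyz) pair; returns True if the boxes intersect."""
        return bool(np.all(box_a[0] <= box_b[1]) and np.all(box_b[0] <= box_a[1]))

    def try_move_tooth(tooth_pts, step, neighbour_boxes):
        """Apply a small translation only if the moved tooth does not collide with its neighbours."""
        moved = tooth_pts + step
        moved_box = (moved.min(axis=0), moved.max(axis=0))
        if any(aabb_overlap(moved_box, box) for box in neighbour_boxes):
            return tooth_pts  # reject the step on collision
        return moved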

Concern: The novelty of our approach compared to existing applications [Ref. 3, 20] should be clarified. Response: Our novelty lies in the stages of 3D teeth reconstruction and orthodontic simulation. Previous works have either required a 3D teeth model as an additional input and predicted its alignment using neural networks [Ref. 20], or directly utilized an end-to-end StyleGAN to predict the final orthodontic outcome [Ref. 3]. In contrast, our approach requires only a single frontal image as input, restores the 3D teeth model through a template-based algorithm, and explicitly incorporates orthodontists’ experience, resulting in a more observable and explainable process.
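As an illustration of the kind of objective a template-based, contour-matching reconstruction might minimize, the snippet below computes a symmetric Chamfer distance between projected template contour points and detected contour points; the paper's actual deformation model and optimizer are not reproduced here.

    # Illustrative contour-matching loss (assumed form, not the paper's exact objective).
    import torch

    def symmetric_chamfer(projected, detected):
        """projected: (N, 2) and detected: (M, 2) contour points in image coordinates."""
        d = torch.cdist(projected, detected)  # (N, M) pairwise distances
        return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()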

Concern: It is suggested to compare our approach with state-of-the-art (SOTA) segmentation models (e.g., nn-Unet). Response: We propose the use of a region-boundary dual-branch model and a fusion module to improve the final segmentation results. As this is an additional module that can be easily integrated into an encoder-decoder model, we present its performance when combined with UNet-like models. In future research, we plan to explore its performance with different backbone models and SOTA models.
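To make the idea concrete, a region-boundary dual-branch head with a simple fusion module on top of a UNet-like decoder could be sketched as follows; the layer choices and channel sizes are assumptions for illustration, not the authors' architecture.

    # Hypothetical region-boundary fusion head (assumed design, PyTorch).
    import torch
    import torch.nn as nn

    class RegionBoundaryFusionHead(nn.Module):
        def __init__(self, in_channels, num_classes):
            super().__init__()
            self.region_branch = nn.Conv2d(in_channels, num_classes, kernel_size=1)  # per-tooth maps
            self.boundary_branch = nn.Conv2d(in_channels, 1, kernel_size=1)          # contour map
            self.fusion = nn.Sequential(  # refine region logits with boundary cues
                nn.Conv2d(num_classes + 1, num_classes, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(num_classes, num_classes, kernel_size=1),
            )

        def forward(self, decoder_features):
            region_logits = self.region_branch(decoder_features)
            boundary_logits = self.boundary_branch(decoder_features)
            fused = self.fusion(torch.cat([region_logits, boundary_logits], dim=1))
            return fused, region_logits, boundary_logits

    # Example usage with made-up sizes:
    # head = RegionBoundaryFusionHead(in_channels=64, num_classes=17)
    # fused, region, boundary = head(torch.randn(1, 64, 256, 256))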

Concern: The addition of a causal model could enhance the method’s robustness. Response: We appreciate this constructive feedback and intend to explore the incorporation of a causal model in our future work.

Concern: It is recommended to include running time comparisons to assess clinical utility. Response: On average, our method takes approximately 15 seconds to run a single case, with the 3D reconstruction stage accounting for the majority of the execution time. We performed tests solely on an Intel 12700H CPU.


