Authors

Fan Duan, Li Chen

Abstract

Accurate segmentation of digital 3D dental mesh plays a crucial role in various specialized applications within oral medicine. While certain deep learning-based methods have been explored for dental mesh segmentation, the current quality of segmentation fails to meet clinical requirements. This limitation can be attributed to the complexity of tooth morphology and the ambiguity of the gingival line. Furthermore, the semantic information of mesh cells, which can provide valuable insights into their categories and enhance local geometric attributes, is usually disregarded. Therefore, the segmentation of dental mesh presents a significant challenge in digital oral medicine. To better handle the issue, we propose a novel semantics-based feature learning method for dental mesh segmentation that can fully leverage the semantic information to grasp the local and non-local dependencies more accurately through a well-designed graph-transformer. Moreover, we perform adaptive feature aggregation of cross-domain features to obtain high-quality cell-wise 3D dental mesh segmentation results. We validate our method using real 3D dental mesh, and the results demonstrate that our method outperforms the state-of-the-art one-stage methods on 3D dental mesh segmentation. Our code is available at https://github.com/df-boy/SGTNet.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43990-2_43

SharedIt: https://rdcu.be/dnwLY

Link to the code repository

https://github.com/df-boy/SGTNet

Link to the dataset(s)

N/A

Reviews

Review #4

Please describe the contribution of the paper

The authors introduce a novel semantics-based feature learning approach that effectively utilizes semantic information and captures both local and non-local dependencies using a graph-transformer. Additionally, they perform adaptive feature aggregation of cross-domain features to achieve precise cell-wise 3D dental mesh segmentation results.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

(1) They introduced a novel semantics-based feature learning approach that effectively utilized semantic information to enhance local and global features. (2) They designed a new feature fusion module to obtain global dependencies in each domain, further learning semantic information and adaptively fusing cross-domain features.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Many voxelization-based methods, such as PV-RCNN and VoxelNet, have achieved satisfactory results in the mesh segmentation domain. However, the authors did not compare their approach with these methods in the experimental section.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The reproducibility of the paper is very good.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

The authors should compare their approach with voxelization-based methods, such as PV-RCNN and VoxelNet, in the experimental section.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

They designed a novel semantics-based feature learning approach for mesh segmentation that effectively utilizes semantic information to enhance local and global features. They also designed a new feature fusion module to obtain global dependencies in each domain, further learning semantic information and adaptively fusing cross-domain features.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

The authors propose a graph transformer-based feature learning network that decouples position and direction features and enables segmentation of dental meshes. The method benefits from additional semantic predictions. It is trained and tested on 200 dental meshes and compared with four other SOTA approaches, showing superior results.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The method, in particular its changes relative to previous methods, seems interesting, and it outperforms TSGCNet. It seems well suited to producing teeth segmentations with accurate borders, which I believe is beneficial for the application.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The authors could have provided a more detailed evaluation, for example by examining the IoU of individual teeth or identifying failure cases. Furthermore, the method description lacks some details at the beginning, i.e., regarding the input definition (sizes of C and N), the STN, and the kNN. The introduction should be understandable without having to read the TSGCNet paper. The approach is said to perform significantly better, but this should be backed up by statistical tests.
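To illustrate the kind of statistical backing requested here, the following is a minimal sketch of a paired sign-flip permutation test on per-scan scores of two methods evaluated on the same test cases. The function name, the number of resamples, and the score values are all hypothetical and are not taken from the paper; other paired tests (for example a Wilcoxon signed-rank test) would serve the same purpose.

```python
import random
import statistics


def paired_permutation_test(scores_a, scores_b, n_resamples=10000, seed=0):
    """Two-sided paired sign-flip permutation test on per-case differences.

    Returns the observed mean difference (a - b) and an estimated p-value.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both methods must be scored on the same cases")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = statistics.fmean(diffs)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.fmean(flipped)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical per-scan mIoU values, for illustration only (not from the paper).
proposed = [0.91, 0.93, 0.90, 0.94, 0.92, 0.95, 0.89, 0.93]
baseline = [0.88, 0.90, 0.89, 0.91, 0.90, 0.92, 0.88, 0.90]
mean_diff, p = paired_permutation_test(proposed, baseline)
print(f"mean difference = {mean_diff:.4f}, p = {p:.4f}")
```

Because the test works on paired differences, it requires that both methods be scored on exactly the same scans, which is the setting described in the paper's comparison.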
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Only mean errors, IoU, etc. are provided, without standard deviations, error distributions, or statistical tests. No code and no data are provided. Hyperparameters for reproduction are provided. It is unclear how easily the additions to TSGCNet can be re-implemented. An ablation study examining the effect of the different components is provided.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
- The paper does not discuss failure cases or limitations of the current approach, although I assume that it also has some problems (since the IoU is not perfect). Are there cases where the approach fails or has problems? All examples given are selected because they highlight the proposed approach.
- Only the mean error value is given, without any standard deviation or min/max.
- Can it be evaluated tooth by tooth to see the differences? This was done for TSGCNet.
- Can you comment on why you ran all SOTA methods for 200 epochs instead of until convergence? This should be justified in the results, since the other approaches may converge later than the proposed one.
- The text states that opposite teeth have the same labels, but in the figure they still have different colors. Why?
- N+B, why 24? Some more explanations/definitions of TSGCNet should be provided in order to improve comprehension.
- How do the methods compare with regard to memory and time?
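The tooth-by-tooth evaluation and the spread statistics asked for in the comments above could be produced along the following lines. This is only a sketch: the helper names, the label convention (0 for gingiva, positive integers for teeth), and the toy labels are assumptions made for illustration and do not come from the paper or its code.

```python
import statistics


def per_class_iou(pred, target, num_classes):
    """Return one IoU per class for two equal-length label sequences.

    Classes absent from both prediction and ground truth get None.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must label the same cells")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        ious.append(inter / union if union else None)
    return ious


def summarize(values):
    """Mean, sample standard deviation, min and max of a list of scores."""
    return {
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


# Toy (prediction, ground truth) labels for two scans; illustrative only.
scans = [
    ([0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 2, 2]),
    ([0, 1, 1, 1, 2, 0], [0, 0, 1, 1, 2, 2]),
]
per_scan = [per_class_iou(p, t, num_classes=3) for p, t in scans]
for c in range(3):
    values = [ious[c] for ious in per_scan if ious[c] is not None]
    print(c, summarize(values))
```

Reporting the per-class summary across scans, rather than a single mean, would also make low-scoring teeth and failure cases visible through the minimum values.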
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

I think it is a good paper overall, but the evaluation has some flaws and no limitations are discussed, although the IoU is still not perfect. On the other hand, I appreciate that the method seems to handle the displayed difficult cases much better. However, the other approaches might perform better if trained until convergence. If the evaluation design is improved and some more descriptions are added, I would be more confident in accepting this paper.
Reviewer confidence

Somewhat confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #5

Please describe the contribution of the paper

This paper proposes a 3D dental mesh segmentation method for teeth and gums with the following contributions: (1) It takes the semantic information related to the spatial positions of different teeth into account and fully utilizes the extracted semantic information in the task, reducing the segmentation error at teeth boundaries. (2) Existing methods that focus only on local features achieve limited results in ambiguous regions; this paper uses both local and global information to improve segmentation performance. (3) It adopts adaptive weight fusion to fuse features from different domains, resulting in better segmentation performance. (4) It exceeds recent one-stage dental mesh segmentation SOTA methods in overall accuracy and mIoU.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This article fully utilizes the semantic information of cells and normal vectors from the input data, explores the spatial differences of teeth in different positions, and combines local and global semantic information to improve the accuracy of segmentation. To learn abundant semantic information, the method predicts pseudo labels from the high-dimensional features extracted by the Graph-Transformer. Each cell is classified by an MLP followed by softmax, and the feature differences between cells of different categories are then maximized. This makes it easier for the network to learn classification-related features (including location, shape, etc.) and can inhibit overfitting to a certain extent, which helps the network capture rich semantic information. The extraction of semantic information also combines feature vectors of different scales, so that features of different scales can directly participate in the prediction of segmentation results. In generating pseudo labels, the authors also fuse the information of the C-domain and N-domain to make the generated pseudo labels more accurate. In fusing features across domains, the authors use an adaptive method based on learnable weights. This allows the network to automatically select features that are valuable for segmentation while suppressing irrelevant ones. The proposed method resembles self-supervised learning, which allows the network to learn valuable information from the inputs themselves without labels. Combined with the supervised segmentation task, the network can converge faster and achieve higher accuracy.
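As a rough illustration of what fusing features from two domains with learnable weights can look like, the sketch below computes one gate logit per domain, normalizes the two logits with softmax, and forms a weighted sum of the C-domain and N-domain feature vectors of a single cell. The gating form, the parameter vectors `w_c` and `w_n`, and all numeric values are assumptions made for this example; the paper's actual fusion module may differ.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(feat_c, feat_n, w_c, w_n):
    """Fuse two per-cell feature vectors with softmax-normalized gate weights.

    w_c and w_n stand in for learnable parameters; each gate logit is the
    dot product of a domain's feature vector with its parameter vector.
    """
    logit_c = sum(f * w for f, w in zip(feat_c, w_c))
    logit_n = sum(f * w for f, w in zip(feat_n, w_n))
    alpha_c, alpha_n = softmax([logit_c, logit_n])
    fused = [alpha_c * fc + alpha_n * fn for fc, fn in zip(feat_c, feat_n)]
    return fused, (alpha_c, alpha_n)


# Toy features for one mesh cell; values and weights are illustrative only.
feat_c = [1.0, 0.0, 2.0]  # coordinate-domain (C-domain) feature
feat_n = [0.0, 1.0, 0.0]  # normal-domain (N-domain) feature
w_c = [0.5, 0.0, 0.5]
w_n = [0.0, 0.5, 0.0]
fused, (a_c, a_n) = adaptive_fuse(feat_c, feat_n, w_c, w_n)
print(fused, a_c, a_n)
```

Since the two gate weights sum to one, a domain whose logit is larger for a given cell contributes proportionally more to the fused feature, which is the "automatic filtering" behavior described above.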
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

There are many unclear descriptions in the overall method and in the combination of figures and text in this article. (1) For the pseudo label prediction section, the authors did not clearly describe how the pseudo labels are used. They introduced the process of generating pseudo labels and then maximizing the differences between cells of different categories, but did not explain in detail how this maximization is carried out. (2) In the formula for calculating semantic distance, the authors used predicted pseudo labels but did not consider the possibility of inaccurate pseudo labels during the initial training stage. The authors neither explained this nor provided a more reasonable solution. (3) Figure 1 in the article shows the overall structure of the framework. However, it does not show any details of the semantic prediction, graph transformer, and adaptive feature fusion mentioned by the authors, and the data flow in the figure lacks arrows, which hinders readers' understanding to some extent. (4) In the experimental section, the authors conducted ablation experiments on the three main modules in the network but did not specify how the control group for the ablation experiment was set up. For example, if the adaptive feature fusion module is removed, how does feature fusion proceed? Likewise, which structure replaced the graph transformer, the main structure of the framework, in the ablation experiment?
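To make the concern in point (2) concrete, the sketch below shows one possible way to keep unreliable early pseudo labels from dominating a label-aware neighbor search: the weight of the label-disagreement penalty is ramped up linearly over a warm-up period. Everything here is an assumption for illustration, including the additive form of the distance, the linear ramp, and the names `ramp_weight`, `semantic_distance`, and `knn`; it is not the formula used in the paper.

```python
import math


def ramp_weight(epoch, warmup_epochs, max_weight):
    """Linearly increase the semantic weight from 0 to max_weight."""
    if warmup_epochs <= 0:
        return max_weight
    return max_weight * min(1.0, epoch / warmup_epochs)


def semantic_distance(pos_i, pos_j, label_i, label_j, weight):
    """Euclidean distance plus a penalty when the pseudo labels disagree."""
    penalty = weight if label_i != label_j else 0.0
    return math.dist(pos_i, pos_j) + penalty


def knn(query, cells, pseudo_labels, k, weight):
    """Indices of the k cells nearest to cell `query` under the combined distance."""
    others = [i for i in range(len(cells)) if i != query]
    others.sort(key=lambda i: semantic_distance(
        cells[query], cells[i], pseudo_labels[query], pseudo_labels[i], weight))
    return others[:k]


# Toy cell centroids and pseudo labels; illustrative only.
cells = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.3, 0.0, 0.0), (0.4, 0.0, 0.0)]
pseudo_labels = [1, 2, 1, 1]
early = knn(0, cells, pseudo_labels, k=2, weight=ramp_weight(0, 20, 1.0))
late = knn(0, cells, pseudo_labels, k=2, weight=ramp_weight(20, 20, 1.0))
print(early, late)
```

With a zero weight at the start of training the neighborhood is purely geometric, and the pseudo labels only reshape it once they have had time to become more reliable.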
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The article states that the code will be made publicly available after publication, but there is no mention of publicly available datasets. According to the article's description, readers can readily implement the overall framework, and each module has reference articles. The authors provide an introduction to the TSGCNet method for extracting the C-domain and N-domain from the input, as well as how to fuse the information of these two domains, how to fuse features of different scales, and the process of generating pseudo labels. The authors also mention the loss function, GPU configuration, training epochs, and other relevant information in the implementation details. However, some hyperparameters and details of the network are not explained in the article (they may be directly obtainable from the code), such as λ, the way differences between cells are maximized, encoder parameters, etc.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

The authors could describe the method in more detail and also show the main components in the figure: (1) How did the authors use the predicted pseudo labels to increase the difference between cells of different categories? The authors should describe whether the pseudo labels generated in the initial stage of training can be used directly to calculate semantic distance, and explain the relevant principles. The authors could also explain how the semantic prediction part differs between training and testing. (2) Although Figure 1 shows the overall workflow, the authors should indicate as many details as possible, especially for the semantic prediction, graph transformer, and adaptive feature fusion modules mentioned in the article. The authors should provide a detailed graphical representation to facilitate reading and reproduction. If space is limited, the authors could use additional figures as a supplement to describe the main components in detail. In addition, the authors should include arrows in the figure to make it clearer for the reader. (3) In the ablation experiment section, the authors should clearly describe the settings of the control groups. When the semantic prediction, graph transformer, and adaptive feature fusion modules are not used, the authors should explain in detail how the network achieves feature extraction and fusion. (4) The authors should highlight the key point. The introduction and conclusion sections repeatedly mention the full utilization of global and local information, but the methods section does not describe the processing of global information in detail; it appears only as a global graph transformer block. (5) If possible, add more robustness experiments.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

(1) This article proposes a pseudo label prediction module (semantic prediction), which makes it easier for the network to focus on valuable features in a manner similar to self-supervised learning. Although some details are missing, such as how the pseudo labels are used, the concept is novel; it can also prevent the network from overfitting and accelerate convergence. (2) This article effectively combines the spatial features and topological structure of teeth by embedding the N-domain into the C-domain, while aggregating multi-scale features to improve segmentation accuracy. (3) The Transformer-based graph network and feature fusion modules used in this article effectively combine global and local features. Although they lack innovation, the ablation experiments show that introducing these modules improves accuracy and surpasses SOTA methods.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The paper presented a new approach for 3D dental mesh segmentation by utilizing semantic information and a graph transformer-based feature learning network. The proposed method achieves superior results compared to other state-of-the-art approaches. It combines local and global semantic information to improve segmentation performance. The authors used a pseudo-label prediction module to enhance the network’s learning of valuable features and prevent overfitting. Additionally, they introduced a new feature fusion module to obtain global dependencies in each domain and adaptively fused cross-domain features. However, the evaluation could have been more detailed, such as examining the different teeth IOU or identifying failure cases. The method description lacks some details, and the figures could have provided more guidance. Authors are encouraged to add more clarifications in response to the reviewers’ comments in their camera-ready version. Overall, the method seems to handle difficult cases much better and has the potential for further development.

Author Feedback

N/A


3D Dental Mesh Segmentation Using Semantics-Based Feature Learning with Graph-Transformer