Authors

Huimin Xiong, Kunle Li, Kaiyuan Tan, Yang Feng, Joey Tianyi Zhou, Jin Hao, Haochao Ying, Jian Wu, Zuozhu Liu

Abstract

Optical Intraoral Scanners (IOS) are widely used in digital dentistry to provide detailed 3D information of dental crowns and the gingiva. Accurate 3D tooth segmentation in IOSs is critical for various dental applications, while previous methods are error-prone at complicated boundaries and exhibit unsatisfactory results across patients. In this paper, we propose TSegFormer which captures both local and global dependencies among different teeth and the gingiva in the IOS point clouds with a multi-task 3D transformer architecture. Moreover, we design a geometry-guided loss based on a novel point curvature to refine boundaries in an end-to-end manner, avoiding time-consuming post-processing to reach clinically applicable segmentation. In addition, we create a dataset with 16,000 IOSs, the largest ever IOS dataset to the best of our knowledge. The experimental results demonstrate that our TSegFormer consistently surpasses existing state-of-the-art baselines. The superiority of TSegFormer is corroborated by extensive analysis, visualizations and real-world clinical applicability tests. Our code is available at https://github.com/huiminxiong/TSegFormer.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43987-2_41

SharedIt: https://rdcu.be/dnwJW

Link to the code repository

https://github.com/huiminxiong/TSegFormer

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

The paper presents TSegFormer, a multi-task 3D transformer architecture for tooth segmentation in Optical Intraoral Scanner (IOS) point clouds. The method captures both local and global dependencies among teeth and gingiva, and uses a geometry-guided loss based on point curvature to refine boundaries. A large dataset of 16,000 IOSs is created for evaluation, although it is not publicly available.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Accurate 3D tooth segmentation is critical for various dental applications, and this method provides accurate and reliable results, surpassing existing state-of-the-art baselines.
- The proposed method captures both local and global dependencies among different teeth and the gingiva
- A geometry-guided loss based on point curvature is used to refine boundaries, resulting in an end-to-end segmentation process that avoids the need for time-consuming post-processing.
- The authors created a dataset with 16,000 IOSs demonstrating the comprehensive evaluation and generalizability of the proposed method.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The fluency and correctness of the text could be improved.
- The method may not work well on low-quality or incomplete IOS point clouds, leading to inaccurate segmentation results.
- Poor dataset description, given that the dataset will not be made public.
- No code available.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Non reproducible
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

Please improve the description of the model. Consider publishing at least some samples from the dataset.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

3
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper provides too few details, given that neither the code nor the dataset is available.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

4
[Post rebuttal] Please justify your decision

The release of the code and some samples of the dataset will help. However, this will only be done in the future, so I still cannot clearly evaluate the contribution.

Review #2

Please describe the contribution of the paper

This paper presents a novel network to segment oral structures (teeth and gingiva). The method is appropriate, and its results were compared with previous studies using well-known metrics.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The network architecture is novel
- Large enough database for training, testing and validation (considering that datasets in this kind of clinical field are usually smaller)
- Additional “difficult cases” (another external dataset) to check later clinical translation
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Perhaps the major concern about this study relates to the networks selected for comparison with the proposed one. From my point of view, it would be fairer if similar developments (i.e., other transformer-based networks) were included in the comparison. In any case, the selected networks perform well, scoring over 80 in all metrics. So the results are valid and interesting enough.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Nothing to add
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

It would be great to promote open access to this dataset for the scientific community.

Perhaps other papers with transformer-network solutions should be considered for comparison with the proposed one. Some links that I hope will be useful: https://doi.org/10.1007/978-3-030-87589-3_40 https://doi.org/10.1007/978-3-030-59719-1_78 https://doi.org/10.1109/JBHI.2021.3129245 https://doi.org/10.1109/HORA55278.2022.9800072
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

7
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper is well written and organized. The performance of the proposed network is compared with previous work. Additionally, a specific dataset of clinically ‘difficult’ cases has been tested, with acceptable behavior.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #4

Please describe the contribution of the paper

Main contributions are: (1) designing a novel tooth segmentation framework based on the transformer architecture and a multi-task learning paradigm; (2) designing a novel point curvature loss that helps in refining the boundaries; (3) introducing an auxiliary loss that helps in better differentiating between teeth and gingiva; (4) demonstrating that the proposed method works better than existing tooth segmentation methods; (5) extensive ablation and hyperparameter tuning results; (6) clinicians’ visual scoring of the tooth segmentation results on challenging scans; and (7) validation of the method on a very large amount of data, i.e., 16,000 IOS scans.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

There are challenges in applying the transformer architecture to tooth mesh files directly. Hence, transforming the tooth mesh file into a point cloud so that it can be segmented with a transformer architecture is interesting. The formulation of a novel transformer-based architecture and point curvature loss for this task is promising. The method has been validated on a large dataset and further validated by clinicians on particularly challenging cases. The authors have also added results on how the transformer-based architecture performs when the number of samples is varied from 500 to 12,000.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

(1) Downsampling the point cloud is not equivalent to using a decimated mesh with modified cell centers/barycenters. The downsampled point cloud may no longer preserve the topology of the original mesh files, as the mesh triangles are now disconnected. If the authors could comment on how this is handled, it would strengthen the paper.
(2) Some details are missing, e.g., what is the size of the downsampled point cloud? 10,000, as mentioned in the supplementary material? Having this detail in the paper would be helpful.
(3) Training is performed on a point cloud size of 10k, while test/inference is performed on the original mesh files, which are >= 100k. Training and inference on different scales of the point cloud might be a problem. Generally, other tooth mesh segmentation methods operate on the same scale.
(4) It is not clear how the other baselines have been tested, e.g., what is the size of the mesh files that TSGCNet, MeshSegNet and the other methods were trained and tested on? Were training and inference for these methods done at the same scale or at different scales? For example, TSGCNet was developed on a mesh file size of 16k. Did the authors train TSGCNet on 16k mesh files and then test it on 16k mesh files, or test it on the original 100k mesh files?
(5) Some implementation details are missing: which GPU was used for training, what is the batch size, and how long does one epoch take to train?
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Although the authors have not released any code or pretrained model, they have provided training details such as the loss function, learning rate, hyperparameter tuning, number of epochs, etc. in the supplementary material. The authors showed the dental disease distribution of the challenging cases on which they conducted visual scoring by clinicians. Though the mean and error bars of the performance metrics are not provided in this case, the results might be sufficient given that the dataset is large, i.e., 16,000 IOS scans. The authors have specified details of the convolution layers such as input size and output size, number of nearest neighbors for the kNN, kernel size, and the attention layer details. From these details, it seems the work could be reproducible. Adding some more details, e.g., GPU used and batch size, would be helpful.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

Addressing the comments mentioned in the weaknesses section would strengthen the paper. I think adding some visual examples where TSegFormer did not work will be beneficial.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The authors have created a transformer based novel framework for the problem of tooth mesh segmentation. The method has strengths, but there are some questions regarding the implementations and other details which need to be addressed. Hence the rating.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

7
[Post rebuttal] Please justify your decision

The authors have provided the details which I had requested. The response is satisfactory. This is the first adaptation of the transformer architecture for the tooth intraoral scanner segmentation problem. It is non-trivial in nature because a tooth mesh differs from a general mesh in that one needs to work at high resolution. In general vision, mesh segmentation problems usually deal with meshes that have far fewer faces. But for intraoral scans, at such a small number of faces, the dental mesh would lose the topology of the individual teeth. In fact, it is for this reason that tooth meshes are treated as a kind of pseudo point cloud, where the three vertices of a face are associated with the face center as attributes, and this face center becomes equivalent to a point with attributes. I think this paper should go for oral presentation.
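
The pseudo point cloud representation mentioned in this review can be illustrated with a minimal sketch. It is not taken from the paper or its repository: the function name `faces_to_pseudo_points` and the 12-dimensional feature layout (face center followed by the three vertex coordinates) are assumptions made purely for illustration.

```python
import numpy as np


def faces_to_pseudo_points(vertices, faces):
    """Turn a triangle mesh into one attributed point per face.

    vertices: (V, 3) float array of vertex coordinates.
    faces:    (F, 3) int array of vertex indices per triangle.
    Returns an (F, 12) array: face center (3) + three vertices (9).
    The layout is a hypothetical choice for this sketch.
    """
    tri = vertices[faces]                   # (F, 3, 3): corners of each face
    centers = tri.mean(axis=1)              # (F, 3): barycenter of each face
    attrs = tri.reshape(len(faces), 9)      # (F, 9): flattened corner coordinates
    return np.concatenate([centers, attrs], axis=1)


# Usage: a single right triangle in the z = 0 plane.
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
tris = np.array([[0, 1, 2]])
points = faces_to_pseudo_points(verts, tris)
```

Each row can then be fed to a point-based network as a point with attributes, which avoids operating on the mesh connectivity directly.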

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The paper proposes a 3D transformer-based architecture for tooth segmentation in Optical Intraoral Scanners (IOS) point clouds, which outperforms existing state-of-the-art methods. Another external dataset that includes difficult cases is used for clinical validation. An ablation experiment to assess the impact of training sample size is conducted. The approach considers both local and global dependencies among teeth and gingiva, and employs a geometry-guided loss based on point curvature to refine boundaries. To evaluate the performance, a large dataset of 16,000 IOSs was constructed, although it will not be made publicly available. While the proposed method has several advantages, such as end-to-end training that avoids time-consuming post-processing and operating on a point cloud instead of a mesh file, there are concerns regarding its applicability to low-quality or incomplete IOS point clouds. The method downsamples the point cloud, which may affect the topology of the original mesh files. It is not clear from the paper how the authors have handled this issue. Moreover, the mismatch of point cloud size between training and inference is not well justified. It would be beneficial to discuss other similar transformer-based networks for comparison and justify why these works were not included in the results. The lack of a detailed dataset description and the absence of publicly available code are also weaknesses that could be addressed to improve the reproducibility of the work. Additionally, there are some missing technical details, such as GPU, batch size, and training time. Visual examples where the proposed method did not work could also be provided to strengthen the paper. Overall, the paper presents a promising approach for tooth segmentation, but further improvements and clarifications are needed to address potential limitations and strengthen the findings.

Author Feedback

We thank all reviewers and the AC for their valuable comments! We summarize and address them below.

Reviewer #1
Q1: Text correctness
A1: We will proofread and revise the paper.
Q2: Performance on low-quality/incomplete IOSs
A2: The methods are evaluated on a large-scale dataset with complex IOSs, including low-quality or incomplete ones. In particular: 1) The dataset with 16,000 IOSs exhibits complex anatomical structures, such as varying resolutions (100k-350k mesh faces) and missing (incomplete) teeth as in Supplementary Material (SM) Tab 2, e.g., a missing rate of 34% for third-molars. TSegFormer’s performance surpasses the baselines. 2) The external dataset for the clinical test comprises 200 mixed-quality cases, covering various dental diseases (residual roots, defective teeth, etc.), with distributions in SM Tab 3. TSegFormer’s clinical error rate is much lower than that of the baselines. 3) Fig 3 illustrates the segmentations and boundaries on IOSs with various dental diseases, which shows the superiority of TSegFormer on low-quality/incomplete tooth point clouds.
Q3: Data description
A3: Detailed dataset statistics (missing teeth/diseases) are in the SM. The patients are aged 20.05±8.27 years, with 34.5% male and 65.5% female. The dataset is not publicly available due to privacy concerns. We will release a partial version of it.
Q4: Code
A4: We will make the code public.

Reviewer #2
Q1: Comparison with transformer nets
A1: Thanks for the constructive comment. To the best of our knowledge, there has been no prior work on transformer-based 3D tooth IOS segmentation. Hence, we select recent state-of-the-art general 3D transformers as strong baselines in Tab 1. The suggested references also adapt transformers to digital dentistry, but focus on structured medical images such as X-rays/CBCT slices, which differ from non-Euclidean 3D tooth point clouds/meshes. We will discuss them in the revision.

Reviewer #4
Q1: Preservation of mesh topology
A1: We agree that downsampled point clouds would lose mesh topology. Hence, we extract features such as normal vectors and Gaussian/point curvatures from the meshes to preserve additional geometric information. Moreover, the self-attention in TSegFormer can effectively leverage both local and global geometry, i.e., our method performs better without requiring post-processing. The segmentations are mapped back to the raw meshes with satisfactory clinical utility (Tab 5). In fact, most existing 3D IOS segmentation works operate on point clouds, mainly because directly handling meshes with deep nets is hard and computationally expensive, especially for high-resolution IOSs. Recent works develop general mesh segmentation nets (MeshMAE, MeshFormer, etc.) for simple meshes (1.5k-2k faces/1.5k-5k vertices). Their scalability to high-resolution meshes and effectiveness on medical data should be investigated in future studies. We will discuss them in the revision.
Q2: Downsample details
A2: The downsampled point cloud size is 10,000.
Q3: Different point cloud scales in training/inference
A3: In inference, we process the raw point cloud (PC) into multiple sub-PCs, each with 10,000 points, i.e., consistent with training. If a sub-PC has fewer than 10,000 points, we randomly select points from the raw PC to ensure the condition holds. We feed all the sub-PCs to the trained model to infer each point’s label.
Q4: Baseline settings
A4: We follow the default settings of the baselines for training. In inference, the performance is evaluated on raw point clouds/meshes. This is also done in previous work because it is consistent with real-world clinical scenarios, i.e., each face in the IOS should be annotated for precise diagnosis and manufacturing of aligners.
Q5: Implementation
A5: We use an NVIDIA RTX 3090 with batch size 6. An epoch takes about 441s.
Q6: Failed cases
A6: TSegFormer may fail in some complex cases, e.g., erupted third-molars, tooth extraction sockets, etc. We will visualize unsatisfactory samples from the clinical utility test in the SM.
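
The inference procedure described in A3 for Reviewer #4 (splitting the raw point cloud into sub-clouds of 10,000 points and padding a short sub-cloud with randomly selected raw points) might be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the rebuttal does not say how points are assigned to sub-clouds, so a random partition is assumed here, and `split_into_sub_clouds`, `infer_labels` and `predict_fn` (a stand-in for the trained model) are hypothetical names.

```python
import numpy as np


def split_into_sub_clouds(num_points, sub_size=10_000, seed=0):
    """Return index arrays of length `sub_size` that together cover all points.

    Assumption: points are assigned to sub-clouds by a random partition.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_points)
    chunks = []
    for start in range(0, num_points, sub_size):
        chunk = perm[start:start + sub_size]
        if len(chunk) < sub_size:
            # Pad a short sub-cloud with points drawn at random from the raw cloud.
            extra = rng.choice(num_points, size=sub_size - len(chunk), replace=True)
            chunk = np.concatenate([chunk, extra])
        chunks.append(chunk)
    return chunks


def infer_labels(points, predict_fn, sub_size=10_000, seed=0):
    """Label every raw point by running `predict_fn` on fixed-size sub-clouds.

    predict_fn: hypothetical callable mapping a (sub_size, C) array to
    (sub_size,) integer labels; it stands in for the trained network.
    """
    labels = np.full(len(points), -1, dtype=np.int64)
    for idx in split_into_sub_clouds(len(points), sub_size, seed):
        # Padded duplicates are simply overwritten; every raw index is covered.
        labels[idx] = predict_fn(points[idx])
    return labels


# Usage with a toy "model" that labels points by the sign of x - 0.5.
toy_points = np.random.default_rng(1).random((25_000, 3))
toy_model = lambda p: (p[:, 0] > 0.5).astype(np.int64)
toy_labels = infer_labels(toy_points, toy_model)
```

Because every sub-cloud has the same size as in training, the model sees inputs at a consistent scale, while each raw point still receives a label.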

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors have adequately addressed the reviewers’ comments. R1’s concerns about data and code availability should not undermine the paper’s contribution. I suggest accepting the paper.

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

R1 is still voting for rejecting the paper, but the other two reviewers strongly support the paper. Also, I do not consider the weaknesses raised by R1 very substantial so I vote for accepting the paper.

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The rebuttal does not fully address the concerns raised by reviewers.

TSegFormer: 3D Tooth Segmentation in Intraoral Scans with Geometry Guided Transformer