Authors

Javier Rodriguez-Puigvert, David Recasens, Javier Civera, Ruben Martinez-Cantin

Abstract

Estimating depth information from endoscopic images is a pre-requisite for a wide set of AI-assisted technologies, such as accurate localization and measurement of tumors, or identification of non-inspected areas. As the domain specificity of colonoscopies –deformable low-texture environments with fluids, poor lighting conditions and abrupt sensor motions– pose challenges to multi-view 3D reconstructions, single-view depth learning stands out as a promising line of research. Depth learning can be extended in a Bayesian setting, which enables continual learning, improves decision making and can be used to compute confidence intervals or quantify uncertainty for in-body measurements. In this paper, we explore for the first time Bayesian deep networks for single-view depth estimation in colonoscopies. Our specific contribution is two-fold: 1) an exhaustive analysis of scalable Bayesian networks for depth learning in different datasets, highlighting challenges and conclusions regarding synthetic–to–real domain changes and supervised vs. self-supervised methods; and 2) a novel teacher-student approach to deep depth learning that takes into account the teacher uncertainty.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16437-8_13

SharedIt: https://rdcu.be/cVRsZ

Link to the code repository

N/A

Link to the dataset(s)

http://cmic.cs.ucl.ac.uk/ColonoscopyDepth/

https://doi.org/10.7303/syn26707219

Reviews

Review #1

Please describe the contribution of the paper

The paper introduces Bayesian neural networks to single-view depth estimation in colonoscopy. Furthermore, the paper discusses synthetic-to-real domain challenges and self-supervised methods, and introduces a new student-teacher model that considers the teacher’s uncertainty. These methods and the results described in the paper can be valuable for a wide variety of endoscopic image analysis applications.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper starts off strong, introducing the domain topic and terminology around uncertainty.
- The self-supervised learning and teacher-student model with uncertainty are novel, especially the latter, as the paper links this well to domain shift and label reliability.
- The paper combines both aleatoric and epistemic uncertainty, and the results indicate a clear distinction between the two.
- The results from the paper can have an impact on a wide variety of endoscopic CAD applications, where a more accurate and reliable (monocular) depth estimation could lead to better performance.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Only an internal comparison is performed, against variants of the proposed methods. Consequently, the fact that the teacher-student model outperforms the other methods is less significant/impactful. Furthermore, there is no mention of other works that combine AU and EU quantification.
- While two data sets are used for evaluation, only one of them consists of real data and focusses on a single application. In my opinion, this is a big loss, as I have the feeling that the proposed method could also show interesting results for other applications. Demonstrating this would really improve the impact. Of course there is limited space in a conference paper, but I would love to see some additional data/tasks for evaluation.
- The authors fail to clearly mention the limitations of the proposed approach and often make claims that are a bit too bold.
- The uncertain teacher performs only incrementally better than the teacher-student (Table 2), and in all honesty, to me the increment is negligible, even though it seems to be consistent. This, combined with the fact that the training is only done over one split/initialization, raises the question of whether the results are statistically significant; this is not mentioned in the manuscript.
- The paper lacks implementation details on the model besides information on the loss functions, and the metrics are not explained. More generally, the paper feels fragmented at times. More effort could be put into making the paper more coherent.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The paper lacks reproducibility, but the novelties are well discussed and can be implemented by the reader. The paper could contain an appendix with details on the architecture and training methods. The metrics are not well explained. It is a shame that many of the points on the reproducibility checklist are not provided (the authors are honest and open about it, though). The paper could be a lot better on this front by including these details. I have the strong feeling that the page limit has had a strong negative impact on the complete description of this interesting work.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
Honestly, I really like the work, but its quality is not sufficiently reflected by the manuscript in its current form. In my opinion, only two main issues need to be addressed to remedy this:
1. Clear and complete writing: The authors should add a more complete description of the proposed approach, that would enable reproducibility of the results. Also, some related work regarding aleatoric and epistemic uncertainty could have been included. Furthermore, some more discussions on the limitations of the proposed approach would be of great value to the manuscript.
2. Complete and extensive evaluation: Currently, the evaluation is a bit meager at times, especially since the difference with respect to the state of the art is minor. Including more applications here would really help strengthen the generalizability of the work.
Finally, some minor issues:
- Pre-requisite -> prerequisite
- “… single-view depth estimation in colonscopy”
- An exhaustive analysis seems a bit too much.
- I would refrain from using “perfectly”.
- MC dropout is a practical and scalable approximation of VI, but not the most reliable.
- “Illumination” in section two on Bayesian preliminaries.
- 18 networks for endoscopy is quite expensive; perhaps variational inference could still have been applied by making only a few crucial layers Bayesian.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The novelty is interesting, but the metrics and experiments are not well conducted/explained. The authors are the first to apply uncertainty quantification to this domain and introduce new methods for it. However, the results shown provide only limited support for the claims made. Improving on this front would greatly improve the generalizability of the work. The novelty of the work and the potential impact on other applications still motivate an accept in my humble opinion, albeit a weak accept, given the current limitations.
Number of papers in your stack

5
What is the ranking of this paper in your review stack?

1
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

This paper applied Bayesian deep networks for estimating depth maps with uncertainty for colonoscopic images. The additional teacher-student model taking into account teacher’s uncertainty further improves the depth estimation accuracy. The proposed approaches were validated on both synthetic dataset and real colonoscopic dataset.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The authors claim that they are the first to apply Bayesian networks for depth estimation with uncertainty on colonoscopic images. The paper is well structured and both quantitative and qualitative validation results of the proposed methods are also provided.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The authors may want to revise the literature review to better summarize the related works.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Methods applied in the paper are well referenced.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
1. Please revise the literature review of Single-View Depth Learning to provide a more precise summary of the existing works.
2. On Page 7, “However, we observe that it successfully generalizes to the real domain…”. Does the network generalize well because there is not much appearance difference between the synthetic dataset and the EndoMapper dataset? Do we expect bad generalization ability when applying to a real endoscopic dataset with a different surface appearance? How many colonoscopic procedures are included in the EndoMapper dataset?
3. Please address the typos in the paper.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper presents a novel approach for estimating both depth maps and the uncertainty of colonoscopic images. The proposed teacher-student learning model with uncertainty further boosts the network performance.
Number of papers in your stack

4
What is the ranking of this paper in your review stack?

2
Reviewer confidence

Somewhat Confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

This paper presents a method to estimate both depth and uncertainty in the depth estimates using a teacher-student model architecture and the method is able to generalize to real data even when trained on synthetic datasets.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Authors present clear explanation of the methods with thorough explanation of the intuition behind the methods.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Details in implementation are missing. For instance, how many epochs were models trained for, what hyperparameters were used and how were they chosen, etc. Without these details, results from this method may be difficult to reproduce.

Clarity in the results section could be improved (see detailed comments).
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Reproducibility needs improvement. Model training details like hyperparameters used are not present in the current manuscript.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

Authors should reference Liu et al. ‘Reconstructing Sinus Anatomy from Endoscopic Video– Towards a Radiation-free Approach for Quantitative Longitudinal Assessment’ on pg. 4 where they describe their network’s dual outputs of depth and variance (I believe this reference outputs a mean depth map along with the standard deviation).
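As background for the dual depth-and-variance output mentioned in this comment: such networks are commonly trained with a heteroscedastic negative log-likelihood. The following is a minimal per-pixel sketch assuming a Gaussian likelihood and a predicted log-variance; the manuscript may use a different likelihood, and the function name `gaussian_nll` is hypothetical.

```python
import math


def gaussian_nll(target, mean, log_var):
    """Per-pixel Gaussian negative log-likelihood, up to an additive constant.

    Assumption: the network predicts a depth mean and a log-variance.
    Predicting the log-variance keeps the variance positive after exp().
    """
    variance = math.exp(log_var)
    # The first term penalizes large predicted variance; the second term
    # down-weights the squared residual where the predicted variance is high.
    return 0.5 * (log_var + (target - mean) ** 2 / variance)
```

With this form, a large residual is cheaper when the predicted variance is high, which is what lets the network express aleatoric uncertainty in difficult regions.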

Are the rotation and translation of Eq. 5 the predicted relative camera motion? If so, please clarify in text.

While minor, it should be pointed out that according to the manuscript guidelines (https://conferences.miccai.org/2022/en/PAPER-SUBMISSION-AND-REBUTTAL-GUIDELINES.html), the ‘paper itself must contain all necessary information and illustrations by itself’ and not require readers to refer to references to understand, for instance, the evaluation metrics. I understand space is a constraint, but if authors are able to include in parentheses, for instance, what AUCE stands for where it first appears in text, that would be helpful.
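For context on the metric named in this comment: AUCE is usually expanded as Area Under the Calibration Error curve. The sketch below is one way to compute it, assuming a Gaussian predictive distribution per pixel and a uniform grid of confidence levels; the function name `auce` and its parameters are illustrative and not taken from the manuscript.

```python
from statistics import NormalDist


def auce(targets, means, stds, num_levels=99):
    """Approximate area under the calibration error curve.

    Assumption: each prediction is Gaussian with the given mean and
    standard deviation. For every confidence level p, the empirical
    coverage of the central p-interval is compared with p itself.
    """
    std_normal = NormalDist()
    step = 1.0 / (num_levels + 1)
    levels = [(i + 1) * step for i in range(num_levels)]
    errors = []
    for p in levels:
        # Half-width of the central interval with mass p, in std units.
        z = std_normal.inv_cdf(0.5 + p / 2.0)
        covered = sum(
            abs(t - m) <= z * s for t, m, s in zip(targets, means, stds)
        )
        errors.append(abs(covered / len(targets) - p))
    # Trapezoidal rule over the grid of confidence levels.
    return sum((a + b) / 2.0 * step for a, b in zip(errors, errors[1:]))
```

A perfectly calibrated model would score close to zero; over-confident and under-confident models both score higher.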

To clarify, for evaluations shown in Table 2, the COLMAP reconstruction is used as ground truth?

Since the model trained with supervised GT already performs fairly well on real data, to what is the teacher-student architecture contributing to domain transfer?

The sentence “Note that, in general, teacher-student depth metrics outperform the models trained with GT supervision in the synthetic domain and with self-supervision in the real domain” is confusing and it is unclear what the authors are trying to claim. Are authors claiming that synthetic supervision for teacher-student models generates lower depth errors on synthetic data? Should this be shown in Table 1?

Fig. 4 caption - should b) say teacher-student instead of self-supervised?
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

While the paper needs improvement, it introduces nice concepts and shows interesting results. Due to the missing clarity, I would not recommend direct accept, but it is a good candidate for acceptance after revision.
Number of papers in your stack

5
What is the ranking of this paper in your review stack?

1
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The paper presents a model for single-view depth estimation in colonoscopic videos. The method combines self-supervision, Bayesian networks, and a teacher-student approach that considers the teacher’s uncertainty. The three reviewers have agreed on the interest and novelty of the proposed concept and its potential impact. Validation includes qualitative and quantitative experiments with both synthetic and real data. Despite several improvement suggestions for the experimental section: comparison to other methods, additional train/test split, and different real datasets/applications, the paper’s validation seems sufficient.

Limitations to be addressed in the rebuttal/revised version of the paper are:
- Reproducibility: this is an important policy for MICCAI 2022. Revise to include all implementation details on the model, loss functions, and metrics (R1, R3).
- State of the art (R1, R2, R3): better position the work w.r.t. other aleatoric and epistemic uncertainty and single-view depth estimation methods.
- Discuss the method’s generalization ability (R3) and limitations (R1).
- Discuss the statistical significance of the results.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

2

Author Feedback

We would like to thank you all for your efforts on handling our submission and providing such excellent feedback on our work. Your comments have been very valuable to improve the quality of our manuscript and our work. In addition to modifications to address your feedback, we will make a full and thorough revision of the text for the camera-ready so you might also notice minor modifications (rephrasings, fixed typos, etc.).

Q1: Reproducibility (R1, R3).

Besides the mathematical description of the loss functions, we include in the supplementary material graphical content regarding the loss functions used. We will include a short definition of the metrics used in the camera-ready manuscript for better understanding. We plan to release our implementation along with the training details. The datasets we used are already publicly available; we will also emphasize that in the manuscript.

Q2: State of the art (R1, R2, R3): We will provide an extended review of the state of the art.

Q3: Model generalization (R3): Although the synthetic dataset includes three different sets of textures and lighting conditions, there is a noticeable difference in appearance and illumination between the synthetic and real datasets; even so, the generalization is good. Based on further experimentation not included in this work, we expect good generalization for endoscopic images. Nevertheless, if the depth estimation generalizes poorly, a well-calibrated model should report high epistemic uncertainty in out-of-distribution areas, e.g. those with a different surface appearance.
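To illustrate how epistemic uncertainty can be separated from aleatoric uncertainty in a deep ensemble, here is a minimal per-pixel sketch. It assumes each ensemble member outputs a depth mean and a variance, and uses the standard mixture decomposition; the function name `decompose_uncertainty` is hypothetical and this is not necessarily the exact computation used in the paper.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(member_means, member_variances):
    """Split the predictive variance of an ensemble for one pixel.

    Assumption: every member predicts a depth mean and a variance.
    Aleatoric part: average of the variances predicted by the members.
    Epistemic part: variance of the predicted means across the members.
    """
    aleatoric = fmean(member_variances)
    epistemic = pvariance(member_means)
    return aleatoric, epistemic, aleatoric + epistemic


# Members that agree give zero epistemic uncertainty; members that
# disagree (as expected on out-of-distribution input) give a large one.
in_distribution = decompose_uncertainty([1.0, 1.0, 1.0], [0.2, 0.2, 0.2])
out_of_distribution = decompose_uncertainty([1.0, 3.0], [0.5, 0.5])
```

Under this decomposition, disagreement between ensemble members on unfamiliar textures shows up only in the epistemic term, which matches the behaviour described in the answer above.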

Q4: Are the rotation and translation of Eq. 5 the predicted relative camera motion? (R3) Yes.

Q5: For evaluations shown in Table 2, the COLMAP reconstruction is used as ground truth? (R3) Yes.

Q6: Since the model trained with supervised GT already performs fairly well on real data, to what is the teacher-student architecture contributing to domain transfer? (R3)

Our teacher-student architecture improves the depth prediction and the uncertainty calibration. The improvement can be quantitatively observed in Table 2, comparing the metrics between the “Supervised GT” case and the “Uncertain teacher (ours)” case. The model outputs can also be qualitatively compared in Fig. 4. Note, for example, how the “Supervised GT” depth prediction fails at specular reflections, while our “Uncertain teacher” is able to make better predictions in those areas.

Q7: The sentence “Note that, in general, teacher-student depth metrics outperform the models trained with GT supervision in the synthetic domain and with self-supervision in the real domain” is confusing and it is unclear what the authors are trying to claim. Are authors claiming that synthetic supervision for teacher-student models generates lower depth errors on synthetic data? Should this be shown in Table 1? (R3)

We want to emphasize that, in real data, the teacher-student model presents the best results in depth metrics. It should be in Table 2 because the teacher is supervised with GT data, but the student is supervised by the teacher predictions in real data and evaluated in real data.

On the Uncertain Single-View Depths in Colonoscopies