
Authors

Jingwei Song, Qiuchen Zhu, Jianyu Lin, Maani Ghaffari

Abstract

This paper reports a CPU-level real-time stereo matching method for surgical images (10 Hz on 640*480 images with a single core of an i5-9400). The proposed method is built on the fast LK algorithm, which estimates the disparity of the stereo images patch-wise and in a coarse-to-fine manner. We propose a Bayesian framework to evaluate the probability of the optimized patch disparity at different scales. Moreover, we introduce a spatial Gaussian mixed probability distribution to address the pixel-wise probability within the patch. In-vivo and synthetic experiments show that our method can handle ambiguities resulting from textureless surfaces and the photometric inconsistency caused by non-Lambertian reflectance. Our Bayesian method correctly balances the probability of the patch for stereo images at different scales. Experiments indicate that the estimated depth has similar accuracy and fewer outliers than the baseline methods in the surgical scenario with real-time performance. The C++ code is attached.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16449-1_32

SharedIt: https://rdcu.be/cVRW7

Link to the code repository

https://github.com/JingweiSong/BDIS.git

Link to the dataset(s)

https://github.com/JingweiSong/BDIS.git


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper describes an approach to stereo reconstruction of surgical video that builds on Bayesian searching of correspondences as an extension to dense inverse searching. The approach is tested on synthetic and clinical data sets and evaluated with respect to computing requirements and performance. The evaluation is done quantitatively with respect to diffuse and non-Lambertian illumination and qualitatively with respect to the achieved 3D reconstructions.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A very nice paper, nicely developed and presented methods, well evaluated and discussed.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    -

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The results are reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    accept as is

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    8

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The overall impression, presentation and completeness of the evaluation.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper introduces a stereo matching approach for minimally invasive surgical videos. The proposed approach includes a Bayesian Dense Inverse Searching method and a Spatial Gaussian Mixture Model to deal with textureless or non-Lambertian surfaces. The proposed approach runs fast and has been evaluated on both synthetic and in-vivo datasets. Comparisons to the state of the art are also provided, and the results show that the approach achieves performance close to ELAS while running at twice the speed on same-sized images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The technical novelty of this paper is adequate. Incorporating a Bayesian model into Dense Inverse Searching provides a confidence measure for identifying textureless and non-Lambertian surfaces, which is particularly useful for dealing with specular highlights in surgical videos.
    2. The proposed approach performs well on both synthetic and in vivo datasets and has been compared to both classic feature matching and deep learning approaches. It is worth noting that the approach runs in real time at 25 Hz on 360x288 images.
    3. Given the current trend of using deep learning for depth estimation, this reviewer is delighted to see the practical approach proposed in this paper. Although DL-based approaches show outstanding performance on MIS data, they have not yet addressed their generalizability issues. Researchers currently experience poor performance when using trained networks to process unseen data.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Quantitative evaluation was not performed on either in vivo or ex vivo data.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have provided their code of the implementation.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Overall, this paper is well written and its flow is easy for readers to follow. My minor comments are listed below:

    1. The second equation on Page 5 is missing a label. In addition, there might be a typo in the second part of this equation: $\|\cdot\|^2_F$ instead of $\|\cdot\|^2_2$.
    2. In the first paragraph of Section 3, \gamma was set to two different values for two different image sizes. Can the authors provide more details on how this value was chosen?

    3. Some regions in Fig 4 are highlighted with red circles, but the text does not make clear why these regions are highlighted or what particular points the authors would like to discuss. I would recommend that the authors include a brief description of this in the figure’s caption.

    4. Currently, quantitative results are only provided on synthetic data. Although qualitative in vivo results have been provided, it is not clear what the actual accuracy numbers are. Please have a look at the SCARED (“Stereo Correspondence and Reconstruction of Endoscopic Data Challenge”) dataset, for which ground truth data are available.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper presents a nice framework for stereo matching in surgical videos. The presented approach has adequate technical novelty, including a novel Bayesian Dense Inverse Searching method and a spatial Gaussian Mixture Model. The validation is well conducted, and the results show that the approach provides performance competitive with ELAS whilst retaining real-time processing speed.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors tackle the problem of disparity estimation in stereoscopic pairs of surgical scenes. Their intent is to have a computationally efficient approach. The proposed algorithm is a modified version of standard patch-based matching. The key modifications are the introduction of a Bayesian computation to estimate the posterior probability and the association of a confidence with each pixel.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of the paper is a rather complete experimental work (synthetic and real data) and comparison with different approaches.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    In the results, one may question the value of including the DNN approaches. As the authors themselves point out, these methods are sensitive to their training.

    From a quantitative point of view, the numbers demonstrate an improvement which is quite modest. The qualitative images are interesting, but it is difficult to assess if the improvement is significant for a clinical user.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility is standard. The authors offer to provide all useful material.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    A few comments on specific points:

    In Eq. 2, the computed weights for the fusion of the different patches may involve a division by zero when the left and right intensities are exactly equal. This point is not mentioned in the submitted article.
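    The concern above can be illustrated with a minimal sketch. The paper's Eq. 2 is not reproduced on this page, so the weight form used here (the inverse of the absolute left/right intensity difference) is an assumption, and the names `fusion_weight`, `fuse_disparities`, and the floor `eps` are hypothetical. The sketch only shows one common way to keep such a weight finite, which is to clamp the denominator from below; it is not necessarily what the authors do.

```python
def fusion_weight(left_intensity, right_intensity, eps=1.0):
    """Inverse-residual weight with a floored denominator.

    Assumed form (not taken from the paper): w = 1 / max(|I_l - I_r|, eps).
    Without the floor, identical intensities would give a division by zero.
    """
    residual = abs(left_intensity - right_intensity)
    return 1.0 / max(residual, eps)


def fuse_disparities(candidates, eps=1.0):
    """Weighted average of per-patch disparity estimates for one pixel.

    candidates: list of (disparity, left_intensity, right_intensity) tuples,
    one per patch covering the pixel (hypothetical data layout).
    """
    weights = [fusion_weight(l, r, eps) for _, l, r in candidates]
    total = sum(weights)  # strictly positive because every weight is > 0
    return sum(w * d for w, (d, _, _) in zip(weights, candidates)) / total


if __name__ == "__main__":
    # The first patch matches perfectly (residual 0); the second has residual 4.
    patches = [(10.0, 100.0, 100.0), (20.0, 100.0, 104.0)]
    print(fuse_disparities(patches))
```

    With a floor of 1 intensity level, a perfectly matching patch receives the maximum weight of 1 rather than an infinite one, so it dominates the average without making the result undefined.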

    “We compared ELAS, BDIS, DIS, and SGBM on the in-vivo data sets. Since no ground truth is provided, DNN-based methods cannot be implemented.” -> I do not understand this, since it is only a matter of running inference on these data, not training.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed methods are classic, but the authors carried out substantial experimental work.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Introduction: This paper reports a CPU-level real-time stereo matching method for surgical images and builds on Bayesian searching of correspondences as an extension to dense inverse searching.

    Strengths:

      • Integrating the Bayesian model with Dense Inverse Searching improves confidence in identifying textureless and non-Lambertian surfaces. This is particularly useful in MIS for organs with poor texture or for dealing with specular reflections.
      • Qualitative and quantitative results on synthetic and real data have been provided. The algorithm is also capable of running in real time.
      • Code has been provided with the paper.

    Weaknesses:

      • R2 and R3 were not convinced by the quantitative evaluation of the algorithm. Specifically, R3 would like to know the clinical impact of the algorithm and how the surface reconstructions affect the clinical user.
      • (Minor) R2 and R3 have also highlighted minor typos and have sought clarification regarding Figure 4 and a couple of equations.

    Points to be addressed by authors:

      • Overall, this is an excellent paper and has been well received by all three reviewers. I would suggest addressing the questions raised by the reviewers on the qualitative and quantitative evaluations of the algorithm.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1




Author Feedback

The authors would like to thank the reviewers (R) for their constructive feedback and suggestions. All suggestions and concerns are summarized below:

Q1: Quantitative evaluation was not performed on either in vivo or ex vivo data. R2.

Reply: Thanks for raising this concern. Our major claim is that this method is CPU-level real-time and achieves similar accuracy to ELAS. Regarding the accuracy, the experiments on the synthetic data sets (quantitative) and the Hamlyn in-vivo data set (qualitative) show that the accuracy is similar to or slightly better than that of ELAS. The experiments are adequate to validate the claims. We understand R2’s suggestion that quantitative evaluation on real-world data sets would be more convincing. We have added a table to the supplementary material that reports the quantitative difference between ELAS and our method on the Hamlyn data set. The table shows that the average difference is around 1 mm.

In the final submission, we will provide a link to the experimental report on the real-world in-vivo SCARED (suggested by R2) and ex-vivo SERV-CT data sets, which have reference depths. Both video and detailed qualitative results will be presented. The proposed method does perform better than ELAS in these real-world experiments.

Q2: From a quantitative point of view, the numbers demonstrate an improvement which is quite modest. The qualitative images are interesting, but it is difficult to assess if the improvement is significant for a clinical user. R3.

Reply: Thanks for raising this concern. We claimed that “To our knowledge, BDIS is the first single-core CPU based stereo matching approach that achieves SIMILAR performance to the near real-time method ELAS.” We carefully worded our claim to emphasize that, while running at a much faster speed, the proposed method achieves similar/slightly better performance than ELAS in surgical scenarios.

Regarding clinical usage, this method provides the first inexpensive CPU-level stereo matching algorithm that achieves accuracy similar to (slightly better than, as the experiments indicate) the near real-time ELAS. Thus, this method is suitable for GPU-denied scenarios. According to our consultation with engineers, computational resources are insufficient when a high-end GPU is unavailable or when the GPU is reserved for other tasks such as SLAM, detection, segmentation, or disease diagnosis. Computational resource deficiency in the operating room is a common problem for medical devices. Adding more computational power is difficult in real products because of thermal and physical-dimension constraints in hardware design. Moreover, the generalization of DNN methods is still questionable (R2 and R3).

Minor issues:

  1. The reason for comparing with DNN methods. R3. Reply: This paper was previously rejected by MICCAI, with the lack of comparison with DNN methods cited as the major flaw. Thus, comparisons with DNN methods have been added as a reference. Just as R3 suggested, we also emphasized that “The comparison between the prior-based DNN-based method and BDIS is for completeness only”.

  2. Test on the SCARED data set. R2. Reply: Due to the page limit, tests on SCARED and SERV-CT are provided in the extended paper. The final MICCAI publication will give links to the detailed experimental report and demo video.

Finally, we appreciate the reviewers’ and editor’s kind help.


