
Authors

Chen Yang, Kailing Wang, Yuehao Wang, Xiaokang Yang, Wei Shen

Abstract

Reconstructing deformable tissues from endoscopic stereo videos in robotic surgery is crucial for various clinical applications. However, existing methods relying only on implicit representations are computationally expensive and require dozens of hours, which limits further practical applications. To address this challenge, we introduce LerPlane, a novel method for fast and accurate reconstruction of surgical scenes under a single-viewpoint setting. LerPlane treats surgical procedures as 4D volumes and factorizes them into explicit 2D planes of static and dynamic fields, resulting in a compact memory footprint and significantly accelerated optimization. The efficient factorization is accomplished by fusing features obtained through linear interpolation of each plane and allows us to use lightweight neural networks to model surgical procedures. Besides, LerPlane shares static fields, significantly reducing the workload of dynamic tissue modeling. We also propose a novel sample strategy to boost optimization and improve rendering quality in regions with tool occlusion and large motions. Our experiments on DaVinci robotic surgery videos demonstrate that LerPlane accelerates surgical scene optimization by over 100$\times$ while maintaining high quality across various non-rigid deformations, showing significant promise for future intraoperative surgery applications.
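
The factorization described in the abstract can be illustrated with a minimal sketch. It assumes six feature planes (static: xy, yz, xz; dynamic: xt, yt, zt), coordinates normalized to [0, 1], element-wise multiplication within each group, and concatenation of the two groups. The fusion operator, the plane names, and the helpers `bilerp`, `make_planes`, `query_feature` and `parameter_counts` are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def bilerp(plane, u, v):
    """Bilinearly interpolate an (H, W, C) feature plane at u, v in [0, 1]."""
    h, w, _ = plane.shape
    x, y = u * (h - 1), v * (w - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, h - 1), min(y0 + 1, w - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fy) * plane[x0, y0] + fy * plane[x0, y1]
    bottom = (1 - fy) * plane[x1, y0] + fy * plane[x1, y1]
    return (1 - fx) * top + fx * bottom

def make_planes(n=8, n_t=4, channels=2, seed=0):
    """Create randomly initialised static and dynamic planes (illustrative sizes)."""
    rng = np.random.default_rng(seed)
    shapes = {"xy": (n, n), "yz": (n, n), "xz": (n, n),
              "xt": (n, n_t), "yt": (n, n_t), "zt": (n, n_t)}
    return {k: rng.uniform(0.5, 1.5, size=s + (channels,)) for k, s in shapes.items()}

def query_feature(planes, x, y, z, t):
    """Fuse interpolated plane features for one 4D sample (x, y, z, t)."""
    # Assumed fusion: product within each field, then concatenation.
    static = (bilerp(planes["xy"], x, y)
              * bilerp(planes["yz"], y, z)
              * bilerp(planes["xz"], x, z))
    dynamic = (bilerp(planes["xt"], x, t)
               * bilerp(planes["yt"], y, t)
               * bilerp(planes["zt"], z, t))
    return np.concatenate([static, dynamic])

def parameter_counts(n, n_t, channels):
    """Compare a dense 4D feature grid with the six-plane factorization."""
    dense = n ** 3 * n_t * channels
    planes = 3 * n * n * channels + 3 * n * n_t * channels
    return dense, planes

planes = make_planes()
feature = query_feature(planes, 0.3, 0.6, 0.9, 0.5)
print(feature.shape)
print(parameter_counts(128, 64, 32))
```

Under these assumptions, the fused feature would then be passed to a small network that predicts color and density, and the parameter count grows quadratically with resolution instead of with the product of all four dimensions.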

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43996-4_5

SharedIt: https://rdcu.be/dnwOF

Link to the code repository

https://github.com/Loping151/LerPlane

Link to the dataset(s)

N/A


Reviews

Review #3

  • Please describe the contribution of the paper

    Inspired by ‘Neural Radiance Fields’ (NeRFs), a state-of-the-art 3D scene reconstruction approach, and ‘EndoNeRF’, one of its advancements for reconstructing deformable tissue in dynamic endoscopic surgical scenes, the authors present a novel deformable tissue 3D reconstruction method called ‘LerPlane’, which offers the following novel contributions:
    1) Improved runtime performance compared to state-of-the-art methods
    2) An efficient surgical scene representation
    3) An image sampling concept that improves neural network optimization and 3D rendering quality
    Using conventional robotic surgery stereo videos, LerPlane is compared experimentally to EndoNeRF and another state-of-the-art method, E-DSSR. Results demonstrate that LerPlane achieves rendering quality comparable to these methods while producing results faster and handling surgical tool occlusion and significant motion.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Overall a solid work with sufficient novelty aspects that is well motivated, described and evaluated.
    • The considerable runtime performance optimization aspect of the presented surgical scene reconstruction method seems to be a tremendous advantage compared to other methods when it comes to clinical acceptance of this method in routine surgery.
    • Section 2 (Method) is well-structured with multiple sub-sections
    • The usefulness of the proposed method ‘LerPlane’ is well demonstrated via a comparison against two state-of-the-art methods.
    • The supplementary material is excellent: Multiple surgical videos are provided in which a comparison against the EndoNeRF state-of-the-art method is visually demonstrated. In addition, qualitative results are shown.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The conclusion section is very short (8 lines of text only) and doesn’t include a discussion section in which things to be improved and/or future development are discussed.
    • There is no visual presentation of the quantitative study results in the form of data plots, which would make it easier to compare individual study components. Instead, all results are in Table 1 with acronyms like ‘Ours-NS’, ‘Ours-TS’ etc. that are not directly explained in the table caption. This makes it harder to understand the results.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • Availability of source code and data sets: Since the authors indicated that their source code incl. used datasets will be released if this work is accepted, they should provide specific details such as links to a github repo (or similar) and their data sets before this paper is accepted. In section 3.2, on page 7, the authors wrote “The code will be released later.” This sentence is not clear since it can mean that the code will be released before the paper is published, or the code will maybe be released after the paper is published. The sentence should be changed in order to provide the reader with clearer information.

    • Implementation details: 1.) In section 3.2 on page 7 the concrete Ubuntu version should be mentioned. 2.) On a positive note, concrete values of neural network hyper-parameters are described which increases reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    There are a few minor things that could be improved:

    1.) Section 1, page 3, starting from line 4: The contributions 1), 2) and 3) should be highlighted as their own text enumeration rather than just listed in the normal text. This makes it more readable.

    2.) Section 2.2, page 3, equation 2: Maybe it would increase readability if you explain what “w(t)” is

    3.) Section 2.3, page 4, Figure 2: Maybe label the fused feature vector in the figure with “v”, so that the reader can refer to it when reading the text below.

    4.) Section 2.3, page 4, second to last text line at the bottom of the page: describing a computational cost as “nearly infinite” seems to be a bit imprecise. Can you change this to a concrete O-notation?

    5.) Section 3.3, page 7, first text line of this section: The sentence “we compare…” should start with a capital ‘w’.

    6.) Section 3.2, on page 7 (as already mentioned in my reproducibility comments) the sentence “The code will be released later.” is not clear since it could mean that the code will be released before the paper is accepted, or that the code will maybe be released at some point in the future after the paper has been released. Since you indicated that your source code incl. used datasets will be released if this work is accepted please make sure to provide links to the code and datasets.

    7.) Section 3.4, page 8: The enumeration of the two strategies 1) and 2) should be highlighted via its own LaTeX enumeration (\begin{enumerate})

    8.) Section 3.4, page 8: The sentence “we compare with two …” should start with a capital ‘w’.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, this is an excellent paper with clear novelty aspects and impressive results. However, a few things can be improved as explained in my detailed feedback comments. In addition, the authors should add links to their source code and used datasets to increase reproducibility.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    The paper introduces LerPlane, a new method for rapidly and precisely reconstructing surgical scenes from endoscopic stereo videos in robotic surgery. LerPlane treats surgical procedures as 4D volumes and factorizes them into explicit 2D planes of static and dynamic fields. This yields a small memory footprint and much faster optimization. Experiments on DaVinci robotic surgery videos show that it accelerates optimization by more than 100 times compared to EndoNeRF, indicating considerable potential for future intraoperative surgery applications.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The readability of the paper is pretty good. Clear explanations and illustrations.
    • The proposed method which uses Triplane for accelerating the rendering process is technically sound.
    • The experimental section demonstrates both good results and the contribution of each component of the proposed method through a detailed ablation study.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Some small mistakes. Please check the space in ‘ (Sec. 2.3)’ and keep the format unified in the section 2.1 overview paragraph. Two types, i.e. ‘Oneblob encoding’ and ‘One-blob encoding’ in the section 3.2.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The key point of this work is clear. The technical route of the design is also clear, and it can be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • The approach combines ideas from neural scene representation and novel-view synthesis, coupled with a new spatiotemporal importance sampling strategy.

    • The idea of borrowing the Tri-plane representation from EG3D [4] for dynamic scene reconstruction is interesting and novel. It is an improved version of EndoNeRF. The paper shows improvement over the previous method that relies on RGB-D video sequences.

    • Results presented in Table 1 and Fig 1 demonstrate that the proposed method achieves very competitive performance while training faster. The approach can generate occlusion masks during training.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The motivation of this study is attractive for intraoperative surgery. Besides, the technical contribution is OK for MICCAI.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes LerPlane, a novel method for fast and accurate reconstruction of surgical scenes under a single-viewpoint setting. The reviewers agree that the paper is well written and of good quality. The novelty of the proposed work is sufficient and the performance evaluation study is strong. The authors should address the points raised by the reviewers to further clarify details about the presented work. To increase the reproducibility of the work, links to the source code and used datasets should be provided in the camera-ready paper.




Author Feedback

We appreciate the valuable feedback and acceptance recommendation from the reviewers and AC. The recognition of our novel methodology and of our system's significant performance improvements in rapidly reconstructing deformable tissues, compared to other state-of-the-art methods, is greatly encouraging. We provide detailed responses to the reviewers’ comments below.

R1: “Some small mistakes. Please check the space in ‘ (Sec. 2.3)’ and keep the format unified in the section 2.1 overview paragraph. Two types, i.e. ‘Oneblob encoding’ and ‘One-blob encoding’ in section 3.2.”

We will meticulously proofread the manuscript and correct typographical errors throughout.

R3: “The conclusion section is very short (8 lines of text only) and doesn’t include a discussion section to discuss things to be improved and/or future development.”

We appreciate the suggestion to expand the conclusion section. We will add more content to discuss potential improvements and future developments related to LerPlane, such as accelerating inference speed and reducing the requirements of endoscopy data.

R3: “There is no visual presentation of the quantitative study results in the form of data plots, which would make it easier to compare individual study components. Instead, all results are in Table 1 with acronyms like ‘Ours-NS’, ‘Ours-TS’ etc. that are not directly explained in the table caption. This makes it harder to understand the results.”

We acknowledge the importance of visualizing the quantitative study results to facilitate better comparison and comprehension. We will therefore add relevant figures alongside Table 1 to provide a more comprehensive representation of our ablation results. Additionally, to enhance the clarity of the table, we will include explanations of the acronyms used and refer readers to section 3.4 for further details.

R3: “Availability of source code and datasets … The sentence should be changed to provide the reader with clearer information.”

In line with our commitment in the manuscript, the source code and datasets will be made available in the future. We are diligently working towards making the code public. Once the code is adequately reformatted, we will provide the relevant link.

Minor comments: (i) We will include the specific Ubuntu version (Ubuntu 20.04) in section 3.2. (ii) We will enumerate the key contributions in sections 1 and 3.4 for improved readability. (iii) A brief explanation of “w(t)” will be added before equation 2 to enhance comprehension. (iv) To improve reader understanding, we will label the fused feature vector as “v” in Figure 2. (v) The term “nearly infinite” will be replaced with a specific O-notation to describe the computational cost more accurately. (vi) We will rectify grammar and capitalization errors in section 3.


