
Authors

Dimitrios Psychogyios, Francisco Vasconcelos, Danail Stoyanov

Abstract

Expanding training and evaluation data is a major step towards building and deploying reliable localization and 3D reconstruction techniques during colonoscopy screenings. However, training and evaluating pose and depth models in colonoscopy is hard because available datasets are limited in size. This paper proposes a method for generating new pose and depth datasets by fitting NeRFs to already available colonoscopy datasets. Given a set of images, their associated depth maps and pose information, we train a novel light source location-conditioned NeRF to encapsulate the 3D and color information of a colon sequence. Then, we leverage the trained networks to render images from previously unobserved camera poses and simulate different camera systems, effectively expanding the source dataset. Our experiments show that our model is able to generate RGB images and depth maps of a colonoscopy sequence from previously unobserved poses with high accuracy.
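As an illustration of what "light source location-conditioned" can mean in practice, the sketch below builds the input of a NeRF colour branch from a geometry feature, the encoded view direction and the encoded light position. It assumes (hypothetically) that the light position is encoded with the standard NeRF sinusoidal encoding and fed only to the colour branch, so that density stays independent of lighting. The function names are illustrative and are not taken from the REIM-NeRF code base.

```python
import math
from typing import List, Sequence


def positional_encoding(p: Sequence[float], num_freqs: int) -> List[float]:
    """Standard NeRF encoding: sin and cos of each coordinate at frequencies 2^k * pi."""
    out: List[float] = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        for coord in p:
            out.append(math.sin(freq * coord))
            out.append(math.cos(freq * coord))
    return out


def color_branch_input(
    geometry_feature: Sequence[float],
    view_dir: Sequence[float],
    light_pos: Sequence[float],
    num_freqs: int = 4,
) -> List[float]:
    """Hypothetical input vector for the colour MLP of a light-conditioned NeRF.

    A vanilla NeRF feeds its colour head with [feature, gamma(view_dir)].
    Here the (assumed) light source position is encoded the same way and appended,
    so predicted colour can change as the light moves while density does not.
    """
    return (
        list(geometry_feature)
        + positional_encoding(view_dir, num_freqs)
        + positional_encoding(light_pos, num_freqs)
    )
```

With an 8-dimensional feature and four frequencies, the vector has 8 + 24 + 24 = 56 entries; only the last 24 depend on where the light is.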

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43996-4_51

SharedIt: https://rdcu.be/dnwPw

Link to the code repository

https://github.com/surgical-vision/REIM-NeRF

Link to the dataset(s)

https://arxiv.org/abs/2206.08903


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposed a variant of NeRF for learning implicit 3D scene representations from colonoscopic image sequences. In particular, the newly introduced NeRF models the variation in tissue illumination caused by the moving endoscopic light source by conditioning the implicit representation on the position of the light source. The implicit 3D representation is used to render more realistic colonoscopic images from new viewpoints or with different camera systems, avoiding the artifacts caused by varying illumination in the training data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper sheds light on the varying tissue illumination caused by the moving endoscopic light source and its effects on the 3D reconstruction of colonoscopic scenes. A variant of NeRF is proposed to achieve realistic 3D scene reconstruction by conditioning the implicit representation on the light source position.
    • The new NeRF is shown to render more photorealistic and artifact-free images than the standard NeRF, which should make it usable for generating realistic colonoscopic video data with novel views or different cameras.
    • The paper is easy to follow, with a clearly explained methodology and a well-organized structure.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The distance or disparity between the camera trajectories used to generate the training and validation data is not mentioned. It would not be surprising for the validation video to look realistic if the difference between the two were small. It would be better to report PSNR and SSIM on the validation data at different levels of trajectory overlap (e.g., 70%, 50% and 30%).
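R1's suggested study amounts to reporting image metrics separately for the held-out frames at each overlap level. A minimal PSNR sketch on flattened images with values in [0, max_val] follows; SSIM is omitted, and grouping the frames by overlap level is assumed to happen elsewhere.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("images must be non-empty and of the same size")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

An error of 0.1 on every pixel of a [0, 1] image gives an MSE of 0.01 and therefore a PSNR of about 20 dB.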
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The definitions of t_i and i are ambiguous in Sec 2.1.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • The definitions of t_i and i are ambiguous in Sec 2.1.
    • Please provide data and an ablation study on the overlap between the trajectories used to generate the training and validation data.
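For reference, in the standard NeRF volume-rendering quadrature (which Sec. 2.1 presumably follows; this is an assumption), i indexes the samples along a camera ray and t_i is the distance of sample i from the camera centre. With delta_i = t_{i+1} - t_i, each sample gets a weight w_i = T_i (1 - exp(-sigma_i delta_i)), where T_i is the transmittance accumulated over earlier samples; the rendered colour and depth are the w_i-weighted sums of the sample colours and of the t_i. A minimal single-ray sketch:

```python
import math
from typing import List, Sequence, Tuple


def composite_ray(
    t: Sequence[float],
    sigma: Sequence[float],
    rgb: Sequence[Tuple[float, float, float]],
    far_delta: float = 1e10,
) -> Tuple[Tuple[float, ...], float, List[float]]:
    """Standard NeRF quadrature along one ray.

    t[i]     : distance of the i-th sample from the camera centre (ascending)
    sigma[i] : predicted density at sample i
    rgb[i]   : predicted colour at sample i
    Returns the rendered colour, the expected depth and the per-sample weights.
    """
    n = len(t)
    weights: List[float] = []
    transmittance = 1.0  # T_i: probability that the ray reaches sample i unoccluded
    for i in range(n):
        delta = t[i + 1] - t[i] if i + 1 < n else far_delta  # interval length
        alpha = 1.0 - math.exp(-sigma[i] * delta)  # opacity of interval i
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    colour = tuple(sum(w * c[k] for w, c in zip(weights, rgb)) for k in range(3))
    depth = sum(w * ti for w, ti in zip(weights, t))
    return colour, depth, weights
```

The same weights that produce the colour also produce the expected depth, which is what makes depth supervision of a NeRF possible without extra network outputs.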
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I am inclined to accept this paper, as it presents a relatively novel method for a practical problem: reconstructing more realistic 3D colonoscopic scenes from video. The paper is well organized and clearly written.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    NeRFs are reconstructed from colonoscopy videos and then used to generate additional training data along novel trajectories. The main contributions are the use of depth supervision to obtain good reconstructions even from few views, and the idea of adding the current position of the endoscopic light source to the NeRF input in order to handle varying illumination properly. The impact of these two contributions is examined in an ablation study, and qualitative results are shown.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Well written paper, easy to understand, with clear contributions
    • Simple, but useful idea to condition the NeRF also on the endoscopic light source
    • Nice ablation study
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The magnitude of the contributions is quite low: depth supervision is a known idea, and conditioning the NeRF on the light source position is rather straightforward.
    • The improvements from considering the light source position are numerically small. The accompanying video shows that plain NeRF can also reproduce the reflections quite well. The depth supervision probably has a larger impact. It would be interesting to analyze this behavior.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper is well written, and all necessary detail is provided to reproduce the results and the evaluation.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • In the introduction you mention that the inputs are sparse depth maps, but according to Eq. 4, supervision happens with dense depth maps. Perhaps the second D_pi in this equation should be C_pi? Where do you get this depth map from, and is it sparse or dense?
    • The light source is co-located with the camera, but the fact that the light position is fixed relative to the camera is not exploited. Perhaps it would even be enough not to condition the light field on the (normalized) view direction, but on the vector to the camera instead, or to include the view distance to obtain proper light attenuation? Might an image coordinate then also be needed?
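R2's suggestion can be made concrete. If the light is treated as an idealised point source at the camera centre (a simplifying assumption), the unit vector from a sample to the light is just the reversed view direction, so the only new cue is the sample-to-light distance, e.g. through an inverse-square falloff. The function below is a hypothetical sketch of those per-sample quantities, not part of the paper's method.

```python
import math
from typing import Sequence, Tuple


def colocated_light_features(
    origin: Sequence[float], direction: Sequence[float], t: float
) -> Tuple[Tuple[float, ...], float, float]:
    """Lighting cues for a sample on a ray, assuming a point light at the camera centre.

    origin    : camera centre, assumed to coincide with the light source
    direction : ray direction (need not be normalised)
    t         : sample parameter along the ray, point = origin + t * direction
    Returns the unit vector from the sample to the light, the distance to the
    light, and the inverse-square falloff factor 1 / r^2.
    """
    point = tuple(o + t * d for o, d in zip(origin, direction))
    to_light = tuple(o - p for o, p in zip(origin, point))
    dist = math.sqrt(sum(v * v for v in to_light))
    if dist == 0.0:
        raise ValueError("sample coincides with the light source")
    to_light_unit = tuple(v / dist for v in to_light)
    # With a co-located light, to_light_unit equals the negated, normalised view
    # direction; the distance is the part that carries attenuation information.
    return to_light_unit, dist, 1.0 / (dist * dist)
```

Doubling the distance of a surface point from the tip reduces the falloff factor by four, which is the kind of depth-dependent shading a view-direction-only colour branch has to learn implicitly.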
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The presented ideas make perfect sense, but are also quite straightforward. The approach is described well and easy to understand. The evaluation is fine, but also reveals that the impact of the knowledge about light position is not enormous, and that even plain NeRF can reconstruct glossy effects well. Nevertheless, I think it makes a reasonable contribution and should be considered for publication.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The purpose of this paper is to generate additional 2D data through NeRF modeling in endoscopic imaging. The method presented in this paper conditions illumination using light source location, and incorporates depth supervision in addition to RGB supervision to address textureless colon walls. The experiments demonstrate that the proposed method produces realistic 2D data when rendering novel views.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The generated 2D images are of high quality both in visual effects and numerical metrics.
    2. The structure is well-organized and coherent.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The paper lacks a description of the source of the depth data used for supervision. Furthermore, depth supervision in NeRF has been proposed in prior work: Deng K., Liu A., Zhu J.-Y., et al. Depth-supervised NeRF: Fewer views and faster training for free. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 12882-12891.

    2. The contribution and novelty are limited. Depth supervision has been proposed in prior work, as stated above, and the modeling of illumination is merely an additional input of the camera source location.

    3. While the generated images are realistic, the practical applications of the generated data remain unclear. To further strengthen the contribution of this work, it would be worthwhile to evaluate the method as a data augmentation technique. This would provide insights into the potential usage of the generated data and also demonstrate the usefulness of the proposed approach for practical applications.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper describes the training details of the methods, but lacks a description of the source of the depth data used for supervision. It states that the data and code will be released.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    I suggest that the authors could strengthen the paper by incorporating the proposed method into a data augmentation technique.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My rating is mainly based on the limited contribution of the paper.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes a deep learning framework for generating realistic colonoscopic scenes by fitting Neural Radiance Field (NeRF) networks to already available datasets. The proposed method conditions illumination on the location of the light source and incorporates depth supervision in addition to RGB supervision to address textureless tissue surfaces. All the reviewers agree that the paper is clear and easy to follow. R2 and R3 found the novelty of the work not significant. R1 suggests that the performance evaluation could be strengthened by validating the method for different levels of overlap between trajectory positions (such as 70%, 50% and 30% overlaps). According to R2, it would be interesting to examine how the incorporation of illumination conditioning and of depth supervision improves the performance of the method. As R3 suggests, the potential usage of the generated data should be explained and the usefulness of the proposed approach for practical applications should be demonstrated.




Author Feedback

We thank all reviewers and the AC for evaluating our work and for allowing us to address their concerns. We were pleased that the reviewers found the paper well-written and intuitive.

a) Technical novelty: Both R2 and R3 raised concerns about the novelty of our paper because depth supervision has been introduced in the past. They noted that our contribution of conditioning NeRF on the approximate light source location is limited. Indeed, our work did not introduce supervising NeRFs using depth, and we point the reader to the paper that introduced the concept in ref. [7]. However, while depth supervision improves the learned geometry, it can also introduce artifacts in the reconstructed RGB endoscopic images (Fig. 3.c). The extension we introduced to NeRF, although simple, improves the overall learned representation in endoscopic scenes (Tab. 1). Our work shows that combining our novel extension to NeRF with depth supervision results in a network able to render precise depth maps and images without prominent artifacts in endoscopic scenes.
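To make the interplay between the two losses concrete, here is a minimal sketch of how depth supervision is commonly combined with the photometric loss. It assumes a squared-error depth term averaged only over pixels that carry ground-truth depth; the paper's Eq. 4 is not reproduced here, and both the function name and the weight `lam` are hypothetical.

```python
from typing import Optional, Sequence


def depth_supervised_loss(
    rgb_pred: Sequence[float],
    rgb_gt: Sequence[float],
    depth_pred: Sequence[float],
    depth_gt: Sequence[Optional[float]],
    lam: float = 0.1,
) -> float:
    """Hypothetical objective: photometric MSE plus lam times depth MSE on supervised pixels.

    depth_gt[i] is None where no ground-truth depth was sampled (sparse supervision).
    lam is an illustrative weight, not a value taken from the paper.
    """
    rgb_term = sum((p - g) ** 2 for p, g in zip(rgb_pred, rgb_gt)) / len(rgb_gt)
    valid = [(p, g) for p, g in zip(depth_pred, depth_gt) if g is not None]
    depth_term = sum((p - g) ** 2 for p, g in valid) / len(valid) if valid else 0.0
    return rgb_term + lam * depth_term
```

A large `lam` pulls geometry towards the depth maps at the expense of the photometric term, which is one plausible reading of why depth supervision alone can trade RGB quality for geometric accuracy.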

b) Further evaluation experiments: We appreciate all the suggestions for more detailed ways to evaluate our method.

  • R3 is concerned about the applications of our research and would have liked to see a study investigating whether data generated by our method can be used for training. We understand R3's viewpoint; however, due to the page limit, we could not include an evaluation of this kind without removing essential parts of our paper. Instead, we showed that when our method is exposed to less than 20% of the RGB images of C3VD and only 3% pixel depth coverage (Sec. 3.1 & Eq. 5), it was able to reconstruct the remaining 80% of the data with full depth coverage at high accuracy (Tab. 1, full model & Fig. 3.e), which shows the applicability of our method to training data generation. Additionally, our method can be used to generate VSLAM evaluation data, facilitating pose estimation assessment on pre-specified trajectories. Data generation of this kind is already supported by our codebase and was used to create the RGB images in Fig. 4.
  • R1 raises a concern about NeRF's image reconstruction when trained with low pose overlap and suggests evaluating reconstruction metrics for varying pose overlaps (30%, 50%, 70%). In our experiments, pose overlap is slightly below 20%, since we sample one in every five frames to split the data into training and evaluation sets (Sec. 3.1). This falls below the lowest pose overlap suggested by R1 and is therefore a stricter test.
  • R2 is interested in a study assessing the contribution of depth supervision and of conditioning NeRF on the light source location. Sec. 3.3 presents both qualitative (Fig. 3) and quantitative (Tab. 1) evaluations of the contribution of each of these two components. Depth supervision improves geometry but introduces RGB artifacts, and our NeRF extension eliminates those artifacts while slightly improving scene geometry. We show that when both techniques are combined, NeRF can learn a good representation in endoscopy.
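Read literally, the sampling described in the rebuttal amounts to the split below: training sees every fifth frame (20% of the sequence) and evaluation uses the remaining 80%. Starting the training subset at frame index 0 is an assumption, and the helper name is hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def every_nth_split(frames: Sequence[T], n: int = 5) -> Tuple[List[T], List[T]]:
    """Keep every n-th frame for training and hold out all other frames for evaluation."""
    train = [f for i, f in enumerate(frames) if i % n == 0]
    held_out = [f for i, f in enumerate(frames) if i % n != 0]
    return train, held_out
```

Varying `n` would be one simple knob for the kind of overlap study R1 asks for, although the fraction of frames seen is only a proxy for true pose overlap.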

c) Clarity: R1 pointed out an ambiguity in the definition of t_i and i in Sec. 2.1, and R2 highlighted a typographical error in Eq. 4. We thank both R1 and R2 for their observations and will refine the relevant text in the next version of the paper. Finally, R3 asked about the data source and details of data preparation. Across all our experiments we used the open-source C3VD dataset (ref. [4]). Information regarding the dataset and pre-processing can be found in Secs. 3.1 and 3.2, while depth sampling from C3VD for depth supervision is defined by Eq. 5.

Overall, we again extend our thanks to the reviewers and AC for their valuable feedback.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors responded to most of the reviewers’ comments. However, the novel technical contribution of this work is not significant and the potential usefulness of the proposed approach for practical applications has not been demonstrated adequately in the paper.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper applies NeRF to colonoscopic video to address variation in tissue illumination. The idea is interesting and the research topic is clearly novel to the field. Two reviewers are positive and one reviewer is negative, but with relatively lower confidence. The meta-reviewer thinks the issues raised by R3 have been sufficiently addressed in the rebuttal and can be easily clarified in the final version.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This submission presents a NeRF adaptation for reconstructing implicit 3D scene representations from colonoscopic images by incorporating illumination model conditioning. 3D reconstruction of colonoscopic and other endoscopic images is an important and open research objective in CAI. R1 and R2 are in favour of acceptance. R3 and R2 note limited novelty (I agree the work is an incremental improvement), and R3 notes the unclear value of illumination conditioning (the main technical contribution) for downstream tasks. As NeRF-based reconstruction becomes more common, exploiting the specific nature of the illumination in MIS images is likely to be beneficial. At the same time, the benefits shown in this study are not very convincing: they are marginal (no statistical significance is given) and probably not enough to yield a meaningful gain in CAI-related downstream tasks, e.g. 3D measurement, inter-modal registration, simulated ML training, or improved simulated surgery. I am therefore inclined to recommend rejection, and recommend that the authors focus on quantifying the technical and, importantly, the clinical value of this technique specifically for CAI systems and applications.
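The meta-review notes that no statistical significance was reported for the gains of the full model over its ablations. One standard way to check this on per-frame metrics (e.g. per-frame PSNR of two model variants on the same held-out frames) is a paired sign-flip permutation test. The exact version below is a sketch under the assumption that such paired per-frame scores are available; it enumerates all 2^n sign assignments, so for long sequences one would sample sign flips at random instead.

```python
from itertools import product
from typing import Sequence


def paired_permutation_pvalue(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact two-sided paired sign-flip permutation test on the mean difference.

    a[i] and b[i] are the same metric for two methods on frame i.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    if n == 0:
        raise ValueError("need at least one paired observation")
    observed = abs(sum(diffs) / n)
    extreme = 0
    for signs in product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)) / n)
        if stat >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
    return extreme / 2 ** n
```

Even a consistent improvement on five frames cannot reach p < 0.05 with this exact test (the smallest attainable p-value is 2/32 = 0.0625), which illustrates why the number of evaluated frames matters when reporting significance.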


