
Authors

Yuna Kato, Mariko Isogawa, Shohei Mori, Hideo Saito, Hiroki Kajita, Yoshifumi Takatsume

Abstract

Occlusion-free video generation is challenging due to surgeons’ obstructions in the camera field of view. Prior work has addressed this issue by installing multiple cameras on a surgical light, hoping some cameras will observe the surgical field with less occlusion. However, this special camera setup poses a new imaging challenge since camera configurations can change every time surgeons move the light, and manual image alignment is required. This paper proposes an algorithm to automate this alignment task. The proposed method detects frames where the lighting system moves, realigns them, and selects the camera with the least occlusion. This algorithm results in a stabilized video with less occlusion. Quantitative results show that our method outperforms conventional approaches. A user study involving medical doctors also confirmed the superiority of our method.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43996-4_26

SharedIt: https://rdcu.be/dnwO0

Link to the code repository

https://github.com/isogawalab/SingleViewSurgicalVideo

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose an automated system to tackle current challenges of video recording setups during open surgery, namely the issues of occlusion, instability during capture, and variations in surgical lighting conditions – an important area to address as data-driven algorithms continue to develop. The system described detects video frames where the lighting system moves, realigns the frames, and selects the camera with the least occlusion to continue recording video. Quantitative results demonstrate improvements in video stability and reduced occlusion versus conventional approaches.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors present a novel solution to the automatic generation of stable virtual single-view video with minimal occlusion using a setup with five cameras installed in a surgical light. An algorithm is suggested for detecting the timing of camera movement based on the degree of misalignment between cameras and level of occlusion between frames. A qualitative evaluation was conducted with eleven experienced surgeons viewing the generated auto-aligned video, as well as a quantitative assessment. Surgeons indicated the surgical video caused no discomfort or fatigue and reported ease of viewing key anatomical structures.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The algorithms introduced to detect camera movement and movement timing are not new techniques in computer vision (SIFT feature extraction and feature point matching, and planar homography). Simplistic skin detection approach based on absolute image intensity.
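
For readers unfamiliar with the planar homography mentioned above, the following is a minimal sketch of how a 3x3 homography maps matched feature points between two camera views, and how a simple misalignment score could be computed from it. It is not the authors' implementation: the function names `apply_homography` and `mean_misalignment` are hypothetical, and the mean-distance score is an illustrative assumption rather than the paper's misalignment measure.

```python
import math


def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography given as nested lists."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to return to pixel coordinates.
    return (u / w, v / w)


def mean_misalignment(H, matches):
    """Average distance between H-mapped source points and their matches.

    `matches` is a list of ((x_src, y_src), (x_dst, y_dst)) pairs.
    This score is a hypothetical stand-in, not the paper's definition.
    """
    if not matches:
        raise ValueError("no matches given")
    total = 0.0
    for src, dst in matches:
        px, py = apply_homography(H, src)
        total += math.hypot(px - dst[0], py - dst[1])
    return total / len(matches)


# A pure translation by (5, -2) expressed as a homography.
H_shift = [[1, 0, 5], [0, 1, -2], [0, 0, 1]]
print(apply_homography(H_shift, (1, 1)))
```

A score near zero would indicate that the stored homography still aligns the two views; a growing score would suggest that the light, and hence the cameras, has moved.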

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No software is presented. Authors include supplementary video showing performance of their approach and details sufficient for software implementation.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Does the intensity selection for delineating skin color need to be adapted for each individual patient? All of the scenes presented in the supplementary data appeared to be roughly planar and perpendicular to the camera view pose with minimal camera movement. Please comment on the robustness of this approach to more complex surgical scenes and varied orientations of the overhead surgical lighting during a procedure. By relying solely on skin color based on a value range in the HSV color space, I would expect that blood, smoke, etc. could interrupt the framewise point correspondences of your approach. Please comment on the stability with external noise such as this. Please add information on who performed the manual alignment of video frames which was used as a baseline for comparison.
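
To make the fragility concern concrete, the following is a minimal sketch of colour-space thresholding of the kind discussed here. All threshold values and the names `is_warm` and `warm_fraction` are assumptions chosen for illustration; the paper's actual value range is not reproduced.

```python
import colorsys


def is_warm(rgb, hue_max=0.11, hue_min=0.92, min_sat=0.2, min_val=0.2):
    """Return True if an 8-bit RGB pixel falls in a red/orange hue band.

    The hue band and the saturation/value floors are hypothetical.
    """
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # Red hues wrap around 0, so the band covers both ends of [0, 1].
    warm_hue = h <= hue_max or h >= hue_min
    return warm_hue and s >= min_sat and v >= min_val


def warm_fraction(pixels):
    """Fraction of pixels classified as warm; 0.0 for an empty input."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(is_warm(p) for p in pixels) / len(pixels)


frame = [(200, 60, 50), (40, 90, 200), (128, 128, 128), (210, 120, 90)]
print(warm_fraction(frame))
```

Because the decision rests on fixed numeric bounds, a shift in illumination or a large region of blood or smoke can move many pixels across the threshold at once, which is the failure mode raised in this comment.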

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Interesting paper and relevant to the collecting of surgical data for further analysis/observation. Would benefit from additional details or discussion of limitations of current algorithm and approach. Well written and rigorous evaluation with a population of surgeons and against a benchmark dataset.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    Thank you to the authors for your responses to our suggestions. I believe this paper fits well within the scope of MICCAI submissions (focus on feasibility and initial evaluation of a novel idea to address an overlooked clinical workflow requirement) and includes a robust evaluation of their algorithm on real data. In their rebuttal, the authors mention that source code will be made publicly available to facilitate future collaboration. I look forward to future work from this group in the collection of stable and non-occluded surgical video.



Review #2

  • Please describe the contribution of the paper

    The authors propose a method to select the best view of a surgical scene from a number of cameras. The motivation for this is to enable surgeons to record/review their cases and select the best view of the anatomy.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper identifies an interesting and likely overlooked problem. The results are very clearly analyzed and appear to provide a meaningful improvement over the alternative methods.

    The paper is also generally well written, clearly laid out and very easy to understand.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There are insufficient details on the methods used to estimate the homography transform and to select the best view. Although these come from cited papers, it would be helpful to have further details here so that some understanding can be gained without having to look up those papers.

    I also question the broad applicability of the paper. Although it identifies a clear problem, I wonder how broad the use case is. Some numbers/figures on how routinely these multi-camera setups are used would help motivate how big a problem this is and how likely this solution is to be used in practice.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code+data is released so reproducibility seems fine.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The biggest improvements to the paper would come from additional details. There are some areas in the methods section where additional details would be helpful.

    It would also be useful to have more details on the experimental setup, such as how the lights are positioned, where the cameras are positioned on the lights, whether the gloves, positioning of the surgical team, patient etc can impact the method and results.

    I also believe that a comparison vs head mounted cameras would have been very interesting.

    The use of hue to select the skin color also seems very fragile, and I imagine it would fail regularly if there were changes in the lighting, skin color of the surgical team, etc. The robustness of this method should be evaluated, and I really recommend that a machine learning method be used as a replacement here.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The main reason for my weak reject is that the method addresses a solution to a problem that seems fairly narrow in scope so I question the impact for a conference such as MICCAI.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    Motivated by the need for stable and occlusion-free single-viewpoint surgical field videos during open surgery, and by recent state-of-the-art methods that propose a multi-camera setup mounted on a surgical light, the authors present a multi-camera autocalibration concept. In particular, the authors’ contributions are the following: 1.) A novel automatic generation of stable and occlusion-reduced surgical virtual single-view video from a surgical light equipped with multiple cameras. 2.) Two novel algorithms: one that detects the timing of camera movement, and another that identifies frames with reduced occlusion. 3.) Quantitative and qualitative evaluations of the proposed method that involve comparison against conventional baseline methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Overall, a very good paper with well thought out sections (incl. sub-sections), algorithmic descriptions and detailed description of the conducted experiments and respective results.
    • This paper appears to have a sufficient novelty aspect that justifies its publication. The generation of stable and occlusion-reduced single-view surgical virtual videos seems to be a very useful aspect that could facilitate the further establishment of multi-camera surgical lights in surgical procedures.
    • Good supplementary material (surgical video with baseline methods comparison) and additional experimental results presented in a single pdf.
    • The presented algorithms are easy to understand and therefore more likely to be reproduced by other researchers in the future.
    • Very good ‘Experiments and Results’ section that covers quantitative and qualitative aspects and demonstrates the method’s utility.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Even though it is not required to provide the source code, a link to a github project or similar would have been a great addition to this paper in order to enhance reproducibility.

    • One could maybe argue that the Methods section is rather short with roughly two pages. It seems that not all relevant details are mentioned which could reduce reproducibility. For example, in section 2.1, page 3, the authors describe that the SIFT algorithm is used for feature point detection in each frame, and that feature point matching is performed for each of the 10 combinations of the five frames. Maybe it could make sense to describe some more details, via additional mathematical equations or just by adding these missing details to Fig. 1 (or a separate figure).
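
As a small illustration of the "10 combinations of the five frames" mentioned above: five camera views yield C(5, 2) = 10 unordered pairs, one per pairwise feature matching. The sketch below only enumerates the pairs; the camera indices 0 to 4 are an assumed labelling, and the matching step itself is omitted.

```python
from itertools import combinations

NUM_CAMERAS = 5  # number of cameras on the surgical light, as stated in the paper

# Every unordered pair of camera indices; each pair would be matched once.
camera_pairs = list(combinations(range(NUM_CAMERAS), 2))

print(len(camera_pairs))
for cam_a, cam_b in camera_pairs:
    print(f"match features: camera {cam_a} <-> camera {cam_b}")
```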

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • Availability of source code: The fact that there is no source code provided decreases the reproducibility of the proposed method.

    • Implementation details: Since the mathematical equations presented are fairly easy to understand and follow, reproducibility seems assured, at least to some extent. Nevertheless, reproducing the exact same work as it is presented in this paper might still be challenging since important implementation details regarding a fully functioning pipeline are hidden. For example, in section 2.1, page 3, it is mentioned that the SIFT algorithm is being used for feature point detection, followed by feature point matching ‘for each of the 10 combinations of the five frames’. Explanations like that could lead to more room for interpretation.
      In addition, the used programming language and libraries are not mentioned. On a positive note, the used OS incl. version as well as CPU and RAM details are listed.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    There are a couple of minor things that could be improved:

    1.) Maybe add an image that shows a surgical light with the multi-camera setup.

    2.) Section 1, page 2, last paragraph: Instead of listing the contributions ( (1) - (4)) as part of the standard text, use a Latex enumerate command to list each individual contribution.

    3.) Section 2.1, page 3, the following sentence that starts with “Assuming that the multi-camera surgical light never moves more than twice in 10 minutes, …”: Even though it is mentioned that this assumption is the result of an empirical evaluation, it should be addressed in the discussion section of the paper because it could be a limitation of the presented concept. What if the camera does move more than twice in 10 minutes? Will the method still work, or would it fail? -> Please discuss this.

    4.) Section 2.2, page 4, the equation for S: In order to be consistent with the presentation style of the other equations, it could make sense to have S = {max_i(s_i) - …} as a separate equation, like D_t (equation 1) and threshold (equation 2). The threshold condition could be part of this newly added equation. This would improve readability.

    5.) Section 2.2, page 4, sentence “…, and if it is continuously below a given threshold …” → Please explain what “continuously” means in this context. Does it mean that S is below 0.5 for a specific period of time maybe? This should maybe be more precise.

    6.) Section 3, page 5, paragraph “Virtual Single-View Video Generation”: There is a typo in the second line of this paragraph: “frome” should be changed to “from”.

    7.) Section 3.1, page 6: in the sentence that starts with “In contrast, Our method …” the capital ‘O’ should be replaced by a lowercase ‘o’.

    8.) Section 4, page 8, the sentence that starts with “Our method currently selects the t_h based only on …”: Only using the variable name t_h in the conclusion section without describing what it means decreases readability here because t_h was defined on page 4. Maybe mention again that t_h is the timing when to obtain the homography matrix, or refer to section 2.2 like this: “Only using the variable name t_h (defined in section 2.2) in the …”.

    9.) As already mentioned under the weaknesses of this paper, it could make sense to be a bit more specific in terms of describing some of the methods, like the SIFT-based feature point detection and matching. I’d recommend describing these additional details via mathematical equations or adding them to Fig. 1 (or a new separate figure).

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, it is a very good paper with a strong motivation and interesting contribution. Providing the source code via a github repo (or similar) would have increased reproducibility. In addition, there are a couple of minor things to be addressed (as mentioned in my detailed comments for the authors).

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    After reading the other reviewers’ comments and the authors’ rebuttal, my overall opinion slightly changed from “accept” to “weak accept”, because I realized that there are some more technical questions that should be addressed by the authors. I think the authors did a good job in addressing the reviewers’ comments in their rebuttal, but the final paper should really clarify some of the questions. In addition, the authors state that they will make their code public, which will increase reproducibility. I am not sure if the authors will have time to make their code public before submitting their revised version of the paper, but even a preliminary github repo that is still under development could be a valuable asset when it comes to enhancing reproducibility. However, in this case the authors should clearly indicate that their code is preliminary and still subject to changes.

    The reviewers’ concern regarding the limited technical novelty might be justified, but using existing algorithms in a novel pipeline setup in order to solve a surgical routine problem seems to be valuable, especially when it comes to a potential transition to clinical practice. As a professional software engineer who worked in medical environments with the highest level of FDA regulation (PCR Tests), I can confirm that it is a wise decision to try simpler algorithmic solutions first before addressing more complex machine learning concepts. I think the authors addressed that sufficiently in their rebuttal.

    If the authors address all of the main reviewer questions and fix some of the minor issues that improve readability I recommend acceptance.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper presents a method for automatic generation of stable virtual single-view video with minimal occlusion using a setup with five cameras (integrated into a surgical light) within the context of open surgery. The authors were motivated by enabling surgeons to record/review their cases selecting the best view of the anatomy and performed a study with 11 surgeons.

    The reviewers agree about the novelty of the described method as well as the clarity of the analysis of the results. However, the authors should address the comments of the reviewers particularly in terms of clarifying the methodology and commenting on the robustness of the method (e.g. with respect to skin colour and blood, lighting changes, etc.).




Author Feedback

We are grateful that all the reviewers found our work “addressing an important area,” “novel” (R1), “providing a meaningful improvement” (R2), and “facilitating the establishment of multi-camera surgical lights” (R3).

  1. Popularity of multi-camera surgical lights? (R2) The multi-camera surgical light is a prototype tailored for scientific research and was developed by medical doctors and surgeons with high expectations for open surgery [12]. The device comprises a normal surgical light and multi-view cameras (e.g., Fig. 2 in [12]). It has been used in many actual surgery cases over three years. Therefore, the results in the literature and ours will impact future research and products open to the community. Our approach and results are of broad interest to the MICCAI community.

  2. Skin colors and vulnerability (R1, R2) We found that the term “skin color” was misleading. We used colors to separate the surgical field from the other areas. In the case of open surgery, the surgical field is often warm-colored, while the rest is cool-colored. We first took the simplest approach (i.e., thresholding in color space) to complete our pipeline, since the segmentation task is not our main focus. Although we also anticipate that machine learning-based approaches would perform better than ours, our approach performs well enough for our test cases. We believe that lighting during surgery does not vary as much as in everyday images because of the controlled environment. Our results demonstrate that our simple approach is robust enough under the lighting changes in our real dataset. As the reviewers point out, more challenging cases (e.g., extreme blood, smoke, etc.) would induce failure cases. However, those cases are exceptional in our collected datasets. We will discuss such limitations. In our preliminary tests, we tried out the SOTA machine learning method [12]. However, we found it impractical as the method requires prior annotation per surgery and a large-scale dataset for performance supervision.

  3. Missing method and implementation details (R2, R3) We kept the descriptions to a minimum, especially for the methods borrowed from existing work (e.g., homography estimation and best view selection among five views). As the reviewers suggested, we will elaborate on the descriptions to give a better introduction to these methods.

  4. Flat operative fields and perpendicular cameras (R1) The surgical areas may have appeared flat. In fact, they are not flat, considering the patient’s thickness. The cameras are mounted on the surgical light, which is positioned above the patient and illuminates from above; it is never used to light from the side.

  5. Motion within 10 minutes (R3) The proposed method does not work if the light moves more than twice within 10 min, or if the images become misaligned in the meantime. We empirically set the interval to 10 min and used it for all the videos. Although we rarely saw cases where surgeons moved the light more often, fine-tuning the parameter may result in further performance improvement. The revised paper will discuss this limitation.

  6. Continuously? (R3) As the reviewer assumes, our method detects that the surgical area is occluded if S (the detected surgical area) is below 0.5 for a specific period of time. We will clarify this point in the final manuscript.

  7. Limited technical novelty (R1) Our technical contribution is to propose a new task setting and novel pipeline. As R1 pointed out, we introduce existing algorithms to some specific tasks (e.g., detect camera movement and movement timing). However, solving individual vision problems is not our major contribution.

  8. Who performed the manual alignment? (R1) A technician with a doctoral degree in our University’s Faculty of Medicine performed it.

  9. Misc. We will revise the minor issues (R3). We will also add an image of a surgical light with multi-view cameras to show a better overview (similar to Fig. 2 in [12]) (R2, R3). We will make our code public.
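
As an illustration of point 6 above, the following is a minimal sketch of what "below 0.5 for a specific period of time" could look like as a per-frame check. The 0.5 threshold is taken from the rebuttal; the function name `sustained_below` and the required run length `min_frames` are hypothetical, since the rebuttal does not state the duration used.

```python
def sustained_below(scores, threshold=0.5, min_frames=30):
    """Return start indices of runs where scores stay below the threshold.

    A run is reported once, at the frame where it started, as soon as it
    has lasted `min_frames` consecutive frames. `min_frames` is an assumed
    parameter; the actual duration is not given in the rebuttal.
    """
    run_length = 0
    starts = []
    for index, score in enumerate(scores):
        if score < threshold:
            run_length += 1
            if run_length == min_frames:
                starts.append(index - min_frames + 1)
        else:
            # Any frame at or above the threshold resets the run.
            run_length = 0
    return starts


# Per-frame values of S for a short hypothetical clip.
s_values = [0.9, 0.4, 0.3, 0.8, 0.2, 0.2, 0.2, 0.9]
print(sustained_below(s_values, min_frames=3))
```

Requiring a sustained run rather than a single low frame keeps a momentary dip, such as a hand passing through the view, from being treated as occlusion.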




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I believe that the concerns have been adequately addressed by the authors (the ratings from the reviewers have also increased) and that the paper can be accepted and would be of interest to the MICCAI community.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors responded adequately to the reviewers’ comments. The authors should enhance the camera-ready paper following the reviewers’ suggestions. Two of the reviewers agree that this paper would fit well in the clinical session of MICCAI.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper addresses a highly relevant and under-researched topic. The authors have responded well to the major concerns of the reviewers. The authors should incorporate all the feedback (including post-rebuttal feedback) in the camera-ready version.


