
Authors

Charlie Budd, Jianrong Qiu, Oscar MacCormac, Martin Huber, Christopher Mower, Mirek Janatka, Théo Trotouin, Jonathan Shapey, Mads S. Bergholt, Tom Vercauteren

Abstract

Hyperspectral imaging (HSI) captures a greater level of spectral detail than traditional optical imaging, making it a potentially valuable intraoperative tool when precise tissue differentiation is essential. Hardware limitations of current optical systems used for handheld realtime video HSI result in a limited focal depth, thereby posing usability issues for integration of the technology into the operating room. This work integrates a focus-tunable liquid lens into a video HSI exoscope, and proposes novel video autofocusing methods based on deep reinforcement learning. A first-of-its-kind robotic focal-time scan was performed to create a realistic and reproducible testing setup. We benchmarked our proposed autofocus algorithm against traditional policies, and found our novel approach to perform significantly (p < 0.05) better than traditional techniques (0.070 ± .098 mean absolute focal error compared to 0.146 ± .148). In addition, we performed a blinded usability trial by having two neurosurgeons compare the system with different autofocus policies, and found our novel approach to be the most favourable, making our system a desirable addition for intraoperative HSI.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43996-4_63

SharedIt: https://rdcu.be/dnwQb

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    This work integrates a focus-tunable liquid lens into a video HSI exoscope, and proposes novel video autofocusing methods based on deep reinforcement learning. A first-of-its-kind robotic focal-time scan was performed to create a realistic and reproducible testing dataset. The authors benchmarked the proposed autofocus algorithm against traditional policies, showing that their approach performs significantly better than traditional techniques. In addition, the paper presents results of a blinded usability trial by having two neurosurgeons compare the system with different autofocus policies

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    a) This paper presents a well-motivated problem and describes in great detail the limitations of current HSI systems in terms of depth of focus, as well as the rationale for the proposed approach.

    b) The authors substantiate the need for an autofocus system and present a data-centric approach based on reinforcement learning to tackle this problem.

    c) The authors introduce two novel datasets for implementing and assessing HSI platforms, which could be a good contribution to this area if they are made public: the Software Simulated Focal-Time Scans and the Robotic Focal-Time Scan. These were carefully constructed to i) enable the training of models capable of generalization and ii) avoid any bias in the capture process of the different defocused images.

    d) The method proposed by the authors is one of the first autofocus approaches for highly dynamic scenes from handheld devices in the literature, as previous works have been proposed mostly for static scenes

    e) Extensive experiments using two reinforcement learning policies using the proposed datasets are used to demonstrate the feasibility of the proposed approach

    f) Furthermore, the authors have also shown results of an integration and usability trial.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    My criticisms of this paper mostly concern a lack of more conceptual explanations of the ideas in the article, which might hamper the impact of the work. Some basic recommendations would be:

    Provide a more conceptual description of the overall proposal (a sort of graphical abstract) in which the main idea is presented (i.e. combining images 1 and 2).

    A separate “solution model” (systems model), presenting the different contributions (i.e., datasets, types of reinforcement learning policies, either using the end-to-end model for learning the focal metric, as well as the main used hyper-parameters) could help to better understand the battery of experiments conducted by the authors, which is rather large.

    The hyper-parameter tuning is not properly discussed.

    The authors' use of Gaussian blur does not necessarily cover all possible defocus scenarios realistically. Maybe the authors can elaborate on the limitations of such an approach and better justify the use of Gaussian blur.

    It would help if the authors focused more on the contributions as the main point, as too many things are going on and the paper is often hard to read. As a result, the paper is rather vague and the main message is lost.

    In the conclusion you mention that the method provided significant improvements over other methods. However, from Figure 4, it is hard as a reader to draw any conclusion about the superiority of the model. Authors are advised to produce more quantitative and qualitative analysis to unequivocally support these claims.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Although the results could be easily reproduced, I think the authors do not intend to make the dataset public, as they are solving a well-defined task that necessitates the hardware and might be too constrained to the specific application detailed therein. Nonetheless, the community could benefit from following similar approaches.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    I believe that this is a good paper. However, I think that some of my remarks on the weaknesses can make it better (along with those of fellow reviewers). For instance, graphically describing the main contributions and the experimental design would make the paper easier to follow.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think the paper is a solid contribution and it could be a good paper for MICCAI. However, the writing and organization of the paper could be improved to simplify certain aspects and enhance the readability of the paper.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors combined an autofocus system with a hyperspectral imaging (HSI) system. The autofocusing technique is based on deep reinforcement learning (RL), which can be applied to the dynamic environment.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Introducing the reinforcement learning method to handheld real-time HSI is very interesting, especially in a dynamic environment. Robotic and simulated experiments also show that the improvement compared with traditional methods is significant. Besides, the writing of this paper is good, and some basic concepts are also described clearly.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The motivation for applying reinforcement learning to HSI autofocus is not well introduced. The authors are trying to tackle decision problems in dynamic scenarios, which may be suitable for RL, but they should at least discuss similar types of technologies, e.g., a naive RNN or classification- or regression-based solutions. For details, refer to Q9.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    If the authors will release the code (“not applicable” in the checklist now), I think the reproducibility of the paper is good.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Is this the first work to apply an RL method to HSI to achieve video autofocusing? If not, what is the main difference?

    I suggest that, perhaps in the journal version, the authors consider more CNN or other deep learning-based methods to compare with RL. This would better illustrate why RL should be used in this task.

    In Table 1, only one metric (MAE) is provided; the authors should consider additional metrics to better compare the different methods.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Good writing, interesting idea and moderate novelty.

  • Reviewer confidence

    Not confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    I think the authors’ rebuttal has resolved all my concerns. I recommend accepting this paper.



Review #4

  • Please describe the contribution of the paper

    The authors have implemented a Deep Q learning RL network to autofocus the intra-operative video captured by their Hyperspectral imaging (HSI) system.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1/ The proposed problem is quite interesting. As mentioned in the manuscript, not many previous works focus on the same problem.

    2/ The manuscript is in general well written.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1/ Uncertainty about impact. The proposed method performs the focusing automatically. I am sure the HSI system can be manually tuned to focus too. I am not sure how challenging it is for a human operator to focus it. The authors did not present any data to illustrate the need for the solution from this perspective.

    2/ Lack of novelty. The authors just implemented a standard DQN to implement the RL framework. This is a very standard practice. They also implemented quite a simple CNN for the image encoding, which is a common practice too. I failed to see the technical novelty from this article.

    3/ Performance evaluation. The authors claimed that they performed a blinded trial with two neurosurgeons. However, the presented comments seem very subjective. I am not sure how systematically those evaluations were conducted with the human experts.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors did not indicate that they will share their code or data if their work is accepted.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    1/ Refer to question 6

    2/ For the reader study, I believe there are systematic ways to conduct a proper reader study that allow one to formally evaluate the acceptance and also the efficacy of the proposed solution.

    3/ I think it is only fair to showcase the impact of the proposed solution by comparing its performance to a manual focus process.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Lack of technical novelty and unclear clinical impact.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes a video autofocus method based on deep reinforcement learning (RL) which is integrated into a handheld Hyperspectral Imaging (HSI) system for visualisation of dynamic scenes in neurosurgery. In addition, HSI datasets have been generated for testing and validating the proposed platform. The reviewers agree that this is an interesting work. According to R1, the writing of the paper could be improved by including details about the methodology and the experimental design. R2 suggests that the motivation of using reinforcement learning in the proposed framework should be highlighted. Also, the performance evaluation study could be strengthened by including more validation metrics (besides the MAE) in the comparison study. R3 raised concerns about the technical novelty and clinical impact of this work.




Author Feedback

We thank the reviewers for their insightful comments and are pleased they found our paper “presents a well-motivated problem” and makes “a solid contribution”. We would particularly like to address comments concerning clinical relevance and technical novelty, being of utmost importance to publication in our field.

R4 questions the clinical relevance, suggesting manual focus could be sufficient but admits lack of confidence in this statement. They go on to say a comparison to manual focus should have been included. Our introduction remarks that our existing system was “…challenging to focus, posing significant usability issues.” We agree that this could be elaborated on. Discussions with surgeons have indicated that manual focusing is difficult during surgery, and systems would often be left at a fixed focal depth. We also present qualitative remarks from the usability trial, stating “…participants were positive about all presented policies.” and “…surgeons were very positive about the integration of AF…”. We acknowledge a quantitative comparison would support these claims; however, comparison of focusing techniques for dynamic video is non-trivial. Our existing comparison relies on a pre-recorded focal-time scan, a key novelty of our submission. It is unclear how manual focus would be achieved in this framework, but we do include results for fixed focus.

R4 questions the technical novelty, stating that DQNs and CNNs are established techniques. R3 asks whether similar RL techniques have been applied to video AF in the past, and what differentiates our method. To answer R3, to the best of our knowledge, no application of RL AF to dynamic video exists in the literature. The closest related work we found applies a similar technique to bench-top microscopy (Xiaofan Yu et al.). They use a CNN to encode a stack of the last 3 images, which is fed into a DQN with the last 3 actions. In this light, we agree with R4 that the technical novelty of the formulation and architecture is modest. However, as pointed out in the contributions section, our method deviates to handle our novel and much more challenging scenario. Spatially varying depth necessitates using a small image patch to allow selective focusing. This, combined with a dynamic camera and subject, obfuscates the defocus signal. As such, a longer stack of 8 frames was needed. This hindered the convergence of the DQN, which was fixed using individual image encoding. Finally, we provide novelty in our simulated, and robotic focal-time scans.
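The observation described above (a stack of the last 8 frames, each encoded individually) can be illustrated with a minimal sketch. This is one possible reading of the rebuttal, not the authors' implementation: the `FrameStack` class, the `toy_encoder` function, and the padding strategy are all hypothetical, and `toy_encoder` is only a stand-in for the CNN encoder mentioned in the reviews.

```python
from collections import deque


def toy_encoder(patch):
    """Hypothetical stand-in for a CNN encoder: returns [mean, contrast] of a patch."""
    pixels = [value for row in patch for value in row]
    return [sum(pixels) / len(pixels), max(pixels) - min(pixels)]


class FrameStack:
    """Keeps the most recent per-frame encodings as the agent's observation."""

    def __init__(self, encoder, length=8):
        self.encoder = encoder
        self.length = length
        # A bounded deque discards the oldest encoding automatically.
        self.buffer = deque(maxlen=length)

    def push(self, patch):
        # Each frame is encoded once, on arrival, independently of the others.
        self.buffer.append(self.encoder(patch))

    def state(self):
        # Assumption: until the stack is full, pad by repeating the oldest encoding.
        if not self.buffer:
            raise ValueError("no frames have been pushed yet")
        padding = [self.buffer[0]] * (self.length - len(self.buffer))
        flat = []
        for encoding in padding + list(self.buffer):
            flat.extend(encoding)
        return flat


stack = FrameStack(toy_encoder, length=8)
for offset in range(3):
    stack.push([[offset, offset + 1], [offset + 2, offset + 3]])
observation = stack.state()  # 8 encodings of 2 values each
```

Encoding each frame separately means a new frame costs one encoder pass, and the flattened `observation` would then be passed to whatever value network the agent uses.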

On the presentation of our results, R2+3 request more metrics/analysis be presented and R2 says that Fig 4 is difficult to draw conclusions from. Fig 4 is intended to show the dynamic nature of the problem and how this leads to errors. We will attempt graphical changes to make it clearer, but for comparison of methods we would refer to Table 1. We agree more analysis could be presented. A nice addition would be to present the % of in focus frames under some threshold. If admissible, this can be added to Table 1 without increasing the length of the paper.
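The two summary statistics discussed above, mean absolute focal error and the percentage of in-focus frames under a threshold, can be sketched as follows. The function names, the example values, and the threshold of 0.25 are illustrative assumptions and do not come from the paper.

```python
from statistics import mean


def mean_absolute_focal_error(predicted, optimal):
    """Mean absolute difference between chosen and optimal focal settings."""
    if len(predicted) != len(optimal) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(p - o) for p, o in zip(predicted, optimal))


def in_focus_fraction(predicted, optimal, threshold):
    """Fraction of frames whose absolute focal error is at most `threshold`."""
    if len(predicted) != len(optimal) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, o in zip(predicted, optimal) if abs(p - o) <= threshold)
    return hits / len(predicted)


# Illustrative values only (not data from the paper).
predicted = [0.0, 0.5, 1.0, 1.5]
optimal = [0.0, 0.25, 1.0, 2.0]
mae = mean_absolute_focal_error(predicted, optimal)           # 0.1875
in_focus = in_focus_fraction(predicted, optimal, threshold=0.25)  # 0.75
```

The two numbers answer different questions: the mean is sensitive to a few large errors, whereas the thresholded fraction reports how often the image is usable.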

R3 suggests that the text should explore alternatives to RL and better motivate its usage. The decision to use RL stems from the need to minimise focal error whilst ensuring there is sufficient information to make decisions. RL is a natural way to incorporate this trade-off by allowing exploratory steps in the focal power, even if harmful in the short term. Such a statement can be included in the text, but a full discussion of alternatives may not be possible.

R2 points out that Gaussian blur is not always realistic to simulate defocus. While our use of real testing data shows that the methods generalise well, we agree methods such as “Generalized Gaussian Blur Kernels” (Yu-Qi Liu et al.) may provide more realistic simulated data. We can include a short limitation statement referencing this work.
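For reference, simulating defocus with a plain Gaussian blur can be sketched as below. This is a generic separable Gaussian filter, not the authors' simulation pipeline and not the generalized kernel of Liu et al.; how a focal error maps to `sigma`, the 3-sigma kernel radius, and the clamped border handling are all assumptions.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Normalised 1D Gaussian kernel; radius defaults to about 3 sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if radius is None:
        radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Convolve a 1D sequence with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * values[j]
        out.append(acc)
    return out


def simulate_defocus(image, sigma):
    """Separable Gaussian blur of a 2D image given as a list of rows."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# A single bright pixel spreads out more as sigma (assumed defocus) grows.
impulse = [[0.0] * 9 for _ in range(9)]
impulse[4][4] = 1.0
mild = simulate_defocus(impulse, sigma=1.0)
strong = simulate_defocus(impulse, sigma=2.0)
```

A larger `sigma` lowers the peak of the blurred impulse, which is the behaviour a sharpness-based focus measure would respond to.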

Once again, we thank all those involved and hope this rebuttal has cleared up some concerns.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors responded adequately to the reviewers’ comments. One of the reviewers increased the rating. Although the technical novelty of the presented method is not significant, this is an interesting work, worth presenting at MICCAI. The authors should enhance the camera ready paper following the reviewers’ suggestions.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have adequately addressed the reviewers’ comments. There is sufficient value in the paper for acceptance to MICCAI.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Overall the paper is interesting if not totally clear regarding the clinical motivation. Nonetheless, the rebuttal attempts to address questions on methodological clarity and additional information, including some results to be added to the manuscript that would shift this in the direction of an accept for me.


