Authors

Adam Schmidt, Omid Mohareri, Simon DiMaio, Septimiu E. Salcudean

Abstract

Tracking points in robotic assisted surgery will help to enable models in augmented reality and image guidance applications. For these applications, both speed and accuracy are critical. Current dense convolutional neural networks can be costly, especially so when we only desire to track user defined regions. Faster methods use keypoints and their movement as a way to estimate flow in an image. In this paper we introduce a recurrent implicit neural graph (RING) which estimates flow efficiently. RING interpolates the flow at any selected query points with an implicit neural representation (also known as coordinate-based representation) that takes the surrounding points and history of the tracked (query) points as input. RING is able to track an arbitrary number of image points. We demonstrate that RING estimates point motion better than methods that do not use a state. We evaluate RING both photometrically and using ground truth depth data. Finally we demonstrate RING’s real-time effectiveness in timing experiments.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_46

SharedIt: https://rdcu.be/cVRwx

Link to the code repository

N/A

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

The authors present a novel graph based method to track an arbitrary number of key points through a video sequence. It is designed to cope as different obstacles are introduced into the scene, and to track foreground and background objects without requiring explicit segmentation steps.

The method appears to be accurate and fast.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

A method that is adaptable to varying number of points over time.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

None as such. I’m a little concerned that the standard error of the pixel tracking errors seems almost too small to be true. Might be worth checking.

There seems to be very little explanation of what the method is for, or ultimately what the author hopes it will be suitable for.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Good. The code is even already available in a github repo.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

I have to admit to being a little out of my depth here. The paper is generally clearly written, with lots of technical detail.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

First use of graph methods that I’ve seen in this medical field.
Number of papers in your stack

4
What is the ranking of this paper in your review stack?

1
Reviewer confidence

Somewhat Confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper
- The paper presents a self-supervised method for estimating dense pixel flow in endoscopic videos.
- The main contribution compared to prior work is the addition of a temporal component which tracks deformation over time inside a recurrent neural network.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The method is a novel combination of Graph Neural Networks to parse keypoint information, an attention mechanism to refine displacements, a recursive network to carry information to the next images and a sampling strategy to turn the encoded sparse displacements into a dense output.
- The result works well on datasets that were not part of the training data (generalization is shown for datasets collected at different sites).
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

My main concern is that I found the Methods section quite hard to understand. This is in part because the method is complex and requires many different concepts like Graph Networks, Recurrent Networks, positional encoding, Attention… and the MICCAI format is very limited in space. I assume reference [1] explains some of these in more detail, but it was anonymized for the review. However, even so, I think the Methods section could be made much more readable by slightly changing some sentences (and often the order of sentences). At multiple points, it was not clear to me whether a new concept was being introduced or the previous concept was being refined further (some examples in the details below). Similarly, some concepts are explained at one point in the text and the corresponding equation comes multiple sentences later, making it hard to follow. Overall, I think the Methods section should be reworked to ensure a clear flow through the paper, introducing one step after the other.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The major components are explained, but reproducing this work without access to the code would be extremely difficult. The system is so complex that it is difficult to describe in such a short paper. Even so, the authors do make a strong effort to describe as much of the system as possible, and many parameters and setup details are given.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
I don’t quite understand the sentence “Each of these layers can be thought of as a new initialization: no weights are shared between each ϕ or γ.” Does this mean there are multiple γ? Or does ‘each’ only refer to the ϕ? In this case, this may be cleared up by writing “… are shared between ϕ_1, ϕ_2 and γ”? Maybe the word “initialization” could also be replaced, since this already refers to choosing initial weights in the context of neural networks?

“with the difference being that we add in a relative positional embedding to let the network select based on position.” I did not understand this sentence, because Equation (1) uses the absolute pixel positions p_i and p’_i (and the distance between them), but this sentence instead mentions “relative” positions. Can you explain what you mean by “relative positions”? To what are they relative, and how are they used exactly? Or is this referring to the p_i - p_j in Equation (4)? In this case, maybe you can move this sentence to the next paragraph?

I assume the gamma in Eq. 1 and Eq. 5 are not the same? If so, could you use a different letter?

“We set the base offset to be the barycentric estimate, helping similarly to how a skip connection helps learn the residual in CNNs. We perform barycentric interpolation on the Delaunay triangulation of the refined neighbor node displacements” -> I had to read this a few times. Maybe you could start by saying that the base offset is set to an interpolation of the refined displacements in the neighborhood, and then go on to saying how it’s done (via barycentric interpolation)? Otherwise the two sentences sounded to me like you were performing two different steps.

“The information at the query point is broadcast to each of these neighbors, run through two graph convolutions and then pooled.” and later: “We first broadcast information from q to each neighbor.” - This is the same concept twice, maybe only mention once?

Suggestions for Fig 2:
- If I understand correctly, these are two “unrolled” steps of the RING network? Maybe the difference between the two boxes would be slightly clearer if the titles would be changed to: RING (at time t-1) and RING (at time t) or similar.
- In both boxes, h_q^(t-1) is used. I assume this should be t-2 in the first box and t-1 in the second?
For the SCARED dataset, the ground truth is calculated - could you show the errors (as images) of each sample in the supplementary material? That would be interesting to see (are the errors larger on the tools, or distributed equally etc.)

Minor:
- “we calculate new features for and refined displacement estimates for each match.” -> remove first “for”?
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The method is quite complex, making it difficult to explain. However, I feel like the explanation could be made clearer by moving around and reworking some sentences in the methods section. Alternatively, this may be better suited for a (longer) journal paper?
Number of papers in your stack

4
What is the ranking of this paper in your review stack?

4
Reviewer confidence

Somewhat Confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

The paper proposes a deformable tracking method for endoscopic videos using a recurrent implicit neural graph (RING). It extends a previous method by incorporating temporal information using an RNN. Its inference is fast enough to be used for real-time applications.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Detailed descriptions of the method;
- Extensive experiments;
- Fast inference.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

As mentioned in the paper, it extends a previous method. There’s nothing wrong about this but then a lot of space is used for explaining the previous methods.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Thanks to its detailed explanations on the method, it seems reproducible.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

My only complaint is, as mentioned, that if two thirds of the method comes from a previous method, that part could have been briefer. Then there would have been more space for experiments etc.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The method seems to be working well. Also, the fast inference time makes the proposed method quite useful.
Number of papers in your stack

5
What is the ranking of this paper in your review stack?

3
Reviewer confidence

Somewhat Confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The paper is about dense point tracking in endoscopic video using a recurrent graph neural network. The proposed method has novelties and is interesting, but the paper is limited in terms of experiments. As mentioned by R3, more space could be used to provide additional results. In particular, the comparison to previous work could be extended. Nonetheless, the paper has merit and the AC recommends early acceptance.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

NR

Author Feedback

Thanks to the reviewers for the clarity of review and suggestions.

To clarify the purpose of our method: RINGFlow will be used for tracking regions of tissue for telestration, automation of scanning and camera motion, and for real-time registration for image guidance. It could also act as a data association term for any deformable SLAM method while replacing smoothness constraints that do not allow for separate motions.

Regarding the limited space in the paper and our method being based on previous work: we believe our work is a strong step forward. We add a new base barycentric offset and temporal memory to the tracked points, and we provide more analysis and experiments than prior work by characterizing errors with respect to ground truth from the SCARED dataset.
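For readers unfamiliar with the barycentric offset mentioned above, the following is a minimal sketch of barycentric interpolation of displacements inside a single triangle. It is an illustration only, not the authors' implementation: the function names are hypothetical, the triangle is given by hand rather than taken from a Delaunay triangulation, and the learned refinement that RING applies on top of this base estimate is omitted.

```python
def barycentric_weights(p, a, b, c):
    """Barycentric weights of 2D point p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("degenerate triangle")
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb


def interpolate_displacement(p, vertices, displacements):
    """Weighted average of the vertex displacements at query point p."""
    weights = barycentric_weights(p, *vertices)
    dx = sum(w * d[0] for w, d in zip(weights, displacements))
    dy = sum(w * d[1] for w, d in zip(weights, displacements))
    return dx, dy


# Hypothetical keypoints and their displacements between two frames.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
moves = [(3.0, 0.0), (0.0, 3.0), (0.0, 0.0)]
query = (1.0 / 3.0, 1.0 / 3.0)  # centroid of the triangle
print(interpolate_displacement(query, triangle, moves))
```

At a vertex the estimate equals that vertex's displacement, and inside the triangle it varies linearly, which is why it can serve as a cheap base estimate for a network to correct.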

We agree that the Methods section can be hard to understand as it extends some concepts introduced in a non-temporal prior paper. We will adjust the methods section to be more readable with better ordering and incorporate edits suggested by reviewers while still following the MICCAI policy to not change the paper substantially.

In response to R1 on the standard error being small: this is the standard error of the mean over all pixels in a test set. Thus, as we take more samples, the error of the estimate decreases in proportion to 1/sqrt(n). Additionally, to clear up R1's misunderstanding on Q7: our code is not actually in an online GitHub repo; we will consider posting it in the future.
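The 1/sqrt(n) behaviour can be illustrated with a short sketch. The per-pixel errors below are synthetic values drawn from a Gaussian with an assumed mean and spread; they are not results from the paper, and the function name is hypothetical.

```python
import math
import random
import statistics


def standard_error_of_mean(samples):
    """Sample standard deviation divided by sqrt(n)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


random.seed(0)
# Synthetic per-pixel tracking errors (assumed mean 5.0, std 2.0).
errors = [random.gauss(5.0, 2.0) for _ in range(100_000)]

sem_by_n = {n: standard_error_of_mean(errors[:n]) for n in (100, 10_000, 100_000)}
for n, sem in sem_by_n.items():
    print(n, sem)
```

The spread of the individual errors stays roughly constant, while the standard error of their mean shrinks by about a factor of ten for every hundredfold increase in sample count. A very small standard error over a large test set is therefore expected and says little about the per-pixel spread.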

For questions from R2: On reusing the symbols phi and gamma for the positional embedding functions, we see how confusing this can be. Thank you for pointing out the different meaning of ‘initialization’ from what we intend, as it could refer to weight initialization in machine learning. We intended each reuse of gamma and phi to denote a new instance with different weights each time. All functions are separate instances in the same way that having x = Conv(Conv(b)) denotes two convolution operators which are untied and use different weights. We will correct this by adjusting the symbols and wording.
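The distinction between untied and tied instances can be sketched with a toy layer. `Scale` is a hypothetical one-parameter stand-in for `Conv`, used here only to make the weights easy to inspect; it is not part of RING.

```python
import random


class Scale:
    """Toy one-parameter layer; each instance draws its own weight."""

    def __init__(self, rng):
        self.weight = rng.uniform(-1.0, 1.0)

    def __call__(self, x):
        return self.weight * x


rng = random.Random(0)
conv_a = Scale(rng)  # first instance, own weight
conv_b = Scale(rng)  # second instance, different weight

# Untied: analogous to x = Conv(Conv(b)) with two separate operators.
untied = conv_b(conv_a(2.0))
# Tied: the same instance applied twice, so the weight is shared.
tied = conv_a(conv_a(2.0))

print(conv_a.weight, conv_b.weight, untied, tied)
```

Writing the same symbol twice in an equation is ambiguous between these two cases, which is why distinct symbols or indices (for example ϕ_1 and ϕ_2, as R2 suggests) make the untied reading explicit.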

In our statement that our graph attention differs from another in its use of relative position, we referred prematurely to the relative position in Equation 4; we will move this statement next to Equation 4.

R2 is correct about Fig 2; these are two steps of RINGFlow over time. That is, we show two time steps, each of which estimates flow for its respective frame, in order to show how the RNN propagates the hidden state. The second box looks different because, to save space, we compress the graph refinement and flow estimation boxes in order to show the GRU and hidden state. We will clarify this accordingly. Regarding the labeling of the hidden state, we have mislabelled it; it should be h^(t-1) going only into the GRU in the second box. An arrow (h^(t-1) -> embed, graphconv) was incorrect (it should be h^(t) from the GRU -> embed, graphconv), which likely led to the confusion.

Finally, although our supplementary material is currently at the two page limit, we appreciate the suggestion of showing the error visually on the SCARED dataset, and we plan to use this in future work to better understand where errors occur.

Recurrent Implicit Neural Graph for Deformable Tracking in Endoscopic Videos