
Authors

Ruizhou Liu, Qiang Ma, Zhiwei Cheng, Yuanyuan Lyu, Jianji Wang, S. Kevin Zhou

Abstract

Fluoroscopy is an imaging technique that uses X-rays to obtain a real-time 2D video of the interior of a 3D object, helping surgeons to observe pathological structures and tissue functions, especially during intervention. However, it suffers from heavy noise that mainly arises from the clinical use of low-dose X-rays, thereby necessitating the technology of fluoroscopy denoising. Such denoising is challenged by the relative motion between the object being imaged and the X-ray imaging system. We tackle this challenge by proposing a self-supervised, three-stage framework that exploits the domain knowledge of fluoroscopy imaging. (i) Stabilize: we first construct a dynamic panorama based on optical flow calculation to stabilize the non-stationary background induced by the motion of the X-ray detector. (ii) Decompose: we then propose a novel mask-based Robust Principal Component Analysis (RPCA) decomposition method to separate a video with detector motion into a low-rank background and a sparse foreground. Such a decomposition accommodates the reading habit of experts. (iii) Denoise: we finally denoise the background and foreground separately by a self-supervised learning strategy and fuse the denoised parts into the final output via a bilateral, spatiotemporal filter. To assess the effectiveness of our work, we curate a dedicated fluoroscopy dataset of 27 videos (1,568 frames) and corresponding ground truth. Our experiments demonstrate that our framework achieves significant improvements in terms of denoising and enhancement effects when compared with standard approaches. Finally, expert rating confirms this efficacy.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_2

SharedIt: https://rdcu.be/cVRYF

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    The paper proposes a self-supervised method for fluoroscopy denoising. The method first stabilizes the frames to compensate for the non-stationary background induced by the motion of the X-ray detector, then decomposes the video into background and foreground using RPCA (a variant of principal component analysis), and finally denoises the background and foreground separately.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed framework utilizes knowledge of fluoroscopy imaging physics in the design of the method.
    2. It is a self-supervised method, so paired data is not required for training.
    3. The first two stages, i.e., stabilization and decomposition, contribute significantly to improving the denoising performance of existing self-supervised denoising methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The literature review is very poor. Although the paper discusses different methods for self-supervised denoising and RPCA, it describes no previous study on fluoroscopy denoising.
    2. Ambiguity in the clinical study. It is not clear what the authors mean by “In addition, for each group, the five images are permuted randomly”. If these permuted images are used as input to the denoising network, then in some cases an already denoised image may have been used as input to the network. The necessity of this step is not clear, nor is how the authors ensured that the above did not happen by mistake.
    3. No comparison with existing methods for fluoroscopy denoising. The paper only compares the baseline denoising methods without the first two stages against the complete method. It does not compare the method with existing fluoroscopy denoising methods, such as:
       a. Matviychuk, Yevgen, et al. “Learning a multiscale patch-based representation for image denoising in X-ray fluoroscopy.” 2016 IEEE International Conference on Image Processing (ICIP). IEEE, 2016.
       b. Amiot, Carole, et al. “Spatio-temporal multiscale denoising of fluoroscopic sequence.” IEEE transactions on medical imaging 35.6 (2016): 1565-1574.

    Other SOTA self-supervised video denoising methods should also be used as baselines.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper is reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. The literature review needs to be more comprehensive about fluoroscopy denoising.
    2. Compare with other SOTA methods.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is interesting, but in my opinion validation is not enough.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors clarified one doubt regarding the implementation details in the rebuttal. However, the main concern regarding the validation of the proposed method, e.g., comparison with other methods, is not addressed.



Review #3

  • Please describe the contribution of the paper

    The authors propose a pipeline to stabilize and denoise fluoroscopy video in which severe noise and motion exist. The pipeline first finds the global motion between adjacent frames, decomposes the foreground and background using the proposed mask-based RPCA, denoises them, and composes the result. They show better denoising performance than other approaches in numerical measures and expert evaluations.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    They built a framework that performs stabilization, decomposition, and denoising, which is novel. They provided mathematical proofs that support their claims.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Please see the detailed comments below.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Implementation details and experimental settings were well-explained. It could be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Robust alignment [1, 2], which combines RPCA and image registration, has been studied for stabilizing video in which sparse corruptions exist. Robust image alignment aligns a batch of images together while simultaneously performing RPCA, so that the alignment is robust to the sparse corruptions. Using robust alignment, one can obtain aligned images together with the corresponding low-rank background and sparse foreground. It can be viewed as the combination of Stage 1 (Stabilize) and Stage 2 (Decompose) of the proposed paper. The difference between the proposed method and robust alignment should be clarified, as should the reason why the proposed method is a better formulation. Also, robust image alignment handles all image data at once, whereas the Stabilize step of the proposed method finds the transformation parameters for each frame. Robust image alignment can find better registration parameters based on a more accurate low-rank subspace.
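    For readers unfamiliar with the low-rank plus sparse split discussed above, the following is a minimal sketch of plain RPCA via principal component pursuit, solved with a standard ADMM iteration. It is not the authors' mask-based variant and not RASL; the function names (`rpca_pcp`, `shrink`, `svd_threshold`), the default parameters, and the synthetic "video" are all assumptions made for illustration. Each column of the matrix stands for one vectorized frame.

```python
import numpy as np

def shrink(X, tau):
    """Elementwise soft-thresholding (proximal operator of the l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Singular-value thresholding (proximal operator of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def rpca_pcp(M, max_iter=1000, tol=1e-7):
    """Split M into low-rank L and sparse S by minimizing ||L||_* + lam*||S||_1
    subject to L + S = M, using ADMM with a fixed penalty mu."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))           # common default weight (assumption)
    mu = m * n / (4.0 * np.abs(M).sum())     # common default penalty (assumption)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                     # Lagrange multiplier
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        R = M - L - S                        # constraint residual
        Y = Y + mu * R
        if np.linalg.norm(R) <= tol * np.linalg.norm(M):
            break
    return L, S

# Synthetic example: a static background (rank 1) plus sparse moving "foreground".
rng = np.random.default_rng(0)
n_pixels, n_frames = 60, 40
background = np.outer(rng.normal(size=n_pixels), np.ones(n_frames))
foreground = np.zeros((n_pixels, n_frames))
mask = rng.random((n_pixels, n_frames)) < 0.05
foreground[mask] = 5.0 * rng.choice([-1.0, 1.0], size=mask.sum())
video = background + foreground

L_hat, S_hat = rpca_pcp(video)
rel_err = np.linalg.norm(L_hat - background) / np.linalg.norm(background)
print(f"relative background error: {rel_err:.2e}")
```

    This batch formulation needs all frames at once, which is the contrast the reviewer draws with per-frame stabilization; an online method would instead update the low-rank estimate frame by frame.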

    Training a smaller student network using a large teacher network to reduce prediction time is a widely used technique, but I can’t find any information about speed improvement or execution time w/ and w/o student-teacher network.

    Optical flow is estimated from the network implemented by PWC-Net. Was it newly trained on this fluoroscopy video, or was it a pre-trained network?

    In the example videos (in supp.), I observed that the blood vessels move with the heartbeat, whereas the background body does not. Is this okay, given that a representative translation parameter is found using KDE (where the pixels of blood vessels are sparser than the pixels of the background)?

    It is difficult to see the performance improvement in Figure 4. Consider adding arrows for emphasis.

    Consider adding the following recent RPCA paper in the Introduction: Han, Seungjae, et al. “Efficient neural network approximation of robust pca for automated analysis of calcium imaging data.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2021.

    The following are minor comments.

    • Are the other denoising methods (Noise2Void, Noise2Self, …) trained on the XCA dataset in the same way as Self2Self?
    • In the citation of RPCA methods in the Introduction, the order of the last two papers needs to be changed (currently [… 32, 34, 33]).
    • Typo in the Table 1 caption: denoiseor –> denoiser
    • In Table 1, there are Ours+N2V, Ours+N2S, Ours+S2S. Doesn’t the proposed framework include the Denoise step? Consider denoting these as ‘Ours w/ N2V’.
    • It is not clear that a single proficient radiologist, without consensus, is sufficient for qualitative validation of the methods.

    References:
    [1] Peng, Yigang, et al. “RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images.” IEEE transactions on pattern analysis and machine intelligence 34.11 (2012): 2233-2246.
    [2] Zhang, Xiaoqin, et al. “Robust low-rank tensor recovery with rectification and alignment.” IEEE Transactions on Pattern Analysis and Machine Intelligence 43.1 (2019): 238-255.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Good paper from an overall perspective, but some points need clarification, as noted in the comments. Also, the difference from and comparison to the robust image alignment method need to be clarified.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    Thankfully, I think the authors clarified the points I raised. I will maintain my rating as ‘accept’.

    It would be better for the authors to keep the references brief and dedicate the space to responding to other questions.



Review #4

  • Please describe the contribution of the paper

    The paper proposes a three-stage framework for denoising: stabilization using optical flow, decomposition via a proposed masked Robust Principal Component Analysis (RPCA), and denoising with a simple self-supervised network.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Please see below

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Please see below

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Yes

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    This paper proposes a three-stage framework for denoising: stabilization using optical flow, decomposition via a proposed masked Robust Principal Component Analysis (RPCA), and denoising with a simple self-supervised network. The method is evaluated on one private dataset and one clinical dataset (with a radiologist rating the denoised image quality).

    The paper is easy to follow and understand, but the writing and formatting could be improved (such as adding numbers to Fig. 4). The novelty is limited, apart from a contribution to RPCA (based on the Inc-PCP algorithm) to handle non-overlapping areas. The first two stages are data processing stages, and the denoising stage leverages an available self-supervised network; overall, the paper does not make a significant contribution.

    The experiments are not strong; I suggest using more publicly available benchmark datasets and avoiding private datasets. The comparisons with other methods are unfair, since the proposed method uses optical flow to stabilize inputs for data with motion, whereas other methods such as Self2Self were proposed for data with additive white Gaussian noise (AWGN). I suggest discussing more recent works published after 2020 and comparing with them if possible.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Novelty and Experiment

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    4

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    There are non-converging review recommendations. The authors are encouraged to address esp. the issues raised by the reviewers including the novelties & technical contributions (how it differs from related existing methods), literature reviews (e.g. reviewing existing fluoroscopy denoising literature), empirical evaluations (e.g. comparison with existing methods in fluoroscopy denoising), presentation, among others.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6




Author Feedback

Q1: Literature review of the paper is very poor. Many conventional denoising methods have been proposed. As a simple mean filter will blur boundaries, more stochastic methods have been proposed, such as edge-preserving adaptive filters [1], non-local means methods [2][3][4] and multi-scale denoising methods [5][6].
[1] T. Cerciello, M. Romano, P. Bifulco, M. Cesarelli, R. Allen, Advanced template matching method for estimation of intervertebral kinematics of lumbar spine, Medical Engineering and Physics 33 (10) (2011) 1293–1302.
[2] Buades, A., Coll, B., & Morel, J. M. (2011). Non-local means denoising. Image Processing On Line, 1, 208-212.
[3] A. Foi, Clipped noisy images: Heteroskedastic modeling and practical denoising, Signal Processing 89 (12) (2009) 2609–2629.
[4] K. Dabov, A. Foi, and K. Egiazarian, “Video denoising by sparse 3d transform-domain collaborative filtering,” in Proc. 15th Eur. Sig. Process. Conf., 2007, vol. 1, p. 7, 2.
[5] Matviychuk, Yevgen, et al. “Learning a multiscale patch-based representation for image denoising in X-ray fluoroscopy.” 2016 IEEE International Conference on Image Processing (ICIP). IEEE, 2016.
[6] Amiot, Carole, et al. “Spatio-temporal multiscale denoising of fluoroscopic sequence.” IEEE transactions on medical imaging 35.6 (2016): 1565-1574.

Q2: Ambiguity in clinical study. Ans: As we explained in the “Expert rating” part of Section 3.1 “Setup Details”, “a proficient radiologist is invited to rate the denoised image quality for our Clinical dataset”; the Clinical dataset is used for evaluating denoised results by human expert rating. The dataset includes 60 groups of corrupted images, and each group consists of 5 images, one of which is the original noisy image, while the remaining four are denoised by our method, Noise2Self, Noise2Void and Self2Self, respectively. Why are the five images in each group permuted randomly? Without random permutation, the expert may notice that a certain column (such as the last column) is clearer/dirtier when rating the first few groups, which creates a habitual expectation for the expert, thereby biasing the rating of the remaining groups and affecting fairness.

Q3: It should be clarified about the difference between the proposed method and robust alignment.

  1. Indeed, the methods can find better registration parameters based on a more accurate low-rank subspace for a batch of images. However, our method is an online algorithm for processing a video stream.
  2. We expect to align the background of each frame and preserve the motion characteristics of the foreground.
  3. In addition, processing speed is another factor we took into consideration. The method reviewer #2 provided converges after 20 iterations, but we aimed at designing a real-time processing method.

Q4: Student network speed vs. Self2Self network speed: The processing time of our model is 0.046748 s. That of the original Self2Self model is 0.00839 s.

Q5: Is PWC-Net newly trained? PWC-Net is a pre-trained network. It works well on fluoroscopy image data.

Q6: Is it okay to use KDE? As we explain in our paper, “Though it is likely that the video foreground possesses more pixels than background, the motion patterns of foreground pixels are random, while those of background pixels are consistent.” The KDE is used for estimating the distribution of the motion vectors (optical flow) of each pixel. The motion pattern of foreground pixels is random/chaotic, so their motion vectors (optical flow) are scattered and their probability density peak value is quite small (as they have high variance). In contrast, the motion pattern of background pixels is mostly consistent, so their probability density peak value is quite large (as they have small variance). We therefore think it works well. In the future, we may use a neural network to estimate the affine parameters of a homography, which will be more robust.

Q7: Are Noise2Void, Noise2Self, … trained the same as S2S? Yes.
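The KDE argument in Q6 can be illustrated with a small numerical sketch. This is not the authors' implementation: the function name `dominant_translation`, the Gaussian kernel, the bandwidth, and the synthetic flow vectors are all assumptions chosen for illustration. It shows that a consistent background motion still produces the density peak even when scattered foreground vectors are more numerous.

```python
import numpy as np

def dominant_translation(flow, bandwidth=0.5):
    """Return the flow vector with the highest Gaussian kernel density estimate.

    flow: array of shape (N, 2), one (dx, dy) motion vector per sampled pixel.
    The density is evaluated only at the samples themselves (O(N^2) memory),
    which is enough for a small illustration.
    """
    diff = flow[:, None, :] - flow[None, :, :]       # pairwise differences
    sq_dist = (diff ** 2).sum(axis=-1)
    density = np.exp(-sq_dist / (2.0 * bandwidth ** 2)).mean(axis=1)
    return flow[np.argmax(density)], density

rng = np.random.default_rng(0)
true_shift = np.array([2.0, -1.0])                   # hypothetical detector motion

# 300 background vectors: consistent motion with small measurement noise.
background_flow = true_shift + 0.1 * rng.normal(size=(300, 2))
# 700 foreground vectors: more numerous, but scattered over a wide range.
foreground_flow = rng.uniform(-10.0, 10.0, size=(700, 2))

flow = np.vstack([background_flow, foreground_flow])
estimate, density = dominant_translation(flow)
print("estimated translation:", estimate)
```

Because the scattered vectors spread their probability mass over a large area, the mode of the estimated density stays near the consistent background shift, which is the behavior the answer to Q6 relies on.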




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper deals with fluoroscopy denoising using self-supervised learning, with stages for stabilization, decomposition, and denoising. The proposed idea is interesting overall and is empirically supported. Meanwhile, the reviewers have raised a number of concerns, including insufficient literature review and comparison with SOTA methods, comparison with robust image alignment, and ambiguity in the clinical study, among others. The authors need to seriously go through the issues raised by the reviewers and address them properly.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper studies the problem of stabilizing and denoising fluoroscopy video in which severe noise and motion exist. The presented framework first finds the global motion between adjacent frames, decomposes the foreground and background using the proposed mask-based RPCA, denoises them, and composes the result. Experimental results are reported to support the presented learning pipeline.

    While two reviewers gave positive ratings to this paper, I agree with Reviewer #4 that the novelty of the paper is limited. All three components of the paper are known machine learning methods. Thus, a combination of them may not contribute enough novelty. Unfortunately, the authors did not address this point in the rebuttal.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    15



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper presents a pipeline to denoise fluoroscopy images, comprising optical flow to stabilize video sequences, robust PCA to separate foreground and background, and self-supervised learning to denoise the foreground and background. Though each individual component might be familiar to readers, the major contribution comes from the idea of combining them into a pipeline. More comparisons with related works should be included to validate the proposed pipeline.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6


