
Authors

Takanori Asanomi, Kazuya Nishimura, Heon Song, Junya Hayashida, Hiroyuki Sekiguchi, Takayuki Yagi, Imari Sato, Ryoma Bise

Abstract

We propose a deep low-rank alignment network that can simultaneously perform non-rigid alignment and noise decomposition for multiple images despite severe noise and sparse corruptions. To address this challenging task, we introduce a low-rank loss in deep learning under the assumption that a set of well-aligned, well-denoised images should be linearly correlated, and thus, that a matrix consisting of the images should be low-rank. This allows us to remove the noise and corruption from input images in a self-supervised learning manner ({\it i.e.}, without requiring supervised data). In addition, we introduce multi-input attention modules into Siamese U-nets in order to aggregate the corruption information from the set of images. To the best of our knowledge, this is the first attempt to introduce a low-rank loss for deep learning-based non-rigid alignment. Experiments using both synthetic data and real medical image data demonstrate the effectiveness of the proposed method. The code will be publicly available.
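The core assumption behind the low-rank loss can be illustrated with a short, hypothetical NumPy sketch (not the authors' code): flatten each image into a column of a matrix; a stack of well-aligned, well-denoised images is nearly rank-1, so its nuclear norm (the sum of singular values, the usual convex surrogate for rank in such losses) is much smaller than that of a misaligned stack.

```python
import numpy as np

rng = np.random.default_rng(0)
base = rng.random((32, 32))  # common underlying structure shared by all images

# Well-aligned set: every column is the same flattened image -> rank 1.
aligned = np.stack([base.ravel()] * 8, axis=1)

# Misaligned set: each column is a shifted copy -> higher effective rank.
misaligned = np.stack([np.roll(base, s).ravel() for s in range(8)], axis=1)

# Nuclear norm = sum of singular values; smaller means "more low-rank".
nuc_aligned = np.linalg.norm(aligned, ord='nuc')
nuc_misaligned = np.linalg.norm(misaligned, ord='nuc')
assert nuc_aligned < nuc_misaligned  # a low-rank loss prefers the aligned stack
```

Minimizing such a nuclear-norm term over the deformation parameters therefore pushes the network toward alignments that make the image stack linearly correlated, which is the self-supervised signal the paper exploits.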

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16446-0_18

SharedIt: https://rdcu.be/cVRSY

Link to the code repository

https://github.com/asanomitakanori/Unsupervised-Deep-Non-Rigid-Alignment-by-Low-Rank-Loss-and-Multi-Input-Attention

Link to the dataset(s)

https://ieee-dataport.org/open-access/recovery-fa19-ultra-widefield-fluorescein-angiography-vessel-detection-dataset


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a non-rigid registration network with a low-rank loss for noisy image registration. The experiments were conducted on synthetic and real images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of this work is that, inspired by robust alignment by sparse and low-rank decomposition (RASL), the authors introduce a low-rank loss into a registration network to handle images with noise or corruptions.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weakness of this work is that the authors do not describe their method well; the whole method section is difficult to follow. In addition, the applicability of the method may be limited.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Not so good.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    – I think the use of the low-rank loss for the registration of noisy images is good.
    – However, the authors do not describe the proposed method well. The whole method section is difficult to follow; I could only work out the detailed design using my own knowledge of RASL and robust PCA.
    – The proposed network may require high computational resources and memory usage. The authors should compare the parameter count, floating-point operations, training time, etc. Moreover, too many hyper-parameters need to be tuned for different applications, which may reduce the generalization ability of the method.
    – The standard deviation of the results should be provided.
    – RASL is a classical low-rank and sparse decomposition method; the authors could compare against more methods focusing on fast, noise-robust decomposition, e.g., Wu, Yi, et al., "Online robust image alignment via iterative convex optimization," CVPR 2012; Zheng, Qingqing, et al., "Online robust image alignment via subspace learning from gradient orientations," ICCV 2017.
    – The registration accuracy does not seem high, at only about 60% Dice. If other methods have reported results on the same dataset, the authors should cite them so the current accuracy level for this application is clear.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please see the described weaknesses and detailed comments.

  • Number of papers in your stack

    3

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    The paper proposes three subnetworks for learning-based non-rigid alignment of photoacoustic hand imaging: a noise decomposition network with a noise loss, a non-rigid deformation network with a low-rank loss, and a sparse error complement network with a multi-input attention module. The authors evaluate their method on both synthetic data and photoacoustic data, and compare it with several other SOTA methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is well organized and written.
    • Considering the actual problems in photoacoustic microscopy imaging, a complete pipeline is designed for multi-scanning PAM image alignment.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The discussion in many places does not seem objective enough and lacks supporting material. For example, the authors claim this is the first low-rank loss for alignment, but some articles on registration already exist: "A low-rank representation for unsupervised registration of medical images," arXiv preprint arXiv:2105.09548 (2021), although it is not framed as a loss. The authors use multi-input attention to complement the sparse corruption; what is the connection between attention and sparsity?
    • The authors also claim that deep-learning-based methods cannot denoise, but by exploiting the gap between high-frequency information (noise) and low-frequency information, neural networks can reduce noise very well: Deep Image Prior, CVPR 2018; "Training deep learning based denoisers without ground truth data," NeurIPS 2018.
    • The experiments do not effectively validate the method; details are listed in the comments.
    • A clearer outline of the next steps in this research would be appropriate.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducibility is insufficient. The paper states that the code will be provided, but there is no explanation of the selection and sensitivity of the loss hyperparameters, or of how the datasets were collected.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • For Equation 1 (the noise loss), how is b chosen or calculated? If the noise can be assumed Gaussian with a constant mean and small variance, why not constrain it directly, e.g., with a KL loss? Also, τ is used here, so is convergence also affected by the transformation? In that case, I think the proposed loss penalizes foreground signals rather subjectively.
    • When there is no noise or strong bias, how do the denoising module and the corresponding loss behave?
    • For the deformation loss, why is the third regularization term introduced to sparsify the displacement field?
    • The sensitivity to the selection of the four hyperparameters in the overall loss should be analyzed.
    • The comparison with other methods does not seem fair. Because the three modules each handle part of the task (denoising, alignment, sparse error inpainting, …), the number of parameters will be larger than in the other methods.
    • The ablation study is not sufficient. There is only an ablation over the three sub-networks, with no way to verify the innovations claimed within each network. The contribution of each component, such as the noise loss or the deformation loss, cannot be demonstrated.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite the flaws in this paper, the overall method and experiments make some sense for the community, so I recommend weak accept.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors’ reply relieved some of my confusion to a certain extent, but the effectiveness of the ablation study could still be improved. I maintain my rating because the merits outweigh the weaknesses: weak accept (5).



Review #3

  • Please describe the contribution of the paper

    They propose a neural network for robust non-rigid image alignment. The method is especially powerful when noise and corruptions exist in the images. It is based on the idea of low-rank and sparse decomposition, assuming that well-aligned images, once corruptions and noise are removed, should form a low-rank matrix, which is enforced with a low-rank loss. It achieves the highest score among several rigid/non-rigid alignment algorithms.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The main philosophy of the paper is novel and makes sense. Compared to previous robust image alignment algorithms, the proposed method has the following advantages: (1) it can handle non-rigid transformations, and (2) since it is a neural network, it can generalize across a dataset. Compared to previous NN-based image alignment methods, it is robust since it decomposes sparse noise and corruptions from the data.
    • Since it can handle non-rigid alignment where severe noise exists, it could potentially be used in practice.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There are some unclear points in the evaluation and comparison. The main concern is whether the comparison was fair.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    They did not submit the code, but according to the authors it will be made public. Details including hyperparameters, network architecture, computation time, etc. are given in the manuscript. It should be reproducible from the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • Figure 2 has some confusing points.
      • Why do the outputs of the noise decomposition network contain black rectangles?
      • While \tau_2 and \tau_i are different, I_1, I_2, and I_M are the same.
    • What do the images (S_*) from the sparse error complement network look like in the real data? Since the noise is removed in the noise decomposition network, the inputs to the sparse error complement network will be noise-free images. Was there any meaningful sparse corruption in the real data?

    • How exactly is the Dice score calculated for RASL on the synthetic dataset? How many images are given to the algorithm? For robust alignment algorithms such as RASL, the number of images is important for recovering the exact low-rank space that drives the alignment. If only 8 images were passed to RASL, that seems a small number compared to the experiments in the RASL paper (see Fig. 5 of the RASL paper).

    • Why was a displacement of 6.4 pixels used in the synthetic experiment? It would be better to compute the metrics for different levels of noise or non-rigid transformation, such as the phase diagrams in the RPCA and robust alignment papers.

    • In Table 1, VoxelMorph, a non-rigid alignment method, does not outperform even RASL, a rigid alignment algorithm. Did you inspect the aligned images from RASL and VoxelMorph? Please give a simple explanation of this situation.

    • In Table 1, what is the take-home message of the time measurement? Is it just a result? Also, is the time for the NN approaches measured only for inference, without training? It is unfair to compare only the inference time of NNs against optimization-based methods such as RASL.

    • The authors’ method and VoxelMorph fix one image and register the other images so they overlap with the fixed image. RASL does not have this property: all images are registered together. In other words, the first image aligned by RASL will differ from the input first image. If the ground-truth mask used to calculate the Dice score was drawn based on the first image, it is natural that RASL achieves a lower Dice score, and it is then not appropriate to compare performance via Dice. One way to make the Dice comparison fair is to take the transformation RASL estimates for the first image and apply its inverse to all images, so that, as in the other methods, the first image acts as the unaligned ‘reference’.
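    The normalization the reviewer suggests can be sketched as follows (a hypothetical NumPy illustration using 3x3 homogeneous matrices; RASL's actual parameterization may differ): compose every estimated transform with the inverse of the first image's transform, so the first image becomes the fixed, untransformed reference.

    ```python
    import numpy as np

    def normalize_to_reference(transforms):
        """Re-express group-alignment transforms relative to the first image.

        transforms: list of 3x3 homogeneous matrices, one per image, as a
        joint method like RASL might estimate. After normalization, the
        first image's transform is the identity, matching pairwise methods
        that keep the first image fixed.
        """
        T0_inv = np.linalg.inv(transforms[0])
        return [T0_inv @ T for T in transforms]

    # Example with two pure translations; after normalization the first
    # transform is the identity and the second is a relative translation.
    T = [np.array([[1., 0., 2.], [0., 1., 3.], [0., 0., 1.]]),
         np.array([[1., 0., 5.], [0., 1., 1.], [0., 0., 1.]])]
    N = normalize_to_reference(T)
    ```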

    Followings are minor comments.

    • “Robust” in the title is missing in the pdf submission, where it exists in the CMT submission.

    • Was Gaussian noise applied when creating the synthetic data? If so, was the data passed through something like a ReLU to enforce non-negativity, since image intensities should be positive?

    • In Table 1, the words “Dice Time” seem to have been mistakenly added in the third line.

    • The explanation of Figure 4 is a bit confusing. The rightmost image is the average without alignment, and it seems the left and middle images are the average projections after alignment with the proposed method and RCN, respectively. If so, please clarify that all three are average intensity projections, of the aligned results and of the raw data.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The formulation makes sense, and it is natural that it achieves the best performance. However, some points in the experiments need to be clarified.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors addressed the overall questions well, but some points I raised remain unresolved. I see no reason to change my rating.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Unsupervised Deep Robust Non-Rigid Alignment by Low-Rank Loss and Multi-Input Attention

    This submission tackles the registration problem by exploiting the linear correlation of well-aligned, denoised images: it learns, in a self-supervised manner, with a loss function that encourages a low-rank matrix while jointly denoising and aligning images. The evaluation is on photoacoustic imaging, and the baselines cover a variety of standard deep learning registration approaches. While the use of a low-rank decomposition in a loss formulation appears original, the validation, as is, remains weak, with potential flaws that require clarification in a rebuttal:

    • clarification on the methodology and motivation of several loss terms (R1,2,3 all found confusion in the methodological description)

    • discussing on a comparison with low-rank decomposition methods (R1) to better grasp the contribution arising from formulating the low-rank loss

    • fairness of the comparison setting (R2,3), notably several experimental settings that raise doubts - including the mixing of steps in benchmarked methods, possibly fine-tuned experimental variables (why 6.4 pixels, why 0.4 std. dev.), VoxelMorph underperforming a rigid alignment method, and an insufficient ablation study.

    • results lack a variability study or significance test (R1,2)

    • I would also add that the Dice score is not enough, since the quality of the deformation fields is not evaluated as in typical registration methods (e.g., Jacobian maps of the deformation fields).

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8




Author Feedback

We would like to thank all the reviewers for their insightful comments; all reviewers recognized the novelty of using a low-rank loss for registration. We address the main concerns as follows.

  1. Methodology

R2: Motivation of multi-input attention. The motivation is to aggregate the information of all inputs to find corruptions. Finding corruptions can be considered finding common (foreground) and sparse uncommon (error) components, since corruption is sparse and varies across the images. Multi-input attention can aggregate such information and find the uncommon errors.

R3: Meaning of sparse errors in real data. As shown in Fig. 1, real data contains sparse errors, and the outputs of the noise decomposition network still contain corruptions. Thus, the sparse error complement network is necessary. As a result, the sparse corruptions were complemented (Fig. 5).

R2: Why introduce the regularization term for sparse displacement in deformation? The aim of the l1 regularization term is to avoid large displacements. In our case, because the foreground regions are sparse, we used the l1 norm.

  2. Discussion of related works using low-rank

R1: (Wu Y, et al.) and (Zheng Q, et al.): These papers propose online alignment. They also require optimization at inference, which costs running time. Unfortunately, we could not find public code for them. R2: (Jia D, et al.): The motivation of that method is to represent features of a single image by a low-rank projection. In contrast, the purpose of our low-rank loss is to align a set of input images, which is significantly different from the related paper. We will add these discussions of the related works.

  3. Comparison setting

R2: Is the ablation study sufficient? We consider our ablation study properly designed. Because our goal is deformation despite severe noise and corruptions, the deformation part is necessary; thus, we did not evaluate the performance of denoising and complementing in isolation. In addition, each loss and subnetwork form a set, and all networks require the low-rank loss, i.e., all of ‘ours w/o X’ use it. Our ablation study shows the effectiveness of each subnetwork.

R1, R2: Network parameters. Some of the comparison methods have a large number of network parameters, e.g., RCN (three times ours), and the number of parameters broadly correlates with running time. Even if we simply increased the layers of VoxelMorph, it might not directly improve performance.

R1, R2: Robustness of hyper-parameters. In our experiments, we tuned the hyper-parameters using training and validation data. Empirically, the hyper-parameters were not very sensitive, except λ_sp, which is related to the size of the corruption. We consider it not so difficult to tune in real applications.

R1, R2: Running time. In real applications, inference time matters more than training time, since training can be done before deploying a system. In addition, other papers (e.g., VoxelMorph) also compare inference time. Therefore, we consider it a fair comparison.

R3: How is Dice calculated for RASL? We displaced the foreground mask images of both the source and the target by the displacement parameters optimized by RASL. Thus, we consider it a fair comparison.

  4. Lack of significance tests (R1, R2)

We performed a non-parametric multiple-comparison test. In the results, our method was significantly better than all other methods, with p < 0.01, on all datasets.

  5. Evaluation metrics (meta)

Dice has often been used as the performance metric in major registration papers, e.g., VoxelMorph, without ground-truth displacement fields. Dice is correlated with the similarity of displacement fields: if the warped source becomes similar to the target by displacing noise regions onto foreground regions, Dice is small; if the foreground regions are properly transformed, Dice is high. For our purpose of aligning the foreground regions, Dice is an appropriate evaluation metric.
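For reference, the Dice score discussed throughout is a simple overlap ratio between binary masks; a minimal sketch (not the authors' evaluation code):

```python
import numpy as np

def dice(a, b):
    # Dice coefficient between two binary masks: 2|A ∩ B| / (|A| + |B|)
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

m = np.zeros((4, 4), dtype=bool)
m[:2] = True                       # top half is foreground
shifted = np.roll(m, 1, axis=0)    # same mask shifted down by one row

print(dice(m, m))        # 1.0: identical masks overlap perfectly
print(dice(m, shifted))  # 0.5: half of the foreground overlaps
```

A misregistration that drags foreground onto noise regions shrinks the intersection and hence the score, which is the behavior the rebuttal appeals to.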




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Unsupervised Deep Robust Non-Rigid Alignment by Low-Rank Loss and Multi-Input Attention

    The rebuttal has clarified the major concerns on the evaluation. A general consensus among the reviews exists on the novelty and robustness of the low-rank approach to non-rigid registration. The remaining comments are encouraged for further investigation in a future extension. For instance, Dice alone is not enough to validate registration, since a perfect Dice can be reached with a highly irregular displacement field; the quality of the deformation therefore needs to be evaluated. Assuming that the proper clarifications from the rebuttal will be added to the manuscript, the recommendation is toward Acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The motivation for the joint denoising and registration was satisfactorily answered in the rebuttal, thus maintaining the novelty of the proposed method. Another major critique, how distinguishable the proposed method is from other low-rank-based registration methods, was further clarified in the rebuttal.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    10



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Limited validation without reporting variance or statistical tests. Lack of clinical relevance both in the metrics and the improvement claimed.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR


