
Authors

Neel Dey, Jo Schlemper, Seyed Sadegh Mohseni Salehi, Bo Zhou, Guido Gerig, Michal Sofka

Abstract

Establishing voxelwise semantic correspondence across distinct imaging modalities is a foundational yet formidable computer vision task. Current multi-modality registration techniques maximize hand-crafted inter-domain similarity functions, are limited in modeling nonlinear intensity-relationships and deformations, and may require significant re-engineering or underperform on new tasks, datasets, and domain pairs. This work presents ContraReg, an unsupervised contrastive representation learning approach to multi-modality deformable registration. By projecting learned multi-scale local patch features onto a jointly learned inter-domain embedding space, ContraReg obtains representations useful for non-rigid multi-modality alignment. Experimentally, ContraReg achieves accurate and robust results with smooth and invertible deformations across a series of baselines and ablations on a neonatal T1-T2 brain MRI registration task with all methods validated over a wide range of deformation regularization strengths.
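
To make the abstract's core idea concrete, here is a minimal, hypothetical sketch of the kind of patchwise contrastive (InfoNCE-style) objective it describes; the function name, tensor layout, and temperature value are assumptions, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def patch_infonce(z_moved, z_fixed, tau=0.07):
    """z_moved, z_fixed: (N, D) embeddings of N patch locations sampled at
    the same spatial coordinates in the warped and fixed feature maps."""
    z_moved = F.normalize(z_moved, dim=1)      # unit-norm projections
    z_fixed = F.normalize(z_fixed, dim=1)
    logits = z_moved @ z_fixed.t() / tau       # (N, N) cosine similarities
    targets = torch.arange(z_moved.size(0), device=z_moved.device)
    # Diagonal entries are the positive pairs; off-diagonals act as negatives.
    return F.cross_entropy(logits, targets)
```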

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16446-0_7

SharedIt: https://rdcu.be/cVRSN

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a new approach for multi-modal image registration that uses patch-wise features to learn a multi-scale multi-modality embedding space for the loss function.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The use of an embedded feature-based contrast comparison is a novel idea, and the presented results suggest strong performance for the proposed approach over existing methods. In addition, the paper was well written, and does a fantastic job covering relevant work.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Discussion/analysis regarding some key parts of the algorithm seems to be missing. Namely, the impact of the autoencoder performance on the alignment result, and the chosen parameterizations for the STN/registration network.

    The dataset used for the evaluation of this work is disappointing. Outside of the rare cases of lost data, there are very limited situations where one needs to actually perform inter-subject registration between T1w and T2w MRI, since the two contrasts are almost always acquired together. The results would be more convincing if the method was tested on a real-world application.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Potential reproducibility of this work is moderate. The authors do a good job of explaining their method/hyperparameters, but are unwilling to release their software.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    It would be helpful if the authors could provide some additional discussion regarding the autoencoder used for feature extraction. It seems to be a key part of the network and plays a big role in determining the reliability of the contrastive loss. What happens if the encoder just doesn’t work well for a given modality?

    I would really encourage the authors to test their method on a more real-world example of multi-modal data alignment: perhaps intra-operative to pre-operative alignment, as motivated in their introduction, or even CT to MR, which would demonstrate that the method is robust to large nonlinear differences between the intensity spaces.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors present a novel method with promising results.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    In this work, the authors introduce a new contrastive loss for deep-learning-based multi-modality registration. Three networks are used: a registration network, a T1 autoencoder, and a T2 autoencoder. The deformed image and the reference image pass through the two autoencoders, and the loss is calculated between their projections. The authors also use a hypernetwork to choose the regularisation parameter lambda.
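
For concreteness, the training step this description implies might look roughly as follows; all argument names (`regnet`, `ae_t1`, `ae_t2`, `warp`, `contrastive_loss`) are hypothetical stand-ins, not the paper's actual API:

```python
import torch

def training_step(regnet, ae_t1, ae_t2, warp, contrastive_loss, t1, t2):
    """One hypothetical training step; regnet is assumed to condition on
    the sampled lambda via a hypernetwork."""
    lam = torch.rand(1).item()               # sampled regularisation weight
    v = regnet(t1, t2, lam)                  # lambda-conditioned velocities
    t1_moved = warp(t1, v)                   # integrate v and resample T1
    z_moved = ae_t1.encode(t1_moved)         # project each image through its
    z_fixed = ae_t2.encode(t2)               # own domain's autoencoder
    loss = contrastive_loss(z_moved, z_fixed) + lam * (v ** 2).mean()
    loss.backward()
    return loss
```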

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors address the hard problem of multi-modality registration and propose an approach based on contrastive learning.

    They compared against several classic multi-modality losses and obtained better results.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    To the reviewer, it is not clear whether the proposed method is better than SynthMorph. Indeed, SynthMorph achieves better folding and SDlogJ scores (B1, Table 1), and from the curves in Figure 3 (row 3) it seems that, at the same level of % folding voxels, the Dice score of the proposed method is much lower than SynthMorph’s.

    The authors did not discuss the impact of using supplementary networks in terms of training time and memory use. Given a fixed GPU memory budget, is it better to add two autoencoders for the contrastive loss or to increase the number of parameters of the registration network?

    The clarity of the paper should be improved, especially the clarity of the methodology.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It seems to the reviewer that not all the parameters needed to fully reproduce the paper are given, for instance the number of negative and positive pairs used to calculate the loss and how these pairs are obtained. The reviewer highly recommends providing the code of this paper, if it is accepted, as it could explain part of the methodology.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The reviewer suggests that the authors improve the clarity of the paper, especially the methodology, and perform more experiments with published registration methods, to improve the quality of this paper.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The clarity of the paper and the insufficient comparisons with other methods.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    5

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    The authors present a new approach for unsupervised deep-learning-based multimodal image registration. Their approach builds on contrastive learning techniques, is validated on the registration of brain image data from the Human Connectome Project, and is compared against other current approaches and variants of the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • New approach. The novelty of the approach lies in the use of contrastive representation learning to measure the similarity of images with different modalities. The approach is novel in that it does not build on common intensity-based image similarity measures such as NCC or MI, nor does it explicitly learn a metric. It provides a very interesting and fully data-driven alternative for multimodal registration.

    • The presentation is clear and sound with a balanced high technical level. The motivation is clear, with a detailed list of works cited (>20 references).

    • Strong evaluation and comparison. The evaluation compares several variants of the approach using state-of-the-art methods, including hyperparameter optimization, to provide a fair and realistic picture of the comparison.

    • The results are good and slightly superior to the compared methods for the presented example. This is very interesting, as it strongly suggests that such data-driven methods could be a quite attractive and effective approach for multimodal registration.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper contains a lot of information and the presentation is quite “dense” due to the page limitation. Therefore, it is more accessible to a knowledgeable audience.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors do not provide any code. The data used are publicly available. The experiments and methods are clearly described in the paper. Qualitative reproduction of the results should be possible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    In summary, I think this is an excellent piece of work. I am very impressed with the clarity and density of the presentation and think this is close to what can be accommodated in 8 pages.

    In fact, I have no concerns or further substantive comments.

    My only minor complaint would be that the font size in the diagrams in Fig. 3 is too small and the legends are difficult to read.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    8

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors present a novel approach to multimodal image registration using deep learning. Excellent concise, theoretically sound paper with clear motivation, presentation of related work, good evaluation and comparison with the state of the art.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Contribution

    Proposes a new approach for multi-modal image registration using patch-wise features to learn a multi-scale, multi-modality embedding space for the loss function. Three networks are used: a registration network, a T1 autoencoder, and a T2 autoencoder. The deformed and reference images pass through the two autoencoders, and the loss is calculated between their projections. The work also uses a hypernetwork to choose the regularisation parameter.

    • The approach is novel, with its use of contrastive representation learning to measure the image similarity with different modalities. It provides a fully data-driven alternative for multimodal registration.
    • The motivation is clear, with a detailed list of works cited.
    • Strong evaluation and comparison. The evaluation compares several variants of the approach using state-of-the-art methods, including hyperparameter optimization, to provide a fair and realistic picture of the comparison.
    • The results are good and slightly superior to the compared methods. This strongly suggests that such data-driven methods could be an attractive and effective approach for multimodal registration.

    Weaknesses to address

    • While the presentation is already dense due to the page limitation (with tiny fonts in some places), the paper may need more discussion/analysis of the impact of autoencoder performance on alignment accuracy, and of the chosen parameterizations for the STN/registration network.
    • Clarify whether the proposed method really outperforms SynthMorph, as the latter achieves better results for many measures. There probably would not be time to compare against other published methods.
    • Perhaps needs to discuss the impact of the supplementary networks in terms of training time and memory use (if space allows).
    • Code needs to be made available.

    Some additional area chair comments

    • Composition notation $\phi \circ I$ is incorrect, and should be $I \circ \phi$ (see the sketch after this list).
    • The regularisation $\|v\|_2^2$ does not seem to control smoothness (it does not account for spatial relationships).
    • Notation would be less cluttered with $I_1$, $I_2$, $d_{12}$ etc, instead of $I_{T1}$, $I_{T2}$, $d_{T1T2}$.
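
On the composition point, a minimal 2D sketch of how $I \circ \phi$ is realized with PyTorch's `grid_sample`; the helper name and the identity-warp test are illustrative only:

```python
import torch
import torch.nn.functional as F

def compose(image, phi):
    """image: (B, C, H, W); phi: (B, H, W, 2) sampling coordinates in
    [-1, 1]. Returns image(phi(x)), i.e. (I o phi): the warp phi acts on
    coordinates first, and the image I is then evaluated there."""
    return F.grid_sample(image, phi, align_corners=True)

# Sanity check: the identity warp leaves the image unchanged.
B, H, W = 1, 8, 8
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                        torch.linspace(-1, 1, W), indexing="ij")
phi_id = torch.stack((xs, ys), dim=-1).expand(B, H, W, 2)
img = torch.rand(B, 1, H, W)
assert torch.allclose(compose(img, phi_id), img, atol=1e-5)
```
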
  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3




Author Feedback

We thank the AC+R1,2,3 for the very useful feedback. We are happy that they found the work novel [R1,3], well presented [R1,3] & well evaluated [R3], and to yield improved results [R1,2,3]

Major Responses:

|| ALL: “Release code” We will release code

|| AC, R1: “Impact of autoencoder (AE) perf. & add STN/regnet details” To probe the impact of worst-case AE performance (R1: “what if the encoder doesn’t work?”), we trained our mCR model using a frozen AE with no pretraining as a random feature extractor and trained only the regnet+MLPs. mCR works even in this worst-case, with only a ~1% Dice decrease - we’ll include this in Tbl.1. We respectfully argue that further AE analysis is best left to a journal paper due to the page limit. The STN/regnet is a standard VoxelMorph UNet detailed in SupplFig1
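
A minimal sketch of the frozen-autoencoder ablation described above, assuming standard PyTorch training code; `ae`, `regnet`, and `mlps` are stand-in modules, not the authors' code:

```python
import torch

def freeze_autoencoder_ablation(ae, regnet, mlps, lr=1e-4):
    """Worst-case ablation: keep the autoencoder at its random initialization
    as a frozen feature extractor; optimize only the regnet and MLP heads."""
    for p in ae.parameters():
        p.requires_grad = False
    params = list(regnet.parameters()) + list(mlps.parameters())
    return torch.optim.Adam(params, lr=lr)
```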

|| AC, R2: “Clarify if method outperforms SynthMorph (SM)” We include 2 SM models: ‘shapes’ & ‘brains’. Both our model & ‘brains’ widely outperform ‘shapes’ which trains on noise images. However, comparisons between ours and ‘brains’ need careful nuanced interpretation. As detailed in ‘Results/Pt2’, our model obtains higher accuracy than ‘brains’ (see Fig2) but has higher irregularity (the worse scores mentioned by R2)

However, accuracy vs smoothness trade-offs are common when comparing registration models. Higher irregularity may be required for this data as neonatal preterm brains display strong shape variation and inter-subject warping needs more irregular warps for better matching. Further, folding & SDlogJ in Fig 3 (row 1) for mCR+CR is highly competitive with the other appearance losses, which can all be used for any anatomy, whereas ‘brains’ is specific to brains and needs training labels

Also, these comparisons have other confounders: (1) The public SM model was trained with displacement regularization while we act on velocities. This confounds smoothness vs dice comparisons as our model may behave differently with displacement penalty (to be explored in future work) (2) We train velocity lambda-conditioned hypernetworks for all models except SM whose public models (used here for best case performance) are trained with a fixed displacement-lambda

Thus, exact 1-to-1 comparison needs to extend the SM simulation pipeline to neonates, hypernetworks, and velocity penalties which is a new research project. We’ll detail these nuances in Sec5

|| AC, R2: “extra networks time/memory cost” We’ll list extra memory+time needed. Notably, these increases are small. Eg, an MI loss ch64 model needs 35hrs to train for 100k steps while ours needs 40hrs

Detailed Responses:

|| AC: notation + v penalty We’ll correct the notation, thank you

There may be a miscommunication: velocities $v$ are on a spatial grid and integrate to give displacements $d$. $\|v\|^2$ has a very similar effect to $\|d\|^2$ (both smooth the warp; see SupplFig3 for the effect of higher $\|v\|$ weights) and is used as it is theoretically sound for the SVFs used here [“Parametric non-rigid registration using a stationary velocity field”, S1.2]; see the sketch below
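
A hedged sketch of this point: a stationary velocity field $v$ integrates (by scaling and squaring) into a displacement field $d$, so penalizing $\|v\|_2^2$ smooths the resulting warp much as $\|d\|_2^2$ would. All names below are illustrative, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

def integrate_svf(v, steps=7):
    """v: (B, 2, H, W) stationary velocity field, (x, y) channels in
    normalized [-1, 1] coordinates. Scaling and squaring: repeatedly
    compose the small flow with itself, d <- d + d o (id + d)."""
    B, _, H, W = v.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    identity = torch.stack((xs, ys), dim=-1).expand(B, H, W, 2)
    d = v / (2 ** steps)                    # scale down to a small flow
    for _ in range(steps):                  # square: double the flow each time
        sample = identity + d.permute(0, 2, 3, 1)
        d = d + F.grid_sample(d, sample, align_corners=True)
    return d                                # displacement field

v = 0.05 * torch.randn(1, 2, 64, 64)
penalty = (v ** 2).mean()                   # the ||v||_2^2 regularizer
d = integrate_svf(v)                        # smooth, invertible warp from v
```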

|| R1 “inter-subject warping unrealistic” We note that inter-subject warping is a common benchmark in both brain registration challenges [Learn2Reg 2021 task3] and many brain registration papers (eg VoxelMorph). Further, we work on non-rigid reg. whereas the intra-subject tasks mentioned by R1 (eg pre/intra operative) are typically affine. However, we agree that the task is synthetic and will include this in the limitations along with tasks like CT/MR

|| R2 “add more baselines” We respectfully argue that further baselines are best left to a journal, as this paper is already dense [R3, AC]: we include 7 baselines & 11 ablations over methods, int steps & model sizes, 16 of which are swept over 17 lambdas

|| R2 “better to add autoencoders (AE) for new loss OR increase regnet params?” The loss is more relevant than #params. Our loss using a small regnet + 2 AEs (Tab1 A6) outperforms an older loss on a big regnet with ~9X more params in total (B3)




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Following the rebuttal, Reviewer 2 switched from reject to weak accept. All reviewers now score the manuscript as acceptable for publication (6, 5, 8).

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors present a novel approach to multimodal image registration using deep learning. The submission is well written, with many details and experimental results. There is clear motivation, a presentation of related work, and good evaluation and comparison against the results of state-of-the-art methods. The authors also use a hypernetwork to choose some of the network parameters. The experimental results are demonstrated on challenging newborn MRI from the dHCP dataset.

    The authors participated in the rebuttal process and carefully answered all the reviewers’ concerns. They are also willing to share software related to this submission.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Perhaps due to some presentation issues, the paper has limitations: a lack of clinical motivation, unclear motivation/novelty, and unconvincing improvement.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR


