
Authors

Nicola K. Dinsdale, Mark Jenkinson, Ana I. L. Namburete

Abstract

The ability to combine data across scanners and studies is vital for neuroimaging, to increase both statistical power and the representation of biological variability. However, combining datasets across sites leads to two challenges: first, an increase in undesirable non-biological variance due to scanner and acquisition differences - the harmonisation problem - and second, data privacy concerns due to the inherently personal nature of medical imaging data, meaning that sharing them across sites may risk violation of privacy laws. To overcome these restrictions, we propose FedHarmony: a harmonisation framework operating in the federated learning paradigm. We show that to remove the scanner-specific effects, we only need to share the mean and standard deviation of the learned features, helping to protect individual subjects’ privacy. We demonstrate our approach across a range of realistic data scenarios, using real multi-site data from the ABIDE dataset, thus showing the potential utility of our method for MRI harmonisation across studies. Our code is available at https://github.com/nkdinsdale/FedHarmony.
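The privacy mechanism described in the abstract can be sketched in a few lines: each site computes only the mean and standard deviation of its learned feature vectors and shares those summaries, never subject-level data. The sketch below is illustrative only and is not the authors' implementation; the function names and the simple weighted aggregation are assumptions for demonstration.

```python
import numpy as np

def site_statistics(features):
    """Per-site summary: only the mean and standard deviation of the
    learned features (shape: n_subjects x n_features) leave the site,
    never the subject-level data itself."""
    return features.mean(axis=0), features.std(axis=0)

def aggregate(stats, weights):
    """Server-side weighted combination of per-site feature statistics
    (a simple illustrative aggregation, not the paper's exact rule)."""
    means = np.array([m for m, _ in stats])
    stds = np.array([s for _, s in stats])
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w @ means, w @ stds
```

Only two small vectors per site cross the network, which is what limits the exposure of individual subjects' information.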

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_66

SharedIt: https://rdcu.be/cVVqn

Link to the code repository

https://github.com/nkdinsdale/FedHarmony

Link to the dataset(s)

https://fcon_1000.projects.nitrc.org/indi/abide/


Reviews

Review #1

  • Please describe the contribution of the paper

    The work presented in this paper aims to simultaneously handle issues related to both scanner/acquisition differences and data privacy concerns when combining datasets across different sites to form large, integrated medical imaging databases. The authors propose a strategy, termed FedHarmony, based on federated learning to address these issues and experimentally evaluate it using multi-site data from the publicly available ABIDE resting-state fMRI database. Ultimately, they show that only the mean and standard deviation of learned features need to be shared to address the scanner-specific effects.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is readable and the main themes readily understood. Also, the topic is current and interesting. Privacy-preserving data integration from images is helped by this approach in terms of unlearning scanner bias.

    The testing performed on the ABIDE public dataset is reasonable, making use of the T1 MR structural datasets. The 5-fold cross-validation performed is appropriate, and useful results are reported in the tables.

    Using age prediction from T1 MRI as the task, the mean absolute error (MAE) shows incremental improvement in accuracy over alternative strategies (e.g. FedProx or FedAvg) in both the fully supervised and semi-supervised cases, but the key new result is that the new approach (FedHarmony) reduces the ability to identify the scanner site where the individual data were acquired to close to random chance.

    The PCA diagram in Figure 3 helps to further illustrate that FedHarmony moves to limit the ability to identify different scanner sites.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While there is some innovation here in terms of the image analysis/machine learning methodology that is used, overall it is relatively minor, as the approach (including the individual loss function components and the VGG-based architecture) is mostly taken from the already-published literature. There is some original insight in terms of how the experiments were performed and the observations made about scanner classification accuracy, however.

    The age-prediction accuracy achieved by FedHarmony does not appear to be particularly improved over other techniques as noted above, although this may be reasonable given the potential scanner identification improvements (reducing it to close to chance).

    The testing done appears to be performed on T1 MR structural ABIDE data alone, whereas the ABIDE dataset is much richer and includes resting-state functional MRI data. Indeed, others have looked at federated learning using these rs-fMRI data (e.g. see X. Li, et al., Medical Image Analysis, 2021). Perhaps this should be mentioned/referenced.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility effort is decent here, but further insight into the particular hyperparameters chosen and experiments related to their settings would be welcome. These appear to be critical in the overall design in order to make the approach work properly.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The discussion about privacy preservation and scanner bias is interesting, as is this particular domain-adaptation approach. As noted above, it might improve the paper to go into more detail regarding hyperparameter settings, as well as to highlight the methodological novelty of the approach more clearly. Finally, some discussion as to how the approach could be used with other image data types (e.g. rs-fMRI) and abnormality outcome prediction (beyond just predicting age) would be helpful.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    There are some interesting insights presented regarding federated learning with respect to the problem of preventing the recognition of image scanner acquisition sites. The most interesting ideas in the paper have to do with the experimental results found when combining these three particular loss terms (proximal loss, domain prediction loss and confusion loss), and the idea that one can obtain reasonable age-prediction results while reducing the ability to identify the scanner/site where the data came from.
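    To make the role of these three terms concrete, the following is a hedged sketch of how such a combined objective might look. The exact formulation is in the paper and its reference [5]; the weightings, the use of MAE as the task loss, and the single-expression form are illustrative assumptions here (in practice the domain and confusion terms are typically optimised in alternating steps over different parameter subsets).

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def combined_loss(age_pred, age_true, domain_logits, domain_true,
                  local_w, global_w, mu=0.01, alpha=1.0, beta=1.0):
    # Task loss: mean absolute error on age prediction.
    task = np.abs(age_pred - age_true).mean()
    # Proximal term (FedProx-style): keep local weights near the global model.
    prox = ((local_w - global_w) ** 2).sum()
    p = softmax(domain_logits)
    n = len(domain_true)
    # Domain prediction loss: cross-entropy on the scanner/site labels.
    domain = -np.log(p[np.arange(n), domain_true]).mean()
    # Confusion loss: cross-entropy against a uniform target, minimised
    # when the scanner cannot be identified from the features.
    confusion = -np.log(p).mean()
    # mu, alpha, beta are illustrative weightings, not the paper's values.
    return task + 0.5 * mu * prox + alpha * domain + beta * confusion
```

    The tension between the domain and confusion terms is what drives the unlearning: the classifier head is trained to identify the site, while the feature extractor is pushed towards features for which site prediction collapses to chance.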

  • Number of papers in your stack

    3

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper describes a federated learning approach that requires minimal information sharing, specifically just the mean and standard deviation of each feature from each site. Experiments show the method outperforms baselines on a single multi-site data set.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Very timely work as federated learning is a key enabler of large scale medical imaging studies.

    Simple method with high practical value.

    Results are promising in the example shown.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Only demonstrated in one scenario. Although longer term it will be important to reinforce the results with more examples, I believe the single example shown is sufficient for this first publication of the idea.

    Wording in the abstract, “We show that to remove the scanner-specific effects, we only need to share the mean and standard deviation of the learned features”, is too strong. The single experiment shows that in one specific scenario, this minimal amount of information can still produce decent results. The statement is not true in general. While I can believe that in most practical scenarios the proposed strategy will perform well, one can certainly construct scenarios where it won’t. Bounding the conditions under which the strategy performs well would be a good focus for further work on this topic, and some preliminary discussion of that in this paper would be useful.

    Further to the above, the simulation experiments are valuable but limited. It would be nice (not necessarily needed for this submission but for future work) to explore a much wider range of scenarios to identify conditions under which the strategy might fail, e.g. large imbalance of data from different sites, only one or two cases at some sites, large numbers of outliers at some sites, etc.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Seems fine.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Not much to add. My main thoughts are in the strengths and weaknesses boxes. This is not a topic I am deeply familiar with, although I have a good understanding. The main feedback would be to run more extensive simulations to identify the boundaries between where the strategy is and isn’t effective.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    As above, timely, practical, nicely demonstrated. Longer term needs more validation, but great as a MICCAI presentation.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposes a federated learning approach for multisite MR image harmonization. Experiments on age prediction using the ABIDE dataset show promising performance of the proposed FedHarmony on removing scanner-specific information.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is clearly written.
    • Experiments are thorough. MR images from four sites are used in evaluation, and ablation studies are conducted.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The application for evaluation is not very representative: the authors explored age prediction as an application scenario of their method. However, age prediction is not a representative task for neuroimage analyses. Experimenting on other tasks, such as segmentation, disease prediction, or image-to-image translation, could provide more insight into the proposed method.
    • Figure 2 shows that the age of all four sites is similarly distributed. In other words, the variables of interest (in this case, age-related features) should not be site-dependent. This requirement could limit the application of the method: ageing subjects usually have anatomy-related changes (e.g., brain atrophy), so if one site has generally older subjects, this might confound the harmonisation.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Good. Public datasets are used and implementation details are provided. Code is promised to be made publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The authors should consider extending their evaluations to broader neuroimage analysis tasks, such as segmentation, disease prediction, or image-to-image translation.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is generally interesting. Although the evaluation is limited to age prediction, which is not a commonly explored task in neuroimage analyses, the general idea and the ablations make the paper an interesting topic for discussion at MICCAI.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This work proposes a federated learning method that handles the heterogeneity induced by scanner/acquisition differences in multi-site learning. All the reviewers agreed that the paper was well written and the results were promising. However, there are several suggestions and concerns that the authors could consider to improve the paper further in the final version: 1) several loss terms are proposed, but an ablation study on them is lacking; 2) the missing references suggested by the reviewers should be added; and 3) the application of age prediction is not representative.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2




Author Feedback

We thank the reviewers and area chair for their thoughtful comments on our work and for identifying interesting areas for the future. We will incorporate your feedback when preparing the camera-ready version of the manuscript, especially the discussion of future directions for exploring our approach. To address the points raised by the area chair directly:

  1. An ablation study is included in Table 1, where we explore the use of the loss functions we introduced. Further details regarding the three loss functions which control the harmonisation, and which we have adapted to the federated setting, are available in [5].
  2. The suggested reference explores the use of the ABIDE data for federated learning with functional data. While this study is interesting, we do not feel that it is informative for our study at this point, as the work is a preliminary proof of concept and T1 data are the most informative for the age-prediction task. We will, however, aim to incorporate the suggestion into further work, as harmonisation with multiple input modalities represents a useful direction for future investigation.
  3. While the task of age prediction differs from many tasks in neuroimaging, such as segmentation or registration, it is a well-understood task which has begun to be explored for clinical data, where federated learning will be most advantageous (see e.g. Steps Towards Clinical Application of the Brain Age Paradigm, Cole, Bio Physc. 2022). We believe that this, combined with the ready availability of ground-truth labels, makes brain age a suitable task for the exploration of our method. Future work will focus on the application of FedHarmony to other tasks and architectures.


