
Authors

Fenqiang Zhao, Zhengwang Wu, Dajiang Zhu, Tianming Liu, John Gilmore, Weili Lin, Li Wang, Gang Li

Abstract

Modern multi-site neuroimaging studies are known to be biased by significant site effects observed in imaging data and their derived structural and functional features. Although many statistical models and deep learning methods have been proposed to eliminate the site effects while maintaining biological characteristics, they have two major drawbacks. First, statistical models are applicable for harmonizing regional-level data but are inherently not suitable to represent the complex non-linear mapping of vertex-wise cortical property maps. Second, existing deep learning methods can only harmonize data between two sites, which is practically less useful in multi-site data harmonization scenarios and also ignores the rich information in the whole dataset. To address these issues, we develop a novel, flexible deep learning method to harmonize multi-site cortical surface property maps. Specifically, to detect and remove site effects, we employ a surface-based autoencoder, decompose the encoded cortical features into site-related and site-unrelated components, and use an adversarial strategy to encourage the disentanglement. Then, decoding the site-unrelated features with other site features can generate mappings across different sites. To learn more controllable and meaningful mappings, we also enforce the cycle consistency between forward and backward mappings. Our method can thus efficiently learn rich information from the whole dataset and generate realistic harmonized surface maps at the target site. Experiments on harmonizing infant cortical thickness maps of 2,342 scans from four sites with different scanners and imaging protocols validate the superior performance of our method on both site effects removal and biological variability preservation compared to other methods. To the best of our knowledge, this is the largest validation of different methods on infant cortical data harmonization.
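
The feature-swapping and cycle-consistency idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear encoder/decoder, the even split of the latent vector, the L1 cycle penalty, and all names (`encode`, `decode`, `cycle_loss`, `W`, `W_inv`) are assumptions made for illustration, and the adversarial term and spherical network are omitted entirely.

```python
import numpy as np

def encode(x, W):
    """Toy linear encoder (assumption): first half of the latent vector plays the
    role of site-unrelated features, second half of site-related features."""
    h = x @ W
    k = h.shape[-1] // 2
    return h[..., :k], h[..., k:]

def decode(z, s, W_inv):
    """Toy linear decoder (assumption): recombine both feature groups into a map."""
    return np.concatenate([z, s], axis=-1) @ W_inv

def cycle_loss(x_src, x_tgt, W, W_inv):
    """Map source maps to the target site, map them back, and measure the
    mean absolute difference to the original source maps."""
    z_src, s_src = encode(x_src, W)
    _, s_tgt = encode(x_tgt, W)
    x_fwd = decode(z_src, s_tgt, W_inv)    # source content with target-site features
    z_fwd, _ = encode(x_fwd, W)
    x_back = decode(z_fwd, s_src, W_inv)   # backward mapping to the source site
    return x_fwd, float(np.mean(np.abs(x_back - x_src)))

# Hypothetical data: 4 "scans" with 8 "vertices" each from two sites.
rng = np.random.default_rng(0)
W, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # orthogonal, so W.T inverts W
x_site_a = rng.normal(size=(4, 8))
x_site_b = rng.normal(size=(4, 8)) + 1.0
harmonized, loss = cycle_loss(x_site_a, x_site_b, W, W.T)
print(harmonized.shape, loss)
```

Because the toy encoder and decoder are exact inverses here, the cycle loss is numerically zero; in a trained network it would instead be one of the terms minimized during training.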

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43993-3_36

SharedIt: https://rdcu.be/dnwNB

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a deep learning-based harmonization technique for cortical surfaces. The network is based on a Spherical UNet, with a disentangled autoencoder structure. The authors also use cycle-based losses to better control the generation process when new combinations of structure and site are used. The results demonstrate near complete removal of site differences. The experiments are performed on infant structural MRI data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This work is well-organized, with clear methods and experiments. The application of deep-learning based harmonization to cortical vertices themselves (rather than source images or derived thicknesses) is novel. The results demonstrate improved performance over statistical harmonization techniques in multiple ways. There is the added difficulty of infant MRI data, which can have technical challenges.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There are two main weaknesses of this paper. The first is the lack of comparison to image-based harmonization techniques. This would be done prior to fitting the cortical surface. This would create a complete comparison of all feasible techniques. The second weakness is the limitation to infant MRI data. It would be extremely useful to show the performance on other age groups, especially adult data.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper uses partially open data. Code release is not mentioned in the paper, but is mentioned in the reproducibility statement. This should be included in the body of the paper upon release.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    There are a few things for the authors to consider.

    1. There are 2-3 distinct groups within the “orange” group in the site graphs. This might indicate that this site is actually multiple different acquisitions. This may be causing COMBAT to have reduced performance. This could be an added benefit to the proposed method, but should be discussed.

    2. The only data used is infant data, but the method is described as general. Either adult data or some discussion of this limitation would be warranted.

    3. The authors should consider the use of image-based harmonization as a final comparison. This would solidify the superiority of these results.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper has clear novelty and demonstrated results. The weaknesses are moderate and do not detract from the strengths of the paper.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposes a method for harmonisation of cortical surfaces based on an adversarial autoencoder working directly on the surface. An additional cycle consistency loss is applied to create more meaningful reconstructions.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The method is relatively simple for harmonisation approaches and achieves good results. The majority of generative harmonisation methods work in 2D or 2.5D, so the fact that this method works directly on the cortical surfaces is a major advantage over existing approaches.
    • The exploration of the method is substantial, covering the ability to remove the site effect, the ability to preserve individual variability, and a validation on a downstream task, and the results across these explorations appear good.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The method isn’t particularly novel, but rather utilises techniques that have been used before (AE and disentangled approaches have been used extensively for image synthesis including for harmonisation eg [1]), but I think the application and space that is explored makes the work interesting.
    • The introduction needs improving. Some assertions are made that aren’t true, and some very related literature is missing. Further, the key limitation of existing harmonisation methods, namely that they work on 2D slices or in 2.5D, which would make the surface reconstruction unreliable, is not discussed despite being a key advantage of this paper. Specific comments on this are below.
    • Comparison methods are limited to COMBAT and no other DL-based approaches are considered. Given that all sites are harmonised to S1, a comparison could have been made with a method that only does one site at a time, with this noted as a weakness of these approaches, e.g. [5].
    • No exploration into the setting of the many loss weights is provided. This should be explored in future work but is fine for a MICCAI paper.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducibility seems fine. Pre-processing details are provided. They say that the code will be released. References are provided for cohorts.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • There are some comments in the introduction which aren’t entirely true, and some very relevant literature is missing from what is otherwise a well-written introduction.
        o Methods do exist that can harmonise more than one site (quite a number), e.g. [1,2,3]. Either this needs to clarify that it means only for the cortical surface or it needs removing.
        o [3] ([7] in the paper) doesn’t generate harmonised images.
        o Some very related literature is missing, such as [2, 4].
    • Figure 4 cannot be interpreted as it’s too small with too many lines. Maybe just show the best COMBAT approach so that the plots can be larger.
    • I think a DL baseline would make the work more convincing, and one should certainly be included in future work.
    • Discussion should be added to the introduction about existing generative harmonisation methods working on 2D slices as a limitation, as this is a clear advantage of this approach if the goal is to work with surfaces.
    • Figure 1: the caption needs more detail so that the figure can be understood in isolation.

    [1] Unsupervised MR harmonization by learning disentangled representations using information bottleneck theory – Zuo et al
    [2] ImUnity: A generalizable VAE-GAN solution for multicenter MR image harmonization – Cackowski et al
    [3] Deep learning-based unlearning of dataset bias for MRI harmonisation and confound removal – Dinsdale et al
    [4] Scanner invariant representations for diffusion MRI harmonisation – Moyer et al
    [5] Harmonisation of infant cortical thickness using surface-to-surface cycle consistent adversarial networks – Zhao et al

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Strong results working in a feature space that has been little explored for harmonisation, and would be of great interest to those working with functional maps.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    Good paper, solving an interesting problem. Limited comparisons prevent a stronger accept.



Review #4

  • Please describe the contribution of the paper

    The paper presents a Cycle-Consistent Adversarial Autoencoder approach for the harmonization of multi-site cortical data. It follows the CycleGAN method in [28], where a cycle-consistency loss is used to ensure the accurate reconstruction of images mapped to the “style” of another site and back to the original site, as well as a cross-correlation loss to preserve the structural information when doing the mapping. The main difference comes from an additional disentangled learning strategy inspired by [6] that separates the representation into site-related and non-site-related features. In a comprehensive set of experiments, the proposed approach is shown to perform better than ROI-wise and vertex-wise Combat.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The idea of combining disentangled learning with a traditional CycleGAN-based harmonization technique is interesting and, as mentioned by authors, has not been explored in previous literature.

    • Experiments demonstrate the usefulness of the method in several different scenarios related to estimating population-level developmental trajectories of cortical measures, computing these measures for cortical parcels, predicting scan age, etc. Overall, the method provides a better harmonization of cortical surface data than two variants of the well known statistical model Combat (ROI-wise and vertex-wise).

    • The paper is well written, clear and easy to follow.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The novel contributions of the proposed method, in particular with respect to existing CycleGAN approaches for harmonization, are not entirely clear.

    • The paper lacks a proper literature review on harmonization.

    • Experiments do not fully demonstrate the contribution of the proposed method’s components (no ablation study) and lack a comparison against strong baselines.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The Experimental Setting section is relatively short but seems to contain the necessary information to reproduce experiments (including main hyperparameters of the method), although two of the datasets used for experiments are private. I may have missed this information while reading the paper, but I do not think authors have provided a link to their code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • The method is presented as a “cycle-consistent adversarial autoencoder” (AE), in contrast to the vanilla AE and Disentangled AE (DAE), but it is in fact more closely related to CycleGAN, which is a common approach for harmonization. It would help to better position the paper with respect to this approach.

    • On a related point, I feel the paper discards a lot of literature on medical image harmonization, including several CycleGAN-based approaches using similar strategies. See for instance Fig. 3 of (Hu et al., 2023). The contributions of the proposed method should be situated within this literature, and not only with respect to auto-encoders.

    • The paper mentions that existing methods can only harmonize data between two sites, making them less useful for multi-site scenarios. However, some existing approaches already tackle such scenarios, for example, StarGAN (Liu et al., 2021) and StarGANv2 (Bashyam et al., 2022). Moreover, I do not fully understand how the proposed method solves this multi-site issue differently, as it also maps from one style to another (chosen as one of the sites).

    • In the Experimental setting, how were the hyperparameter values selected? The authors do not mention a validation set for selecting these values.

    • Several experiments follow the protocol of [28], however the results are never directly compared with those of this previous work. In fact, the method should be compared with more recent and stronger baselines for harmonization, in addition to Combat.

    • Because there is no ablation study in the paper, it is hard to really evaluate the contribution of novel components (disentangled learning loss) on performance. It seems necessary to measure how performance varies when disabling the various loss terms. Additional ablation studies could also improve the paper, for example, evaluating the impact of the reference site.

    Hu, Fengling, Andrew A. Chen, Hannah Horng, Vishnu Bashyam, Christos Davatzikos, Aaron Alexander-Bloch, Mingyao Li et al. “Image harmonization: A review of statistical and deep learning methods for removing batch effects and evaluation metrics for effective harmonization.” NeuroImage (2023): 120125.

    Liu, Mengting, Piyush Maiti, Sophia Thomopoulos, Alyssa Zhu, Yaqiong Chai, Hosung Kim, and Neda Jahanshad. “Style transfer using generative adversarial networks for multi-site mri harmonization.” In Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part III 24, pp. 313-322. Springer International Publishing, 2021.

    Bashyam, Vishnu M., Jimit Doshi, Guray Erus, Dhivya Srinivasan, Ahmed Abdulkadir, Ashish Singh, Mohamad Habes et al. “Deep generative medical image harmonization for improving cross‐site generalization in deep learning predictors.” Journal of Magnetic Resonance Imaging 55, no. 3 (2022): 908-916.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well written and the method is evaluated in various applications. However, the method could be better situated with respect to recent work on medical image harmonization. Moreover, the experiments should include a proper ablation study and compare against stronger baselines.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    I have carefully read the authors’ response to the comments of all reviewers. Unfortunately, very little of this response addresses my main concerns about the paper.

    • Regarding the suggestion to position the work within the abundant literature on CycleGAN approaches for harmonization, the authors declined, saying that their approach works on cortical surface data instead of images. While I agree that this is true, I do not really see the link with my comment that the method is much closer to a CycleGAN than an AE, both of which can be used for surface data.

    • As for not comparing against the method presented in [28], authors claim that the method is too computationally complex and time-consuming. While this claim might be true, I believe it should be supported by empirical evidence in the paper.

    • Regarding the suggestion to have a more complete ablation study, the proposed method has a total of four loss terms, the impact of which can hardly be assessed by testing only two settings (with or without the CycleGAN).

    I understand that there is limited time for preparing the rebuttal but, given the authors’ dismissive answers, I cannot upgrade my original score.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    While reviewers appreciate the good results of the harmonization approach and that it works directly on cortical vertices, which is indeed a relevant representation of the cortex, there have also been significant concerns. I therefore invite the authors to address the mentioned issues in the rebuttal, which can be summarized as

    • missing/wrong information in the introduction and related work (Reviewers 3,4)
    • revision of figures & captions (Reviewer 3)
    • lack of baselines & ablation (Reviewers 1,4)




Author Feedback

We thank all reviewers for their comments and recognition of our contributions: 1) “well organized and written, clear and easy to follow” (R1, R4); 2) “method/application working directly on the cortical surface is interesting and novel” (R1, R3); 3) “good results and improved performance over statistical harmonization techniques” (R1, R3).

R3, R4, and the meta-reviewer have concerns about the introduction and related work. Specifically,

1) “Existing methods can only harmonize between two sites”? We do mean “existing cortical feature harmonization methods” by “existing methods” as pointed out by R3. We apologize for the confusion and will clarify this by explicitly mentioning “existing cortical feature harmonization methods”.

2) The two more related works suggested by R3 are still image-based methods and do not directly work on cortical surfaces. We will include them when introducing image-based harmonization methods in the final version.

3) R4 suggested “situating our method within a review paper (Hu et al., 2023) to cover more papers, e.g., StarGAN and StarGANv2, not only with respect to auto-encoders”. However, R4 may have misread the key idea of our paper here, as highlighted by R1 and R3, “the major advantage of the paper lies in the method/application directly working on the cortical surfaces rather than source images”. As suggested by R4, we checked Fig. 3 of the review paper and it shows that current cortical feature-level harmonization methods are still based on auto-encoders, while many GAN variants exist for image-level harmonization. Since our contribution is primarily on feature-level harmonization, making image-level harmonization methods less relevant, we believe the current literature review focusing on feature-level approaches is fine. We will cite Hu et al., 2023 for clearer reference.

Regarding R3’s concern about the visibility of Fig. 1 and Fig. 4, we will enhance the captions and enlarge the figures for easier viewing in the final version.

R4 asked “why not compare with the results of [28]”. As already explained in the Introduction, “[28] is inefficient and inconvenient in practice for harmonizing multi-site data, because a model needs to be re-trained between any two sites and ignores rich global information in the whole multi-site data”. In our experiments, we found that [28] is computationally complex and time-consuming, taking several days to train models between any two sites, which is inconvenient and less useful in real multi-site scenarios. On the other hand, our method leverages the prior global information across different sites and can harmonize cortical features from any site without extra training after pre-training a model with sufficient multi-site data. We will make this clearer.

R4 suggested evaluating the impact of the reference site. We appreciate the comment; however, as already mentioned in the paper, “our method maps less reliable sites (with low-quality images) to a more reliable site (with high-quality images)” as mapping high-quality data to low-quality data is not meaningful. Our method always selects the site with highest-quality images as the reference site. We thus believe there is no need to evaluate less reliable sites (typically with noisy and less accurate measures) as the reference site.

R4 suggested “evaluating the contribution of the disentangled loss when disabling various loss terms”, which is already done in the paper. The DAE model is without cycle losses and only utilizes reconstruction and disentangled losses, while the CDAE model incorporates all losses. We will clarify this in the final version.

R1, R3, and R4 also recommend performing more validation on adult data, a comparison with image-based methods, and an ablation study with different parameter settings. We really appreciate these great suggestions, but we cannot guarantee their inclusion in the final version due to the page limit. In this regard, we agree with R3 that “this will be explored in future work but is fine for a MICCAI paper”.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Although the reviews are, even after rebuttal, not all positive, I believe that the paper has more strengths than weaknesses. In particular the fact that it works directly on cortex meshes distinguishes this work from the bulk of alternative image-based techniques.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal failed to satisfy the reviewers’ comments adequately. Important comparisons to existing methods are missing and the authors’ response for not including them was not sufficient. For example, quantitative results could either support or undermine the authors’ claim. This was not addressed in the rebuttal. The ablation testing of the method in the presence of four different terms was not adequate, as only two settings (on/off for CycleGAN) were empirically tested. Finally, there was a concern about the method being applicable only to cortical surface data. While the authors addressed this point in the rebuttal, their explanation that the relevant highlighted and cited methods worked on images and not cortical surfaces was not adequate.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have adequately addressed the comments regarding the missing details (introduction and figures) and experiments (baselines and ablation study). They have noted that the reviewers’ comments will be incorporated into the final version of the paper. Hence, I suggest accepting this paper for MICCAI.


