
Authors

Heng Li, Haojin Li, Wei Zhao, Huazhu Fu, Xiuyun Su, Yan Hu, Jiang Liu

Abstract

The annotation scarcity of medical image segmentation causes challenges in collecting sufficient training data for deep learning models. This obstacle means that models trained on limited data may not generalize well to other unseen data domains, resulting in a domain shift issue. Consequently, domain generalization (DG) is developed to boost the performance of segmentation models in handling unseen domains. However, the DG setup requires multiple source domains, which can impede the efficient deployment of segmentation algorithms in real clinical scenarios. To address this challenge and improve the segmentation model’s generalizability, we propose a novel approach called the Frequency-mixed Single-source Domain Generalization method (FreeSDG). By analyzing the frequency’s effect on domain discrepancy, FreeSDG leverages a mixed frequency spectrum to augment the single-source domain. Additionally, self-supervision is constructed in the domain augmentation to seamlessly learn and inject robust context-aware representations into the segmentation task. Our experimental results on five datasets of three modalities demonstrate the effectiveness of the proposed algorithm. FreeSDG outperforms state-of-the-art methods and significantly improves the segmentation model’s ability to perform well on unseen domains. Therefore, our approach provides a promising solution for enhancing the generalization ability of medical image segmentation models, especially when annotated data is scarce.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43987-2_13

SharedIt: https://rdcu.be/dnwJv

Link to the code repository

https://github.com/liamheng/Non-IID_Medical_Image_Segmentation

Link to the dataset(s)

http://www.isi.uu.nl/Research/Databases/DRIVE/

https://www.kaggle.com/c/diabetic-retinopathy-detection

https://figshare.com/articles/dataset/LES-AV_dataset/11857698/1

http://www.retinacheck.org/datasets


Reviews

Review #1

  • Please describe the contribution of the paper

    The limited availability of annotated medical image data makes it difficult to train deep learning models for segmentation, which can lead to poor generalization performance on unseen data domains. Domain generalization (DG) has been developed to address this issue, but it requires multiple source domains, which may not be feasible in real clinical scenarios. To improve segmentation model generalizability, the Frequency-mixed Single-source Domain Generalization method (FreeSDG) is proposed, which uses a mixed frequency spectrum to augment a single-source domain and incorporates self-supervision to learn robust context-aware representations. Experimental results on five datasets of three modalities demonstrate that FreeSDG outperforms state-of-the-art methods and significantly improves segmentation performance on unseen domains.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The proposed paper is well written and easy to follow.

    • The proposed approach enables the development of accurate and generalizable segmentation models that can be deployed in real-world clinical scenarios.

    • The proposed approach is a combination of several techniques and seems technically novel.

    • The authors provide solid and convincing ablation study.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Future work is not discussed in the paper.

    • The effectiveness of the approach depends on the availability of a suitable single-source dataset and the quality of the self-supervision technique used to inject robust representations into the segmentation model.

    • The paper does not provide a detailed analysis of the computational complexity or scalability of the FreeSDG approach.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The main idea of FreeSDG is to leverage a frequency-based domain augmentation technique to extend the single-source domain discrepancy and inject robust representations learned from self-supervision into the network to boost segmentation performance. The approach is designed to address the limitations of domain generalization methods that require multiple source domains and may not be feasible in real clinical scenarios. FreeSDG employs a mixed frequency spectrum to augment the single-source domain and incorporates self-supervision to learn context-aware representations. The experimental results demonstrate that the proposed algorithm outperforms state-of-the-art methods and significantly improves segmentation performance on unseen domains. The paper is well written and easy to follow, and the authors used public datasets. However, the approach depends heavily on the architectural design of the model, and source code would be very helpful for reproducing the paper. The proposed approach is a combination of several techniques and seems technically novel. Future work is not discussed in the paper. The paper would be a good asset for the conference.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The authors used public datasets and the paper is easy to follow; however, the approach depends heavily on the architectural design of the model, and source code would be very helpful for reproducing the paper.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Technical novelty, reproducibility, and results achieved.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    7

  • [Post rebuttal] Please justify your decision

    My decision did not change.



Review #3

  • Please describe the contribution of the paper

    The authors of this paper propose a novel approach that combines data augmentation and contrastive learning by mixing frequency spectrums of trained images. The goal is to improve the generalization of a segmentation model trained from a single-source domain. Unlike existing methods, the proposed approach only requires a single-source domain dataset. Furthermore, the approach extends frequency mixing from a local range to the full view range in the frequency domain. Extensive experimental studies demonstrate the superiority of the proposed approach compared to other state-of-the-art methods when tested on out-of-domain datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The authors in this paper proposed a novel approach that performs data augmentation and contrastive learning approach by mixing frequency spectrums of trained images to improve the generalization of a segmentation model trained from a single-source domain.
    2. Extensive experimental studies demonstrate the superiority of the proposed approach compared to other state-of-the-art methods when tested on the out-of-domain dataset.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Some parts of the paper were not explained clearly. Please see section 9 for specific points.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The details about the method implementation are clear.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. For segmentation inference, what kind of data will be input into the coupled segmentation network? Will the testing data for the proposed approach also need to be filtered as during the training period? What about the inputs of the other compared methods? Are they the same as the testing inputs for the proposed method?
    2. In contrastive learning, why is ‘the rest one x0 cast as the specific view to be constructed from the mixed one’? Can other views be used for reconstruction?
    3. The term ‘data dependency’ was not explained clearly, even though it is used frequently throughout the paper. Can the authors explain the term ‘consistent data domain’ in Table 1? If a reconstruction task is used for contrastive learning in the proposed method, should it then be considered to require a ‘consistent data domain’?
    4. Can the authors report the size of each dataset used in the experimental studies?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors of this paper propose a novel approach that combines data augmentation and contrastive learning by mixing frequency spectrums of trained images. The goal is to improve the generalization of a segmentation model trained from a single-source domain. Unlike existing methods, the proposed approach only requires a single-source domain dataset. Furthermore, the approach extends frequency mixing from a local range to the full view range in the frequency domain. Extensive experimental studies demonstrate the superiority of the proposed approach compared to other state-of-the-art methods when tested on out-of-domain datasets.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    This paper explored the impact of frequency on domain discrepancy in medical image segmentation. The authors proposed a frequency-mixed domain augmentation by extracting and mixing diverse frequency views to augment the single-source data. A self-supervised task is also introduced to learn robust context-aware representations. Experiments on various medical image modalities demonstrate the effectiveness of the proposed approach.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Proposed a frequency-mixed domain augmentation. Previous work found that the domain shift between source and target could be reduced by swapping the low-frequency spectrum (LFS) of one with the other. The authors found that uniformly removing the LFS reduces inter- and inner-domain shifts. They propose to crop random patches from one frequency view and mix them with diverse views of the same image to conduct data augmentation.

    2. A self-supervised task is simultaneously acquired from the augmentation to learn generalizable context-aware representations from view reconstruction.

    3. Experiments on various medical image modalities demonstrate the effectiveness of the proposed approach.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Augmentation details are not clear. The authors did not give details about this important contribution. For example, how many frequency views are used? How is the frequency filter defined and implemented? When cropping patches from images, how are the crop size and location controlled?
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducibility is limited. The authors did not include code, and no details about the augmentation method are given to reproduce the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The authors should include more details on the augmentation, which is claimed to be the main contribution.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors propose a frequency-mixed single-source domain generalization method that shows effectiveness in multiple medical image segmentation experiments. However, the authors omit many important details of the proposed method, which makes it difficult to evaluate.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper addresses the problem of domain generalization with a focus on 2D segmentation. Specifically, the goal is to improve the generalizability when only data from one source is available during training. The main idea is that processing of the input image in the frequency domain will improve domain generalization.

    Although the topic is important, the paper is poorly written. The authors fail to justify why their method should work. In fact, I think the observed improvement is likely due to a poor comparison with other methods. I urge the authors to read the literature on “neural networks with random weights”. To summarize, neural networks (contrary to common belief) do indeed extract features with a wide frequency spectrum from data. Therefore, the observed improvement is likely due to increased model size/capacity or other such factors.

    I am thankful to the three reviewers, but disappointed that they did not notice the glaring weaknesses of the paper. I have strong negative opinions about the paper. With all due respect to the authors and their work, I think the work does not have good technical quality, yet makes highly overblown claims, such as in the Conclusion where they write: “our approach has the potential to revolutionize medical imaging”. But since two of the reviewers have suggested acceptance, I will send this paper for rebuttal. If the authors fail to address my comments, I will recommend rejection.

    Many statements in the paper are unclear or do not make any sense to me. Examples:

    • In abstract: “Additionally, self-supervision is constructed …”
    • Page 4: “swapping/integrating the low-frequency spectrum”
    • Page 4: “Motivated by the hypotheses, domain augmentation is implemented by the filter Fn(·) with perturbed parameters.”
    • Page 5: “Notably, self-supervision is simultaneously …”
    • Page 5: “… to properly couple the features from …”

    • Authors fail to provide the settings for the compared methods, which makes the results/comparisons less solid.

    • Why do you think your model trained on fundus photography should work on “ultrasound images of joints with cartilage masks”? Will it work on anything and everything? What is the limit?

    • Why do you think the sub-figures in Figure 2 are different? They look similar to me. On page 4 you write “As shown in Fig. 2, compared to the raw images …”; I do not see this in Figure 2; please clarify.

    • Page 2: “Complex networks and shape priors constrain algorithms’ efficiency and versatility, negatively impacting clinical deployment.” This is not true at all. What is your source?

    • What is Mcc in the table? Please define.

    • The method has been presented explicitly as a 2D segmentation method. Explain if that is the case, as it inherently limits the method to 2D medical images.

    • You mention DRIVE first on page 4 without any reference or explanation.

    • Many important details are missing in the methods. How do you compute the frequency information? What are the \theta? Do you only use the magnitude or phase or both? What frequency ranges?

    • Typos: firts removement

    For rebuttal, please address my main comments above and major criticisms from reviewers, and also the following:

    1- Please re-write every sentence in the paper where categorical/general statements have been made without supporting evidence. Your paper is full of these statements. To give you some examples, here I only go through your list of main contributions on Page 2:

    • “efficient SDG algorithm”: Why do you claim it is “efficient”? How did you quantify this in your paper?
    • “frequency factor”. What is this? If this is a standard term, give reference. If not, define it.
    • “margin of the single-source domain”. Define what margin means here.
    • “robust context-aware representations”. What is context-aware representations? To answer this question, please give an example of a representation that is “not” context-aware. Also, why do you say it is robust? How do you define robustness; and how did you quantify robustness?
    • “seamlessly” can you provide an example of a possible architecture that is not seamless?

    2- Why do you think frequency processing should have any significant impact on the results of a neural network for segmentation, in light of the fact that neural networks already extract highly rich frequency information from images? See the example papers below: Saxe, Andrew M., et al. “On random weights and unsupervised feature learning.” ICML. Vol. 2. No. 3. 2011.

    3- You claim that your model trained on a “vessel segmentation dataset on fundus photography” generalizes well to “ultrasound images of joints with cartilage”. Why should that be? What is the limit? If you apply it to 2D slices of brain MRI, what do you expect your method will do?




Author Feedback

We are grateful for the reviewers’ valuable comments. All comments have been carefully considered and the paper has been revised accordingly. Responses to the major comments are provided below; space does not permit addressing the more detailed ones.

Q1: Details and code. (META, R3, R4) A: The comparison methods are all implemented using public code, and our code will be released publicly on GitHub. Due to length and anonymity requirements, some details were not provided previously; the following details have been added to the paper. FMAug is conducted with a Gaussian filter g(r, σ). The specific filter is given by F_0(x) = x - x * g(27, 9), and r and σ are randomly sampled from [5, 50] and [2, 22] for F_n. The mixing mask is randomly sampled with its center in [128, 384] and its size in [32, 256]. In the inference phase, our model loads images uniformly filtered by F_0. In the experiments, the data consist of training/style/test datasets, where the training dataset is augmented 10-fold and the style data are used by the comparison methods for style augmentation. Fundus photography (FP) data comprise DRIVE (40 cases)/EyePACS (88,702)/LES-AV (22) + IOSTAR (30), and ultrasound image (UI) data are collected under changing settings and split as 517/7,530/1,828. Dice and the Matthews correlation coefficient (Mcc) are used to quantify the results.
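To make the above concrete, below is a minimal sketch of the filtering and mixing reconstructed from the stated parameters. The spatial-patch mixing of two filtered views and the helper names (`high_pass`, `fmaug`) are assumptions for illustration only; the released code should be treated as the reference implementation.

```python
# Hedged sketch of FMAug, assuming F(x) = x - x * g(r, sigma) is a Gaussian
# "unsharp" high-pass filter and that mixing pastes a random rectangular patch
# of one filtered view onto the base view of the same image.
import cv2
import numpy as np

def high_pass(img, r, sigma):
    """F(x) = x - x * g(r, sigma): subtract the Gaussian-blurred (low-frequency) component."""
    k = int(r) | 1  # Gaussian kernel size must be odd
    img = img.astype(np.float32)
    return img - cv2.GaussianBlur(img, (k, k), sigma)

def fmaug(img, rng=None):
    """Frequency-mixed augmentation: mix a randomly filtered view into the base view F_0."""
    rng = np.random.default_rng() if rng is None else rng
    base = high_pass(img, 27, 9)                       # F_0, also applied at inference
    r, sigma = rng.uniform(5, 50), rng.uniform(2, 22)  # perturbed parameters of F_n
    view = high_pass(img, r, sigma)
    # Random rectangular mixing mask: center sampled in [128, 384], size in [32, 256]
    cy, cx = rng.integers(128, 385, size=2)
    h, w = rng.integers(32, 257, size=2)
    y0, y1 = max(0, cy - h // 2), min(img.shape[0], cy + h // 2)
    x0, x1 = max(0, cx - w // 2), min(img.shape[1], cx + w // 2)
    mixed = base.copy()
    mixed[y0:y1, x0:x1] = view[y0:y1, x0:x1]
    return mixed, base  # mixed view as training input, base view as the reconstruction target
```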

Q2: Impact of frequency processing on the segmentation results of neural networks. (META) A: We fully agree with the reviewer’s point that neural networks extract rich frequency information from images; this is also a foundation of our work. Previous studies on “neural networks with random weights” verified that neural networks extract frequency information from images, and recent studies [1-2] further validate that domain shifts mainly come from low-frequency information. Therefore, style transfer by swapping the low-frequency spectrum (LFS) [1-4] has been used to generalize segmentation networks. Inspired by these studies, our study attempts to extend the single-source domain by diversifying the removal of the LFS. Moreover, a self-supervision task is designed that reconstructs a specific filtered image to learn representations robust to low-frequency variation. The manuscript has been revised accordingly following this suggestion. [1] Yang Y., et al. “FDA: Fourier domain adaptation for semantic segmentation.” CVPR 2020. [2] Huang J., et al. “FSDR: Frequency space domain randomization for domain generalization.” CVPR 2021. [3] Liu Q., et al. “FedDG: Federated domain generalization on medical image segmentation via episodic learning in continuous frequency space.” CVPR 2021. [4] Xu Q., et al. “A Fourier-based framework for domain generalization.” CVPR 2021.
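For context, the LFS-swapping style transfer in the works cited above ([1], [3]) can be summarized by the illustrative sketch below; this is not FMAug itself, and the band-size parameter `beta` is a hypothetical value chosen for illustration.

```python
# Illustrative sketch of FDA-style low-frequency-spectrum (LFS) swapping.
# `beta` (fraction of the spectrum treated as low frequency) is a hypothetical
# parameter, not taken from the paper.
import numpy as np

def swap_lfs(src, ref, beta=0.01):
    """Replace the low-frequency amplitude of `src` (2D array) with that of `ref`."""
    fs = np.fft.fftshift(np.fft.fft2(src))
    fr = np.fft.fftshift(np.fft.fft2(ref))
    amp_s, pha_s, amp_r = np.abs(fs), np.angle(fs), np.abs(fr)
    h, w = src.shape
    bh, bw = int(h * beta), int(w * beta)
    cy, cx = h // 2, w // 2
    # Swap the central (low-frequency) band of the amplitude spectrum
    amp_s[cy - bh:cy + bh + 1, cx - bw:cx + bw + 1] = amp_r[cy - bh:cy + bh + 1, cx - bw:cx + bw + 1]
    out = np.fft.ifft2(np.fft.ifftshift(amp_s * np.exp(1j * pha_s)))
    return np.real(out)
```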

Q3: Experiments on fundus photography and ultrasound images. (META) A: The experiments on FP and UI are executed independently to verify our model on multiple imaging modalities, rather than generalizing the model from FP to UI. The misleading description has been revised. For more details, refer to Q1.

Q4: Revision of categorical/general statements. (META) A: We have double-checked the overstated/unclear/undefined sentences and rewritten the manuscript to describe our contributions as objectively and solidly as possible, e.g.: “Overall, our approach enables the development of accurate and generalizable segmentation models from a single-source dataset, presenting the potential to be deployed in real-world clinical scenarios.”

Q5: Differences between the sub-figures in Fig. 2. (META) A: Fig. 2 uses t-SNE to visualize the feature distributions of different datasets, with features extracted by a ResNet-18 pre-trained on ImageNet. Between Fig. 2 (1) and (2), the inner- and inter-dataset feature distances are both reduced by uniform LFS removal. Compared with (2), the feature distribution of DRIVE is extended by (3) discriminative LFS removal and by (4) FMAug. More details have been added to the paper to clarify this issue.
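For completeness, one plausible way to produce such a visualization is sketched below, assuming standard torchvision and scikit-learn usage; the exact preprocessing and t-SNE settings are not stated in the rebuttal.

```python
# Hypothetical sketch: embed images with an ImageNet-pretrained ResNet-18 and
# project the pooled features to 2D with t-SNE, as described for Fig. 2.
import torch
from torchvision import models
from sklearn.manifold import TSNE

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()  # keep the 512-d pooled features
backbone.eval()

@torch.no_grad()
def tsne_embed(images):
    """images: (N, 3, 224, 224) tensor, ImageNet-normalized; returns (N, 2) t-SNE coordinates."""
    feats = backbone(images).cpu().numpy()
    return TSNE(n_components=2, perplexity=30).fit_transform(feats)
```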




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I thank the authors for their feedback. Nonetheless, (1) I am still not satisfied with their justification of their method. (2) I cannot see the revised paper; and the paper was poorly written throughout with strong claims and ambiguous explanations of the methodology. I cannot judge if these shortcomings will be adequately addressed. Therefore, I recommend rejection of this paper.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal unfortunately did not address my concerns. The reproducibility of the paper is poor and many details are unknown, it is hard to follow the paper.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The key concerns about this paper are regarding the writing and the basis for comparison. The authors have responded to the issues with the writing and clarified that they used the public version of the existing code, which is a standard technique.


