
Authors

Shishuai Hu, Zehui Liao, Yong Xia

Abstract

Although deep learning models have achieved remarkable success in medical image segmentation, the domain shift issue, caused mainly by the highly variable quality of medical images, is a major hurdle that prevents these models from being deployed in real clinical practice, since no one can predict the performance of a ‘well-trained’ model on a set of unseen clinical data. Previously, many methods have been proposed based on, for instance, CycleGAN or the Fourier transform to address this issue, which, however, suffer from either an inadequate ability to preserve anatomical structures or unexpectedly introduced artifacts. In this paper, we propose a multi-source-domain unsupervised domain adaptation (UDA) method called Domain specific Convolution and high frequency Reconstruction (DoCR) for medical image segmentation. We design an auxiliary high frequency reconstruction (HFR) task to facilitate UDA, and hence avoid the interference of the artifacts generated by low-frequency component replacement. We also construct the domain specific convolution (DSC) module to boost the segmentation model’s ability to extract domain-invariant features. We evaluate DoCR on a benchmark fundus image dataset. Our results indicate that the proposed DoCR achieves superior performance over other UDA methods in multi-domain joint optic cup and optic disc segmentation.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16449-1_62

SharedIt: https://rdcu.be/cVRXA

Link to the code repository

https://github.com/ShishuaiHu/DoCR

Link to the dataset(s)

https://zenodo.org/record/6325549


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents an unsupervised domain adaptation method for medical image segmentation, introducing a novel domain-specific convolution (DSC) module to dynamically extract domain invariant features, and an auxiliary high frequency reconstruction (HFR) branch for filtering out task-irrelevant low-frequency features. The effectiveness of each module has been validated in the ablation study. Comparison study with several competitive unsupervised domain adaptation methods on the multi-domain RIGA dataset verifies the superiority of the proposed method. Yet, I have concerns regarding the applicability of the HFR component for noisy image segmentation. See below.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is well-written and easy to follow.
    • The two core components are novel.
    • The comparison study and ablation study are somewhat thorough. The idea of DSC with a domain-specific controller to characterise the domain-specific convolutional block is interesting and effective. The advantage of such a dynamic convolution kernel design over a shared convolution kernel with different sets of domain-specific batch normalisation layers has been discussed and validated in the experiments.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • How did the authors select the beta, the window size for filtering out the low-frequency component? The sensitivity analysis of model performance against different choices of beta is missing.
    • Did the authors use any other data augmentation in this paper, besides the Fourier style augmentation based on frequency spectrum replacement? I would like to know: if training images have been augmented with high-frequency noise, or themselves contain some high-frequency artefacts, will the HFR bias the model to keep these non-robust features in f_img? In that case, can the segmentation branch (which takes f_img as input) still produce reasonable segmentation? Has the trained segmentation model been tested on noisy test images?
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Have the authors evaluated the accuracy of the domain predictor on other unseen test data? The predictor was only trained on limited unaugmented data. Can the segmentation network still produce reasonable segmentation results when the predictor fails to predict the domain id on some OOD data? I am also interested in the source domain segmentation performance, as the DSC by design can support multi-domain segmentation.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Novelty and effectiveness.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors did not address my concern on the sensitivity of the proposed high frequency reconstruction (HFR) module to the choice of hyper-parameters. Combining Table 2 with Table S1 in the supplementary material, it can be seen that HFR with beta = 0.005 and 0.015 produces even lower OC performance on both the BASE1 and BASE2 datasets than the weakest baseline, w/o DA. This issue should be discussed and addressed. I still vote for weak accept given the novelty of the idea. Yet, given that the results are not that satisfactory and the method is not stable, I would not be surprised if the paper gets rejected in the end.



Review #2

  • Please describe the contribution of the paper

    The authors have considered the important problem of domain adaptation in medical image segmentation. They have proposed a domain specific convolution module to extract appropriate features and an additional high frequency reconstruction task to guide the domain adaptation process. They have evaluated the method on relevant datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Paper is very well written and easy to follow.

    The usage of a ResNet-18 to obtain the encoding of the input image for the domain specific controller is interesting.

    The additional branch to compensate for the differences in the low frequency data is an appreciable solution (more like multi-task learning).

    The design of experiments is reasonable and the results are also promising.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    What would happen if one-hot encoding were used instead of providing the output of the ResNet?

    A better commentary on Table 2 in the discussion would improve the paper further; for instance, a simple explanation of why DoCR is better than Intra-Domain.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have provided enough information for reproducibility of the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Check weakness section

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I enjoyed reading the paper because of its novel extension and also the experiments design.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    I believe the paper has merits for MICCAI acceptance.



Review #3

  • Please describe the contribution of the paper

    This work presents a multi-source-domain UDA method called Domain specific Convolution and high frequency Reconstruction (DoCR) for medical image segmentation, where an auxiliary high frequency reconstruction (HFR) task is proposed to facilitate UDA and the domain specific convolution (DSC) module is constructed to boost the segmentation model’s ability to extract domain-invariant features. The experimental evaluation on a benchmark fundus image dataset demonstrated the superior performance of the proposed DoCR over other UDA methods in multi-domain joint optic cup and optic disc segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The design of HFR is based on the assumption that style information is embedded in the low-frequency components and structural information in the high-frequency components. The authors therefore propose HFR to filter out the low-frequency components, where most style information is located, and hence preserve the structural information for UDA, which is interesting.
    2. The authors conduct extensive comparative experiments with eight methods, including baseline, data-level, feature-level and decision-level, to evaluate the performance of their method, which is a plus.
    3. Fig. 1 is clear and aids understanding, and the paper is well-written and nicely organized.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The description of the DSC module does not seem very clear. For example: (1) The light-weight DSC head is missing in Fig. 1. (2) What is the domain code designed for? (3) How is x^a generated? What augmentation method is used? (4) Why can the DSC module extract domain-insensitive features?
    2. In Table 2, the results produced by w/o DA are better than those of some methods with domain adaptation. This confuses me, and the necessary analysis seems to be missing.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The source code of this work is provided, which is a plus. I believe this work is reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. The authors should discuss the issues I mentioned.
    2. Comparing the results between w/o DA and Intra-Domain, the domain shift is not very severe. So I encourage the authors to evaluate their proposed method on other tasks with more severe domain shift.
    3. I recommend 5-fold cross-validation to verify the effectiveness of the proposed method.
    4. I encourage the authors to visualize the reconstructed image at the inference stage to prove the effectiveness of the proposed HFR module.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The description of the DSC module and some experimental results confused me, as I mentioned before.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors answered my questions well. Overall, the proposed method is novel and effective. Therefore, I recommend accepting this paper.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The key contribution of this work is a multi-source unsupervised domain adaptation method applied to fundus image analysis. This work addresses an important limitation of several domain adaptation methods, wherein the feature and structure-specific textural characteristics of the images are often not preserved when doing synthesis-based domain adaptation. The paper is very well written, with clear technical novelties and results demonstrating the improvement over existing methods. However, reviewers raised some concerns regarding the robustness of the approach to noisy test sets, a lack of sufficient details for the domain specific convolution (DSC) module and Figure 1, the applicability of the HFR component for noisy image segmentation, as well as the choice of the ResNet output instead of one-hot encoding. A sensitivity analysis for the choice of the beta parameter and a better discussion of the results are needed. Otherwise, this is a very well-written and excellent manuscript. The authors are strongly encouraged to address the aforementioned reviewers’ concerns in their rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1




Author Feedback

We sincerely thank all reviewers and ACs for their invaluable comments. The code of this work is in supplementary files and will be available on GitHub.

#R1Q1: Value of beta The sensitivity analysis of model performance against beta is shown in Table 1 of the supplementary material. It shows that HFR achieves the best performance on all target domains when beta is set to 0.010.

#R1Q2: Robustness to high-frequency noises/artefacts We employed conventional augmentation techniques, such as random rotation, cropping, scaling, and adding Gaussian noise, in an ‘online’ manner to diversify the training data. HFR is performed on f_img, which inevitably contains high-frequency components other than the boundaries of the targets. We believe our model is robust to high-frequency noise/artefacts, based on the observation that the segmentation head can suppress these ‘noises’, just as any other multi-task network does.
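For reference, the low-frequency filtering behind HFR (with beta controlling the window size, see #R1Q1) can be sketched in a few lines of numpy. The centered square window of relative size beta in the shifted spectrum is an assumption for illustration; the paper's exact masking scheme may differ.

```python
import numpy as np

def high_frequency_component(img: np.ndarray, beta: float = 0.010) -> np.ndarray:
    """Zero out a centered low-frequency window of relative size beta in the
    shifted spectrum and return the high-frequency image (an HFR-style target).
    The square-window shape is an illustrative assumption."""
    h, w = img.shape
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    ch, cw = h // 2, w // 2
    bh, bw = max(1, int(beta * h)), max(1, int(beta * w))
    # Remove the low-frequency block, where most style information resides.
    spectrum[ch - bh:ch + bh + 1, cw - bw:cw + bw + 1] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))

hf = high_frequency_component(np.random.rand(256, 256), beta=0.010)
print(hf.shape)  # (256, 256)
```

Since the DC coefficient is inside the removed window, the resulting image has (numerically) zero mean, which is one quick sanity check that only high frequencies remain.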

#R1Q3: Accuracy of domain predictor and its impact The domain predictor aims to ‘soften’ the domain code of a Fourier augmented image x^a. During inference, no Fourier augmented image is generated, and this predictor is hence abandoned. We re-split the source domain dataset into 80% for training and 20% for testing, and got an average domain prediction accuracy of 91.87%, confirming its reliability. We also compared our DoCR with baseline trained with mixed source domain data and observed an average improvement from (95.60, 82.05) to (96.01, 83.99) in source domain segmentation.
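The Fourier augmentation that produces x^a (replacing the low-frequency components of a source-domain image with those of another image, as described in #R3Q1) could look like the following FDA-style sketch. Whether amplitude only or the full low-frequency spectrum is swapped, and the window size, are assumptions here.

```python
import numpy as np

def fourier_augment(src: np.ndarray, ref: np.ndarray, beta: float = 0.010) -> np.ndarray:
    """Swap the low-frequency amplitude of `src` with that of `ref` (FDA-style);
    the phase, and hence the structure, is kept from `src`."""
    fs = np.fft.fftshift(np.fft.fft2(src))
    fr = np.fft.fftshift(np.fft.fft2(ref))
    amp, pha = np.abs(fs), np.angle(fs)
    h, w = src.shape
    ch, cw = h // 2, w // 2
    bh, bw = max(1, int(beta * h)), max(1, int(beta * w))
    # Replace the centered low-frequency amplitude block with the reference's.
    amp[ch - bh:ch + bh + 1, cw - bw:cw + bw + 1] = \
        np.abs(fr)[ch - bh:ch + bh + 1, cw - bw:cw + bw + 1]
    return np.real(np.fft.ifft2(np.fft.ifftshift(amp * np.exp(1j * pha))))

rng = np.random.default_rng(0)
src, ref = rng.random((128, 128)), rng.random((128, 128))
x_a = fourier_augment(src, ref)   # 'styled' like ref, structured like src
```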

#R2Q1: Why not using one-hot domain encoding The domain code is one-hot for a source/target domain image and not one-hot for a Fourier augmented image. If only one-hot encoding were used, a Fourier augmented image would be treated as coming from a single source domain, and the DSC module would deteriorate into a Multi-Input-Head. However, ‘Multi-Input-Head’ results in lower performance than our DSC module (see Table 3).

#R2Q2: Why DoCR outperforms ‘Intra-Domain’ in OD segmentation This observation confirms the ability of our DoCR to extract domain-invariant features from target domain images. The superior OD segmentation performance comes from the UDA setting, which enables our DoCR to be trained on more labeled data from the source domains (see Table 1).

#R3Q1: Details of DSC module

  1. The DSC head contains a 3x3 convolutional layer with 32 channels, a ReLU layer, and a BN layer. The yellow box in Fig. 1 represents the DSC head, where ReLU and BN are omitted.
  2. The domain code is designed for generating the domain-specific convolutional filters in the DSC head.
  3. x^a is generated by replacing the low-frequency components of a source-domain image with those of an image from either another source domain or a target domain.
  4. The domain-specific controller takes a domain code as its input and produces domain-specific convolutional filters, which are used in the DSC head (yellow box in Fig. 1). Since the DSC module uses those domain-specific convolutional filters specifically produced for the input image, it can extract domain-insensitive features.
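Points 2 and 4 above can be sketched as a dynamic convolution whose filters are generated by a controller from the (possibly soft) domain code. The sizes below are illustrative assumptions, and a 1x1 convolution stands in for the paper's 3x3, 32-channel DSC head to keep the sketch short.

```python
import numpy as np

rng = np.random.default_rng(0)
n_domains, c_in, c_out = 3, 4, 8  # illustrative sizes (the paper uses 32-channel 3x3 filters)

# Domain-specific controller: here a single linear map from the domain code
# to the flattened weights of a 1x1 convolution.
W_ctrl = rng.standard_normal((n_domains, c_out * c_in))

def dsc_head(feat: np.ndarray, domain_code: np.ndarray) -> np.ndarray:
    """feat: (C_in, H, W); domain_code: (n_domains,), one-hot or softened.
    Returns domain-conditioned features of shape (C_out, H, W)."""
    filters = (domain_code @ W_ctrl).reshape(c_out, c_in)  # controller output
    out = np.einsum('oc,chw->ohw', filters, feat)          # dynamic 1x1 convolution
    return np.maximum(out, 0)                              # ReLU (BN omitted)

feat = rng.standard_normal((c_in, 16, 16))
one_hot = np.array([1.0, 0.0, 0.0])   # source/target domain image
soft = np.array([0.6, 0.3, 0.1])      # Fourier-augmented image ('softened' code)
print(dsc_head(feat, one_hot).shape)  # (8, 16, 16)
```

Because the filters are a function of the domain code, a softened code (as for x^a) yields a blend of the per-domain filters rather than forcing a hard domain assignment.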

#R3Q2: Why ‘w/o DA’ outperforms some DA methods Surprisingly, ‘w/o DA’ outperforms CyCADA, BEAL, and pOSAL in OC segmentation on BASE1. A possible explanation is that the domain gap (reflected by the performance gap between ‘Intra-Domain’ and ‘w/o DA’) in BASE1 is smaller than that between the source and the other two target domains. Moreover, the data- or decision-level adversarial learning-based methods must fool the discriminator, resulting in less attention being paid to the segmentation task.

#R3Q3: Test on other tasks with more severe domain shift Table 2 shows that the domain shift in BASE2 is severe, i.e., a decrease in Dice of 9.61% in OC segmentation. Even so, our DoCR improves the Dice of OC segmentation from 79.22% to 86.17%. Further tests on other tasks with more severe domain shifts will be conducted.

#R3Q4: 5-fold cross-validation and visualization of reconstructed images Will be provided in the final version.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The key contribution of this work is a multi-source unsupervised domain adaptation method applied to fundus image analysis. This work addresses an important limitation of several domain adaptation methods, wherein the feature and structure-specific textural characteristics of the images are often not preserved when doing synthesis-based domain adaptation. The paper is very well written with clear technical novelties, and results demonstrating the improvement over existing methods. Some concerns of the reviewers regarding sensitivity analysis and robustness of the method to noisy datasets could be better discussed. Otherwise, this is a very well-written and excellent manuscript.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper introduces a novel medical image segmentation using domain adaptation and domain-specific convolution. The reviewers agree that the two modules (domain-specific convolution and high-frequency reconstruction-based domain adaptation) are novel. Although one of the reviewers downgraded the rating from 6 to 5 after rebuttal, the other reviewer upgraded the rating from 4 to 6, so the overall rating leans towards acceptance. I think that the paper has some merits that outweigh the weakness, so I recommend accepting this paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper presents an unsupervised domain adaptation method for medical image segmentation, introducing a novel domain-specific convolution (DSC) module to dynamically extract domain invariant features, and an auxiliary high frequency reconstruction (HFR) branch for filtering out task-irrelevant low-frequency features.

    Reviewers basically agree to accept the paper but have some concerns about sensitivity to hyper-parameters.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5


