
Authors

Alvaro Gomariz, Huanxiang Lu, Yun Yvonna Li, Thomas Albrecht, Andreas Maunz, Fethallah Benmansour, Alessandra M. Valcarcel, Jennifer Luu, Daniela Ferrara, Orcun Goksel

Abstract

Accurate segmentation of retinal fluids in 3D Optical Coherence Tomography images is key for diagnosis and personalized treatment of eye diseases. While deep learning has been successful at this task, trained supervised models often fail for images that do not resemble labeled examples, e.g. for images acquired using different devices. We hereby propose a novel semi-supervised learning framework for segmentation of volumetric images from new unlabeled domains. We jointly use supervised and contrastive learning, also introducing a contrastive pairing scheme that leverages similarity between nearby slices in 3D. In addition, we propose channel-wise aggregation as an alternative to conventional spatial-pooling aggregation for contrastive feature map projection. We evaluate our methods for domain adaptation from a (labeled) source domain to an (unlabeled) target domain, each containing images acquired with different acquisition devices. In the target domain, our method achieves a Dice coefficient 13.8% higher than SimCLR (a state-of-the-art contrastive framework), and leads to results comparable to an upper bound with supervised training in that domain. In the source domain, our model also improves the results by 5.4% Dice, by successfully leveraging information from many unlabeled images.
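As a purely illustrative aid (not taken from the paper), the sketch below contrasts a conventional spatial-pooling projection head, as used in SimCLR-style frameworks, with one possible reading of a channel-wise aggregation alternative. The exact C_pool and C_ch designs proposed in the paper may differ; the channel-wise variant here is an assumption for illustration only.

```python
# Illustrative sketch only: a SimCLR-style spatial-pooling projection head and a
# hypothetical channel-wise aggregation variant. Not the paper's actual C_pool / C_ch.
import torch
import torch.nn as nn

class SpatialPoolProjection(nn.Module):
    """Conventional aggregation: average the feature map over H and W, then an MLP."""
    def __init__(self, channels: int, proj_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels), nn.ReLU(inplace=True),
            nn.Linear(channels, proj_dim),
        )

    def forward(self, fmap: torch.Tensor) -> torch.Tensor:  # fmap: (B, C, H, W)
        pooled = fmap.mean(dim=(2, 3))                       # (B, C)
        return self.mlp(pooled)                              # (B, proj_dim)

class ChannelAggProjection(nn.Module):
    """Hypothetical channel-wise aggregation: collapse channels with a 1x1 conv,
    keeping the spatial layout before projecting (an assumption, not the paper's design)."""
    def __init__(self, channels: int, num_pixels: int, proj_dim: int = 128):
        super().__init__()
        self.channel_agg = nn.Conv2d(channels, 1, kernel_size=1)
        self.mlp = nn.Sequential(
            nn.Linear(num_pixels, proj_dim), nn.ReLU(inplace=True),
            nn.Linear(proj_dim, proj_dim),
        )

    def forward(self, fmap: torch.Tensor) -> torch.Tensor:   # fmap: (B, C, H, W)
        agg = self.channel_agg(fmap).flatten(1)               # (B, H*W)
        return self.mlp(agg)                                   # (B, proj_dim)
```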

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_34

SharedIt: https://rdcu.be/cVVpP

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a contrastive learning-based method to address the unsupervised domain adaptation problem for the OCT segmentation task. The key contribution is the positive pair selection strategy: the authors utilize not only augmented samples but also surrounding slices in the 3D volume. The results on their own collected dataset, together with various ablation studies, show the method's effectiveness.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is easy to follow and well organized.
    2. Utilizing the adjacent slice as the positive sample is interesting and reasonable.
    3. The ablation studies are exhaustive, and the method improves considerably over the baseline methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. As stated in the paper, the final pairing strategy first generates P_slice pairs and then applies augmentation to them. However, this operation omits the original P_augm, which augments the same slice only. Results under P_augm + P_comb would be interesting (the pairing variants are illustrated in the sketch after this list).

    2. The updated projection head brings only a modest improvement, from 27.21 to 27.77 in Table 1.

    3. “In Table 1 and Table S2, UpperBound results for a supervised model trained on labeled data from the target domain are also reported for comparison. This labeled data, used here as a reference, is ablated for all other models.” According to this description, the Dice performance of SegCLR(Pcomb, Cch) should be 29.32 + 27.77 = 57.09, right? Is that not too low for a segmentation task?

    4. Only one dataset is utilized for testing. The generalization ability of the proposed method may need to be verified on other datasets.

    5. Missing comparison with other unsupervised domain adaptation methods.
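    For clarity, the following is a minimal, purely illustrative sketch of the three pairing variants discussed in weakness 1 (P_augm, P_slice, P_comb), as understood from the paper's description. The function names, the `augment` callable, and the offset parameter are placeholders, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): three ways of forming positive pairs
# for contrastive learning on a 3D OCT volume of shape (num_slices, H, W).
import random
import numpy as np

def p_augm(volume: np.ndarray, i: int, augment):
    """P_augm: two augmented views of the same slice i."""
    return augment(volume[i]), augment(volume[i])

def p_slice(volume: np.ndarray, i: int, max_offset: int = 1):
    """P_slice: slice i paired with a nearby slice (within max_offset), no augmentation."""
    j = int(np.clip(i + random.choice([-max_offset, max_offset]), 0, len(volume) - 1))
    return volume[i], volume[j]

def p_comb(volume: np.ndarray, i: int, augment, max_offset: int = 1):
    """P_comb: first pick a nearby slice, then augment both views."""
    a, b = p_slice(volume, i, max_offset)
    return augment(a), augment(b)
```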

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No code is provided. Some parameters have been provided in the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    See above.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    No comparison with the state-of-the-art unsupervised domain adaptation methods. Only one evaluation dataset is utilized.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    The authors propose a semi-supervised contrastive learning approach for domain adaptation, including an augmentation strategy for pair generation and a projection head for generating embeddings from convolutional features. The model is validated on Spectralis and Cirrus datasets originating from clinical trials.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    I find this paper interesting, especially the application of contrastive learning to the problem of domain shift between OCT devices. The method seems well motivated, novel to some extent (as the majority of methods focus on adversarial domain adaptation), and validated.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • some formulas are not clear
    • validation compares only with contrastive learning methods
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method is reproducible

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • Some formulas are not clear: tilde notation is typically applied to random variables. It is more suitable to use x \in D in the sum limit, or to specify sum limits explicitly. Also, in formula (1) for example, is the summation performed over all samples of the dataset or over all pixels of y_i and F(x_i)? The same applies to other formulas. What is the difference between d1 and d2? Both have the L2 norm in the denominator. Formula (5) uses the \in notation instead of the tilde. The formulas require revision.

    • The validation compares only with contrastive learning methods. The paper lacks a comparison with domain adaptation methods of other types, for instance CycleGANs [1,2], or adversarial domain adaptation [3] or [4]. Classification methods can be adapted to segmentation.

    In Table 1, please also mention the baseline score in absolute numbers; for positive numbers it might be more intuitive to use “+” to highlight that these numbers are relative.

    [1] Using CycleGANs for Effectively Reducing Image Variability Across OCT Devices and Improving Retinal Fluid Segmentation. https://arxiv.org/pdf/1901.08379.pdf
    [2] Domain Adaptation via CycleGAN for Retina Segmentation in Optical Coherence Tomography. https://arxiv.org/pdf/2107.02345.pdf
    [3] Unsupervised Domain Adaptation for Cross-Device OCT Lesion Detection via Learning Adaptive Features. https://ieeexplore.ieee.org/document/9098380
    [4] Unsupervised Domain Adaptation by Backpropagation. https://arxiv.org/abs/1409.7495

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I tend to weak accept since, on the one hand, it is an interesting method and an interesting application, while on the other hand the paper is slightly lacking in the validation part.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors addressed the concerns raised in the reviews, most importantly the validation part. I am happy to raise the score.



Review #3

  • Please describe the contribution of the paper

    The authors propose an unsupervised domain adaptation method for the segmentation of 3D OCT images. They train a U-Net segmentation model with full supervision on the source domain, and explore the effect of an adapted contrastive loss function, a new augmentation strategy, and a new projection head with channel-wise aggregation. They state that they outperform other contrastive frameworks on the target domain, and achieve results similar to (or sometimes better than) the supervised method on the source domain.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well written and clearly structured. The method and the performed experiments are described in detail. The evaluations, comparison to SimCLR and SimSiam, as well as the ablation study are sound. The presented method can outperform existing frameworks on the given dataset for the segmentation task. While the contrastive framework is well known, the novelty lies in the combination of the adapted contrastive loss, the augmentation strategy, and a new projection head. The supplementary material also includes a hyperparameter analysis.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • In Table 1, only the scores relative to the baseline are given. This is confusing, as the reader has no idea how good or bad the performance of the baseline method is. In the text, it is stated that the baseline method has a poor performance on D_t, but this performance is shown nowhere in the main paper. Looking it up in the supplementary for each class is a bit cumbersome.
    • It is unclear whether this approach operates on the 3D volumes or only on 2D slices. In the abstract and introduction, it is stated that this is a 3D segmentation. However, the pairing is performed on the slices, and the cited U-Net is also originally implemented in 2D. Is the segmentation performance then computed in 2D or 3D?
    • In Figure 3, the relative scores are reported, which is somewhat confusing in my perception. It would be nice to compare the real performance on D_s and D_t, instead of a relative loss of performance.
    • In Section 3.4, the authors state that their method with unsupervised domain adaptation remarkably produces results within the ranges of experts’ variability. However, I do not see that in Figure 4. If the authors want to make this statement, they should explain more about this in Section 3.4.
    • The proposed projection head C should be included in Figure 1.
    • Some comment on the model complexity (e.g. number of parameters) should be given.
    • The authors only compared themselves against similar contrastive learning approaches. It would be interesting to see the performance for other UDA methods such as disentanglement- or adversarial-based methods.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The loss function and hyperparameters are described. The architecture is described. However, there is no link to code and no declaration of the authors to make the code publicly available. It is also unclear whether the dataset is publicly available or not.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Please address all points listed under “weaknesses”. Please refer in the main paper to the results given in the supplementary, as they help for the understanding.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This approach introduces three novel changes to an existing contrastive learning framework, which lead to an improvement of the segmentation performance in an unsupervised domain adaptation scenario. While the method is well presented and the results are sound, the presentation of the results should be improved. Moreover, comparison to more state-of-the-art methods would be interesting.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    All three reviewers noted the novelty in the method and appreciated the proposed use of contrastive learning for domain adaptation, and found that the paper was well written except for the presentation of the results. They also all remarked on the missing comparison to alternative domain adaptation strategies. In the rebuttal, among other points, please clarify whether the segmentation is 3D or 2D, as well as the comparison to inter-grader variability as noted by R3.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4




Author Feedback

We address reviewer (R#) comments (C#) on weaknesses, grouped topically:

Comparison to other unsupervised domain adaptation (UDA) strategies: We originally focused on the state-of-the-art contrastive learning (CL) literature: SimSiam, SimCLR. As suggested (R1C5, R2C2, R3C7), below are additional comparisons:

  • CycleGAN: we adapt [1] (R2C2) to our UNet by using entire slices. Training converged with meaningful translated images from Dt to Ds, on which we ran the pretrained UNet.
  • Domain Adversarial Neural Network (DANN): since [2] (R2C2) is specific to detection, we instead adapted the gradient reversal layer (GRL) from [3] with the design in [4] for segmentation (a generic sketch of the GRL is given after the references below).

With the remaining parameters the same as in our paper, the relative metrics (Dt Dice & UVD, Ds Dice & UVD) as in Tab. 1 are:
CycleGAN: -6.53, +2.41, 0, 0
DANN: +17.93, -5.25, -0.51, +0.02

DANN performs better than Baseline & Finetuning, but much worse than our proposed SegCLR(Pcomb, Cch). CycleGAN performs worse than even our Dt Baseline (contrary to what is shown in [1], likely because our Baseline is much superior to that of [1], which has a Dice of ~0). Also, CycleGAN in [1] performs much worse than their UpperBound, while our proposed SegCLR is close (and at times superior) to our UpperBound!

In conclusion, our method is superior to the compared CL & adversarial methods.

[1] Seeböck et al., “Using CycleGANs… across OCT…”, ISBI’19
[2] Yang et al., “UDA for cross-device OCT…”, ISBI’20
[3] Ganin & Lempitsky, “UDA by backpropagation”, PMLR’15
[4] Bolte et al., “UDA to improve image segmentation…”, CVPR W’19
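For reference, the gradient reversal layer from [3] used in the DANN comparison can be sketched as below. This is the standard published formulation with a hypothetical domain head attached to encoder features; it is not the authors' exact segmentation adaptation following [4].

```python
# Generic gradient reversal layer (GRL) from Ganin & Lempitsky [3]; reference sketch only.
import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambda in the backward pass."""
    @staticmethod
    def forward(ctx, x, lamb: float):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

class DomainClassifier(nn.Module):
    """Hypothetical domain head attached to encoder features via the GRL."""
    def __init__(self, channels: int, lamb: float = 1.0):
        super().__init__()
        self.lamb = lamb
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, 2),  # source vs. target domain logits
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.head(GradReverse.apply(features, self.lamb))
```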

Comparison of number of parameters: (R3C6) Cpool increases the UNet (Baseline) parameters by 6.85% and 7.33% for SegCLR and SegSiam, respectively. Cch adds a mere 0.03% more parameters than Cpool.

2D or 3D UNet: (R3C2) We segment individual slices with a 2D UNet and evaluate per slice, since (1) only some slices were annotated in the OCT volumes; (2) this enables our slice-contrasting scheme.

Revising formulas: (R2C1) d1 uses the L1 norm, and d2 the L2 norm (see Sec. 2.1). We agree to use “x \in D” for summation. Eq. 1 will be clarified so that the sums occur over pixels, while the loss is a sum across samples.
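As a purely illustrative example of the clarified notation (not the paper's exact Eq. 1), a supervised segmentation loss with explicit sum limits could read:

```latex
% Illustrative only: explicit sum limits as suggested by R2, not the paper's actual Eq. (1).
% The outer sum runs over labeled samples of the source dataset D_s, the inner sum over
% pixels p of the image domain Omega, with a per-pixel loss term l.
\mathcal{L}_{\mathrm{sup}}
  = \sum_{(x_i,\, y_i) \in \mathcal{D}_s} \; \sum_{p \in \Omega}
    \ell\!\left( F(x_i)_p,\; y_{i,p} \right)
```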

Use of relative vs. absolute metrics: Averaging metrics across classes with large variation may lead to bias. Thus, we first normalized each metric (m^c_i) for method i and class c by its class Baseline (m^c_bas), and then averaged them, i.e. the average of (m^c_i - m^c_bas) over all c, for reporting in Table 1. This is similar to paired tests in statistics. Yet we also reported per-class absolute metrics: illustratively in Fig. 2 and via tabulation in Tables S2 & S3. To address any confusion (R1C2&3) and to improve presentation (R2C3, R3C1), we will explain the above and add averaged absolute metrics (which indicate similar improvements) as a column in Tab. 1.
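A minimal sketch of the class-wise normalization described above; the function name, dictionary layout, and placeholder class keys are assumptions for illustration, not the authors' code or data.

```python
# Illustrative sketch of the relative metric in Table 1: each per-class metric is first
# expressed relative to the class-wise Baseline, then averaged over classes.
def relative_metric(per_class_metric: dict, per_class_baseline: dict) -> float:
    """Average of (m^c_i - m^c_bas) over all classes c."""
    classes = per_class_metric.keys()
    return sum(per_class_metric[c] - per_class_baseline[c] for c in classes) / len(classes)

# Usage with placeholder class names and values (purely illustrative):
method = {"class_1": 0.62, "class_2": 0.55, "class_3": 0.48}
baseline = {"class_1": 0.40, "class_2": 0.30, "class_3": 0.35}
print(relative_metric(method, baseline))  # mean improvement relative to the Baseline
```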

Value of C_ch w.r.t. C_pool: (R1C2&3) Note that the small (0.56) Dice difference puts C_ch a striking 36% closer to the UpperBound than C_pool. Indeed, C_ch is superior to all methods in both domains and metrics, and is the only method superior even to the D^s Baseline in both metrics.

Intergrader variability: (R3C4) Whereas similar literature (on UDA & CL) often compares only to Baseline/UpperBound, we intended to further analyze w.r.t. intergrader variability. We will replace the original “… within the ranges of experts’ variability” with the more precise statement: “We evaluated segmentation metrics for graders by comparing them with one another. We deem our method within intergrader variability when its metric for a class and image with respect to any grader is better than at least one human intergrader metric (variation). Across images and classes, SegCLR(Pcomb, Cch) performs within such intergrader variability in 65.34% and 48.30% of cases based on Dice and UVD, respectively.”
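The criterion in the replacement statement can be read, per image and class, roughly as in the hedged sketch below. It assumes a higher-is-better metric such as Dice (the comparison would be reversed for UVD); variable names and data layout are placeholders, not the authors' code.

```python
# Illustrative sketch of the intergrader-variability criterion quoted above.
def within_intergrader_variability(model_vs_grader: dict, grader_vs_grader: dict) -> bool:
    """For one image and one class:
    model_vs_grader[g]       = metric of the model's segmentation against grader g
    grader_vs_grader[(g, h)] = metric of grader g's segmentation against grader h
    The case counts as 'within intergrader variability' if the model's metric against
    ANY grader is better than AT LEAST ONE grader-vs-grader metric (higher is better)."""
    best_model = max(model_vs_grader.values())
    worst_pairwise = min(grader_vs_grader.values())
    return best_model > worst_pairwise
```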

To leave space for adding the CycleGAN & DANN results and other clarifications, we will remove the intergrader Fig. 4, which becomes redundant given the explanation added above.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    With the rebuttal, the authors have clarified the issues around the inter-grader variability and the originally confusing way to report the results. I find it a strong and interesting paper addressing an important domain adaptation challenge.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    There seems to be consensus among the reviewers that the proposed method is novel and an interesting use of contrastive learning. There were some concerns about the validation and the lack of some comparisons, which appear to be resolved after the rebuttal and led one of the reviewers to raise their score. The use of a single source/target dataset pair might limit the generalizability of the results to some extent, but I do not think this is a fatal flaw.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    All reviewers agree the paper has its merits, especially the novel use of the contrastive learning idea here. There are also concerns, e.g. about the empirical evaluations. Overall I feel the paper is interesting and could be considered for publication after serious modifications based on the reviews.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    12


