
Authors

Siyi Du, Nourhan Bayasi, Ghassan Hamarneh, Rafeef Garbi

Abstract

Despite its clinical utility, medical image segmentation (MIS) remains a daunting task due to images’ inherent complexity and variability. Vision transformers (ViTs) have recently emerged as a promising solution to improve MIS; however, they require larger training datasets than convolutional neural networks. To overcome this obstacle, data-efficient ViTs were proposed, but they are typically trained using a single source of data, which overlooks the valuable knowledge that could be leveraged from other available datasets. Naïvely combining datasets from different domains can result in negative knowledge transfer (NKT), i.e., a decrease in model performance on some domains with non-negligible inter-domain heterogeneity. In this paper, we propose MDViT, the first multi-domain ViT that includes domain adapters to mitigate data-hunger and combat NKT by adaptively exploiting knowledge in multiple small data resources (domains). Further, to enhance representation learning across domains, we integrate a mutual knowledge distillation paradigm that transfers knowledge between a universal network (spanning all the domains) and auxiliary domain-specific network branches. Experiments on 4 skin lesion segmentation datasets show that MDViT outperforms state-of-the-art algorithms, with superior segmentation performance and a fixed model size at inference time, even as more domains are added. Our code is available at https://github.com/siyi-wind/MDViT.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43901-8_43

SharedIt: https://rdcu.be/dnwDR

Link to the code repository

https://github.com/siyi-wind/MDViT

Link to the dataset(s)

[1] ISIC 2018: https://challenge.isic-archive.com/data/#2018

[2] Dermofit Image Library: https://licensing.edinburgh-innovations.ed.ac.uk/product/dermofit-image-library

[3] Skin Cancer Detection: https://uwaterloo.ca/vision-image-processing-lab/research-demos/skin-cancer-detection

[4] PH2: https://www.fc.up.pt/addi/ph2%20database.html


Reviews

Review #1

  • Please describe the contribution of the paper

    This work proposes multi-domain vision transformers while addressing negative knowledge transfer.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper is relatively well written, with clear directions and a good literature review. This work explores multi-domain learning, which is an interesting topic for the MICCAI community. Observations made in this work could be helpful for better understanding inter-domain knowledge transfer.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Better statistical analysis is required to validate the reported improvements and distinguish them from noise.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This work can be reproduced.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Instead of providing mean and std over all four datasets, mean and std for individual datasets should be provided. Statistical significance should be evaluated to support the claims made by the authors.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper’s structure and direction.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    I thank the authors for including statistical analysis to corroborate their findings. The paper is well written, with some good observations (performance on smaller vs. larger datasets) that will be interesting for the community. I am sticking to my original score for this paper.



Review #2

  • Please describe the contribution of the paper

    The paper proposes MDViT, a multi-domain ViT for medical image segmentation that includes domain adapters to mitigate data-hunger and combat negative knowledge transfer by adaptively exploiting knowledge in multiple small data domains. MDViT is a U-Net-shaped ViT architecture with a domain adapter module inside the factorized multi-head self-attention, to adapt the model to different domains, and a mutual knowledge distillation strategy to extract more robust representations across domains. It consists of a universal network and M auxiliary network branches, one per domain, which are used only during training.
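
    As a purely illustrative aside, the domain-adapter idea can be pictured as shared self-attention whose output is modulated by a learned per-domain embedding. The sketch below is an assumption for intuition only, not the authors’ actual implementation (their repository contains the real code):

```python
# Hypothetical sketch of domain-conditioned attention, for intuition only;
# the module name, gating mechanism, and shapes are illustrative assumptions.
import torch
import torch.nn as nn

class DomainAdaptedAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int, num_domains: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.domain_gates = nn.Embedding(num_domains, dim)  # one gate vector per domain

    def forward(self, x: torch.Tensor, domain_id: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(x, x, x)                         # shared self-attention
        gate = torch.sigmoid(self.domain_gates(domain_id))  # (B, dim)
        return out * gate.unsqueeze(1)                      # domain-specific modulation

# x: (batch, tokens, dim); domain_id: (batch,) integer domain labels
layer = DomainAdaptedAttention(dim=64, num_heads=4, num_domains=4)
y = layer(torch.randn(2, 16, 64), torch.tensor([0, 3]))
```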

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well written and well explained and organized. The architecture is well described.

    A good comparison with the state of the art and ablation studies have been conducted.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper states that the experiments were executed using 5-fold cross-validation, but no standard deviation is reported in the performance table, which renders the slight improvements in performance meaningless.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors detailed both the architecture and the adopted training procedure.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Reporting the standard deviation would reinforce the validity of the assessment, especially when the improvements are so slight.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the architecture is interesting, the results are not substantiated with standard deviation, and the very slight improvements are therefore not very meaningful.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    The authors have met my request, but in any case the improvements are minimal; thus I confirm my rating.



Review #3

  • Please describe the contribution of the paper

    In this paper, a Multi-domain Vision Transformer (MDViT), trained on 4 skin lesion datasets (domains) in a multi-domain manner, is presented. MDViT enhances other ViTs via domain adapters (as part of the MHSA) that counteract negative knowledge transfer, and via mutual knowledge distillation that improves the knowledge representation of the baseline network. The presented approach is a combination and extension of many existing methods of multi-domain learning.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is very well written and provides extensive analysis of results. The problem of negative knowledge transfer is well presented, and the approach of combating it with domain adapters and mutual knowledge distillation is clearly explained and justified. MDViT performs better in comparison to SOTA methods. MDViT uses additional branches during training; during inference, only the universal (baseline) network is required.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The analysis of the statistical significance of differences between results is not presented. MDViT without domain-specific normalization performs worse than BASE with multi-domain adaptive training and domain-specific normalization. It seems that there is only a small performance gain from DA and MKD.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The Authors used 4 publicly available datasets and state that they will publish the code upon acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The Authors state that MDViT alleviates ViTs’ data-hunger. It is not entirely clear whether this is due to combating NKT alone or whether there is another reason. It would be interesting to see whether the method scales to bigger datasets.

    Please provide an analysis of the statistical significance of the differences in results between BASE and MDViT.

    The BASE method seems to have more parameters and performs worse than SwinUNETR or TransFuse when trained jointly (JT) or separately (ST). Why did the Authors decide to create a new baseline instead of using an existing model? What is the advantage of using BASE as the main network?

    The Authors state that the size of the network remains fixed at inference time when adding new domains. This would be true for any network trained using joint training. To retrain MDViT, additional parameters for domain adapters would be required; please comment on that aspect.

    Only 1212 images from the DMF dataset, exhibiting similar lesion conditions, were used. If the DA and MKD mechanisms successfully combat NKT, I assume the whole dataset could be used. Please discuss this.

    How would the network perform on unseen datasets? It would be interesting to see the case where the network is trained on 3 datasets and tested on the fourth.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is very well written and tackles the issue of scarce medical datasets coming from multiple sources. It introduces domain adapter and mutual knowledge distillation mechanisms in an interesting way, combating issues of multi-domain learning. While the technical contribution is based on a combination and extension of multiple existing methods, and the performance of MDViT is not significantly higher than those of other SOTA methods and the baseline, the paper fits the MICCAI conference well.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The Authors have answered my main concerns; thus, I have decided to keep my positive assessment of the paper.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This submission proposes a multi-domain vision transformer to exploit multiple small datasets. The originality resides in proposing mutual knowledge transfer between a cross-domain network and domain-specific network branches. The evaluation is on four skin lesion segmentation datasets. The reviews range from weak rejection to acceptance. The authors are therefore invited to address the following main concern in a rebuttal:

    • Statistical significance - The gain from the proposed multi-domain transformer may be considered marginal (R1, R2, R3) when taking into account the performance and its variation (92.52% to 92.56%). A clarifying discussion of the statistical significance is necessary.




Author Feedback

  • (Meta-Review, R1,2,3) Statistical significance of results

To test the statistical significance of MDViT’s improvements, we report the p-values produced by the paired-sample t-test under the null hypothesis H0 that the mean average IOU of MDViT and of the respective SOTA method are equal:

Method        ST       JT
BASE          0.0005   0.0005
SwinUnet      0.0011   0.0190
UTNet         0.0001   0.0022
BAT           0.0005   0.0046
TransFuse     0.0669   0.0506
SwinUNETR     0.0149   0.0181
MAT (Rundo)   0.0005   (multi-domain; single p-value)
MAT (Wang)    0.0034   (multi-domain; single p-value)

Adopting p<0.05 as the threshold for rejecting H0, our MDViT improvements are statistically significant in 12 of 14 comparisons.
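
For reference, such a paired test can be reproduced in a few lines; a minimal sketch with hypothetical placeholder values (not our actual results):

```python
# Minimal sketch of the paired-sample t-test described above. The IoU
# arrays are hypothetical placeholders; in practice they hold the per-fold
# average IoU of MDViT and of the compared SOTA model on the same folds.
from scipy import stats

mdvit_iou = [0.812, 0.805, 0.819, 0.808, 0.815]  # hypothetical 5-fold IoUs
sota_iou  = [0.798, 0.801, 0.803, 0.795, 0.806]  # hypothetical 5-fold IoUs

# H0: the mean average IoU of the two models is equal.
t_stat, p_value = stats.ttest_rel(mdvit_iou, sota_iou)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # reject H0 if p < 0.05
```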

MDViT is better by 3.56% and 1.4% on Avg IOU than BASE ST and JT, respectively, with p<0.001. On Avg IOU, MDViT surpasses the best SOTA TransFuse by 0.72%, whereas TransFuse surpasses the second-best SOTA SwinUNETR by only 0.22%.

Besides better segmentation results, we emphasize MDViT’s parameter efficiency and reduction of negative knowledge transfer (NKT). Though TransFuse ST works well on some of our datasets, it requires a new model for each new dataset, i.e., 4*26.3 = 105.2 M parameters for 4 datasets. Our MDViT has only 28.5 M parameters, which do not increase with more domains. Also, JT of all SOTA methods suffers from NKT, e.g., TransFuse JT is worse by 0.89% than TransFuse ST on DMF’s IOU, which is unsatisfactory for the desired objective of having a single model perform well across all datasets.
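
The parameter-count argument is simple arithmetic; a small illustration using the figures above (26.3 M per TransFuse model, 28.5 M for MDViT):

```python
# Illustration of the model-size scaling argument (numbers from the rebuttal).
TRANSFUSE_PARAMS_M = 26.3  # one separately trained TransFuse model per dataset
MDVIT_PARAMS_M = 28.5      # a single MDViT shared across all domains

for num_domains in (2, 4, 8):
    separate_total = num_domains * TRANSFUSE_PARAMS_M  # grows linearly
    print(f"{num_domains} domains: separate models = {separate_total:.1f} M, "
          f"MDViT = {MDVIT_PARAMS_M} M")
# e.g., 4 domains: separate models = 105.2 M, MDViT = 28.5 M
```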

We agree that BASE+DSN and MDViT+DSN are comparable on Avg Dice (92.52% vs. 92.56%, p-value = 0.014), and that BASE+DSN outperforms MDViT on Avg Dice. However, on the DMF dataset, BASE+DSN suffers from NKT while MDViT+DSN does not: BASE+DSN is worse by 0.07% than BASE ST on DMF’s IOU, while MDViT+DSN is better than BASE ST across all datasets, by 0.1% on DMF’s IOU and 3.76% on Avg IOU. Further, BASE+DSN has domain-specific modules, and its parameter count increases with more domains. When comparing MDViT with other multi-domain methods with a fixed model size (Table 2), MDViT surpasses them by 1.45% on Avg IOU, with p<0.005.

  • (R1,2) Not reporting Std

Due to the table width limitation and the large number of columns (13), the std for each dataset had to be omitted, but we did show the std of the average metrics, similar to [1]. Also, please refer to the statistical significance results above, which directly relate to the std values. We can add the std for each dataset to the supplementary material.

  • (R2,3) Novelty & contribution of MDViT

Though adapters and knowledge distillation are not new, our method is not a simple extension or combination of them. Our adapters do not require the domain-specific layers used in previous methods, thus saving computation and memory costs. Unlike distilling a single dataset’s knowledge from one network to another, we mutually transfer knowledge between a domain-shared network and multiple domain-specific network branches for multi-domain representation learning.
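
Schematically, such mutual distillation can be implemented as a symmetric KL term between the soft predictions of the universal network and the active domain-specific branch. The sketch below is illustrative only (temperature, weighting, and loss choice are assumptions; see our repository for the exact formulation):

```python
# Illustrative sketch of a mutual knowledge distillation term; the exact
# loss in the paper may differ (temperature, weighting, pairing of outputs).
import torch
import torch.nn.functional as F

def mutual_kd_loss(universal_logits: torch.Tensor,
                   branch_logits: torch.Tensor,
                   temperature: float = 1.0) -> torch.Tensor:
    """Symmetric KL between soft predictions; logits shaped (B, C, H, W)."""
    log_p_u = F.log_softmax(universal_logits / temperature, dim=1)
    log_p_b = F.log_softmax(branch_logits / temperature, dim=1)
    # Each network learns from the other's (detached) soft targets.
    kl_u = F.kl_div(log_p_u, log_p_b.exp().detach(), reduction="batchmean")
    kl_b = F.kl_div(log_p_b, log_p_u.exp().detach(), reduction="batchmean")
    return 0.5 * (kl_u + kl_b) * temperature ** 2
```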

To reiterate our contribution: we are the first to use multi-domain learning to combat ViTs’ data-hunger, demonstrating how a single such model can perform well across all datasets. MDViT harnesses ViTs’ capabilities even when only a few small datasets are available, as shown in our paper.

  • (R3) New baseline & model size of MDViT

We could not adopt SwinUNETR or TransFuse as baselines since they use CNN decoders, whereas our DA extends the multi-head self-attention in ViT.

Though MDViT has additional parameters for DA, it keeps a fixed model size (28.5 M) and will not get larger with more domains, which differs from other multi-domain methods that use domain-specific layers whose parameters linearly increase as more domains are added.

  • (R3) Data selection & experiments on unseen datasets

As in [1], we use 1212 images from DMF. As our paper aims to address ViTs’ data-hunger through multi-domain learning, we did not conduct generalizability experiments. This suggestion is very interesting for future work.

[1] Bayasi et al., Culprit-Prune-Net…, MICCAI 2021.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal has addressed the main concerns on the statistical significance of the results. The improvements are indicated as better than the related state-of-the-art methods, although considered marginal. The authors have clarified the several added advantages of using their multi-domain transformer, notably an important reduction in the number of parameters. The scientific merit of the paper remains valid. For all these reasons, the recommendation is toward Acceptance.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Two reviewers are positive and one review is negative toward accepting this work. To address the negative reviewer’s comments, the authors provided standard deviations and other experimental results in the supplementary material. Hence, this work can be accepted.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal has sufficiently addressed the major concerns on statistical significance of performance improvement. The paper has its scientific merit to be accepted.


