
Authors

Michael Gadermayr, Lukas Koller, Maximilian Tschuchnig, Lea Maria Stangassinger, Christina Kreutzer, Sebastien Couillard-Despres, Gertie Janneke Oostingh, Anton Hittmair

Abstract

Multiple instance learning exhibits a powerful approach for whole slide image-based diagnosis in the absence of pixel- or patch-level annotations. In spite of the huge size of whole slide images, the number of individual slides is often rather small, leading to a small number of labeled samples. To improve training, we propose and investigate novel data augmentation strategies for multiple instance learning based on the idea of linear and multilinear interpolation of feature vectors within and between individual whole slide images. Based on state-of-the-art multiple instance learning architectures and two thyroid cancer data sets, an exhaustive study was conducted considering a range of common data augmentation strategies. Whereas a strategy based on the original MixUp approach showed decreases in accuracy, a novel multilinear intra-slide interpolation method led to consistent increases in accuracy.
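
A minimal sketch of the two interpolation directions described above (illustrative only; the function names, the Beta-distributed mixing coefficient, and the convex-combination form are assumptions, not the paper's exact implementation):

    import numpy as np

    def intra_slide_mix(bag, alpha=0.2):
        # bag: (n_patches, dim) array of patch feature vectors from ONE slide
        lam = np.random.beta(alpha, alpha, size=(len(bag), 1))
        partner = bag[np.random.permutation(len(bag))]
        # each descriptor becomes a convex combination with another descriptor
        # from the same slide; the slide-level label stays unchanged
        return lam * bag + (1.0 - lam) * partner

    def inter_slide_mix(bag_a, label_a, bag_b, label_b, alpha=0.2):
        # MixUp-style interpolation between two equally sized bags from
        # DIFFERENT slides; labels are mixed with the same coefficient
        lam = np.random.beta(alpha, alpha)
        mixed_bag = lam * bag_a + (1.0 - lam) * bag_b
        mixed_label = lam * label_a + (1.0 - lam) * label_b
        return mixed_bag, mixed_label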

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43987-2_46

SharedIt: https://rdcu.be/dnwJ1

Link to the code repository

https://gitlab.com/mgadermayr/mixupmil

Link to the dataset(s)

https://gitlab.com/mgadermayr/mixupmil


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper describes (and evaluates) several augmentation strategies for pre-existing features for MIL in histopathology, based loosely on the existing “Mixup” strategy. Feature-level (rather than image-level) data augmentation means features do not need to be re-computed, saving a LOT of compute power. The methods presented are essentially novel and merely inspired by the principle of mixup (combining different data), due to the nature of MIL.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    While feature-level augmentation isn’t completely novel, the methods presented are, and there is evidence they are more effective than the simpler methods that preceded them. The comparative evaluation is well conducted (albeit on one data set only).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There is no comparison to image-based augmentation. Clearly this would not totally be comparing like-with-like due to the large extra computational effort needed to re-compute features with image-based augmentation, but it would have been nice to see how the two approaches compared. The use of ImageNet as the feature generator (when clearly this isn’t the optimal feature generator – see detailed comments) is a weakness in the approach.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Seems reasonable to me.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    This is mostly a nice piece of work, presenting novel, potentially useful methods and giving a clear conclusion about which approach to take. I’ve already mentioned the comparison to image-based augmentation above. Other issues with this paper:

    • Some of the English language is a bit suspect. For example, the use of the word “exhibits” is wrong in multiple places, e.g. “Multiple instance learning exhibits a powerful approach”. I would consult a native speaker on this (I’d personally just say “IS a powerful approach”). Also: the sentence on P1 starting “MIL approaches typically consist of…”; P2 – “operations are slim”; P2 – “WSIs showing thyroid cancer tissues” (maybe containing/of rather than showing?). None of these English issues prevents understanding, but they do detract from the quality of the paper nonetheless.

    P4 – “We actively decided to not use a self-supervised…”: what you are saying here is that you deliberately chose an inferior feature generator so you could improve things with your method. Part of me is thinking that if you can’t improve things with a state-of-the-art feature generator, is there any point to your method? It’s not clear if you tried this and there was no improvement, or you didn’t bother trying. I look forward to your answer/justification in the rebuttal, as I think this is important.

    I’d also be interested in why you chose a fixed number of patches per image. Surely a larger tissue piece contains more patches than a smaller one. I assume you are sampling from these. Do you ever have to over-sample for smaller tissue pieces? I think you need an argument for why this is appropriate, and you should state the distribution of patches in the data set. If you are sub-sampling 50% of patches, that’s probably reasonable. If it’s 1%, less so.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper presents novel work and has a clear message.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper investigates the application of the MixUp data augmentation technique on multiple instance learning from whole slide images. The authors present multiple possible formulations for the technique and evaluate all of them against a fair baseline. Overall, the authors find some benefit for a single formulation, which orthogonalizes the components of the vectors in latent space before combination.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper investigates the idea of applying a previously well-known and well-studied data augmentation technique to the field of histopathology, and in particular to MIL on WSIs. I think that this is a very sound idea and worth investigating.
    • The paper does not oversell its results but tries to dig into the reasons of why classical (linear) mixup fails in latent space.
    • The evaluation includes a multitude of relevant conditions.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • With only one dataset used for the evaluation, the insights are somewhat limited. Datasets have their own characteristics, and it might well be that a method that evaluates positively on this dataset turns out less favorable on another, similar data set. I would have loved to see this evaluated on more (and more diverse) data.
    • The work is furthermore limited by only using ImageNet embeddings for feature extraction. General-purpose feature extractors exist (Ciga et al., Machine Learning with Applications, Volume 7, 15 March 2022, 100198); these might have a different latent-space organization and would have been interesting candidates.
    • The source code and data is only available “upon request”. This unfortunately limits reproducibility.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The source and data are only available upon request, which limits the reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Regarding the scenario with added noise: when using noise as an augmentation method, you would typically want to make sure it does not reach the magnitude of the original data, in order not to risk data corruption. The appropriate magnitude would, however, most likely differ for each individual feature-vector element. Is it sensible to use a normal distribution with fixed variance, as the authors did?
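
    To illustrate the distinction raised here, a hypothetical sketch (not from the paper; names and parameters are assumptions) contrasting a single fixed sigma with per-dimension scaling:

        import numpy as np

        def add_noise_fixed(feats, sigma=0.1):
            # one fixed sigma for every feature dimension
            return feats + np.random.normal(0.0, sigma, size=feats.shape)

        def add_noise_scaled(feats, scale=0.1):
            # sigma proportional to each dimension's empirical std across
            # the bag, so high-magnitude dimensions are not left untouched
            # and low-magnitude ones are not drowned out
            std = feats.std(axis=0, keepdims=True)
            return feats + np.random.normal(size=feats.shape) * scale * std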

    The authors state that the source code is available upon request. In my own personal experience, this will be a significant obstacle to the spread of the approach. Of course, I assume that the authors of this fine paper would certainly hand out the code to every other researcher asking for it. However, the experience in the field is that it’s not easy to get code that is “available upon request”, and hence many researchers don’t even bother asking any more. Thus, in order to improve reproducibility, transparency and trust in the work, I encourage the authors to make their code (and data, if possible) available on a public repository.

    • Ultimately, the experiments only show an edge for the approach in a small number of conditions. This is only a critique of the method, however, not a major criticism of the paper, since the authors do not claim otherwise. I do like that the authors report their results transparently.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • The paper presents an interesting idea, and the results are interesting as well.
    • Yet, I feel that with only one dataset and one feature extractor being used for evaluation, the insights are limited.
  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors’ point that there is little space in a MICCAI paper and hence they only compared on one data set is fair. I think it is justified to accept the paper.



Review #3

  • Please describe the contribution of the paper

    This paper is based on the MixUp image data augmentation method, which has been successful in the traditional computer vision field, aiming to improve the modelling efficiency of MIL methods for WSIs in scenarios with limited training data. The proposed method follows the core idea of MixUp by randomly combining patch-level embeddings and their corresponding labels, enhancing the smoothness of the training samples and projecting the training target to a continuous space.
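
    For reference, the original MixUp rule (Zhang et al., ICLR 2018) that the paper builds on combines two training pairs as

        x~ = lam * x_i + (1 - lam) * x_j,   y~ = lam * y_i + (1 - lam) * y_j,   lam ~ Beta(alpha, alpha),

    which is the convex combination the proposed variants transfer to patch-level embeddings.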

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper targets a specific improvement goal in MIL and proposes a credible improvement based on a proven successful method. The simple and effective idea is transplanted from the traditional image recognition field to the modelling scenario for pathological WSIs. The paper not only describes a successful method transplant strategy but also proposes a sufficient number of variants for users to choose from, conducts complete experimental verification for different variants, and provides discussion. The paper is readable and does not present significant obstacles for readers.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The proposed method is based on a technique that has been successfully applied for years. Overall, the method directly transplants this idea to the modelling of WSIs. Although interesting, it is incremental; I am nonetheless inclined to keep some expectations for the innovative value of this method in its application. The ultimate goal of data augmentation is to enhance model training efficiency. The MixUp method was proposed in 2018, and in recent years, large-model pretraining, self-supervised training, and other methods have greatly improved the modelling efficiency of small-data downstream tasks. Considering some issues brought by MixUp data augmentation (explained in the questions), whether it is still worth doing requires more discussion. There are minor errors in the paper, such as the second to last line of the Results paragraph: “patches from different different WSIs (inter-WSI)”, where “different” appears twice.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Although the paper does not provide the source code, the proposed method can be easily implemented, and the paper also provides a sufficient description of the experimental settings.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. WSIs are characterized by a large amount of information, much of which is noise unrelated to the final classification. Will the random combination strategy of MixUp combine too many noise patches with valuable patches, thus hurting the expressiveness of WSIs? For example, in the Inter-Mix method, some WSIs have more noise, while others have less noise. Will combining them destroy the original morphological expression of WSIs? In the Intra-Mix method, will the combination of noise patches and informative patches destroy the histological expression of detail features? These combinations come from random strategies and are uncontrollable.

    2. Related to the first question, we know that the goal of MIL is not only to classify WSIs, but in many cases, we also need to visualize the results to indicate bio-markers highly correlated with the training target (e.g., using attention to show important tissue areas). Will the performance of attention visualization be affected by MixUp-MIL, which enhances the smoothness of WSI feature expression while destroying the original definition of patches? Will it make the tracking of bio-markers vague?

    3. We know that WSI classification based on MIL includes many types of tasks, such as prognosis, molecular feature typing, treatment response prediction, grading classification, etc., as well as different tumour types. These different tasks reflect WSI feature expression at different scales. Can the performance of MixUp-MIL be stable? The paper only validates the proposed method on one task, so is the generalizability of the method questionable?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite the above questions, this is still a sincere work, and I am inclined to give it a chance for rebuttal. Satisfactory answers will prompt me to raise my rating.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    The authors have addressed two questions I previously raised, and the paper does make some contributions, meriting discussion at the conference. However, the paper’s limitations remain evident. Specifically, the approach of randomly sampling patches from WSIs to enhance the robustness of WSI modelling and representation does not appear particularly novel. To my knowledge, there have been several similar concepts presented in other computer vision conferences such as CVPR in 2021 and 2022. That being said, they still differ from this paper. I am inclined to maintain my current rating. However, if there were an option for a ‘borderline’ rating, I would opt for that. I would not object to the paper being accepted.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors present an interesting adaptation of MixUp for augmenting data in the feature domain, with a central goal of reducing computational costs in the context of WSI classification.

    The key strengths of the method include the straightforward adaptations/variations to the histopathology domain and the detailed investigation of the inner workings of the approach(es) and variants.

    Major concerns include the limitation to a single dataset for evaluation, the lack of comparison to image-based augmentation, the impact of noise, and - for all reviewers - the lack of investigation with a more powerful / targeted feature extractor (pretrained on histopathological data or self-supervised feature extractor).

    Given also the expressed wish of multiple reviewers to discuss these aspects in a rebuttal, I would encourage the authors to carefully address the questions and concerns raised by the reviewers with a focus on the two aspects mentioned above. The authors additionally may want to consider making the code (and potentially the data) available in a public repository with a suitable license.




Author Feedback

Thanks to the reviewers for carefully studying the paper and for providing constructive feedback. Overall, we understand and appreciate all major concerns. However, the formal guidelines for MICCAI papers, in combination with our strong incentive to construct a readable paper, led us to restrict ourselves to a limited number of (overall 86) experiments, even though many further aspects would also have been of interest. We are confident that this paper can provide a broad and solid basis for further studies in this field. In the following paragraphs, we will address the major points outlined by the meta reviewer ((1) data sets, (2) feature extraction methods, (3) impact of noise). As suggested by the reviewers, we will make both code and data set available (CC license) via a public GitHub repository as soon as the paper is accepted. We chose to wait for publication in order to keep the review process blinded.

(1) Single data set: one major point was that we only focused on a single data set (or rather a single problem statement). Due to the page limit, we carefully chose two different data sets with quite different image characteristics. We agree that the final goal was the same, but due to the different characteristics, we are confident that this selection is appropriate for an evaluation of the proposed methods. The frozen sections have clearly less contrast, higher variability, and more artifacts than the paraffin sections. From that perspective, we rated the differences between these two data sets as even larger than the differences between paraffin data sets (e.g. our paraffin dataset, Camelyon, TCGA). We also considered adding further data sets, but finally decided that this would significantly decrease the quality of presentation (due to the page constraint and the already large number (n=86) of experiments). For this very first study within this field, we considered the investigation of different MIL methods in combination with a considerable number of baseline tests as more relevant than a larger number of data sets. At a later stage, we surely plan to perform a larger evaluation with additional data sets and feature extraction methods.

(2) Feature extraction: as stated by the reviewers, the investigation of self-supervised CNN training is a further remarkably interesting aspect to be studied in an extended version of the paper. Since we expect that this could lead to quite different feature characteristics (with an impact on the augmentation stage), the resulting effects need to be carefully studied in a larger study. CNNs pretrained on large general-purpose data sets, such as ImageNet, however, still constitute a competitive feature extraction approach, also for histological image data. We chose this well-studied de facto standard approach since we wanted to focus on other aspects.

(3) Impact of Noise: reviewer 3 raised the concern that the combination of random patches might affect the final classification since ‘informative’ patches can be combined with ‘noisy’ patches. We also assumed that there might be a negative effect when randomly combining arbitrary patches for each descriptor. For this reason, we investigated several different settings with random combinations in 25%, 50%, 75% and 100% of the final descriptors. Based on the obtained results, we surprisingly did not notice any negative effect in the intra scenario. Future work, however, will also focus more on this aspect and on identifying more appropriate sampling strategies. Reviewer 2 remarked that the baseline experiment with additive noise should be performed dependent on each feature dimension’s variability and not with a fixed sigma. After an analysis of the features’ distribution, we agree and will change the description & results accordingly. After performing the new experiments, we noticed quite similar results with the adjusted setting.
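
To make the 25%/50%/75%/100% setting concrete, the following hypothetical sketch (names and parameters are assumptions, not the authors' code) applies intra-slide mixing to only a random fraction of a slide's descriptors:

    import numpy as np

    def partial_intra_mix(bag, fraction=0.5, alpha=0.2):
        # mix only a random subset of the descriptors (e.g. fraction=0.25
        # corresponds to the 25% setting); the rest stay unchanged
        n = len(bag)
        idx = np.random.choice(n, size=max(1, int(fraction * n)), replace=False)
        partner = bag[np.random.randint(0, n, size=len(idx))]
        lam = np.random.beta(alpha, alpha, size=(len(idx), 1))
        out = bag.copy()
        out[idx] = lam * bag[idx] + (1.0 - lam) * partner
        return out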

Finally, thanks for pointing out linguistic errors which have been corrected accordingly.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper presented an interesting idea of translating MixUp augmentation to MIL in histopathology, with small to moderate improvements and a transparent discussion. Main weaknesses include the use of a potentially suboptimal feature extractor and that, at least for comparison, no image-based augmentations were included as a reference. In their rebuttal, the authors addressed most questions raised by the reviewers satisfactorily, and the reviewers see this paper between an accept and a borderline. Given the broad interest in MIL topics, this paper focusing on a complementary aspect (data augmentation for MIL) is, from my perspective, a suitable addition to MICCAI.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal has addressed some of the reviewers’ comments. However, using a single dataset for evaluation is typically insufficient for MICCAI papers, and the presented method has limited novelty.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposed an extension of MixUp augmentation to the setting of multiple instance learning on histological images. As the reviewers point out, the paper is relatively well written, with the comparative studies well carried out and explained. It is also interesting to investigate a previously well-known and well-studied data augmentation technique in the field of histopathology. Two significant limitations were identified by the reviewers and not sufficiently addressed in the rebuttal (these are also my major concerns): 1. Features are extracted using a network pretrained on ImageNet rather than on histopathological images. As far as I know, applying e.g. CTransPath (a Medical Image Analysis paper whose pretrained network has been available online for a while), RetCCL, or KimiaNet (all three are networks pretrained on histopathological images) is as easy as using an ImageNet-pretrained network, and I do not understand why this was not done. 2. Single-dataset evaluation. Yet after a second read, I partially understand that the authors wanted to focus on feature augmentation and worried that invariant features from contrastive learning would interfere with their augmentation (though to me, this needs experiments to confirm). In the end, this is still an interesting paper, and I would put it at the borderline with a slight inclination to accept.


