Authors

Salome Kazeminia, Ario Sadafi, Asya Makhro, Anna Bogdanova, Shadi Albarqouni, Carsten Marr

Abstract

Deep learning-based classification of rare anemia disorders is challenged by the lack of training data and instance-level annotations. Multiple Instance Learning (MIL) has been shown to be an effective solution, yet it suffers from low accuracy and limited explainability. Although the inclusion of attention mechanisms has addressed these issues, their effectiveness highly depends on the amount and diversity of cells in the training samples. Consequently, the poor machine learning performance on rare anemia disorder classification from blood samples remains unresolved. In this paper, we propose an interpretable pooling method for MIL to address these limitations. By benefiting from instance-level information of negative bags (i.e., homogeneous benign cells from healthy individuals), our approach increases the contribution of anomalous instances. We show that our strategy outperforms standard MIL classification algorithms and provides a meaningful explanation behind its decisions. Moreover, it can denote anomalous instances of rare blood diseases that are not seen during the training phase.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_33

SharedIt: https://rdcu.be/cVVpG

Link to the code repository

https://github.com/marrlab/Anomaly-aware-MIL

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

This paper proposes an interpretable pooling method for MIL to address the poor machine learning performance on rare anemia disorder classification from blood samples. Experiments demonstrate the superior performance of the proposed strategy over standard MIL classification algorithms, while providing a meaningful explanation behind its decisions.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. This paper aims to address a very important and meaningful problem.
2. The idea and the workflow of the proposed framework are clear.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. For the introduction section, the contributions are not clearly summarized.
2. Why can the proposed strategy overcome the limitations of the attention mechanism?
3. Is the Mask R-CNN pretrained on another large-scale dataset, or is it applied directly to the dataset used in the experiments?
4. For the anomaly scoring, why is the Mahalanobis distance used in Eq. (3)?
5. Relatively few comparison methods are included.
6. It is difficult to understand the difference between the Anomaly method and attention method in Fig. 3 and Fig. 4.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The idea in this paper is clear, and the workflow of the proposed framework is also clear, so I think this paper has good reproducibility.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
This paper aims to address an important problem, and the idea and the workflow are clear, but I still have some concerns:
1. For the introduction section, the contributions are not clearly summarized.
2. Why can the proposed strategy overcome the limitations of the attention mechanism?
3. Is the Mask R-CNN pretrained on another large-scale dataset, or is it applied directly to the dataset used in the experiments?
4. For the anomaly scoring, why is the Mahalanobis distance used in Eq. (3)?
5. Relatively few comparison methods are included.
6. It is difficult to understand the difference between the Anomaly method and attention method in Fig. 3 and Fig. 4.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

4
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Please see the weaknesses listed above.
Number of papers in your stack

5
What is the ranking of this paper in your review stack?

4
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

5
[Post rebuttal] Please justify your decision

Not Answered

Review #2

Please describe the contribution of the paper

The authors proposed an interpretable MIL network to increase the contribution of anomalous instances. This work shows SOTA performance when compared with other MIL approaches.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The paper is mostly well-written and clear.
2. The method achieves good performance on the authors’ private dataset.
3. Most of the aspects are described in sufficient detail to enable the reproduction of results.
4. The anomaly-aware GMM modeling using the negative bag is novel, and provides good potential impact in real-world clinical applications.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. It would be better if the authors can compare with some SOTA anomaly detection approaches for anomaly recognition.
2. Authors may consider training some of the MIL methods from computer vision on microscope data and comparing them.
3. A public dataset can be used to further justify the contribution of this work.
4. The novelty is a bit limited, given that [14] proposed to use attention and MIL classification for such problems as well. The loss function is the same.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Authors claim they will release the code upon acceptance.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

The paper is well-written, and the novelty can be deemed sufficient given the use of the anomaly GMM module. However, there are still a few things that can be strengthened during the rebuttal.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

See above.
Number of papers in your stack

5
What is the ranking of this paper in your review stack?

2
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

5
[Post rebuttal] Please justify your decision

Not Answered

Review #3

Please describe the contribution of the paper

This paper proposes an anomaly-aware pooling strategy for multiple instance learning. The key idea is to design a latent space that uses Bayesian Gaussian mixture models to estimate the distribution of negative instances. Extensive experiments are conducted to validate the effectiveness on bag/instance classification and the anomaly analysis.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

+The idea to model the distribution of negative instances by using Bayesian Gaussian mixture models is interesting and in-depth. As traditional MIL is capable of finding the positive samples, how to reduce its omission error rate is always an important topic. Hence, I believe the idea in this work can have impact on the MIL community, vision community, explainable community and the medical imaging community.

+The discussion of the proposed method, especially on the behavior of the anomaly score is in depth and solid.

+This paper is well written and easy to follow.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

-The motivation of this work in its current form is not clear enough. Why negative instance estimation is necessary when using MIL needs more specific justification.

-Regarding the technical framework: the aggregation from instance representation to bag representation is required to be permutation invariant. Does the proposed anomaly score guarantee this property? The authors need to justify this point carefully in the rebuttal stage.

-This work omits several strongly related deep MIL-based works from the past few years. For example:
[1] A multiple-instance densely-connected ConvNet for aerial scene classification. IEEE Transactions on Image Processing, 2020.
[2] Multi-instance multi-scale CNN for medical image classification. International Conference on Medical Image Computing and Computer Assisted Intervention, 2019.
[3] Local-global dual perception based deep multiple instance learning for retinal disease classification. International Conference on Medical Image Computing and Computer Assisted Intervention, 2021.
[4] Loss-based attention for deep multiple instance learning. AAAI 2020.

The authors are encouraged to enrich the related work accordingly.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Neither the dataset nor the code is available. I don’t think I have the confidence to assure that this work is reproducible.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

Please refer to the weakness part in Sec5 for details. The major issues to improve in this work include: justification of the motivation; more theoretical insight on whether the MIL pooling function is permutation invariant; and enrichment of the related work with strongly relevant MIL-based works.

Also, the authors are encouraged to validate the proposed method on more publicly available benchmarks.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Both the strengths and the weaknesses of this work are evident. The strengths slightly outweigh the weaknesses. I would recommend weak accept.
Number of papers in your stack

5
What is the ranking of this paper in your review stack?

1
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

5
[Post rebuttal] Please justify your decision

After reading the comments from the other two reviewers and the authors’ rebuttal, I think all my concerns are well addressed. Hence, I lean toward Reviewer 2’s position on accepting this work. I maintain my recommendation of ‘weak accept’.

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The review recommendations do not converge. The authors are encouraged to address the issues raised by the reviewers, in particular the novelty and technical contributions (to be summarized in the introduction), the empirical evaluation (limited baselines and limited datasets, especially public benchmarks), and the presentation (clarify how the method overcomes the issues with the attention mechanism; issues with Figs. 3 and 4), among others.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

8

Author Feedback

Dear Area Chair, dear Reviewers,

We would like to thank you for your constructive feedback on our manuscript and for giving us the opportunity to clarify the points raised. We are happy that all reviews acknowledge the importance (R1: “This paper aims to address a very important and meaningful problem”), the novelty (R2: “The anomaly-aware GMM modeling using the negative bag is novel, and provides good potential impact in real-world clinical applications”, R3: “The idea to model the distribution of negative instances by using Bayesian Gaussian mixture models is interesting and in-depth. I believe the idea in this work can have an impact on the MIL community, vision community, explainable community, and the medical imaging community”), and the overall presentation and organization of the paper.

To improve the manuscript, R1 and R3 asked for a clearer motivation and contribution of our method regarding the limitations of the attention mechanism. We thank the reviewers for raising this point. In the revised version, we remedy this easy-to-address issue by rephrasing two sentences in the introduction section as follows: “Attention mechanisms are prone to fail in scenarios where only a few samples are available for training. Therefore, the algorithm is unable to identify relevant instances in the bags. In HHA classification, this leads to noisy attention scores where lots of disorder-relevant (positive) cells receive low attention, and the distribution of attention on non-disordered (negative) cells would be non-uniform. Here, a strategy accounting for the diversity of disorder-relevant cells and data imbalance is required. This paper introduces a robust method called anomaly-aware pooling to address these limitations of attention-based multiple instance learning”.

Regarding other baselines and benchmarks, mentioned by all three reviewers, we found the method of Sadafi et al. (MICCAI 2020) the closest to our target application and thus chose it for a fair evaluation and demonstration of how our approach improves classification performance and instance-level explanation. To the best of our knowledge, our method is the first to apply anomaly detection to MIL. Applying a GMM and Mahalanobis distance to detect anomalies significantly improves the algorithm’s attention. We refrained from using more complicated deep-learning-based anomaly detection SoTA methods (suggested by R2), such as SimCLR and AnoGAN, since they impose excessive deployment costs. We appreciate R2’s comment on utilizing our idea in more general computer vision challenges and clinical image data settings and will consider this in future work.

Attention and anomaly scores are calculated independently and processed with a 1×1 convolution, which makes our pooling method permutation invariant, an interesting point raised by R3. Moreover, the Mask R-CNN extracts features of each instance (i.e., cell) independently, regardless of its size, scale, and location. This simple step is extremely beneficial in our MIL training as it avoids challenges stemming from non-uniform information distribution in instances. The methods mentioned by R3, however, specifically address this challenge. We thank the reviewer for mentioning these works. We now cite them in our manuscript and compare their scope with our approach in the revised discussion.
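
The permutation-invariance argument can be checked numerically with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: the function `pool_bag`, the mixing weights `w` and `b`, the softmax normalization, and the random toy data are all hypothetical. A 1×1 convolution over two score channels is modeled here as the same linear mix applied to every instance.

```python
import numpy as np

def pool_bag(features, attention, anomaly, w=(0.5, 0.5), b=0.0):
    """Pool instance features into one bag vector (hypothetical sketch).

    features:  (n_instances, n_features) array
    attention: (n_instances,) per-instance attention scores
    anomaly:   (n_instances,) per-instance anomaly scores
    """
    # A 1x1 convolution over two score channels applies the same
    # linear mix to each instance independently of its position.
    mixed = w[0] * attention + w[1] * anomaly + b
    # Softmax over instances (assumed normalization, numerically stable).
    e = np.exp(mixed - mixed.max())
    weights = e / e.sum()
    # Weighted sum over instances: the order of the terms does not matter.
    return weights @ features

rng = np.random.default_rng(0)
features = rng.normal(size=(6, 4))   # toy bag: 6 cells, 4 features each
attention = rng.normal(size=6)
anomaly = rng.normal(size=6)

perm = rng.permutation(6)            # shuffle the instances in the bag
original = pool_bag(features, attention, anomaly)
shuffled = pool_bag(features[perm], attention[perm], anomaly[perm])

print(np.allclose(original, shuffled))
```

Because each weight depends only on its own instance's scores and on a normalizer summed over the whole bag, reordering the instances leaves the pooled vector unchanged up to floating-point rounding.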

Finally, R1 asked about the reason behind employing Mahalanobis distance. The high-dimensional shape of the distribution of negative instances matters, and Mahalanobis distance is a simple method that considers the covariance structure of the distribution. We thank the reviewer for bringing up this point and will clarify it in the method section as well.
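
A minimal sketch of why the covariance structure matters, assuming a single two-dimensional Gaussian with a hand-picked covariance (the paper's Eq. (3), its GMM, and its feature space are not reproduced here; `mahalanobis`, `mean`, `cov`, and the two test points are illustrative names and values). Two points at the same Euclidean distance from the mean receive different Mahalanobis distances, depending on whether they lie along a high-variance or a low-variance direction.

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of x from a Gaussian with the given mean and covariance."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    # Solve cov @ z = diff rather than forming the explicit inverse.
    z = np.linalg.solve(cov, diff)
    return float(np.sqrt(diff @ z))

# Toy "negative instance" distribution: wide along axis 0, narrow along axis 1.
mean = np.zeros(2)
cov = np.array([[4.0, 0.0],
                [0.0, 0.25]])

along = np.array([2.0, 0.0])    # displaced along the high-variance axis
across = np.array([0.0, 2.0])   # displaced along the low-variance axis

# Both points are equally far from the mean in Euclidean terms.
print(np.linalg.norm(along - mean), np.linalg.norm(across - mean))
# The Mahalanobis distance flags the second point as far more unusual.
print(mahalanobis(along, mean, cov), mahalanobis(across, mean, cov))
```

In this toy setting a Euclidean score would rank both points as equally anomalous, whereas the covariance-aware score separates them.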

We hope that we were able to address all major issues raised and are grateful for your suggestions. We would be delighted if our paper were considered for acceptance at MICCAI 2022.

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper deals with anomalous instances in classification with an interpretable MIL network and reports good empirical results. Overall, it receives favorable ratings from all three reviewers. Meanwhile, the reviewers have raised a number of concerns, including the novelty and technical contributions (to be summarized in the introduction), the empirical evaluation (limited baselines and limited datasets, especially public benchmarks), and the presentation (clarify how the method overcomes the issues with the attention mechanism; issues with Figs. 3 and 4), among others. The authors need to go through the issues raised by the reviewers seriously and address them properly.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

2

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

Paper strengths: 1) Paper addresses an important problem 2) Paper is well-written 3) Good performance on private dataset 4) New anomaly-aware GMM 5) Good discussion on the behavior of the anomaly score

Paper weaknesses: 1) Contributions are not clearly summarized in the introduction 2) Unclear why the proposed strategy overcomes the limitations of the attention mechanism 3) Few comparison methods 4) Difference between the Anomaly and attention methods 5) Missing comparison with SOTA anomaly detection (e.g., MIL methods) 6) Novelty with respect to [14] 7) Unclear motivation of the negative instance estimation 8) The aggregation from instance representation to bag representation is required to be permutation invariant

The rebuttal does not adequately address issues 3, 5, 6, and 8. However, given the positive reviews and the rebuttal to the other points, I recommend that the paper be accepted.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

5

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper proposes an interpretable MIL network to study the problem of anomalous instances in classification. The experimental results are good. The authors’ rebuttal has reasonably addressed the questions and issues raised by the reviewers, so I recommend acceptance.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

3

Anomaly-aware multiple instance learning for rare anemia disorder classification