
Authors

Raheleh Salehi, Ario Sadafi, Armin Gruber, Peter Lienemann, Nassir Navab, Shadi Albarqouni, Carsten Marr

Abstract

Diagnosing hematological malignancies requires identification and classification of white blood cells in peripheral blood smears. Domain shifts caused by different lab procedures, staining, illumination, and microscope settings hamper the re-usability of recently developed machine learning methods on data collected from different sites. Here, we propose a cross-domain adapted autoencoder to extract features in an unsupervised manner on three different datasets of single white blood cells scanned from peripheral blood smears. The autoencoder is based on an R-CNN architecture allowing it to focus on the relevant white blood cell and eliminate artifacts in the image. To evaluate the quality of the extracted features we use a simple random forest to classify single cells. We show that thanks to the rich features extracted by the autoencoder trained on only one of the datasets, the random forest classifier performs satisfactorily on the unseen datasets, and outperforms published oracle networks in the cross-domain task. Our results suggest the possibility of employing this unsupervised approach in more complicated diagnosis and prognosis tasks without the need to add expensive expert labels to unseen data.
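
For orientation, below is a minimal sketch of the pipeline the abstract describes, assuming PyTorch and scikit-learn. Module names, dimensions, and training details are illustrative guesses rather than the authors' implementation; only the bottleneck size of 50 is taken from the paper.

```python
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier

class FeatureAutoencoder(nn.Module):
    """Compresses per-cell instance features (e.g. pooled from a Mask R-CNN
    detection of the white blood cell) into a low-dimensional code."""
    def __init__(self, in_dim=1024, code_dim=50):  # 50 = bottleneck size from the paper
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

ae = FeatureAutoencoder()
x = torch.randn(8, 1024)                 # stand-in for Mask R-CNN instance features
recon, codes = ae(x)
loss = nn.functional.mse_loss(recon, x)  # unsupervised reconstruction objective

# After unsupervised training on a single dataset, the frozen encoder yields
# features, and a simple random forest classifies the cell types:
rf = RandomForestClassifier(n_estimators=100)
# rf.fit(codes_train.detach().numpy(), labels_train)
# rf.predict(codes_unseen.detach().numpy())  # evaluation on unseen domains
```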

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16437-8_71

SharedIt: https://rdcu.be/cVRuW

Link to the code repository

https://github.com/marrlab/AE-CFE

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors have proposed an approach for feature extraction from white blood cell microscopy images such that the features may generalize well across samples collected from different sites. The paper frames the problem as a cross-domain learning task.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The idea is interesting and might contribute towards building generalized models for WBC classification.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There is no comparison with state-of-the-art methods. The authors do not report any insights on the computational requirements. The motivation of the pipeline is not clear: why choose what they chose for the different components of the model? Mask R-CNN is an instance segmentation model, not a detection model.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    One of the three datasets used in the work is private and not shared by the authors. No code, link, or statement of intent to share it is provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    While the idea might be interesting, the authors need to carefully consider the way they present it. The writing requires some more clarity; I can only quote a few examples. Page 2, Section 1, “that is analyzed by an autoencoder”: what is analyzed? How is an autoencoder an analyzer? (Also, I am repeating comments from the section above.) There is no comparison with state-of-the-art methods. The authors do not report any insights on the computational requirements. The motivation of the pipeline is not clear: why choose what they chose for the different components of the model? Mask R-CNN is an instance segmentation model, not a detection model.

    Page 5, Section 3.2, “we decided to use 50 as the bottleneck size”: 50 what? 50 images? Not very clear.

    How do the different components affect the performance? What motivated the choice of these components?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I believe the idea might be interesting, but the way it is presented and motivated is not very clear.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    4

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    I do not see my comments addressed.



Review #2

  • Please describe the contribution of the paper

    The paper presents a cross-domain methodology to extract features in an unsupervised manner from individual white blood cell image datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Novelty: although the authors employed some well-known methods (Mask R-CNN), the proposal has novel characteristics, because it attempts to perform cross-domain feature extraction based on the instance features of a Mask R-CNN.

    Experimental evaluation: the authors exploited three very different blood cell datasets, and the two tables precisely show the obtained results, which I consider to be of great interest to the community for this specific task.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Introduction and motivation of the work: there is a great deal of nuance involved in haematological classification. The authors chose as a use case a very narrow and highly simplified problem: distinguishing white blood cell types on images containing individual white blood cells. Although the results obtained are of considerable interest and realistically applicable to further domains, I believe the introduction lacks detail and characterisation of the problem at hand. For example, what is the rationale for classification on images consisting of only white blood cells? To cite an example (please note that this is not a request for citation, if the authors do not consider it necessary): in some recent works, such as this one (https://www.mdpi.com/2076-3417/12/7/3269#), it has been discussed how reliable CNN-based CAD systems are for the analysis of blood cells on “whole” images, i.e. images composed of a multitude of cells, and how it is basically impossible to obtain a reliable diagnosis on the basis of a direct classification carried out with CNNs. Therefore, I would suggest that the authors improve the introduction so that it provides a deep overview of the study. In fact, I think it is unclear what unique challenges are associated with this task.

    In Section 2: from the figure, it seems that the Mask R-CNN training was performed on the three datasets merged. This aspect is not stated in the text. Could the authors be more precise in this sense?

    Section 2 again: why do you call it ‘reconstruction’? It sounds more like image generation to me. If it were a reconstruction, I would expect to see the WBC reproduced as it appears in the input, not something “polished”.

    Still Section 2: “In our experiments GN was effective in image generalization.” How and why?

    Section 2.1: What is the anchor dataset D0? Please explain.

    Section 3.2: why was the constant β in equation 3 set to 5?

    Section 3.2: the training procedure is not entirely clear to me. Did you train on 80% of the original images? Or on the “reconstructed” ones?

    In reference to Table 1, the authors state “AE RF: random forest classification of features extracted by a similar autoencoder trained on all datasets with no domain adaptation.” However, I think it would be useful for the reader to give, even briefly, some detail about this similar autoencoder, also considering that it reached the best results in the same-dataset experiments.

    With reference to the classification method used, random forest, I think the authors should better motivate this choice (note: this is not a criticism of the method itself, just a request for clarification).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors gave a satisfactory quantity of details for reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Dear Authors,

    I read your manuscript with great interest and found it of good quality. The results are also quite impressive and open the field for further improvements. However, I think that several clarifications are needed, as I found some points unclear. I list them as follows:

    Introduction and motivation of the work: there is a great deal of nuance involved in haematological classification. The authors chose as a use case a very narrow and highly simplified problem: distinguishing white blood cell types on images containing individual white blood cells. Although the results obtained are of considerable interest and realistically applicable to further domains, I believe the introduction lacks detail and characterisation of the problem at hand. For example, what is the rationale for classification on images consisting of only white blood cells? To cite an example (please note that this is not a request for citation, if the authors do not consider it necessary): in some recent works, such as this one (https://www.mdpi.com/2076-3417/12/7/3269#), it has been discussed how reliable CNN-based CAD systems are for the analysis of blood cells on “whole” images, i.e. images composed of a multitude of cells, and how it is basically impossible to obtain a reliable diagnosis on the basis of a direct classification carried out with CNNs. Therefore, I would suggest that the authors improve the introduction so that it provides a deep overview of the study. In fact, I think it is unclear what unique challenges are associated with this task.

    In Section 2: from the figure, it seems that the Mask R-CNN training was performed on the three datasets merged. This aspect is not stated in the text. Could the authors be more precise in this sense?

    Section 2 again: why do you call it ‘reconstruction’? It sounds more like image generation to me. If it were a reconstruction, I would expect to see the WBC reproduced as it appears in the input, not something “polished”.

    Still Section 2: “In our experiments GN was effective in image generalization.” How and why?

    Section 2.1: What is the anchor dataset D0? Please explain.

    Section 3.2: why was the constant β in equation 3 set to 5?

    Section 3.2: the training procedure is not entirely clear to me. Did you train on 80% of the original images? Or on the “reconstructed” ones?

    In reference to Table 1, the authors state “AE RF: random forest classification of features extracted by a similar autoencoder trained on all datasets with no domain adaptation.” However, I think it would be useful for the reader to give, even briefly, some detail about this similar autoencoder, also considering that it reached the best results in the same-dataset experiments.

    With reference to the classification method used, random forest, I think the authors should better motivate this choice (note: this is not a criticism of the method itself, just a request for clarification).

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Novelty of the proposal, results obtained, quality of experimental evaluation.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    This work proposes a representation learning method that learns robust features which work well on unseen domains. The authors achieve this by manipulating the cell feature representation from the Mask R-CNN, using self-supervised learning and a domain adaptation approach.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The workflow of the proposed method, which combines self-supervised learning and domain adaptation, is interesting.
    2. Using features from the Mask R-CNN is smart, as the object detection algorithm is pretty robust.
    3. The writing is very clear.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Major:

    1. The baselines are not strong enough. In my opinion, the main contribution of this work is in the domain adaptation area, where no label from the target dataset is given. This is also where the proposed method shows better performance (train on one dataset and directly test on another without fine-tuning). The method should therefore be compared with domain adaptation methods, but we cannot find such baselines in the experiments.
    2. Inconsistent description of the ResNet-RF method. In Table 1, it seems it is trained on different datasets in each experiment. If so, where does the classification label come from? Yet the main text says “ResNet RF: random forest classification of the features extracted with a ResNet101 [9] architecture trained on ImageNet dataset”. The training set is inconsistent here.

    Minor:

    3. The applicability of the proposed method is limited, as the cross-domain accuracy seems far from that of supervised training.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No code is provided, but the details of the training and the network architecture are given.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    As suggested in the main weaknesses section, the baselines are not very strong for comparison. The authors are encouraged to use some domain adaptation methods as baselines.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The baselines are not strong enough; no domain adaptation method is tested as a baseline. The applicability of this method is limited, as the cross-domain performance is not satisfactory. However, the writing of this paper is pretty good.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    There are non-converging review recommendations. The authors are encouraged to address especially the issues raised by the reviewers, including but not limited to empirical evaluations (e.g. lack of SOTA baselines) and presentation (e.g. clarifying the motivation of the work and the proposed method).

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7




Author Feedback

Dear Area Chair, dear Reviewers,

We would like to thank you for taking the time to assess our work and for the constructive feedback on our manuscript.

All reviewers acknowledged the novelty and importance of our approach to robustly classify single-cell images via unsupervised feature extraction (R1: “The idea is interesting and might contribute towards building generalized models for WBC classification.”; R2: “The results are quite impressive and open the field for further improvements”; R3: “Using features from the Mask R-CNN is smart as the object detection algorithm is pretty robust and the proposed method is a combination of self-supervised learning and domain adaptation.”)

However, reviewers also raised a few issues. Most importantly, R1 and R3 missed a SOTA comparison. While many domain adaptation methods have been published, to the best of our knowledge there is no autoencoder-based instance feature extraction method that is able to work across different domains without labels. Published methods we are aware of alter the classifier by retraining on the new domain to change decision boundaries. Most of them require at least a few labels in the target domains, making any comparison to our unsupervised approach skewed. Following the reviewers’ remarks, we compared our method to the most natural baseline we could come up with: an adversarial domain adaptation via a domain discriminator, following Tzeng et al. (Adam optimizer, 150 epochs, lr: 0.001). The accuracy of our proposed AE-CFE approach when trained on Matek-19, WBC, and Acevedo-20 and tested on the remaining two unseen datasets is (0.48, 0.21), (0.73, 0.31), and (0.45, 0.21), respectively (see Table 1). For the adversarial domain adaptation baseline we now performed, this decreases to (0.31, 0.18), (0.63, 0.17), and (0.39, 0.17). We thank the reviewers for raising this point and would be more than happy to include this comparison in the camera-ready version of our manuscript.
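
For concreteness, a discriminator-based baseline in the spirit of Tzeng et al. could look like the following sketch. The optimizer and learning rate match the rebuttal; all dimensions, module names, and the alternating update scheme are illustrative assumptions, not the exact baseline implementation.

```python
import torch
import torch.nn as nn

code_dim = 50
# Hypothetical target encoder mapping instance features to latent codes.
target_encoder = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(),
                               nn.Linear(256, code_dim))
# Domain discriminator: source codes get label 1, target codes label 0.
discriminator = nn.Sequential(nn.Linear(code_dim, 64), nn.ReLU(),
                              nn.Linear(64, 1))
bce = nn.BCEWithLogitsLoss()
opt_d = torch.optim.Adam(discriminator.parameters(), lr=0.001)
opt_e = torch.optim.Adam(target_encoder.parameters(), lr=0.001)

def adversarial_step(z_src, x_tgt):
    # 1) Train the discriminator to separate source from target codes.
    z_tgt = target_encoder(x_tgt)
    d_loss = (bce(discriminator(z_src.detach()), torch.ones(len(z_src), 1)) +
              bce(discriminator(z_tgt.detach()), torch.zeros(len(z_tgt), 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # 2) Train the target encoder so its codes are mistaken for source.
    e_loss = bce(discriminator(target_encoder(x_tgt)), torch.ones(len(x_tgt), 1))
    opt_e.zero_grad(); e_loss.backward(); opt_e.step()
    return d_loss.item(), e_loss.item()
```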

Following the suggestion of R1 and R2, we revised the methodology section and now better describe and motivate the different components involved. In short:

  • The Mask R-CNN is used as a single white blood cell detector and feature extractor. Surrounding objects like red blood cells and artifacts, which are irrelevant to the task, are thus eliminated from the feature vectors (see autoencoder reconstructions in Fig. 3). This stabilizes any downstream computer-aided diagnosis task.
  • The autoencoder receives feature vectors for each cell and reconstructs these in the first stage. In the second stage the single-cell images are reconstructed (Fig. 1). Thanks to the low dimensionality of the feature vector and fewer network parameters, training this autoencoder is much more efficient than training an autoencoder for images.
  • Domain adaptation with group normalization and the distribution-based maximum mean discrepancy (MMD) has been shown to align the latent space representations of different datasets (see Mekhazni et al., 2020). We use it to adapt the three datasets, which differ in resolution, size, and color; a minimal sketch of such an MMD loss follows below.
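
The sketch below shows a simple RBF-kernel MMD loss that could serve this purpose. The kernel and bandwidth choice are assumptions rather than details from the paper; the weighting β = 5 is the value the reviewers quote from Eq. 3, assuming β weights the MMD term.

```python
import torch

def mmd_rbf(z_a, z_b, sigma=1.0):
    """Biased estimate of MMD^2 between two batches of latent codes,
    using an RBF kernel with an assumed fixed bandwidth sigma."""
    def k(x, y):
        d2 = torch.cdist(x, y).pow(2)  # pairwise squared distances
        return torch.exp(-d2 / (2.0 * sigma ** 2))
    return k(z_a, z_a).mean() + k(z_b, z_b).mean() - 2.0 * k(z_a, z_b).mean()

# Combined objective, assuming beta = 5 weights the MMD term in Eq. 3:
# total_loss = reconstruction_loss + 5.0 * mmd_rbf(z_anchor, z_other)
```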

As requested by R1 and R2 we also revised the introduction and now better motivate our approach.

With respect to reproducibility, code and all three datasets will be made publicly available via our GitHub account with an incorporated link in the manuscript.

Finally, we would like to thank the reviewers for pointing out typos and suggesting terminology, which improved the manuscript considerably.

Tzeng, Eric, et al. “Adversarial discriminative domain adaptation.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.

Mekhazni, Djebril, et al. “Unsupervised domain adaptation in the dissimilarity space for person re-identification.” European Conference on Computer Vision, Springer, Cham, 2020.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper deals with blood cell image classification using an unsupervised cross-domain feature extraction method. The idea is overall interesting, and empirical evaluations are carried out on different data scenarios. Meanwhile, the reviewers have raised a number of concerns, including empirical evaluations (e.g. lack of SOTA baselines) and presentation (e.g. clarifying the motivation of the work and the proposed method), among others. The authors need to seriously go through the issues raised by the reviewers and address them properly.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I agree with the reviewers who found this work to have novel aspects. There were some concerns about missing comparisons to SOTA methods, e.g. unsupervised domain adaptation. I believe the rebuttal has mostly addressed these concerns.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes a cross-domain feature extraction technique for WBC classification. Although we had diverging reviews and concerns from reviewers regarding comparison to baselines, I think the rebuttal addresses these issues if the baseline numbers can be added as a line to Table 1. Overall, this paper stands out for its novelty, and I think the ideas presented here would be of great interest to the MICCAI audience.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3


