Authors

Jonathan Tarquino, Jhonathan Rodríguez, Charlems Alvarez-Jimenez, Eduardo Romero

Abstract

Atypical bone marrow (BM) cell-subtype characterization defines the diagnosis and follow-up of different hematologic disorders. However, this process is essentially a visual task, which is prone to inter- and intra-observer variability. The presented work introduces a new application of one-class variational autoencoders (OCVAE) for automatically classifying the 4 most common pathological atypical BM cell-subtypes, namely myelocytes, blasts, promyelocytes, and erythroblasts, regardless of the disease they are associated with. The presented OCVAE-based representation is obtained by concatenating the bottleneck of 4 separate OCVAEs, each specifically set to capture one cell-subtype pattern at a time. In addition, this strategy provides a complete validation scheme in a subset of an open access image dataset, demonstrating low requirements in terms of number of training images. Each particular OCVAE is trained to provide specific latent space parameters (64 means and 64 variances) for the corresponding atypical cell class. Afterwards, the obtained concatenated representation space feeds different classifiers which discriminate the proposed classes. Evaluation is done by using a subset (n = 26,000) of a public single-cell BM image database, including two independent partitions, one for setting the VAEs to extract features (n = 20,800), and one for training and testing a set of classifiers (n = 5,200). Reported performance metrics show the concatenated-OCVAE characterization successfully differentiates the proposed atypical BM cell classes with accuracy=0.938, precision=0.935, recall=0.935, f1-score=0.932, outperforming previously published strategies for the same task (handcrafted features, ResNext, ResNet-50, XCeption, CoAtnet), while a more thorough experimental validation is included.
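
The concatenated representation described in the abstract can be illustrated with a minimal sketch. No code repository is linked for this paper, so everything below is an assumption made for illustration: the encoders are hypothetical stubs that return values of the right shape rather than trained OCVAEs, and the order in which means and variances are concatenated is a guess.

```python
import random

LATENT_DIM = 64  # per the abstract: 64 means and 64 variances per OCVAE
CELL_CLASSES = ["myelocyte", "blast", "promyelocyte", "erythroblast"]


def make_stub_encoder(seed):
    """Hypothetical stand-in for a trained one-class VAE encoder.

    A real encoder would map a cell image to latent means and variances;
    this stub only returns deterministic pseudo-random values of the right shape.
    """
    def encode(image):
        rng = random.Random(seed * 1_000_003 + sum(image))
        means = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
        variances = [rng.random() for _ in range(LATENT_DIM)]
        return means, variances
    return encode


def concatenated_features(image, encoders):
    """Concatenate the means and variances from every one-class encoder."""
    features = []
    for encode in encoders:
        means, variances = encode(image)
        features.extend(means)      # assumed order: means first
        features.extend(variances)  # then variances, per encoder
    return features


encoders = [make_stub_encoder(i) for i in range(len(CELL_CLASSES))]
toy_image = [0, 128, 255, 64]  # placeholder pixel values, not real data
vector = concatenated_features(toy_image, encoders)
print(len(vector))  # 4 encoders x (64 means + 64 variances) = 512
```

Under these assumptions each image yields a 512-dimensional vector, which is what a downstream classifier such as an SVM or random forest would receive.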

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43987-2_70

SharedIt: https://rdcu.be/dnwKt

Link to the code repository

N/A

Link to the dataset(s)

https://doi.org/10.7937/TCIA.AXH3-T579


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a new method for automatically classifying atypical bone marrow cell subtypes using a one-class variational autoencoder (OCVAE) approach. The OCVAE-based representation is obtained by concatenating the bottleneck of four separate OCVAEs, each trained to capture a specific atypical cell pattern. The concatenated OCVAE characterization successfully differentiated the proposed atypical BM cell classes with high accuracy, precision, recall, and f1-score, outperforming previously published strategies. The method was validated on a subset of an open-access image dataset, demonstrating low requirements in terms of the number of training images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The work is well motivated and the experimentation is extensive and well conducted. The authors did a particularly strong evaluation with 20,800 images for training and 5,200 for testing. Each split represents the four most common atypical BM cell-subtypes: myelocytes, blasts, promyelocytes, and erythroblasts, and the results obtained are in line with or superior to the state of the art. Also, the algorithm proposed is novel and fits well in the scenario analyzed. It would be extremely interesting to see how the method performs with further, less common, classes of the dataset.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Page 7, line 8: The authors indicate that, compared with the work [26], the proposed OCVAE feature extractor requires a smaller amount of images to work adequately. However, the fact that 20,000 images per class were used in the work [26] does not necessarily mean that all of them were needed. Moreover, the 20,000 images generated per class are made using data augmentation for the purpose of avoiding class imbalance.

    • I think, considering the results obtained by the method in terms of feature extraction and classification, it would be important to describe (even with XAI techniques) which areas of the images are the most important ones for classification purposes and/or the most important features? Also, a justification (again this could be visual) as to why the metrics of EBO and MYB are lower than BLA and PMO would be important to improve the quality of the work done and allow the reader to understand the reasons for these results.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Based on what was reported by the authors of the article and based on the Reproducibility Checklist, I believe that the reproducibility details are adequate. However, if it were possible to focus my concerns on the experimental part described above, the reproducibility of the paper would also benefit.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Dear authors, your paper is very well motivated and the experiments are comprehensive and well done. I have only some minor concerns reported above (point 6) and also the hope to see this work extended to address the classification of the remaining classes of the dataset.

    Also, I report some typos/minor issues found during my readings: Minor:

    • abstract: “eryhtroblastm” -> “erythroblasts”
    • Page 2, line 22: “a much more complicated…” seems to be a sentence fragment
    • page 2, line 23: why is “average accuracy” mentioned only here and “accuracy” after?
    • Figure 1, caption: why is only the SVM (without the kernel specification) mentioned here, even though the authors exploited two SVM versions and RF?
    • page 4, penultimate line: does “all testing images” mean every image of all the classes or just all the images of a particular class? I suppose the second option, but it would be better to be more precise here.
    • page 5, line 2: remove “unlike”
    • page 6, line 8: remove the comma after “Regarding”
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • Novelty of the proposal
    • Strong experimental evaluation
  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors satisfactorily addressed all of my concerns and the other reviewers’. In particular, the subset selection of 4 classes is now more precise and valuable, as using all the classes could “facilitate” the classification performance (as indicated in Tusneem A. Elhassan, et al, 2023). Also, the standard deviation values are low. Therefore, the improvement claim in mean accuracy is fully justified. Finally, the detail about feature normalization is essential and needed for completeness regarding the concatenation.



Review #2

  • Please describe the contribution of the paper

    The paper suggests a method for classification of 4 cell types in bone marrow smears. It uses an OCVAE approach in which each VAE is trained on only one class, and the latent spaces are then concatenated. Cells are classified with an SVM.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The idea behind OCVAE and how it is used for feature extraction seems to be an interesting approach to this problem. Comparison with several baselines is interesting and can lead to more discussion.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • No standard deviation is reported although results are close to each other
    • For a real world application: Why should authors focus on 4 classes while having all 21 classes readily available?
    • Discussion of the results is very limited in the paper, especially table 2.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method description is clear enough, and authors are publishing the code. Dataset is publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • According to Table 2, handcrafted image features are outperforming a ResNet and an Xception trained on the dataset. This seems to require further discussion and elaboration in the paper. How and why does this happen? Was the training of the neural networks done in a fair way?

    • The limited number of classes (4 out of 21) changes the dataset from a real-world application that matters for patients’ lives to a toy dataset.

    As a suggestion to the Authors, I think for future work a good direction for extension might be checking the possibility of gradually adding classes in a class incremental learning scenario and seeing how robust the proposed method is. With VAE the need for storing an exemplar set is eliminated and might possibly be generated on the fly.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    limited dataset - lack of std - limited discussion of results

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    One-class variational autoencoders are used to distinguish four types of pathological bone marrow cells. For each cell type, one autoencoder is trained; the latent space representations that all autoencoders produce for each image are then concatenated and used to train simple classifiers. Results show that the method outperforms other baseline approaches.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The strength of the method comes with the simplicity of the approach. Variational autoencoders are a rather old technique but have gained traction recently as they are known to improve performance of image analysis tasks when used for pretraining decoder network weights. Concatenating multiple representations, each created for a specific cell type, yields a strong feature space, as results in table 1 indicate.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The method is trained and evaluated on pathological bone marrow cell types only, which is a rather artificial task, as these cells will not appear separately. It would have been worthwhile to include other cell types, e.g. as an additional class (using another single variational autoencoder). By not including different cell types, the overall merit of the approach cannot be estimated. Training details are missing; this holds true for the main method, but also for the baseline approaches (authors should always try to be as transparent as possible to enable readers to estimate the quality of the baseline predictions).

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducibility should be possible, as training and evaluation code is provided (based on the checklist).

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The overall structure of the manuscript is good and it is written in good English. The method is simple but nevertheless interesting, as it could be applied to other use cases without extensive manual labeling. Questions to the authors: -) I wonder how results would have changed if more cell types had been included, as the appearance of the 4 types used seems to be rather distinct according to Figure 1. -) Are the feature vectors normalized upon concatenation and before they are fed into the classifiers? -) The performance drop between using only variances and including means is not obvious. Can you argue or speculate what the reason might be?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The approach does not introduce a novel method, but stands out due to its simplicity. Using variational autoencoders is known to help deep neural networks to properly adjust their weights in pretraining (especially if masked reconstruction is used, which could also be of benefit for the chosen approach). The results are convincing, although the representative power is somewhat limited (as only the four pathological cell types were included, which is not a typical setting in clinical diagnostics). A more thorough evaluation of the method (on other datasets, by using multiple, distinct subsets of the bone marrow dataset) could have shed light onto the general applicability of the approach.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper presents a method for classifying atypical bone marrow cell subtypes. The proposed one-class variational autoencoder method is quite interesting. However, there are questions about why only a small number of classes have been incorporated and whether such a setup is just artificial rather than representing a real clinical question. More result discussion would be good too.




Author Feedback

First of all, we want to thank the reviewers for all the comments and suggestions that surely will help us to improve the paper. Regarding the major weaknesses that the reviewers identify in the manuscript, we want to clarify all possible misunderstandings as follows:

  • For R#2, R#3 and metaR: It is still unclear why the authors selected a subset of 4 classes from a database of 21 available classes, suggesting that the work is not presenting “a real world application”, “as these cells will not appear separately”.

In this comment, we thank the reviewers for the opportunity to extend the motivation of our work in this rebuttal. This is a real world application since the 4 cell subtypes are the ones which must be counted for a correct diagnosis and morphological characterization of acute myeloid leukemia and myelodysplastic disease [Vanna, R., et al, Analyst, 2015], currently the most common and aggressive hematological disorders in adults. Most studies have gathered these four subtypes together as the atypical class and have compared this against the typical cells (the rest of 21 classes), a much easier task (accuracy 0.97 reported by Tusneem A. Elhassan, et al, 2023) since in such cases, differences correspond to the lineage differentiation stage of each group. Furthermore, it has been reported that intermediate myelopoiesis stages may be misclassified, particularly consecutive maturation stages, i.e., the 4 subtypes herein compared [Tusneem A. Elhassan, et al, 2023]. This classification task is so complex that even common immunohistochemical hematology stains like CD117 may also misclassify them [G Jiang, et al, 2020].

  • For R#2 and metaR “No standard deviation is reported although results are close to each other”.

We thank the reviewers for pointing this out. We only report mean values (without std) when either variances or means of the OCVAE were used for classifying, considering that the obtained stds (acc_std=0.006, f1_std=0.006, prec_std=0.0058) are similar to the ones shown in table 1 (when combining variances and means of the OCVAE). Furthermore, we consider that the mean accuracy improvement with respect to the state of the art (3%) is fair enough for comparison purposes, given the low std (OCVAE_acc=0.938, Coatnet_acc=0.908).

  • For R#1, it is not clear enough to claim better performance with smaller amount of images, given “the fact that 20,000 images per class were used in the work [26] does not necessarily mean that all of them were needed”

On this topic we want to thank R#1 for addressing this issue. However, the presented approach is computationally simpler and has a smaller risk of overfitting, a frequently reported problem when using data augmentation that may affect model generalization. This may be particularly true for the myelocyte class, since the set of 6,557 images is converted into 20,000.

  • All reviewers raised the need for more discussion about the results and baseline implementation details, to be included in the paper, in case of acceptance. However, some OCVAE implementation details were included in the final paragraph of the “Atypical BM cell OCVAE latent space representation” subsection (number and type of layers, dimensionality reduction within the bottleneck, and training/validation scheme). Additional VAE details will be shared with the code for reproducibility purposes. In the case of the baseline approaches, we provide all references for detailed description of these methods since we used either the same implementations described in the original papers, or we report the results presented in the original publications provided these authors used the same database herein used.

  • Regarding the comment about feature normalization from R#3: it is a detail that we missed in the paper. To clarify this issue, it is important to highlight that the herein presented results are obtained by normalizing the feature matrix after concatenation, decreasing the bias given the separately trained OCVAEs.
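
The rebuttal does not say which normalization is applied to the concatenated feature matrix. As a minimal sketch, assuming column-wise z-score standardization (an assumption, not the authors' confirmed choice), the step could look like the following; the function name and the toy values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(matrix):
    """Standardize each column to zero mean and unit variance.

    Columns with zero spread are set to zero to avoid division by zero.
    """
    columns = list(zip(*matrix))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]
    return [
        [(x - mu) / sigma if sigma > 0 else 0.0
         for x, mu, sigma in zip(row, mus, sigmas)]
        for row in matrix
    ]


# Toy matrix: 3 samples, 4 features. The first two columns mimic the output
# scale of one encoder and the last two that of another (values are made up).
features = [
    [0.1, 0.2, 100.0, 250.0],
    [0.3, 0.1, 300.0, 150.0],
    [0.2, 0.3, 200.0, 200.0],
]
normalized = zscore_columns(features)
for row in normalized:
    print([round(v, 3) for v in row])
```

Standardizing after concatenation puts features from separately trained encoders on a common scale, so that no single encoder dominates a distance-based classifier simply because its latent values happen to be larger.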




Post-rebuttal Meta-Reviews

Meta-review #1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal has clarified most of the reviewers’ comments. The authors should revise the paper according to the reviewers’ comments and include more in-depth discussion as well.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors propose to utilize individual VAEs for cell sub-typing. The results show that the proposed method is able to classify the 4 classes with high accuracy and is superior to competitors. However, the rebuttal does not address the reviewers’ major concerns on the methodology and the results. I am not convinced why 4 classes need to be used. The further discussion and insights into the results and methods are still missing. Therefore, it is hard to accept the paper in the current status.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The representativity of the classes and the difficulty of the task seem to be valuable to convince us that the paper shall be published. Please use this communication opportunity and please consider all the remarks of the reviewers as your commitments during the rebuttal phase.


