
Authors

Tan Nhu Nhat Doan, Kyungeun Kim, Boram Song, Jin Tae Kwak

Abstract

Automated segmentation and classification of nuclei is an essential task in digital pathology. Current deep learning-based approaches require vast amounts of data annotated by pathologists. However, existing datasets are generally imbalanced among different types of nuclei, leading to substantial performance degradation. In this paper, we propose a simple but effective data augmentation technique, termed GradMix, that is specifically designed for nuclei segmentation and classification. GradMix takes a pair of a major-class nucleus and a rare-class nucleus, creates a customized mixing mask, and combines them using the mask to generate a new rare-class nucleus. As it combines two nuclei, GradMix considers both nuclei and the neighboring environment by using the customized mixing mask. This allows us to generate realistic rare-class nuclei with varying environments. We employed two datasets to evaluate the effectiveness of GradMix. The experimental results suggest that GradMix is able to improve the performance of nuclei segmentation and classification on imbalanced pathology image datasets.
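The abstract's mask-based mixing idea can be illustrated with a hypothetical sketch. The paper's actual customized (gradient-based) mask construction is not reproduced here; the function name `mix_with_mask`, the fixed `weight`, and the toy data are assumptions for illustration only:

```python
import numpy as np

def mix_with_mask(major_patch, rare_patch, rare_mask, weight=0.7):
    # Soft mixing mask: `weight` inside the rare nucleus, 0 outside,
    # so the surrounding tissue of the major-class patch is preserved.
    w = rare_mask.astype(np.float32)[..., None] * weight
    return (1.0 - w) * major_patch + w * rare_patch

rng = np.random.default_rng(0)
major = rng.random((64, 64, 3))   # toy major-class patch
rare = rng.random((64, 64, 3))    # toy rare-class patch
mask = np.zeros((64, 64))
mask[20:40, 20:40] = 1            # toy rare-nucleus region

mixed = mix_with_mask(major, rare, mask)
assert np.allclose(mixed[0, 0], major[0, 0])          # background untouched
assert not np.allclose(mixed[30, 30], major[30, 30])  # nucleus region blended
```

Unlike a hard rectangular cut, a per-pixel mask like this leaves the environment around the pasted nucleus intact, which is the property the paper attributes to its customized mixing mask.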

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16434-7_17

SharedIt: https://rdcu.be/cVRro

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a data augmentation technique, termed GradMix, to improve nuclei segmentation and classification performance, particularly for imbalanced pathology image datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A new data augmentation technique is proposed to improve the nuclei segmentation and classification tasks. Compared with GAN-based models, the proposed method does not require supervised datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The proposed nuclei augmentation is not very impressive in improving nuclei segmentation and classification performance. On the public CoNSeP dataset, the improvement in segmentation is quite limited, and for classification, the improvement on the miscellaneous nuclei comes at the cost of performance on the inflammatory nuclei, which are also not a major class.
    2. The experimental results of CutMix are really disappointing: it performs worse on nearly all evaluated metrics, even compared to using no data augmentation at all. Since CutMix is the compared method, the authors should compare against a more competitive method.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The description of GradMix is quite clear, and the method should be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. The authors compare to CutMix, which shows quite disappointing performance. The authors should explain why CutMix behaves worse here; otherwise, I would assume that the way the authors applied CutMix is problematic.
    2. In the evaluation of nuclei classification, besides the F1-score for each type of nuclei, reporting the overall performance is also suggested.
    3. I would also suggest splitting the data into train/test sets multiple times and reporting the mean/std. The results would then be more convincing.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors propose a new nuclei augmentation method to boost rare-class nuclei segmentation and classification. However, based on the experiments, the improvement is limited and not convincing enough.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    The paper describes a synthetic data generation method to increase training dataset size and address class imbalance to train more robust and accurate nucleus segmentation and classification models.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper proposes a method for synthetically creating nuclei of rare class types in a given imbalanced training dataset to improve class balance and achieve a more balanced diversity in training data. The experimental evaluation with two datasets and one deep learning network shows performance improvements compared with using real training data only.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors use their own deep learning network to evaluate the performance impact of the synthetic data generation method. This makes it hard to tell whether the performance gains shown in the experimental evaluation are due to the specifics of the deep learning network, and whether the synthetic data generation method will lead to similar performance gains with other networks. For example, the nucleus segmentation performance numbers for the CoNSeP dataset (Table 2) with the proposed method are lower than or equal to those achieved by Hover-Net [2] using real data only (Table III in [2]; e.g., 0.853 DICE with Hover-Net vs. 0.846 DICE with the proposed method).

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The proposed method, the dataset, and the deep learning method used in the paper are clearly described. If the codes and the datasets were to be provided, I believe the results could be reproduced.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    This work targets an important problem; it is difficult to generate large, representative training data for nucleus segmentation. It proposes a synthetic data generation pipeline to increase the number of nuclei of rare class types in imbalanced datasets. The experimental evaluation shows performance improvements.

    The experimental evaluation is limited because it uses a single deep learning method (developed by the authors). While there are performance improvements, they are relatively modest (around a few percent on average; Tables 2 and 3). The authors should include another network (e.g., Hover-Net) to show whether the performance improvements from the proposed method are due to the specific deep learning network used in the paper or whether other networks can benefit as well. For example, nucleus segmentation performance numbers from the Hover-Net paper [2] (Table III in [2]) are generally higher than or equal to the performance numbers obtained by the proposed method (Table 2 in the paper). The Hover-Net paper uses real data only.

    The authors should also provide a better comparison of their approach to other approaches ([a] and [b]): [a] Gong, Xuan, et al. “Style consistent image generation for nuclei instance segmentation.” Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2021. [b] Hou, Le, et al. “Robust histopathology image analysis: To label or to synthesize?.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.

    Both [a] and [b] use in-painting and nucleus generation via simple segmentation or deformation of real nuclei as part of their synthetic data generation steps. Both works also implement variations of an end-to-end training approach that integrates training of a task-specific network with the synthetic data generation process. The goal is to guide synthetic data generation and model training so as to generate training data that better optimizes the task-specific model. This differs from the approach used in this paper, which separates synthetic data generation from training of the segmentation model. Readers would benefit from a comparison to those approaches.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper targets an important problem. The proposed method shows performance improvements on two datasets.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors addressed some of my concerns, i.e., the performance difference between the proposed method and Hover-Net on the CoNSeP dataset. I still think they should have included an experimental comparison to Hover-Net (or another SOTA method) in their work so the reader could better evaluate whether other methods can benefit from the synthetic data or whether the combination of their deep learning network and synthetic data primarily led to the performance improvements. My original rating was accept. I am keeping my original rating.



Review #3

  • Please describe the contribution of the paper

    The paper introduces a data augmentation technique (GradMix) for nuclei segmentation and classification when cells between classes are imbalanced. The proposed method generates patches showing both major-class nuclei and rare-class nuclei. The proposed method is tested on two public datasets and the results show an improvement in classifying rare-class nuclei.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors propose a method to augment rare-class nuclei with major-class nuclei to overcome class imbalance. This is a common problem because nuclear classes are indeed imbalanced. The results show an improvement in classifying rare-class nuclei. Specifically, Table 3 shows that the nuclei classification results are improved. F1-scores of miscellaneous nuclei (a rare class) in both datasets increased more than those of other nuclei types when using GradMix. For the first dataset, F^M increased by 0.035 while F^E and F^L changed by less than 0.01. For CoNSeP, F^M increased by 0.011 while F^E increased by 0.016 and F^I and F^S changed by less than 0.01. The larger improvements in miscellaneous nuclei meet the main purpose of GradMix.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    This work would benefit from a comparison between CutMix and GradMix in terms of methodology. CutMix was not described in the introduction, and without understanding CutMix, it was difficult to follow the results comparing CutMix and GradMix.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors use two publicly available datasets and the authors plan to make their code available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. This work would benefit from a comparison between CutMix and GradMix in terms of methodology. CutMix was not described in the introduction, and without understanding CutMix, it was difficult to follow the results comparing CutMix and GradMix.
    2. In Figure 1, I recommend avoiding blue for the rare class. It was difficult to see rare-class nuclei because hematoxylin-stained nuclei are also blue.
    3. What is the definition of “major” and “rare”?
    4. Figure 3 should show more instances of rare-class nuclei, because the novelty of this work lies in classifying rare-class nuclei under class imbalance. I only see one instance of the miscellaneous class.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Class imbalance in nuclei is a common and realistic problem for nuclei segmentation and classification in pathology. With GradMix, the proposed data augmentation technique to overcome class imbalance, Table 3 shows that the nuclei classification results are improved. Specifically, F1-scores of miscellaneous nuclei (a rare class) in both datasets increased more than those of other nuclei types when using GradMix. For the first dataset, F^M increased by 0.035 while F^E and F^L changed by less than 0.01. For CoNSeP, F^M increased by 0.011 while F^E increased by 0.016 and F^I and F^S changed by less than 0.01. The larger improvements in miscellaneous nuclei meet the main purpose of GradMix.

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    As the authors promised to add an extended description of CutMix and an extended discussion of the CutMix results, the paper should be in good shape. Including additional experiments with other deep neural networks (e.g., Hover-Net) would be a great idea, but I understand that conducting the additional experiments may not be feasible within the limited time period.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This manuscript presents a data augmentation method specifically for nuclei segmentation and classification in pathology images. While the reviewers appreciated that the proposed method can address the class imbalance problem and provide higher accuracy of nuclei segmentation and classification than the baseline, they also raised some concerns/questions. R1 commented that the performance improvements obtained by the proposed method are not significant, the competitor (i.e., CutMix) produces worse performance than the baseline without any data augmentation but no detailed explanation is provided, and the experimental results are not convincing. R2 pointed out that the experiment evaluation is limited, the performance improvement is modest, and it is unclear whether the data augmentation technique or the specific deep neural network contributes to the performance improvement. R2 also mentioned that some other networks using real data only, e.g., Hover-Net, can produce a higher DICE value than the proposed method. R3 asked for a comparison between CutMix and the proposed method in terms of the methodology. Please consider addressing these concerns/comments in the rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5




Author Feedback

● The reviewers commented that the experimental evaluation is limited: We agree with the reviewers that the experiments we conducted are limited. It would have been much better if we could have included other methods such as Hover-Net. Unfortunately, we were not able to do so; we leave it for future study. However, we will discuss this as a limitation of our study in the final manuscript. ● The reviewers pointed out that the performance gain from our method is not significant or is modest, and that a previous method not using a synthetic dataset (e.g., Hover-Net) recorded a higher DICE than the proposed method: We would like to note that there is a difference in the performance gain from our method between nuclei segmentation and classification. As the reviewers mentioned, the improvement in nuclei segmentation may be modest or not so significant. It is true that Hover-Net, according to the original paper, outperforms our method by a small margin, e.g., 0.007 in DICE and 0.002 in PQ. We ask the reviewers to pay more attention to PQ, since it is now considered the best metric for nuclei segmentation. With respect to PQ, the two methods are almost equivalent. However, in nuclei classification, our method is clearly superior to Hover-Net by a larger margin, including 0.032 in Fd, 0.043 in FE, 0.020 in FI, 0.050 in FM, and 0.032 in FS. We would like to emphasize the improved performance in nuclei classification, since our goal is to tackle data imbalance among different nuclei types. Although there is a small performance drop in nuclei segmentation, the value added by the improved nuclei classification would be much bigger when analyzing tissue images and making diagnostic decisions based upon them. This difference between nuclei segmentation and classification was not made apparent, and thus we will further discuss it in the final manuscript.
● The reviewers mentioned that CutMix resulted in worse performance than the baseline without any data augmentation but that no detailed explanation was provided, and asked for a comparison between CutMix and the proposed method in terms of methodology: The poor results with CutMix demonstrate the difficulty of using data augmentation or generation to overcome data imbalance. Simply increasing the number of instances does not help at all; rather, it has an adverse effect. Hence, the results with CutMix highlight the strength of our method, which is computationally cheap but effective in improving the overall performance. We will provide an extended discussion of the CutMix results in the final manuscript. In short, CutMix simply replaces one nucleus with another using a rectangular region encompassing each, which results in a substantial discontinuity around the boundary. In particular, for overlapping nuclei, this will generate awkward-looking nuclei, since it will cut out parts of neighboring nuclei. We believe that the discontinuity and artifacts introduced by CutMix are the main causes of the poor performance of the model trained with CutMix. Moreover, we accept that the description of CutMix was insufficient. To improve the readability of the manuscript, we will describe CutMix and other related techniques in the Introduction. We will also compare it to our method with respect to methodology in Experiments and Results. ● The reviewer pointed out that it is unclear whether the data augmentation technique or the specific deep neural network contributes to the performance improvement: To the best of our knowledge, an ablation experiment is the way to show the contribution of a specific component. Although we only utilized one model, it was tested on two distinct datasets. We believe that the consistent results between the two datasets indicate that the improvement is due to GradMix, not just the specific deep neural network used here.
Also, as mentioned above, the poor results with CutMix further emphasize the effectiveness of GradMix.
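The rectangular replacement that the rebuttal blames for CutMix's poor results can be sketched as follows (the function name `cutmix_paste` and the toy data are illustrative; CutMix as published by Yun et al. also samples the box randomly and mixes labels proportionally, which is omitted here):

```python
import numpy as np

def cutmix_paste(dst, src, box):
    # Hard rectangular replacement: every pixel inside `box` comes from
    # `src`, producing the abrupt boundary the rebuttal identifies as
    # the source of unrealistic artifacts around nuclei.
    y0, y1, x0, x1 = box
    out = dst.copy()
    out[y0:y1, x0:x1] = src[y0:y1, x0:x1]
    return out

rng = np.random.default_rng(1)
a = rng.random((64, 64, 3))  # toy destination patch
b = rng.random((64, 64, 3))  # toy source patch
mixed = cutmix_paste(a, b, (16, 48, 16, 48))
assert np.array_equal(mixed[:16], a[:16])                    # outside box: unchanged
assert np.array_equal(mixed[16:48, 16:48], b[16:48, 16:48])  # inside box: replaced
```

Because every pixel inside the box is overwritten, any neighboring nucleus that intersects the box edge is cut off mid-shape, which is exactly the discontinuity the rebuttal contrasts with GradMix's per-pixel mixing mask.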




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper introduces a simple data augmentation method for nuclei segmentation and classification, which produces improved performance compared with the baseline models. The rebuttal has addressed the major concerns of the reviewers, such as performance difference between the proposed method and the Hover-Net, and the poor performance of CutMix. The manuscript could be further improved by adding a comparison between the proposed method and recent state-of-the-art relevant approaches in the experiments.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors propose a data augmentation method to enhance nucleus segmentation and classification in imbalanced datasets. Two reviewers agree to accept this paper because the proposed method shows improvements on two datasets. After reading all reviews and the paper, I share a concern with Reviewer #1: the improvements are not convincing enough and might not be significant. Cross-validation and a statistical test might be needed, because GradMix achieves only slightly higher F1 scores than the compared baseline. In addition, the first dataset includes only 59 images from an unclear number of patients, so whether the validated results are generalizable is questionable. However, in overall consideration, I recommend acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Two reviewers recommended accept and one a weak reject. The main concerns raised by the reviewers are about the experimental validation, as well as insufficient explanation of and comparison with other methods such as CutMix. The rebuttal is very strong: the authors clarified that the performance gain from the method is clear for nuclei classification, although not as obvious for nuclei segmentation. An explanation of why CutMix does not perform well is given in the rebuttal. The authors also answered the reviewers' clarification questions satisfactorily. Considering the strong rebuttal and the novelty of the proposed method, the paper is acceptable.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2


