
Authors

Junjia Huang, Haofeng Li, Weijun Sun, Xiang Wan, Guanbin Li

Abstract

Automatic nuclei detection and classification can produce effective information for disease diagnosis. Most existing methods classify nuclei independently or do not make full use of the semantic similarity between nuclei and their grouping features. In this paper, we propose a novel end-to-end nuclei detection and classification framework based on a grouping transformer-based classifier. The nuclei classifier learns and updates the representations of nuclei groups and categories via hierarchically grouping the nucleus embeddings. Then the cell types are predicted with the pairwise correlations between categorical embeddings and nucleus features. For the efficiency of the fully transformer-based framework, we take the nucleus group embeddings as the input prompts of the backbone, which helps harvest grouping-guided features by tuning only the prompts instead of the whole backbone. Experimental results show that the proposed method significantly outperforms the existing models on three datasets.
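
For intuition, the following is a minimal sketch of the grouping-based classification idea described in the abstract, written in PyTorch. Everything here is an illustrative assumption, not the authors' implementation: the class name, the embedding dimension, the number of group tokens, and the use of a single grouping stage (the paper's classifier groups hierarchically, which would stack several such stages).

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GroupingClassifierSketch(nn.Module):
        """Hypothetical sketch: learnable group and category tokens aggregate
        nucleus embeddings via cross-attention, and cell types are scored by
        the pairwise correlation between category tokens and nucleus features."""

        def __init__(self, dim=256, n_groups=16, n_classes=4):
            super().__init__()
            self.group_tokens = nn.Parameter(torch.randn(n_groups, dim))
            self.class_tokens = nn.Parameter(torch.randn(n_classes, dim))
            self.group_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
            self.class_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

        def forward(self, nucleus_feats):  # nucleus_feats: (B, N, dim)
            B = nucleus_feats.size(0)
            groups = self.group_tokens.expand(B, -1, -1)
            classes = self.class_tokens.expand(B, -1, -1)
            # Grouping: group tokens attend to (and summarize) nucleus embeddings.
            groups, _ = self.group_attn(groups, nucleus_feats, nucleus_feats)
            # Hierarchy: category tokens attend to the updated group tokens.
            classes, _ = self.class_attn(classes, groups, groups)
            # Classify each nucleus by its correlation with the category embeddings.
            logits = torch.einsum('bnd,bcd->bnc',
                                  F.normalize(nucleus_feats, dim=-1),
                                  F.normalize(classes, dim=-1))
            return logits, groups  # group tokens can be reused as backbone prompts

The returned group tokens correspond to the group embeddings that, per the abstract, are fed back to the backbone as input prompts so that only the prompts need tuning.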

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43993-3_55

SharedIt: https://rdcu.be/dnwN0

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

Summary: This study presents an efficient, fully transformer-based framework that detects and classifies nuclei by grouping nucleus embeddings and using the semantic similarity between nuclei and their grouping features to predict cell types.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper investigates an interesting problem on well-known public datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

1) One of the main challenges in the paper is understanding the experimental results. The training procedures are not well explained, and there is insufficient discussion of how the results compare with state-of-the-art methods. I’m curious to know whether the authors implemented the state-of-the-art methods themselves and reported their results, or whether they are quoting the results from the original papers. It seems that the reported results for some methods do not match the results reported in the original papers (for example, for HoVerNet on the CoNSeP dataset, the original paper reports an F_d score of 0.748, while in the current manuscript it is reported as 0.621).

    2) In addition, it appears that the hyperparameters have not been fully analyzed and determined. Could you please provide more information on how they were tuned?

    • (e.g., the weight terms ω1, ω2, ω3, the number of training epochs, the learning rate, the optimizer, etc.)

    3) The methods used for comparison in this study, including HoverNet [11], DDOD [7], TOOD [9], MCSpatNet [1], SONNET [8], DAB-DETR [20] and UperNet with ConvNeXt backbone [23] are not adequately explained or discussed in either the related work or experiments section.

4) The effectiveness of the proposed method is not well established in comparison to other existing methods.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

If the code were available online and all of the hyperparameter tuning were explained, the work would be easy to reproduce.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

1) It would be helpful if the captions of the figures were expanded to better explain their content.

    2) It seems that the variables in the formulas are not fully defined. Could you please provide more information on the variables used in the formulas?

    3) Need more explanation in the comparison with the state-of-the-art methods.

    • Discuss the pros and cons of each method in more detail.

4) Hyperparameter tuning should be well explained. How were the hyperparameters selected: based on the validation set, or on the training or test set? Could you please provide more information on how they were tuned?

    • The sensitivity of the Hyper-parameters needs to be further discussed.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I have some concerns with the comparison to state-of-the-art methods and experimental results, and I feel that more information is needed to address these concerns.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    3

  • [Post rebuttal] Please justify your decision

I regret to express my dissatisfaction with the explanation provided, as there appear to be numerous disparities between the results in the original papers and those presented in the current manuscript. Just as an example: the F_d score for MCSpatNet on the CoNSeP dataset. While the original paper reports a score of 0.762, the current manuscript states it as 0.722. The authors claim to have used the same settings as the MCSpatNet method, but these differences raise concerns.

Furthermore, this very important question remains unanswered: 2) In addition, it appears that the hyperparameters have not been fully analyzed and determined. Could you please provide more information on how they were tuned?

    • (e.g., the weight terms ω1, ω2, ω3, the number of training epochs, the learning rate, the optimizer, etc.) I find the authors’ response unsatisfactory. Their answer is: “Our code and the setting of hyper-parameters are based on an open-source library, MMDetection (Sec 3.1). Our code will be released and please check for more details at that time.”

Moreover, I must emphasize that the efficacy of the proposed method in comparison to other existing methods is not convincingly established.




Review #2

  • Please describe the contribution of the paper

    This paper proposes a novel end-to-end nuclei detection and classification framework based on a grouping transformer-based classifier. The proposed method significantly outperforms the existing models on three datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is written in a very intuitive way.
    2. This study presents proper solutions, using “Group prompt” and “Prompt tuning”, to the existing problem of having to consider the semantic similarity of surrounding nuclei.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Statements and tables of test results are well organized, but the interpretation of the results is lacking (Section 3).
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    1. In terms of data, the authors used public datasets for the study, so it is highly reproducible.
    2. It would be even better if the code were publicly available.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. Please check the CoNSeP results for Hover-Net in Table 1. Compared to Table V in the original Hover-Net study, the performance of Hover-Net reported here is lower. Please discuss whether the discrepancy is due to a different data organization in your experiment.
    2. Consider how you can prove the effectiveness of the Grouping Transformer based Classifier. The main idea of your research seems to be “Group prompt”, but “Prompt tuning” seems to contribute more to performance improvement.
    3. This is a very minor issue: PGT is not defined in the manuscript. I assume PGT refers to the prompt-based grouping transformer you propose.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper is written in a very intuitive way and presents a good solution to the existing problem of considering semantic similarities.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

The authors present a novel approach to nuclei identification using a grouping transformer-based classifier that detects nuclei and their clusters, leveraging a Swin Transformer backbone, a centroid detector built from 2 fully connected layers, and a grouping-based classifier. Their platform includes a framework for a prompt-based grouping transformer and a novel grouping prompt learning mechanism that exploits nuclei clusters to guide feature learning. This platform is shown to be superior when tested for precision and recall by F-score against 7 other platforms on 3 public benchmarks covering colorectal cancer, breast cancer, and a variety of colonic tissue. Notably, the comparisons are performed using normal cells rather than malignant cells, which may appear different with respect to nucleus morphology.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The platform is well-designed and technically well-explained.
    • Several publicly available datasets are utilized to train the platform to detect the nuclei of normal cells.
    • Interestingly, the F-score for epithelial nuclei detection is the highest across the 3 benchmarks, even though one might expect this to be the most challenging category to identify, as a variety of nucleus types would have to be accounted for, such as dormant cells, actively dividing benign cells, and actively dividing malignant cells, which all have distinctly different features.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The mathematical description of the model is exceptionally complicated, especially for a clinical reviewer to comprehend.
    • It isn’t immediately clear what the clinical purpose of this platform would be. Nuclei detection is helpful, but why not refine the model to detect actively dividing cells with mitotic nuclei or pathologic cells with aberrant nuclei?
    • Oddly, the epithelial category is reported broadly; however, there may be value in separating out epithelial cells of benign and malignant processes to better define the strength of the platform in identifying nuclei.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Authors have filled out a reproducibility checklist without any glaring issues.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • Consider separating epithelial cells of benign and malignant processes to better define the strength of the platform in identifying nuclei from other sources.
    • Can the authors speculate on what clinical role this platform may serve? No specific future plan is described; outlining one would be ideal for translating this platform into something tangibly applicable to practice.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • This manuscript has extensive mathematical detailing and leverages 3 publicly available datasets to reveal a relatively reliable platform for nuclei detection, but the clinical applicability and next steps are not clear. The manuscript would benefit if the authors explained whether malignant cells, which were present in 2 of the 3 datasets used to train the platform, should be accounted for differently in their modeling, and whether the resulting F-scores vary when testing for these subtypes. Doing so would additionally build in a clinical role that the platform and authors may seek to explore in future work.
  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

The paper introduces a prompt-based grouping transformer framework for detection and classification of nuclei that outperforms multiple state-of-the-art solutions.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • introducing multiple novel concepts
    • development of a grouping transformer-based classifier
    • new and simple learning strategy
    • achievement of an end-to-end framework
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • complicated training procedure, separated into different phases
    • statistical comparison appears only in the supplementary material
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Seems reproducible, because the authors state they will distribute the code and the work uses publicly available datasets.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

Overall, the paper presents great ideas and is well presented. One minor issue is the acronym PGT, used only in Table 2; I would suggest introducing it earlier in the text or dropping it entirely. The other suggestion is more important and concerns the statistical significance tests. The p-values are listed in Table 3 of the supplementary material, but I would suggest summarizing them in the main text as well. With such minor improvements (on the order of 0.5%), it is important to show that the increment is really statistically significant.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Multiple novel ideas, good/promising results, clear presentation of results and ablation tests.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This manuscript introduces a prompt-based grouping transformer for nucleus detection and classification in histopathology images. Specifically, it uses a learnable grouping transformer-based classifier to explore the similarity between nuclei and their cluster representations for nucleus identification. It also presents a grouping prompt-based learning mechanism for efficient model tuning (instead of inefficient backbone fine-tuning). The experimental results on multiple datasets are promising.

    While the reviewers gave positive comments on the idea of using a grouping transformer for nucleus detection and/or the model evaluation on multiple public datasets, they raised several concerns as follows:

    1. The description of experimental setup is not clear, and the results in this submission do not match those reported in the original papers when using the same dataset, e.g., Hover-Net (Reviewers #1 and #2).
    2. The description of hyperparameter value selection is not clear, such as ω1, ω2, and ω3 (Reviewer #1).
    3. The discussion/explanation of the comparison with other approaches is not sufficient, e.g., HoverNet [11], DDOD [7], TOOD [9], MCSpatNet [1], SONNET [8], DAB-DETR [20] and UperNet with ConvNeXt backbone [23] (Reviewer #1).
    4. Please clarify if “Prompt tuning” but not “Group prompt” contributes more to performance improvement and clearly explain the contribution of the paper (Reviewer #2).
    5. Please clearly define “PGT” (Reviewers #2 and #4).
    6. Please explain the clinical impact of the proposed method and how it can be translated to clinical practice (Reviewer #3).
    7. Please consider moving the statistical tests from the supplementary document to the main text (Reviewer #4).

    Please consider addressing these comments in the rebuttal.




Author Feedback

We thank the reviewers for their thoughtful feedback and support. R1: “…investigates an interesting problem on well-known public datasets.” R2: “…consider the semantic similarity of surrounding nuclei.” R3: “…is well-designed and technically well-explained.” R4: “introducing multiple novel concepts / new and simple learning strategy.” We address the reviewers’ comments below.

Q1(R1/R2): Different reported results between HoverNet and our paper. As mentioned in Sec 3.1 of the draft, we use the CoNSeP dataset at 20x magnification to evaluate all models including HoverNet, following the setting of the MCSpatNet paper (ICCV 2021). The original HoverNet uses WSIs at 40x magnification and resizes image patches to 1000x1000, so its results differ from the HoverNet results reported in our paper. In the following table, we use the same magnification as the original HoverNet. As the results show, our method still surpasses the original HoverNet by 0.4% in F1d and 2.1% in F1c.

Methods (40x)  F1d    F1c-infl.  F1c-epi.  F1c-stro.  F1c-mis.  F1c
HoverNet       0.748  0.631      0.635     0.566      0.426     0.565
PGT* (Ours)    0.752  0.640      0.654     0.594      0.457     0.586

Q2(R1): Hyperparameter setting. Our code and the settings of the hyper-parameters are based on an open-source library, MMDetection (Sec 3.1). Our code will be released; please check it for more details at that time.

Q3(R1): The explanation of the comparison with other approaches is insufficient. HoverNet, SONNET and MCSpatNet are existing methods for detecting and classifying cells in pathological images. TOOD, DDOD, DAB-DETR and UperNet are state-of-the-art methods for object detection and classification in natural images. We compare with them to show the strength of our model. We will provide more explanations of these models in the revision.

Q4(R2): The effectiveness of the Grouping Transformer-based Classifier (GTC) and Grouping Prompts. As shown in Tab 2 in the draft, the standalone utilization of GTC (w/o PT) obtains 0.588 in F1c, while the combination of detached GTC and Prompt Tuning (PT) (w/ detached GTC & PT) merely achieves an F1c of 0.582. Both results are even worse than the baseline (w/o GTC & PT) of 0.590 F1c. This may be because the naive GTC works only in the final stage and lacks the use of low-level features. Thus, we propose the Grouping Prompt-based Tuning, which takes the group embeddings in GTC as prompts for model tuning. Compared to the regular PT (w/ detached GTC & PT), the grouping prompt approach (PGT (Ours)) shows an improvement of 3.1% in F1c and 2.4% in F1d. These results suggest that the proposed grouping prompts enable feature interaction with the model encoder at the initial stages and hence effectively supplement GTC with essential low-level information.
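
To make the prompt mechanism discussed in Q4 concrete, here is a minimal PyTorch sketch of grouping-prompt tuning; the wrapper name, the 16-token prompt, and the dimensions are illustrative assumptions, not the actual PGT code. The backbone stage is frozen, and only the group embeddings, prepended as extra tokens, receive gradients.

    import torch
    import torch.nn as nn

    class PromptedStage(nn.Module):
        """Hypothetical wrapper: prepend shared group embeddings to the token
        sequence of a frozen encoder stage, so grouping information interacts
        with low-level features while only the prompts stay trainable."""

        def __init__(self, stage: nn.Module, group_prompts: nn.Parameter):
            super().__init__()
            self.stage = stage
            for p in self.stage.parameters():
                p.requires_grad_(False)         # freeze the pretrained backbone stage
            self.group_prompts = group_prompts  # learnable, shared with the GTC

        def forward(self, tokens):              # tokens: (B, N, dim)
            B = tokens.size(0)
            prompts = self.group_prompts.expand(B, -1, -1)
            x = self.stage(torch.cat([prompts, tokens], dim=1))
            return x[:, self.group_prompts.size(0):]  # drop prompts before next stage

    # Toy usage: any token-sequence stage works; here a vanilla encoder layer
    # stands in for a Swin stage.
    stage = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
    prompts = nn.Parameter(torch.randn(16, 256))
    out = PromptedStage(stage, prompts)(torch.randn(2, 100, 256))  # -> (2, 100, 256)

Because the prompts enter at the encoder stages rather than only at the final classifier, they can interact with low-level features, which is the effect the ablation above attributes to the grouping prompts.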

Q5(R2/R4): The definition of “PGT”. Thanks for the correction and we will introduce it earlier in the revision. PGT means Prompt-based Grouping Transformer, the overall cell detection and classification framework.

Q6(R3): Separating epithelial cells of benign and malignant processes; the clinical impact of the proposed method. We fully agree with your perspective; most existing studies combine these two categories (benign and malignant epithelial) into one. In fact, our work focuses more on the detection and classification of inflammatory cells or lymphocytes, such as tumor-infiltrating lymphocytes (TILs). The involvement of TILs is a critical prognostic variable in the evaluation of breast/lung cancer. Our method demonstrates excellent capability in identifying lymphocytes, and we will collaborate with pathologists in the future to annotate actively dividing cells with mitotic nuclei or aberrant nuclei, enhancing the model’s ability to detect a broader range of cell types.

Q7(R4): Moving the statistical tests to the main text. Thanks for the suggestion! We will summarize the statistical tests in the main text.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This manuscript introduces an interesting method for nucleus detection and classification, mainly based on a prompt-based grouping transformer. The reviewers raised several significant concerns, and the rebuttal has addressed some of them. However, one major concern is not well addressed: the experimental results do not match those reported in the original MCSpatNet paper [1]. The rebuttal mentions that the authors followed the experimental setting of MCSpatNet [1] (the CoNSeP dataset at 20x magnification), but the results of MCSpatNet in this submission are not the same as those presented in the original MCSpatNet paper [1]. This makes the experimental setting questionable. In addition, the rebuttal did not clearly explain how to tune or select optimal hyperparameter values, which makes the proposed study difficult to reproduce. Finally, the discussion of the comparison with other approaches should be improved, e.g., by adding the pros and cons of the other methods (as pointed out by R1). Based on these concerns, this manuscript may need another round of significant revision and review before publication.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper received mixed reviews, and some concerns were not fully addressed. Nevertheless, after reading the rebuttal and the reviews, I think most concerns have been resolved. I would like to accept this paper and encourage the authors to do their best to revise the final version according to the reviewers’ comments.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

Although some of the reviewers adjusted their scores and the final score became slightly lower than the original, it is still among the higher ones in my pool.


