
Authors

Ruining Deng, Yanwei Li, Peize Li, Jiacheng Wang, Lucas W. Remedios, Saydolimkhon Agzamkhodjaev, Zuhayr Asad, Quan Liu, Can Cui, Yaohong Wang, Yihan Wang, Yucheng Tang, Haichun Yang, Yuankai Huo

Abstract

Multi-class cell segmentation in high-resolution gigapixel whole slide images (WSI) is critical for various clinical applications. Training such an AI model typically requires labor-intensive pixel-wise manual annotation from experienced domain experts (e.g., pathologists). Moreover, such annotation is error-prone when differentiating fine-grained cell types (e.g., podocyte and mesangial cells) with the naked eye. In this study, we assess the feasibility of democratizing pathological AI deployment by using only lay annotators (annotators without medical domain knowledge). The contribution of this paper is threefold: (1) we propose a molecular-empowered learning scheme for multi-class cell segmentation using partial labels from lay annotators; (2) the proposed method integrates gigapixel-level molecular-morphology cross-modality registration, molecular-informed annotation, and a molecular-oriented segmentation model, achieving significantly superior performance with 3 lay annotators compared with 2 experienced pathologists; (3) a deep corrective learning (learning with imperfect labels) method is proposed to further improve segmentation performance using partially annotated noisy data. In our experiments, the proposed learning method achieved F1 = 0.8496 using molecular-informed annotations from lay annotators, which is better than conventional morphology-based annotations (F1 = 0.7015) from experienced pathologists. Our method democratizes the development of a pathological segmentation deep model to the lay annotator level, which consequently scales up the learning process similarly to a non-medical computer vision task. The official implementation and cell annotations are publicly available at https://github.com/hrlblab/MolecularEL.
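As a rough illustration of the "learning with imperfect labels" idea mentioned in the abstract, the sketch below shows a masked (partial-label) cross-entropy loss in PyTorch, where only pixels actually annotated by a lay annotator contribute to the gradient. The function name, tensor shapes, and masking rule are illustrative assumptions and do not reproduce the authors' released implementation.

```python
# Minimal sketch of a partial-label loss, assuming PyTorch.
# The mask marks pixels a lay annotator actually labeled; unlabeled or
# uncertain pixels are excluded from the loss. Names and shapes are
# illustrative, not the authors' code.
import torch
import torch.nn.functional as F

def partial_label_ce(logits, labels, annotated_mask):
    """logits: (B, C, H, W); labels: (B, H, W) int64; annotated_mask: (B, H, W) bool."""
    per_pixel = F.cross_entropy(logits, labels, reduction="none")  # (B, H, W)
    masked = per_pixel[annotated_mask]
    # Avoid NaN gradients when a patch contains no annotated pixels.
    return masked.mean() if masked.numel() > 0 else logits.sum() * 0.0

# Toy usage with random tensors (background + 2 cell classes = 3 channels).
logits = torch.randn(1, 3, 64, 64, requires_grad=True)
labels = torch.randint(0, 3, (1, 64, 64))
mask = torch.rand(1, 64, 64) > 0.5   # pretend roughly half the pixels were annotated
loss = partial_label_ce(logits, labels, mask)
loss.backward()
```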

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43987-2_48

SharedIt: https://rdcu.be/dnwJ3

Link to the code repository

https://github.com/hrlblab/MolecularEL

Link to the dataset(s)

https://github.com/hrlblab/MolecularEL


Reviews

Review #1

  • Please describe the contribution of the paper

This paper presents a study on cell annotation in histopathological images by non-expert users, using two co-registered image modalities. The objective is to show that it is possible to obtain annotations of sufficient quality from these users to train a multi-class segmentation model.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The approach of using lay annotators working on two distinct modalities is interesting.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • lack of comparison with previous work on crowdsourcing annotations
    • lack of interexpert agreement evaluation
    • scores calculated only on one experiment
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The study presented should be easily reproduced as all details are in the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Several aspects of this work raise questions. First of all, the whole study is based on F-score calculations from a gold standard given by a pathologist. It would be interesting to know the score obtained by a second expert performing the same task, i.e. a manual annotation from the two image modalities. I think the difference would be quite large. This puts into perspective the results obtained, for which the differences are not really significant (Table 2).

Secondly, it seems that the scores presented were obtained from a single experiment. Given the differences, it would be better to carry out several data splits and report an average score with the corresponding standard deviation.

Finally, the benefit of using non-experts to carry out the annotations rather than an experienced pathologist is reduced by the fact that a second immunofluorescence image modality is necessarily required. This acquisition is not trivial and also has a cost.

I think it would also be interesting to compare this type of approach with work on WSI annotation based on crowdsourcing with non-experts, e.g.:

    • Amgad, M., Atteya, L.A., Hussein, H., et al.: NuCLS: A scalable crowdsourcing approach and dataset for nucleus classification and segmentation in breast cancer. GigaScience 11 (2022).
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See comments.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

The new version includes additional information that was missing.



Review #2

  • Please describe the contribution of the paper

Training an AI model for multi-class cell segmentation from WSIs always requires labor-intensive pixel-wise manual annotation from domain experts. This paper proposes molecular-empowered learning for multi-class cell segmentation using partial annotations. The proposed method lowers the annotation difficulty from the expert level to the lay-annotator level. An efficient semi-supervised learning strategy is also introduced to offset the impact of noisy labels from lay annotations.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Lay annotation is an interesting method. It relaxes the requirement for medical domain knowledge when labeling data, and it benefits from easier access to large amounts of labeled data, thus scaling up the learning process.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The illustration of the lay annotators is not very clear. What is the quality of the lay annotations? How many noisy labels would they introduce? How would this variance affect model performance?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper claimed that the official implementation would be publicly available when ready.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

Molecular-informed lay annotation is interesting. However, I still cannot understand the advantage of lay annotation compared with morphology-based annotation from pathologists. Why is this kind of lay annotation less labor-intensive? Why does it not need medical domain knowledge? I would appreciate it if the authors could further illustrate the lay annotation process.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    1. Reducing the high requirements on medical image annotation is an important topic in this area.
    2. The content is well-organized.
    3. The model is well evaluated with sufficient experiments.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The author clearly explains my concerns in the rebuttal.



Review #3

  • Please describe the contribution of the paper

The authors describe a novel “molecular-informed” annotation method for lay people. This study addresses a pressing need, as experts are typically not available and domain knowledge is often needed for appropriate medical annotations. The authors show that the F1 metric clearly improves with respect to a pathologist who has access only to H&E slides.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The authors describe a corrective learning approach that results in competitive labeling with respect to the gold standard. This approach enables lay people to annotate medical images with very high quality, as measured by several metrics, which clinicians could not achieve.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The only significant problem I see with this paper is the rather “unfair” comparison. The authors show that lay people who have access to molecular information (and corrective learning, etc.) outperform pathologists who have access only to H&E slides and NOT to the molecular information. In an ideal world, I would love to see a comparison between pathologists who have access to both H&E and molecular information and the lay people. This would give insights into how “badly” lay people actually annotate. With the current paper, I only learn that more information may lead to better results.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors provide access to cell annotations and code implementations. I therefore believe that the paper should be largely reproducible. However, hyperparameters for AI training are not provided within the text, and the ImageJ-based steps are mostly not reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    As addressed in the “weakness” section, the main drawback of the study is the comparison of an expert with HE slides only and a lay person that has more information available. Ideally, add this confirmatory comparison. At the very least, please adequately discuss this point in the manuscript.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

I like the approach of enabling lay people to annotate data. I agree with the authors that this is a significant bottleneck that needs to be addressed. My major criticism is, I think, important, but overall I believe the study is of interest to the MICCAI community.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

I initially voted for accept; I am now tending between accept and weak accept, as it was not clear to me that only 1 expert and 1 lay annotator were used in the initial study. However, the authors provide information about additional raters in the rebuttal, which strengthens their point. My criticism was addressed adequately. Therefore, my final decision stays as is.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper presents a method for “molecular-informed” annotation by lay annotators, and thus touches on a highly relevant topic to reduce the need for experts with detailed domain knowledge.

    Major concerns include a missing inter-expert and inter-rater evaluation, and that the comparison is only between lay annotators working on multimodal information and expert annotators who only have access to the H&E images. Additionally, a discussion of prior work on crowdsourcing in histopathology would be of interest.

    In their rebuttal, the authors should clarify the experimental setup (number of expert/lay annotators & experiments), discuss the difference between the setting for the lay annotators and the expert, as well as discuss the additional costs of molecular imaging vs. expert annotation (which should also be included in the manuscript in case of acceptance).




Author Feedback

We appreciate all the reviewers and the AC for the insightful reviews and summarize the comments into two major and two minor concerns.

[Major concerns]

C1. Missing inter-expert and inter-rater evaluation (AC, R1, R3): In the original manuscript, we had 1 lay annotator and 1 expert. As suggested, we have now included 3 lay annotators and 2 experts. This allows us to compute the inter-rater variability between lay annotators and experts on a small set of patches. From the new experiments, the average F1 scores (podocyte: 0.8496, mesangial: 0.8473) from the proposed method are higher than those of the 2 experts who only used anatomical images (podocyte: 0.7016, mesangial: 0.6568). Statistically, the Fleiss’ kappa test shows that the proposed method yields higher annotation agreement (podocyte: 0.6406, mesangial: 0.5978) than the annotation by experts (podocyte: 0.3973, mesangial: 0.4161).

C2. Clarify experimental setup and costs along with the advantages of using lay annotators (AC, R1, R2): Experimental setting: all anatomical and molecular patches of glomerular structures are extracted from WSIs on a workstation equipped with a 12-core Intel Xeon W-2265 processor and an NVIDIA RTX A6000 GPU. An 8-core AMD Ryzen 7 5800X workstation with an XP-PEN Artist 15.6 Pro pen tablet is used for drawing the contour of each cell. ImageJ (version 1.53t) is used to collect the cell contours and masks. The experimental setup for the 3 lay annotators and the 2 experts is kept strictly the same to ensure a fair comparison. There are 3 benefits of democratizing cell annotation to lay annotators with molecular-empowered learning: (1) Lowering the long-standing domain knowledge bar of annotation in pathology AI: a long-standing issue in deploying pathology AI is the high domain knowledge bar for annotation, since accurate annotation necessitates decades of expensive medical training. The proposed annotation pipeline reduces the difficulty of annotation by shifting from distinguishing knowledge-intensive cell characteristics to recognizing simple molecular biomarkers. (2) Improvement in annotation accuracy: the morphological patterns of each cell type can vary due to different staining and lesions, leading to large variability and potential errors during annotation. However, a molecular biomarker is a stable characteristic of each cell that enables accurate cell quantification. The results (C1 and Tab. 1) demonstrate that lay annotators, without any specialized knowledge, achieved better reliability and accuracy with molecular-empowered learning compared with the experts. (3) Reduction in pathologist costs: the proposed pipeline frees pathologists from the time-consuming annotation workload, allowing them to focus on other expert examinations and research. It reduces the time cost from annotating 1 cell type on 1 WSI in 9 hours to staining and scanning 24 IF WSIs (as a batch) in 3 hours, while it is impractical to hire a large number of experienced pathologists for cell annotation.

[Minor concerns]

C3. Annotation by an expert with molecular images (R1, R2, R3): In our paper, we regard 1 pathologist (over 20 years’ experience) with both anatomical and molecular images as the gold standard (Fig. 1). The annotation errors of lay annotators and experts are calculated against this gold standard, demonstrating the quality of the annotations and the noisy labels introduced.

C4. Discussion of prior work on crowdsourcing in histopathology (AC, R1): In C1, we provided an evaluation of annotations from 3 lay annotators and 2 experts, as well as their annotation agreement. In this paper, we primarily introduce an annotation pipeline that reduces the difficulty of cell annotation while achieving better annotation performance by lay annotators. In future work, we will explore crowdsourcing technologies to generate better annotations for AI learning from multiple annotations. We will add all of these results and discussions to the final version of the manuscript.
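As a side note for readers unfamiliar with the Fleiss’ kappa agreement statistic cited in C1, the short Python sketch below shows how such a value can be computed with statsmodels; the rating matrix is synthetic and purely illustrative, not the authors’ data or implementation.

```python
# Illustrative computation of Fleiss' kappa, assuming statsmodels is installed.
# Rows are cells (subjects), columns are annotators, values are assigned classes.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical labels from 3 lay annotators for 6 cells
# (0 = other, 1 = podocyte, 2 = mesangial).
ratings = np.array([
    [1, 1, 1],
    [2, 2, 1],
    [0, 0, 0],
    [1, 1, 2],
    [2, 2, 2],
    [0, 1, 0],
])

# aggregate_raters converts (subjects x raters) labels into (subjects x categories) counts.
counts, _ = aggregate_raters(ratings)
print("Fleiss' kappa:", fleiss_kappa(counts))
```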




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The key strengths of the paper include the discussion of a relevant topic in biomedical research (e.g., in the context of citizen science) and how a segmentation task can be framed such that lay annotators achieve high performance. The main weaknesses include the limited number of annotators, the lack of further evaluation of the gold standard, and the fact that the expert annotators did not have access to the additional molecular information. It is unfortunate that these results were not included in the original study, as the review-rebuttal process at MICCAI is not intended for, and not fully suited to, assessing the role and validity of new results. At the same time, the additions seem fairly straightforward and do not change the message of the paper considerably.

    Given that this paper was assessed fairly positively by all reviewers, I would therefore see it as a borderline accept.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This work presents an interesting method that tries to overcome one of the major issues in digital pathology, which is obtaining sufficient high-quality ground truth annotations. I agree with the problem raised by the authors, and the work shows improved experimental results. However, the rebuttal is not very convincing, especially the 3 benefits that the authors listed. (1) Lowering the long-standing domain knowledge bar of annotation in pathology AI: don’t we need domain knowledge to use molecular images? And it is not just about using the images; one needs to register them as well, which is not a trivial task. (2) Improvement in annotation accuracy and (3) reduction in pathologist costs: these are benefits of the molecular images per se, not of the proposed method. Overall, the proposed method improves the quality of the annotations at the expense of additional molecular images and image registration. The paper does not sufficiently isolate the sole contribution of the work, and the experimental design is not very convincing either. Hence, I am not convinced this paper merits acceptance in its current form.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The discussions among the AC and reviewers after the rebuttal do not allow us to move forward with this paper. Moreover, even if the challenge seems interesting, this sensitive topic would need a solid foundation to really become adoptable.


