
Authors

Jiangdong Cai, Honglin Xiong, Maosong Cao, Luyan Liu, Lichi Zhang, Qian Wang

Abstract

Vulvovaginal candidiasis (VVC) is the most prevalent human candidal infection, estimated to afflict approximately 75% of all women at least once in their lifetime. It leads to symptoms including pruritus, vaginal soreness, and more. Automatic whole slide image (WSI) classification is in high demand given the huge burden of disease control and prevention. However, WSI-based computer-aided VVC screening methods are still lacking due to scarce labeled data and the unique properties of candida. Candida in WSIs is challenging for conventional classification models to capture due to its distinctive elongated shape, the small proportion of the image it occupies, and the style gap across WSIs. To make it easier for the model to focus on candida, we propose an attention-guided method, which can obtain a robust diagnosis classification model. Specifically, we first use a pre-trained detection model as prior instruction to initialize the classification model. Then we design a Skip Self-Attention module to refine the attention onto the fine-grained features of candida. Finally, we use a contrastive learning method to alleviate the overfitting caused by the style gap of WSIs and suppress the attention to false positive regions. Our experimental results demonstrate that our framework achieves state-of-the-art performance.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43987-2_23

SharedIt: https://rdcu.be/dnwJF

Link to the code repository

https://github.com/caijd2000/MICCAI2023-VVC-Screening

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This article proposes a progressive attention-guided method for vulvovaginal candidiasis screening. Aiming at the challenges that candida appears as long strips occupying a small proportion of the image and that staining differences across WSIs are large, a pipeline with three main contributions is proposed. First, a detection task is used to initialize the encoder, making it easier for the network to focus on candida objects. Then, skip self-attention is used to fuse multi-scale features. Finally, contrastive learning is used to ease the impact of style differences. The effectiveness of the method is verified on two datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed method presents a novel approach to VVC screening via progressive attention guidance. This approach has the potential to improve screening accuracy and efficiency, making it a valuable contribution to the field. To handle the morphological characteristics of candida, a Transformer that can capture long-range relationships is used and multi-scale features are fused, so that candida can be better identified.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The paper does not provide enough information about the method. In the classifier with the SSA module, the structural description of the transformer is not clear. In addition, the calculation of the attention map is not clear enough.

    2. Comparative experiments with other methods are not extensive enough to prove the superiority of the method.

    3. The introduction to the dataset is not detailed enough.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    1. The paper does not provide enough information about the data used in the study. This lack of information makes it difficult for others to obtain and use the same data in their own studies.
    2. Some experimental details in the paper are not clear enough.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. For the problem of pathological image recognition, the method could be compared with recent multiple-instance learning methods, such as CLAM [1], DSMIL [2], TransMIL [3], and DTFD-MIL [4].

    2. The description of the information about the dataset used is not sufficient. If it is a public dataset, it should be clearly introduced. If it is a private dataset, the paper should also introduce how the data was collected, pre-processed, or selected for analysis.

    3. The two FC layers in Figure 2 are marked (shared); are their weights shared?

    4. The calculation method of the attention map in the attention extractor is not novel enough.

    5. For image-level classification, does each image have a category label?

    1. The description of the Transformer structure in the classifier-with-SSA section is unclear.

    2. It would be better to explain the calculation of the attention map in more detail, e.g., the channels and dimensions.

    [1] Data-efficient and weakly supervised computational pathology on whole-slide images, Nature Biomedical Engineering 2021. [2] Dual-Stream Multiple Instance Learning Network for Whole Slide Image Classification, CVPR 2021. [3] TransMIL: Transformer Based Correlated Multiple Instance Learning for Whole Slide Image Classification, NeurIPS 2021. [4] DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification, CVPR 2022.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    1. Comparative experiments with other methods are insufficient to prove the superiority of the method.
    2. There are some unclear points in the description of the method.
    3. The description of the dataset is not clear enough.
  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors have clarified some of the unclear points. Thus, I change my opinion to weak accept.



Review #3

  • Please describe the contribution of the paper

    The authors introduce a novel attention-guided method for VVC screening, which can progressively correct the attention of the model. In the model, an SSA module is proposed to fuse coarse- and fine-grained features, and contrastive learning is introduced to avoid overfitting to image styles.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) A novel idea, the skip self-attention (SSA) mechanism, is used for whole slide image screening. (2) The proposed method greatly improves performance compared with previous works. (3) Good writing and organization.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) Although significant improvements are displayed in Table 1, the compared methods are few. (2) It would be better to add some legends to Fig. 1 and Fig. 2, e.g., for the red arrows in Fig. 1 and the dashed arrow in Fig. 2. (3) It is mentioned that contrastive learning can alleviate the overfitting caused by the style gap, but I did not find related experiments, e.g., how does the problem impact performance, and how is it alleviated?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducible with some effort; the code has not been published.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    (1) Will the parameters in the encoder be updated after the pre-training stage?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See the weaknesses and strengths

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    The paper proposes an attention-guided method to obtain a robust diagnosis classification model for candida infections. The method utilizes a pre-trained detection model to guide the initialization of the classification model and a Skip Self-Attention module to refine the attention onto the fine-grained features of candida. To address the challenge of style gap in WSIs, a contrastive learning method is used to suppress the attention to false positive regions and alleviate the overfitting caused by the style gap.

    The proposed method improves the accuracy and robustness of the diagnosis classification model for candida infections.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Skip self-attention (SSA) is used to consider the multi-scale semantics of low-level and high-level features and to improve the network's attention to Candida with severe occlusion or long hyphae.
    2. Contrastive learning is used to alleviate the overfitting risk caused by the style gap and to improve the ability to discern candida. This is where the performance improvement and innovation are reflected.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Dividing the image into patches may lose relevant information about the whole image.
    2. The design of SSA only uses high-level and low-level information in the construction of Q, K, and V, and the overall model design only introduces contrastive learning to address the style gap. There is no great innovation in the model.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The model in this paper is mainly divided into three parts:

    1. The pre-training part is clearly described and can be reproduced.
    2. The skip self-attention (SSA) part only uses high-level and low-level information in the construction of Q, K, and V; the attention itself is classic soft attention, which can be reproduced.
    3. Contrastive learning is an important means of improving performance, but its specific implementation is not very clear, and there may be reproducibility problems.

    In general, the main difficulty in reproducibility lies in the contrastive learning part; it is recommended to add more implementation details. It is also recommended to open-source the pre-trained model parameters to enhance reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. Compared with other algorithms, is the parameter count of the same order of magnitude, and is the performance improvement due to the method itself rather than simply to an increase in model parameters?

    2. The figure placement is not ideal. For example, Figure 2 is only referenced in Section 2.3 (Contrastive Learning), but it is placed on the first page.

    3. The final fusion method and score design are not described.

    4. It is suggested to describe the implementation in the Contrastive Learning section more clearly.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The degree of innovation in the model is not great, but the introduction of contrastive learning has greatly improved the performance of the algorithm, and there is some practical clinical value. So I give a rating of weak accept.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper presents an attention-guided method for detecting vulvovaginal candidiasis in whole slide images for screening. Three reviewers reviewed the paper. They were concerned about the lack of detailed descriptions of the proposed method and the lack of comparisons. See their comments to improve your paper.




Author Feedback

Q1: The AC was concerned about the lack of detailed descriptions, echoing the review comments. We will release the code, which contains the detailed implementation of our method.

  1. Classifier with SSA module (R1). We mostly follow the classical transformer design. The goal of SSA is to fuse low-level fine-grained features with high-level coarse-grained features for classification. Hence, we modify the transformer decoder in SSA so that values/keys come from high-level feature embeddings and queries come from low-level feature embeddings.
  2. Calculation of attention map (R1). We realized there was confusion in Fig. 2, which will be clarified in the final paper. Specifically, for the “Classifier with SSA” in Fig. 2, only the CLS token passes through the last FC to complete classification, and all other tokens are directed to the “Attention Extractor”, where the FC parameters are shared with the “Classifier with SSA”. Features are compressed to 256×2 by the FC, reshaped to 16×16×2, and up-sampled to 1024×1024×2. Finally, we obtain the attention maps from the feature maps using max-min normalization. (An illustrative sketch of both points follows this list.)
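A minimal PyTorch sketch of the two points above, for illustration only: it assumes a standard multi-head attention layer rather than the authors' exact decoder, and the names (SkipSelfAttention, low_tokens, high_tokens, shared_fc) are hypothetical.

```python
# Illustrative sketch only -- NOT the authors' released implementation.
# Queries come from low-level (fine-grained) tokens, keys/values from high-level
# (coarse-grained) tokens; one FC is shared between classification and the
# attention extractor, as described in the rebuttal.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SkipSelfAttention(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8, num_classes: int = 2):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.shared_fc = nn.Linear(dim, num_classes)  # shared FC (classifier + extractor)

    def forward(self, low_tokens, high_tokens):
        # low_tokens:  (B, 16*16, dim)   fine-grained feature tokens
        # high_tokens: (B, N_high, dim)  coarse-grained feature tokens
        b = low_tokens.size(0)
        q = torch.cat([self.cls_token.expand(b, -1, -1), low_tokens], dim=1)
        fused, _ = self.attn(query=q, key=high_tokens, value=high_tokens)
        fused = self.norm(fused)

        # Only the CLS token passes through the last FC for classification.
        logits = self.shared_fc(fused[:, 0])

        # All other tokens go to the attention extractor with the same FC weights:
        # 256 tokens x 2 channels -> reshape to 16x16x2 -> upsample to 1024x1024x2.
        tok = self.shared_fc(fused[:, 1:])                 # (B, 256, 2)
        fmap = tok.transpose(1, 2).reshape(b, -1, 16, 16)  # (B, 2, 16, 16)
        fmap = F.interpolate(fmap, size=(1024, 1024), mode="bilinear", align_corners=False)
        # Max-min normalization yields the final attention map.
        fmin = fmap.amin(dim=(2, 3), keepdim=True)
        fmax = fmap.amax(dim=(2, 3), keepdim=True)
        attn_map = (fmap - fmin) / (fmax - fmin + 1e-6)
        return logits, attn_map
```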

Q2: The AC mentioned a lack of comparisons. We note that the task of candida diagnosis is very rare in the literature, which was truly an obstacle when preparing this paper. We have tried our best to compare with as many methods as possible. At the image level, we have not found any alternative method that could effectively solve our problem. At the WSI level, we tried TransMIL on “Dataset-Small” following R1's suggestion. The original TransMIL performs poorly, as it is challenged by the unique morphology of candida. We then replaced the original encoder in TransMIL, from a ResNet50 pre-trained on ImageNet to the encoder used in our method. In this way, it achieves AUC=95.35, which is slightly inferior to ours. Note that the modified TransMIL is very similar to our method; the main difference is the “PPEG” module in TransMIL. This module encodes the positional information of image patches, which proves effective for histopathology, but it does not work in cytologic WSIs for diagnosing candida. This phenomenon echoes a recent survey arguing that methods for histopathology may not be directly applicable to cytology (doi.org/10.21203/rs.3.rs-2680912/v1).

Q3: R1 expected more introduction to the dataset. Our samples were collected by a collaborating clinical institute by 2021. Each sample is scanned into a WSI following a standard cytology protocol, and each WSI can be further cropped into ~500 images of size 1024×1024. Details of our data can be found at the beginning of Section 3. In addition, the image-level training, the WSI-level training (“Dataset-Small”), and the further validation (“Dataset-Large”) are done on three non-overlapping datasets. For the image-level dataset, each image has a category label determined by a pathologist; for the two WSI-level datasets, each WSI has only a WSI-level label.

Q4: R3/R4 requested clarification of contrastive learning and its contribution to alleviating the style gap. Details of our implementation can be found in Section 2.3, and we will release the code. Due to the small number of candida WSIs (a few hundred), the combination of high style variation and a small sample size is a major cause of overfitting. Without contrastive learning, the performance drops significantly (cf. the last two rows in Table 1).
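The exact objective is defined in Section 2.3 of the paper and is not reproduced in this rebuttal; purely as an assumed illustration, a generic NT-Xent/InfoNCE loss over two style-varied views of the same image could look like the following (the function name and temperature are placeholders, not the paper's settings).

```python
# Generic NT-Xent (InfoNCE) contrastive loss -- an assumed stand-in, not the paper's
# exact formulation (see Section 2.3 of the paper for that).
import torch
import torch.nn.functional as F


def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1, z2: (B, D) embeddings of two style-augmented views of the same images."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)              # (2B, D)
    sim = z @ z.t() / temperature               # cosine-similarity logits
    sim.fill_diagonal_(float("-inf"))           # a sample is never its own positive
    b = z1.size(0)
    # The positive for view i is the other view of the same image (index i+B or i-B).
    targets = torch.cat([torch.arange(b) + b, torch.arange(b)]).to(z.device)
    return F.cross_entropy(sim, targets)
```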

Q5: R3 asked whether the encoder parameters are updated after pre-training. A: The first three layers of the pre-trained encoder are frozen, and the last layer can be updated. We will release the code. Q6: R4 questioned the magnitude of the parameter count. The image-level classifier has 47M parameters, which is in the same order of magnitude as other methods. Q7: R4 asked about the final fusion method and score design. The description is in Fig. 1b and Section 2.4. The scores output by the image-level classifier indicate the probability that each image contains candida. We rank all images in a WSI according to these scores; the top-k images pass their features to a transformer, which performs the final fusion.
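A small sketch of the top-k ranking and transformer fusion described in Q7, assuming a standard PyTorch transformer encoder; the class name, argument names, k, depth, and pooling are placeholders rather than the paper's exact settings.

```python
# Illustrative sketch of the WSI-level fusion in Q7 (cf. Fig. 1b / Section 2.4);
# hyperparameters and names are placeholders, not the authors' exact design.
import torch
import torch.nn as nn


class WSIFusion(nn.Module):
    def __init__(self, dim: int = 512, k: int = 16, num_classes: int = 2):
        super().__init__()
        self.k = k
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, image_feats: torch.Tensor, image_scores: torch.Tensor):
        # image_feats:  (N, dim) features of the ~500 cropped images of one WSI
        # image_scores: (N,)     per-image candida probabilities from the image-level classifier
        topk = torch.topk(image_scores, k=min(self.k, image_scores.numel())).indices
        tokens = image_feats[topk].unsqueeze(0)   # (1, k, dim) -- top-k images only
        fused = self.fusion(tokens)               # transformer performs the final fusion
        return self.head(fused.mean(dim=1))       # (1, num_classes) WSI-level logits
```

Regarding Q5, freezing the first three encoder layers would correspond in PyTorch to setting requires_grad=False on those parameter groups while keeping the last layer trainable.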




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Although the authors provided their rebuttal, none of the reviewers changed their original scores. Since the scores for other papers increased after the rebuttal, the final score became borderline in my pool. The concern about the lack of detailed descriptions of the method, raised by one of the reviewers and the AC, was not fully addressed: the authors only said they would release the code but did not provide details or a summary of them.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Reviewers rightly raised questions about the dataset description and performance comparison. While the authors addressed some major points (dataset details) in the rebuttal, some other points have not been adequately addressed. I also agree with the authors that the comparison methods may not be optimal due to the novel nature of the problem.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I believe that the rebuttal was successful. The authors clarified the main points of the paper, and the code will be released. The comparisons may not be sufficient, but the field is still limited.


