
Authors

Linhao Qu, Yingfan Ma, Zhiwei Yang, Manning Wang, Zhijian Song

Abstract

Active learning (AL) is an effective approach to select the most informative samples to label so as to reduce the annotation cost. Existing AL methods typically work under the closed-set assumption, i.e., all classes existing in the unlabeled sample pool need to be classified by the target model. However, in some practical clinical tasks, the unlabeled pool may contain not only the target classes that need to be fine-grainedly classified, but also non-target classes that are irrelevant to the clinical tasks. Existing AL methods cannot work well in this scenario because they tend to select a large number of non-target samples. In this paper, we formulate this scenario as an open-set AL problem and propose an efficient framework, OpenAL, to address the challenge of querying samples from an unlabeled pool with both target class and non-target class samples. Experiments on fine-grained classification of pathology images show that OpenAL can significantly improve the query quality of target class samples and achieve higher performance than current state-of-the-art AL methods. Codes will be available.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43895-0_1

SharedIt: https://rdcu.be/dnwxJ

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #3

  • Please describe the contribution of the paper

    The paper proposes a new framework for active learning (AL) called OpenAL, which addresses the challenge of querying samples from an unlabeled pool with both target class and non-target class samples. The problem is formulated as an open-set AL problem, where the unlabeled pool may contain not only the target classes that need to be classified but also non-target classes that are irrelevant to the clinical tasks. Existing AL methods cannot work well in this scenario because they tend to select a large number of non-target samples. OpenAL adopts an iterative query paradigm and uses a two-stage sample selection strategy in each query.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Novelty: The paper proposes a novel framework called OpenAL for active learning under an open-set scenario, which is a practical problem in clinical tasks. This is the first open-set AL work in the field of pathology image analysis.
    2. Two-stage sample selection strategy: OpenAL adopts a two-stage sample selection strategy in each query, which includes a feature-based target sample selection strategy and a model-based informative sample selection strategy. This approach can effectively select the most informative samples from the target classes and reduce the number of non-target samples selected.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. Limited datasets: The paper uses only one public dataset for the experiments, which may limit the generalizability of the proposed framework to other datasets and clinical applications. Moreover, the selected dataset has the same number of images in each category, whereas such class balance cannot always be guaranteed in real scenarios.
    2. Confusing terminology: For example, the term NORM denotes both the normalization function and the normal colon mucosa class.
    3. Unjustified metric choices: The choice of a category- and Mahalanobis-distance-based feature distribution modeling approach for calculating the distance between target and non-target samples is not adequately justified. The paper does not explain why this particular distance metric was chosen over other possible alternatives, which may limit the understanding and reproducibility of the proposed framework.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    In terms of reproducibility, the paper seems to be reasonably well-documented and could potentially be reproduced by other researchers. Authors include details on the hyperparameter settings, such as the number of epochs, optimizer, momentum, weight decay, initial learning rate, and batch size, which are important details for the reproducibility of the experiments.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • One of the weaknesses of the paper is that some choices of metrics are not justified. For example, the authors use the Mahalanobis distance for feature distribution modeling, but they do not explain why this distance was chosen over other alternatives. It would be beneficial if the authors could provide a rationale for their metric choices.
    • The authors only compare their proposed OpenAL framework with a few state-of-the-art AL methods. It would be more comprehensive if the authors could also compare their method with other related approaches, such as transfer learning-based AL or meta-learning-based AL, to show the superiority of their proposed framework.
    • The term “Histopathology” appears in the title and nowhere else in the paper. Please make sure to briefly describe this term in the introduction.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper seems to make a valuable contribution to the field of active learning for pathology image analysis, but there is still room for improvement in terms of clarity and robustness.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper
    1. This paper presents a novel approach to the histopathology image classification task by combining open-set and active learning methods.

    2. In this paper, a feature-based target sample selection strategy is proposed for selecting samples within the target class distribution. The strategy involves using self-supervised feature representation to form a candidate set and selecting samples with high entropy values for labeling.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper introduces a novel active learning framework called OpenAL, designed specifically for open-set problems. OpenAL involves two steps: first, a feature-based target sample selection (FTSS) strategy is used to select samples within the target class distribution, and second, samples with high entropy values are selected for labeling based on a candidate set.

    2. The first step of the strategy is innovative and distinguishes it from the baseline work, as it leverages the feature space of unsupervised learning to identify in-distribution samples, instead of relying solely on limited labeled data.
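
For concreteness, the second-stage step described above — ranking candidate samples by predictive entropy for labeling — can be sketched as follows (an illustrative minimal example, not the authors' implementation; the function name and toy data are hypothetical):

```python
import numpy as np

def entropy_select(probs: np.ndarray, k: int) -> np.ndarray:
    """Select the k samples with the highest predictive entropy.

    probs: (n_samples, n_classes) softmax outputs of the task model.
    Returns the indices of the k most uncertain samples, most uncertain first.
    """
    eps = 1e-12  # guard against log(0) for confident predictions
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    return np.argsort(entropy)[::-1][:k]

# toy usage: three candidate samples over three target classes
probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction -> low entropy
    [0.34, 0.33, 0.33],  # near-uniform prediction -> high entropy
    [0.70, 0.20, 0.10],
])
print(entropy_select(probs, k=2))  # indices of the two most uncertain samples
```

Samples with near-uniform predicted probabilities have the highest entropy and are queried first, which is the standard uncertainty criterion this review refers to.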

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The originality of this paper could be strengthened. While the method presented in this paper is based on the open-source CVPR22 LfOSA method, the authors have made two key modifications that differentiate it from the original: first, they designed a feature-based target sample selection (FTSS) strategy based on unsupervised features in the first step, and second, they used a different histopathology dataset. The focus of the paper is primarily on the FTSS method, with the claim that it outperforms the MAV method used in CVPR22 LfOSA. However, the discussion of the FTSS method’s shortcomings could be expanded.

    2. The FTSS strategy may lead to sample redundancy. Formula (1) in the paper is used to calculate the score and ranking of samples selected by the FTSS in the first step. However, this formula may lead to the selection of samples that are too redundant, especially those close to the center of the distribution. As a result, the candidate set may lack diversity, and ranking by entropy in the second step may not be helpful in addressing this issue.

    3. It may not be reasonable to avoid labeling non-target class samples in this task. The motivation of this paper is to avoid labeling non-target class samples in the open set in order to improve model performance. However, the results show that even when the LfOSA method avoids selecting non-target class samples and has high precision and recall, its final performance is still inferior to the random method, which does not consider the target distribution. This suggests that avoiding labeling non-target class samples may not be as effective as previously thought, and alternative strategies could be explored.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper appears to be reproducible because the authors have stated that the code will be made available and the experiments are based on a publicly available dataset. This means that other researchers can access and verify the results presented in the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. This paper should consider avoiding sample redundancy in the selection strategy. Sample diversity is crucial for achieving good performance, especially when the data size is small.
    2. The authors should investigate why the LfOSA method is weaker than random, despite its ability to detect samples from the target category distribution. This raises questions about the motivation of this paper to avoid selecting non-target class samples in this dataset and task.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The paper’s originality is not particularly strong, and the improved strategy (FTSS) based on the baseline may be prone to the issue of sample redundancy.
    2. The motivation of the paper may not be well-suited to this histopathology image classification task. Although the paper proposes avoiding the selection of non-target class samples, the performance of the LfOSA method is not as good as random, even when it detects target class samples. This suggests that other factors affecting data selection performance, such as diversity, representativeness, and class balance, may be overlooked in this task.
  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #6

  • Please describe the contribution of the paper

    The main contribution of this paper is to develop a new framework, OpenAL, to solve the Open-set histopathology image classification task with active learning. The method focuses on actively selecting target class samples for annotations and achieving state-of-the-art performance compared with other methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strengths of this paper are listed as follows,

    1. A novel active learning framework, OpenAL, is proposed to address the non-target class samples issue in histopathology image classification using active learning.
    2. They design a two-stage query scheme to select informative target-class samples to get annotations.
    3. The proposed method is evaluated on a public dataset with two different settings and achieves state-of-the-art performance compared with other methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The main weakness of this paper, from my side, is how to maintain the balance of the number of samples across different classes during active learning. In the initialization step, 1% of samples are randomly selected to be labeled, and we do not know the class distribution among the target class samples; the initial set may contain samples from only a subset of the classes. Then, during the two-stage query, only the distance $s_i$ and the uncertainty are considered; there is no operation to maintain class balance during active learning, so we do not know whether the model is well trained. Please give some explanation.
    2. There are also several items that need to be addressed, listed in the detailed comments (No. 9) for your reference.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The idea of this paper can be reproduced by following the paper, the evaluation dataset is public, and their code will be released.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. In Figure 1 (Page 2), in the “Ours” branch, some non-target samples can in fact also be selected and sent to the oracle for annotation. Please revise the figure to avoid misleading readers.
    2. In Section 2.1 (Page 4), could you please give more details about how to use the new labeled samples? Did you train the model from scratch with all labeled samples, or did you continue training the model with only the new labeled samples, or another way?
    3. Eq. (1) (Page 5), $s_i$ can be negative values with the current format. Could you please reconsider it?
    4. Eq. (2) in Section 2.2, (Page 5), could you please give more explanations and motivations why the Mahalanobis distance is selected in your paper? If this distance is not proposed by you, please list the reference for this distance.
    5. Eq. (4) in Section 2.2, how did you set the $W$ for the non-target class samples? What value did you use for your paper? Why?
    6. Section 3.1 (Page 6), “42% (3 target classes, 4 non-target classes)”, I think it contains typos. Please double-check it.
    7. Section 3.1 (Page 6), Why did you choose TUM, LY, and NORM as the target class but not the rest six classes? Did you select them randomly? Could you give us some explanations?
    8. In Fig. 3, sub-figure C, what does “w/o $S_{nt}$” stand for?
    9. Page 9, ref. 14 and ref. 15 are duplicates. Please double-check them.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It is a well-written paper and tries to solve a practical problem - dealing with non-target class samples during active learning. However, it contains some typos and unclear passages, so I cannot accept this paper currently, but I would like to reconsider my decision after reading the authors’ rebuttal.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    I think most of my questions and comments have been addressed by the authors. So I changed my overall rating to weak accept. Thank the authors for their efforts.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper presented an OpenAL framework for active learning under an open-set scenario, which is interesting. The authors are asked to address the following questions/concerns in the rebuttal: (1) the novelty of the proposed method, particularly with respect to the papers mentioned by R#4; (2) how to maintain the balance of the number of samples for different classes during active learning, as pointed out by R#6; (3) the limited experimental dataset, as pointed out by R#3; and (4) the other questions and concerns listed in the review comments.




Author Feedback

(Q1) About novelty. (AC, R4) (A) Our method significantly differs from LfOSA in target class sample selection and final training sample selection. LfOSA utilizes an auxiliary network for selecting target class samples while we propose a feature-based strategy. In addition, LfOSA directly uses the selected target class samples for model training, but our method further assesses each target class sample’s labeling value using an uncertainty-based method.

(Q2) The balance of different classes. (AC, R6) (A) First, we apologize for the incorrect statement on Line 9, Page 6, “The number of images in each category is equal.” In fact, the number of images per category is not equal (LYM: 12%, NORM: 9%, TUM: 14%, ADI: 10%, MUC: 9%, DEB: 11%, STR: 10%, MUS: 14%, BACK: 11%).

Second, our method maintains class balance in target class sampling. Under the 33% matching ratio, we provide the cumulative sampling ratios of our method for the target classes LYM, NORM, and TUM across QueryNums 1-7: 1- 19%: 11%: 70%, 2- 42%: 16%: 42%, 3- 28%: 23%: 49%, 4- 35%: 26%: 39%, 5- 32%: 27%: 41%, 6- 32%: 28%: 40%, 7- 32%: 29%: 39%.

Third, we constructed a more imbalanced setting for the target classes LYM (6000 samples), NORM (3000 samples), and TUM (9000 samples), and the cumulative sampling ratios of our method for the 3 target classes are still fairly balanced. 1- 43%: 14%: 43%, 2- 49%: 14%: 37%, 3- 52%: 18%: 30%, 4- 42%: 18%: 40%, 5- 36%: 19%: 45%, 6- 38%: 20%: 42%, 7- 39%: 20%: 41%.
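The cumulative sampling ratios reported above can be computed directly from the per-round query labels; the sketch below is illustrative only (hypothetical helper, assuming integer class labels 0=LYM, 1=NORM, 2=TUM, not the authors' code):

```python
import numpy as np

def cumulative_class_ratios(query_labels, n_classes):
    """Cumulative per-class ratio of labeled samples after each query round.

    query_labels: list of integer label arrays, one array per query round.
    Returns one ratio vector (summing to 1) per round, accumulated so far.
    """
    ratios, counts = [], np.zeros(n_classes)
    for labels in query_labels:
        counts += np.bincount(labels, minlength=n_classes)
        ratios.append(counts / counts.sum())
    return ratios

# toy example: two rounds of queried target-class labels
rounds = [np.array([2, 2, 0, 2]), np.array([0, 1, 2, 0])]
for r in cumulative_class_ratios(rounds, n_classes=3):
    print(np.round(r, 3))
```

Tracking these cumulative ratios per round is exactly how the imbalance comparison between OpenAL and LfOSA (see Q7) is made concrete.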

(Q3) Data set is limited. (AC, R3) (A) Multi-class WSI classification datasets are rare. We designed two experiments with different matching ratios, and in this rebuttal we add an experiment on more imbalanced target classes (see Q2).

Reviewer3 (Q4) Why use Mahalanobis distance (MD). (A) MD is widely used to measure the distance between a point and a distribution because it takes into account the mean and covariance of the distribution. In contrast, most other distances, such as the Euclidean distance, measure only point-to-point distance and ignore the shape of the distribution.
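As an illustration of the point-to-distribution property described in this answer, here is a minimal Mahalanobis distance sketch (not the paper's implementation; the covariance regularization term is an assumption added for numerical safety):

```python
import numpy as np

def mahalanobis(x: np.ndarray, feats: np.ndarray) -> float:
    """Mahalanobis distance from point x to the empirical distribution of feats.

    feats: (n_samples, dim) feature matrix for one class.
    """
    mu = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])  # regularize a possibly singular covariance
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 4))               # features of one target class
near = mahalanobis(feats.mean(axis=0), feats)   # the class mean itself -> distance 0
far = mahalanobis(feats.mean(axis=0) + 5.0, feats)
assert near < far
```

Because the inverse covariance rescales each direction by the class's spread, a point far away along a high-variance direction is penalized less than one equally far along a low-variance direction, which is what plain Euclidean distance cannot capture.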

(Q5) Compare with other related methods (A) More methods in related fields will be considered in our future work.

Reviewer4 (Q6) Diversity of The FTSS strategy. (A) The selection is done in each cluster, and the clusters change in each round, which helps improve diversity. Nevertheless, our method selects more target class samples while maintaining a certain degree of diversity. (also see Q2)
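The cluster-wise selection argument above can be illustrated with a small sketch: picking the top-scoring samples within each cluster, rather than globally, spreads the query budget across the feature space (hypothetical scores and cluster labels; not the authors' code):

```python
import numpy as np

def per_cluster_select(scores, clusters, per_cluster):
    """Pick the top-scoring samples within each cluster.

    scores:   (n,) per-sample selection scores (higher = more desirable).
    clusters: (n,) integer cluster assignment of each sample.
    Selecting per cluster keeps the chosen set spread over clusters,
    which is the diversity argument made in the response above.
    """
    chosen = []
    for c in np.unique(clusters):
        idx = np.where(clusters == c)[0]
        order = idx[np.argsort(scores[idx])[::-1]]  # best first within this cluster
        chosen.extend(order[:per_cluster])
    return np.array(sorted(chosen))

scores = np.array([0.9, 0.1, 0.8, 0.7, 0.2, 0.95])
clusters = np.array([0, 0, 0, 1, 1, 1])
print(per_cluster_select(scores, clusters, per_cluster=1))  # one sample per cluster
```

A purely global top-k on these scores would take samples 5 and 0; the per-cluster variant instead guarantees each cluster contributes, trading a little score for coverage.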

(Q7) LfOSA is weaker than random. (A) We further analyzed LfOSA and list its cumulative sampling ratios for the 3 target classes at QueryNums 1-7: 1- 0%: 0%: 100%, 2- 0%: 6%: 94%, 3- 0%: 25%: 75%, 4- 1%: 31%: 68%, 5- 14%: 29%: 57%, 6- 27%: 26%: 47%, 7- 29%: 26%: 45%. We find that in the first 4 rounds it selects no or very few LYM samples. This severe imbalance of selected samples makes LfOSA weaker than random at the beginning. In contrast, our method selects more balanced target class samples (see Q2).

Reviewer6 (Q8) Detailed comments. (A)
-1 The figure will be revised.
-2 In each query round, each sample is first labeled as a target or non-target class sample, and a target class sample is further labeled as one of the 3 target classes. We use the labeled target samples to train the classifier. All labeled target and non-target samples are used for feature-based target sample selection in the next round. We continue training the model with only the newly labeled target samples.
-3 s_i can indeed take negative values, but this does not affect our sample selection: a negative s_i means the sample is very close to target class samples and far from non-target class samples.
-4 Please refer to (Q4).
-5 In this paper, we set W=9. We conducted new robustness experiments with W=1, 5, 9 and 13. The results show that our method is very robust to W.
-7 We selected TUM, LYM, and NORM as the target classes to simulate a possible scenario for pathological cell classification in clinical practice. Technically, the target classes can be chosen arbitrarily.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    In the post-rebuttal phase, all the reviewers consistently rated this paper with positive ratings. Accept.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper is an important contribution to the active learning community and will definitely evoke discussion and analysis within the MICCAI community. The rebuttal has addressed the major concerns of the reviewers, and the paper definitely deserves to be accepted.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper proposes a new active learning framework, OpenAL, which addresses the challenge of querying samples from an unlabeled pool containing both target class and non-target class samples. It uses a two-stage sampling strategy: the first stage selects data using feature-based information from unsupervised training, while the second stage uses model-based information based on an uncertainty measure. The proposed method is evaluated on a multi-class colon dataset (the Kather dataset). In the rebuttal, the authors further address the problem of unbalanced training data, which is convincing. I recommend accepting the paper.


