
Authors

Wei Fu, Yufei Chen, Wei Liu, Xiaodong Yue, Chao Ma

Abstract

Near Out-of-Distribution (OOD) detection is a crucial issue in medical applications, as misdiagnosis caused by the presence of rare diseases inevitably poses a significant risk. Recently, several deep learning-based methods for OOD detection with uncertainty estimation, such as Evidential Deep Learning (EDL) and its variants, have shown remarkable performance in identifying outliers that differ significantly from training samples. Nevertheless, few studies address the more challenging problem of near OOD detection, which involves detecting outliers close to the training distribution, as is commonly encountered in medical image applications. To address this limitation and reduce the risk of misdiagnosis, we propose an Evidence Reconciled Neural Network (ERNN). Concretely, we reform the evidence representation obtained from the evidential head with the proposed Evidential Reconcile Block (ERB), which restricts the decision boundary of the model and further improves near OOD detection performance. Compared with state-of-the-art uncertainty-based methods for OOD detection, our method reduces the evidential error and enhances the capability of near OOD detection in medical applications. Experiments on both the ISIC2019 dataset and an in-house pancreas tumor dataset validate the robustness and effectiveness of our approach. Code for ERNN has been released at https://github.com/KellaDoe/ERNN.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43898-1_30

SharedIt: https://rdcu.be/dnwA2

Link to the code repository

https://github.com/KellaDoe/ERNN

Link to the dataset(s)

https://challenge.isic-archive.com/data/#2019


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper selects unlabeled data based on the L2 distances between the unlabeled data and labeled data in the feature space for the open-set semi-supervised learning task.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed method seems to work well.
    2. The proposed method is very simple.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The contribution is limited. The proposed method simply computes the distances between labeled and unlabeled data in feature space and sets a threshold to identify unlabeled samples near the labeled data. More advanced methods that identify outliers based on feature-space distance already exist; such outliers are analogous to the OOD data in this paper.
    2. In high-dimensional space, L2 distance may not be a good choice. In fact, the ‘nearest neighbor’ based on L2 distance may be meaningless [1].

    [1] Beyer et al. When Is “Nearest Neighbor” Meaningful?. ICDT. 1999

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The source code is not currently available, while the method is mostly reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. Analyze why the L2 distance can be helpful.
    2. Some manifold-learning methods, such as t-SNE, which encodes data samples as distributions, may help improve the method.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The contribution is limited, as discussed above.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    I mistakenly posted comments for another paper here; I apologize for the error. My original recommendation is weak accept, since the experimental and theoretical demonstrations are convincing.



Review #2

  • Please describe the contribution of the paper

    The authors propose a method for OOD detection. It is an extension of Evidential Deep Learning (EDL) incorporating a new block proposed by the authors, the Evidential Reconcile Block. The authors show this improves OOD detection performance in the near-OOD domain.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Method is well explained

    • Comparison with a number of other methods, all implemented using the same trained backbone. The method does show improvement over the baselines.

    • An ablation study is included

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • No mention of code being made available and some training details missing.

    • It’s hard to judge just how near-OOD the samples are - the authors don’t describe the datasets in much detail or provide any images of the different classes.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Seems OK. Some minor details on the training missing (e.g. batch-size, hardware). These are the sorts of things that would be best included in accompanying code, but the authors make no mention of releasing code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The dataset description is lacking in detail. The authors don’t even describe what type of images are contained in the ISIC dataset. I think this is fundamental enough information that a reader should not need to follow references to look it up. The authors also don’t explain the class types.

    I think the authors should make it clearer exactly how they flag an input as OOD using their method.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper has novel methodology, is well explained, has decent evaluation and some promising results.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    I stick by my recommendation to accept. My main concerns were around the lack of certain details about the dataset choices, which the authors have provided in the rebuttal. The authors have also promised to release code.



Review #3

  • Please describe the contribution of the paper

    The paper proposes a new neural network-based method for addressing the near out-of-distribution (OOD) detection problem in the medical image classification task, with associated uncertainty estimation. The main novelty is in the proposed evidence reconciliation block, which fine-tunes the decision boundaries for improved near OOD detection. Comparisons with state-of-the-art uncertainty-based classifiers are made on two clinical datasets, showing strong results for the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is very well written, clearly motivating the problem and the proposed methodology while also providing proper background. The provided theoretical analysis is helpful to understand the behavior of the method. Extensive comparisons are made to relevant competitor methods, showing promising results for the author’s work on multiple clinical datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The methodological contribution seems to be limited, amounting to a new loss function that can be used with existing EDL architectures. Some of the clarity in exposition of the method could be improved.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Authors indicate code to recreate experimental results will be released.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The latter half of Section 2.2 is a bit unclear to this reviewer. Specifically, providing additional context/derivations to where equations 4 and 5 come from (perhaps in the appendix) or the relevant citations would help improve clarity of the work and strengthen the paper.

    Minor

    Fix grammar in Figure 1 caption

    Section 3.3 “As shown in Table 2, It is clearly ” it should not be capitalized and clearly -> clear

    Section 2.3 “Due to the lower loss the proposed method derived, the better accuracy of classification is achieved and the empirical loss is reduced, “ -> fix grammar

    Please clarify “…since the outliers are absent in the training set, the detection is a non-frequentist situation.” To this reviewer, the issue seems less about frequentist inference than about biased/incomplete training data.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well written, the method well motivated and interrogated and the comparisons made are extensive and relevant.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper extends Evidential Deep Learning theory to the near-OoD task and applies it to a public skin classification dataset and a private pancreatic tumor classification problem. Two reviewers voted for acceptance, but R1 recommended rejection. In light of this, I am sending this work to the rebuttal phase.

    In my opinion, the authors should make an effort to better describe the data they are using in their experiments. I fully agree with R2 that the reader should not be forced to look up a reference on the ISIC dataset in order to understand it. Also, it is not clear how near-OoD is defined in each dataset, what the acronyms for each category mean, or even the type of images in the private dataset: is this CT or MRI? Are these 3D volumes? There is a concerning lack of detail in the experimental section, as well as some missing training details; it would be great if the authors could release their code for the sake of reproducibility.

    The authors should also probably focus on R1’s comments, since they recommend rejection. Unfortunately, R1 did not provide much detail on what they find so negative about the paper or what could be done to improve it, beyond “Analyze why L2 distance can be helpful.” and a brief suggestion to improve the proposed technique with manifold methods such as t-SNE, so this review will probably not contribute much to the final decision.




Author Feedback

We sincerely thank all Reviewers, ACs and PCs for their time and efforts. Our responses to all comments are below, and we hope they address the questions you have raised.

Q1 (R1): “This paper selects unlabeled data based on the L2 distances between the unlabeled data and labeled data in the feature space for the open-set semi-supervised learning task.” And “In high-dimension space, L2 distance may not a good choice. In fact, the ‘nearest neighbor’ based on L2 distance may be meaningless.” A1: Thanks for your comments. However, our method is not for “semi-supervised learning”, and no “L2 distance” is mentioned in our paper. We therefore suspect these comments go beyond the scope of our paper and may have been intended for another submission.

Q2 (Meta&R2): Details of datasets and near-OOD settings. A2: We appreciate your comments; here are our supplementary explanations. The ISIC2019 dataset consists of skin lesion images in JPEG format, categorized into eight different classes[1][2]. The private pancreas tumor dataset also has eight classes and comprises the CT slice with the largest tumor area from each sequence. In line with the settings presented in [3][4], we define the categories with relatively scarce samples as near-OOD classes for each dataset. In medical applications, the near-OOD categories are mostly rare diseases with features similar to the in-distribution (ID) categories. In our experiments, we use DF (239) and VASC (253) for the ISIC dataset, following [4], and SCN (37), ASC (33), CP (6), MCN (3), and PanIN (1) for the private pancreas tumor dataset as the near-OOD categories.

[1] “Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC).” arXiv:1902.03368, 2019.

[2] “The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions.” Scientific Data 5.1 (2018): 1-9.

[3] “Does your dermatology classifier know what it doesn’t know? Detecting the long-tail of unseen conditions.” Medical Image Analysis 75 (2022): 102274.

[4] “Out-of-distribution detection for long-tailed and fine-grained skin lesion images.” MICCAI (2022): 732-742.

Q3 (Meta&R2): Training details and code reproducibility. A3: Thanks for your comments. During training, the images were first resized to 224x224 pixels and normalized, then horizontal and vertical flips were applied for augmentation. Training was performed on one GeForce RTX 3090 with a batch size of 256 for 100 epochs. The code will be released on GitHub after acceptance, as indicated in the paper submission.
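The flip augmentation described above can be sketched in a few lines (a minimal NumPy illustration; the function name and H x W x C array layout are assumptions, and the actual pipeline presumably uses a deep learning framework's transform utilities):

```python
import numpy as np

def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly apply the horizontal/vertical flips described above.

    Assumes an H x W x C image array; each flip fires with probability 0.5.
    """
    if rng.random() < 0.5:
        img = img[:, ::-1]  # horizontal flip (reverse columns)
    if rng.random() < 0.5:
        img = img[::-1, :]  # vertical flip (reverse rows)
    return img
```

Flips only permute pixels, so the image shape and pixel values are preserved; only their spatial arrangement changes.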

Q4 (R3): Clarity in exposition of the method. A4: Thanks for your detailed comments and useful suggestions. Partial derivations of our method are included in the appendix; here are some brief supplementary details. Referring to subjective logic[5], opinions about a non-frequentist process should be uncertainty-maximized. Since the belief parameter ‘b’ of an opinion must be non-negative, the upper bound of uncertainty can be calculated as in Eq. 4. As for Eq. 5, by mapping subjective opinions to Dirichlet distributions as in Section 2.1, the evidence prediction ‘e’ is proportional to the opinion parameter ‘b’. With the uncertainty maximized, the evidence is transformed accordingly, which we call “Reconcile” in our proposed method. In a non-frequentist situation, “we could in principle know the outcome of a specific time, but we do not have enough evidence to know it exactly”[5]. It is the biased/incomplete data that render the prediction non-frequentist, which serves as the main motivation for our method.
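As a rough illustration of the reconciliation idea described above (a hypothetical sketch under assumed forms, not necessarily the paper's exact Eq. 4/5): in standard EDL, alpha = e + 1 and the vacuity is u = K / sum(alpha). Shifting all evidence down until the smallest component reaches zero then maximizes u while keeping every belief mass non-negative:

```python
import numpy as np

def dirichlet_uncertainty(e: np.ndarray) -> float:
    """Vacuity u = K / S, with alpha = e + 1 and S = sum(alpha) (standard EDL)."""
    alpha = np.asarray(e, dtype=float) + 1.0
    return alpha.size / alpha.sum()

def reconcile_evidence(e: np.ndarray) -> np.ndarray:
    """Hypothetical reconcile step: shift evidence so that the smallest class
    evidence becomes zero, maximizing uncertainty subject to non-negative
    belief mass (an assumed form of the transformation in Eq. 5)."""
    e = np.asarray(e, dtype=float)
    return e - e.min()
```

For example, e = [5, 2, 1] reconciles to [4, 1, 0], raising the vacuity from 3/11 to 3/8 while leaving the class ranking unchanged, consistent with restricting the decision boundary without altering the predicted class.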

[5] “Subjective logic”. Cham: Springer, 2016.

Once again, we would like to express our gratitude to all the reviewers. All the mentioned revisions and improvements will be addressed and incorporated in the revised version.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    One of the reviewers initially recommended rejection by mistake; they in fact meant weak accept. This reviewer updated their rating to accept, so there is now broad consensus on accepting this paper.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper demonstrates evidential deep learning for (near) out of distribution detection within medical imaging. While the reviewers were mostly positive, there were some concerns regarding the description of the clinical data/task as well as training details; these were provided in the rebuttal and should be added to the paper as well.

    Based on this, I am happy to recommend acceptance.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    agree with MR1, accept


