
Authors

Mingyu Wang, Yi Li, Bin Huang, Chenglang Yuan, Yangdi Wang, Yanji Luo, Bingsheng Huang

Abstract

CAD is an emerging field, but most models are not equipped to handle missing and noisy data in real-world medical scenarios, particularly in the case of rare tumors like pancreatic neuroendocrine neoplasms (pNENs). Multi-label models meet the needs of real-world study, but current methods do not consider the issue of missing and noisy labels. This study introduces a multi-label model called Self-feedback Transformer (SFT) that utilizes a transformer to model the relationships between labels and images, and uses an ingenious self-feedback strategy to improve label utilization. We evaluated SFT on 11 clinical tasks using a real-world dataset of pNENs and achieved higher performance than other state-of-the-art multi-label models with mAUCs of 0.68 and 0.76 on internal and external datasets, respectively. Our model has four inference modes that utilize self-feedback and expert assistance to further increase mAUCs to 0.72 and 0.82 on internal and external datasets, respectively, while maintaining good performance even with input label noise ratios up to 40% in expert-assisted mode.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43990-2_49

SharedIt: https://rdcu.be/dnwL4

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    For multi-label prediction, the authors propose a novel model, SFT, based on a transformer encoder. The model integrates label semantics and image information, and iteratively uses its own predictions through a self-feedback mechanism to improve the utilization of missing labels and of the correlation among labels.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) An interesting idea to utilize label information: the authors establish relations among labels and images, which cleverly alleviates the problem of missing labels and label noise. (2) The experimental results show significant improvements. (3) Well written and clearly organized.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There has been a lot of prior work on multi-label classification, and the methods compared are a bit less.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Positive if the code is released.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    (1) How are the positive and negative states embedded when t=1? (2) Equations (2)-(3) illustrate the attention process; are norm layers used? (3) Why is the variance of the experimental results not displayed (e.g., for the quantitative results in Table 1)? Will it be very large? (4) In the label noise experiments, how are the noisy labels generated: by random sampling, or by moving labels to confusing categories?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well written, but the authors need to answer a few questions and clarify the corresponding points.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes a Transformer-based model for multi-label prediction based on CT images. The Self-Feedback Strategy is proposed to deal with missing and noisy labels.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method is simple and effective, the experiment is sufficiently comprehensive, the writing is quite fluent, and the description is quite clear.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The authors stressed “multi-disciplinary” multiple times in the introduction section, but it is not reflected in the methodology or the actual task.
    2. The “Soft State Embedding” is not reflected in Fig 1, and its description should be improved; it is currently rather hard to understand. There is a grammar mistake in “For labels with continuous values such as age, the value normalized to 0 ∼ 1 is w_i^p”.
    3. Section 3.1: what are the selection criteria for the main task?
    4. In Table 1, it can be seen that the simple PS performs better than the SFT. For a fair comparison, you should add another baseline: add a classification token to the Transformer model and classify the output of the cls token with a classification head. This simple modification is a trivial idea, and the baseline PS is too weak (the Transformer is totally removed, yet it performs better than SFT). The effectiveness of the model cannot be effectively substantiated.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    More detailed description of details like GPU memory requirements, computational time should be included. Open-sourcing the code is highly encouraged.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    See above.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The research topic, the problem to be solved and the idea of the proposed model is of novelty, and the proposed model is simple and effective. However, considering the weakness, especially the lack of an important baseline (and especially the simple PS outperforms SFT**, the missing baseline should be better than PS at least), I can only give a weak accept.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors did not address all my concerns about the baseline comparisons, but the work and the idea are still of quality. Considering the reviews of the other reviewers, I would still recommend a weak acceptance. I would like to see it accepted. However, there is still room for improvement. I respect the opinion of the meta-reviewers.



Review #4

  • Please describe the contribution of the paper

    The paper proposed a method called Self-feedback Transformer (SFT) to improve label utilization, which aims at multi-label prediction even under the condition of missing and noisy labels. They applied the algorithm to predict various clinical variables related to pancreatic neuroendocrine neoplasms (pNENs) and investigated the effectiveness of their model in comparison with some existing methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The idea of incorporating the self-feedback mechanism into the transformer-based architecture is interesting and sounds reasonable to improve the label utility to handle label-noisy situations.
    • The various inference modes, including expert-assisted and expert-machine combination, are also interesting. Such an inference mode can be a basis for realizing a more trustworthy application of artificial intelligence in medicine.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Various real clinical scenarios are mentioned in the Introduction; however, the description of the problem’s scope seems somewhat scattered. It would be beneficial to narrow down the scope of the problem slightly, making it easier to understand what the proposed method aims to solve.
    • The focus on pNENs as a rare disease is interesting, but it is worth considering if the small dataset was the best choice for learning a transformer-based architecture.
    • A variety of clinical factors have been listed as targets for prediction, including some related to the response to anticancer drugs (e.g., RECIST) and others related to the success or failure of surgery (e.g., surgical recurrence). It is also not clear whether all cases were treated uniformly. For example, patients for whom surgery was the primary treatment strategy may not have received anticancer therapy and therefore RECIST data may be missing. In such cases, there is no clinical significance in predicting RECIST data. It may be necessary to organize the content of the prediction task from a clinical perspective.
    • Purely in terms of forecasting performance, there is weak evidence that the proposed method is clearly superior: in the validation on the external dataset in Table 1, the proposed method loses to the comparison method on several indicators.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility of the study would be low due to the lack of available source code and datasets.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The clinical significance of the missing label situation is considered weak. It should be clarified whether the information is missing although it is really necessary or simply because it is clinically unnecessary.

    On the other hand, the idea of the self-feedback mechanism itself is interesting. For example, it could be applied to a system that identifies valuable information as an auxiliary input for predicting certain clinical factors.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the proposed method itself is interesting, its clinical issues are unclear, its predictive performance does not necessarily exceed that of existing methods, and its reproducibility as a study is not sufficiently ensured.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    Thanks to the authors’ responses, most of my concerns have been addressed. Therefore, I have raised my score from 4 (weak reject) to 5 (weak accept). While I don’t fully understand the rationale behind predicting counterfactual clinical variables for a treatment strategy that was not actually taken, I anticipate that the resultant high performance will provide a good discussion point at the conference.



Review #5

  • Please describe the contribution of the paper

    The authors present a self-feedback model using a transformer. An image embedding (by a CNN) is repeatedly fed into multiple steps of “self-feedback” iterations where masked labels are reconstructed, before being used by a fully-connected layer for predictions. The approach is applied and evaluated on pancreatic neuroendocrine neoplasm datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Thorough evaluation on two datasets
    • Evaluation of the sensitivity to noise
    • Ablation experiments to compare the proposed approach to a) simpler techniques, and b) quantify the influence of sub-components
    • Demonstrated benefit on multiple metrics
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The writing of the paper could be greatly improved (grammar, spelling, and style)
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The models and networks used are described in the necessary detail. One point where I see room for improvement would be to provide information on the demographics of the patients in the dataset (e.g., in the supplementary information). I would appreciate it if the authors made the code available when the paper is published.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • Please spend some serious effort to improve the writing of the paper and spell/grammar check the text
    • The figures are well-prepared. Please extend the captions to be more self-contained and explain what they are showing.
    • Abstract: Please explain the abbreviation CAD at its first occurrence
    • The extensive results and ablation studies in the main document and supplementary material are greatly appreciated.
    • Consider having the same y-axis range in subplots.

    Minor / for later follow-up:

    • You mention that a label you used was tumor shrinkage. Would it not have needed a reference image/follow-up to judge this?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • Method and evaluation are sound
    • Extensive experiments
    • The writing is severely lacking and reduces the clarity significantly
  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    I appreciate the clarifications by the authors and am moving my rating to accept.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This submission presents a novel method (‘Self-feedback Transformer’) to perform multi-label diagnostic assessment of pancreatic neuroendocrine neoplasms in CT data.

    The submission received mixed reviews that were overall positive (R2 - accept, R3 - weak-accept, R4 - weak-reject, R5 - weak-accept). As such, I recommend that the authors are invited to respond to the main criticism in a rebuttal.

    The authors should carefully consider all the reviewer comments, and address the main negative comments in the rebuttal, which can be broadly categorised as follows:

    • Unconvincing benefits of the proposed method (R3 - item 6.4, R4 - item 6.4)
    • Unfocused clinical goal (R4 - item 6.1), and imprecise practical aspects / approach limitations regarding clinical issues and associated missing data (R4 - items 6.3 and 9)
    • Limited comparison of prior multi-label classification models (R2), and lack of appropriate baselines (R3 - item 6.4)
    • Limited technical description (R2 - items 6.1-6.4, R3 - items 6.2-6.3, R5 - item 8)
    • Improper use of ‘multi-disciplinary’ term (R3)




Author Feedback

We thank all reviewers for their positive comments:

  • Great contribution (“R2-cleverly alleviates the problem”, “R3-simple and effective”)
  • Novelty (“R2-interesting”, “R3-the proposed model is of novelty”, “R4-interesting”)
  • Accurate results (“R2-significant”, “R3-sufficiently comprehensive”, “R5-Extensive experiments”)
  • Well written (“R2-well written”, “R3-quite clear”)

We thank the AC for supporting our work as “novel” and summarizing 5 suggested rebuttal points.

  1. (R3&R4) Unconvincing benefits of the proposed method: The proposed LMS and SFS, applied to SFT, effectively improve performance, with the highest performance shown in Table 2.
    • (R3-“PS performs better than the SFT**”) The reduced performance of SFT compared to PS is anticipated, as it involves training a transformer from scratch on a small dataset. Our proposed LMS and SFS boost input variety by embedding label values and random masking, optimizing the transformer’s capabilities and yielding optimal performance.
    • (R4-“weak evidence…in Table 1”) The highest performance of the proposed SFT on external datasets is shown in Table 2 (inference modes SF and EMC).
  2. (R4) Unfocused clinical goal, and imprecise practical aspects / approach limitations regarding clinical issues and associated missing data:
    • (Clinical goal) Our focus is to “accurately predict preoperative pNENs-related treatment indicators,” such as surgical results, drug effectiveness, prognosis, and immunohistochemical markers, which are vital for creating an efficient treatment strategy.
    • (Clinical significance of the missing label) Concerning “missing” labels, we argue that patients should have distinct labels for multiple treatments, and missing labels arise when a patient doesn’t undergo a particular treatment due to reasons like health or financial limits. Our SFT cleverly employs the SFS mechanism to link existing and missing labels, delivering improved outcomes compared to using only existing labels. This method emphasizes the clinical value of fully leveraging missing label data.
  3. (R3) Limited comparison of prior multi-label classification models (R2), and lack of appropriate baselines:
    • (R2-“compared are a bit less”) Our method comparison is comprehensive, featuring a selection of representative methods, including SOTA methods like CTran and ML-decoder.
    • (R3-“baseline PS is too weak”) We argue that our CNN-based PS method is a suitable baseline. The key innovation lies in the state embedding of labels, which enhances input data and transformer performance. Using only 1 CLS token, as suggested, might enrich the ablation experiment, but would not leverage label value embedding and would yield results similar to multi-class tokens, ultimately being inferior to our SFT method. Our comparative experiments sufficiently demonstrate the effectiveness of the proposed approach.
  4. (R2&R3&R5) Limited technical description: Most content has been explained in the original text.
    • (R2-“embedded”, “norm layers”, “experimental variance”, “noisy labels”) States (t=1) are embedded using predicted probabilities from t=0; layer normalization is used; cross-validation is used to reduce variance, and the substantial mAUC variance across multiple tasks (refer to Fig 4) is justifiable considering the multi-task approach; noise is generated by randomly negating labels (changing label values x to 1 - x).
    • (R3-“Soft State Embedding”, “selection criteria”) “Soft State Embedding” is represented in Fig 1 with the gradient from blue to red; Doctors make the selection criteria for the main task based on treatment and prognosis markers.
    • (R5-“demographics”) Already in submitted supplementary material.
  5. (R3) Improper use of ‘multi-disciplinary’ term: The term “multi-disciplinary” is commonly used in clinical contexts to describe diverse medical fields such as radiology and pathology. This aligns with our multi-label model, which accurately predicts pNENs imaging features, immunohistochemical markers, and treatment effectiveness.
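
The label-noise procedure mentioned in point 4 (randomly negating labels, changing x to 1 - x) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `flip_labels`, the encoding of missing labels as NaN, and the choice to apply the noise ratio to observed entries only are all hypothetical and are not described in the rebuttal.

```python
import numpy as np


def flip_labels(labels, noise_ratio, rng):
    """Return a copy of `labels` with a fraction of observed entries negated.

    Assumptions (not from the paper): labels lie in [0, 1], missing labels
    are encoded as NaN, and `noise_ratio` is applied to observed entries only.
    """
    arr = np.asarray(labels, dtype=float)
    flat = arr.flatten()  # flatten() always returns a copy, so the input is untouched
    observed = np.flatnonzero(~np.isnan(flat))  # indices of non-missing labels
    n_flip = int(round(noise_ratio * observed.size))
    chosen = rng.choice(observed, size=n_flip, replace=False)
    flat[chosen] = 1.0 - flat[chosen]  # negate: x -> 1 - x
    return flat.reshape(arr.shape)


# Example: 4 patients x 3 tasks, with two missing labels (NaN).
labels = np.array([
    [1.0, 0.0, np.nan],
    [0.0, 1.0, 1.0],
    [np.nan, 0.0, 1.0],
    [1.0, 1.0, 0.0],
])
noisy = flip_labels(labels, noise_ratio=0.4, rng=np.random.default_rng(0))
print(noisy)
```

Sampling without replacement keeps the realized noise ratio exact for a given number of observed labels; flipping each entry independently with probability `noise_ratio` would be an equally plausible reading of the rebuttal.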




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Thanks to the strong rebuttal, all 4 reviewers recommend this work to be accepted. I also feel the authors did a strong job in the rebuttal to address the issues of R4. Because of the consensus agreement among all reviewers, I recommend that this work is accepted.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have clarified most of the major concerns of reviewers, specifically benefits of the method, comparison, baseline selection, clinical goal and significance, including pointing out some technical descriptions in the supplementary. I found the proposed method interesting, and the results demonstrate the effectiveness of the method and thus suggest acceptance. I will recommend the authors include all these additional clarifications in the final version.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    A decent work. Two reviewers increased their scores after reading the rebuttal, bringing the scores from all 4 reviewers to the level of acceptance.


