
Authors

Yucheng Tang, Yipeng Hu, Jing Li, Hu Lin, Xiang Xu, Ke Huang, Hongxiang Lin

Abstract

Segmentation of the carotid intima-media (CIM) offers more precise morphological evidence for obesity and atherosclerotic disease than methods that measure only its thickness and roughness during routine ultrasound scans. Although advanced deep learning technology has shown promise in enabling automatic and accurate medical image segmentation, the lack of a large quantity of high-quality CIM labels may hinder the model training process. Active learning (AL) tackles this issue by iteratively annotating the subset whose labels contribute the most to the training performance at each iteration. However, this approach relies substantially on the expert’s experience, particularly when addressing the ambiguous CIM boundaries present in real-world ultrasound images. Our proposed approach, called pseudo-label divergence-based active learning (PLD-AL), trains segmentation models using a gradually enlarged and refined labeled pool. The approach has an outer and an inner loop: the outer loop calculates the Kullback–Leibler (KL) divergence of predictive pseudo-labels across two consecutive AL iterations and determines which portion of the unlabeled pool should be annotated by an expert. The inner loop trains two networks: the student network is fully trained on the current labeled pool, while the teacher network is updated as a weighted average of itself and the student, ultimately refining the labeled pool. We evaluated our approach on both the Carotid Ultrasound Boundary Study dataset and an in-house dataset from Children’s Hospital, Zhejiang University School of Medicine. Our results demonstrate that our approach outperforms state-of-the-art AL approaches. Furthermore, the visualization results show that our approach over-estimates the CIM area less than the other methods, especially for severely ambiguous ultrasound images in the thickness direction.
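The outer-loop selection and inner-loop teacher update described above can be sketched as follows. This is a minimal illustration with hypothetical function names, not the authors’ actual implementation (which is available in the linked repository):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-8):
    """Pixel-wise KL divergence between two softmax probability maps
    (last axis = class probabilities)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def select_for_annotation(prev_probs, curr_probs, budget):
    """Outer loop (sketch): rank unlabeled images by the mean KL divergence
    between the pseudo-labels of two consecutive AL iterations and return
    the indices of the top-`budget` images to send to the expert."""
    scores = np.array([kl_divergence(p, q).mean()
                       for p, q in zip(prev_probs, curr_probs)])
    return np.argsort(scores)[::-1][:budget]

def ema_update(teacher_params, student_params, alpha=0.99):
    """Inner loop (sketch): the teacher's weights are an exponential moving
    average of themselves and the student's weights."""
    return {k: alpha * teacher_params[k] + (1 - alpha) * student_params[k]
            for k in teacher_params}
```

For example, an image whose pseudo-label is unchanged between iterations scores zero divergence and is deprioritized, while an image whose prediction shifted sharply is selected for expert annotation.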

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43895-0_6

SharedIt: https://rdcu.be/dnwxO

Link to the code repository

https://github.com/CrystalWei626/PLD_AL

Link to the dataset(s)

CUBS: https://data.mendeley.com/datasets/fpv535fss7/1


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper presents a new active learning segmentation method. The method uses a teacher and student network, with the samples for active learning being selected using divergence between the two networks. The method is compared to other active learning procedures on the public CUBS dataset, and shown to obtain better performance. The authors also performed an ablation study to show the contributions of different parts of the model.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Approaching active learning through the prism of a student-teacher network system is interesting and seems to be novel.
    2. The segmentation problem has real-life importance (segmentation of the carotid artery) and the need for active learning is well justified.
    3. The experimental results are convincing, in that the algorithm does indeed achieve a higher performance than the contenders.
    4. The ablation study is well thought out.
    5. The experiments on the in-house dataset do a good job of demonstrating noisy vs clean samples.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    “We set the number of AL iterations to 5 with a fixed labeling budget of 200, initial labeled data 159, unlabeled 1875, and test 1204.”

    The labeling budget is 200 samples. If that is provided to the algorithm “at once”, that seems like a very large batch. If this procedure is trying to simulate what might be required of human annotators, 200 samples might take hours to label, which I don’t think is realistic for a system with a human in the loop. If that is not the setting in which the system is designed to operate, the authors should be clearer about it.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper appears detailed enough for the results to be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    One thing that is typically shown for active learning methods is a plot of number of labels vs performance. This does not seem to be present in the manuscript, and it makes it difficult to assess how “fast” the proposed model learns compared to the baselines.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a compelling application, the method is sufficiently novel and the experimental results are good compared to the baselines.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors propose a novel active learning (AL) based method for carotid intima-media (CIM) segmentation, called pseudo-label divergence-based active learning (PLD-AL), which uses the KL divergence between the predictions of the teacher and student networks to select candidates from the unlabelled data for labelling. In addition, they also refine noisy labels as training progresses, based on mIoU.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    C1. The paper compares the presented AL method with other methods in the field, such as entropy and coreset. The results are detailed and include a prediction comparison, test-performance metrics, and a clear indication of the winner.
    C2. The proposed method achieves better performance than the other acquisition methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    C1. The general writing in the paper is not clear and can be improved a lot. To mention a few examples: the second-to-last line in Section 2.1 and the last lines in Section 2.3.
    C2. The writing in Section 2.3 needs to be elaborated, including the mathematical form of the loss function used in the manuscript; the language and flow need to be clearer, and the last paragraph is hard to understand.
    C3. Most of the pseudo code in Algorithm 1 is not clear. See below:
    C3.1. Does line 9 of the algorithm indicate “fitness soars sharply at first but slows down after the model begins to fit noise” in the second paragraph of Section 2.3?
    C3.2. Line 10 of the algorithm is not clear and not properly explained in the method section. Please provide a detailed explanation. For example, since it is a sum of the L2 norm of the difference between $\tilde{M}$ and $M^l$, what would argmin even mean? What is $\tilde{M}$ to begin with? There needs to be consistency between theory and algorithm, which is lacking here.
    C3.3. What does $M'$ mean? The derivative of the IoU $M$? If so, the derivative with respect to what?

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors meet all reproducibility criteria.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    C1. Could you provide a comparison of the different acquisition functions and a baseline using the same percentage of data as in each acquisition round? At the end of active learning, the labelled set contains ~60% of the data, so I would like to see how each method performs at each acquisition round.
    C2. How does the data acquired by the proposed method differ from the data acquired by the other acquisition functions?
    C3. How does the tuned/refined label look visually?

    Also see points in weakness section

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper provided results showing better performance compared to existing AL methods, but the description of the methodology, particularly Section 2.3, was not clear enough. There were certain discrepancies between the presented Algorithm and Methodology; for example, line 10 in the Algorithm was not clear (also mentioned in the weaknesses). For these reasons, I scored the paper with a score of 5.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposes a new approach for carotid intima-media (CIM) segmentation in medical imaging using an active learning (AL) method called pseudo-label divergence-based active learning (PLD-AL).

    The paper contributes a novel strategy to train segmentation models using a gradually enlarged and refined labeled pool, providing accurate CIM segmentation results contributing to the clinical diagnosis of obesity and atherosclerotic disease.

    While active learning methods can help to reduce the amount of labeled data required, they still require human annotation, which can be time-consuming and expensive. The issue of dealing with ambiguous boundaries is also a common challenge in medical image segmentation, as different experts may have different opinions on the boundary location.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    One of the strengths of the paper is the use of a semi-supervised learning approach, which utilizes unlabeled data to partially address the lack-of-label issue. This is particularly useful in the case of CIM segmentation, where a large quantity of high-quality labels is not readily available.

    Another strength of the paper is the evaluation of the proposed approach on both the Carotid Ultrasound Boundary Study dataset and an in-house dataset from a pediatric hospital.

    The proposed approach could potentially contribute to the clinical diagnosis of obesity and atherosclerotic disease.

    Regarding the results, the authors include a statistical analysis, with the p-values listed.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While the paper mentions that CIM segmentation can provide diagnostic evidence for obesity and atherosclerotic disease, it does not demonstrate the clinical feasibility of using the proposed approach in a clinical setting. It would be beneficial to demonstrate the usefulness of CIM segmentation in diagnosing these conditions and how the proposed approach can help clinicians in making better diagnoses.

    The way of using the data is not very clear. In particular, the proposed approach is a combination of active learning, pseudo-labeling, and divergence, which have been used in other contexts.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Overall, the paper provides enough information to allow someone to replicate the study with similar datasets and settings. Some exact details of the data preprocessing, augmentation, and hyperparameter tuning are not specified. For example, how many epochs were used? Is it 1000 training iterations?

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The topic is timely and important, and the approach you have taken is novel and innovative.

    The discussion section (embedded in the conclusion) of your paper is rather brief and does not provide much insight into the implications of your findings. It would be helpful to expand the discussion to provide more context for your results and to discuss the potential impact of your work.

    It is not clear from the paper whether the datasets used in this work are publicly available or if they will be made available in the future. I am referring here to the in-house dataset. It would be helpful if you could clarify this.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper seems to make a valuable contribution, but there is still room for minor improvements in terms of clarity and implications.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This submission proposes an active learning approach for segmenting the carotid intima-media in ultrasound images. The originality resides in a pseudo-labeling approach that keeps consistency between an inner and outer loop of successive active selections. The methodology builds upon a student-teacher model. The evaluation is on a public carotid ultrasound image dataset and an in-house pediatric dataset.

    All three reviewers recommend Acceptance. The consensus is on the relevance of the proposed strategy for segmenting carotid ultrasound images. The validation supports the contribution with a comparison against several active learning approaches. While active learning for segmentation or the use of a semi-supervised student-teacher model may not be considered novel, its application contributes to the field, with demonstrated advances on challenging ultrasound images.

    For all these reasons, recommendation is towards Acceptance.




Author Feedback

We would like to express our gratitude to all the referees for their high praise. For completeness, we write our responses to their constructive and professional comments below:

  1. Novelty of PLD-AL [AC, R2] We would like to emphasize the technical novelty of PLD-AL, which generalizes the original AL framework by adding novel functions for augmenting and refining the labeled pool. Specifically, the proposed student-teacher model can automatically select the portion of unlabeled data with the top-k divergence for expert annotation, which helps the AL model converge quickly to the most informative data in the unlabeled pool. On the other hand, the refinement module provides a way to improve the experts’ weak annotations in the labeled pool, which makes the AL approach more robust. We have rephrased Sect. 1 to reflect the above statements.

  2. Concerns on the experiment and labeling budget [R1, R2, R3] We thank the referees for this valuable comment. In Sect. 3, we designed a preliminary experiment to verify the feasibility and effectiveness of our approach in a clinical scenario, although the protocol used for annotating CIM ultrasound images is not completely identical to clinical practice. When considering the labeling budget per unit of annotation time, ultrasound experts may manually select representative sub-regions and annotate discrete point pairs at the CIM interface instead of a complete mask at the image level. Although we have simulated this process by appending the expert’s labels on small but representative ROIs, a more refined experimental design will be considered in future work, such as using only weak annotations by experts. We have reflected these statements in Sect. 3.1 and 3.2, respectively.

  3. Request for extended experiments and more insightful discussion [R2, R3] Due to the page limit, we leave these requests and the extended experiments, such as the downstream task of measuring carotid intima-media thickness, to a future extended journal paper. Moreover, we discuss the robustness of the proposed method to different types of weak labels generated from expert annotation points. We have appended these discussions to Sect. 4.

  4. Minor revisions of mathematical symbols and expressions [R2, R3] Thank you for helping us elaborate on all the mathematics in our initial submission. We confirm that all the spotted typos and unclear mathematical symbols have been clarified and corrected in the camera-ready version.


