
Authors

Yiqun Lin, Huifeng Yao, Zezhong Li, Guoyan Zheng, Xiaomeng Li

Abstract

Segmentation of 3D knee MR images is important for the assessment of osteoarthritis. Like other medical data, volume-wise labeling of knee MR images demands expertise and is time-consuming; hence, semi-supervised learning (SSL), particularly barely-supervised learning, is highly desirable for training with insufficient labeled data. We observed that the class imbalance problem is severe in knee MR images, as the cartilages occupy only 6% of foreground volumes, and the situation becomes worse without sufficient labeled data. To address this problem, we present a novel framework for barely-supervised knee segmentation with noisy and imbalanced labels. Our framework leverages the label distribution to encourage the network to put more effort into learning the cartilage parts. Specifically, we utilize (1) the label quantity distribution to modify the objective loss function into a class-aware weighted form and (2) the label position distribution to construct a cropping probability mask that crops more sub-volumes in cartilage areas from both labeled and unlabeled inputs. In addition, we design dual uncertainty-aware sampling supervision to enhance the supervision of low-confidence categories for efficient unsupervised learning. Experiments show that our proposed framework brings significant improvements by incorporating the unlabeled data and alleviating the class imbalance problem. More importantly, our method outperforms state-of-the-art SSL methods, demonstrating the potential of our framework for the more challenging SSL setting.
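The class-aware weighted loss described in the abstract derives per-class weights from the label quantity distribution. The following is a minimal sketch of that idea; the inverse-frequency form, the smoothing constant, and the function name are illustrative assumptions, not the paper's exact formula:

```python
import numpy as np

def class_weights_from_label_counts(voxel_counts, smooth=1e-6):
    """Derive per-class loss weights from the label quantity distribution.

    Classes with fewer labeled voxels (e.g. thin cartilages) receive
    larger weights, so the loss puts more effort into learning them.
    Illustrative sketch: inverse-frequency weighting with smoothing,
    normalized so the weights have mean 1.
    """
    counts = np.asarray(voxel_counts, dtype=np.float64)
    freq = counts / counts.sum()                  # label quantity distribution
    weights = 1.0 / (freq + smooth)               # rarer class -> larger weight
    return weights / weights.sum() * len(counts)  # normalize to mean 1

# Toy counts: background, a bone class, and a cartilage class that
# occupies only a small fraction of the labeled voxels.
w = class_weights_from_label_counts([900_000, 94_000, 6_000])
```

Weights of this form can then be passed, for example, to a weighted cross-entropy loss so that cartilage voxels contribute more to the gradient than their raw frequency would allow.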

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_11

SharedIt: https://rdcu.be/cVRYO

Link to the code repository

https://github.com/xmed-lab/CLD-Semi

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    Novel proposed techniques to improve semi-supervised learning. Specifically the techniques are 1) using weights for the loss that are dependent on the class, designed to address class imbalances that are common in segmentation problems. 2) patch selection that is dependent on the class. 3) sampling that is dependent on an estimate of uncertainty of the patch.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Novel methods for semi-supervised learning within a cross pseudo supervision framework are proposed and applied to a knee segmentation task. Specific methods include:
      • a weighted loss that puts more weight on classes that occupy fewer voxels; this favours the smaller classes, which is particularly important for the task of segmenting cartilage;
      • training patch selection that is dependent on the classes present, again designed to favour improving the segmentation task;
      • patch selection based on the class distribution in the z direction.
      The observation that these features of the data and problem can be exploited is novel and clever. It is useful to a broad audience because these aspects could be useful in other image analysis tasks.
    • ablation studies are considered to examine the effect of the 3 different strategies to improve performance
    • comparison with state-of-the-art methods is presented
    • the authors demonstrate significant improvements particularly in the cartilage segmentation tasks
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • There is little consideration of clinical translation. The methods developed do not seem to be designed specifically for knee cartilage segmentation; rather, this is a task used to demonstrate the utility of the methods for improving semi-supervised learning. However, the authors do not comment on how these methods could be translated to other tasks or how they affect clinical translation.
    • The ablation studies do not consider all combinations of the methods. It is unclear how well PRC or DUS works in isolation.
    • The authors do not present cross-fold validation results, which would be helpful for understanding the variability of the effect. The methods seem highly dependent on the choice of labeled and unlabeled specimens, particularly because of the increased dependency on the segmentation distribution. Are the methods increasing performance at the expense of greater variability?
    • There are other methods based on similar ideas that could be commented on, e.g., focal loss or oversampling:

    R. Zhao et al., “Rethinking Dice Loss for Medical Image Segmentation,” 2020 IEEE International Conference on Data Mining (ICDM), 2020, pp. 851-860, doi: 10.1109/ICDM50108.2020.00094.

    R. Mohammed, J. Rawashdeh and M. Abdullah, “Machine Learning with Oversampling and Undersampling Techniques: Overview Study and Experimental Results,” 2020 11th International Conference on Information and Communication Systems (ICICS), 2020, pp. 243-248, doi: 10.1109/ICICS49469.2020.239556.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducibility of the paper seems adequate. The data used is from an open dataset so would be accessible to future investigators wishing to reproduce the result. The methods and experiments are thoroughly described.

    Issues with reproducibility: some of the methods could be more clearly described (see constructive criticism below). The code does not appear to be open. There is little detail on the software used to implement the algorithm and experiments: it is stated that PyTorch was used, but the version was omitted, as was the operating system. The hardware description includes only one detail, that a 3090 was used.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • There are grammatical errors throughout, for example:
      • Abstract: “we statistic” should be “we did the following”
      • Introduction: “no ionizing radiation” should be “without radiation”
    • the word “statistic” is often misused
    • Acronyms should all be defined at their first use; for example, CPS was not defined at its first use.

    Methods
    • Script L and capital L in the equations seem to be used interchangeably. Pick one and be consistent or better define the distinction.
    • Sub-volume vs. cropping patch vs. volume: are these terms used to refer to the same thing, or is the distinction important? This should be made clear with definitions for each term or by using a single term. In the description of DUS and WL, the term “volume” is used; in the description of PRC, “sub-volume” is used; and in the description of the experiments, “cropping patch” is used. They may be different things, and this could be an important point, but this reviewer is confused on it.

    Experiments
    • How does the patch size relate to the thickness of the segmentations and the original image size?

    Figure 3

    • Is this a case where both CPS and the proposed method did not do well, but the proposed method did much better? Can you quote DSC here?

    Table 2

    • Why did you not examine the effect of PRC or DUS alone, or the combination of PRC and DUS (without WL)? The analysis would be improved if these experiments were included, to better understand the effect of each technique.
    • How sensitive is the result to the samples used as the labeled data? Did you do any cross-fold validation, or experiments where you chose different samples as the labeled data? Including these would help better understand the robustness of the result and also its reproducibility.

    The paper would be improved by a conclusion or discussion that puts the results in better context in terms of translation or generalization. It seems that the authors are not primarily concerned with the specific segmentation task; rather, this is a convenient dataset on which to test the methods. What barriers exist to using this method for other, more challenging image analysis tasks? How useful will these approaches be beyond this task? How will the authors use the methods developed?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • Significant improvement in performance
    • Novel approaches within an existing framework to improve semi-supervised learning
    • An important topic with potential for broader application however this could be better explained
    • Experiments could be more thorough
    • Discussion of the importance of the result could be better
    • not very much consideration of clinical translation
  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    In this paper, the authors regard MR knee bone and cartilage segmentation as a class imbalance problem with barely labeled data. To handle segmentation in this situation, they use cross pseudo supervision (CPS) to build a semi-supervised segmentation baseline. The authors then adjust the label distribution using the proposed probability-aware random cropping, class-aware weighted loss, and dual uncertainty-aware sampling supervision (i.e., the proposed CLD method). In the experiments, CLD obtains the best performance compared with other related approaches.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    I think the proposed method is an effective extension of the baseline CPS. The proposed strategies to calibrate the distribution of labels could work well for knee bone and cartilage segmentation. Especially for the knee cartilages, even under the few-sample condition, the proposed modules could obtain relatively high results for the FC and TC in a semi-supervised framework. From an engineering point of view, this is an effective work.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The basic semi-supervised framework used in the paper is based on CPS [3] (i.e., Section 2.1), and its methodological contribution is not notable. The major contribution lies in Section 2.2: relieving the class imbalance problem in the segmentation of knee bones and cartilages using 3D MR data. In Section 2.2, the “Probability-aware random cropping” may be effective, but this part could only be counted as an improvement to the preprocessing, not to the core of the semi-supervised segmentation framework.

    And the “Class-aware weighted loss” module is a minor improvement over traditional class weighting, such as weighting each class by the inverse ratio of its volume to the whole.

    Finally, other papers, e.g. [5], have also proposed similar adaptive/dynamic weighting approaches; I do not find a substantial improvement over them in yours.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I think the reproducibility is OK: the baseline's code is open, and the implementation of the three modules for addressing the class imbalance problem is not difficult.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. Although your experimental results beat some related papers, I do not find a clear description of how your modules provide a substantial improvement or difference compared with others, nor of how your method obtains the increase in quantitative values for the knee segmentation problem.

    2. Visual comparisons in 3D are important. With such comparisons, I could clearly see the improved segmentation results in detail in 3D space.

    3. If possible, you should also give segmentation results with different numbers of labeled and unlabeled data, respectively. The unlabeled data could also come from different modalities (the OAI usually uses DESS MR data, but in hospitals, T1 and even T2 are very common). With these experimental settings, you could show that your method has higher extensibility and effectiveness.

    4. In the second paragraph of the introduction, you did not clearly state how your work addresses the shortcomings of these related articles.

    5. In Fig. 3, “Comparison of segmentation results with CPS [13]”: should the CPS citation be [3]?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Some effectiveness for a medical application/problem.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper focuses on a class imbalance problem in automatic deep-learning based multi-class knee structure segmentation and proposes a novel solution by combining class-aware weighted loss, probability-aware random cropping, and uncertainty-aware sampling supervision. The ablation study supports the addition of each approach.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This is a well-written manuscript focused on a common problem, class imbalance, that often occurs in multi-class segmentation. The study provides an equally novel solution based on a model-centric strategy combining weights to address class imbalance, probabilities for image cropping, and sampling supervision for the uncertainty arising from unlabeled data. The experiments demonstrate the clinical utility of the approach on a publicly available dataset, with a strong evaluation of network predictions and an ablation study.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The conclusion is not written as it should be; it merely repeats the contribution of the study as already stated in the introduction. This is the only weakness of this paper. Apart from this, all sections are clearly written.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Data is available as a public repository. Algorithms are not available, but methods are described very well, so their implementation will be very easy.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    This paper is well written, and the reviewer has few comments. However, there are many typos in the paper that need the authors' attention. Also, abbreviations should be defined at their first occurrence.

    As mentioned earlier, the conclusion is not at all what is expected. The conclusion should focus on what this study achieves specifically, and on whether it can be generalized in some way. A few discussion points on why improvements are seen in the ablation study would also be important.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Most of the paper is neatly written, with a good problem definition and a novel methodological solution. Only the conclusion needs rework.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper addresses the class imbalance problem in semi-supervised learning for segmentation models by extending the previously published cross pseudo supervision method. Two of the reviewers commented that the paper is sufficiently novel, with enough experiments to warrant acceptance. One reviewer considered this an effective engineering application but had concerns regarding the incremental technical contributions. My opinion is that while the technical contributions are somewhat incremental, they are interesting nevertheless. Furthermore, the paper demonstrates improvements for a specific application, the segmentation of soft tissues in knee MRIs, which is a contribution in its own right.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3




Author Feedback

Meta-Reviews: Thank you for the comments. We will revise the manuscript accordingly.

Reviewer #1: Thank you for the comments. We will revise the manuscript for better understanding.

  • Q1: The effectiveness of PRC or DUS. A: Table 2 shows the effectiveness of the proposed modules, including PRC and DUS. We have conducted experiments without WL (only with PRC or DUS), and the improvements were consistent with Table 2.
  • Q2: Cross-fold validation. A: We have tried different selections of labeled data, and the improvements are consistent with Table 1 and Table 2. Due to the page limit, we cannot provide all experimental results (Q1 & Q2) in the current version.
  • Q3: Generalization to other datasets/tasks. A: The weighted loss (WL) and uncertainty-aware supervision (DUS) can be used in other semi-supervised segmentation tasks to address the class imbalance problem. The cropping (PRC) is a specific design for the knee segmentation dataset, where the cartilages are extremely thin and would otherwise have a smaller cropping probability along the z-axis than hard tissues.
  • Q4: Comparison with oversampling. A: In fully-supervised learning, we can crop more (oversample) sub-volumes in cartilage regions for training. However, in semi-supervised learning, most of the input images have no segmentation labels. In this situation, cropping with the positional distribution is a better way to process inputs on the fly than relying on pseudo labels.
  • Q5: Reproducibility. A: Code will be available at https://github.com/xmed-lab/CLD-Semi.

Reviewer #2: Thank you for the comments.

  • Q1: Methodological contribution. A: The major contribution of this work is the proposed framework that simultaneously addresses class imbalance and barely-supervised learning in the knee cartilage segmentation task. In probability-aware random cropping, although the sampling probability is pre-computed before training, the module is used as an augmentation strategy during training (similar to random cropping) and should be considered part of the framework. This module is a specific design for the knee segmentation dataset, where the cartilages are extremely thin and would otherwise have a smaller cropping probability along the z-axis than hard tissues. In addition, directly adapting [5] to CPS brings only a trivial improvement (see Table 1). We re-formulate the update equation and utilize two uncertainty banks for cross sampling supervision, which is more effective and reasonable.
  • Q2: Introduction. A: We have addressed these problems in the third paragraph.
  • Q3: More experiments and visual results. A: With a higher labeling ratio, the improvements over CPS decrease: +3.8% Dice with 1% labeled data, +2.6% with 2%, and +1.5% with 5%. Due to the page limit, we cannot provide detailed results and more visual examples in the current version.

Reviewer #3: Thank you for the comments. We will revise the manuscript for better understanding.
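The probability-aware random cropping (PRC) described in the feedback above samples crop centers according to the label position distribution along the z-axis, so that thin cartilage slices are cropped more often. A minimal sketch of the idea, assuming a simple voxel-count-based probability with additive smoothing (the function names and the smoothing constant are illustrative, not the authors' exact recipe):

```python
import numpy as np

def z_crop_probabilities(label_volume, rare_classes, smooth=1.0):
    """Build a z-axis cropping probability mask from the label position
    distribution: slices containing more voxels of the rare (cartilage)
    classes receive a higher sampling probability. Hedged sketch only.

    label_volume: integer class-label array of shape (Z, H, W).
    """
    rare = np.isin(label_volume, rare_classes)  # mask of cartilage voxels
    per_slice = rare.reshape(rare.shape[0], -1).sum(axis=1).astype(np.float64)
    per_slice += smooth                         # keep every slice reachable
    return per_slice / per_slice.sum()          # normalize to a distribution

def sample_crop_center_z(probs, rng=None):
    """Draw a z-coordinate for the crop center from the probability mask."""
    rng = np.random.default_rng(rng)
    return int(rng.choice(len(probs), p=probs))

# Toy volume: cartilage (class 3) concentrated around slice 5.
vol = np.zeros((10, 4, 4), dtype=int)
vol[5] = 3
p = z_crop_probabilities(vol, rare_classes=[3])
z = sample_crop_center_z(p, rng=0)
```

For unlabeled volumes, where no per-case labels exist, the same mask pre-computed from the labeled set can be reused, which is what makes this usable as an on-the-fly augmentation in the barely-supervised setting.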


