
Authors

Alvaro Gonzalez-Jimenez, Simone Lionetti, Philippe Gottfrois, Fabian Gröger, Marc Pouly, Alexander A. Navarini

Abstract

This paper presents a new robust loss function, T-Loss, for medical image segmentation. The proposed loss is based on the negative log-likelihood of the Student-t distribution and can effectively handle outliers in the data by controlling its sensitivity with a single parameter. This parameter is updated during the backpropagation process, eliminating the need for additional computation or prior information about the level and spread of noisy labels. Our experiments show that T-Loss outperforms traditional loss functions in terms of Dice score on two public medical datasets for skin lesion and lung segmentation. We also demonstrate the ability of T-Loss to handle different types of simulated label noise resembling human error. Our results provide strong evidence that T-Loss is a promising alternative for medical image segmentation, where high levels of noise or outliers in the dataset are common in practice.
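A minimal sketch of the loss described in the abstract, for illustration only: it assumes an identity scale matrix (so the ½ log|Σ| term vanishes), treats the flattened mask as a D-dimensional vector, and takes ν as a plain argument. In training, ν would be a learnable parameter updated by backpropagation, which is not shown here. The function name `t_loss` is hypothetical; this is not the authors' released implementation.

```python
import math
from typing import Sequence


def t_loss(pred: Sequence[float], target: Sequence[float], nu: float) -> float:
    """Student-t negative log-likelihood of `target`, using `pred` as the mean.

    Assumptions (not taken from the paper): identity scale matrix, so
    0.5 * log|Sigma| = 0, and the mask is flattened into one D-dim vector.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    if nu <= 0:
        raise ValueError("nu must be positive")
    D = len(pred)
    # Squared Mahalanobis distance; with Sigma = I it is the sum of squared residuals.
    sq_dist = sum((p - t) ** 2 for p, t in zip(pred, target))
    return (
        -math.lgamma((nu + D) / 2)
        + math.lgamma(nu / 2)
        + (D / 2) * math.log(math.pi * nu)
        + (nu + D) / 2 * math.log1p(sq_dist / nu)  # only term that depends on the data
    )


prediction = [0.9, 0.9, 0.1, 0.1]
clean_mask = [1.0, 1.0, 0.0, 0.0]
noisy_mask = [1.0, 1.0, 0.0, 1.0]  # one mislabeled pixel
print(t_loss(prediction, clean_mask, nu=1.0), t_loss(prediction, noisy_mask, nu=1.0))
```

Because the residuals enter only through log(1 + δ²/ν), a badly mislabeled pixel raises this loss roughly logarithmically rather than quadratically, which is the intuition behind the robustness claim.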



Link to paper

DOI: https://doi.org/10.1007/978-3-031-43898-1_68

SharedIt: https://rdcu.be/dnwB2

Link to the code repository

https://robust-tloss.github.io/

Link to the dataset(s)

https://challenge.isic-archive.com/data/

https://www.kaggle.com/datasets/yoctoman/shcxr-lung-mask


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces a novel loss function for medical image segmentation that aims to be more robust to annotation errors. The proposed loss is based on the negative log-likelihood of the Student-t distribution, which is considered more tolerant to outliers and noise than the normal distribution. The authors gradually increase the amount of simulated error in the segmentation labels during training and evaluate the different methods on accurate labels at test time. Results demonstrate that the T-Loss outperforms other methods in the presence of increasing levels of label noise. Overall, the paper presents a promising approach to improving the robustness of medical image segmentation models to annotation errors.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is generally well-written and easy-to-follow, except for some minor typos and grammatical errors.
    • The proposed loss function is novel and well explained. The authors provide a good motivation for using the Student-t distribution.
    • Results demonstrate that the T-Loss outperforms other methods in the presence of increasing levels of label noise.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The authors split the data into training and test sets without using a validation set. This is not a major issue if the test set is used only to evaluate the performance of the different methods. However, if the test set was used to select the best model and tune the hyperparameters, then it would have been better to use a validation set.
    • It would be interesting to see the performance of the proposed loss function on more challenging datasets, where the contrast between foreground and background is lower and annotation errors are more frequent, perhaps in future work.
    • It would also be interesting to see more qualitative samples to better understand the impact of the proposed loss function on the segmentation results.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors used two public datasets and provided most of the implementation details needed to reproduce the results. They provide a toy example for the loss function, but it is not clear whether they plan to release the code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • The paper needs to be revised for some typos and grammatical errors.
    • In Equation 1, please define $\nu$.
    • Please provide more details about the morphological transformations, e.g. the erosion/dilation structuring element and the values of its parameters (a disk of radius X?).
    • Tables 1 and 2: What are the values in parentheses? Standard deviations? I would prefer the more standard plus-or-minus notation, i.e. $\text{mean} \pm \text{std. dev.}$
    • Figure 1: On which set is this performance reported, the test set? Please state this explicitly.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is generally well-written. The loss function is novel, well-motivated and well-explained. The evaluations are thorough and demonstrate the effectiveness of the proposed loss function. The paper is relevant to the community and tackles an important problem.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    In this paper, the authors propose a novel loss based on negative log-likelihood of the Student-t distribution that can handle outliers in the data by controlling noise sensitivity with a learnable parameter. The proposed method was validated on two public datasets: ISIC and Shenzhen. Results show statistically significant differences for high noise levels compared to other losses.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • In general, the paper is well written and the experiments are well designed.
    • The paper presents a novel loss function based on the t-distribution for improving noise resilience.
    • The method is extensively validated against other loss functions.
    • Results show that the additional learnable parameter introduced by the loss function converges to a stable solution.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • There is a small error in the initial formulation of the t-distribution probability density function, but it does not affect the final loss formulation.
    • Results indicate that the method is more resilient to label noise than other loss functions. However, there appear to be other methods, not based on loss modification, that achieve better noise resilience, for example the method in [1]. The authors rightly state that these methods are more complex, and therefore a solution based on loss function modification has its place.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provided the loss function implementation along with a toy example, which makes it highly reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • There is an error in the initial formulation of the t-distribution probability density function (Equation 1): the $|\Sigma|^{1/2}$ factor should be in the denominator instead of the numerator. Consequently, in Equation 2, $-\frac{1}{2}\log|\Sigma|$ should be $+\frac{1}{2}\log|\Sigma|$. Note that the error does not affect the final loss function, as this term equals zero and is discarded later on.
    • The loss formulation (Equation 4) is not the final formulation, as it includes the $\mu$ term, which is approximated by $\mu = f_w(x_i)$. Then $\delta$ is expressed in terms of $f_w(x_i)$, which makes it difficult to follow. It would be useful to provide the derivation steps, at least in the supplementary material.
    • It is not very clear why the loss formulation is resilient to noise; the final loss equation needs a more detailed explanation. It would be useful to provide a mathematical derivation of the following statement (either in the main text or in the supplementary material): “For δ → 0, the functional dependence from δ reduces to a linear function of δ², i.e. MSE. For large values of δ, though, eq. (4) is equivalent to log δ, thus penalizing large deviations even less than the much advocated robust Mean Absolute Error (MAE).”
    • This paper compares the T-Loss to other loss functions but does not compare it with other noise-robust solutions such as the one presented in [1]. Comparing results at the same noise levels on the ISIC dataset, the method presented in [1] appears to achieve better segmentation results than the T-Loss (e.g. for α = 0.3, β = 0.5, it reaches a Dice score of 84%, i.e. 0.84, compared to 0.809 for the T-Loss). This should be mentioned as a limitation in the Conclusion section.
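For reference, the standard multivariate Student-t density with the correction described above ($|\Sigma|^{1/2}$ in the denominator), its negative log-likelihood, and one way to obtain the two quoted limits. This is a sketch in our own notation, not the paper's: $D$ is the number of pixels, $\delta$ is the norm of the $\Sigma^{-1}$-weighted residual, $\Sigma = I$ is assumed for the last lines, and $C(\nu, D) = -\log\Gamma\left(\frac{\nu+D}{2}\right) + \log\Gamma\left(\frac{\nu}{2}\right) + \frac{D}{2}\log(\nu\pi)$ collects the data-independent terms.

```latex
\begin{align*}
p(y \mid \mu, \Sigma, \nu)
  &= \frac{\Gamma\left(\frac{\nu + D}{2}\right)}
          {\Gamma\left(\frac{\nu}{2}\right)(\nu\pi)^{D/2}\,|\Sigma|^{1/2}}
     \left[1 + \frac{\delta^2}{\nu}\right]^{-\frac{\nu + D}{2}},
  \qquad \delta^2 = (y - \mu)^\top \Sigma^{-1} (y - \mu) \\
-\log p(y \mid \mu, \Sigma, \nu)
  &= -\log\Gamma\left(\frac{\nu + D}{2}\right) + \log\Gamma\left(\frac{\nu}{2}\right)
     + \frac{D}{2}\log(\nu\pi) + \frac{1}{2}\log|\Sigma|
     + \frac{\nu + D}{2}\log\left(1 + \frac{\delta^2}{\nu}\right) \\
\mathcal{L}(\delta)
  &= C(\nu, D) + \frac{\nu + D}{2}\log\left(1 + \frac{\delta^2}{\nu}\right)
  \qquad (\Sigma = I,\ \mu = f_w(x_i)) \\
\mathcal{L}(\delta)
  &= C(\nu, D) + \frac{\nu + D}{2\nu}\,\delta^2 + O(\delta^4)
  \qquad (\delta \to 0) \\
\mathcal{L}(\delta)
  &= C(\nu, D) - \frac{\nu + D}{2}\log\nu + (\nu + D)\log\delta + O(\delta^{-2})
  \qquad (\delta \to \infty) \\
\frac{\partial \mathcal{L}}{\partial \delta}
  &= \frac{(\nu + D)\,\delta}{\nu + \delta^2} \longrightarrow 0
  \qquad (\delta \to \infty)
\end{align*}
```

The small-$\delta$ line uses $\log(1+u) = u + O(u^2)$ and the large-$\delta$ line uses $\log(1 + \delta^2/\nu) = 2\log\delta - \log\nu + O(\nu/\delta^2)$. The last line is the crux of the robustness argument: the MAE gradient has constant magnitude, whereas this one decays roughly like $(\nu + D)/\delta$, so large residuals caused by wrong labels contribute progressively less to each update.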

    [1] Shuailin Li, Zhitong Gao, and Xuming He. “Superpixel-Guided Iterative Learning from Noisy Labels for Medical Image Segmentation”. In: MICCAI. Vol. 12901. 2021, pp. 525–535. doi: 10.1007/978-3-030-87193-2_50.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper presents a novel T-Loss function that was shown to be more robust to label noise than other losses. The additional parameter introduced by the loss was shown to converge to a stable solution. Despite its high robustness to label noise compared to other loss functions, it does not appear to be the state-of-the-art solution for label noise.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors propose a new loss function for robust segmentation of medical images. Motivated by the desirable properties of Student’s t-distribution, specifically its tolerance of noise, the new T-Loss shows robustness in the presence of noisy segmentation labels, simulated by morphological operations and affine transformations on ground-truth masks to emulate human error. Experiments on the ISIC 2017 (not 2018) and Shenzhen datasets show the superiority of the T-Loss over competing robust losses.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well structured, with a very well-written introduction supported by appropriate references.

    2. Student’s t-distribution appears to be a good fit given its noise tolerance.

    3. Quantitative results show a clear improvement with the T-loss on the ISIC dataset, and small improvements on the Shenzhen dataset. The latter can likely be explained by the fact that the noise-free Dice scores are high on Shenzhen, and that other robust losses also do a decent job on Shenzhen.

    4. The Jupyter notebook is helpful for understanding the loss implementation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The final loss term is not explained clearly. In the paragraph after Eq. (4), the authors write that for large values of $\delta$, Eq. (4) is equivalent to $\log \delta$. It is not clear why.

    2. The authors acknowledge that “pixel annotations in an image are not independent”, yet their formulation relies on this assumption for efficient computation. How limiting is this assumption, and how does it affect the segmentation outputs?

    3. Regarding the $\nu$ vs. Dice sensitivity analysis (Fig. 2, third column), the y-axis scale (Dice) makes it extremely difficult to assess how sensitive Dice is to changes in $\nu$. There is no need for the scale to span 0 to 1; a much narrower range, say [0.7, 0.8], would be much better. Moreover, the authors should add at least a couple of sentences on the key takeaways of this analysis, since the figure is currently not discussed in the text.
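The question in point 1 can also be checked numerically. The sketch below uses the single-residual case (D = 1, unit scale) of the standard Student-t negative log-likelihood; the helper names `t_loss_1d` and `t_grad_1d` are hypothetical and not taken from the paper or its code. It shows the quadratic (MSE-like) regime for small residuals, the logarithmic regime for large ones, and a gradient that falls below the constant magnitude of the MAE gradient.

```python
import math


def t_loss_1d(delta: float, nu: float) -> float:
    """Student-t NLL of a single residual `delta` (D = 1, unit scale)."""
    return (
        -math.lgamma((nu + 1) / 2)
        + math.lgamma(nu / 2)
        + 0.5 * math.log(math.pi * nu)
        + (nu + 1) / 2 * math.log1p(delta ** 2 / nu)
    )


def t_grad_1d(delta: float, nu: float) -> float:
    """Analytic derivative of t_loss_1d with respect to delta."""
    return (nu + 1) * delta / (nu + delta ** 2)


nu = 1.0
# Small residuals: the excess loss is about (nu + 1) / (2 nu) * delta^2 (MSE-like).
small = 1e-3
quad_coeff = (t_loss_1d(small, nu) - t_loss_1d(0.0, nu)) / small ** 2
# Large residuals: a 10x larger residual adds about (nu + 1) * log(10) (log-like).
log_step = t_loss_1d(1e6, nu) - t_loss_1d(1e5, nu)
print(quad_coeff, (nu + 1) / (2 * nu))
print(log_step, (nu + 1) * math.log(10))
# For large residuals the gradient is far below 1, the gradient magnitude of MAE.
print(t_grad_1d(1e3, nu))
```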

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have done a good job providing details about the train-test splits including the stratification for Shenzhen, the training hyperparameters (learning rate, batch size, optimizer), the hardware used (GPU), the repeated experiments (3 random seeds), and the statistical testing. The code implementation of the loss is also helpful.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. The way standard deviations have been reported in Tables 1 and 2 is not intuitive/easy to follow. I had to refer to the text to see that 0.788(7) means 0.788 ± 0.007. Please consider formatting the standard deviation differently.

    2. The dataset used is ISIC 2017 challenge dataset, not ISIC 2018. See the ISIC website (https://challenge.isic-archive.com/data/#2017) for more details.

    3. There are some typos in the paper:

    • Sec. 1, page 2: “robust loss functions enable joint optimize model parameters and variables” -> “robust loss functions enable joint optimization of model parameters and variables”.

    • Sec. 2, page 3: “be the its noisy annotated binary segmentation mask” -> “be its noisy annotated binary segmentation mask”.
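Point 1 above concerns the concise uncertainty notation, where, as the reviewer notes, 0.788(7) means 0.788 ± 0.007: the digits in parentheses apply to the last digits of the value. A small converter makes the rule explicit. The function name `paren_to_pm` is hypothetical, and only plain decimal values with a parenthesized uncertainty are handled.

```python
import re
from decimal import Decimal


def paren_to_pm(value: str) -> str:
    """Convert concise notation such as '0.788(7)' to '0.788 ± 0.007'."""
    m = re.fullmatch(r"(-?\d+)(?:\.(\d+))?\((\d+)\)", value.strip())
    if m is None:
        raise ValueError(f"not in concise uncertainty notation: {value!r}")
    integer, frac, unc = m.groups()
    frac = frac or ""
    mean = Decimal(f"{integer}.{frac}" if frac else integer)
    # Shift the uncertainty digits so they align with the last decimal place of the mean.
    std = Decimal(unc).scaleb(-len(frac))
    return f"{mean} ± {std}"


print(paren_to_pm("0.788(7)"))
```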

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While some components of the paper could be improved for clarity (see weaknesses), this is a good paper, well written for the most part, that proposes a new robust loss function based on sound properties of the Student’s t-distribution, and the authors have validated it fairly extensively on two datasets and against competing methods.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    In this paper, the authors propose a novel loss based on negative log-likelihood of the Student-t distribution that can handle noisy labels, by controlling noise sensitivity with a learnable parameter. Two public datasets: ISIC and Shenzhen are used for validation.

    Strengths:

    • well-written paper, as unanimously noted by the reviewers
    • convincing method and experimental setting

    Weaknesses:

    • missing notation, errors in formulas, inaccuracies, and missing explanations have been noted by all three reviewers. This does not impact the scientific merit of the paper, but correcting them will definitely improve its readability.
    • it is noted that a method addressing noisy labels not with a robust loss but with a noise-resistant network architecture [Shuailin Li et al., MICCAI’21] seems to obtain better results on ISIC. The pros and cons of the proposed method and Li et al.’s method should be discussed in the conclusion.
    • additional experiments on more challenging datasets would help strengthen the scope/impact of the proposed loss.




Author Feedback

We would like to thank the reviewers for their valuable feedback and constructive comments on our paper. We have considered suggestions and addressed concerns to the best of our ability.

Concerning the formulation of the T-Loss, we fixed the typographical confusion between the scale matrix and its inverse, and provided extended formulae in the supplementary materials. These include explicit substitutions and a detailed derivation of the limits reported in the main paper.

We clarified the current limitations of our approach and indicated possibilities to improve them with an additional paragraph in the conclusions. This includes acknowledging more explicitly that more engineered approaches to learning with noisy labels outperform the T-Loss, while the latter provides superior results compared to other robust losses. Also, the independence assumption for pixel-wise annotations, which arguably limits the accuracy of our segmentations, has computational implications and cannot easily be lifted to check its validity. Given these observations, it will be very interesting to explore combinations of the T-Loss with super-pixel image representations, dense conditional random fields, or iterative label refinement.

Regarding the absence of a validation set in our study, we intentionally did not include one as we did not tune hyperparameters. The test set is therefore only used to report the performance of the different methods, as observed by reviewers.

Erosion and dilation operations for generating noisy synthetic labels are performed according to the cited prior work by Li, Gao and He. In particular, the binary_dilation and binary_erosion routines of scipy.ndimage are iteratively applied until a target percentage of pixels added or removed is reached. We added a reference to the original implementation for convenience.
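The procedure described above can be sketched with the two scipy.ndimage routines it names. This is an illustration under stated assumptions, not the original implementation: the target percentage is taken relative to the original foreground area, the default cross-shaped structuring element is used, and `perturb_mask` is a hypothetical helper name.

```python
import numpy as np
from scipy import ndimage


def perturb_mask(mask: np.ndarray, target_fraction: float, mode: str = "dilate",
                 max_iter: int = 1000) -> np.ndarray:
    """Iteratively dilate or erode a binary mask until the number of changed
    pixels reaches `target_fraction` of the original foreground area."""
    if mode not in ("dilate", "erode"):
        raise ValueError("mode must be 'dilate' or 'erode'")
    original = mask.astype(bool)
    target = target_fraction * original.sum()
    op = ndimage.binary_dilation if mode == "dilate" else ndimage.binary_erosion
    noisy = original.copy()
    for _ in range(max_iter):
        if np.logical_xor(noisy, original).sum() >= target:
            break
        step = op(noisy)  # one pass with the default cross-shaped structuring element
        if np.array_equal(step, noisy):
            break  # the mask can no longer grow or shrink
        noisy = step
    return noisy


mask = np.zeros((20, 20), dtype=bool)
mask[8:12, 8:12] = True  # 4x4 square, 16 foreground pixels
grown = perturb_mask(mask, 0.5, "dilate")
shrunk = perturb_mask(mask, 0.5, "erode")
print(grown.sum(), shrunk.sum())
```

Because each pass adds or removes a whole ring of pixels, the achieved percentage can overshoot the target; a finer-grained implementation would need a smaller structuring element or a partial final step.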

Finally, some typographic choices are dictated by formatting constraints. The notation used in Tables 1 and 2 for uncertainties is fairly standard, and in our opinion improves readability given the template. Similarly, we cannot include significantly more qualitative samples due to the 2-page limit in the supplementary material.

We genuinely appreciate your feedback, and we strongly believe that these revisions have significantly enhanced the clarity and quality of our paper.


