
Authors

Thierry Judge, Olivier Bernard, Mihaela Porumb, Agisilaos Chartsias, Arian Beqiri, Pierre-Marc Jodoin

Abstract

Accurate uncertainty estimation is a critical need for the medical imaging community. A variety of methods have been proposed, all direct extensions of classification uncertainty estimation techniques. The independent pixel-wise uncertainty estimates, often based on the probabilistic interpretation of neural networks, do not take into account anatomical prior knowledge and consequently provide sub-optimal results for many segmentation tasks. For this reason, we propose CRISP, a ContRastive Image Segmentation for uncertainty Prediction method. At its core, CRISP implements a contrastive method to learn a joint latent space which encodes a distribution of valid segmentations and their corresponding images. We use this joint latent space to compare predictions to thousands of latent vectors and provide anatomically consistent uncertainty maps. Comprehensive studies performed on four medical image databases involving different modalities and organs underline the superiority of our method compared to state-of-the-art approaches.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_47

SharedIt: https://rdcu.be/cVVp2

Link to the code repository

https://github.com/ThierryJudge/CRISP-uncertainty

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper aims to integrate anatomical prior knowledge into uncertainty estimation through a novel contrastive-learning method called CRISP, in which valid segmentations and their corresponding images are defined as positive samples.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The task tackled by the paper is highly relevant to the community especially for clinical deployments of deep learning methods. It is very important to have anatomical bias in the uncertainty maps to increase the interpretability of the confidence of the neural networks.
    • The paper is well written and easy to follow.
    • The evaluations are very rigorous, with an evaluation metric proposed by the authors that I believe is very suitable for the task at hand.
    • The qualitative results provided are very impressive and promising.
    • Uncertainty estimation methods usually tend to boil down to edge detection, as discussed in the paper, and the presented method seems to go beyond being an edge detector for the first time.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • I would have liked to see more comparisons to generative approaches, such as GANs or probabilistic U-Nets that model a posterior distribution over ground truth annotations. Given the paper's complex latent space modeling, comparisons to simpler approaches for the latent space / similarity heuristics could also be experimented with.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I believe it is reproducible to a large extent.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    I believe the paper presents a novel idea for a highly relevant task. Given the page limit this might not be possible; however, I believe more comparisons to baselines and to different latent space modeling and similarity metrics for contrastive learning in a journal version of the paper would strengthen its findings further.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method presented is novel and the results clearly show that it is able to provide uncertainty estimates beyond edges. I believe the paper is promising and insightful for future research.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    The authors introduced the nearest neighborhood ensemble in the latent space as an uncertainty estimation. For that, the authors leveraged contrastive training between image and segmentation. The final uncertainty map is obtained from a weighted sum of error between the predicted mask and the k-nearest ground truth masks from the training data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The authors leveraged contrastive learning for joint image-segmentation embedding, which leads to better calibration of predicted confidence and true probability.
    2. While the formulation has drawbacks, the idea to have an uncertainty estimator based on an anatomically feasible shape prior has merit.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The authors did not clarify what kind of uncertainty this is. Given that they compared with Monte Carlo dropout, I assume it is epistemic. What guarantees that a perturbation in the model weights will land on a similar latent vector on the hypersphere as the one found from the nearest neighborhood? It also raises questions about how compact the latent space is.
    2. There is a major drawback to this formulation. Suppose the network produces a perfect segmentation, identical to the ground truth. The nearest-neighborhood ensemble will still produce uncertainty, whereas ideally there should not be any uncertainty for a perfect prediction. How should one interpret the uncertainty in this scenario?
    3. How is the method different from simply computing Dice between y* and Y, then finding the k-nearest neighbors and computing the uncertainty?
    4. Since shape is a major factor for medical image segmentation, how would the latent space disentangle shape and location information? How else has the location shift been taken care of in this study?
    5. What are CRISP-MC and CRISP-LCE? These are not explained clearly.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have provided adequate details to make the study reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. One needs to compare k-nearest neighbor uncertainty based on the mask level similarity like dice.
    2. The uncertainty formulation has major flaws, as mentioned in weakness point no. 2. The authors need to fix this. Additionally, they need to explain what kind of uncertainty the ensemble captures. Is it epistemic or aleatoric?
    3. Segmentations of unseen data deviate from the training set, but they are not necessarily anatomically inconsistent. The authors could try to encode a shape prior in the latent space to derive uncertainty in the predicted shape, rather than relying only on plausible segmentation ensembles.
    4. Uncertainty-error overlap was introduced in the metric section but never used later in the experiment.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the experimental evidence is on par with existing uncertainty estimation, the proposed method has major flaws, as mentioned in point 2 in weakness. Further, it is unclear how the latent space k-nearest ensemble is different from a mask-level ensemble. Hence I recommend rejection.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    4

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    3

  • [Post rebuttal] Please justify your decision

    In the rebuttal, the author did not answer convincingly regarding what uncertainty they are trying to model. The proposed uncertainty neither approximates the posterior nor the probabilistic likelihood. The proposed k-nearest neighbors-based solution is just an estimation of how the prediction differs from the samples seen during training. Further, this uncertainty is not calibrated as there will always be a bias in the data. Hence, I stick to my original rating.



Review #3

  • Please describe the contribution of the paper

    The paper proposes a method for uncertainty estimation in image segmentation. The proposed architecture consists of a segmentation (shape) encoder, a segmentation decoder, and an image encoder. The architecture is trained so that an image and its corresponding shape representation are mapped to similar locations in the latent space while, at the same time, the segmentation autoencoder is trained to reconstruct shapes. At test time, an image is encoded and the M shape representations from the training set closest to the encoded image representation are found and reconstructed by the shape decoder. The uncertainty map is obtained as the weighted sum of the differences between the predicted segmentation and the M reconstructed shapes. Experiments are performed on 4 different datasets and the results are compared to LCE, ConfidNet, and MC-Dropout, showing improvement in many cases.
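
    The retrieval-and-aggregation step summarized above can be sketched roughly as follows (a minimal illustration only, not the authors' implementation; the function name and the softmax weighting by latent similarity are assumptions):

    ```python
    import numpy as np

    def knn_uncertainty_map(z_img, Z_shapes, Y_shapes, y_pred, M=3, tau=0.1):
        """Sketch: retrieve the M training-set shape latents closest to the
        encoded test image, then build the uncertainty map as a similarity-
        weighted sum of per-pixel disagreements with the prediction."""
        # cosine similarity between the image latent and every shape latent
        sims = Z_shapes @ z_img
        sims /= np.linalg.norm(Z_shapes, axis=1) * np.linalg.norm(z_img) + 1e-8
        top = np.argsort(-sims)[:M]                    # indices of the M nearest shapes
        w = np.exp(sims[top] / tau)
        w /= w.sum()                                   # softmax weights over neighbours
        diffs = np.abs(Y_shapes[top] - y_pred[None])   # (M, H, W) disagreement maps
        return np.einsum('m,mhw->hw', w, diffs)        # (H, W) uncertainty map
    ```

    A map built this way is zero only where the prediction agrees with every retrieved expert shape, which matches the rebuttal's point that the uncertainty reflects inter-expert variability rather than prediction error alone.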

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The idea of mapping images and segmentations closer together in the latent space and exploiting this information for uncertainty estimation is quite interesting.

    • The proposed method is evaluated on multiple datasets and shows improvement compared to 3 existing methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • I think the only major weakness is the lack of a summary of the relevant literature and comparison with it. There are methods developed to estimate uncertainty in image segmentation [1, 2, 3]. These methods are directly comparable with the proposed method, and I believe the method should be compared to at least 1-2 of them.

    [1] Kohl et al. “A Probabilistic U-Net for Segmentation of Ambiguous Images”, Neurips 2018. [2] Baumgartner et al. “PHiSeg: Capturing Uncertainty in Medical Image Segmentation”, MICCAI 2019. [3] Monteiro et al. “Stochastic Segmentation Networks: Modelling Spatially Correlated Aleatoric Uncertainty”, Neurips 2020.

    Although not directly related, there is another work proposed to estimate segmentation accuracy [4]. I think mentioning this work would be interesting because the working principles are very similar. While the proposed method selects the M most similar images in the latent space, [4] exploits similarity in the image space after registration. Although the proposed method is more practical and sophisticated, it would be interesting to discuss similarities/differences with [4] in the paper.

    [4] Valindria et al. “Reverse Classification Accuracy: Predicting Segmentation Performance in the Absence of Ground Truth”, TMI 2017.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Code is not available. The method is explained clearly in the paper and the experiments are performed on public datasets. So, I think it should be easy to reproduce the results in the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • As I wrote in the weaknesses section, the only major weakness of the paper is the lack of comparison and discussion of the prior art. I would expect to see a summary of the more recent literature and comparison to some of them.

    • There are some details that are unclear to me. 1) It is stated that “Edge” is applied to the predicted segmentation map. By which method were segmentation maps obtained?

    2) CRISP is also evaluated on the outputs of MC-Dropout and LCE as stated in the first paragraph of page 6. Which samples of MC-Dropout and LCE were used for uncertainty estimation? Do methods CRISP-MC and CRISP-LCE in Table 1 correspond to these experiments?

    3) I didn’t quite understand why domain shift is simulated in the test images, as mentioned in the 2nd paragraph of page 6. If data augmentation was performed during training, which is standard practice when training networks, the simulated images may become in-distribution since similar images have been seen during training. What is the aim of applying data augmentation at test time? How would the results change without this step?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I found the idea quite interesting and I lean towards suggesting acceptance of the paper. However, the lack of comparison with and discussion of the relevant literature is a major weakness which prevents me from giving a higher score. If the authors can justify why there is no comparison with or discussion of the methods I mention, then I would be happy to increase my score.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper presents a new method for segmentation uncertainty quantification, based on a common latent space for images and segmentations, where k-nearest-segmentations from the training set are sampled at test time and aggregated to generate an uncertainty map for the test segmentation.

    There is general concern regarding the placement in the literature and the comparison with alternative methods. In their rebuttal, the authors should discuss how their method relates to the state of the art.

    Additionally, reviewer 2 has serious concerns regarding the method. While I do not agree with the statement that uncertainty should be zero for perfect predicted segmentations – in my opinion this depends on how certain the network is about that perfect segmentation – I believe the reviewer is correct that it remains unclear exactly what source of uncertainty the authors are modelling, and what would be desirable properties of this uncertainty. The authors should address this point in their rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8




Author Feedback

R1

  1. More comparisons to generative approaches. -C.f. answer 1 of R3.

  2. Comparisons to simpler approaches for the latent space/similarity heuristics. -It is hard to do this fairly as simpler latent space approaches such as [16] do not consider the relationship between the input image and the segmentation map and thus cannot be used “as is”.

R2

  1. What kind of uncertainty? -Our uncertainty (Eq.(2)) contains \bar{y}_i and y*. y* is the output of a third-party segmentation method which may suffer from aleatoric uncertainty (caused by data noise) or epistemic uncertainty (caused by a suboptimal model or a lack of training data). As for \bar{y}_i, it is the latent vector of an expert ground truth, and the summation over several \bar{y}_i estimates expert variability. In that sense, CRISP goes beyond other uncertainty estimation methods as it explicitly accounts for expert variability as well as aleatoric and epistemic uncertainty.

  2. What guarantees that a perturbation in model weight will land on a similar latent vector in the hypersphere as found from the nearest neighbourhood? -CRISP is purely deterministic and contains no weight perturbation (Bayesian or dropout). Thus the same image always lands on the same point in the latent space.

  3. There should not be any uncertainty for a perfect prediction. -Medical uncertainty does not imply being null in the absence of error since experts never totally agree (i.e. inter-expert variability). In our case, uncertainty should be low for good segmentation maps and close to the organ borders as shown in the paper.

  4. Why not compute Dice between y* and Y, find the k-nearest neighbours and compute uncertainty? -The strength of CRISP is that it accounts for the image x* when selecting the ground truth segmentation maps Y. If y* is erroneous, finding the nearest samples in Y regardless of x* often leads to incorrect uncertainty maps. We tested this alternative method but did not obtain good results.
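
  The rejected alternative can be sketched as follows (illustrative only; function names are hypothetical): neighbours are selected by mask-level Dice with y* alone, so the image x* plays no role in retrieval.

  ```python
  import numpy as np

  def dice(a, b, eps=1e-8):
      """Dice coefficient between two binary masks."""
      return (2.0 * (a * b).sum() + eps) / (a.sum() + b.sum() + eps)

  def dice_knn_uncertainty(y_pred, Y_train, k=3):
      """Mask-level alternative: pick the k training masks most similar to
      the prediction (ignoring the image), then average the per-pixel
      disagreement with the prediction."""
      scores = np.array([dice(y_pred, y) for y in Y_train])
      top = np.argsort(-scores)[:k]
      return np.abs(Y_train[top] - y_pred[None]).mean(axis=0)
  ```

  Because retrieval here is driven by y* itself, an erroneous prediction pulls in exactly the training masks that best match the error, consistent with the authors' observation that this variant produced poor uncertainty maps.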

  5. How would the latent space disentangle shape and location information? -Location and shape are encoded implicitly in the latent space. Both are encoded because comparing x* to Y without it would result in wrongful uncertainty maps.

  6. What are CRISP-MC and CRISP-LCE? -CRISP estimates the uncertainty of segmentation maps produced by third-party methods. We tested it on the output of: 1) a baseline network, 2) an MC dropout network and 3) an LCE network. As uncertainty performance is related to segmentation accuracy, we believe this is a fairer evaluation.

  7. Because segmentation from unseen data deviates from the training set, it does not necessarily have to be anatomically inconsistent; the authors suggest encoding the prior shape in the latent space. -We do not fully understand this comment, but encoding a single shape prior does not work since the network is not scale- and rotation-invariant. However, since Y is made of ground truth shapes drawn independently of x*, their aggregation in Eq.(2) is in fact a generic prior.

  8. Uncertainty-error overlap (UEO) never used. -Thank you, this will be clarified. We proposed uncertainty-error mutual information as an alternative to UEO as it computes the same information but without a threshold.

R3

  1. Lack of summary/comparison of the relevant literature. -Unlike CRISP, these methods must be trained on data annotated by multiple experts (not available for our datasets). They model uncertainty as inter-expert disagreement and not the uncertainty of erroneous predictions. Thus, a direct comparison with these methods is hardly feasible.

  2. The “Edge” segmentation method. -“Edge” is an edge detector applied to a segmentation map.

  3. Which samples of MC-Dropout and LCE were used? -C.f. answer 6 of R2.

  4. Why domain shift is simulated in the test set. -It is to gauge how good methods are at evaluating uncertainty on poor segmentation maps. Since the test Dice scores of CAMUS are above 90%, test-time data augmentation reduces the Dice scores much like domain shift.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I find that the authors answered the reviewers’ questions convincingly.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I think that the main question the reviewers had was what uncertainty is being predicted. The authors have addressed this in their rebuttal, in my opinion. Including rater uncertainty is important and is not covered by aleatoric and epistemic uncertainty. The paper presents a novel approach and could be an important contribution to uncertainty estimation.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    In this paper, a method for estimating uncertainty in the segmentation of medical images is introduced. The authors apply their method to four different datasets, compare performance with the state of the art, and generate somewhat convincing results. I would expect to see confidence intervals or significance testing to show that the better performance is statistically significant. The paper is well written and introduces a new approach, and hence is worthy of publication at MICCAI.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8


