
Authors

Shangqi Gao, Hangqi Zhou, Yibo Gao, Xiahai Zhuang

Abstract

Although supervised deep-learning has achieved promising performance in medical image segmentation, many methods cannot generalize well on unseen data, limiting their real-world applicability. To address this problem, we propose a deep learning-based Bayesian framework, which jointly models image and label statistics, utilizing the domain-irrelevant contour of a medical image for segmentation. Specifically, we first decompose an image into components of contour and basis. Then, we model the expected label as a variable only related to the contour. Finally, we develop a variational Bayesian framework to infer the posterior distributions of these variables, including the contour, the basis, and the label. The framework is implemented with neural networks, and is thus referred to as deep Bayesian segmentation. Results on the task of cross-sequence cardiac MRI segmentation show that our method sets a new state of the art for model generalizability. In particular, the BayeSeg model trained with LGE MRI generalized well to T2 images and outperformed other models by large margins, i.e., over 0.47 in terms of average Dice. Our code is available at https://zmiclab.github.io/projects.html.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16443-9_35

SharedIt: https://rdcu.be/cVRyP

Link to the code repository

https://zmiclab.github.io/projects.html

Link to the dataset(s)

https://acdc.creatis.insa-lyon.fr/#

https://zmiclab.github.io/zxh/0/mscmrseg19/


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a method referred to as Bayesian segmentation, or BayeSeg, for probabilistic image segmentation with the purpose of improving generalizability. The method uses a probabilistic autoregressive model consisting of a combination of ResNets and a U-Net to decompose the input image into a basis and a contour. The contour is used to infer the segmentation, as it is argued to be more site- and modality-independent. The approach is tested for cross-sequence and cross-site cardiac MRI segmentation with results superior to the U-Net and the probabilistic U-Net.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The approach tackles generalizability of deep learning segmentation models in medical imaging which is obviously relevant to the MICCAI community.

    • As far as I am aware, the approach, i.e., the use of image decomposition and the modelling of image basis and contour to achieve a more generalizable representation, is novel, and the architectural choices made here make sense to me.

    • The results are surprisingly good, with much smaller reductions in Dice scores than the compared approaches when the method is applied to new modalities and datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • While the language is fine, I find the paper hard to follow. I would have preferred more motivation and intuition in explaining the model details. This is understandable given the short MICCAI format, but the paper does repeat information in a number of places, so a more concisely written manuscript could have made room for this.

    • Only cardiac substructures are considered, and only two problems are attempted: generalization to a different site and to a different sequence. I would have preferred more experiments to see whether this type of modelling is also useful for other modalities and structures.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • The reproducibility file mentions code will be available, but this is not mentioned in the paper. Given the relatively complicated modelling, code availability could be important for reproducibility.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • Title is hard to make sense of before reading the paper.

    • Bayesian segmentation or BayeSeg is a very general name. Consider coming up with something more specific that actually describes what the model does.

    • The Introduction is repeated somewhat needlessly in the Methodology section.

    • I find it somewhat unclear what the image is decomposed into, e.g., what are the basis and the contour? Some intuitive explanation could help.

    • Similarly for the graphical model: what are the line and the boundary? What is the reasoning behind these variables?

    • What is the matrix D_x in practice? Can you give an example?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I am impressed by the solid improvements in generalizability compared to methods which are otherwise the go-to techniques in the field. While I have problems following all details of the technique, it appears the authors have had a good idea that is worth publishing.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    A deep learning based segmentation algorithm that decomposes shape (“contour”) and appearance (“basis”) in order to better generalize across imaging protocols, centers and population bias.

    A dedicated analysis on cardiac MRI data shows substantial improvements in terms of transfer learning (e.g. training on LGE MRI, testing on T2 MRI etc.) compared to vanilla u-nets and related state-of-the-art.

    While the overall idea may not be brand new (e.g. tested for multimodal registration https://arxiv.org/pdf/1903.09331.pdf), the detailed model appears to be novel and seems to provide a thorough derivation and convincing results.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • definitely an interesting topic to all the segmentation & learning community, particularly given the challenges with image appearance and leveraging multimodal data.
    • thorough derivation
    • convincing evaluation results
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    A complex but strong pipeline - I don’t see any significant weaknesses.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • the authors use public datasets, which is great
    • code would be nice as the model is complex
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Just a typo to correct on page 6: replace “underwent cardiomyopathy.” with “suffers from cardiomyopathy”.

    I don’t have any other comments for this manuscript.

    For future work (beyond MICCAI), it would be nice to test on larger pools of data, to see whether these benefits still reproduce (or could be mitigated by brute force “no data like more data”) :-)

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    8

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Highly relevant to the community and a seemingly new model, thoroughly evaluated with convincing results.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose a deep learning-based Bayesian segmentation framework that decomposes an input image into contour and basis components. The segmentation label is then inferred from the contour component, which varies less across different MRI sequences. Unlike conventional Bayesian methods, the framework is implemented with neural networks, where three CNNs are trained to infer the posterior distributions. The authors evaluated the proposed solution on public databases, and a significant improvement in segmentation accuracy was observed.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    I believe the proposed method is a novel approach and addresses an important challenge in the medical image segmentation field. The theoretical analysis of the framework design seems to be relatively thorough, and the use of CNNs for posterior distribution inference avoids some technical challenges that could be difficult for conventional Bayesian methods. The quantitative evaluation of the proposed method was done properly.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Even though the paper is relatively well written, I found that some details about the network training are missing, which would be helpful for the reader to understand the proposed method. The authors used three CNNs to infer the posterior distributions. However, it is not clear whether the three networks were trained together in an end-to-end fashion (which seems to be what the text implies), or trained separately or in an alternating order with individual ground truth computed from the ground-truth segmentation (which is easier for me to understand). I believe the latter is more plausible because, in an end-to-end setup, there is no regularization forcing the two ResNets to focus on either high-pass or low-pass components. Another missing detail is how to generate the final segmentation labels: as the U-Net outputs distributions, a post-processing step is needed to convert the distribution into segmentation labels. Finally, as the contour and basis decomposition reminds me of high-pass and low-pass filters, it might be good to include a baseline U-Net trained on high-pass filtered images to demonstrate the strength of the proposed method.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Public image databases were used. However, some training details are missing.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The authors propose a deep learning-based Bayesian segmentation framework that decomposes an input image into contour and basis components. The segmentation label is then inferred from the contour component, which varies less across different MRI sequences. Unlike conventional Bayesian methods, the framework is implemented with neural networks, where three CNNs are trained to infer the posterior distributions. The authors evaluated the proposed solution on public databases, and a significant improvement in segmentation accuracy was observed.

    I believe the proposed method is a novel approach and addresses an important challenge in the medical image segmentation field. The theoretical analysis of the framework design seems to be relatively thorough, and the use of CNNs for posterior distribution inference avoids some technical challenges that could be difficult for conventional Bayesian methods. The quantitative evaluation of the proposed method was done properly.

    Even though the paper is relatively well written, I found that some details about the network training are missing, which would be helpful for the reader to understand the proposed method. The authors used three CNNs to infer the posterior distributions. However, it is not clear whether the three networks were trained together in an end-to-end fashion (which seems to be what the text implies), or trained separately or in an alternating order with individual ground truth computed from the ground-truth segmentation (which is easier for me to understand). I believe the latter is more plausible because, in an end-to-end setup, there is no regularization forcing the two ResNets to focus on either high-pass or low-pass components. Another missing detail is how to generate the final segmentation labels: as the U-Net outputs distributions, a post-processing step is needed to convert the distribution into segmentation labels. Finally, as the contour and basis decomposition reminds me of high-pass and low-pass filters, it might be good to include a baseline U-Net trained on high-pass filtered images to demonstrate the strength of the proposed method.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I believe the proposed method is a novel approach and addresses an important challenge in the medical image segmentation field. The theoretical analysis of the framework design seems to be relatively thorough. Even though some training details are missing, the overall quality of the paper is good.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper presents a deep learning segmentation algorithm that decomposes shape (“contour”) and appearance (“basis”) in order to generalize better across imaging protocols, centers, and population bias. Reviewers noted the novelty of the work. I would recommend considering Reviewer #1's recommendation regarding the title. In your camera-ready version, add details on network training as suggested by reviewer #2.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2




Author Feedback

We thank the meta-reviewer (MR) and reviewers (R1, R2, R4) for their very constructive and thoughtful comments, which have helped us greatly improve the clarity of our manuscript. We summarize the main comments and our responses below.

-MR and R1 suggested that the title could be more informative. Thanks for this insightful comment. To make the title more informative, we will revise it to “Joint Modeling of Image and Label Statistics for Enhancing Model Generalizability of Medical Image Segmentation” in our camera-ready version. This new title aims to point out our task (Medical Image Segmentation), goal (Model Generalizability), and method (Joint Modeling of Image and Label Statistics).

-MR and R4 asked whether the training is end-to-end. Thanks for this insightful comment. Our training is end-to-end, and we will clarify the training procedure in our camera-ready version. Concretely, one ResNet outputs the “contour” while the other outputs the “basis”, due to the settings of the hyper-parameters associated with “line” and “inverse variance”. Once proper hyper-parameters are set, “contour” and “basis” are adaptively balanced during end-to-end training, so that the “contour” is highly consistent with the labels while the sum of “contour” and “basis” stays close to the image. Therefore, our method is particularly different from a two-stage training method that (1) extracts high-pass and low-pass components, and (2) obtains segmentation labels from the resulting high-pass component.
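For illustration only, below is a minimal Python/PyTorch sketch of the end-to-end idea described in this response. The network and function names (contour_net, basis_net, seg_net, joint_loss) and the loss weights are hypothetical, and the sketch omits the variational prior terms of the actual BayeSeg objective; it only shows the two constraints mentioned above, namely that contour plus basis should reconstruct the image and that the segmentation predicted from the contour should match the label.

    import torch.nn.functional as F

    def joint_loss(image, label, contour_net, basis_net, seg_net,
                   w_rec=1.0, w_seg=1.0):
        # Hypothetical sketch; not the actual BayeSeg variational loss.
        contour = contour_net(image)              # ResNet branch: "shape" component
        basis = basis_net(image)                  # ResNet branch: "appearance" component
        logits = seg_net(contour)                 # U-Net predicts the label from the contour only
        rec = F.mse_loss(contour + basis, image)  # decomposition: contour + basis reconstructs the image
        seg = F.cross_entropy(logits, label)      # consistency of the contour with the label
        return w_rec * rec + w_seg * seg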

-R1 suggested that an intuitive explanation of the modeling could be helpful. Thanks for this helpful comment. As MR and R2 commented, an intuitive understanding of “contour” and “basis” is shape and appearance, respectively. In our graphical model, “line” aims to detect whether there is a “jump” between two neighboring pixels of an image, while “boundary” aims to detect whether there is a “jump” between two neighboring pixels of a label. Therefore, D_x is used to measure the similarity of neighboring pixels, e.g., by computing the first-order difference.
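As an illustrative example (our own, hypothetical construction; the paper's exact definition of D_x may differ), D_x can be realized as a horizontal first-order difference operator:

    import torch.nn.functional as F

    def first_order_diff_x(img):
        # img: (B, C, H, W); horizontal first-order difference, zero-padded
        # on the right so the output keeps the input's spatial size.
        diff = img[..., 1:] - img[..., :-1]
        return F.pad(diff, (0, 1))

    # A large |D_x(img)| at a pixel indicates a "jump" (edge) between that
    # pixel and its right-hand neighbor; an analogous operator along the
    # vertical direction handles vertical jumps.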

-R4 asked how to generate the final segmentation labels. Thanks for this helpful comment. In statistics, the mode of a distribution is the value that occurs with the highest probability. Since the mode of a Gaussian distribution is its mean, we take the mean as the final segmentation label. We will clarify this in our camera-ready version.
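As a hypothetical post-processing sketch (the response only states that the posterior mean is taken as the final segmentation; converting it to a discrete mask by a per-pixel argmax is our assumption):

    import torch

    def posterior_mean_to_mask(mean_probs: torch.Tensor) -> torch.Tensor:
        # mean_probs: (B, K, H, W) posterior mean over K classes
        # (the mode of a Gaussian equals its mean); the discrete label
        # is taken as the per-pixel argmax over classes.
        return mean_probs.argmax(dim=1)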

-R1 and R4 expected more validation in our future work. Thanks for this insightful comment. In our future work, we will extend our framework to semi-supervised and unsupervised segmentation, and validate its effectiveness using more modalities and anatomies.


