
Authors

Amith Kamath, Jonas Willmann, Nicolaus Andratschke, Mauricio Reyes

Abstract

The U-Net architecture has become the preferred model used for medical image segmentation tasks. Since its inception, several variants have been proposed. An important component of the U-Net architecture is the use of skip-connections, said to carry over image details to its decoder branch at different scales. However, beyond this intuition, not much is known as to what extent skip-connections of the U-Net are necessary, nor what their interplay is in terms of model robustness when they are subjected to different levels of task complexity. In this study we analyzed these questions using three variants of the U-Net architecture (the standard U-Net, a “No-Skip” U-Net, and an Attention-Gated U-Net) using controlled experiments on varying synthetic texture images, and evaluated these findings on three medical image data sets. We measured task complexity as a function of texture-based similarities between foreground and background distributions. Using this scheme, our findings suggest that the benefit of employing skip-connections is small for low-to-medium complexity tasks, and that their benefit appears only when task complexity becomes large. We report that such incremental benefit is non-linear, with the Attention-Gated U-Net yielding larger improvements. Furthermore, we find that these benefits also bring along robustness degradations on clinical data sets, particularly in out-of-domain scenarios. These results suggest a dependency between task complexity and the choice/design of noise-resilient skip-connections, indicating the need for careful consideration when using these skip-connections.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43901-8_29

SharedIt: https://rdcu.be/dnwDr

Link to the code repository

https://github.com/amithjkamath/to_skip_or_not

Link to the dataset(s)

https://scholar.cu.edu.eg/Dataset_BUSI.zip

http://medicaldecathlon.com


Reviews

Review #2

  • Please describe the contribution of the paper

    This paper challenges a well-known architecture on one of its most crucial components, the skip connection. The majority of papers submitted to MICCAI and conferences of this kind make marginal modifications to algorithms for marginal improvements. Not this paper: it is very original in challenging probably one of the most well-established architectures, and as such I think this paper is worth an oral presentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper explores a series of strategies to determine the real efficacy of the skip connections of the U-Net, tested with synthetic textures and medical data. The problems are graded from harder to easier and then compared across the different strategies. This approach is objective and easy to test.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Very minor: the figures are hard to read due to the font size; given that the authors had half a page left, they could have made the images bigger.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Data is public. Code is offered on GitHub but removed for anonymity.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    This is an interesting paper. I would suggest that the texture experiment could be improved. The formulation seems to create an image with regions of different intensities more than different textures. It would be better to test with actual textures. There are many datasets around, but probably the best is still that proposed by Randen and Husoy in 1999, where they used a series of textures from the Brodatz album and histogram-equalized them so that they could be distinguished not by intensity but by texture. The authors could consider this for more robust experimentation.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Well-written and well thought out, with only very minor weaknesses; worth an oral presentation, which will surely prompt interesting discussion.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors validated the performance of U-Net variants on different datasets with different image quality.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    – Brings insight for readers about model (skip-connection) design on different tasks.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    – Lack of preprocessing steps: would a simple preprocessing step improve performance on the hard task?
    – It is really hard to infer the difference between variants in the synthetic experiments without Dice scores.
    – It would be great if other datasets were included for the out-of-domain scenario.
    – More experiments are needed for the proposed work, e.g., why choose speckle noise with variance 0.1 as the hard task? For example, what is the performance if the variance is 0.05?
    – UDA (unsupervised domain adaptation) is well developed today. The proposed work would have more impact if a simple UDA method could not solve the out-of-domain problem. Otherwise, in practice, people may not consider the design of skip connections for segmentation models.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Discussion of clinical significance is not clear to me.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    – It would be more convincing if the experiments followed the pipeline from nnU-Net (nnU-Net –> “no new net”).
    – I understand the reason why data augmentation is not used in the proposed work. However, would simple data augmentations solve such problems, i.e., the easy and hard tasks? For a real out-of-domain sample, data augmentation would change the texture characteristics, and such augmentations could be the solution.
    – In-domain performance is a little lower than I expected; it would be better to perform cross-validation on the entire dataset.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please see weaknesses and comments

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    This study provides insights into the importance of skip-connections in UNet architecture for medical image segmentation tasks and their impact on model robustness and performance under different levels of task complexity.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The use of a novel analysis pipeline to evaluate the robustness of image segmentation models as a function of texture-based similarities between foreground and background distributions.
    2. The controlled experiments on varying synthetic texture images and evaluation of findings on three medical image datasets, which provide a comprehensive evaluation of the interplay between skip-connections and task complexity in UNet architecture for medical image segmentation tasks.
    3. The measurement of task complexity as a function of texture-based similarities between foreground and background distributions, which provides a quantitative measure for evaluating model performance under different levels of task complexity.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The limitation of varying only the foreground in synthetic experiments, which could demonstrate unexpected asymmetric behavior when background variations are considered.
    2. The lack of comparison with other state-of-the-art image segmentation models, which could provide a more comprehensive evaluation of the effectiveness of skip-connections in UNet architecture for medical image segmentation tasks.
    3. The focus on texture-based similarities between foreground and background distributions as the sole measure of task complexity, which may not fully capture all aspects of task complexity in medical image analysis.
    4. The limited discussion on potential failure modes and how they can be addressed using the proposed analysis pipeline, which could limit the practical applicability of the study in quality assurance frameworks.
    5. The lack of exploration into the impact of skip-connections on model interpretability and explainability, which is an important consideration in medical image analysis where model transparency is crucial for clinical decision-making.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper provides sufficient detail and resources for researchers to replicate and build upon their findings.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The authors have presented an interesting study on the interplay between skip-connections and task complexity in the U-Net architecture for medical image segmentation tasks. The paper provides valuable insights into the importance of skip-connections in the U-Net architecture and their impact on model robustness and performance under different levels of task complexity. However, there are some areas where the paper could be improved. Firstly, the authors should provide more details on how they generated the synthetic texture images used in the experiments. This would improve the reproducibility of the study and allow other researchers to build upon their findings. Secondly, while the authors have provided a comprehensive evaluation of skip-connections in the U-Net architecture for medical image segmentation tasks, it would be beneficial to compare their results with other state-of-the-art image segmentation models. This would provide a more comprehensive evaluation of the effectiveness of skip-connections in the U-Net architecture and help to establish their relevance in medical image analysis. Thirdly, while texture-based similarities between foreground and background distributions are a useful measure of task complexity, they may not fully capture all aspects of task complexity in medical image analysis. The authors should consider exploring other measures of task complexity that could provide a more comprehensive evaluation of model performance under different levels of task complexity. Finally, the authors only consider two-class segmentation problems. Does the conclusion of the paper still hold when dealing with multi-class segmentation problems?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is a well-written paper that provides valuable insights into skip-connections in UNet architecture for medical image segmentation tasks. With some improvements in reproducibility and comparison with other state-of-the-art models, this study could have even greater impact on medical image analysis research.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Whereas most U-Net users take its architecture for granted, this paper explores the impact of one of the U-Net’s major components: the skip connections. The robustness of the model and out-of-domain scenarios are particularly investigated. Experiments are proposed on synthetic texture images (to explore segmentation tasks of various complexity levels) and three medical image data sets.

    Strengths:

    • original approach (challenging UNet architecture)
    • extensive experiments with an objective approach, in a controlled setting 
    • brings insight to UNet users

    Weaknesses: reviewers challenge the authors’ choices in the experimental setting:

    • the texture model could be improved (varying only the foreground in synthetic experiments)
    • add comparison with other state-of-the-art image segmentation models
    • study the impact of skip-connections on model interpretability and explainability
    • add other datasets
    • out-of-domain problem: discuss the results obtained in light of what UDA (unsupervised domain adaptation) can offer.

    However, the reviewers do not raise critical issues in the experimental setting or conclusions.

    Given the great impact/scope of this study, and the sound experimental protocol presented herein, I suggest accepting this paper.




Author Feedback

Many thanks to all the reviewers for their constructive comments and responses. We address some of the major points as follows:

  1. How are these textures generated? a. We use two textures from Hoyer et al. and vary the alpha blending on the foreground to generate images with varying textures. We will include code to reproduce this in a GitHub repository.
  2. Asymmetric behaviour if background texture is varied. a. This is correct. In future work, we will consider re-doing this with the fore- and backgrounds switched. The broad point here is that skip connections cause variations in behaviour across tasks with varying complexity.
  3. Does this truly represent task complexity? a. In a narrow sense, but there could be other ways to encode task complexity: altering the shape of the foreground object/s, artificial masks in the image foreground, etc. We would like to develop a library of such robustness tests to evaluate architectures more broadly.
  4. Using other texture data sets – which vary by intensity rather than texture. a. We thank the reviewer for this suggestion. In future work, we will consider this and other data in the texture recognition space.
  5. Why is pre-processing not used? a. We consciously chose not to add more steps in the pipeline that could confound our interpretation of the results and only modify the network architecture. It is possible that including pre-processing steps can result in differing results: however, we think this is outside the scope of this study.
  6. How would data augmentation come into play with this analysis? a. We chose not to use any data augmentation methods that could impact the texture of the fore/background of the image. We can hence attribute variation in performance solely to network architecture. We also use the same random seed in all experiments. Future work could be to understand if the choice of data augmentation and network architecture is interrelated, and if so, how.
  7. Lower in-domain performance than expected. a. We agree with this statement and justify this by reiterating that our aim was not to achieve the highest in-domain performance. On the contrary, we set all the hyperparameters to be the same so that we can attribute variations in the results directly to the network architecture choice of using skip connections or not (and what is done in the skip connection itself, in the case of the Attention-UNet). It is true that tuning hyperparameters individually for each model could result in better in-domain performance.
  8. Reasoning behind using specific data-corruption methods (speckle noise and blurring) a. We do this specifically to change the texture similarity, as indicated in Figure 3. Other corruptions that change the texture similarity could also be fair game.
  9. Unsupervised domain adaptation: how does this study consider advances in this space? a. We thank the reviewer for this note and will consider exploring this space in more detail in future work.
  10. Why is nnUNet not compared with? a. Although nnUNet is state of the art, purely from a network-architecture perspective we considered it similar to the U-Net and hence did not evaluate it separately. We will consider exploring its pipeline in future work.
  11. How about other state-of-the-art models (maybe transformer-backbones?) a. We thank the reviewer for this comment and have plans to investigate more architectures in future work.
  12. How does this work help handle failure modes? a. We think that using this framework of measuring out-of-domain robustness, better choices of network architectures can be made specific to the problem at hand. Particularly, during test time, a pre-analysis of input data can help choose the right model setup.
  13. Connections with interpretability and transparency: a. We thank the reviewer for this comment and will investigate this in future extended work.
  14. Figures are hard to read. a. We thank the reviewer and will update the figures.
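To make points 1 and 8 above concrete, the texture-generation scheme (alpha-blending a foreground texture into a background) and the speckle-noise corruption might be sketched roughly as below. This is an illustrative assumption, not the authors’ released code: the stand-in “textures”, the circular mask, and the parameter values are placeholders, while the paper itself uses textures from Hoyer et al. and speckle noise with variance 0.1.

```python
import numpy as np


def blend_foreground(background, foreground, mask, alpha):
    """Alpha-blend a foreground texture into a background texture.

    alpha = 0 leaves the background untouched (hardest task: the foreground
    is indistinguishable); alpha = 1 pastes the foreground texture verbatim
    (easiest task). Intermediate values grade the task complexity.
    """
    blended = background.copy()
    blended[mask] = (1.0 - alpha) * background[mask] + alpha * foreground[mask]
    return blended


def add_speckle_noise(image, variance, rng):
    """Multiplicative (speckle) noise: out = img + img * n, n ~ N(0, var)."""
    noise = rng.normal(0.0, np.sqrt(variance), size=image.shape)
    return np.clip(image + image * noise, 0.0, 1.0)


# Toy example: two stand-in "textures" and a circular foreground mask.
rng = np.random.default_rng(0)
h = w = 64
bg = rng.uniform(0.2, 0.4, size=(h, w))   # placeholder background texture
fg = rng.uniform(0.6, 0.8, size=(h, w))   # placeholder foreground texture
yy, xx = np.mgrid[:h, :w]
mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 < (h // 4) ** 2

easy = blend_foreground(bg, fg, mask, alpha=1.0)   # textures fully distinct
hard = blend_foreground(bg, fg, mask, alpha=0.1)   # nearly identical
corrupted = add_speckle_noise(easy, variance=0.1, rng=rng)
```

Sweeping `alpha` from 0 to 1 yields a family of images whose foreground/background texture similarity varies continuously, which is the spirit of the controlled task-complexity experiments; the corruption step mimics the out-of-domain setting.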


