
Authors

Ho Hin Lee, Quan Liu, Shunxing Bao, Qi Yang, Xin Yu, Leon Y. Cai, Thomas Z. Li, Yuankai Huo, Xenofon Koutsoukos, Bennett A. Landman

Abstract

With the inspiration of vision transformers, depth-wise convolution has been revisited to provide a large Effective Receptive Field (ERF) using Large Kernel (LK) sizes for medical image segmentation. However, segmentation performance may saturate or even degrade as kernel sizes are scaled up (e.g., $21\times 21\times 21$) in a Convolutional Neural Network (CNN). We hypothesize that convolution with LK sizes is limited in maintaining optimal convergence for locality learning. While Structural Re-parameterization (SR) enhances local convergence with parallel small kernels, the optimal small-kernel branches may hinder computational efficiency during training. In this work, we propose RepUX-Net, a pure CNN architecture with a simple large kernel block design, which competes favorably with current state-of-the-art (SOTA) networks (e.g., 3D UX-Net, SwinUNETR) on 6 challenging public datasets. We derive an equivalency between kernel re-parameterization and the branch-wise variation in kernel convergence. Inspired by the spatial frequency in the human visual system, we extend this variation of kernel convergence to an element-wise setting and model the spatial frequency as a Bayesian prior to re-parameterize convolutional weights during training. Specifically, a reciprocal function is leveraged to estimate a frequency-weighted value, which rescales the corresponding kernel element for stochastic gradient descent. In the experiments, RepUX-Net consistently outperforms 3D SOTA benchmarks in Dice score for internal validation (FLARE: 0.929 to 0.944), external validation (MSD: 0.901 to 0.932, KiTS: 0.815 to 0.847, LiTS: 0.933 to 0.949, TCIA: 0.736 to 0.779) and transfer learning (AMOS: 0.880 to 0.911). Both code and pre-trained models are available at: https://github.com/MASILab/RepUX-Net.
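The mechanism summarized in the abstract, a reciprocal function of distance that yields a frequency-weighted value for each kernel element, which then rescales that element's update during SGD, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the exact form `1 / (1 + alpha * d)`, the Euclidean distance `d` to the kernel center, and the names `frequency_prior` and `sgd_step_with_prior` are hypothetical.

```python
import numpy as np


def frequency_prior(k: int, alpha: float = 1.0) -> np.ndarray:
    """Hypothetical center-diffused prior for a k x k x k kernel.

    Each element receives 1 / (1 + alpha * d), where d is its Euclidean
    distance to the kernel center: the center gets exactly 1 and the
    weights decay reciprocally toward the kernel borders.
    """
    if k % 2 != 1:
        raise ValueError("an odd kernel size is expected so the center is one element")
    idx = np.arange(k) - k // 2
    dz, dy, dx = np.meshgrid(idx, idx, idx, indexing="ij")
    d = np.sqrt(dz**2 + dy**2 + dx**2)
    return 1.0 / (1.0 + alpha * d)


def sgd_step_with_prior(weight: np.ndarray, grad: np.ndarray,
                        prior: np.ndarray, lr: float = 1e-2) -> np.ndarray:
    """One plain SGD step in which every kernel element's gradient is
    rescaled by its prior value, i.e. an element-wise learning rate lr * prior."""
    return weight - lr * prior * grad


prior = frequency_prior(21, alpha=1.0)
weight = np.zeros((21, 21, 21))
grad = np.ones_like(weight)
updated = sgd_step_with_prior(weight, grad, prior, lr=0.1)
print(prior[10, 10, 10])             # 1.0 at the kernel center
print(round(updated[10, 10, 0], 4))  # element at distance 10: -0.1 / 11
```

In this sketch the prior only rescales the gradient, so the forward pass stays an ordinary large-kernel convolution and only the optimizer sees the prior.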

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43901-8_60

SharedIt: https://rdcu.be/dnwD8

Link to the code repository

https://github.com/MASILab/RepUX-Net

Link to the dataset(s)

AMOS: https://amos22.grand-challenge.org/

MSD Spleen: http://medicaldecathlon.com/

KiTS: https://kits19.grand-challenge.org/

LiTS: https://competitions.codalab.org/competitions/17094


Reviews

Review #3

  • Please describe the contribution of the paper

    The manuscript introduces RepUX-Net, a segmentation architecture for 3D images based on large kernel block design and element-wise kernel reparameterization with Bayesian prior inspired by the spatial frequency in the human visual system. The proposed methodology has been tested on 6 publicly available datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Good manuscript rationale and interesting technical solution.
    • Ablation study and comparison with SOTA
    • Tested on public datasets
    • Statistical analysis in the results
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Limited novelty
    • Some ablation tests are missing
    • Very few qualitative results
    • Convergence analysis was not clearly discussed
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper should be very reproducible: the method has been tested on public datasets, and the code will be released upon acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The manuscript introduces RepUX-Net, a segmentation architecture for 3D images based on large kernel block design and element-wise kernel reparameterization with Bayesian prior inspired by the spatial frequency in the human visual system. The proposed methodology has been tested on six publicly available datasets.

    Major strengths:

    • The manuscript has a clear and well-motivated introduction, followed by a detailed explanation of the proposed methodology.
    • The ablation study and comparison with SOTA provide a good understanding of the contributions of each component of the proposed method.
    • Testing on publicly available datasets increases the reproducibility and generalizability of the results.
    • The authors provide statistical analysis, although there are some concerns about the results, which can be addressed by showing a boxplot.

    Major weaknesses:

    • The method presented is interesting, but it is a combination of existing state-of-the-art strategies relying only on a fixed reweighting scheme for features, which limits its novelty. It would be more interesting to test several distributions, as the authors suggested.
    • Table 2 shows some missing configurations for the ablation tests, such as SGD with BFR, which the authors should either add or explain why they haven’t been carried out.
    • The manuscript would benefit from more qualitative results to complement the quantitative evaluation.
    • The authors claim that the proposed design can influence the weighted convergence diffused from local to global in theory, but it is unclear how this design impacts the convergence. This point could be elaborated on more.
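To make the convergence question concrete, here is a toy sketch that is not taken from the paper. Assuming a per-element quadratic loss `0.5 * (w - w_star)**2` and plain gradient descent, rescaling an element's gradient by a prior value `p` shrinks its error by a factor `(1 - lr * p)` per step, so elements with a larger prior (near the kernel center) settle first and outer elements follow more slowly. The loss, the tolerance, and the helper `steps_to_converge` are illustrative assumptions.

```python
def steps_to_converge(p: float, lr: float = 0.1, tol: float = 1e-3,
                      w0: float = 1.0, w_star: float = 0.0,
                      max_steps: int = 100_000) -> int:
    """Gradient-descent steps needed on the toy loss 0.5 * (w - w_star)**2
    when the gradient is rescaled by a prior value p (effective lr = lr * p)."""
    w = w0
    for step in range(max_steps):
        if abs(w - w_star) < tol:
            return step
        grad = w - w_star      # derivative of 0.5 * (w - w_star)**2
        w -= lr * p * grad     # element-wise rescaled update
    return max_steps


# A central element (p = 1) versus elements farther out (smaller p).
for p in (1.0, 0.5, 1 / 11):
    print(f"prior {p:.3f}: {steps_to_converge(p)} steps")
```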

    Minor issues:

    • The text in Figure 2 is small and difficult to read, and several acronyms still need to be defined in the caption. The authors should address this to improve the figure’s readability.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is an interesting technical solution and needs small improvements.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    The authors propose a new 3D CNN architecture (RepUX-Net), which relies on a kernel block design that enables the use of larger kernels, e.g., 21x21x21. The architecture improves the model’s learning convergence and also adapts the receptive field for 3D segmentation tasks.

    • The convolutional weights are re-parametrised by considering a Bayesian prior and the goal is to better learn local to global information.
    • The new architecture is compared against existing 3D medical segmentation approaches on 6 different multi-organ datasets, demonstrating the model’s superiority.
    • The model was evaluated in three different settings: supervised learning, external evaluation on unseen datasets, and transfer learning.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The authors conduct extensive studies to compare the benefits of their approach against 3D SOTA segmentation models.
    • Experiments on public multi-organ datasets were well designed and included not only evaluation metrics (i.e., Dice) but also additional measurements of the model’s capacity: number of parameters and FLOPs.
    • Additional ablation experiments showed how the model performs under different settings, including optimiser vs. training steps vs. learning rate, to cite some of the configurations. As such, the reader can easily assess the robustness of the proposed approach.
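As a back-of-the-envelope aid for the parameter and FLOP comparisons, the sketch below counts the weights and multiply-accumulates (MACs) of a single depth-wise 3D convolution. Weights grow as k^3 per channel, so moving from k=7 to k=21 multiplies both counts by 27. The channel width and feature-map size are hypothetical, and this does not reproduce the totals reported in the paper, which cover whole networks.

```python
def depthwise3d_params(channels: int, k: int, bias: bool = True) -> int:
    """Parameters of a depth-wise 3D convolution: one k*k*k filter per channel."""
    return channels * k**3 + (channels if bias else 0)


def depthwise3d_macs(channels: int, k: int, out_shape: tuple) -> int:
    """Multiply-accumulates: every output voxel of every channel costs k**3 MACs."""
    d, h, w = out_shape
    return channels * k**3 * d * h * w


channels = 48              # hypothetical channel width
out_shape = (24, 24, 24)   # hypothetical feature-map size
for k in (7, 21):
    print(k, depthwise3d_params(channels, k), depthwise3d_macs(channels, k, out_shape))
```

A depth-wise convolution is a grouped convolution with one group per channel, e.g. `groups=channels` in PyTorch's `Conv3d`.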
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Regarding the trade-off between the model’s performance and number of parameters, the model seems to improve the Dice scores only by a small margin on some datasets, e.g., Table 1, against 3D UX-Net (k=7): Spleen 0.981->0.984, Kidney 0.969->0.970, Liver 0.982->0.983.
    • Based on the previously mentioned results, it would have been interesting to better explain why the model performs better on some organs/datasets than on others.
    • In addition, one hypothesis was that larger kernels should usually help to improve the results; however, this does not always seem to be the case when comparing 3D UX-Net k=7 vs. k=21, or 3D UX-Net k=7 vs. RepUX-Net.
    • Furthermore, an extended visual interpretation (qualitative experiments) would have been useful to better understand the benefits and challenges of the proposed approach over existing baselines. Some results were shown in Fig. 1 of the Supplementary material for 2 organs.
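Because the comparison turns on Dice differences in the third decimal place, a minimal Dice implementation for binary masks may help readers reason about them. It follows the standard definition and is not necessarily the evaluation code used in the paper; the `eps` convention for empty masks and the `dice_with_missed_voxels` helper are assumptions for illustration. The usage lines show one possible reason small gains on large organs look modest: a fixed number of missed voxels barely moves Dice on a large organ but costs much more on a small one.

```python
import numpy as np


def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of the same shape.

    eps avoids 0/0 when both masks are empty (an assumed convention).
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))


def dice_with_missed_voxels(organ_voxels: int, missed: int) -> float:
    """Dice when a prediction misses `missed` voxels of an organ (false negatives only)."""
    target = np.ones(organ_voxels, dtype=bool)
    pred = target.copy()
    pred[:missed] = False
    return dice_score(pred, target)


# The same 1,000 missed voxels cost far less Dice on a large organ than on a small one.
print(round(dice_with_missed_voxels(1_000_000, 1_000), 4))  # 0.9995
print(round(dice_with_missed_voxels(10_000, 1_000), 4))     # 0.9474
```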
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper gives enough details to support its reproducibility. This includes, for instance, information about the dataset splits provided in the supplementary material.

    • Furthermore, the proposed architecture is represented both visually as blocks (Fig. 2) and mathematically (Eq. 4), which aids the understanding and implementation of the proposed model. The authors also intend to release the source code.
    • Additional information is provided regarding the training procedures of the model, e.g., optimisers.
    • Finally, the impact of tuning the model parameters on the segmentation results is described. This information can be used to validate how close one’s own implementation is to the reported results.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • Overall, the paper is well written and the ideas are clearly structured. Because the proposed approach is presented visually alongside related approaches, the reader can easily understand the novelty of the proposed method and its main differences from existing baselines.
    • Besides the statistical results presented, it would have been interesting to see how the model performs in terms of inference time against its related baselines.
    • The authors focus on 3D segmentation, but the model could also be applied to 2D tasks. In an extended version of the paper, it may be worth evaluating this as well.
    • The authors mentioned that a current limitation of the prior is its fixed shape. Besides exploring different distribution families, would it actually be possible to automatically learn what shape to use to rescale the element-wise convergence?
    • For additional comments, please refer to the weakness part.
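Both this reviewer and Review #3 raise the idea of trying other prior shapes. The sketch below lists a few center-diffused candidates as functions of distance from the kernel center; each equals 1 at the center and decays outward. The Gaussian and exponential families, the shape values in `SHAPE`, and the helper names are hypothetical, and whether any of them improves on the reciprocal form is an open empirical question.

```python
import numpy as np


def kernel_distance(k: int) -> np.ndarray:
    """Euclidean distance of every element of a k*k*k kernel to its center."""
    idx = np.arange(k) - k // 2
    dz, dy, dx = np.meshgrid(idx, idx, idx, indexing="ij")
    return np.sqrt(dz**2 + dy**2 + dx**2)


# Candidate center-diffused priors; each equals 1 at the center (d = 0) and
# decays with distance. The shape parameter s is a hypothetical knob that
# could be grid-searched or, in principle, made learnable.
PRIORS = {
    "reciprocal": lambda d, s: 1.0 / (1.0 + s * d),
    "gaussian": lambda d, s: np.exp(-d**2 / (2.0 * s**2)),
    "exponential": lambda d, s: np.exp(-s * d),
}
SHAPE = {"reciprocal": 1.0, "gaussian": 4.0, "exponential": 0.3}

d = kernel_distance(21)
for name, prior_fn in PRIORS.items():
    prior = prior_fn(d, SHAPE[name])
    # Value at the center and at distance 10 along one axis.
    print(f"{name:>11}: center={prior[10, 10, 10]:.3f}, edge={prior[10, 10, 0]:.4f}")
```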
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Trying to bring more global context information into CNNs in the form of larger kernels is a relevant topic, especially as Vision Transformers seem to benefit from this property. The reported results on different datasets aimed at 3D multi-organ segmentation help to validate the hypothesis of the authors, namely the relevance of adapting the receptive field. In addition, experiments across different datasets against different baselines and under different settings demonstrate the superiority of the proposed approach.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #6

  • Please describe the contribution of the paper

    The authors adapt extremely large kernel convolution in the encoder network for medical image segmentation and propose to model the spatial frequency in the human visual system as a reciprocal function, which generates a Bayesian prior to rescale the learning convergence of each element in the kernel weights. They claim that the proposed method outperforms the current state-of-the-art methods on 6 challenging public datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed method could be potentially effective and useful.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The clarity and organization of this paper can be further improved (see comments).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Nil

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    1. I would suggest that the authors move Fig. 1 from the supplementary material into the paper and include more segmentation results for representative images from different organs/datasets.
    2. How does the kernel size affect the performance of the proposed method? I would suggest that the authors answer this question and find the optimal kernel size.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method is novel and practically useful. The clarity and organization of the paper should be further improved for readers to follow. More segmentation results for representative images from different organs/datasets are desired to better demonstrate the improvements of their proposed method over the current state-of-the-art methods.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    The manuscript proposed RepUX-Net, which improved the CSLA module and applied different learning rates to different branches so that the network converges to a better solution. This is a significant improvement, enabling large kernels to be applied to 3D medical image segmentation and improving performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The strengths of this paper are: (1) it improves the large convolution kernel re-parameterization method; (2) it alleviates the performance saturation or degradation caused by the use of large convolutional kernels in medical image segmentation; (3) it reduces computational complexity during testing through re-parameterization.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The motivation for adding different learning rates to different branches is not prominent enough. The authors may need to provide a more intuitive and reasonable explanation.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The proposed method can be clearly implemented from the description of the author’s method in the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    This paper addresses the problem of performance saturation or degradation caused by the use of large convolutional kernels in medical image segmentation networks, mainly by applying different learning rates to the multi-branch structure during training and by reducing computational complexity through re-parameterization during testing. Overall, this paper improves the large convolution kernel re-parameterization method. The motivation for adding different learning rates to different branches is not prominent enough. The authors may need to provide a more intuitive and reasonable explanation.
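The test-time step described here, folding the parallel branches into one kernel, relies on the linearity of convolution. The sketch below checks this numerically for a single channel (a depth-wise layer applies the same argument per channel), assuming two branches with kernel sizes 7 and 3, stride 1, zero "same" padding, and no bias or batch normalization; those choices and the helper names `correlate3d_same` and `merge_branches` are assumptions, not details from the paper. It needs NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate3d_same(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Single-channel 3D cross-correlation with zero 'same' padding (odd cubic kernel)."""
    r = kernel.shape[0] // 2
    xp = np.pad(x, r)                                 # zero padding on all sides
    windows = sliding_window_view(xp, kernel.shape)   # shape (D, H, W, k, k, k)
    return np.tensordot(windows, kernel, axes=3)      # sum over the k*k*k window


def merge_branches(large: np.ndarray, small: np.ndarray) -> np.ndarray:
    """Zero-pad the small kernel to the large size (centered) and add the two."""
    pad = (large.shape[0] - small.shape[0]) // 2
    return large + np.pad(small, pad)


rng = np.random.default_rng(0)
x = rng.standard_normal((10, 10, 10))
k_large = rng.standard_normal((7, 7, 7))
k_small = rng.standard_normal((3, 3, 3))

two_branch = correlate3d_same(x, k_large) + correlate3d_same(x, k_small)  # training-time form
merged = correlate3d_same(x, merge_branches(k_large, k_small))            # inference-time form
print(np.allclose(two_branch, merged))  # True
```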

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors pioneered the use of the large convolutional kernel re-parameterization method for 3D medical image segmentation and addressed the saturation or degradation of segmentation performance. The work also provides an exploratory alternative to Transformers in the field of 3D medical image segmentation.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This submission proposes a new strategy to model large kernels using a re-parameterization scheme during training. The originality resides in leveraging a Bayesian frequency prior to re-parameterize convolutional kernel weights during gradient descent. The evaluation is on six public datasets, demonstrating a diversity of clinical scenarios.

    All three reviews reach a consensus toward acceptance. The noted strength is the original approach to tackling larger kernel blocks. Minor elements could be clarified in a camera-ready version.

    For all these reasons, recommendation is towards Acceptance.




Author Feedback

We thank all the reviewers and the AC for their positive feedback and constructive comments on our paper. Our paper was provisionally accepted, with reviewers highlighting 1) practical and novel technical solutions, 2) clear writing and well-articulated ideas, and 3) sufficient experiments to demonstrate generalizability. In this response, we address the remaining concerns raised by the reviewers.

Clarification of Motivation and Novelty:

• Reviewers 2 and 3 raised concerns regarding our motivation and novelty. Previous re-parameterization strategies only demonstrate the benefits of a parallel-branch design with small kernels for enhancing locality learning in large kernels (as depicted in Figure 1). Additionally, through our theoretical derivation of Stochastic Gradient Descent (SGD) in the parallel-branch design (supplementary 1.1), we observed that both locality (small-kernel region) and global learning convergence can be controlled with branch-specific learning rates. Building upon this insight, our motivation is to adapt a variable learning convergence across each large-kernel element with a first-order optimizer. Unlike previous structural re-parameterization approaches, our novelty lies in simulating the behavior of the effective receptive field as Bayesian prior knowledge to rescale the weighting of each kernel element for SGD (equation 3). Moreover, the center-diffused behavior of the receptive field resembles the spatial frequency captured by the human visual system. We model the spatial frequency as a reciprocal distance function and generate a frequency-weighted map to reweight the kernel elements during optimization. This represents the first 3D re-parameterization approach that effectively adapts large kernel convolution.
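The observation that branch-specific learning rates control local versus global convergence can be checked on a toy linear model, offered here as an illustration rather than the derivation in supplementary 1.1. Assuming plain SGD, a large kernel `w1`, and a small branch `w2` whose zero-padded copy covers only the central elements, one step on the two branches equals one step on the merged kernel `w = w1 + pad(w2)` with an element-wise learning-rate map: `lr1` outside the center and `lr1 + lr2` inside it. The 1D setup, data, sizes, and learning rates are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
K, S = 9, 3                        # large and small kernel sizes (1D for brevity)
pad = (K - S) // 2
X = rng.standard_normal((64, K))   # each row stands for one input window of length K
y = rng.standard_normal(64)
lr1, lr2 = 0.05, 0.02              # branch-specific learning rates


def grad_merged(w: np.ndarray) -> np.ndarray:
    """Gradient of 0.5 * mean((X @ w - y)**2) with respect to the merged kernel w."""
    return X.T @ (X @ w - y) / len(y)


# (a) two branches trained separately, each with its own learning rate
w1 = rng.standard_normal(K)
w2 = rng.standard_normal(S)
# (b) a single merged kernel trained with an element-wise learning-rate map
w = w1 + np.pad(w2, pad)
center = np.zeros(K)
center[pad:pad + S] = 1.0
lr_map = lr1 + lr2 * center

for _ in range(100):
    g = grad_merged(w1 + np.pad(w2, pad))  # dL/dW of the merged kernel
    w1 = w1 - lr1 * g                      # dL/dw1 = g
    w2 = w2 - lr2 * g[pad:pad + S]         # dL/dw2 = central crop of g
    w = w - lr_map * grad_merged(w)

print(np.allclose(w1 + np.pad(w2, pad), w))  # True: the two trajectories coincide
```

Replacing the binary center mask with a smooth map is, loosely, the step from branch-wise to element-wise rescaling that the response describes.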

Experimental Results:

• Reviewer 1 expressed concern about the subtle improvement observed in specific organs. The subtle improvement in internal testing specifically refers to the spleen, kidney, and liver, which are large organs and relatively easy to segment. In contrast, the volumetric morphology of the pancreas varies substantially, making its segmentation more challenging. Nevertheless, our results demonstrate the effectiveness of large kernel convolution, as evidenced by a significant improvement from 0.801 to 0.837. We hypothesize that scaling up the kernel size allows more meaningful context to be extracted between neighboring organs. With more classes to segment, we expect large kernel convolutions to perform even better, resulting in a larger performance improvement on AMOS than on FLARE. Reviewer 1 also raised concerns about using larger kernels to improve the results. The performance of 3D UX-Net with increased kernel sizes shows that simply scaling up the kernel size does not by itself enhance the results. This may be attributed to either an unfavorable block design or limited kernel learning convergence. Therefore, we employed a plain block design to minimize optimization complexity, and we hypothesize that larger kernels with additional guidance during optimization should lead to improved results.

• Reviewers 2 and 4 expressed concerns regarding the ablation studies in different scenarios. We acknowledge that we have not evaluated variable kernel sizes or the SGD optimizer combined with our proposed re-parameterization strategy. We will extend our idea to 2D images and perform additional ablation studies in the journal version.

• Furthermore, reviewers raised concerns about the limited qualitative results. To prioritize the clarity of our technical contribution, and given the space taken by theoretical derivations in both the main manuscript and the supplementary material, we included qualitative results for only two external evaluations in the supplementary material. We will provide the full qualitative results in the journal version.


