
Authors

Thijs P. Kuipers, Erik J. Bekkers

Abstract

Regular group convolutional neural networks (G-CNNs) have been shown to increase model performance and improve equivariance to different geometrical symmetries. This work addresses the problem of SE(3), i.e., roto-translation, equivariance on volumetric data. Volumetric image data is prevalent in many medical settings. Motivated by recent work on separable group convolutions, we devise an SE(3) group convolution kernel separated into a continuous SO(3) (rotation) kernel and a spatial kernel. We approximate equivariance to the continuous setting by sampling uniform SO(3) grids. Our continuous SO(3) kernel is parameterized via RBF interpolation on similarly uniform grids. We demonstrate the advantages of our approach in volumetric medical image analysis. Our SE(3) equivariant models consistently outperform CNNs and regular discrete G-CNNs on challenging medical classification tasks and show significantly improved generalization capabilities. Our approach achieves up to a 16.5% gain in accuracy over regular CNNs.
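To make "continuous SO(3) kernel parameterized via RBF interpolation" concrete, here is a minimal sketch. The abstract does not fix the basis function, distance, or grid, so the Gaussian of the quaternion geodesic angle, the `width` value, and the 4-element grid below are all assumptions for illustration, not the authors' gconv implementation. Learned weights live on grid rotations; solving a small linear system gives an interpolant that reproduces them exactly at the grid and can be queried at any rotation.

```python
import numpy as np

def geodesic_angle(q_a, q_b):
    """Pairwise rotation angles (radians) between two sets of unit quaternions.

    Using |<q_a, q_b>| gives the same angle for q and -q, which encode the same rotation.
    """
    dots = np.clip(np.abs(q_a @ q_b.T), 0.0, 1.0)
    return 2.0 * np.arccos(dots)

def gaussian_rbf(angles, width):
    # Assumed basis function: Gaussian in the geodesic angle.
    return np.exp(-(angles / width) ** 2)

def fit_rbf_coefficients(grid_q, grid_values, width=0.5):
    """Solve Phi c = v so the interpolant matches grid_values at the grid nodes."""
    phi = gaussian_rbf(geodesic_angle(grid_q, grid_q), width)
    return np.linalg.solve(phi, grid_values)

def eval_so3_kernel(query_q, grid_q, coeffs, width=0.5):
    """Evaluate the interpolated SO(3) kernel at arbitrary query rotations."""
    phi = gaussian_rbf(geodesic_angle(query_q, grid_q), width)
    return phi @ coeffs

# Hypothetical grid: identity and 180-degree turns about x, y, z, in (w, x, y, z) order.
grid_q = np.eye(4)
grid_values = np.array([1.0, -0.5, 0.25, 2.0])  # stand-ins for learned kernel weights
coeffs = fit_rbf_coefficients(grid_q, grid_values)
```

A real grid would be much denser; the interpolation system then needs a width that keeps `phi` well conditioned.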

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43898-1_25

SharedIt: https://rdcu.be/dnwAX

Link to the code repository

https://github.com/ThijsKuipers1995/gconv

Link to the dataset(s)

https://medmnist.com


Reviews

Review #1

  • Please describe the contribution of the paper
    • addresses the problem of roto-translation equivariance on volumetric data.
    • group conv in 3D
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Clearly defined problem statement
    • Clear contributions, both in theory and in experiments.
    • Experiments on MedMNIST were found satisfactory.
    • Reasonably well developed theory.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Why is the work limited to medical data? Why not general machine learning?
    • It is not clear what makes the proposed formulation particularly suitable for medical data.
    • It seems to be applicable to natural images as well. Why, or why not?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    yes

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • k-fold cross-validation could be added.
    • Computing an equivariance map would better validate the proposed method with respect to the state of the art.
    • Sec. 4.1 could be improved for readability.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    see the strengths of the manuscript.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper proposes to use separable group convolutions in a 3D group convolutional neural network. The model is tested on the standard MedMNIST data, and the results are encouraging compared with a number of existing models.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper continues the work on building natural invariance into neural networks using the well-founded group theory. It is generally well-written.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The approach ignores 3 fundamental issues:

    1. How problematic is the non-uniqueness of the Euler-angle representation of rotations that is apparently used?
    2. What is the space of separable kernels used and what limitations do they impose on the model?
    3. On which data is the proposed model a good fit? Humans, for example, are generally not rotationally invariant, since the head is up and the feet are down.

    In general, I’m always a bit wary of works that only summarize the effect of models by various losses. I find that these numbers can hide important insights into the quality of the solutions a given model produces.

    Besides, I have a few issues with the notation:

    • Eq. (1): the codomains of k and f are not defined, so the product is unclear.
    • I presume that Eq. (2) is the lifting of f(x) into f(x,R), but this is unclear since f is used as a function on both SE(n) and R^n.
    • p. 4: I don’t understand the notation f^i and f^o. Are you thinking of f as a vector-valued function? Is it then no longer defined on SE(n)?
    • p. 5: You describe one CNN baseline, but your table contains two. K-CNN and T-CNN are not introduced.
    • A minor detail: in Table 1, it is surprising that more than doubling the number of parameters only increases memory usage by about 33%.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    In general, neural network articles contain too few details to be reproduced due to the complexity of the models. I do, however, believe that a knowledgeable master’s student could re-implement the method and reproduce similar results based on the article’s description.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    see above.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See above

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes the use of a continuous separable SE(3) group convolution kernel in 3D CNNs for 3D image analysis. The authors separate the SE(3) kernel into a continuous SO(3) kernel and a spatial convolution kernel. They approximate the continuous group integral by randomly sampling discrete equidistant SO(3) grids. The continuous SO(3) kernels are parameterized via radial basis function (RBF) interpolation. The proposed method is evaluated on various MedMNIST datasets and compared with standard 3D CNNs and G-CNNs (discretely SE(3) invariant).
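One way to read "randomly sampling discrete equidistant SO(3) grids" is sketched below: keep one fixed equidistant grid and rotate all of it by a single uniformly random rotation at each step. This strategy, the tiny base grid, and the function names are assumptions for illustration; the reviews do not say how the authors draw their grids. Uniform rotations come from Shoemake's method, and left-multiplying every grid quaternion by the same unit quaternion preserves all pairwise distances, so equidistance survives.

```python
import numpy as np

def random_unit_quaternion(rng):
    """Uniformly distributed rotation as a unit quaternion (Shoemake's method)."""
    u1, u2, u3 = rng.random(3)
    return np.array([
        np.sqrt(1.0 - u1) * np.sin(2.0 * np.pi * u2),
        np.sqrt(1.0 - u1) * np.cos(2.0 * np.pi * u2),
        np.sqrt(u1) * np.sin(2.0 * np.pi * u3),
        np.sqrt(u1) * np.cos(2.0 * np.pi * u3),
    ])

def quat_multiply(p, q):
    """Hamilton product p * q, quaternions in (w, x, y, z) order; q may have shape (N, 4)."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = np.moveaxis(np.asarray(q), -1, 0)
    return np.stack([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ], axis=-1)

def randomly_rotated_grid(base_grid, rng):
    """Left-multiply every grid element by one random rotation (distances are preserved)."""
    return quat_multiply(random_unit_quaternion(rng), base_grid)

rng = np.random.default_rng(0)
base_grid = np.eye(4)  # hypothetical base grid in (w, x, y, z) order
rotated = randomly_rotated_grid(base_grid, rng)
```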

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A solid and simple framework is proposed for 3D rotation invariance in CNNs. The paper is well written, the method well motivated, described and seems sound. The work is put in the context of the existing literature, with a good related work including 2D and 3D equivariant frameworks.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I have some concerns with the experiments and how the proposed method would work in a real clinical scenario. I fear the experiments (e.g. tiny images, artificially rotated medical images etc.) may not be representative of real-world data. See detailed comments below.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code will be shared upon acceptance. The dataset is public.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. It appears that the standard CNN is not trained with rotation data augmentation. Such a baseline should be included to compare built-in equivariance against learned equivariance. Test-time augmentation should also be considered.

    2. The input images are very small. How does the method scale to the larger images that are more common in medical imaging?

    3. The proposed approach is expected to bring a large increase in computational cost in terms of computation time and memory. Could it become impractical with larger images?

    4. It is only briefly mentioned (“This reduces the advantages of SE(3) equivariance.”), but rotation invariance is often not needed in 3D medical imaging, since acquisition is performed with patients roughly aligned to the scanner axes. Generating rotated versions of the images, as done for the test set, may therefore not be representative of real test data. The approach may be beneficial for 3D images that are genuinely rotated in real settings, e.g., locally around a tumor, or for non-medical data. It would be good to comment on this. Results that show the benefit of the approach on real data would be good.

    5. The difference with [19] (3D steerable CNNs), which has full SE(3) equivariance, should be discussed. The results should also be compared.
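Comment 1 asks for rotation augmentation and test-time augmentation (TTA) baselines. A minimal TTA sketch follows, assuming TTA is restricted to the 24 axis-aligned (cube-symmetry) rotations, which need no interpolation on a cubic volume; `model` is any hypothetical callable returning a score array. Because these 24 rotations form a group, the averaged prediction is exactly invariant to any of them, even for a non-equivariant model.

```python
import numpy as np

def rotations4(volume, axes):
    """The four 90-degree rotations of `volume` in the plane spanned by `axes`."""
    for k in range(4):
        yield np.rot90(volume, k, axes=axes)

def rotations24(volume):
    """All 24 axis-aligned rotations of a cubic 3D array."""
    # Original axis-0 direction kept: spin in the (1, 2) plane.
    yield from rotations4(volume, (1, 2))
    # Axis 0 flipped (180 degrees in the (0, 2) plane), then spin in the (1, 2) plane.
    yield from rotations4(np.rot90(volume, 2, axes=(0, 2)), (1, 2))
    # Axis-0 direction tilted onto axis 2 (both senses), then spin in the (0, 1) plane.
    yield from rotations4(np.rot90(volume, 1, axes=(0, 2)), (0, 1))
    yield from rotations4(np.rot90(volume, -1, axes=(0, 2)), (0, 1))
    # Axis-0 direction tilted onto axis 1 (both senses), then spin in the (0, 2) plane.
    yield from rotations4(np.rot90(volume, 1, axes=(0, 1)), (0, 2))
    yield from rotations4(np.rot90(volume, -1, axes=(0, 1)), (0, 2))

def tta_predict(model, volume):
    """Average the model's outputs over the 24 rotated copies of `volume`."""
    return np.mean([model(v) for v in rotations24(volume)], axis=0)
```

Arbitrary-angle rotations, as used for the rotated test sets in the paper, would additionally require resampling the volume.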

    Minor comments: It is not clear what is meant by “overfitting to this discretization” and why it is amplified for 3D models.

    “input signals f^i and f^o …” Why are these called input signals if f^o is the output feature map?

    “Max spatial pooling is applied after the first residual block. Before the final linear layer, global pooling is applied to produce SE(3) invariant feature descriptors.” Please clarify this part. Is it 2x2x2 max pooling? The SE(3) invariance only applies to the equivariant model. Is it global spatial and rotation pooling?

    Can you clarify which groups the K- and T-CNNs correspond to? This would clarify the sample resolution, which I’m not sure I understand. Do they have 4 and 24 rotations, respectively?

    Typos: “this has been shown hurt model performance” “coefficient corresponding to R_i and w_i”, w_i should be bold? “The first layer maps to 32 channels The residual blocks…”

    Some literature that could also be relevant: Fuchs, Fabian, et al. “SE(3)-Transformers: 3D roto-translation equivariant attention networks.” Advances in Neural Information Processing Systems 33 (2020): 1970-1981.

    Andrearczyk, Vincent, et al. “Local rotation invariance in 3D CNNs.” Medical image analysis 65 (2020): 101756.

    Finzi, Marc, et al. “Generalizing convolutional neural networks for equivariance to lie groups on arbitrary continuous data.” International Conference on Machine Learning. PMLR, 2020.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The quality is good, yet the multiple limitations in the experiments, as mentioned in the detailed comments, led to this overall score.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    I thank the authors for answering the comments. The rebuttal does not change my previous evaluation much. The paper is good overall, and some parts can indeed be improved with minor modifications, as suggested by the authors. However, my main concerns remain. Some comments, as mentioned in the rebuttal, can help clarify some aspects, but in my opinion more experiments and other types of data would be required to fully motivate, exploit, and evaluate the contribution.

    1. No comparison with rotation augmentation. The authors answered that test-time augmentation was used, but I did not see it in the paper. I’m not sure I understood this correctly; apologies if I missed it. Besides, I think the argument for incorporating inductive biases over data augmentation is already sufficiently explicit. Yet data augmentation (even at test time only) is used by most models that are not inherently invariant and would therefore be a normal baseline comparison.
    2-3. The applicability of the method to real clinical data remains a question.
    4. The main results are on rotated medical images. I still do not see what this shows for real clinical data.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper addresses roto-translation equivariance for volumetric data.

    The reviewers are largely positive about this paper, finding it well developed in terms of theory, well written, and with a clear contribution.

    At the same time, several concerns are also mentioned. The concern regarding whether the approach is more generally applicable than medical imaging is not a problem for the acceptance of the paper as long as the method is also relevant for medical imaging. Instead, for the rebuttal, the authors should focus on addressing the following concerns:

    • The three main concerns of reviewer 2
    • Concerns regarding experimental validation of reviewer 3.




Author Feedback

First and foremost, we would like to thank all the reviewers for their time and valuable feedback, allowing us to continue improving our work. We are excited to hear the manuscript was found concise, well-written, and as a result, easy to read and understand. The reviewers’ suggested improvements fit well within the current text, requiring only minor edits to the manuscript to address their suggestions and feedback.

Reviewer 2 (R2) asks about possible problems arising from the non-uniqueness of the Euler-angle representation of SO(3). However, this non-uniqueness is not a concern, since we use the quaternion representation, which is well suited for interpolation and sampling. Although quaternions map 2-to-1 onto rotations (q and -q represent the same rotation), this is easily dealt with. Importantly, from an end-user perspective, no knowledge of the representation is required. Sampling, interpolation, and the generation of uniform grids are all handled automatically. All that is needed is setting the group kernel resolution.
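The 2-to-1 mapping mentioned above can be seen directly in the standard quaternion-to-matrix formula, which is quadratic in the components, so q and -q yield the same rotation matrix. The sketch below assumes (w, x, y, z) ordering; `canonical` shows one common way to pick a single representative and is not necessarily how the authors handle it.

```python
import numpy as np

def quat_to_matrix(q):
    """Rotation matrix of a quaternion in (w, x, y, z) order (normalized first)."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def canonical(q):
    """Pick the representative with w >= 0.

    When w == 0 both q and -q still qualify; a full canonicalization would
    compare further components.
    """
    q = np.asarray(q, dtype=float)
    return q if q[0] >= 0 else -q

# 90 degrees about the z-axis
q = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
R = quat_to_matrix(q)
```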

R2 asks about the scope and possible limitations of the separable kernel. The separable kernel is strictly less expressive than its non-separable counterpart, which should be pointed out in the manuscript. In the SE(3) case, the kernel cannot represent features containing different spatial configurations at different orientations. Despite this, prior work has shown that separable group kernels are desirable due to the significantly increased parameter efficiency and improved performance over the non-separable variant.
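Both points above, the restricted expressiveness and the parameter savings, can be checked numerically. The sketch assumes the factorization k(R, x) = k_SO3(R) · k_R3(x) is applied per input/output channel pair and that the SO(3) part is stored on n_rot grid rotations; the sizes are hypothetical and the authors' exact parameterization may differ (e.g., by also separating channels). Viewed as a (rotations × voxels) matrix, a separable kernel has rank 1: every orientation sees the same spatial pattern, only rescaled.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rot, k = 12, 3  # hypothetical: 12 grid rotations, 3x3x3 spatial support

# Separable kernel for one (input, output) channel pair: k(R, x) = k_SO3(R) * k_R3(x)
k_so3 = rng.standard_normal(n_rot)
k_r3 = rng.standard_normal((k, k, k))
separable = np.einsum("r,xyz->rxyz", k_so3, k_r3)

# Non-separable kernel: an independent spatial filter for every rotation
full = rng.standard_normal((n_rot, k, k, k))

def rotation_space_rank(kernel):
    """Rank of the kernel reshaped to a (rotations x voxels) matrix."""
    return np.linalg.matrix_rank(kernel.reshape(kernel.shape[0], -1))

def param_counts(c_in, c_out, n_rot, k):
    """Weights per layer for (non-separable, separable), assuming per-channel-pair factorization."""
    full_params = c_out * c_in * n_rot * k ** 3
    separable_params = c_out * c_in * (n_rot + k ** 3)
    return full_params, separable_params
```

For 32 input and 32 output channels, `param_counts(32, 32, 12, 3)` gives 331,776 weights for the non-separable kernel versus 39,936 for the separable one.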

R2 and reviewer 3 (R3) share concerns about the usefulness of SE(3) equivariance in medical settings, as data is often orientation-aligned. However, this alignment is generally not present at every feature level, whereas our method benefits from equivariance at every feature level. We agree that this should be made more explicit in the manuscript. We emphasize that in cases with pre-alignment, our approach is still beneficial, as we demonstrate on OrganMNIST3D: model generalization improves, and more geometrically meaningful representations are learned.

R3 suggests evaluating the baselines trained on augmented data to determine to what degree equivariance can be learned. We did not do this because prior research has shown that augmentation is disadvantageous compared with built-in inductive biases, for several reasons: it is less data-efficient, especially in the 3D case; it only constrains the model as a whole instead of individual layers; and G-CNNs learn more geometrically meaningful representations, as they generalize in a way that is consistent with the symmetry. We did augment during evaluation. We will make the argument for incorporating inductive biases over data augmentation more explicit in the final manuscript.

R3 asks whether the method can generalize to higher-resolution samples. For our evaluation, we used low-resolution samples. However, regular group convolutions and their equivariant property are not reliant on the input resolution. Thus, our method does generalize to higher resolutions. A higher resolution could be more beneficial due to reduced interpolation errors and more detailed features. We will address this in the final manuscript.

We share R3’s concern about scalability to higher-resolution samples due to high computation and memory usage. The relatively high computational cost and memory usage is a weakness of G-CNNs, something we should emphasize more in the manuscript. As with CNNs, this can be reduced by chunking inputs and running smaller batches, with increased runtimes as a trade-off. At the same time, the concern of increased computational cost and memory usage also holds for data augmentation.
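A back-of-the-envelope estimate shows why resolution matters: a group feature map carries an extra rotation axis, so each float32 activation tensor has shape (B, C, N_rot, D, H, W) and grows with the cube of the side length. The channel count (32) and rotation count (12) below are hypothetical; 28³ matches the MedMNIST3D volume size, and 128³ stands in for a clinical-sized crop. The estimate counts a single activation tensor only, not the many stored for backpropagation.

```python
def feature_map_bytes(batch, channels, n_rot, side, bytes_per_value=4):
    """Bytes for one float32 activation tensor of shape (B, C, N_rot, D, H, W) with D = H = W = side."""
    return batch * channels * n_rot * side ** 3 * bytes_per_value

small = feature_map_bytes(1, 32, 12, 28)   # MedMNIST3D-sized input: about 33.7 MB
large = feature_map_bytes(1, 32, 12, 128)  # hypothetical clinical-sized input: 3 GiB
plain_cnn_large = feature_map_bytes(1, 32, 1, 128)  # same layer without a rotation axis
```

The rotation axis multiplies memory by N_rot relative to a plain CNN layer of the same width, which is why chunking inputs or smaller batches become necessary at higher resolution.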

We agree with R3 that we should mention the difference with steerable CNNs. Our evaluation solely focuses on the comparison to convolutions, because the simple replacement of conv-layers is our primary goal, which yields significantly improved model performance and generalization.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This was a paper with mixed reactions. On the one hand, the reviewers appreciated the theoretical developments and the well-written paper. On the other hand, the reviewers also had a number of questions, which were addressed in the rebuttal, as well as concerns regarding the experimental validation. More precisely, the reviewers requested a demonstration of the utility of rotation equivariance within medical applications, as well as a demonstration that the rotation equivariant models outperform “encouraging” rotation equivariance via augmentation. The authors instead refer to previous research where augmentations have been at a disadvantage.

    While the authors might very well be right, the MICCAI audience is highly mixed between methodological and empirical researchers, and large parts of our community want to see how models work on their data before they believe it. As such, the paper is, in its current state, not well suited to the MICCAI audience, and I unfortunately cannot recommend acceptance.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The proposed method is interesting and sound, and it has demonstrated promising performance on the MedMNIST dataset. The rebuttal has addressed the raised concerns to some extent, but I found that some of the arguments need evidence to support them, such as the benefit of SE(3) equivariance at the feature level and the claimed more geometrically meaningful representations. I agree with the reviewers that the experiments were only performed on the small MedMNIST dataset, and there is no evidence of how the method performs on real data. Despite this, I think the methodology is interesting and well-founded, and will be of interest to the community. Evaluation on real data will definitely be needed.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper proposes a group convolution method that is claimed to be novel. Unfortunately, I am not familiar with the relevant literature and cannot judge the novelty. However, the presentation of the method is not clear, and there also seem to be errors. For example, Eq. (2) seems to be wrong since there should be an R on the left-hand side, and Eq. (3) makes no use of f^i and f^o, which are mentioned just before the equation. Overall, I am inclined towards rejecting the paper because of its lack of clarity.


