
Authors

Qiang Ma, Liu Li, Vanessa Kyriakopoulou, Joseph V. Hajnal, Emma C. Robinson, Bernhard Kainz, Daniel Rueckert

Abstract

Cortical surface reconstruction plays a fundamental role in modeling the rapid brain development during the perinatal period. In this work, we propose Conditional Temporal Attention Network (CoTAN), a fast end-to-end framework for diffeomorphic neonatal cortical surface reconstruction. CoTAN predicts multi-resolution stationary velocity fields (SVF) from neonatal brain magnetic resonance images (MRI). Instead of integrating multiple SVFs, CoTAN introduces attention mechanisms to learn a conditional time-varying velocity field (CTVF) by computing the weighted sum of all SVFs at each integration step. The importance of each SVF, which is estimated by learned attention maps, is conditioned on the age of the neonates and varies with the time step of integration. The proposed CTVF defines a diffeomorphic surface deformation, which reduces mesh self-intersection errors effectively. It only requires 0.21 seconds to deform an initial template mesh to cortical white matter and pial surfaces for each brain hemisphere. CoTAN is validated on the Developing Human Connectome Project (dHCP) dataset with 877 3D brain MR images acquired from preterm and term born neonates. Compared to state-of-the-art baselines, CoTAN achieves superior performance with only 0.12±0.03mm geometric error and 0.07±0.03% self-intersecting faces. The visualization of our attention maps illustrates that CoTAN indeed learns coarse-to-fine surface deformations automatically without intermediate supervision.
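
The weighted sum described in the abstract can be written compactly as below. The notation is an assumption made for illustration: $v_{r,m}$ denotes the SVF with index $m$ at resolution $r$, $p_{r,m}(t,a)$ its attention weight at integration time $t$ for a neonate of age $a$, and $x_0$ an initial mesh vertex. The normalization constraint follows the author feedback, which describes the weights as a conditional probability distribution over SVFs.

```latex
v(x, t; a) = \sum_{r=1}^{R} \sum_{m=1}^{M} p_{r,m}(t, a)\, v_{r,m}(x),
\qquad
\sum_{r=1}^{R} \sum_{m=1}^{M} p_{r,m}(t, a) = 1,
\quad p_{r,m}(t, a) \ge 0,

\frac{\mathrm{d}x(t)}{\mathrm{d}t} = v\bigl(x(t), t; a\bigr),
\qquad x(0) = x_0 .
```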

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43901-8_30

SharedIt: https://rdcu.be/dnwDs

Link to the code repository

https://github.com/m-qiang/CoTAN

Link to the dataset(s)

https://biomedia.github.io/dHCP-release-notes/


Reviews

Review #1

  • Please describe the contribution of the paper
    • Diffeomorphic cortical surface reconstruction based on attention mechanism between multi-resolution features.
    • Age conditional framework to specialize it on neonatal cortical surface
    • Fast end-to-end framework, which requires only 0.21 seconds to infer the white matter and pial surfaces for each subject.
    • Robust evaluation that validates the proposed method from various viewpoints.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper proposes a fast end-to-end framework, Conditional Temporal Attention Network (CoTAN), which is underpinned by the widely used attention mechanism and diffeomorphic deformation. It shows several strengths:

    • A robust model for neonatal cortical surface reconstruction that adapts to diverse ages: it is desirable to condition the attention mechanism on age information when integrating the multi-resolution features, which makes the model robust to age-related variability in surface structure.

    • Fast inference time: compared to existing approaches [16, 21, 28] that generate multiple stationary velocity fields sequentially with multiple networks, the proposed method generates a single integrated velocity field using an attention mechanism conditioned on the integration time step, which reduces the runtime.

    • The proposed method is empirically validated with strong evidence: The paper effectively demonstrates the numerical effectiveness of several proposed elements, such as age-conditioning, time-varying velocity field, and multi-resolution features based on an attention mechanism.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The paper shows promising results in terms of both time and accuracy. However, some aspects, such as the comparisons and the control of the time-varying velocity field, may need further clarification.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The proposed method seems reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The paper proposes an end-to-end framework for neonatal cortical surface reconstruction. The proposed method utilizes an attention mechanism and diffeomorphism to adapt to diverse ages and reduce inference time. Empirical validation shows the effectiveness of the proposed method, including age-conditioning, time-varying velocity fields, and multi-resolution features based on an attention mechanism. While there are some merits to these methods, certain points need to be clarified. Please see my comments below:

    • To verify whether the proposed TVF effectively reduces self-intersection by ensuring diffeomorphism, an ablation study on the self-intersecting faces metric would be beneficial. This could help evaluate the effectiveness of the proposed method in mitigating self-intersection issues.

    • Could the authors please clarify whether the reported runtime of 0.21 seconds is solely for the model inference or if it also includes preprocessing and postprocessing steps? It is possible that the integration over the time-varying field may take longer than the stationary ones. In comparison to CortexODE [22], which involves computing a single stationary velocity field, is there a noticeable difference in the model inference time between the proposed method and CortexODE? If there is a difference, are there any additional advantages of the proposed method that could account for it?

    • In the supplementary material, could the authors provide more details about the qualitative evaluation in Fig. 3? It is difficult to clearly observe significant improvements in geometric error with the proposed method compared to previous approaches. Could the authors describe which specific regions showed significant improvements, and in what qualitative aspects the proposed method demonstrated improved performance compared to existing methods?

    • Does the proposed method guarantee a spherical topology (i.e., no holes/handles)?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While some points are unclear, I believe the paper has merits that outweigh its weaknesses.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper describes a DL-based approach to diffeomorphic deformable neonatal-cortical surface extraction/reconstruction conditioned by age and time. The approach was trained in a supervised manner using “ground truth” generated using the dHCP structural neonatal pipeline [24] and showed better performance compared to that of recent DL-based cortical surface reconstruction algorithms.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The novelty lies in the modelling of a diffeomorphism as a weighted average of velocity fields, where the weights and the velocity fields are neural network functions. In other words, the proposed method estimates a diffeomorphism as the sum of velocity fields for R multiscale and M sampled “time points”, each weighted by the conditional function of the desired time point and infant age.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The naming and description of the method could be changed to better describe and reflect the underlying approach. The paper describes the mechanism of conditional weighting as attention, which does not necessarily correspond to the attention mechanism known in the DL community [1]. The conditional weighting is more similar to feature modulation, in which a condition vector (in this case a vector of time and age) is used to determine affine coefficient vectors (typically scaling and shifting) to be applied to feature maps (e.g., one scaling factor for each channel or for each set of M channels) [2,3]. The paper additionally describes that the network estimates RxM SVFs. Does M not represent sparse temporal sampling? If not, what does M represent? If it does, should the velocity fields be stationary? The exact type of input is unknown. The beginning of the paper says the input is MRI, but the latter part says surface.

    [1] A. Vaswani et al. Attention Is All You Need. arXiv:1706.03762, 2017.
    [2] X. Wang et al. Recovering Realistic Texture in Image Super-Resolution by Deep Spatial Feature Transform. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 606-615.
    [3] E. Perez et al. FiLM: Visual Reasoning with a General Conditioning Layer. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1), 2018.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Although code will be made available upon publication, the clearly described method will help reproduce the work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The paper describes a DL-based approach to diffeomorphic deformable neonatal-cortical surface extraction/reconstruction conditioned by age and time. The approach was trained in a supervised manner using “ground truth” generated using the dHCP structural neonatal pipeline. The novelty lies in the modelling of a diffeomorphism as a weighted average of velocity fields, where the weights and the velocity fields are neural network functions. However, the description of the method can be improved. Should the conditional weighting be called attention? Are the RxM velocity fields stationary? Also, the exact type of input is unknown. The beginning of the paper says the input is MRI, but the latter part says surface.

    • Fig. 3 shows a single U-Net returning R resolutions within one forward pass. How could the authors make sure that the returned maps capture features at R different scales?
    • Does the network output a point set representing vertices, or a point set and a set of normal vectors? Fig. 3 should be modified to better reflect the output of the network.
    • What are the inputs and outputs of the network at inference time? Is the input the whole-brain MRI or each brain hemisphere in an MRI image?
    • What are the minimum and maximum SVF displacements? Is the output defined on the input grid ranging from -1 to 1? How could one map it back to the original physical space based on the image spatial information defined in DICOM (e.g., image origin, direction, and spacing)?
    • The paper describes that the multiscale features were upsampled and M stationary velocity fields (SVFs) were learned for each resolution. Fig. 3, however, does not show any learnable modules/functions between the R feature maps before upsampling and the RxM SVFs. How does the approach learn RxM SVFs? The objective function should be written in its complete form. Otherwise, a citation should be provided for each loss term.
    • From how many subjects were the 877 T2 MRI scans acquired? Were the training/validation/testing data split by subject (meaning no subject is used in both training and testing)? How many iterations did each epoch have?
    • Please include p-values to justify the significance of the performance improvement. Did Vox2Cortex and CorticalFlow predict pial surfaces from initial smooth meshes or from white matter surfaces? If it is the former, in my opinion, it is not a fair comparison. For a fair comparison, the same inputs must be used in training and testing for all approaches.

    • The paper described that the training time was longer for the U-Net without attention since gradients were backpropagated through all SVFs. Were the gradients also propagated through all SVFs for CoTAN? How often was p(t,a) zero?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The main weakness is the inconsistency of the method description. What the input and output of a network model are is the most basic thing that readers should be able to grasp without understanding anything else, but due to the inconsistency in the method description, I could not determine whether the input to the network is an MRI image or a mesh. Also, due to the lack of description, I assume the outputs are surface points only, but it is unclear how one derives surface normals and forms a mesh.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The feedback is satisfactory, but some concerns raised by the reviewers could be better addressed. Here are a few remaining remarks. How could one determine the values of M and R? If the work is inspired by the squeeze-and-excitation network, please clearly mention that instead of “channel-wise attention”, since channel-wise attention is vague and can be interpreted differently by different researchers. Are the estimated VFs defined on a grid ranging from -1 to 1? If so, was the affine matrix provided in the NIfTI header files sufficient to warp the input mesh and map the output mesh back to the coordinate space of the image defined by the DICOM image spatial information? If the VF is not defined on a grid ranging from -1 to 1, please detail the VF grid used by the model. The question of fair comparison concerns the input used for each model. Please provide answers to the questions raised by the reviewers.
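
The coordinate question raised here can be made concrete with a small sketch. It is only an illustration under stated assumptions, not a description of CoTAN: the affine matrix and image shape are hypothetical, and the [-1, 1] convention used (index 0 maps to -1 and the last index maps to +1) is one common choice among several.

```python
def voxel_to_normalized(ijk, shape):
    """Map voxel indices to [-1, 1] per axis (index 0 -> -1, last index -> +1)."""
    return tuple(2.0 * i / (n - 1) - 1.0 for i, n in zip(ijk, shape))


def normalized_to_voxel(uvw, shape):
    """Inverse of voxel_to_normalized."""
    return tuple((u + 1.0) * (n - 1) / 2.0 for u, n in zip(uvw, shape))


def apply_affine(affine, point):
    """Apply a 4x4 homogeneous affine (nested lists) to a 3D point."""
    x, y, z = point
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in affine[:3])


# Hypothetical voxel-to-world affine: 0.5 mm isotropic spacing, arbitrary origin.
affine = [
    [0.5, 0.0, 0.0, -45.0],
    [0.0, 0.5, 0.0, -60.0],
    [0.0, 0.0, 0.5, -40.0],
    [0.0, 0.0, 0.0, 1.0],
]
shape = (181, 241, 161)  # hypothetical image dimensions

vertex_voxel = (100.0, 120.0, 60.0)
vertex_norm = voxel_to_normalized(vertex_voxel, shape)
# Undo the normalization, then use the header affine to reach world space.
vertex_world = apply_affine(affine, normalized_to_voxel(vertex_norm, shape))
```

With this convention the normalization is invertible, so a header affine is enough to carry a vertex from the normalized grid to NIfTI world coordinates. NIfTI world coordinates are RAS+ whereas DICOM patient coordinates are LPS+, so expressing the result in DICOM space would additionally flip the sign of the first two axes.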



Review #3

  • Please describe the contribution of the paper

    In this work, the authors propose a framework for diffeomorphic neonatal cortical surface reconstruction. CoTAN predicts multi-resolution stationary velocity fields from neonatal brain MRI. Instead of integrating multiple SVFs, CoTAN introduces attention mechanisms to learn a conditional time-varying velocity field by computing the weighted sum of all SVFs at each integration step. The importance of each SVF, which is estimated by learned attention maps, is conditioned on the age of the neonates and varies with the time step of integration.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Novel method with a cool idea of averaging the SVFs at different resolutions using an attention mechanism.
    • At par results with previous methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Did I miss it that the method wasn’t compared against PialNN or Topofit? Didn’t see in the table. Good to compare against them too.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Looks great!

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • Did I miss it that the method wasn’t compared against PialNN or Topofit? Didn’t see in the table. Good to compare against them too.
    • Could the authors discuss a bit more how the different SVFs add value? One way might be to visualize the flow fields for each SVF integrated separately and compare them with the averaged one.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • Novel method with a cool idea of averaging the SVFs at different resolutions using an attention mechanism.
    • At par results with previous methods.
  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper presents a deep-learning-based neonatal cortical surface reconstruction method that predicts multi-resolution stationary velocity fields to deform surfaces.

    The paper addresses a challenging task in neonatal MRI processing; however, the two reviewers have raised many concerns about the method description and experimental validation. Below I summarize my major concerns:

    1. Why did the authors consider the brain to be symmetric and only apply their method to reconstruct surfaces of the left hemisphere? This assumption is wrong, as the newborn brain also shows asymmetry.
    2. What are the inputs to the model? MRI scan, mesh, or both? There is inconsistency in this aspect. If T2w images are registered to the MNI152 space, then how is the Conte-69 surface atlas used for training? At the inference stage, what is the input to predict the pial surface?
    3. The pial surface results in Supplementary Fig. 3 do not show significant improvement of CoTAN over CortexODE and CFPP. Quantitative results should also be provided to get a fair idea of performance.
    4. Statistical evaluation is missing. It is hard to see significant improvement of CoTAN over CortexODE in terms of ASSD for white surface.
    5. Are there any topological defects in the predicted surfaces? The results in terms of Euler score are missing.
    6. The surface overlay plots for the compared methods should be added in Fig. 5 and Supplementary Fig. 1.




Author Feedback

We thank the AC and Reviewers for their constructive comments! We outline below how we will address the concerns.

  • Method [AC-2; R2-2,3] The inputs of CoTAN include both a T2w MRI and an initial mesh. The output is a deformed mesh. In detail, CoTAN predicts a velocity field (VF) from the input MRI. Then the vertices of the input mesh are displaced by integrating the ODE (Eq. 1) defined by the VF. [R2-4] The VF displacement is scaled by the step size h=0.02. Such integration deforms the input mesh to a new mesh as the output. We will update Fig. 3 and Sect. 2 to clearly describe these.

[AC-2; R2-3] For inference, we use the MRI and initial template surface as inputs of CoTAN to predict a WM surface. Then we use another CoTAN model to predict a pial surface from the input MRI and predicted WM surface.

[R2-5] For each scale R, CoTAN predicts M stationary VFs (SVFs) from the features after upsampling. M is the number of different SVFs learned at each scale R, so there are R*M SVFs in total.
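
The vertex integration described under Method can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a forward-Euler scheme, and `toy_velocity` is a hypothetical analytic field standing in for the velocity field that the network predicts from the MRI (in practice the field would be sampled at each vertex by interpolation). The values h=0.02 and K=50 come from the feedback; the coarse setting h=0.2 with K=5 assumes the total integration time is kept at 1.

```python
def integrate(vertices, velocity, h=0.02, K=50):
    """Forward-Euler integration of dx/dt = v(x, t) for K steps of size h."""
    out = [tuple(v) for v in vertices]
    for k in range(K):
        t = k * h
        # Each vertex is displaced by the velocity at its current position,
        # scaled by the step size h.
        out = [
            tuple(xi + h * vi for xi, vi in zip(x, velocity(x, t)))
            for x in out
        ]
    return out


def toy_velocity(x, t):
    """Hypothetical time-varying field: speed along the first axis grows with t."""
    return (t, 0.0, 0.0)


start = [(0.0, 0.0, 0.0)]
fine = integrate(start, toy_velocity, h=0.02, K=50)
coarse = integrate(start, toy_velocity, h=0.2, K=5)
```

For this toy field the continuous flow moves the vertex by 0.5 along the first axis; the 50-step integration lands much closer to that value than the 5-step one. The sketch only illustrates discretization error and does not test for self-intersections.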

  • Attention [R2] As described in Sect. 2, CoTAN is inspired by “channel-wise attention” [18,27], which learns attention maps to weight different channels (or SVFs). This is different from self-attention in [8,31]. Instead of scaling and shifting (Ref [2-3] by R2), we learn a conditional probability distribution over SVFs.

[R2-8; R3-2] The attention map p(t,a) is visualized in Fig. 6, which shows that the SVFs are weighted and added to model coarse-to-fine deformations. [R2-1] This also verifies that CoTAN can capture multiscale features. [R3-2] We will discuss more and add a figure to compare the flow fields of the integrated CTVF and each SVF.
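
To illustrate the difference between scaling and shifting features and weighting SVFs with a probability distribution, the sketch below blends per-SVF velocity vectors at a single point. It assumes the distribution is obtained with a softmax, which the feedback does not state explicitly. The logits are hard-coded hypothetical values; in CoTAN they would come from a network conditioned on the time step t and the age a.

```python
import math


def softmax(logits):
    """Turn arbitrary scores into non-negative weights that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blend_velocities(svf_values, logits):
    """Weighted sum of per-SVF velocity vectors sampled at one point."""
    weights = softmax(logits)
    return tuple(
        sum(w * v[d] for w, v in zip(weights, svf_values)) for d in range(3)
    )


# Three hypothetical SVFs evaluated at the same vertex.
svf_values = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
early_logits = [2.0, 0.0, -2.0]  # hypothetical: first SVF dominates
late_logits = [-2.0, 0.0, 2.0]   # hypothetical: last SVF dominates

v_early = blend_velocities(svf_values, early_logits)
v_late = blend_velocities(svf_values, late_logits)
```

Because the weights are non-negative and sum to 1, the blended velocity is always a convex combination of the SVFs, whereas a scale-and-shift modulation places no such constraint on the result.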

  • Settings [R2-4] For MRI, we use the NIfTI format, of which the header defines an affine matrix to map the surface to the image space. [AC-2] The Conte-69 surface atlas is inflated as the initial input mesh for training. The surfaces are also transformed to the MNI space. [R2-6] The 877 MRI scans were acquired from 785 subjects. The dataset is split by subject. Each epoch has 526 iterations. [R2-5] The citations for loss terms have been provided [4,32,33].
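
Since there are more scans than subjects, some subjects contribute more than one scan, and a split by subject keeps all of a subject's scans in the same partition. A minimal sketch of such a split is given below. The split ratios, the seed, and the toy identifiers are hypothetical and are not taken from the paper.

```python
import random


def split_by_subject(scans, ratios=(0.6, 0.1, 0.3), seed=0):
    """Split (subject_id, scan_id) pairs so that no subject spans two partitions.

    The ratios are hypothetical and apply to subjects, not to scans.
    """
    subjects = sorted({subject for subject, _ in scans})
    random.Random(seed).shuffle(subjects)
    n_train = int(ratios[0] * len(subjects))
    n_val = int(ratios[1] * len(subjects))
    groups = (
        set(subjects[:n_train]),
        set(subjects[n_train:n_train + n_val]),
        set(subjects[n_train + n_val:]),
    )
    return tuple([scan for scan in scans if scan[0] in group] for group in groups)


# Toy data: 10 subjects, every second one scanned twice (15 scans in total).
scans = [(f"sub-{i}", f"ses-{j}") for i in range(10) for j in range(1 + i % 2)]
train_set, val_set, test_set = split_by_subject(scans)
```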

  • Baselines [AC-3; R2-7] For fair comparison, we follow the original implementation for all baselines.

[R3-1] Since TopoFit [17] and PialNN [23] cannot extract both WM and pial surfaces, we instead compared against Vox2Cortex [4], which has a similar architecture.

  • Evaluation [AC-4; R2-7] We will report the t-test results in Table 1: CoTAN improves significantly (p<0.05) compared to all baselines, except the WM surface of CortexODE (p=0.07/0.19 for ASSD/HD).
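
For readers unfamiliar with the test, the sketch below computes a paired t statistic on per-subject errors. It assumes a paired design, which the feedback does not specify, and the error values are synthetic numbers made up for illustration, not results from the paper.

```python
import math
import statistics


def paired_t_statistic(errors_a, errors_b):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Synthetic per-subject errors in mm for a baseline and a proposed method.
baseline_errors = [0.15, 0.14, 0.16, 0.13, 0.15]
proposed_errors = [0.12, 0.12, 0.13, 0.12, 0.11]

t_stat, dof = paired_t_statistic(baseline_errors, proposed_errors)
```

The p-value follows from the t distribution with `dof` degrees of freedom; a statistics package such as SciPy (`scipy.stats.ttest_rel`) returns it directly.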

[R1-1] As an ablation study, we set the integration steps K=5 such that the deformation is no longer diffeomorphic. Compared to the diffeomorphic case (K=50), the SIFs increase to 0.005/2.996% for WM/pial surfaces.

[AC-1] We agree with AC that the neonatal brain shows asymmetry. We have trained CoTAN models for left and right hemispheres respectively. The pre-trained models will be publicly released.

[AC-3,6; R1-3] We will update Fig. 5, Supp. Fig. 1 and 3 to include the qualitative comparison between CoTAN and all methods. The regions with geometric errors will be zoomed in and clearly described, e.g., CortexODE [22] produces non-smooth pial surfaces and CFPP [28] produces creases in the sulci of WM surfaces.

  • Runtime [R1-2] Table 2 reports the runtime for both model inference and postprocessing. CortexODE is slower since it uses topology correction (~1s) and Neural ODE, which requires one pass of model inference for every integration step. CoTAN only needs one forward pass (0.05s) and the integration takes 0.16s.

[R2-8] If attention is not used, the SVFs are integrated one by one and the first SVF will receive the gradients of all SVFs (like an RNN), so the training takes much longer.

  • Topology [AC-5; R1-4] CoTAN can guarantee the spherical topology as the initial surface has genus-0 topology and the diffeomorphism is topology-preserving. The Euler number is 2 and we will report this.
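
The Euler number mentioned above can be checked directly on a triangle mesh as V - E + F, which equals 2 - 2g for a closed orientable surface of genus g. The sketch below is a generic check on a hypothetical toy mesh (a tetrahedron), not code from the CoTAN repository.

```python
def euler_characteristic(faces):
    """Compute V - E + F for a triangle mesh given as vertex-index triples."""
    vertices = {v for face in faces for v in face}
    # Each undirected edge is stored once, regardless of face orientation.
    edges = {
        frozenset((face[i], face[(i + 1) % 3]))
        for face in faces
        for i in range(3)
    }
    return len(vertices) - len(edges) + len(faces)


# A tetrahedron is the smallest closed genus-0 triangle mesh.
tetrahedron = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
chi = euler_characteristic(tetrahedron)
genus = (2 - chi) // 2
```

Because vertex displacement leaves the face list unchanged, this count is the same before and after deformation; it confirms the mesh connectivity but does not by itself detect self-intersections.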




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have addressed all concerns.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes a surface reconstruction of neonatal cortices using an age-based conditional weighting of velocity fields to find a diffeomorphic surface deformation from a template. The reviews appreciate the difficulty of the task but had major concerns on chosen assumptions, general algorithmic clarity, statistical significance and validation of self-intersections.

    The rebuttal did reasonably address most of the main concerns. It is believed that the final manuscript could include the necessary methodological clarification. A thorough validation analysis would be strongly beneficial in an extended publication. As is, the contribution proposes a useful surface reconstruction from a volume, even though concerns remain on validation.

    For all these reasons, and situating the work with respect to the other submissions, recommendation is towards Acceptance.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    There were a lot of comments raised by the authors as well as meta reviewers. Most of them are about details missing and writing. In my opinion a lot of rewriting and re-review is required to address all these concerns.


