
Authors

Junshen Xu, Daniel Moyer, P. Ellen Grant, Polina Golland, Juan Eugenio Iglesias, Elfar Adalsteinsson

Abstract

Volumetric reconstruction of fetal brains from multiple stacks of MR slices, acquired in the presence of almost unpredictable and often severe subject motion, is a challenging task that is highly sensitive to the initialization of slice-to-volume transformations. We propose a novel slice-to-volume registration method using Transformers trained on synthetically transformed data, which model multiple stacks of MR slices as a sequence. With the attention mechanism, our model automatically detects the relevance between slices and predicts the transformation of one slice using information from other slices. We also estimate the underlying 3D volume to assist slice-to-volume registration and update the volume and transformations alternately to improve accuracy. Results on synthetic data show that our method achieves lower registration error and better reconstruction quality compared with existing state-of-the-art methods. Experiments with real-world MRI data are also performed to demonstrate the ability of the proposed model to improve the quality of 3D reconstruction under severe fetal motion.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16446-0_1

SharedIt: https://rdcu.be/cVRSC

Link to the code repository

https://github.com/daviddmc/SVoRT

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a novel slice-to-volume registration method using Transformers for fetal brain MRI reconstruction from multiple stacks. The proposed framework provides not only the slice-to-volume transformation estimates but also an estimate of the 3D volume, which assists the motion estimation process. Results are reported on synthetic data and compared with two other slice-to-volume approaches. Qualitative results on two real acquisitions are also presented.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • To my knowledge, Transformers have never been used before in the context of slice-to-volume registration; as such, the proposed method is novel
    • Comparison with existing techniques
    • The results show that the method outperforms the state of the art (SOTA) on synthetic data
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Quantitative validation is limited to only 12 subjects, although 4 samples are generated from each
    • Overclaims about the impact of the proposed method on final 3D reconstruction in practice (the authors use at most 3 stacks in this paper)
    • Lack of detail on the parameters of the SOTA methods and on how the SVoRT parameters (lambda) are set
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors state in the reproducibility statement that code will be made available, but nothing to this effect is mentioned in the paper. The generated/simulated motion levels on the FeTA dataset should also be made available to ensure reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The authors present a novel and interesting approach using Transformers for slice-to-volume motion estimation in the reconstruction of fetal brain MRI. I found the paper interesting, but many aspects appear preliminary and I have major concerns about the choice of experiments. My general comment is that, for a work on slice-to-volume registration, it would have been valuable to report experiments as a function of the level of motion. Without them, I find it hard to judge the value of the simulated images.

    It is unclear to this reviewer whether this method aims to provide a 3D reconstructed image or only an initialization for a classical inverse-problem reconstruction. The formulation in Equation 1 solves the data-term inverse problem, but it is unclear to this reviewer why no regularization is included. If the final goal is also to provide the SR image, a classical inverse problem could have been used, as in the SOTA methods. If instead the goal is to use this approach to initialize a more classical SVR, it would have been good to show how the different initializations (SVoRT, Planet, SVRnet) influence a classical SVR reconstruction (inverse problem + regularization). As it stands, how SVoRT improves SVR is not demonstrated.

    The experimental setup is unclear to this reviewer. The authors mention the FeTA dataset, then registration to a brain atlas and resampling to 0.8 mm isotropic. What is the rationale behind this step? Was it done by FeTA, or is it something needed specifically for this study?

    The authors mention simulating 3 stacks in random orientations. Do they mean orthogonal, or truly random? In real fetal acquisitions, orthogonal views are often acquired (sometimes imperfectly). Please clarify the definition of random views.

    For the 12 remaining test cases (what GA? Were they normal or pathological?), the paper says 4 different samples were generated for each. What does this mean? 4 different levels of motion? This is a crucial point for understanding the induced motion used to generate the test examples. It would have been interesting to see results versus level of motion, for instance.

    It seems from the experimental setup that an SVR is indeed applied afterwards for the real cases. Which SVR method is applied, and with which regularization technique?

    This reviewer also wonders about the parameter settings of the SOTA methods, if any: how were they chosen? Do these methods also have an outlier rejection scheme?

    Could the authors compare with manual initialization or with classical slice-to-volume motion estimation methods, at least for the two illustrated real cases? Those methods may well be more time consuming, but the comparison would help illustrate the added value.

    I am not sure I understand the purpose of the study with only one stack. There is no super-resolution in that case. In practice, because of the difference between in-plane and through-plane resolution, multiple stacks are acquired for fetal MRI. I would have found the assessment more meaningful if it started from 3 stacks and went up to 6, for instance.

    Is the real data acquired at 3T? A slice thickness/gap of 2 mm is the smallest often seen in fetal acquisitions, and it often goes up to 4 mm (also depending on the field strength and in-plane resolution).

    Did the experiments on real data with SVRonly use a classical slice-to-volume or multi-scale slice registration, as is often done?

    Could the authors offer hypotheses on why SVRonly seems to work even better than with the two SOTA initializations? I found this odd overall. It would have been interesting to see the acquired low-resolution stacks to illustrate the level of motion.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite the clear novelty and scientific value, I found the evaluation in this paper unclear and weak. Overall, this undermines my assessment of the real added value of the proposed strategy.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    5

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes a Slice-to-Volume Registration Transformer (SVoRT) to map multiple stacks of fetal MR slices into a canonical 3D space and to initialize slice-to-volume registration and 3D reconstruction. 1) The authors construct a Transformer-based network that models multiple stacks of slices acquired in one scan as a sequence of images and predicts the rigid transformations of all slices simultaneously by sharing information across slices. 2) The model also estimates the underlying 3D volume to provide context for localizing slices in 3D space. 3) Slice transformations are updated iteratively to progressively improve accuracy.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This work presents a novel solution to a challenging clinical problem. Previous deep learning methods [11,23] predict transformations by processing each slice independently, ignoring the dependencies between slices. This work processes the stacks of slices as a sequence: SVoRT registers each slice using context from the other slices, resulting in lower registration error and better reconstruction quality. Instead of predicting the transformations alone, this work also estimates a volume from the input slices, so that the estimated volume provides 3D context to improve the accuracy of the transformations. In particular, during volume estimation the paper accounts for erroneous slices, which would otherwise produce artifacts in the reconstructed volume: the authors propose an additional SVT that predicts a weight for each slice, representing the image quality of that slice.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The following related work is missing: Singh A, Salehi S S M, Gholipour A. Deep predictive motion tracking in magnetic resonance imaging: application to fetal imaging. IEEE Transactions on Medical Imaging, 2020, 39(11): 3523-3534. That paper also predicts motion parameters from sequences of slices, using an RNN, and it also utilizes spatio-temporal information.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reference implementation of SVoRT will be available on GitHub.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    This work presents a novel solution to a challenging clinical problem. SVoRT is an architecture that predicts transformations and a volume simultaneously. The volume estimation, as an auxiliary task, provides 3D context to improve the accuracy of the predicted transformations. The work uses a Transformer to encode the spatial correlation of the input sequence. In particular, during volume estimation the paper accounts for erroneous slices, which would otherwise produce artifacts in the reconstructed volume: the authors propose an additional SVT that predicts a weight for each slice, representing the image quality of that slice. I am curious about the necessity of the Transformer module in this framework. What are the benefits of a Transformer compared with an RNN (e.g., LSTM)? Can the estimated volume output by the network be used as a preliminary reconstruction result? What is the quality of the estimated volume? If it is good, could the estimated volume serve as a reference volume for reconstruction?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well written and proposes a solution to a challenging clinical problem. The paper proposes a new formulation to model slice-to-volume registration. The authors demonstrate significant improvements over the state of the art, particularly in the accuracy of the transformations. Furthermore, they conduct an ablation study and evaluate the significance of their volume estimation and positional embedding, justifying their model design. Their qualitative figures demonstrate the improvements brought by their network, and this work represents an important contribution to fetal brain image reconstruction.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose a novel method for fetal brain slice-to-volume registration in which they use a synthetic dataset to train a neural network based on the recent Transformer architecture, along with an inverse problem formulation to reconstruct the final volume. They also applied their method to a real clinical dataset, and the results were visually more accurate. Comparisons were performed with respect to two baselines.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper elegantly merges the strengths of CNNs (a ResNet extracting meaningful features from the original stacks), the strength of Transformers in learning where to attend, and a weighted inverse problem formulation to construct the final volume using the learned transformations and weights.
    • They apply the method to two datasets
    • They performed ablation experiments to support some components of their model
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • No baseline comparison to non-DL fetal SVR methods
    • Some choices are not backed up by explanations or experiments (please refer to point 8)
    • It would have been great to see more real-world examples, for instance in the supplementary material
    • Minor comments (point 8)
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducible for the publicly available FeTA dataset. The clinical dataset is not available, as the authors answered "not applicable" to « A link to a downloadable version of the dataset (if public) »; the code, however, is or will be put on GitHub.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    It is unclear why both y_hat and y (concatenated), rather than y_hat alone, are used as the input of the ResNet. Please provide an explanation (or an experiment).

    • Please mention the dataset type (synthetic vs. real) in Figure 3
    • Minor comments: sec 2.3: to b*ridge; sec 2.3: « Previous works » have demonstrated: please cite; sec 3.1: « Learning rate of 2 × 10^-4 and linear decay for 2 × 10^5 iterations » is unclear: please give the decay rate and the number of iterations over which it applies, and mention the total number of iterations (or epochs)
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The innovative merger of three paradigms (CNNs, Transformers and inverse problems) to solve SVR + the realistic look of the real dataset made my decision clear (although I would have preferred more than two examples on the real dataset, either as textual results or as images)

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors present techniques based on Transformers to correct motion in fetal MRI. The paper is well written and design decisions are mostly well justified. Two junior reviewers rate the paper as a strong accept, appreciate the presented ideas as interesting, and declare the paper “a novel solution to a challenging clinical problem”. One senior reviewer is more critical but also describes it as “a novel and interesting approach using transformers”. The authors must address several issues before final acceptance:

    • provide source code as declared in the reproducibility statement
    • clarify choices regarding data term and regularization
    • clarify experimental choices regarding preprocessing
    • clarify image acquisition protocol
    • add missing references
    • clarify results regarding SVROnly and SOTA
  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1




Author Feedback

First of all, we appreciate the reviewers’ constructive feedback and their acknowledgment of this work’s novelty and high quality. Here, we respond to the major concerns raised by the reviewers.

  1. How do these deep-learning-based methods influence classical SVR? The proposed method aims to provide an initial estimate of the slice transformations for the classical SVR algorithm. The experiments demonstrate that the proposed method can improve the SVR method: Fig. 4 and 5 show the reconstruction results of the SVR algorithm proposed in [14] given the initializations from the different methods.

  2. Volume estimation. In the proposed method, the volume estimation step is used only to help the transformation estimation; for a fair comparison, it is not used to initialize the volume in SVR. To reduce the computational cost, we use a small number of iterations in the CG solver and use the PSF reconstruction as initialization, which implicitly makes the reconstruction smooth. Therefore, we do not employ explicit regularization in the volume estimation step.

  3. Preprocessing of the FeTA dataset. Both the proposed method and the baselines aim to predict the position of a slice in the canonical 3D space, i.e., the atlas space. Therefore, we need to register the brain volume to the atlas before extracting slices from the volume, so that we know the ground-truth position in the atlas space.

  4. Fetal motion for the simulated data. The details of the fetal motion trajectories used in the experiments, and their statistics, are described in Sec. 2.3 of the supplementary material.

  5. The acquisition protocol of the real MR data. The fetal MR data were acquired on a 3 Tesla Siemens Skyra system (Siemens Healthcare, Erlangen, Germany) at Boston Children’s Hospital using the 30-channel body flex array in combination with the spine array (a total of ~42–48 receive elements used). The HASTE readouts had the following imaging parameters: TE = 119 ms, TR = 1.6 s, slice thickness = 2 mm, in-plane resolution = 1 mm × 1 mm, matrix size = 256 × 256, echo spacing = 5.81 ms, partial Fourier = 5/8, and in-plane GRAPPA acceleration of R_GRAPPA = 2.

  6. Results on real MR data. The real MR data were acquired with a scanner different from that of the FeTA dataset. They therefore differ from the simulated data in contrast and artifacts, and there is a domain shift between the two datasets. The results show that the baselines may perform worse than SVRonly, suggesting that the baselines generalize less well than SVoRT. In future work, we plan to evaluate the proposed method on real fetal MR data with a wider range of acquisition parameters.

  7. Hyperparameters. For the hyperparameters of the baselines and the SVR algorithm, we use the default values from the original papers.

  8. Related works. More related works will be added to the introduction.
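The early-stopping idea in item 2 can be illustrated with a minimal sketch. It is not the SVoRT implementation (see the linked code repository for that): it only shows conjugate gradient (CG) run for a fixed, small number of iterations from a given starting point. The matrix `A`, the vector `b`, and the names `truncated_cg` and `residual_norm` are hypothetical stand-ins for the reconstruction operator, the slice data, and the solver, and the zero starting point here takes the place of the PSF-reconstruction initialization.

```python
def matvec(A, x):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residual_norm(A, b, x):
    """Euclidean norm of b - A x."""
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    return dot(r, r) ** 0.5


def truncated_cg(A, b, x0, n_iter):
    """Run at most n_iter CG steps on A x = b, starting from x0.

    A is assumed symmetric positive definite (toy assumption).
    """
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]  # initial residual
    p = list(r)                                       # initial search direction
    rs = dot(r, r)
    for _ in range(n_iter):
        if rs < 1e-20:  # already converged; avoid dividing by ~0
            break
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


if __name__ == "__main__":
    # Toy symmetric positive definite system (illustrative values only).
    A = [[4.0, 1.0, 0.0, 0.0],
         [1.0, 4.0, 1.0, 0.0],
         [0.0, 1.0, 4.0, 1.0],
         [0.0, 0.0, 1.0, 4.0]]
    b = [1.0, 2.0, 3.0, 4.0]
    x0 = [0.0, 0.0, 0.0, 0.0]
    for k in (0, 1, 2, 4):
        print(k, residual_norm(A, b, truncated_cg(A, b, x0, k)))
```

With `n_iter=0` the starting point is returned unchanged; a larger `n_iter` fits the data term more closely, so the iteration count plays the role that an explicit regularization weight would otherwise play.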

Again, we thank all reviewers and our meta-reviewer for their efforts and time.


