Authors
Xinzi He, Alan Q. Wang, Mert R. Sabuncu
Abstract
Head MRI pre-processing involves converting raw images to an intensity-normalized, skull-stripped brain in a standard coordinate space. In this paper, we propose an end-to-end weakly supervised learning approach, called Neural Pre-processing (NPP), for solving all three sub-tasks simultaneously via a neural network, trained on a large dataset without individual sub-task supervision. Because the overall objective is highly under-constrained, we explicitly disentangle geometry-preserving intensity mapping (skull-stripping and intensity normalization) and spatial transformation (spatial normalization). Quantitative results show that our model outperforms state-of-the-art methods which tackle only a single sub-task. Our ablation experiments demonstrate the importance of the architecture design we chose for NPP. Furthermore, NPP affords the user the flexibility to control each of these tasks at inference time.
The code and model are freely available at https://github.com/Novestars/Neural-Pre-processing.
Link to paper
DOI: https://doi.org/10.1007/978-3-031-43993-3_25
SharedIt: https://rdcu.be/dnwNq
Link to the code repository
https://github.com/Novestars/Neural_Pre_Processing
Link to the dataset(s)
N/A
Reviews
Review #1
- Please describe the contribution of the paper
The novelty of this paper is the proposed neural network approach for pre-processing raw brain MRI images, called NPP, which covers skull stripping, bias field correction, and spatial transformation, allowing all three sub-tasks to be solved simultaneously via a single neural network.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Novel approach: The authors propose an end-to-end weakly supervised neural network approach for brain MRI pre-processing, called NPP, which simultaneously solves skull-stripping, intensity normalization, and spatial transformation.
- Flexibility: NPP affords the user the flexibility to control each of these tasks at inference time, making it a versatile tool for brain MRI pre-processing.
- Large dataset: The neural network is trained on a large dataset, which enhances the generalizability of the model and makes it more robust to variations in data.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. To obtain the affine transform in Figure 1, is it necessary to require an input atlas image? If so, please add the complete input image to the figure.
2. The affine part is only compared with the traditional FreeSurfer method; relevant deep learning methods should be added to demonstrate state-of-the-art results in affine registration.
[1] Mok T C W, Chung A. Affine medical image registration with coarse-to-fine vision transformer[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 20835-20844.
[2] Chen X, Meng Y, Zhao Y, et al. Learning unsupervised parameter-specific affine transformation for medical images registration[C]//Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part IV 24. Springer International Publishing, 2021: 24-34.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The authors used public datasets, provided their code to the reviewers, and gave their training parameters. Therefore, their method is fully reproducible.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
There are also some minor errors in the article, such as the inconsistent formatting of RecSSIM in Tables 3 and 4 on page 7.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
4
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper is practical for handling the complex pre-processing of raw medical data with an end-to-end pipeline. However, the comparison experiments are not comprehensive enough and need to be supplemented with more deep learning methods for skull-stripping, intensity normalization, and spatial normalization/registration.
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
5
- [Post rebuttal] Please justify your decision
The authors' reply resolved my confusion, and they added relevant experiments to verify the results.
Review #2
- Please describe the contribution of the paper
The paper proposes a neural pre-processing pipeline that simultaneously performs skull-stripping, bias field correction, and affine registration. Experiments on multiple large-scale datasets show that the proposed pipeline gives better results on all three tasks than methods that tackle only one of them.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- A novel and necessary pipeline which tackles three different but general pre-processing steps through a single network.
- Experiments on a wide variety of datasets show the generalizability of the method.
- Ablation study about different parts of the proposed pipeline shows the necessity of each component.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- It is not clear what the range of the output scalar field (χ) would be. Is it in the range 0-1? If so, how was it constrained?
- Following that, I am assuming that the skull-stripping binary mask was generated based on this scalar field. It is not clear how the threshold was chosen and what exact procedure was followed.
- The authors are missing a comparison with N4 bias field correction [1] for intensity normalization.
- On page 5, paragraph 1, the authors mention that they add an identity matrix to the output affine matrix. It is not clear why this is necessary. An explanation would be useful.
- The authors mention that one of the advantages of the proposed method is that it can compute the multiplier field at low resolution and then upsample it with trilinear interpolation (a sketch of this step follows the reference below). How does this affect the skull-stripping output? Also, in the future, some type of ablation study about its effect on all outputs might be helpful.
- For evaluation of spatial normalization, the authors use the PPMI dataset. The authors mention that the ROIs for this dataset were generated using FreeSurfer. However, in Fig. 4(b) they compare their method against FreeSurfer. It is not clear how FreeSurfer achieves less than 1.0 Dice if the evaluation ROIs were generated using FreeSurfer. An explanation would be useful.
[1] Tustison, N.J., Avants, B.B., Cook, P.A., Zheng, Y., Egan, A., Yushkevich, P.A. and Gee, J.C., 2010. N4ITK: improved N3 bias correction. IEEE transactions on medical imaging, 29(6), pp.1310-1320.
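For reference, the low-resolution resampling step discussed above might look like the following minimal PyTorch sketch (shapes, names, and the half-resolution factor are assumptions for illustration, not taken from the released code):

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: the network predicts the smooth multiplier field at
# half the input resolution, then upsamples it to the input grid before the
# Hadamard product with the image.
img = torch.randn(1, 1, 192, 224, 192)        # input volume (B, C, D, H, W)
field_lowres = torch.rand(1, 1, 96, 112, 96)  # predicted scalar multiplier field

field = F.interpolate(field_lowres, size=img.shape[2:],
                      mode="trilinear", align_corners=False)
out = field * img  # skull-stripped, intensity-normalized output
```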
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The paper should be reproducible, as the authors mention that they will make the code publicly available. The paper uses only publicly available datasets.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
- There is an inconsistency between Eq. 1 and Fig. 1 (top). The authors may want to use consistent notation in both.
- In Table 3, the authors show the effect of different lambda values. While I agree that making this hyperparameter tunable at test time is indeed a good idea, its necessity cannot be interpreted from Table 3 alone. In the future, the authors may want to show that the same lambda value does not give similar performance across datasets. Also, the authors may want to add the effect of lambda on skull-stripping and spatial normalization, in addition to bias field correction.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
6
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper tackles an important open problem of performing multiple neuroimaging pre-processing steps through a single deep network. Overall, the paper is well-written. There are some aspects which need clarity, but they can be easily addressed during the rebuttal. Some more ablation studies might be necessary, but considering the page limit, they can be left for the journal version of the paper.
- Reviewer confidence
Confident but not absolutely certain
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
6
- [Post rebuttal] Please justify your decision
After reading the rebuttal, my further comments regarding the manuscript are as follows:
- I am happy with the responses provided by the authors regarding the FreeSurfer Dice < 1.0, the need to add an identity matrix for affine registration, and the generation process of the skull-stripping mask.
- The authors mentioned that N4 bias field correction is part of FreeSurfer. However, the information I found on the internet (on the FreeSurfer website - https://surfer.nmr.mgh.harvard.edu/fswiki/nu_correct) says that it uses nu_correct for image intensity normalization, and according to the nu_correct website (https://www.nitrc.org/projects/nu_correct/), nu_correct uses N3 bias correction rather than N4. I might be missing some information, but it would have been nice if the authors had provided a clear reference showing that FreeSurfer uses N4.
- Although I missed it in the original review, the authors did not compare against recent deep learning based bias field correction methods [1,2]. As there was no other deep learning based comparison in the paper for bias field correction, this comparison feels necessary.
[1] Goldfryd, T., Gordon, S. and Raviv, T.R., 2021, April. Deep semi-supervised bias field correction of MR images. In 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI) (pp. 1836-1840).
[2] Xu, Y., Wang, Y., Hu, S. and Du, Y., 2022. Deep convolutional neural networks for bias field correction of brain magnetic resonance images. The Journal of Supercomputing, 78(16), pp.17943-17968.
I will keep my original rating.
Review #3
- Please describe the contribution of the paper
The paper proposes an interesting approach to brain MRI pre-processing using deep neural networks. Skull-stripping, intensity normalization, and spatial normalization take time and also affect downstream processing. The authors show that their method significantly reduces the pre-processing time while maintaining high accuracy.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Pre-processing is a fundamental step for any raw brain MRI before applying any deep learning based analysis (segmentation, registration, etc.). This paper proposes a novel deep learning based approach to pre-process MR images. This learning-based pre-processing is achieved in two steps. In the first step, the model learns to find the foreground region for skull stripping and intensity normalization. In the next step, it learns the affine alignment with respect to the ground-truth pre-processed image. Therefore, the trained model can pre-process a brain MR image in a very short time compared to other baselines.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
a) Training of this model requires a large set of pre-processed ground-truth images. Producing them is itself a cumbersome task. Therefore, this is not easily reproducible research.
b) The authors used the FreeSurfer tool to produce ground-truth images. Using a specific tool for ground-truth generation can make the model biased towards that tool. A better strategy could have been devised.
c) The scalar multiplier field is not described well. It is clear that the Hadamard product removes the skull, but how the multiplier field normalizes the intensities is not very clear (a toy illustration follows below).
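To make point (c) concrete, here is a toy 1-D sketch (illustrative only, not the authors' code) of how a single multiplier field can skull-strip and intensity-normalize at once: inside the brain the field approximates the reciprocal of the bias field, flattening intensities; outside it the field goes to zero, removing the skull. The author feedback below notes that thresholding the field (at 0.2) then recovers a binary brain mask.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 200)
brain = (x > 0.2) & (x < 0.8)              # toy 1-D "brain" support
bias = 1.0 + 0.5 * np.sin(3.0 * x)         # smooth multiplicative bias field
img = np.where(brain, 100.0 * bias, 50.0)  # biased brain on a "skull" background

# One field does both jobs: ~1/bias inside the brain (intensity
# normalization), ~0 outside it (skull-stripping).
field = np.where(brain, 1.0 / bias, 0.0)
out = field * img     # ~100 everywhere inside the brain, 0 elsewhere
mask = field > 0.2    # binary brain mask via thresholding
```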
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The reproducibility of this work is hard to achieve, especially the pre-processed dataset.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
a) The abstract should be written more clearly, with specifics of the performance improvements.
b) The authors should state a short review of related works more precisely, if there are any.
c) In Section 2.1, the authors do not describe the scalar multiplier field accurately. It is difficult to understand how this field performs skull-stripping and intensity normalization.
d) The authors should also consider an experiment to show how their proposed pre-processing affects downstream tasks. A segmentation or deformable registration experiment could have been done to demonstrate the effectiveness of NPP compared to other baselines.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
5
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper proposes an interesting approach to pre-process MR images. Despite requiring a large dataset for ground-truth generation during training, once trained, the model can significantly reduce the pre-processing time with high accuracy. As mentioned above, how the scalar multiplier field corrects the intensity should be stated clearly. Finally, the technical innovation in this paper is not sufficient, and the training data generation depends on another tool, which is the major weakness of this work.
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Review #4
- Please describe the contribution of the paper
The authors propose a novel methodology for structural brain MRI pre-processing based on a novel deep learning architecture. Instead of splitting the pre-processing pipeline into multiple algorithms (like classical techniques do), the authors propose an end-to-end pipeline with different blocks of the network addressing different steps. Furthermore, instead of directly predicting corrected images, the network is used to estimate a joint multiplicative bias field and brain mask that is then applied to the input image. The results of each pre-processing step are evaluated on different datasets and compared to baseline approaches, showing an improvement with the new method.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1) The paper is clearly written, with a strong motivation and rationale. While there are some concerns that I will list in the weaknesses section, the introduction was a joy to read compared to some other papers on my stack.
2) The idea is quite simple but effective and elegant. In fact, while I believe structural image pre-processing might not benefit the most from a unified approach, the proposal could even have its uses for diffusion and functional MRI.
3) A measure of the time to run each algorithm is provided. While comparing CPU and GPU methods might be unfair, an estimate of the network inference time on CPU is provided in the text, too. In both cases, the network proposal is faster at accomplishing all tasks than any of the independent algorithms for a single task.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1) While the paper is in general well written and easy to understand, the structure is a bit incoherent. Key methodological details are given in Results instead of Methods. I would have expected the architecture details to be given earlier than where they appear, for example, and the baselines could be presented in Methods as well. Finally, there is a mention of “supplementary materials”, but I could only access the manuscript. I do not know if that is a website mistake or if the authors failed to provide them.
2) There are some questionable choices in terms of experimental design. Why are three datasets used? I assume the answer lies in the fact that there is no single dataset that is representative of all pre-processing steps. Then again, technically only bias correction and skull stripping are evaluated with a direct measure (overlap with the “true brain mask”), and even then, the latter measure is biased towards the methodology used in the paper:
- According to the authors, the brain mask for the NFSR dataset is based on FreeSurfer. FreeSurfer masks are also used to train the network.
- Bias correction focuses on the reconstructed images and the estimated bias field. This is the only proper direct metric, compared directly to the HCP estimates. However, the results of FreeSurfer and FSL, while slightly lower, seem comparable to those of the proposed method. HCP is also a high-quality dataset, which might downplay the improvement for clinical scans.
- Spatial normalisation is evaluated through a proxy that once again benefits from the methodology used in the paper. Segmentations are compared to atlas labels, which I suspect align with FreeSurfer segmentations. Since the network is trained with FreeSurfer-corrected images, I expect the segmentations to align fairly well with the atlases as long as the network learns to mimic FreeSurfer. Furthermore, where are the segmentations coming from? This is a crucial aspect of the evaluation because it can introduce its own set of confounding issues (the segmentation seems not to be directly related to the registration process).
3) I am extremely confused by the use of lambda in the manuscript. It seems to be both a hyperparameter of the learning process (a weight between the two losses of the network) and a sort of learned parameter of the network that affects the normalisation of the decoder. Is that right? If that is the case, I fail to see the rationale behind this. Why is the weight of the losses fed to the network? Why is an optimisation “hyperparameter” used to learn behaviour in the network? That seems counterintuitive and counterproductive. If the goal of lambda is to regulate what the network learns, why is that information given to the network to adapt to it? Not to mention, how is that information meant to be used at inference time? The implications of such an idea are neither explored nor justified in the paper.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The datasets used for training and evaluation are all public. While it seems like some additional steps might be needed to obtain some of the masks and intermediate results used to replicate the exact same results, these are detailed in the paper, the authors used public tools to analyse the data, and the code (according to the authors) is publicly available.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
1) Starting with my first concern, I think the paper needs to be more tightly written and better structured. Anything that relates to the methodology should stay in the Methods section. It would help frame the experimental setup and the steps the authors followed. There are multiple datasets involved in the process, used for different things, and it all gets muddled because this information is scattered across multiple sections. Another example of this is how the network architecture details come up on page 4. Why not have that at the beginning of Section 2, to better link it to Fig. 1?
2a) There is a cognitive dissonance between arguing that classical techniques are limited because they divide the pre-processing pipeline into sub-tasks, and then basically using the outputs of the FreeSurfer pipeline as “training labels”. My advice is to downplay the effect of splitting the pipeline into sub-tasks or slightly reframe the motivation. I would focus on the computational complexity of having multiple steps and the possibility of cascading errors for every “pipeline section”. Furthermore, I would allude to these cascading errors causing “noisy corrections” (linking it to training networks with noisy labels) and the capability of deep neural networks to “fix” some of these errors, as exemplified by other proposals that use classical methods for their training. One example of this would be the FastSurfer paper.
2b) I would also advise the authors to clarify the motivations behind their choices in the experimental setup. Furthermore, I think it is important to acknowledge the limitations of some of the methodology. To be fair, this is a hard-to-evaluate topic, as there is no easy way to estimate the true bias field of a single image (even from a hardware perspective) and there is no such thing as a “perfect spatial normalisation”. Surrogate measures are needed, and the results need to be contextualised with that in mind.
3) In my opinion, one of the weakest parts of the paper is the use of lambda as both a hyperparameter and part of the information the network can learn from. I might be missing something, but I cannot really see a good justification for it. In that sense, I would advise the authors to either use citations that can back up that choice or provide a clear rationale for it. In other words:
- Why should a network learn about the optimisation process?
- Why should the decoder be the part of the network that learns that information?
- Why is adaptive normalisation the right way to use lambda?
- Why is lambda allowed to be negative according to the “Training details” section?
4) This is a minor concern, so I did not list it in the weaknesses section, but I am unconvinced by the downsampling of the scalar field that represents skull stripping and bias correction.
- Chi being smooth does not mean that tri-linear interpolation is the right choice or that “linear resampling” would not lose information. Smoothness here only implies continuity, not a specific function. What if different regions of the image have different behaviours? What if the relationship is non-linear? Furthermore, what is the interpolation method used to resample back?
- Why reduce it to half the size? Are there other possible choices? Is this just a time-reducing technique with not much of a trade-off?
5) The “ablation” section seems to be an afterthought. It is separated from the other experiments (why?), it has a small part that would probably be better suited to Methods, and it has a very small results part which does not go deep. I would suggest either integrating it better with the rest of the results or removing it. As it stands, it barely helps the paper (it does not cover all the steps) or provide any new insights. Finally, what does UMIRGPIT stand for? It is hard to evaluate what each part of the ablation could contribute.
6) This is more of a technical doubt / pet peeve of mine. Why is the spatial transformer network based on self-attention? There is no true need for it, as a multi-layer perceptron would accomplish the same thing. Not to mention, I always find it a bit odd to argue for transformers to capture global information on a “spatially compressed” version of the image. A lot of the global information might be lost in the bottleneck, and once again, whatever is left can easily be processed by either a convolutional layer with a large receptive field (the whole bottleneck) or a linear layer.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
5
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
While I admit that I am strict with some of the shortcomings of the paper, and I truly believe that some of the conclusions and results need to be taken with a grain of salt (unless the authors better justify some of their decisions), the idea of a unified network to pre-process structural brain MRI end-to-end is a major strength that slightly outweighs the issues. I would probably have been a lot more lenient if the paper had focused on less established and trickier pre-processing pipelines such as diffusion MRI, though.
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
4
- [Post rebuttal] Please justify your decision
I found the rebuttal explanations dismissive, which sadly made me reconsider my score in a negative manner.
R3+R4 “The use of FS for ground truth generation was based on its prevalence and robustness in neuroimaging. We acknowledge the potential bias and will discuss alternative strategies in revision. Yet, our intensity norm results which use an independent ground-truth indicate that NPP can outperform FS.” While I do understand and I like the fact that the bias will be addressed in the manuscript, that still downplays the motivation of the paper. If having a disjoint pipeline is an issue, having a network learn from that is limiting.
R4 “We used three datasets to demonstrate the generalizability of our method across varying data characteristics and with various ground truths.” My comments were not only about the lack of coherence, but also the fact that some metrics are at best proxies.
R4 “Lambda is not learned, nor is it a pre-fixed hyper-parameter. It is an input the model is conditioned on during training and test. For details, please refer to Huang et al. ‘Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization’.” That is probably the most dismissive answer. If lambda is given to the model and conditions it, how is that not a hyper-parameter? If lambda changes, doesn’t the prediction change, too? Not to mention, lambda is not presented in the given reference; the reference is for the adaptive instance normalisation layer. In no way does that address my concerns about how lambda is used both as a weight for the TV loss and in the normalisation layers, how lambda is chosen, or why lambda was sampled from a log-uniform distribution.
Primary Meta-Review
- Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
This paper proposes Neural Pre-processing (NPP), an end-to-end weakly supervised learning approach for skull-stripping, intensity normalization, and spatial transformation of head MRI images.
The strengths of the paper include the novel approach of tackling multiple sub-tasks simultaneously, the flexibility of NPP allowing user control at inference time, and the use of a large dataset for training. Additionally, the paper is well-written and provides clear results and analysis.
Considering the reviews, there are several key points the authors should address in their rebuttal:
Clarify the significance and advantages of NPP compared to existing methods, particularly addressing the concerns raised by Reviewer 1 about the lack of comparison with relevant deep learning approaches.
Provide more detailed explanations and justifications for the scalar multiplier field, the use of lambda, and the impact of downsampling on the results.
Address the concerns related to experimental design, such as the use of FreeSurfer for ground-truth generation, the choice of datasets, and the evaluation measures used.
Improve the structure and organization of the paper, ensuring that methodological details are appropriately placed in the Methods section.
By addressing these points in the rebuttal, the authors can provide a more comprehensive and convincing argument for the acceptance of their paper.
Author Feedback
R1 “Fig 1: is an atlas required?” No, our approach doesn’t need an atlas. The registration is a spatial normalization to a standard coordinate frame (MNI) used by FreeSurfer.
R1 “Baseline results for deep learning based state-of-the-art affine registration” We have conducted experiments with [2] Chen et al. (C2FVIT) and will include these in the updated paper. Our method NPP achieves significantly better Dice scores. The average Dice score for C2FVIT across ROIs and test cases is 0.288, whereas for NPP it is 0.625. Full results will be included in the updated Fig 5. We emphasize that C2FVIT is a pairwise registration model and has not been optimized for the spatial normalization task we consider. Moreover, as shown by Chen et al. in [2], classic optimization-based methods can still produce SOTA affine brain registration results, albeit at a computational cost. Finally, as we will remark in the updated paper, network architectures of SOTA registration models can be easily incorporated into NPP in the future.
R2 “The range of the output scalar field and the threshold to convert to a binary mask” The scalar field, which solves both skull stripping and intensity normalization, is not range-restricted by design. The appropriate values will be learned from the data. In practice, we found that thresholding it at 0.2 yields a good brain mask. We will include this detail in the updated paper.
R2 “Missing N4 bias field correction comparison” As we will clarify in the revised paper, FreeSurfer, which we compare against in the intensity normalization results, implements this method.
R2 “Why add an identity matrix?” We do this to make sure the initial transformation is close to identity. It’s widely used in the affine registration literature to improve convergence and efficiency.
R2 “The effect of computing multiplier fields at low resolution” Our hardware constraints did not allow us to conduct a full-scale experiment with the full-resolution version of our model. However, as we point out in the paper, our design choice of a fixed low-res convolutional architecture that only computes a spatially smooth multiplier field has a major computational advantage and allows NPP to handle arbitrary image resolutions for the input image, while producing competitive results. We will conduct an ablation study in the journal extension of this work.
R2+R3 “Why are FS dice scores <1?” These results quantify registration between ROIs in an individual MRI and the atlas ROI. We will clarify this.
R3 “How to make the research reproducible” We will make all code and model weights freely available in our GitHub repository. We will also allow anyone to generate FS-derived ground-truth images from the public datasets by distributing ground-truth affine transformation parameters and multiplier fields.
R3+R4 “Use of FreeSurfer to produce ground truth may have introduced bias” The use of FS for ground-truth generation was based on its prevalence and robustness in neuroimaging. We acknowledge the potential bias and will discuss alternative strategies in revision. Yet, our intensity norm results, which use an independent ground truth, indicate that NPP can outperform FS.
R4 “The paper’s structure is not coherent, with key methodological details provided in the results section” Thanks for the feedback. We will revise to improve flow.
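As a concrete illustration of the identity-matrix answer above, here is a minimal PyTorch-style sketch of an affine head that predicts a residual added to the identity (hypothetical names; a sketch of the general technique, not the released implementation):

```python
import torch
import torch.nn as nn

class AffineHead(nn.Module):
    """Predicts a residual 3x4 affine matrix; adding the identity keeps the
    initial transformation close to identity, aiding convergence."""
    def __init__(self, in_features: int):
        super().__init__()
        self.fc = nn.Linear(in_features, 12)  # 3x4 affine parameters
        nn.init.zeros_(self.fc.weight)        # start exactly at the identity
        nn.init.zeros_(self.fc.bias)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        theta = self.fc(z).view(-1, 3, 4)
        identity = torch.eye(3, 4, device=z.device).expand_as(theta)
        return theta + identity               # residual + identity
```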
R4 “Questionable choices in terms of experimental design, particularly the use of three datasets” We used three datasets to demonstrate the generalizability of our method across varying data characteristics and with various ground truths.
R4 “Lambda: hyperparameter and/or a learned parameter of the network?” Lambda is not learned, nor is it a pre-fixed hyper-parameter. It is an input the model is conditioned on during training and test. For details, please refer to Huang et al., “Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization”.
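For context on the conditioning mechanism cited here, a minimal AdaIN-style sketch (after Huang et al.) of how a scalar lambda can modulate decoder features; hypothetical names, a sketch of the general technique rather than the authors' exact implementation:

```python
import torch
import torch.nn as nn

class LambdaAdaIN(nn.Module):
    """Scales and shifts normalized decoder features with parameters predicted
    from lambda, so one network can emulate models trained with different
    loss weights."""
    def __init__(self, num_channels: int):
        super().__init__()
        self.norm = nn.InstanceNorm3d(num_channels, affine=False)
        self.to_scale_shift = nn.Linear(1, 2 * num_channels)

    def forward(self, feat: torch.Tensor, lam: torch.Tensor) -> torch.Tensor:
        # lam: (B, 1) conditioning input, sampled (e.g. log-uniformly) during
        # training and chosen freely by the user at inference time.
        gamma, beta = self.to_scale_shift(lam).chunk(2, dim=1)
        gamma = gamma.view(-1, feat.shape[1], 1, 1, 1)
        beta = beta.view(-1, feat.shape[1], 1, 1, 1)
        return (1 + gamma) * self.norm(feat) + beta
```

Under this reading, changing lambda at test time changes the prediction without retraining, which is how the paper's claim of user control at inference time would be realized.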
Post-rebuttal Meta-Reviews
Meta-review # 1 (Primary)
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
Although the rebuttal answered some questions, it also raised new ones for some reviewers; e.g., there is still some confusion about lambda.
Meta-review #2
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
The authors have adequately addressed the comments regarding the baseline results, experimental settings, missing details, etc. They have noted that the reviewers’ comments will be incorporated into the final version of the paper. Hence, I suggest accepting this paper for MICCAI.
Meta-review #3
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
Although the rebuttal provided by the authors addressed part of the concerns, they missed some important questions (e.g., why a hypernetwork is needed). After the discussion among the reviewers, they agreed that “their rebuttal has raised more issues than solved them”; therefore, I recommend rejection.