
Authors

Tianyi Zeng, Jiazhen Zhang, Enette Revilla, Eléonore V. Lieffrig, Xi Fang, Yihuan Lu, John A. Onofrey

Abstract

Head movement is a major limitation in brain positron emission tomography (PET) imaging, which results in image artifacts and quantification errors. Head motion correction plays a critical role in quantitative image analysis and diagnosis of nervous system diseases. However, to date, there is no approach that can track head motion continuously without using an external device. Here, we develop a deep learning-based algorithm to predict rigid motion for brain PET by leveraging existing dynamic PET scans with gold-standard motion measurements from external Polaris Vicra tracking. We propose a novel Deep Learning for Head Motion Correction (DL-HMC) methodology that consists of three components: (i) PET input data encoder layers; (ii) regression layers to estimate the six rigid motion transformation parameters; and (iii) feature-wise transformation (FWT) layers to condition the network to tracer time-activity. The input of DL-HMC is sampled pairs of one-second 3D cloud representations of the PET data and the output is the prediction of six rigid transformation motion parameters. We trained this network in a supervised manner using the Vicra motion tracking information as gold-standard. We quantitatively evaluate DL-HMC by comparing to gold-standard Vicra measurements and qualitatively evaluate the reconstructed images as well as perform region of interest standard uptake value (SUV) measurements. An algorithm ablation study was performed to determine the contributions of each of our DL-HMC design choices to network performance. Our results demonstrate accurate motion prediction performance for brain PET using a data-driven registration approach without external motion tracking hardware. All code is publicly available on GitHub: https://github.com/OnofreyLab/dl-hmc_miccai2022.
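
For orientation, the three-component design described in the abstract can be sketched in a few lines. This is a minimal PyTorch illustration, not the authors' released implementation (see the GitHub link below for that): the layer sizes and the sigmoid time-gating are assumptions, while the 128-dimensional encoding and the multiplicative, time-conditioned FWT follow the reviews and author feedback on this page.

```python
import torch
import torch.nn as nn

class DLHMCSketch(nn.Module):
    """Minimal sketch of the three components described in the abstract:
    (i) encoders for the two 1-second 3D cloud inputs, (ii) regression
    layers producing the six rigid motion parameters, and (iii) a
    feature-wise transformation (FWT) conditioned on tracer time."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # (i) Shared 3D conv encoder: cloud volume -> 128-dim feature vector
        # (the 128-dim size is quoted in Review #3; the layers are assumed).
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # (iii) FWT: map scan time (seconds post-injection) to a per-feature
        # scale that multiplies the encodings, conditioning them on tracer
        # time-activity (per the author feedback, item 2).
        self.fwt = nn.Sequential(nn.Linear(1, feat_dim), nn.Sigmoid())
        # (ii) Regressor: paired features -> 3 rotations + 3 translations.
        self.regressor = nn.Sequential(
            nn.Linear(2 * feat_dim, 64), nn.ReLU(), nn.Linear(64, 6),
        )

    def forward(self, cloud_ref, cloud_mov, t_mov):
        f_ref = self.encoder(cloud_ref)   # (B, 128) reference-frame encoding
        f_mov = self.encoder(cloud_mov)   # (B, 128) moving-frame encoding
        scale = self.fwt(t_mov)           # (B, 128) time-dependent scaling
        f_ref, f_mov = f_ref * scale, f_mov * scale
        return self.regressor(torch.cat([f_ref, f_mov], dim=1))
```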



Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_19

SharedIt: https://rdcu.be/cVRvK

Link to the code repository

https://github.com/OnofreyLab/dl-hmc_miccai2022

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    In this paper, the authors propose a new deep learning-based method to correct head motion and reduce artifacts and quantification errors in PET imaging. This appears to be a good starting point for algorithm-based motion correction in PET imaging. However, the algorithm is not perfect, and the experimental results confirm this. I wonder whether inaccurate motion correction is acceptable in clinical use; the results show that the predicted correction sometimes even increased errors.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    In this paper, the authors propose a new deep learning-based method to correct head motion and reduce artifacts and quantification errors in PET imaging. This appears to be a good starting point for algorithm-based motion correction in PET imaging.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Language problems need to be checked; many long sentences are not clearly expressed and are confusing.

    The experiments section needs more clarification:

    The subject group contains patients with normal cognition, cocaine dependence, and cognitive diseases. Was any strategy used when splitting the training and test sets, for example, keeping a diversity of patient types in each subset?

    In the ablation study (Table 1), the test MSE for “more data, FWT, Deep Encoder and Normal sampling” (second row) is much higher than the others. Can the authors explain this?

    The results in Fig. 2(a) are for Subject 1, but there is also a Subject 1 in Table S1. I assume they are different subjects, since one is from the single-subject experiments and one from the multi-subject experiments, but referring to both by the same name causes confusion.

    In Section 3.2, the authors state that Subject 2 has a mean MSE of 0.02, but Table S1 shows a mean MSE of 1.114 for Subject 2. These are contradictory. I assume these are also different subjects, as mentioned above. This confusion needs to be addressed.

    “The results for Subject 2 (mean MSE 0.02) show that the network is capable of accurately predicting motion from training subjects even though the motion relative to the reference frame was never used for training.” Does this mean that, in the experiment in Figure 2, the moving images of Subject 2 were resampled and differ from the images used during training? The details of the experimental settings need to be clarified.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    See the weaknesses listed above.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    See the weaknesses listed above.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See the weaknesses listed above.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    This work attempts the very challenging task of learning 3D rigid head motion from highly noisy data. A motion tracking system (Vicra) is used as the learning target. To achieve this, the authors designed a neural-network-based model that uses encoders on two input images at different time points and an FWT unit that operates on the difference of these two images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper aims to address a very challenging task, i.e., estimating motion on a second-by-second basis from noisy, low-resolution PET data. I have not found previous work from which this paper makes an incremental extension.

    The transparency of including all kinds of results, as in Fig. 2, is valued.

    The authors discussed the limitations of the current results thoroughly.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The definition of the “3D cloud representation of the PET data” is not very clear. From the paper it looks like a back-projection image, so why is it called 3D cloud data? Does it have anything to do with point clouds?

    2. It is also not clear whether the 3D cloud images are attenuation corrected. For head motion, non-attenuation-corrected data can be beneficial because there are quite strong signals near the skull.

    3. It is not clear to me why the outputs of the encoders and the FWT were multiplied. Why combine the two sets of features this way?

    4. The results do not seem very convincing for demonstrating the performance of motion estimation using the proposed method, as shown in Fig. 2. It is indeed a very challenging task, and it would be very helpful to know what the DL-HMC network is doing. The ablation analysis included is appreciated.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have given enough information to reproduce the work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    I wonder if the authors would like to exploit the low number of degrees of freedom in the rigid-body motion problem. The proposed network uses many parameters and dense connections, which are useful for dense prediction problems, but 3D rigid motion has only 6 parameters after all. For example, the 3 translation parameters could be estimated from the difference in the centre of mass, if the centre of mass can be properly estimated by a neural network. A minimal sketch of this idea is given below.
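
    To make the suggestion concrete, here is a minimal NumPy sketch of a centre-of-mass translation estimate. It assumes negligible rotation and a comparable tracer distribution at both time points; the author feedback discusses why the latter is problematic for dynamic PET.

    ```python
    import numpy as np

    def center_of_mass(vol, spacing=(1.0, 1.0, 1.0)):
        """Activity-weighted centroid of a 3D volume, in physical units (mm)."""
        idx = np.indices(vol.shape).reshape(3, -1)     # voxel coordinates
        w = vol.reshape(-1).astype(np.float64)         # activity weights
        return (idx * w).sum(axis=1) / w.sum() * np.asarray(spacing)

    def translation_from_com(vol_ref, vol_mov, spacing=(1.0, 1.0, 1.0)):
        """Estimate the 3 translation parameters as the shift of the centre
        of mass. Valid only if rotation is negligible and the tracer
        distribution is comparable at the two time points, which is a strong
        assumption for dynamic PET."""
        return center_of_mass(vol_mov, spacing) - center_of_mass(vol_ref, spacing)
    ```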

    I also wonder whether the performance of the proposed model could be easily improved by using, e.g., one MLEM step to replace the 3D cloud images in Fig. 1 as the input. As far as I remember, the first few EM iterations give more or less what looks like the brain without the details. This of course costs time in preparing the input data, but it may be a worthwhile trade-off.

    Typo:

    Page 3: ‘algorithn’

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work aims to learn rigid head motion second-by-second from dynamic PET data, which is very challenging given the low-quality image input. The authors presented their results with transparency. There are a few things the authors could try to improve performance, such as finding computationally efficient ways to improve input image quality, or exploring ways to reduce the data complexity given the low number of parameters in this problem.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    The paper introduces a deep learning approach to real-time motion tracking based on PET data. The motion tracking results are validated in single-subject and multi-subject experiments and against an external tracking device, the Polaris Vicra camera. This approach to motion measurement is very interesting, since motion is a big problem in PET acquisitions due to their much longer acquisition times than, e.g., CT or MRI.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The approach is methodologically interesting in two ways: 1) Instead of using the PET images in image space and trying to register them, the 1-second frames are back-projected along the line of response (LOR) and used to construct a point cloud. This seems to make registration much easier, since a regular 1-second PET image, not even an FDG one, would have enough contrast to register otherwise. 2) The motion tracking is properly evaluated against an external tracking device, the Polaris Vicra. This is very nice and the proper way to do it, instead of only assessing motion improvements visually or image-based, as is often the case. Furthermore, the paper addresses an important clinical issue, image degradation due to motion in PET acquisitions, and provides a solution that is feasible to implement without acquiring an external tracking system.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The clearest drawback is that the experimental tests are not very extensive, but considering the novelty of the approach, I would say that is acceptable. There are some minor inconsistencies in the methods, e.g., w.r.t. acquisition time, which I go into in detail below.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No code is shared; at least I cannot see a link to the GitHub repo. The MOLAR reconstruction is accessible in principle, but is not easy to get running, even if one also has an HRRT scanner. Hence, the reconstruction of images is probably hard to redo. The data can probably not be shared, since it is clinical data.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Abstract:

    The authors state “However, to date, there is no approach that can track head motion continuously without using an external device.” This depends on how you define “continuously” and whether you speak only of PET. In MRI, 3D navigators can be placed in a regular MPRAGE sequence every TR, giving a head position approximately every 2 s; hence this should be rephrased.

    You use the Polaris Vicra, but throughout the paper there is no comment on whether the Polaris has been validated w.r.t. attenuation issues in the field of view of the detectors; in other words, are you acquiring worse images with the tracker on?

    You explain already in the abstract that “the output [of your DL method] is the prediction of six rigid transformation motion parameters.” This is also stated later, but in which coordinate system are these motion parameters reported: scanner coordinates or Polaris position coordinates? I assume scanner coordinates, but that information is completely missing.

    Introduction:

    I really like the introduction and the emphasis on the clinical application.

    You state “average head motion can vary from 7 mm [1] in clinical scans to triple this amount for longer research scans.” As someone who regularly works with 90- or 120-minute PET research scans, I find especially the second statement hard to believe. I would tone that down.

    Regarding the usage of tracking devices, the drawbacks you mention are “HMT is not generally accepted in clinical use, since it usually requires attaching a tracking device to the patient and additional setup time.” By now there are also several markerless tracking solutions on the market, even MR-compatible ones, and setup times are no longer an issue. I would rather put the emphasis on the extra cost of acquisition, or on the capability of actually running event-based reconstruction methods based on high-frequency external tracking, especially in the HRRT setup, which due to its geometry restricts which reconstruction algorithms can be used.

    I also disagree with the statement “Other systems like markerless motion tracking [11] are still under development and have not been validated for PET use.” That is simply not true anymore; see e.g. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0215524

    Methods:

    Section 2.1: I have a comment on the choice of tracer. While FDG is by far one of the most clinically used tracers, it is also rather stable, and one cannot compare its kinetics to, e.g., C11-based tracers for dopamine or serotonin. So while your population is interesting and diverse, their FDG images will still look very similar, I guess.

    I am a bit confused about the measure of motion. You state “The average intra-frame motion of eight points forming the vertices of a cube [7] was used to summarize (mean±SD) the overall motion of the brain throughout the entire scan to be 12.07±7.12 mm.” Could you add an illustration of this? Is the cube located at the center of the head or the center of the field of view? Also, an average of 12 mm over the acquisition across 25 subjects seems really large to me. (A sketch of such a vertex-based summary follows this comment.)
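
    For reference, such a vertex-based motion summary can be computed as in the sketch below; the cube center and half-width are illustrative assumptions, since the exact convention is in reference [7] of the paper.

    ```python
    import numpy as np

    def mean_vertex_displacement(R, t, center=(0.0, 0.0, 0.0), half_width=50.0):
        """Mean displacement (mm) of the 8 vertices of a cube under the rigid
        transform x -> R @ x + t. The cube center and half-width (mm) are
        illustrative assumptions, not the paper's convention."""
        c = np.asarray(center, dtype=np.float64)
        signs = np.array([[sx, sy, sz]
                          for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
        verts = c + signs * half_width                   # (8, 3) cube vertices
        moved = verts @ np.asarray(R).T + np.asarray(t)  # vertices after transform
        return np.linalg.norm(moved - verts, axis=1).mean()
    ```

    Averaging this value over all intra-frame transforms and subjects would give a summary like the 12.07±7.12 mm quoted above.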

    I think there is a typo in the description of the acquisition. You write “All PET imaging data is 30 minutes acquired 60 minutes post injection.” I would understand this as a 30-minute acquisition, i.e., 1800 one-second frames, but below you keep using 3600 frames. So I assume you switched the two numbers around and it should say “data is acquired for 60 minutes, 30 minutes post injection.”

    Also, what is your regular framing for the 60-minute scan: 5- or 10-minute frames?

    Section 2.2: You state “The encoders effectively reduce the 3D image data volumes down to a vector of size 128.” That reduction seems really extreme considering the 6 degrees of freedom you are trying to measure. Did you try a larger input, e.g., by using a different network for pre-training?

    Can you confirm that t_ref is the last 1-second image in the whole scan?

    As stated already in my comments on the introduction, how was the HMT tracking data transformed into the 3D cloud, i.e., PET scanner frame-of-reference coordinate system? Is that cross-calibration provided by the HMT system, so that all tracking is already provided relative to scanner coordinates, or was this transform performed by you?

    Section 2.3: This section confused me even more regarding what the reference time is: is it always the last timepoint, or does it vary?

    “we calculate the relative motion transformation matrix from the Vicra data”: here I would like more details, and maybe a reference, to understand it.

    Section 2.4: Here you introduce t_ref = 3600, which does not match an image acquisition of 30 minutes.

    Results: I guess what you call theta is often referred to as RMS in the MRI motion correction community (see e.g. https://onlinelibrary.wiley.com/doi/full/10.1002/mrm.27705), but that is calculated w.r.t. a point of reference, e.g., the point-cloud center when using a markerless tracker or the marker center when using a marker-based tracker. What is your point of reference? (See the sketch below for the RMS convention I mean.)
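
    For comparison, the RMS deviation referred to above (Jenkinson’s formulation, as used in FSL) can be computed from a 4x4 rigid transform as in this sketch. The 80 mm radius is the common default, and the reference point is assumed to be the origin of the chosen frame, which is exactly the ambiguity being raised.

    ```python
    import numpy as np

    def rms_deviation(M, radius=80.0):
        """RMS deviation of a 4x4 rigid transform M (Jenkinson, 1999),
        evaluated over a sphere of the given radius centred at the origin
        of the chosen reference frame. radius = 80 mm is a common default;
        the choice of reference point changes the result."""
        A = M[:3, :3] - np.eye(3)   # rotation part minus identity
        t = M[:3, 3]                # translation part
        return np.sqrt(0.2 * radius**2 * np.trace(A.T @ A) + t @ t)
    ```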

    Just as a side note: if you are the only PET site in the world with an HRRT and using the MOLAR reconstruction, then anonymization kind of goes out the window if the reviewer has decent PET expertise.

    Regarding FreeSurfer, by now you probably want to replace reference 4 with: Fischl B. FreeSurfer. Neuroimage. 2012 Aug 15;62(2):774-81.

    Section 3.1: Your result on motion prediction, where you state “accurate motion prediction performance error to be 0.035±0.073 (mean±SD of corresponding dataset)”, is quite impressive. But what was the overall range of motion for that single subject? Can you add that?

    Section 3.2: Figure 2: these plots are way too small; try to place the axis descriptions only once and enlarge the plots. Also, please comment in the caption on my question about which coordinate system is shown.

    Again, when you state results for the other subjects, such as “Mean MSE for Subject 3 is 0.74 and for the failure case in Subject 4 is 6.33”, please also give the overall range of motion for these subjects over the scan as reference.

    Discussion

    You write “While our initial model results indicate capabilities of predicting motion of magnitude ∼1mm, our current pre-processing reduces input image resolution to ∼10mm3, which may limit the model’s ability to detect motion with smaller magnitudes.” But since a common cutoff for deciding whether to do motion correction of PET data is 2-3 mm of motion, this is not really useful in practice. Can you comment on that?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper covers a very important clinical issue and comes up with a nice methodological solution using state-of-the-art computational approaches, both w.r.t. the learning framework and in general w.r.t. the reconstruction framework, and it is tested on a state-of-the-art scanner, an HRRT.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    7

  • [Post rebuttal] Please justify your decision

    I am happy with the way the authors addressed my points. I really like the novel idea and that it addresses such an important problem; hence I find it important to showcase this work to the community.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Contribution

    A motion tracking system is used as the learning target, and the authors designed a neural-network-based model that uses encoders on two input images at different time points and an FWT unit that operates on the difference of these two images.

    • Estimating 3D rigid-body head motion from noisy data is a challenging task.
    • Transparent presentation of results, both good and bad, with good discussion of limitations.
    • Used ground truth data from an external tracking device.

    Weaknesses to address

    • Needs clearer explanation of the “cloud representation” and a bit more about attenuation correction, choice of reference image, etc.
    • The multiplication of the encoder outputs with the “feature-wise transformation” probably needs to be properly motivated.
    • Experiments were not extensive and results are not especially convincing.
    • Reviewers provided a good list of more minor issues that should be fixed.
    • Clarify whether code would become available on acceptance.
  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5




Author Feedback

We thank the reviewers for their careful evaluation of our work and constructive suggestions.

  1. Cloud data (R2) and attenuation correction (AC) (R2, R3): “Cloud” refers to the 3D volumetric distribution of the PET tracer within a 1-second time interval. We create this distribution by back-projecting the entire tracer line of response (LOR) and recording every voxel that intersects the LOR in the 3D image space (see the sketch after this list). Each intersected voxel is sensitivity-corrected using a pre-computed scanner sensitivity map. We do not apply AC due to the spatial mismatch introduced by head motion between the AC map (from the beginning of the scan) and data from all later time points. To apply AC, we would need to co-register the AC map to the 3D cloud data at all other time points, but conventional registration is likely to fail on this noisy data. We address this registration problem using deep learning (DL) to predict motion directly from the low-resolution cloud data, which is fast to generate and requires minimal pre-processing. We hypothesize that MLEM reconstruction would have limited benefit: MLEM suffers from high background noise due to the limited counts in 1-second data, and the added computational cost would take away one of our approach’s key benefits, real-time motion prediction once the DL model is trained.
  2. Feature-wise transformation (FWT) (R2): The FWT modifies the outputs of the two input image encoders. Here, multiplication scales the encoded features elementwise as a function of time. This conditions the encoded features on the time-dependent PET tracer decay, which allows the model to learn how the dynamic changes of the tracer distribution affect motion prediction.
  3. Extensiveness of experiments (R3): Our results show feasibility for supervised DL motion correction. Long training times prohibited extensive testing on multi-subject data. Single-subject studies were used to guide design decisions and were critical to assess feasibility. Single-subject results are consistent across different subjects. We are expanding our results to include cross-validation studies using more than 25 subjects.
  4. Rigid motion (R2): The dynamic nature of the PET tracer distribution makes estimating relative motion from the center of mass (COM) challenging. Further, the COM cannot capture the rotational component. The DL-HMC image encoders are trained to reduce the dimensionality of the 3D cloud data into a compressed representation (128 features per cloud). We hypothesize that the encodings include all the head pose information needed to estimate rigid-body motion. The series of dense connections is necessary to learn the subtle differences between the two 1-second 3D cloud encodings that parameterize the rigid motion.
  5. Coordinate system (R3): The 3D cloud data is in the Vicra coordinate system. DL-HMC learns the relative motion in this coordinate system. After inference, we transform this position to the scanner coordinate system using the Vicra reference transformation information.
  6. Acquisition and reference time (R3): The data used was from a 90-minute scan (regular framing: 5 minutes per frame), and here we use data from 60-90 minutes after injection. We use the image at 60 minutes (t_ref = 3600 s) as the reference image and predict motion for all following time points (1800 frames, ending at t = 5400 s).
  7. Choice of tracer (R3) FDG is the most common tracer and the largest potential source for training data for DL. We are in the process of applying our method to other tracers.
  8. Test cohort (R1) and test subject motion (R3): The test set (Table S2) includes 2 cocaine users, 1 healthy control, and 2 subjects with cognitive disease. These subjects had a relative rigid motion magnitude (L2-norm) of 3.8±3.0 (mean±SD).
  9. Code (MR) Code will be available on GitHub upon acceptance.
  10. Subject naming (R1, R2), grammatical revision (R1), and context (R3): We will rename the subjects in Fig. 2 and Table S1 to be consistent, address grammatical errors (R1), and incorporate R3’s suggestions.
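
To make item 1 above concrete, the cloud generation could look roughly like the following sketch. The uniform sampling along each LOR is a simplification of exact voxel traversal (e.g., Siddon’s algorithm), and all names and the array layout are illustrative assumptions, not the authors’ code.

```python
import numpy as np

def backproject_cloud(lors, grid_shape, spacing, sensitivity, n_samples=256):
    """Accumulate one second of list-mode events into a 3D 'cloud' volume.

    lors:        (N, 6) array of LOR endpoints (x1, y1, z1, x2, y2, z2),
                 assumed to be expressed in the voxel-grid frame (mm).
    sensitivity: pre-computed scanner sensitivity map, same shape as grid.
    """
    cloud = np.zeros(grid_shape, dtype=np.float32)
    steps = np.linspace(0.0, 1.0, n_samples)[:, None]      # (S, 1) fractions
    spacing = np.asarray(spacing, dtype=np.float64)
    for lor in lors:
        p1, p2 = lor[:3], lor[3:]
        pts = p1 + steps * (p2 - p1)                       # points along the LOR
        ijk = np.unique((pts / spacing).astype(int), axis=0)  # voxels intersected
        ok = np.all((ijk >= 0) & (ijk < grid_shape), axis=1)  # keep in-grid voxels
        for i, j, k in ijk[ok]:
            cloud[i, j, k] += 1.0
    return cloud / np.maximum(sensitivity, 1e-6)           # sensitivity correction
```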




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I can accept that this work lacked baseline comparisons, as it may be a promising first step towards correcting head motion in list-mode PET data. One reviewer was especially positive about it.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    9



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors introduce a new supervised deep learning solution for correcting 3D rigid-body head motion in PET imaging. This is a challenging problem due to low-quality images.

    Two major contributions stand out. First, instead of working with PET images and registering them, 1-second frames are back-projected to construct a point cloud, which forms the space for registration; this simplifies and speeds up the spatial alignment problem. Second, a motion tracking system was used as a ground-truth target to properly characterize performance.

    Although more extensive experiments should be run for a complete evaluation (probably for a journal paper), the existing experiments are promising and well described. It is also welcome that both successful and failed attempts are discussed.

    The authors participated in the rebuttal and provided detailed responses to the reviewers. Code and a portion of the experimental data will be shared upon acceptance.

    I fully agree with R3 that the novelty of the approach, the topic, and the urgency for a solution in the PET imaging field warrants the acceptance of this submission.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Supervised Deep Learning for Head Motion Correction in PET

    This submission tackles the correction of head motion during PET reconstruction. This is done by aligning points acquired from an external tracking system and proposing a learning approach to find the underlying rigid transformations. Expert reviewers are positive about the impact of the method. The weakness of the paper is its limited evaluation. It is believed that this weakness is understandable given the PET scenario with the Polaris Vicra system and does not change the scientific merit of the paper.

    For all these reasons, the recommendation is Acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    9


