
Authors

Tianyi Zeng, Jiazhen Zhang, Eléonore V. Lieffrig, Zhuotong Cai, Fuyao Chen, Chenyu You, Mika Naganawa, Yihuan Lu, John A. Onofrey

Abstract

Head motion correction is an essential component of brain PET imaging, in which even motion of small magnitude can greatly degrade image quality and introduce artifacts. Building upon previous work, we propose a new head motion correction framework taking fast reconstructions as input. The main characteristics of the proposed method are: (i) the adoption of a high-resolution short-frame fast reconstruction workflow; (ii) the development of a novel encoder for PET data representation extraction; and (iii) the implementation of data augmentation techniques. Ablation studies are conducted to assess the individual contributions of each of these design choices. Furthermore, multi-subject studies are conducted on an 18F-FPEB dataset, and the method's performance is qualitatively and quantitatively evaluated by a MOLAR reconstruction study and a corresponding brain Region of Interest (ROI) Standard Uptake Value (SUV) evaluation. Additionally, we compare our method with a conventional intensity-based registration method. Our results demonstrate that the proposed method outperforms the other methods on all subjects and can accurately estimate motion for subjects outside the training set. All code is publicly available on GitHub: https://github.com/OnofreyLab/dl-hmc_fast_recon_miccai2023.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43999-5_67

SharedIt: https://rdcu.be/dnwxk

Link to the code repository

https://github.com/OnofreyLab/dl-hmc_fast_recon_miccai2023

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    The article introduces a novel method for correcting head motion during brain PET imaging, which uses ultra-fast reconstruction techniques to generate one-second dynamic fast reconstruction images (FRIs) as input for the network. The network architecture consists of an encoder block and a fully connected regression block that predicts six translation and rotation components of relative rigid motion between two FRIs taken at different time points. To enhance the performance and generalisability of the model, the authors employ data augmentation techniques to simulate additional relative motions that can be concatenated with true relative motions during training. To evaluate the effectiveness of the proposed method, the authors conduct studies on an 18F-FPEB dataset and compare their method with other methods using a MOLAR reconstruction study and standard uptake value (SUV) evaluation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed method uses high-resolution fast reconstruction images as input, which can improve image quality and accuracy in head motion correction for brain PET imaging.
    2. Task-specific data augmentation strategy that simulates additional relative motion during training to increase the variability of the dataset.
    3. They showed the usefulness of each part by conducting an ablation study on their network.
    4. The training loss uses Vicra hardware-based motion tracking as the gold standard, and the proposed deep learning-based approach is compared with a conventional intensity-based registration method.
    5. Multi-subject studies were conducted on healthy controls using this method, demonstrating its ability to accurately estimate motion even for subjects outside of the training set.
    6. The proposed method outperformed other methods in terms of accuracy and performance on all subjects tested, indicating its potential usefulness in clinical settings for diagnosing neurodegenerative diseases.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper mentions some limitations of the proposed method, including limited tracking time and lower temporal resolution compared to hardware-based methods. Further validation using larger datasets from different centers would be necessary to confirm its generalisability across different populations.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper provides detailed information on the methodology used, including the dataset and experimental setup, as well as specific details about the network architecture and training process. Therefore, it is possible to reproduce their experiments with a similar dataset and computational resources. However, it should be noted that some of the data used in this study may not be publicly available or easily accessible outside of research institutions due to privacy concerns or licensing restrictions.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The authors have presented an interesting and well-written paper on a new method for head motion correction in PET imaging. The proposed approach builds upon previous work and introduces several novel design choices, including the use of fast reconstructions as input, a novel encoder for PET data representation extraction, and the implementation of data augmentation techniques. The results demonstrate that the proposed FRI model outperforms other methods in terms of MSE loss, translation accuracy, and SUV difference from the Vicra reference images across multiple brain regions. However, there are some areas where further clarification or improvement could be made:

    1. It would be helpful to provide more information about how the dataset was collected (e.g., patient demographics) to better understand its generalisability.
    2. While it is acknowledged that limitations such as computational requirements arose during experimentation due to hardware constraints, providing additional details on these limitations may help readers who wish to reproduce the experiments but do not have access to similar resources.
    3. The authors should consider discussing potential future directions or applications beyond what has been presented.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The network utilized in this study draws inspiration from a previously proposed model, with a similar encoder (the DL-HMC framework). The novelty lies in the implementation of data augmentation techniques to enhance the network's robustness, coupled with the use of Vicra hardware tracking data for training purposes (in the loss function).

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    The paper proposed a deep learning-based model for brain PET motion prediction by utilizing high-resolution one-second fast reconstruction images (FRIs) with TOF. The performance of the proposed method was evaluated on an 18F-FPEB dataset by a MOLAR reconstruction study and corresponding brain ROI SUV evaluation, showing that the proposed method outperforms the DL-HMC and BIS methods qualitatively and quantitatively.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper proposes an encoder combined with a data augmentation block that introduces additional synthetic relative motion during training, which increases data variability.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The results in Figure 2 show that the proposed method works well for small motions; however, its performance still needs improvement when sudden, significant motion occurs. The paper proposes a supervised model, yet there is still a gap between the network results and Vicra, and the network's generalization needs to be improved.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It is fine.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. In the network, the two embedded images are flattened and fed into the fully connected regression block. Will the flattening operation interfere with the extraction of spatial information?
    2. The real dataset is very small. How much augmented data were used for training?
    3. In Figure 2, the network performs worse in the y and z directions than in the x direction. It would be better to add augmented data in the y and z directions.
    4. In the last column of Table 1, the MSE of the proposed method w/o DA is the minimum, but the proposed FRI is bolded.
    5. In Figure 3, please add the DL-HMC image results.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It seems that, for a supervised method, the motion estimation performance of the proposed approach is not good enough.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper introduces a novel deep-learning based head motion correction framework that utilizes high-resolution one-second fast reconstruction images (FRIs) with time-of-flight (TOF) as input to an encoder block for PET data representation extraction. The proposed method outperforms other competing methods qualitatively and quantitatively in a multi-subject cohort (n=20).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper is well written, with a clear and concise structure that leads to convincing results. The proposed method is thoroughly and effectively explained, making it easy to understand. The figures and results are presented in a concrete and comprehensible manner, which allows for easy comprehension and follow-through.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I would be curious to see how the proposed approach would perform on non-TOF PET data during inference, considering that many older PET and PET/MR systems do not have TOF PET data.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors stated that they meet all reproducibility requirements.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Please see the answer to Question 6 (Weaknesses).

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes a deep learning-based method for predicting PET motion across multiple subjects. The writing is clear and concise, with the methodology being effectively explained, and the results being presented in a convincing manner. Therefore, I have rated it as 6.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The idea of using high-resolution fast reconstruction images with time-of-flight (TOF) as input to the head motion correction network is novel. The method and experiments are described thoroughly. There are still some concerns regarding the generalizability of the proposed method to other datasets, which should be discussed in the revised manuscript. Detailed information on the motion prediction loss should be included as well.




Author Feedback

We thank the reviewers for their careful evaluation of our work and constructive suggestions.

  1. Operation of flattening (R1). The features extracted by the encoder form a high-dimensional semantic representation in which spatial information is preserved by the flattening operation. Because flattening is deterministic, each value in the flattened representation corresponds to a specific spatial position in the original input images. This contrasts with placing a pooling operation after the encoder, where critical spatial information would be lost. The final fully connected regression block (a multilayer perceptron) learns to estimate the rigid motion between the reference and moving image representations during model training (a minimal sketch is given after this list).
  2. Motion prediction loss (R1). The network is optimized by minimizing the mean squared error (MSE) between the predicted motion estimate and the Vicra gold-standard parameters. The equation for the prediction error for a given pair of reference and moving clouds will be added to the manuscript (a hedged form of this loss is sketched after this list).
  3. Computational requirements (R2). To perform head motion prediction with a temporal resolution of 1 second, the training set should ideally include every one-second frame within the time period of interest for each training subject. For instance, with 14 training subjects the total number of frames would be 14 x 3600. Although each frame is downsampled to 96 x 96 x 64 voxels (about 10 MB per frame), this still requires roughly 14 x 3600 x 10 MB of memory; a larger cohort study would require even more, which can pose hardware constraints. Two approaches can address this issue (see the sketch after this list). The first is to sample uniformly from the entire period of interest: for a 30-minute period, selecting 540 frames (equivalent to 9 minutes) yielded performance similar to using all 1800 frames in our pilot experiments, because tracer distribution changes are not pronounced in 18F studies and samples from multiple subjects effectively cover the entire period of interest. The second is to dynamically swap the data loaded into memory during model training; although this may extend training time, it is more practical for researchers with limited computational resources.
  4. Potential future directions (R2). Due to the limited availability of gold-standard Vicra motion data, we will in future work develop semi-supervised and unsupervised deep learning methods for PET head motion correction to take advantage of the large amount of data acquired without Vicra.
  5. Generalizability (Meta Reviewer). In the future, we aim to apply the proposed method to other datasets with different PET tracers using different PET scanners with and without time of flight (TOF) in order to validate model generalizability. We will correct Table 1 for the error.
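A minimal PyTorch sketch of the flattening argument in point 1 is shown below. The encoder layers, channel sizes, and the MotionRegressor name are illustrative assumptions and do not reproduce the released architecture; the point is only that flattening keeps a feature per spatial position, whereas global pooling would not.

```python
# Minimal sketch (assumed layer sizes and names, not the released architecture):
# flattening keeps one value per (channel, spatial position), whereas a global
# pooling layer would collapse the spatial grid and discard that information.
import torch
import torch.nn as nn

class MotionRegressor(nn.Module):
    def __init__(self, feat_ch=32):
        super().__init__()
        # Toy 3D encoder standing in for the PET representation encoder;
        # for a 96x96x64 input it produces a 6x6x4 feature grid.
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=4, stride=4), nn.ReLU(),
            nn.Conv3d(16, feat_ch, kernel_size=4, stride=4), nn.ReLU(),
        )
        flat_dim = feat_ch * 6 * 6 * 4
        # Regression head: concatenated reference + moving embeddings mapped to
        # 6 rigid motion parameters (3 translations, 3 rotations).
        self.head = nn.Sequential(
            nn.Linear(2 * flat_dim, 256), nn.ReLU(),
            nn.Linear(256, 6),
        )

    def forward(self, ref, mov):
        # Flattening is deterministic: index i always maps to the same
        # (channel, x, y, z) location, so spatial correspondence is preserved.
        f_ref = torch.flatten(self.encoder(ref), start_dim=1)
        f_mov = torch.flatten(self.encoder(mov), start_dim=1)
        return self.head(torch.cat([f_ref, f_mov], dim=1))

model = MotionRegressor()
ref = torch.randn(2, 1, 96, 96, 64)  # batch of reference FRIs (assumed size)
mov = torch.randn(2, 1, 96, 96, 64)  # batch of moving FRIs
print(model(ref, mov).shape)         # torch.Size([2, 6])
```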
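A hedged reading of the motion prediction loss in point 2, assuming a plain MSE over the six rigid parameters; the exact parameterization and any weighting in the manuscript may differ, and the symbols below are assumptions:

```latex
% Assumed MSE form of the motion prediction loss: \hat{\theta}^{(n)} are the six
% predicted rigid parameters (3 translations, 3 rotations) for the n-th
% reference/moving pair and \theta^{(n)}_{\mathrm{Vicra}} the gold-standard values.
\mathcal{L}_{\mathrm{MSE}}
  = \frac{1}{N} \sum_{n=1}^{N}
    \left\| \hat{\theta}^{(n)} - \theta^{(n)}_{\mathrm{Vicra}} \right\|_2^2
```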
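To make the memory arithmetic and the uniform-sampling idea of point 3 concrete, a small Python sketch follows; the function names are hypothetical, and the numbers simply restate the figures quoted above.

```python
# Back-of-the-envelope memory estimate and uniform frame sampling from point 3
# (function names are hypothetical; numbers follow the feedback above).

def training_memory_gb(n_subjects=14, frames_per_subject=3600, mb_per_frame=10):
    """Rough memory needed to hold every 1-second frame of the training set."""
    return n_subjects * frames_per_subject * mb_per_frame / 1024.0

def uniform_frame_indices(period_s=1800, n_samples=540):
    """Uniformly sample 1-second frame indices over a 30-minute period."""
    step = period_s / n_samples
    return [int(i * step) for i in range(n_samples)]

print(f"All frames in memory: ~{training_memory_gb():.0f} GB")  # ~492 GB
indices = uniform_frame_indices()
print(len(indices), indices[:5])  # 540 [0, 3, 6, 10, 13]
```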


