
Authors

Xueqi Guo, Bo Zhou, Xiongchao Chen, Chi Liu, Nicha C. Dvornek

Abstract

Inter-frame patient motion introduces spatial misalignment and degrades parametric imaging in whole-body dynamic positron emission tomography (PET). Most current deep learning inter-frame motion correction works consider only the image registration problem, ignoring tracer kinetics. We propose an inter-frame Motion Correction framework with Patlak regularization (MCP-Net) to directly optimize the Patlak fitting error and further improve model performance. The MCP-Net contains three modules: a motion estimation module consisting of a multiple-frame 3-D U-Net with a convolutional long short-term memory layer combined at the bottleneck; an image warping module that performs spatial transformation; and an analytical Patlak module that estimates Patlak fitting with the motion-corrected frames and the individual input function. A Patlak loss penalization term using mean squared percentage fitting error is introduced to the loss function in addition to image similarity measurement and displacement gradient loss. Following motion correction, the parametric images were generated by standard Patlak analysis. Compared with both traditional and deep learning benchmarks, our network further corrected the residual spatial mismatch in the dynamic frames, improved the spatial alignment of Patlak Ki/Vb images, and reduced normalized fitting error. With the utilization of tracer dynamics and enhanced network performance, MCP-Net has the potential for further improving the quantitative accuracy of dynamic PET. Our code is released at https://github.com/gxq1998/MCP-Net.
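The Patlak term in the abstract can be illustrated on a single time-activity curve. The sketch below is not the released MCP-Net code; it assumes the standard Patlak model, tissue(t) ≈ Ki · ∫Cp dτ + Vb · Cp(t), fitted by ordinary least squares, and reads "mean squared percentage fitting error" as the mean of squared relative residuals. The function names, the `eps` guard, and the sample values are all hypothetical.

```python
def cumulative_trapezoid(times, values):
    """Running integral of `values` over `times` (trapezoid rule), starting at 0."""
    integral = [0.0]
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        integral.append(integral[-1] + 0.5 * dt * (values[i] + values[i - 1]))
    return integral


def patlak_fit(tissue, plasma, plasma_integral):
    """Least-squares fit of tissue[i] ~ ki * plasma_integral[i] + vb * plasma[i]."""
    # Normal equations of a two-parameter linear model without intercept.
    sxx = sum(x * x for x in plasma_integral)
    syy = sum(y * y for y in plasma)
    sxy = sum(x * y for x, y in zip(plasma_integral, plasma))
    sxc = sum(x * c for x, c in zip(plasma_integral, tissue))
    syc = sum(y * c for y, c in zip(plasma, tissue))
    det = sxx * syy - sxy * sxy
    ki = (sxc * syy - syc * sxy) / det
    vb = (syc * sxx - sxc * sxy) / det
    return ki, vb


def mean_squared_percentage_error(tissue, fitted, eps=1e-8):
    """Mean of squared relative residuals; `eps` (an assumption) avoids division by zero."""
    return sum(((f - c) / (c + eps)) ** 2 for c, f in zip(tissue, fitted)) / len(tissue)


if __name__ == "__main__":
    # Made-up input function and tissue curve, for illustration only.
    times = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0]   # minutes
    plasma = [10.0, 6.0, 4.0, 3.0, 2.5, 2.2]     # arbitrary activity units
    integral = cumulative_trapezoid(times, plasma)
    tissue = [0.02 * x + 0.1 * y for x, y in zip(integral, plasma)]
    ki, vb = patlak_fit(tissue, plasma, integral)
    fitted = [ki * x + vb * y for x, y in zip(integral, plasma)]
    print(ki, vb, mean_squared_percentage_error(tissue, fitted))
```

Because the plasma input is not affected by patient motion, residual misalignment shows up only in the tissue curve, so a larger fitting error is a plausible signal of remaining motion. In practice the fit is usually restricted to frames after an equilibration time, which this sketch omits.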

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_16

SharedIt: https://rdcu.be/cVRvH

Link to the code repository

https://github.com/gxq1998/MCP-Net

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This work seems like an extension of the work in [10]. Full text of [10] is not accessible. The difference from [10] might be that, Patlak fitting error, which accounts for the tracer kinetics in dynamic PET imaging, is added into the cost function when training a neural network based model to learn the motion displacement fields for motion correction.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A kinetic model (Patlak) is added to account for tracer kinetics in dynamic PET imaging. Although Patlak only applies to irreversible tracers, the model itself is very favourable for motion correction, as the plasma input Cp on the right hand side of the equation is not affected by motion and can provide some robustness for using this as a constraint to estimate motion.

    The evaluation in this work is done extensively.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The difference between the proposed MCP-NET model and the previous B-convLSTM model [10] is not clear. The full text of [10] is not available.

    Is MCP-NET = B-convLSTM + Penalty(Patlak)? If so, why was the hyperparameter lambda set to different values (0.1 for MCP-NET and 1 for B-convLSTM) in the comparison?

    2. The proposed model aims for inter-frame motion correction in whole-body PET imaging. The frame length for motion correction in this work is 5 min. However, for whole-body imaging, many movements, such as cardiac motion, respiratory motion, and sliding motion, are continuous. Thus the motion correction is, to some extent, limited in time resolution.

    3. Motion-introduced mismatch in attenuation/scatter correction is not mentioned. Based on the protocol given in the Supplementary, a single CT was taken before the PET scan, and the motion correction was done post reconstruction. I understand this is a problem for all post-reconstruction motion correction methods, but hopefully the authors can be aware of this.

    4. The original image resolution is not given. 4X downsampling was applied before feeding the image data to the neural network, which could mean a 5-10 mm voxel size? The effect on the resolution of motion estimation is not clear. Fig. 2 does not have units for motion fields either.

    5. Why use LNCC as a similarity measure for dynamic frames?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have given lots of information for reproducing this work, with some details missing.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The order of citation numbers in the main text is random.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work includes a kinetic model to account for tracer kinetics in dynamic PET imaging for data-driven motion correction. It might be an extension of a previous work. By using a computationally simple kinetic model, the authors demonstrated the improvements in the final outcomes. This work has been successful in proving its motivation.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors propose an inter-frame motion correction framework called MCP-Net with Patlak regularization and a Patlak loss term to register whole-body dynamic PET scans. The framework consists of: 1. a 3D U-Net-based motion estimation module, 2. a spatial transformation module to warp images, and 3. an analytical Patlak module to estimate Patlak fitting. This paper uses tracer dynamics to improve network performance for image registration, and it performed better than traditional non-rigid and other deep-learning-based algorithms in correcting spatial mismatch, reducing normalized fitting error, and improving spatial alignment of K_i and V_b images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The motivations described in the paper define the need for an image registration technique in whole body PET very well.
    2. The integration of a Patlak fitting module into the registration framework is novel. The addition of a Patlak loss penalty through a mean squared percentage fitting error in the loss function is new.
    3. The comparison of the proposed approach against previous methods is insightful and the metrics chosen to describe the performance are adequate (e.g. normalized mutual information, avg-to-ref SSIM etc.).
    4. Statistical analysis was also conducted, which is a plus given that not many statistical tests are run these days in DL in medical imaging papers.
    5. The paper is easy to read and organized clearly.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. It seems like the amount of data used to train the model is smaller than usual. I might be wrong, but it seems that there are only 19 frames per volume for 27 patients in total. Is this correct? If so, how many volumes were used for training/validation/testing? This point could be clarified further as I am not sure if the model is overfitting given a limited data quantity.
    2. 57 hypermetabolic regions were selected by a nuclear medicine physician for additional evaluation. Are these the same regions that are shown in Figs. 3, 4, and 5?
    3. While it must be challenging to acquire data from additional scanners (other than the Siemens Biograph mCT), how well does this model translate to data acquired from different scanners within the same/different institute?
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper presents a link to the code, which I’m assuming will be public upon acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. B-ConvLSTM and MCP-Net appear to differ only very slightly in performance (0.9513 +- 0.014 vs 0.9523 +- 0.014 SSIM, 0.6390 +- 0.1005 vs 0.6197 +- 0.1032 torso NFE etc.). So, it is difficult to envision the scale of improvement; how much error is actually tolerable clinically by any method?
    2. Following this train of thought, the authors mention that MCP-Net achieved “lowest remaining spatial mismatch” and for “significant motion in the hand and bladder, …, MCP-Net still has the capability to reduce error.” But this error bound is hard to define; ideally, we want perfect registration between reference/source, so what is the tolerable level of error that we can get before this model will be practically (clinically or pre-clinically) useful?
    3. The data sub-section of the paper could use some clarity. From 27 subjects, how many volumes were used? If they each have 19 frames, then why were they resized/reshaped to 128x128x256 and not 128x128x19? Was there some volumetric resampling that occurred?
    4. I’m curious to know the instances where MCP-Net failed in registration? What caused its failure and how much was the fitting error?
    5. Perhaps B-ConvLSTM and MCP-Net could be used in conjunction in an ensemble registration framework?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I still have a few concerns about the data aspect of this paper, but the results seem solid (despite being ever so marginally higher than B-ConvLSTM). If the authors can explain the data section better, it would be helpful. It would also alleviate concerns with overfitting.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposed a way to correct inter-frame motion during the acquisition of a whole-body dynamic PET scan. The proposed MCP-Net takes tracer kinetics into account, as opposed to other methods that treated motion correction as a registration problem. Qualitative and quantitative analysis showed that this framework is promising for improving the accuracy of dynamic PET.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This work makes use of tracer kinetics, which is a key characteristic of dynamic PET. For the particular problem that the authors are trying to address, it makes a lot of sense to treat motion correction as more than a merely geometric problem.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The lack of ground truth makes the results questionable to a certain extent. The authors tried to address this by motion simulation, but it is not known to me how realistic the simulation was, especially with the low number of subjects involved.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Good

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The authors should think of some ways to get more ground truth (or close to ground truth) data to make the results more convincing. Even though the motion could be non-rigid, at least some measurable gross motion could still be useful.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Making use of tracer kinetics for motion correction is the right direction to address this problem.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Contribution

    The work combines inter-frame learned motion correction with time series fitting (Patlak) for dynamic whole-body PET images.

    • Well motivated.
    • Integrating motion correction with Patlak fitting is novel (as is the mean squared percentage fitting error in the loss function).
    • Extensive and insightful evaluation with statistical analysis of improvements.
    • Easy to read and well organised paper.

    Weaknesses to address

    • Unable to compensate for intra-frame motion (mention this limitation).
    • Does not mention mismatch with attenuation and scatter correction (better to also combine image reconstruction with time-series modelling and motion correction).
    • Improvements appear small, so it is unclear whether they matter clinically. More details about failures of MCP-Net may be needed.
    • A better treatment of ground truth could make the method more convincing (editor: time permitting, perhaps fit the model with a left-out frame, and assess the model reconstruction against the left-out data, interpolating the deformations between the adjacent time-points).
  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1




Author Feedback

We thank the reviewers and the meta-reviewer (MR) for all the helpful comments. We would like to answer questions and clarify potential confusions here:

Both R1 and R2 asked about the difference between the proposed MCP-Net and the previous benchmark B-convLSTM, and about the different hyperparameter selection. The main improvement of MCP-Net is the introduced Patlak penalty. We tuned the hyperparameters of the two models separately and compared their best performance, each with its own optimal hyperparameter selection. It is reasonable that in MCP-Net the two penalty terms need to be balanced, so the optimal lambda may differ from the case where lambda weights the only penalty term. Also, R1 asked about the usage of LNCC as the similarity measurement. It was originally implemented in the VXM benchmark, and we kept the same setting in all the deep learning models.

Both R1 and MR mentioned the continuous motion problem (intra-frame) as well as the attenuation/scatter mismatch issue. We are aware of such limitations since our current work mainly focuses on correcting inter-frame motion and the spatial misalignment issue, and we will include discussion of such limitations in the final version of this paper. Our future work will be investigating intra-frame motion correction as well as the mismatch problem in attenuation and scatter correction. By integrating these approaches, we expect the parametric quantification to be further improved.

R1 asked about the image resolution. In our Supplementary Figure 1, we have mentioned that the original resolution was 2.04 mm x 2.04 mm x 2.03 mm. The downsampled resolution was only used in the motion estimation step due to limited GPU RAM. The displacements were upsampled back to the original resolution before warping the original frames, and the following Patlak analysis was also implemented on the original image resolution. Notably, motion estimation by the proposed method at the downsampled resolution outperformed the traditional baseline at the original resolution. We will add the motion field units in the final version of this paper.
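The upsampling step described in this reply can be sketched in one dimension. This is an illustration only, not the authors' implementation: it assumes displacements are stored in voxel units of the coarse grid, so that interpolating onto a grid `factor` times finer also requires multiplying the values by `factor`. The function name, the factor of 4 (taken from Review #1's description), and the sample profile are assumptions; only the 2.04 mm voxel size comes from the reply.

```python
def upsample_displacement_1d(coarse, factor):
    """Linearly interpolate a 1-D displacement profile onto a finer grid.

    Values are rescaled from coarse-voxel units to fine-voxel units.
    """
    n_coarse = len(coarse)
    fine = []
    for i in range(n_coarse * factor):
        # Centre of fine voxel i, expressed in coarse-voxel index coordinates.
        pos = (i + 0.5) / factor - 0.5
        pos = min(max(pos, 0.0), n_coarse - 1.0)  # clamp at the borders
        lo = int(pos)
        hi = min(lo + 1, n_coarse - 1)
        w = pos - lo
        fine.append(factor * ((1.0 - w) * coarse[lo] + w * coarse[hi]))
    return fine


if __name__ == "__main__":
    voxel_mm = 2.04               # in-plane voxel size stated in the reply
    factor = 4                    # assumed downsampling factor along this axis
    coarse = [0.0, 0.5, 1.0]      # made-up displacements, in coarse voxels
    fine = upsample_displacement_1d(coarse, factor)
    # Convert fine-voxel displacements to millimetres for reporting units.
    print([round(d * voxel_mm, 2) for d in fine])
```

Under these assumptions, one coarse voxel spans 4 x 2.04 mm = 8.16 mm, which is consistent with the 5-10 mm range R1 estimated for the motion estimation grid.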

R2 was concerned about the dataset size. Each dynamic frame is one 3D volume. The input of the network is a 3D dynamic frame sequence with length = 5, i.e., the input size is 5 x 128 x 128 x 256, and thus data augmentation was performed in the temporal dimension. Due to the relatively small subject group size, we implemented a 9-fold cross-validation to comprehensively evaluate performance. A future direction of this work is to test the generalization ability on datasets from different scanners/institutes.
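The bookkeeping behind this reply can be made concrete with a short sketch. It is a guess at the data layout rather than the authors' pipeline: it assumes the 27 subjects and 19 frames per subject that Review #2 reads from the paper, treats the length-5 inputs as contiguous sliding windows (the reply does not say how sequences were sampled), and assigns subjects to the 9 folds round-robin. Both function names are hypothetical.

```python
def sliding_windows(num_frames, length):
    """Frame-index lists for every contiguous window of `length` frames."""
    return [list(range(start, start + length))
            for start in range(num_frames - length + 1)]


def subject_folds(subject_ids, num_folds):
    """Round-robin split of subjects, so no subject appears in two folds."""
    return [subject_ids[k::num_folds] for k in range(num_folds)]


if __name__ == "__main__":
    windows = sliding_windows(num_frames=19, length=5)   # assumed sampling scheme
    folds = subject_folds(list(range(27)), num_folds=9)  # assumed subject count
    print(len(windows), "windows per subject")
    print([len(fold) for fold in folds], "test subjects per fold")
```

Splitting at the subject level, rather than at the window level, keeps frames from one patient out of both training and test sets, which bears on the overfitting concern raised by R2.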

Both R2 and MR were concerned about the level of improvement as compared with B-convLSTM and questioned the clinically acceptable error bound. First, we did conduct a paired two-tailed t-test and found a significant difference between MCP-Net and B-convLSTM in torso NFE, whole-body Ki/Vb NMI, and Avg-to-ref SSIM. Thus, the improvement was consistent and showed the advantage of directly optimizing parameter estimation. Second, since dynamic PET is still an emerging application, the clinical tolerance still needs to be determined when larger-scale studies are published.
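A paired test can detect a small but consistent per-subject difference even when the group means and standard deviations look nearly identical, which is the situation R2 pointed out. The sketch below computes the paired t statistic with the standard library only; the per-subject values are placeholders invented for illustration and are not results from the paper. The two-tailed p-value would then come from a t distribution with the returned degrees of freedom, for example via `scipy.stats.ttest_rel` if SciPy is available.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """t statistic and degrees of freedom for paired samples `a` and `b`."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


if __name__ == "__main__":
    # Placeholder per-subject fitting errors, NOT values from the paper.
    baseline = [0.64, 0.71, 0.58, 0.66, 0.60]
    proposed = [0.62, 0.70, 0.56, 0.65, 0.58]
    t, dof = paired_t_statistic(baseline, proposed)
    print(f"t = {t:.3f} with {dof} degrees of freedom")
```

In this made-up example every subject improves by 0.01 to 0.02, far less than the spread between subjects, yet the differences are consistent enough to give a large t statistic.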

Both R3 and MR asked about the lack of motion ground truth and the details of the motion simulation test. As we mentioned in the paper, due to the lack of ground truth motion vectors in whole-body real-patient datasets, we applied patient-derived motion field predictions to the selected “motion-free” frames of another subject as the direct evaluation. Both the motion vectors and the selected “motion-free” frames were derived from another well-trained deep learning model to preserve realistic motion and avoid model bias in evaluation. Due to time limitations, we included a small number of simulation cases to show a preliminary trend. Comprehensive evaluations with a larger number of cases will be investigated in future work.


