
Authors

Ashay Patel, Petru-Daniel Tudosiu, Walter Hugo Lopez Pinaya, Olusola Adeleke, Gary Cook, Vicky Goh, Sébastien Ourselin, M. Jorge Cardoso

Abstract

Cancer is a highly heterogeneous condition best visualised in positron emission tomography. Due to this heterogeneity, a general purpose cancer detection model can be built using unsupervised learning anomaly detection models. While prior work in this field has showcased the efficacy of abnormality detection methods (e.g. Transformer-based), these have shown significant vulnerabilities to differences in data geometry. Changes in image resolution or observed field of view can result in inaccurate predictions, even with significant data pre-processing and augmentation. We propose a new spatial conditioning mechanism that enables models to adapt and learn from varying data geometries, and apply it to a state-of-the-art Vector-Quantized Variational Autoencoder + Transformer abnormality detection model. We showcase that this spatial conditioning mechanism statistically-significantly improves model performance on whole-body data compared to the same model without conditioning, while allowing the model to perform inference at varying data geometries.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43907-0_29

SharedIt: https://rdcu.be/dnwcG

Link to the code repository

N/A

Link to the dataset(s)

https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=93258287

https://wiki.cancerimagingarchive.net/display/Public/NSCLC+Radiogenomics


Reviews

Review #1

  • Please describe the contribution of the paper

    Proposes a spatial conditioning method for a VQ-VAE + Transformer network to detect abnormalities in images of varying geometries.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Novel idea of connecting the VQ-VAE + transformer with a spatial conditioning. The methods and the backgrounds are reasonably covered. Good ablation comparison with variations of the proposed architecture to highlight the contribution of individual components.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Major negatives: most of my concerns are in the results section.

    1. I understand that the motivation is not to improve the numbers. Still, the improvement provided by the proposed method does not seem very significant compared to 22, especially for the low-resolution case, and no statistical significance testing is conducted. Considering that there are some hyperparameters in the new network, the benefit is spurious.

    2. Continuing on the results: since the main motivation is to adapt to images with varying geometry and FOV, evaluation should also be performed with simple affine transforms of the input data, which would provide more guidance. Similarly, sensitivity to these perturbations should be measured. I believe this to be the key experiment for demonstrating the efficacy of the model, but it is missing.

    3. No ablation on the number of bins, which seems like an important hyperparameter. If not an ablation, there should at least be a comment on how the number 20 was chosen.

    Minor nitpicks

    • Please cite the original VQ-VAE paper
    • In the results, Figure 5 should be Figure 4
    • More clarity is needed on what exactly “healed” sequences are
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Good, but parameter sensitivity is not included.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The method presented here is a smart idea, and the motivation is well defined.

    However, the results section does not convincingly show the applicability of the method. The main motivation, inference on images with varying geometry, is not showcased in the results other than through cropped data. There should at least be a comparison of the method's sensitivity to geometric variation to see how the spatial conditioning performs.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think the idea is smart, but the results and ablations do not support the motivation of the method.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    Changing my decision to weak accept. The authors state in the rebuttal that they performed ablations on rotations, which, if added to the paper, will provide more trust in the method. They also provided context on the hyperparameters.

    I still feel that a sensitivity analysis is missing, but with the above additions I think it is worthy of acceptance.



Review #2

  • Please describe the contribution of the paper

    The study proposes an abnormality detection model that is invariant to the voxel resolution, image dimensions and field of view of PET/CT scans. The model adapts CoordConv, which alleviates differences in spatial resolution while also conveying information about the FOV.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) The spatial conditioning on the VQ-VAE makes the model unsupervised and more generalizable, which allows the model to be more tolerant and practical in medical imaging. 2) This work defines abnormalities as deviations between the distribution of “healed” reconstructions and the observed data. This definition allows the abnormalities to be tokenized and understood by the VQ-VAE model.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The details of the experiments and implementation are missing. What is the dimension of the latent space, and would a larger dimension improve the results? What is the limitation/threshold on the size of the abnormal area that can be detected? How much variance in FOV can be tolerated? Where does the model fail?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The lack of experimental details hurts the reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    More details about the experimental setup. An ablation study on hyperparameters. Discussion of the failure cases and the limitations of the model.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper presented a good idea. But without more details in the implementation, it is hard for me to evaluate how successful the model is.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors have implementation details about the VQ-VAE and Transformer in their supplementary materials. It would be better to have them in the main paper. In addition, it is not clear how the statistical significance P-values were obtained. How many repeats were there? For each repeat, was the train-valid-test split redone, or were the variations due to data augmentation? Moreover, experiments/ablation studies are missing on the bin size, the sensitivity to geometric variation, the threshold on the size of the abnormal area to be detected, the level of variance tolerated in terms of FOV, and where the model fails.



Review #3

  • Please describe the contribution of the paper

    The authors propose a novel deep learning framework to perform abnormality detection in full-body PET scans. The paper’s main motivation is the difficulty in detecting cancer, an already highly heterogeneous condition, with the additional complication of inconsistent data geometry. The paper proposes a new spatial-conditioning approach to training a VQ-VAE- and Transformer-based model explored here. The resulting model shows improved detection performance and better robustness to diverse data geometries.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper is methodologically sound, clearly motivated, and well executed. The authors clearly show an existing gap in the literature and propose a well-thought-out solution. The experiments are setup to precisely test the hypothesis of the paper and strong results indicate the soundness of the method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While the paper is well-written for a highly technical audience immersed in deep learning literature, it may be harder to access for someone with a more clinical background. This is not an explicit weakness of the paper but could be improved to make it more accessible to a wider audience.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No issues with reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The font in the figures throughout the manuscript is too small.

    The Background contains too much content that seems better suited to a Methods section, while lacking medical context. Ideally, some of the medical detail from the introduction could be moved to the background, while methodological details from the background section could be moved to the methods section.

    Section 2.1 lacks clarity, especially when it comes to dimensionality throughout the network, for instance, lower-case h,w,d should be defined.

    In Section 2.2, the term should be “probability mass function”, not “probability density function”, since the latent tokens are discretized. Dice and Adam are not acronyms and should not be written in all capitals.

    The authors could describe what exactly doing a “raster scan of the latent” means.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This seems like a strong paper and a good fit for the conference. I am not confident enough in my ability to comprehensively judge the quality of the paper to give it an 8 but it certainly seems worthy of acceptance.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    7

  • [Post rebuttal] Please justify your decision

    While a slight rework of the presentation would have been appreciated, the overall paper is still strong and I disagree with some of the other reviews stating the results are too weak. The authors show that their method is robust and performs on par with or better than comparable methods. The contents of the original review still stand.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes a deep learning approach to abnormality detection in full-body PET scans using a VQVAE+Transformer network that is invariant to voxel resolution, image dimensions, and field-of-view (FOV). To address the challenge of inconsistent data geometry, the authors propose a spatial-conditioning method to train the model. The method adapts CoordConv to convey FOV information and alleviate differences in spatial resolution. The resulting model shows improved detection performance and robustness to diverse data geometries. While the paper is methodologically sound and well-executed, some concerns are raised about the results section. The improvement provided by the proposed method is insignificant compared to the baselines, especially for the low-resolution case, and no statistical significance testing is conducted. The evaluation should also be performed with simple affine transforms of the input data to provide more guidance, and sensitivity to perturbations should be measured to demonstrate the model’s efficacy. Additionally, no ablation is performed on the number of bins, which seems like an important hyperparameter. Furthermore, the results section does not convincingly demonstrate the method’s applicability to images with varying geometry. There should be a comparison of the method's sensitivity to geometric variation to see how the spatial conditioning performs. The details of the experiments and implementation are also missing, such as the dimension of the latent space and its effect on results, the threshold on the size of the abnormal area to be detected, the level of variance tolerated in terms of FOV, and where the model fails. In summary, while the proposed spatial conditioning method for the VQVAE+Transformer network shows promise for abnormality detection, further experiments and analysis are necessary to fully demonstrate its efficacy and applicability to images with varying geometry.




Author Feedback

We are pleased that all reviewers agree on the novelty and potential of our solution. We would further like to address some key points raised.

Regarding comments on the improvement provided not being significant: first, we would like to note that the purpose of this research is to rethink how we analyse images on a case-by-case basis. We are looking to generate a solution to deal with images in their native space by generating a model that is robust to variations in geometry with no degradation in performance, thus eliminating heavy pre-processing requirements and possible resampling issues/biases, in addition to maximising data utilisation. We believe our solution succeeds at this. Second, we would also like to point out that statistical tests were conducted and reported in the paper. In Table 1, statistically significant improvements (P<0.05) are underlined. Our method showed the best performance, and its values were significant in all scenarios (except for low-resolution). Additionally, we report more significant findings in the text of the results section. Although the results may not show a significant improvement in the low-resolution case, the motivation of this research is to generate invariance to geometries without performance degradation, so we need to look at the results holistically. The baseline models trained on Whole Body data are trained only on low-resolution data (as all data had to be registered to a group space). Even so, the VQ-VAE Transformer model trained only on this data shows performance just short of our approach, which is trained on both low- and high-resolution data, showing the capability of our model to maintain optimal performance. Additionally, for the high-resolution and cropped datasets, we see statistically significant improvements whilst running inference in the images’ native space without the need for registration/resampling, as required for baseline methods. There has also been no hyperparameter tuning for the spatial conditioning, i.e. the number of bins used, and as such further tuning of this parameter could yield greater performance.

Regarding the choice of 20 bins for the spatial conditioning and the lack of an ablation study, we agree this could be further explored. The motivation for choosing 20 bins was to roughly match the average dimension of the latent representation of images in the training data. We chose this value such that, for an average-sized Whole Body image, the size of its latent code would be approximately 20^3 and would therefore roughly align with the level of binning chosen (mentioned in section 3.2). However, although this is an appropriate value, we agree it warrants further exploration.
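
To make the binning argument above concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each latent voxel along one axis is assigned a coarse position bin from its normalised position within a whole-body reference extent. The function name `bin_coordinates` and the parameters `fov_start` and `fov_end` are hypothetical names introduced here for illustration only.

```python
def bin_coordinates(n_latent, fov_start=0.0, fov_end=1.0, num_bins=20):
    """Assign each latent index along one axis to a coarse position bin.

    Hypothetical illustration: fov_start and fov_end give the fraction of a
    whole-body reference extent that the image covers along this axis.
    """
    if n_latent <= 0:
        raise ValueError("n_latent must be positive")
    bins = []
    for i in range(n_latent):
        # Centre of latent voxel i as a fraction of the reference extent.
        centre = fov_start + (i + 0.5) / n_latent * (fov_end - fov_start)
        # Clamp so a centre at the upper edge still lands in the last bin.
        bins.append(min(int(centre * num_bins), num_bins - 1))
    return bins


# Average-sized whole-body latent (about 20 voxels per axis): one voxel per bin.
whole_body = bin_coordinates(20)
# Cropped scan covering the middle half of the body at the same sampling density.
cropped = bin_coordinates(10, fov_start=0.25, fov_end=0.75)
# Higher-resolution scan of the full body: two latent voxels share each bin.
high_res = bin_coordinates(40)
```

Under these assumptions, an average-sized whole-body latent maps one voxel to one bin, which is one way to read the rationale for 20 bins, while cropped or higher-resolution inputs reuse the same bin vocabulary with a different occupancy.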

For the remaining hyperparameters, we refer readers to [1]. That work extensively explores optimal codebook sizing in the VQ-VAE in addition to KDE parameters, and we take our hyperparameter choices from those results. We also note that the reviewers require further information on the experimental setup. Unfortunately, given the page limit, we only made this information available in our supplementary material. The supplementary material contains a detailed outline of the experimental setup, including model architectures, as well as training information on losses, batch sizes, epochs, optimisers and learning rates. We also outline training procedures related to data usage and augmentations. We agree that this information is of great importance and we will include as much of it as possible in the main paper.

Finally, regarding further evaluation, we have conducted experiments with various rotations, for which our model shows no degradation in performance as well as statistically significant performance improvements over baselines. If allowed by MICCAI, these results will be added to the final version of the paper.

[1] - Patel, A., et al. 2023. Cross Attention Transformers for Multi-modal Unsupervised Whole-Body PET Anomaly Detection. https://doi.org/10.59275/j.melba.2023-18c1.




Post-rebuttal Meta-Reviews

Meta-review #1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper presents a deep learning approach for abnormality detection in full-body PET scans using a VQVAE+Transformer network. The proposed method demonstrates invariance to voxel resolution, image dimensions, and field-of-view, enabling robust analysis of images on a case-by-case basis. By operating in the native image space, the approach eliminates the need for extensive pre-processing and avoids potential biases caused by resampling. The study successfully addresses reviewers’ concerns and offers a novel and clinically relevant solution to a significant problem. With the necessary revisions incorporated, the paper is well-positioned for publication at MICCAI, making a valuable contribution to the field of medical imaging analysis.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper proposes a whole-body tumor detection approach for PET images, using a VQVAE + Transformer model to address the challenge of handling images with varying fields of view. The work is novel and the application to whole-body tumor burden detection is also clinically relevant. The authors addressed reviewers’ concerns satisfactorily.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The reviewers have acknowledged the valuable contribution of this work, and I fully support its acceptance for publication. However, I strongly recommend that the authors address the issues raised during the rebuttal process in the final version of the manuscript. Specifically, the authors should provide a sensitivity analysis that is currently missing. Additionally, the implementation details about VQ-VAE and transformer should be included in the main paper. The statistical significance P-values were not clear, and the authors should conduct experiments and ablation studies on the bin size, sensitivity to geometric variation, threshold on the size of the abnormal area, tolerance for variance in FOV, and determine where the model fails. These issues need to be addressed before final acceptance of the paper.


