
Authors

Geoffroy Oudoumanessah, Carole Lartizien, Michel Dojat, Florence Forbes

Abstract

Anomaly detection in medical imaging is a challenging task in contexts where abnormalities are not annotated. This problem can be addressed through unsupervised anomaly detection (UAD) methods, which identify features that do not match with a reference model of normal profiles. Artificial neural networks have been extensively used for UAD but they do not generally achieve an optimal trade-off between accuracy and computational demand. As an alternative, we investigate mixtures of probability distributions whose versatility has been widely recognized for a variety of data and tasks, while not requiring excessive design effort or tuning. Their expressivity makes them good candidates to account for complex multivariate reference models. Their much smaller number of parameters makes them more amenable to interpretation and efficient learning. However, standard estimation procedures, such as the Expectation-Maximization algorithm, do not scale well to large data volumes as they require high memory usage. To address this issue, we propose to incrementally compute inferential quantities. This online approach is illustrated on the challenging detection of subtle abnormalities in MR brain scans for the follow-up of newly diagnosed Parkinsonian patients. The identified structural abnormalities are consistent with the disease progression, as accounted by the Hoehn and Yahr scale.
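As a hypothetical illustration of the incremental idea described in the abstract, the sketch below implements a generic online EM for a Gaussian mixture in NumPy: running averages of the sufficient statistics are updated one observation at a time with a decreasing step size, so memory stays constant regardless of data volume. This is not the authors' JAX implementation; all names, the step-size schedule, and the initialization are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def online_em_gmm(stream, K, d, n_steps, mu0=None):
    """Online EM for a K-component Gaussian mixture: keep running averages of
    the sufficient statistics and refresh the parameters in closed form, so
    memory stays O(K * d^2) regardless of how many points are streamed."""
    gamma = lambda t: (t + 2) ** -0.6          # decreasing step size (assumed)
    pi = np.full(K, 1.0 / K)
    mu = np.array(mu0, float) if mu0 is not None else rng.normal(size=(K, d))
    Sigma = np.stack([np.eye(d)] * K)
    # Running sufficient statistics: s0 ~ E[z], s1 ~ E[z y], s2 ~ E[z y y^T].
    s0 = pi.copy()
    s1 = mu * s0[:, None]
    s2 = (Sigma + np.einsum("kd,ke->kde", mu, mu)) * s0[:, None, None]
    for t in range(n_steps):
        y = next(stream)                       # one observation from the stream
        # E-step: responsibilities for this single observation.
        logp = np.array([
            -0.5 * (y - mu[k]) @ np.linalg.solve(Sigma[k], y - mu[k])
            - 0.5 * np.linalg.slogdet(Sigma[k])[1] + np.log(pi[k])
            for k in range(K)
        ])
        r = np.exp(logp - logp.max())
        r /= r.sum()
        # Stochastic-approximation update of the sufficient statistics.
        g = gamma(t)
        s0 = (1 - g) * s0 + g * r
        s1 = (1 - g) * s1 + g * r[:, None] * y
        s2 = (1 - g) * s2 + g * r[:, None, None] * np.outer(y, y)
        # M-step: parameters in closed form from the running statistics.
        pi = s0 / s0.sum()
        mu = s1 / s0[:, None]
        Sigma = s2 / s0[:, None, None] - np.einsum("kd,ke->kde", mu, mu)
        Sigma += 1e-6 * np.eye(d)              # numerical jitter
    return pi, mu, Sigma

# Toy stream: two well-separated 2-D Gaussian clusters.
def toy_stream():
    while True:
        c = rng.integers(2)
        yield rng.normal(loc=(-3.0, -3.0) if c == 0 else (3.0, 3.0), scale=0.5)

pi, mu, Sigma = online_em_gmm(toy_stream(), K=2, d=2, n_steps=5000,
                              mu0=[[-1.0, -1.0], [1.0, 1.0]])
```

Only the running statistics `s0`, `s1`, `s2` are kept between iterations, which is the key difference from batch EM, where all responsibilities must be held in memory at once.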

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43898-1_40

SharedIt: https://rdcu.be/dnwBA

Link to the code repository

https://github.com/geoffroyO/onlineEM

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    Anomaly detection in medical imaging is a challenging task that can be addressed through unsupervised anomaly detection (UAD) methods based on mixtures of probability distributions, which identify features that do not match a reference model of normal profiles.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed online method for incrementally computing inferential quantities offers a clear advantage over the existing approach, as explained mathematically in Section 3.
    2. The formulation of the inferential quantities is presented with a good flow and a clear approach.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The paper tends to be too theoretical; few new scientific findings are presented. It largely agrees with the findings of other research.
    2. The equation on page 6 is not properly labeled.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility checklist has been filled in, but no code was attached.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. The proposed method has little demonstrated effect on the subject; the authors should include further experiments to support the theory.
    2. There is room for improvement: more conclusive experiments could be reported, including a statement of the hardware and software used.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is too theoretical and does not clearly explain how the inferential quantities are incrementally computed, i.e., how the computation is carried out.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision
    1. In the rebuttal section, the authors remark: “Using follow-up data and the Hoehn & Yahr scale of patients, we exhibited a clear correlation between the number of anomalies and the progression of the pathology. This highlights the potential of our approach for tracking the development and severity of the disease and confirms the pathophysiological expectations in terms of the most impacted structures. We propose to clarify Figure 1 (R2) to better highlight the most impacted brain structures and emphasize the difference in the numbers of detected anomalies.” Where is this correlation value stated or written?



Review #2

  • Please describe the contribution of the paper

    This paper proposed an unsupervised anomaly detection (UAD) algorithm which uses mixture of probability distributions (Gaussian mixtures and multiple scale t-distributions) as a frugal alternative (in terms of computational demand) to deep neural networks to detect abnormalities in MRI brain scans of patients with newly diagnosed Parkinson’s disease. It uses an online Expectation-Maximization (EM) algorithm which scales better than the usual EM algorithm to large data volumes. The algorithm first learns a reference model (normal model) and then takes a decision whether the data is an outlier or not by considering the proximity to the reference model.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed algorithm is novel in that it can identify subtle anomalies in the MRI scans of newly diagnosed Parkinson’s patients according to a reference scale. The model also has fewer parameters, and hence lower memory usage, compared to deep neural networks. The theoretical explanation of the proposed approach is very good.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    - Why does the proximity measure need to be large when one of the dimensions is well explained by the reference model? Proximity intuitively means closeness to a distribution or model. Is it because the inverse of the Mahalanobis distance is used? The motivation behind the proximity measure is not clear.
    - In Section 4, how are M and the FPR decided? Are these based on observations on some validation set?
    - The classification results should have been presented in the form of a table.
    - With the two mixture models, what are the classification results for the individual subcortical structures that are most impacted in the early stages of PD?
    - A better explanation of Figure 1 should be provided.
    - A comparison of the number of parameters of the two mixture models should have been made in the efficiency section, together with the reduction in parameters achieved by this online EM algorithm compared to a deep neural network.
    - Nowadays, with advanced and efficient GPUs available, training deep neural networks is not very difficult and would not be very time-consuming, especially for the dataset considered in this paper. Nevertheless, a comparison with deep neural networks should be made to assess the trade-off between computational power, memory usage, and accuracy.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This paper is difficult to replicate due to the advanced mathematical operations. However, since the authors have stated that the code will be made available, the method should be replicable.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Please refer to the weaknesses section for the detailed comments.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although this paper has a sound theoretical analysis, its application appears limited and specific to the one presented in the paper. Also, a comparison to state-of-the-art deep neural networks should be made to understand the trade-off between accuracy and compute power for those networks and how the proposed approach helps.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors have addressed most of my concerns in the rebuttal. They have also agreed to add some deep learning baselines for comparison and discuss the trade-off between memory consumption and performance. Considering that the proposed changes would be included in the final version, I change my rating to weak accept.



Review #3

  • Please describe the contribution of the paper

    In this paper, the authors propose to make use of an unsupervised framework to identify abnormalities in Parkinson’s disease data using an online EM algorithm. Adapting a conventional EM algorithm to an online version makes the computation feasible, allowing for an alternative to complex deep learning based anomaly detection.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper proposes a novel way to address unsupervised anomaly detection - which often comes up in today’s medical data settings where we have more data but find it challenging to label all data
    • The proposed method builds on well-known methods, and adapts them to the frugal mindset in an interesting way - by proposing an online algorithm
    • The evaluation method to compare the results is also interesting - using the available HY values
    • The paper is well-written overall
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While the method is well-motivated and explained, the results are not analysed in sufficient detail.

    • It would have been interesting to present more visual results, such as graphs/brain-overlay-plots highlighting the results
    • It would also have been good to have a tabular comparison of the proposed method with existing methods, including the computational resource demands
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The exact steps of the EM algorithm are not outlined clearly. The data used is publicly available. It could be possible to reproduce the algorithm on the data, but the parameters used, etc., are not known.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The main aspect of the paper that could be improved is the experimental evaluation. It would be good if the authors could compare their proposed work with other existing works in the field. Also, further images/plots to highlight the importance of the obtained results would be useful to understand the value of the proposed method.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite the lack of a rigorous experimental evaluation, the paper proposes an interesting idea to solve a growing problem in working with medical imaging data.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    7

  • [Post rebuttal] Please justify your decision

    The authors seem to address the majority of the concerns raised by the other reviewers. I maintain my decision.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors introduce a lightweight EM-based approach for anomaly detection in medical imaging datasets. The math that is presented is sound and well presented. Overall, the paper is written clearly as well. I commend the effort of the authors to develop a “Green AI” method, as this seems to be the main goal. It would therefore help to highlight the energy consumption decrease of the proposed method over deep learning baselines as a main results figure (perhaps as a table). Thus, an overall comment would be to revise the paper to emphasize more results/metrics to engage the general MICCAI reader. Other excellent comments have been provided by the reviewers as well, which I recommend the authors take note of in a rebuttal.




Author Feedback

We thank the reviewers for their comments and for underlining that we propose a novel (R2-3) interesting idea to solve a growing problem in medical imaging (R3) in a frugal way (mR) and in a clear mathematical framework (mR, R1-2).

A common concern is the too theoretical content of the paper (R1) with a too concise experimental analysis (R1-2-3). Indeed, the goal was to highlight the Green AI topic, certainly also of interest to the MICCAI community, with the development of “frugal by design” methods, both for algorithm conception and implementation. However, the point was not to say that Deep Learning (DL) solutions are not possible (R2), but that there may be an interesting trade-off to investigate between energy/memory cost and performance.

We acknowledge that the work is thus preliminary in terms of experimental validation. We limited ourselves to one Parkinson’s Disease (PD) example to fit within the page limit. Still, this is an important example, both to support our message on more frugal solutions and to motivate others to look in this direction, while providing interesting and new results on PD-induced brain alterations, particularly difficult to detect at the early stages of PD (R1). To specify (R1-2), the goal was to identify the most impacted subcortical structures and to gain insights on the disease progression. Using follow-up data and the Hoehn & Yahr scale of patients, we exhibited a clear correlation between the number of anomalies and the progression of the pathology. This highlights the potential of our approach for tracking the development and severity of the disease and confirms the pathophysiological expectations in terms of the most impacted structures. We propose to clarify Figure 1 (R2) to better highlight the most impacted brain structures and emphasize the difference in the numbers of detected anomalies.

As suggested (R2, mR), we propose to add a comparison with other DL baselines, DAGMM (Zong et al, 2018), CFLOW-AD (Gudovskiy et al, 2021), reconstruction error (RE), each combined with 2 autoencoders, lightweight AE (LAE) already used for PD (Pinon et al, 2023) and Resnet18. Note that for Resnet18 the number of parameters is ~23M. Then, as suggested by all reviewers, we propose to insert a table reporting for training/inference: hardware, energy and memory (DRAM peak) consumptions, running time, number of model parameters (R2) and performance (Gmean). As an example, for 70M voxels:

Method               | Hardware | Energy (train/infer) | DRAM peak (train/infer) | Time (train/infer) | #Params | Gmean
Our online mixtures  | CPU      | 100/30 kJ            | 800/100 MB              | 1/20 min           | 130     | 0.66
LAE + RE             | GPU      | 5000/8000 kJ         | 25/25 GB                | 1h20/3h            | 5300    | 0.61
LAE + DAGMM          | GPU      | 6300/12000 kJ        | 27/27 GB                | 2h20/4h            | 10058   | 0.56

The consumption metric is computed with PowerAPI, which uses the RAPL energy reporting available on Intel processors, and with the NVIDIA Management Library on NVIDIA GPU devices. In terms of hardware (R1), experiments, as mentioned p.7, were done on a workstation with an Intel i7-4790 CPU @ 3.60GHz and 16GB of RAM. For software (R1), we developed in Python with the JAX library. The code is simple and enables fast mixture learning with up to 70M data points in less than a minute, vs 17h for one Resnet18 epoch run on an NVIDIA V100, indicating the efficiency and scalability of our approach. We will provide a github link for reproducibility.

Other clarifications (R2): the proximity measure is used to detect items too far from the reference model. Items with a low proximity, i.e., a large distance, are considered abnormal. In Section 4, M=3 is the feature dimension, with (T1w, FA, MD) values for each voxel. The FPR is a tuned parameter, but its specific value does not impact the results much. More specifically (R2-3), there are not many hyperparameters to be set, namely the FPR and the number of mixture components (K), which is set automatically with the slope heuristic. The method is easier to use than some DL models, as all steps are dictated by the model and comparatively only one hyperparameter (the FPR) needs tuning.
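The proximity-based decision rule described in this clarification can be illustrated with a small, hypothetical sketch (not the authors' code; the use of the mixture log-likelihood as the proximity score and all names are assumptions): score each sample under the reference mixture, then set the decision threshold as the FPR-quantile of scores computed on normal data, so that a chosen fraction of normal samples is wrongly flagged.

```python
import numpy as np

def gmm_loglik(X, pi, mu, Sigma):
    """Per-sample log-likelihood under a Gaussian mixture, used here as the
    proximity score: low values mean far from the reference (normal) model."""
    N, d = X.shape
    logp = np.empty((N, len(pi)))
    for k in range(len(pi)):
        diff = X - mu[k]
        sol = np.linalg.solve(Sigma[k], diff.T).T       # Sigma_k^{-1} (x - mu_k)
        maha = np.einsum("nd,nd->n", diff, sol)         # Mahalanobis distances
        _, logdet = np.linalg.slogdet(Sigma[k])
        logp[:, k] = np.log(pi[k]) - 0.5 * (maha + logdet + d * np.log(2 * np.pi))
    m = logp.max(axis=1, keepdims=True)                 # log-sum-exp over components
    return (m + np.log(np.exp(logp - m).sum(axis=1, keepdims=True))).ravel()

def fit_threshold(scores_normal, fpr=0.02):
    """Threshold at the fpr-quantile of scores on normal data, so that a
    fraction `fpr` of normal samples is (wrongly) flagged as abnormal."""
    return np.quantile(scores_normal, fpr)

rng = np.random.default_rng(1)
# Toy reference model: a single-component 'mixture' keeps the sketch short.
pi, mu, Sigma = np.array([1.0]), np.zeros((1, 3)), np.eye(3)[None]
normal = rng.normal(size=(10000, 3))                    # matches the reference model
anomalous = rng.normal(loc=4.0, size=(50, 3))           # shifted away from it
tau = fit_threshold(gmm_loglik(normal, pi, mu, Sigma), fpr=0.02)
flags = gmm_loglik(anomalous, pi, mu, Sigma) < tau      # True = detected anomaly
```

By construction, roughly 2% of normal samples fall below `tau`, while clearly shifted samples score far lower and are flagged, which is why the FPR is the only decision-stage parameter to tune.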




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal has satisfied most reviewer concerns. Therefore, the rating of this paper has been raised to an accept.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors propose a lightweight, unsupervised, non-DL method for abnormality detection in medical images and evaluate it on brain scans of Parkinson’s patients. The method relies on an adapted online version of an EM algorithm, which makes all the computations feasible. I like the idea of also considering non-DL methods in our domain, and the authors nicely motivate this in terms of Green AI (e.g., designing traditional methods that outperform or perform on par with DL methods while being more resource efficient). As far as I can see, the proposed method is novel and, in my mind, of great interest to the MICCAI community. The major weakness of the paper is its rather limited evaluation (only one disease/application scenario considered), but I think that is justifiable for a methodological paper.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The work presents an online EM algorithm for unsupervised subtle anomaly detection, which is lightweight and computationally more efficient compared to existing deep learning methods. The work is novel and interesting, with theoretical support, and presents an alternative to mainstream deep learning works. The rebuttal provided some statistics on the energy and memory consumption of the work compared to other deep learning approaches, which will strengthen the claim of the work's contribution. The major weakness of the work lies in its lack of sufficient experimental validation and its limitation to only one application. However, the general methodology is novel and will be of interest to the community, and the efforts towards “Green AI” will be appreciated. More experimental validation is suggested for future work.


