
Authors

Aofan Jiang, Chaoqin Huang, Qing Cao, Shuang Wu, Zi Zeng, Kang Chen, Ya Zhang, Yanfeng Wang

Abstract

Electrocardiogram (ECG) is a widely used diagnostic tool for detecting heart conditions. Rare cardiac diseases may be underdiagnosed using traditional ECG analysis, considering that no training dataset can exhaust all possible cardiac disorders. This paper proposes using anomaly detection to identify any unhealthy status, with normal ECGs solely for training. However, detecting anomalies in ECG can be challenging due to significant inter-individual differences and anomalies present in both global rhythm and local morphology. To address this challenge, this paper introduces a novel multi-scale cross-restoration framework for ECG anomaly detection and localization that considers both local and global ECG characteristics. The proposed framework employs a two-branch autoencoder to facilitate multi-scale feature learning through a masking and restoration process, with one branch focusing on global features from the entire ECG and the other on local features from heartbeat-level details, mimicking the diagnostic process of cardiologists. Anomalies are identified by their high restoration errors. To evaluate the performance on a large number of individuals, this paper introduces a new challenging benchmark with signal point-level ground truths annotated by experienced cardiologists. The proposed method demonstrates state-of-the-art performance on this benchmark and two other well-known ECG datasets. The benchmark dataset and source code are available at: https://github.com/MediaBrain-SJTU/ECGAD

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43907-0_9

SharedIt: https://rdcu.be/dnwb7

Link to the code repository

https://github.com/MediaBrain-SJTU/ECGAD

Link to the dataset(s)

https://physionet.org/content/ptb-xl/1.0.3/

https://www.physionet.org/content/mitdb/1.0.0/

https://www.cs.ucr.edu/~eamonn/discords/


Reviews

Review #3

  • Please describe the contribution of the paper

    The paper proposes an ECG anomaly detection framework that uses a two-branch autoencoder architecture. One branch focuses on global features from the entire ECG signal and the other on local features, i.e., heartbeat-level details. The global and local features are concatenated and passed to a multi-scale cross-attention module to learn a robust representation of the signal, which is then used to restore the sequence. Anomalies in the ECG are identified using the restoration error. The global and local feature branches use masking, and the network is trained on an "in-painting" task; hence, signals with a high restoration error in the in-painting task are deemed anomalous. Dataset: Three publicly available ECG datasets were used to evaluate the proposed method, including PTB-XL, MIT-BIH, and Keogh ECG. The authors provide signal point-level annotations of 400 ECGs from the PTB-XL dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. A solid framework that analyzes the data at both global and local scales, mimicking how human experts detect anomalies.
    2. Including masking and in-painting to detect anomalies is a simple and low-cost solution data-wise.
    3. This model could pave the way for using existing datasets effectively without extra annotations.
    4. The model has the potential to be deployed in real-world applications.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Like most machine learning models, the proposed network is only as good as the dataset used in training. Especially in this case, the model only trains on normal ECG signals. If the model has not seen certain normal ECG patterns during the training, it may tag them as abnormal. See questions for more details.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have mentioned that the benchmark dataset and code will be made available. They use public datasets to test the proposed method.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Questions:

    1. Since the model is only trained on normal ECG signals, it is important to quantify how the training data variability affects the model’s performance. a. Were any ablation experiments done to study this? b. What happens when the model encounters a normal signal at inference but restores it with high error?

    2. A good way to test the robustness and generalization of the model could be to train on one dataset and test on another (perhaps after some fine-tuning for the new dataset). a. Do such experiments seem feasible to the authors? b. How do the authors expect the model to perform under these circumstances (i.e., testing on unseen normal and abnormal ECGs)?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes a very straightforward solution for anomaly detection. Although there are some questions regarding the model’s generalization, the proposed method is a low-cost solution that does not require ground-truth annotation of the ECG signals and could help in utilizing existing datasets. The model is a step in the right direction for the problem at hand and would benefit from measures that make it more robust in real-world scenarios.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors have highlighted the challenging characteristics of electrocardiogram (ECG) anomaly detection, which are the substantial inter-individual differences and the presence of anomalies in both global rhythm and local morphology. To address these challenges, the authors have introduced a multi-scale cross-attention restoration framework for ECG anomaly detection and localization. The proposed framework employs a two-branch autoencoder whose branches focus on global and local features, respectively. During the learning process, cross-attention is used to consider both features simultaneously, resulting in improved performance in various types of anomaly detection tasks (e.g., patient-level, heartbeat-level, signal point-level) compared to competing methods. Moreover, the authors have proposed a new and challenging benchmark with signal point-level ground truths, which are annotated by experienced cardiologists.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed learning framework is quite straightforward, easy to follow, and robust to the challenges mentioned by the authors. The authors have also conducted extensive experiments to demonstrate the effectiveness of the proposed method in handling various types of anomaly detection tasks. Moreover, the proposed benchmark with signal point-level ground truths can serve as a useful resource for future studies in ECG anomaly detection.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper lacks some necessary details on the method and implementation. Additionally, the analysis mainly relies on simple numerical comparisons, without delving deeper into the reasons behind the results.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The proposed benchmark dataset and source code will be made publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. Missing details
    a) Trend Generation Module (page 4): The trend generation method, including the average window and difference window normalization shown in Fig. 1, needs further explanation (ideally with formulas) and clarification. Additionally, it appears that the x_t used in the TGM is a residual rather than a trend, as defined in time series analysis. The authors should clarify this terminology.
    b) Competing methods: A brief introduction of the competing methods, including their differences from the proposed method, should be provided.
    c) For the implementation details, how is the mask implemented?
    d) The paper lacks detailed analysis of the results beyond simple numerical comparisons. For instance, in the Anomaly localization paragraph (Table 3), it would be helpful to analyze why each method works and how its performance relates to the characteristics of each patient.
    e) Regarding the ablation study for the effectiveness of individual components in Section 3.2: how was the model implemented when each component was not in use? For example, how was the model’s performance measured when no component was used?

    2. Questions
    a) Is there a reason why L_trend doesn’t use the uncertainty-aware restoration loss?
    b) Regarding the claim “This process guides global feature learning using time-series trend information, emphasizing rhythm characteristics while de-emphasizing morphological details.”: can the authors explain how utilizing the residual information x_t emphasizes the rhythm while de-emphasizing the morphological details?
    c) Can the authors explain why uncertainty values are needed in the anomaly score measurement (Eq. 3)? The reviewer is concerned that uncertainty values may make the anomaly score inconsistent. For example, according to Eq. 3, when the residual error (the numerator of the first two terms) is the same, the larger the uncertainty value, the smaller the anomaly score. However, it seems that a larger uncertainty value means the value is harder to restore, which would indicate an anomaly.
    d) Can the authors provide an ablation study for the trade-off parameters of the total loss (Eq. 2)?

    3. Minor comments
    a) Can the authors provide a reference for self-attention?
    b) Is there a specific meaning of k in d_k? If it is just a fixed feature dimension, it would be more straightforward to use D+d.
    c) On page 4, line 12, the notation for the two outputs of the decoder may need clarification.
    d) It would be helpful to include a figure showing examples of the patient-level, heartbeat-level, and signal point-level annotations to illustrate the value of the proposed dataset, especially for readers who may not be familiar with ECG data.
    e) The authors’ mention of inference resources in the last sentence on page 5 is very practical and commendable.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite some missing details and analysis limited to simple numerical comparisons, the paper is well organized and could be further improved, and the proposed learning framework and benchmark show promising results and potential for future studies in ECG anomaly detection.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    This paper introduces a novel multi-scale cross-restoration framework for ECG anomaly detection and localization that considers both local and global ECG characteristics. The proposed framework employs a two-branch autoencoder to facilitate multi-scale feature learning through a masking and restoration process, with one branch focusing on global features from the entire ECG and the other on local features from heartbeat-level details. The proposed method demonstrates state-of-the-art performance on a benchmark developed by the authors and two other well-known ECG datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method seems novel and technically sound. The paper is well written and easy to follow. The proposed method achieves SOTA performance on multiple datasets. The authors developed a new benchmark. The benchmark dataset and source code will be made publicly available.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    What is the main difference between restoration and reconstruction, and what is the advantage of using the former instead of the latter? More explanation may be needed.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Very good. The benchmark dataset and source code will be made publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    What is the main difference between restoration and reconstruction, and what is the advantage of using the former instead of the latter? More explanation may be needed.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method seems novel and technically sound. The paper is well written and easy to follow. The proposed method achieves SOTA performance on multiple datasets. The authors developed a new benchmark. The benchmark dataset and source code will be made publicly available.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    I agree with the three reviewers, who speak highly of the paper. I have recommended this paper be considered for an oral presentation and, if eligible, a Young Scientist Award. It is therefore critical to address all reviewer critiques to the best of your ability, especially the implementation details and the questions listed by the three reviewers.




Author Feedback

We thank all reviewers for their insightful comments and favorable consideration (ranking 1 in each reviewer’s stack). The responses to reviewers’ questions are listed below.

To Reviewer #1 (Strong Accept): Q1: Restoration vs. reconstruction: Reconstruction takes an entire sample as input and aims to output the input sample. Restoration takes a partially masked sample as input and aims to recover the masked portion based on the unmasked part. Reconstruction can thus be seen as a special case of restoration with a 0% masking ratio. Restoration is context-aware and captures temporal dependencies, which is crucial for time-series analysis. As shown in Table 5, reconstruction performs worse than most restoration settings (80.2% vs. 82.9%-86.0%).
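The distinction drawn in this reply can be sketched with a toy masking routine. All names here are illustrative, and a simple linear interpolator stands in for the trained autoencoder: restoration scores the model on a hidden span recovered from context, while a 0% masking ratio degenerates to plain reconstruction.

```python
import numpy as np

def restoration_error(signal, restore_fn, start, length):
    """Mask a contiguous span, restore it from the unmasked context,
    and return the mean squared error on the masked span.
    With length == 0 (a 0% masking ratio) the task degenerates to
    plain reconstruction of the full, unmasked signal."""
    mask = np.ones(len(signal), dtype=bool)
    mask[start:start + length] = False            # False = masked points
    restored = restore_fn(np.where(mask, signal, 0.0), mask)
    target = ~mask if length > 0 else mask        # score masked span, or all
    return float(np.mean((restored[target] - signal[target]) ** 2))

def interp_restore(masked_signal, mask):
    """Stand-in restorer: linear interpolation across the masked gap."""
    idx = np.arange(len(masked_signal))
    return np.interp(idx, idx[mask], masked_signal[mask])

t = np.linspace(0, 4 * np.pi, 400)
normal = np.sin(t)                                # smooth "normal" rhythm
abnormal = normal.copy()
abnormal[150:200] += 1.0                          # local morphological anomaly

# The anomalous signal is harder to restore from its context, so its
# restoration error, and hence its anomaly score, comes out higher.
e_normal = restoration_error(normal, interp_restore, 140, 70)
e_abnormal = restoration_error(abnormal, interp_restore, 140, 70)
print(e_abnormal > e_normal)
```

Under this framing, a context-free reconstruction (length 0) of an unmasked signal can be trivially perfect, which is why the rebuttal argues restoration is the more informative training signal.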

To Reviewer #2 (Weak Accept): Q1: Missing details: a) About TGM: TGM applies an average window (size: 10, stride: 1) for signal smoothing, followed by subtraction of adjacent smoothed values (Sec. 2.1, Fig. 1). This aggressive averaging removes morphological details, retaining trend information that reflects signal changes within a sampling interval. b) Competing methods: Representative SOTA methods were selected for comparison, including reconstruction-based [1, 18, 25], GAN-based [11], and SSL-based [29] approaches. To the best of our knowledge, this is the first restoration-based ECG AD method considering multi-scale ECG characteristics. c) Mask used: The ECG signal is multiplied by a binary temporal mask, setting the values in masked regions to zero. d) Detailed analysis of each patient’s result: Fig. 2 visualizes localization results for 6 individual patients with different heart diseases, compared with a SOTA approach, analyzing how each method localizes various anomaly types. e) Ablation setting: When none of the components are used, the method becomes an ECG reconstruction approach with a naive L2 loss and no cross-attention over multi-scale data. Q2: Necessity of uncertainty values: Uncertainty values assess restoration difficulty and reflect noise in the signal, such as muscle tremors or electrical disturbances from devices, which are unrelated to anomalies. Noise-affected regions exhibit high restoration errors, so their anomaly scores need to be down-weighted (Eq. 3). A model trained on normal data may exhibit over-confidence when predicting uncertainties for anomalous regions, resulting in high restoration errors but low (instead of high) uncertainty values [12]. Thus, uncertainty itself cannot be used to evaluate anomalies, contrary to the reviewer’s intuition. Q3: About TGM and L_trend: As detailed in Q1(a), TGM applies a smoothing process that filters out morphological details and a subtraction operation that emphasizes the signal’s changing trend. This process effectively reduces noise within the smoothed trend data, so there is no need to measure noise, eliminating the requirement for an uncertainty-aware loss for L_trend.
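The two mechanisms discussed in this reply can be sketched as follows. Only the window size/stride and the error-divided-by-uncertainty shape come from the rebuttal; the function names and exact weighting form are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def trend_generation(x, window=10):
    """TGM sketch per the rebuttal: smooth with a moving-average window
    (size 10, stride 1), then subtract adjacent smoothed values, which
    suppresses local morphology and keeps the changing trend."""
    kernel = np.ones(window) / window
    smoothed = np.convolve(x, kernel, mode="valid")  # average window, stride 1
    return np.diff(smoothed)                         # adjacent-value differences

def uncertainty_weighted_score(residual, sigma2, eps=1e-6):
    """Anomaly-score sketch in the spirit of Eq. 3: divide the squared
    restoration error by the predicted uncertainty, so noisy-but-normal
    regions (high sigma^2) are down-weighted rather than flagged."""
    return residual ** 2 / (sigma2 + eps)

rng = np.random.default_rng(1)
x = np.sin(np.linspace(0, 2 * np.pi, 200)) + 0.2 * rng.standard_normal(200)
trend = trend_generation(x)
print(trend.shape, trend.std() < x.std())            # trend is much smoother

# Same restoration error, higher uncertainty -> lower anomaly score,
# which is exactly the behavior Reviewer #2 questions in comment 2(c).
print(uncertainty_weighted_score(0.5, 1.0) < uncertainty_weighted_score(0.5, 0.1))
```

The down-weighting makes sense only because, as the rebuttal notes, the model tends to be over-confident (low predicted uncertainty) on truly anomalous regions, so genuine anomalies are not suppressed by the division.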

To Reviewer #3 (Accept): Q1: Effect of data variability: The impact of data variability on performance is evident. Compared to MIT-BIH, PTB-XL has more patients (8167 vs. 44) and a wider age range (0-90 vs. 22-89), representing a more challenging task due to significant individual differences and greater sample variation, and this results in decreased performance (86.0% vs. 96.9%). Occasional misclassification of normal samples is expected in such challenging situations. Addressing data variability is crucial, which is why we proposed a large-scale benchmark to inspire further research. Q2: Validation across datasets: The suggestion is insightful but requires further research. Different ECG datasets collected with various devices exhibit significant differences in signal characteristics, such as the number of leads (conductive pads attached to the skin at various positions) and the sampling frequency. These variations hold unique physical meanings, making it currently impractical to directly transfer knowledge between datasets.


