
Authors

Deval Mehta, Shobi Sivathamboo, Hugh Simpson, Patrick Kwan, Terence O’Brien, Zongyuan Ge

Abstract

In this work, we contribute towards the development of video-based epileptic seizure classification by introducing a novel framework (SETR-PKD), which can achieve privacy-preserving early detection of seizures in videos. Specifically, our framework has two significant components: (1) it is built upon optical flow features extracted from the video of a seizure, which encode the seizure motion semiotics while preserving the privacy of the patient; (2) it utilizes transformer-based progressive knowledge distillation, where knowledge is gradually distilled from networks trained on longer portions of video samples to ones that operate on shorter portions. Thus, our proposed framework addresses the limitations of current approaches, which compromise the privacy of patients by operating directly on the RGB video of a seizure and impede real-time detection by requiring the full video sample to make a prediction. Our SETR-PKD framework can detect tonic-clonic seizures (TCSs) in a privacy-preserving manner with an accuracy of 83.9% while they are only half-way into their progression.
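For concreteness, the sketch below illustrates the privacy-preserving input step described in the abstract: dense optical flow is computed between consecutive frames, so the downstream network only ever sees motion, never RGB appearance. This is a minimal sketch and not the authors' released code; the paper uses TV-L1 optical flow (available via opencv-contrib, e.g. cv2.optflow.createOptFlow_DualTVL1()), whereas Farneback flow is used here simply because it ships with core OpenCV.

```python
# Minimal sketch (not the authors' code): convert an RGB video into a stack of
# dense optical-flow fields so that appearance/identity is discarded and only
# motion is retained.
import cv2
import numpy as np

def video_to_flow(video_path: str) -> np.ndarray:
    """Return a (T-1, H, W, 2) array of per-pixel (dx, dy) motion vectors."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        raise IOError(f"Cannot read {video_path}")
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    flows = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Args: prev, next, flow, pyr_scale, levels, winsize, iterations,
        #       poly_n, poly_sigma, flags
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        flows.append(flow)
        prev_gray = gray
    cap.release()
    return np.stack(flows)
```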

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43904-9_21

SharedIt: https://rdcu.be/dnwGZ

Link to the code repository

https://github.com/DevD1092/seizure-detection

Link to the dataset(s)

https://github.com/DevD1092/seizure-detection


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper introduces a pipeline for video-based epileptic seizure classification. A transformer-based architecture is employed to boost accuracy. Instead of RGB, it cleverly uses only optical flow to protect patient privacy. Additionally, progressive knowledge distillation is designed to detect seizures more accurately and to enable early detection.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • To protect privacy, the method works on optical flow instead of RGB. This privacy preservation is helpful for patients.
    • It also proposes progressive knowledge distillation to exploit fewer input frames of the video.
    • The paper is well-organized.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • For early detection, the purpose is to detect seizures from as few segments as possible, so that we do not need to wait until the completion of a long seizure. To do that, the whole video is split into k segments, and the early-detection task is to classify from the lowest possible number (j) of partial segments. However, I wonder whether there is a metric available to measure the lowest j; I only find the metric for accuracy. (See the sketch after this list for one possible formulation.)
    • In Figure 2, what is the experiment setting for the baseline SETR with different input fractions? Is the SETR trained on the whole video frames, and then the fraction of frames directly fed to the trained transformer for testing?
    • It would be better to provide more details. For example, how is inference done? Are only the frames of one partial segment fed to the transformer, or are all consecutive segments computed one by one?
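    To make the first question concrete, below is a hypothetical sketch (not from the paper) of one way to report the lowest j directly alongside accuracy: for each test video, run the fraction-specific model on growing prefixes of the k segments and record the smallest fraction j/k at which the prediction is first correct. `predict_on_prefix` is a placeholder for whatever inference routine is actually used.

```python
# Hypothetical "earliness" metric to complement accuracy: the smallest prefix
# j/k of the k segments at which a video is first classified correctly.
from typing import Callable, List, Tuple

def earliest_correct_fraction(
    test_set: List[Tuple[object, int]],                 # (video sample, true label)
    predict_on_prefix: Callable[[object, float], int],  # label predicted from the first j/k of a video
    k: int = 8,
) -> List[float]:
    earliest = []
    for video, label in test_set:
        frac = float("nan")                             # stays NaN if the video is never classified correctly
        for j in range(1, k + 1):
            if predict_on_prefix(video, j / k) == label:
                frac = j / k
                break
        earliest.append(frac)
    return earliest                                     # e.g. report the mean/median over test videos
```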
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper has provided most of the details.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    It would be better to provide more details for the framework

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work provides a framework to detect epileptic seizures in a privacy-preserving manner. It can also perform early detection via progressive knowledge distillation. Although it is technically limited, I feel it is helpful for the community.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The concerns I raised are addressed in the rebuttal. I stick with my original positive rating.



Review #2

  • Please describe the contribution of the paper

    This paper addresses the problem of epileptic seizure classification from video monitoring data. The paper proposes an algorithm that relies on optical flow (OF) from videos to preserve patients’ privacy for predicting the type of seizure during its progression for an early detection. The main contribution is an ensemble of techniques (transformer architecture for video analysis and capturing temporal dependencies + knowledge distillation) for seizure classification from videos. The approach is evaluated on two datasets with binary classification tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Well-motivated problem: using video data has advantages to monitor patients but associated challenges for automated processing, including privacy issues and real-time operation.
    • The choice of architectures for video processing seems adequate and well supported.
    • Overall, the paper is technically sound. The framework is clearly explained, providing detailed steps and mathematical equations. The evaluation includes comparisons to other methods and support for the main findings.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • One of the challenges addressed in the paper is the real-time detection of seizures. However, the paper does not consider the computation time of the OF or processing times of SETR and the sequential processing during progressive distillation. SETR blocks only take 64 frames, and the longest instance has ~11k input frames.
    • No ablation of the loss functions for the knowledge distillation approach. How was it decided how/where to distill the knowledge: KL, patch-based, or both?
    • Unclear claims in sections 3.3 and 3.4: 1) On Page 7, in the evaluation of early detection performance: “better performance retention compared to LSTM-based techniques with a reduction in the fraction of input sample”. How is performance retention quantified and compared across methods? This is hard to appreciate from the extensive results presented in Table 1. 2) From the plots on Fig. 2, higher gains in performance with direct distillation when the knowledge gap is small are only observed in the in-house dataset. The gains look similar over all input fractions for the GESTURES dataset.
    • Limited discussion of some results: 1) Fig. 2, why is the purple line better than the yellow? Is either task harder? The task in the in-house dataset is seizure vs. normal, while in GESTURES it is between two types of seizures.
      2) All methods show a classification improvement as a larger fraction of the video sample is provided, which is expected. What is a reasonable accuracy for early detection? Also, the performance metrics are close to 100% when using the entire videos, suggesting that the classification task is easy.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper provides details on the models used and the corresponding hyperparameter choices. Details on data sharing and data splits might be missing. The OF dataset and the code could be released.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • Missing descriptions in the methods: the OF algorithm TV-L1; transfer learning from action recognition models trained on OF (on which type of data?); what type of features were extracted from the RGB videos in GESTURES? From a CNN, or hand-crafted?
    • Confusion between number of segments (k) in knowledge distillation and input fraction of video sample for early detection evaluation as both refer to partial samples of a video. On a related note, the following was not fully clear to me. When using distillation: does it always start from the FULL video? Or from the FULL available video (e.g., 1/4, 1/2, 3/4)?
    • To evaluate early detection, the transition from preictal activity to the actual seizure can assist in the identification of a seizure. The instances used to train the algorithms were limited to seizure periods.
    • Elaborate on what is “motion semiotics of seizures”. Is OF enough to capture such variations?
    • Suggestion to complement the related work: expand on the descriptions of the methods included for comparison in Table 1.
    • Presentation of the results: which approaches in Table 1 are LSTM-based? What about EgoAKD or GESTURES? Table 1 has a lot of information, what is the meaning of red numbers? Worse or better?
    • Some acronyms are not introduced: ViT, OaDTR
    • Long sentence/hard to parse on page 5: “Directly distilling from a SETR block which has seen a …”
    • Overall, the notation in the equations was clear, but subindex j was already used to denote partial segments and I was confused when referring to the jth class in the KL loss.
    • Figure 2: use same scale for the graphs and why is accuracy reported and not precision, recall or F1 score?
    • In the introduction, an easier setup and no contact with the patient are mentioned as advantages of video monitoring for detecting seizures. However, in the current dataset EEG recordings were needed to define the ground truth.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes an interesting approach that combines existing methods to detect epileptic seizures from video data. Although there are some unclear points in the evaluation, the approach is validated and compared against other techniques, demonstrating the feasibility of using OF data and knowledge distillation to detect seizures with partial video samples.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The paper proposes a new approach for seizure classification from video that uses existing techniques for processing temporal data (transformers) and how to leverage knowledge from full observations. The rebuttal provided clarifications regarding some design choices (loss, number of segments) and the validation of the approach (accuracy as an outcome and inputs in the comparison methods). However, there are still some unclear aspects in the difference between the train input fraction and the inference procedure. Besides, since the current version of the paper does not include a trade-off analysis of computation times and performance or effects of subsampling, the claims related to real-time operation should be revisited.



Review #3

  • Please describe the contribution of the paper

    This paper introduces SETR-PKD, a novel framework for early detection of epileptic seizures in videos while preserving patient privacy. It uses optical flow features and transformer-based progressive knowledge distillation to overcome limitations of current methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper proposes a novel approach for early detection of seizures using optical flow, a modality that is privacy-preserving, and a custom feature extractor-transformer framework called SEizure TRansformer (SETR) block.

    2. The paper also proposes a progressive knowledge distillation method to achieve early detection from a fraction of the video sample.

    3. The paper is well-organized, with an interesting design for privacy-preserving seizure detection.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It would be better if there is more detailed analysis about how progressive knowledge distillation works and why.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It would be better if the code were made available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    see the weakness section

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    meaningful design

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper under consideration introduces a transformer-based pipeline for video-based epileptic seizure prediction while preserving patients’ privacy. The pipeline integrates optical flow (OF) features and progressive knowledge distillation. The main strengths of the work are 1) the use of OF, which is privacy-preserving, and 2) the use of a progressive knowledge distillation method. The paper is also well-organized, the cited literature is up to date, the idea is interesting for an important application, and the experimental validation is somewhat convincing (while missing ablation studies). There is a consensus among the reviewers about the merits of your work. A few comments and suggestions, related to technical details and clarity, should be addressed in the camera-ready version. For example, the authors need to comment on the computational burden for real-time application of OF (R2). The numbers in Table 1 are too small; please devise a way to re-present Table 1. Also, conduct statistical analysis for the presented results to show the significance of the method against the compared ones. The authors need to justify how the parameters of the pipeline (Section 3.2) were chosen and discuss the effect of these choices on performance; ad-hoc, empirical choices can provide good results but not the optimum. In addition to R2’s comments on Figure 2 regarding performance gain, it would be interesting to see the SETR-PKD results for k=8 on the GESTURES dataset. I also agree with R1 about detailing the experimental setting for the baseline SETR with different input fractions. Please also clarify how the lowest segment to detect a seizure is determined when the two neurologists disagree on Ton.




Author Feedback

We appreciate the reviewers’ constructive feedback and will incorporate their suggestions and our responses into the revised paper. Our code and private dataset will be made available upon publication.

[R1,R2,R3] Inference and computational load: The SETR block sequentially processes consecutive segments and computes the combined prediction, as correctly understood by R2. Real-time implementation of this approach is feasible for the following reasons: 1) In a real-time system, a temporal subsampling technique is applied, such as processing only one-tenth of the total frames [https://arxiv.org/abs/2004.09927], since consecutive frames are heavily redundant. 2) Our SETR block contains three encoder layers and consumes only 0.34 GMACs, which is equivalent to that of a MobileNetV2 [https://github.com/Lyken17/pytorch-OpCounter]. 3) Real-time implementations of TV-L1 OF can achieve 30 fps at a resolution of 360x240 [https://link.springer.com/chapter/10.1007/978-3-540-74936-3_22]. We will conduct a trade-off study between these optimizations and their performance as an extension of this work.

[R1,R2,R3] Lowest possible segment (j) and reasonable accuracy for early detection: For a full video, the classification task is easy since the complete temporal landscape of the motion is available. Thus, we make the task progressively more challenging by dividing the video into segments (2, 4, 8, ...). As the number of segments (k) increases, the size of each segment decreases, posing a greater challenge for early prediction, which is apparent in Table 1. Thus, the lowest possible segment (j) at which a sample can be classified is determined based on accuracy alone. According to the neurologists, an accuracy of >95% when seizure progression is halfway is required to raise an alert; hence, there is still a performance gap to be filled.

[R1,R2,R3] Train input fractions for baseline and SETR-PKD: Our idea is to test a progressive distillation framework in which we leverage a model that has learned superior features. For a fair comparison, both the baseline SETR and SETR-PKD are trained on the corresponding video input fractions. However, SETR-PKD improves its features by learning from a series of sub-teacher models, starting with the model trained on the full video.

[R2,R3] Ablation of loss: We decided to use the conventional KL loss for logit-based distillation. Moreover, since patches capture features from individual frames, distilling features using an MSE loss is more effective. To demonstrate this, we conducted a study without the MSE term (results will be presented in the supplementary material of the revised version), which showed lower performance on our data (0.68/0.67/0.67 for SETR-PKD (k=8) on 1/4 input video, as one representative data point).

[R2] Performance retention of Transformers vs. LSTMs: Performance retention, as shown in Table 1, refers to the preservation of classification performance when reducing the input fraction of the videos. In Table 1 of the revised paper, we will clearly indicate which techniques are Transformer-based and which are LSTM-based.

[R2] Is OF good enough: All techniques performed better on GESTURES because RGB features (including those from TSNs) are superior to OF features. Despite this, we show the feasibility of distinguishing seizures from normal activity using OF. We also provide qualitative OF samples (Fig. 1 and supp.) of seizure motion semiotics, showing movements, stiffness, jerkiness, and convulsions. However, we acknowledge the limitations of OF features and plan to add joint data (body, hand, and facial pose) in future work.
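To illustrate the loss design described in the ablation response above, the following is a minimal PyTorch sketch, assuming a temperature-scaled KL term on the logits and an MSE term on matching patch (frame-level) tokens. The tensor shapes and the weights T, alpha, and beta are illustrative assumptions, not the paper's exact values.

```python
# Sketch of a combined distillation loss: cross-entropy on ground truth,
# KL divergence between temperature-scaled logits, and MSE between patch tokens.
import torch
import torch.nn.functional as F

def progressive_distillation_loss(
    student_logits: torch.Tensor,   # (B, num_classes) from the shorter-input student
    teacher_logits: torch.Tensor,   # (B, num_classes) from the longer-input sub-teacher
    student_patches: torch.Tensor,  # (B, N, D) patch/frame tokens of the student
    teacher_patches: torch.Tensor,  # (B, N, D) matching tokens of the sub-teacher
    labels: torch.Tensor,           # (B,) ground-truth classes
    T: float = 2.0, alpha: float = 0.5, beta: float = 0.5,
) -> torch.Tensor:
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    patch_mse = F.mse_loss(student_patches, teacher_patches)
    return ce + alpha * kl + beta * patch_mse
```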
[R2] Purple line and gains in Fig. 2: The purple line corresponds to the performance of direct distillation (from the full video), whereas the yellow line corresponds to the baseline SETR. Thus, the performance of direct distillation falls between SETR-PKD and the baseline. The gains from direct distillation are considerably lower at smaller input fractions, but become more uniform beyond the 1/2 input fraction for GESTURES, owing to its superior RGB features compared to the OF in our dataset.
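The inference and computational-load response above can be summarized by the rough sketch below (our reading of the rebuttal, not the released code): frames are temporally subsampled, the available portion of the optical-flow video is cut into consecutive segments of up to 64 frames, each segment is passed through the trained SETR block in order, and the per-segment predictions are combined, here by averaging logits. `setr_block` is a placeholder for the trained model.

```python
# Sketch of segment-wise sequential inference with temporal subsampling.
import torch

@torch.no_grad()
def segmentwise_predict(flow_frames: torch.Tensor,  # (T, C, H, W) optical-flow frames
                        setr_block: torch.nn.Module,
                        frames_per_segment: int = 64,
                        subsample: int = 10) -> torch.Tensor:
    frames = flow_frames[::subsample]                       # keep roughly 1/10 of the frames
    logits = []
    for start in range(0, frames.shape[0], frames_per_segment):
        segment = frames[start:start + frames_per_segment]  # one consecutive segment
        logits.append(setr_block(segment.unsqueeze(0)))     # (1, num_classes) per segment
    return torch.stack(logits).mean(dim=0)                  # combined prediction over segments
```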




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have clearly addressed almost all the comments and provided supporting arguments and/or references. Although the trade-off study between the proposed optimizations and their performance has not been addressed, it can be explored in an extended submission to another venue.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    There is decent technical novelty, and the work makes a good connection with computer vision through its use of optical flow to process video data. The rebuttal addressed most of the concerns from the reviewers. One reviewer increased their score, and the paper now has consistent scores at the acceptance level.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Good paper, good review, and strong rebuttal. An acceptance is highly recommended.


