
Authors

Tongyi Luo, Jia Xiao, Chuncao Zhang, Siheng Chen, Yuan Tian, Guangjun Yu, Kang Dang, Xiaowei Ding

Abstract

Early diagnosis of brain damage is critical for enabling earlier medical intervention in infants' cerebral palsy (CP). Although general movements assessment (GMA) has shown promising results in early CP detection, it is laborious. Most existing works take videos as input to perform fidgety movements (FMs) classification for GMA automation. Those methods require complete observation of the videos and cannot localize video frames containing normal FMs. We therefore propose a novel approach named WO-GMA to perform FMs localization in the weakly supervised online setting. Infant body keypoints are first extracted as the inputs to WO-GMA. WO-GMA then performs local spatio-temporal extraction, followed by two network branches that generate clip-level pseudo labels and model online actions. With the clip-level pseudo labels, the action modeling branch learns to detect FMs in an online fashion. Experimental results on a dataset of 757 videos of different infants show that WO-GMA achieves state-of-the-art video-level classification and clip-level detection results. Moreover, only the first 20% of a video's duration is needed to obtain classification results as good as full observation, implying a significantly shortened FMs diagnosis time.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16434-7_69

SharedIt: https://rdcu.be/cVRsB

Link to the code repository

https://github.com/scofiedluo/WO-GMA

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a method to classify and localize the fidgety movements of infants in videos. To this end, the authors propose a framework consisting of a local feature extraction module operating on human pose keypoints, and two branches for clip-level pseudo-label generation and online action modeling. The proposed system achieves an accuracy of 93.8% on the collected dataset, which is 5.4% higher than MS-G3D [1].

    [1] Liu, Z., Zhang, H., Chen, Z., Wang, Z., Ouyang, W.: Disentangling and unifying graph convolutions for skeleton-based action recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 143–152 (2020)

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) Detailed and accurate mathematical modeling of the three introduced modules (local feature extraction, clip-level pseudo-label generation, and online action modeling). 2) Learning without frame-level annotation. 3) Achieving the same performance after reading only 20% of the video as if the system had read the whole video.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) It relies heavily on OpenPose performance. 2) The paper insists that inference with only 20% of the entire video is sufficient, but other methods also show similar results. 3) The dataset used is not disclosed, and its description in the paper also lacks detail.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper presents a detailed description of the proposed method, and the implementation details of the training strategy are expressed in detail. However, many parameters in the design are missing, so re-implementation is difficult.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    It would be nice if Figure 2 showed the intermediate results of the STAM. Furthermore, it would be good to match and align the measurement points. It would also be helpful to explain why the proposed structure is appropriate for infant video data.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is interesting, but the method description contains unclear parts, and the results are plausible except for the early classification (requiring shorter video).

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    4

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors responded to my questions and criticisms thoroughly and precisely. And the additional data explanation increased the reliability of the experimental conditions and results.



Review #2

  • Please describe the contribution of the paper

    The authors propose an online FM action detector for automated GMA analysis, aimed at early medical intervention for cerebral palsy in infants. The proposed MIL-loss-based action detector is divided into two branches to enable online learning of the vertex-fusion features of the extracted keypoints. The authors appropriately borrowed the ideas of weakly supervised learning and online action detection to train successful FM detectors. The authors have built a large-scale dataset for FM detection, but it will not be released to the public.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    As the authors argue, the development of automated FM action detectors for early medical intervention for cerebral palsy in infants is important. The authors designed a trainable, 2D-keypoint-estimation-based WO-GMA by appropriately borrowing the ideas of [7] and [19], which were originally developed for natural vision. Although the authors did not evaluate the clinical effect of the FM action detector, the performance of the detector itself was adequately compared with other SOTA models.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The ablation results are unclear. The branch and local feature extraction module proposed by the authors seem meaningful for video-level classification but do not appear to affect action detection. It is encouraging that FM detection has been attempted from the point of view of action detection, but the overall performance is low despite the action target being a single class. Even though the proposed dataset is constructed in a very limited environment, scaling to action detection does not appear to be effective.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Regarding reproducibility, the authors plan to release the code and models, but there is no plan to disclose the dataset.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The ablation results do not sufficiently show the strength of the design of the proposed model. In particular, it seems that more diverse experimental designs and analyses are needed for action detection. In addition, a better evaluation would be possible if there was an in-depth analysis of the ablation results.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It is the same as the answer to question 8. The ablation results do not sufficiently show the strength of the design of the proposed model. In particular, it seems that more diverse experimental designs and analyses are needed for action detection. In addition, if there was an in-depth analysis of the ablation result, a better evaluation would be possible.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors faithfully prepared their rebuttals. I have read the reviews thoroughly, and the comments of reviewers and the author’s responses are generally considered valid. I decided to keep my score.



Review #3

  • Please describe the contribution of the paper

    In this paper, the authors propose a weakly supervised online action detection framework for infant movements. The proposed framework has been evaluated on 757 videos and achieves promising results with only clip-level supervision. In addition, the framework is of great importance for clinical use: only the first 20% of a video's duration is needed to obtain classification results as good as full observation, suggesting a significant cut in diagnosis time.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The strengths of the paper are as follows.

    1. The application is novel. Applications for infant general movements are rare compared with organ segmentation or image classification. As a reviewer and a reader, I would like to read about new applications that broaden my horizons.
    2. The techniques used in this paper are reasonable. By using a weakly supervised method, the annotation cost can be cut and the whole diagnosis process accelerated.
    3. The paper is well-organized and easy to follow.
    4. The experimental results are convincing. The authors evaluated the proposed framework on a relatively large dataset, and the methods used for comparison are all recent works.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper also needs improvements. The weaknesses of the paper are listed below.

    1. The detailed information of the proposed method may need more elaboration, for example the size of the CNN used in the clip-level pseudo-label generation branch.
    2. The proposed framework was only tested on an in-house dataset, so cross-dataset generalization may be an issue. It would be great if the framework could also be tested on publicly available datasets.
    3. There are some typos. For example, on page 4, “in detial” should be “in detail”.
    4. How the clip-level annotations were generated and used should be presented.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility of the paper is not very high, because the dataset used in this paper is not publicly available. Per the “No Free Lunch” theorem for machine learning, the proposed framework may not perform as well on other datasets as it does on the in-house dataset.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The comments and suggestions from the reviewer are as follows.

    1. The detailed information of the proposed method should be provided.
    2. There are some typos. For instance, the authors use “Table 1” and “table 1” interchangeably in the manuscript; the reviewer thinks these should be consistent.
    3. How many male and female babies were in the dataset?
    4. The interpretation of the experimental results was too limited. The explanation and analysis of the results should be enhanced.
    5. The resolutions of the videos in the dataset are not all the same; how the authors handled this is not mentioned in the manuscript.
    6. The reason why the authors focus on F+ is not clear. The authors may want to explain more.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a framework based on a weakly supervised learning method, which can mitigate the data scarcity problem in medical image processing. The authors have demonstrated clinical feasibility. The problem addressed in this manuscript is of great clinical importance. The techniques used are reasonable. The manuscript is well-organized and easy to follow.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper received mixed reviews. Although the reviewers argue in support of the paper, they note that the method description contains unclear parts. They also note insignificant improvements in the ablation study and call for generally better evaluation strategies.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8




Author Feedback

We thank the reviewers for acknowledging that the application is novel or important (R3&R2), the method is reasonable or interesting (R3&R1), the experimental results are convincing or adequate (R3&R2), and the paper is well-organized (R3). To R1:

  1. Similar results to other methods in Fig. 2. We would like to emphasize that the other methods are designed for offline classification and thus can only classify sequences of the same length as seen during training. In contrast, our model can run inference on skeleton sequences of variable length and produce both classification and detection results. Inference can halt when enough F+ have been detected or the diagnosis time is long enough. Besides, the similar curve trends in Fig. 2 reflect that FMs are roughly uniformly distributed over time.
  2. Model relies heavily on OpenPose performance. We tried a different pose estimation model fine-tuned on infant images. WO-GMA gets slightly better results compared to the original (mean (std) accuracy: 94.0 (1.2) vs. 93.8 (1.0); F1: 94.4 (1.2) vs. 94.4 (0.9)), showing the robustness of our pipeline to different pose estimation results. Keypoint estimation is a pre-task that is not the focus of this paper. Since the input skeleton sequences of all models in Table 1 are the same, comparisons between models are fair. To R2&R3&MR:
  3. Action detection evaluation strategy in ablation and further analysis. In Table 1, WO-GMA achieves significant improvement, showing the feasibility of online action detection in GMA. To better illustrate the influence of different modules, we will plot three more curves of the same infant in Fig. 3 regarding the ablation to visualize the performance. These curves show that the detected action instances are fragmented without long-range information, which is unsuitable for detecting continuous FMs. Without local feature extraction, the detection scores are not as confident as those of WO-GMA. Without pseudo labels, the model may ignore the gaps between intermittent FMs, which the long-range information in CPGB compensates for. To further analyze the above influence, we evaluate the models reported in Table 2 with a metric from the code of [Zhang et al. (2022). ActionFormer: Localizing Moments of Actions with Transformers. arXiv:2202.07925], which focuses more on the IoU threshold than on the confidence score and uses the VOC2012 AP computation rather than VOC2007. The average mAP for each row in Table 2 is 16.4, 18.6, 4.4, and 17.7, and the number of predicted instances in the test set is 260, 296, 1000, and 420, respectively. The new mAP drops significantly without long-range information (4.4), caused by the fragmented detection results (1000 instances). Without local information, the mAP increase is caused by the reduced emphasis on confidence. Without pseudo labels (16.4), the mAP is lower than that of WO-GMA at higher IoU thresholds (0.4, 0.5). We will update the detection results and add more analysis, using the space gained by simplifying the numerical-change statements in the ablation.
  4. Action detection performance. As stated in the last paragraph of Section 3.1 (page 7), FMs detection is harder than some non-medical action detection applications.
  5. Experiments on model generalization. There is no real-world public dataset except 12 animated videos. We are collecting more data to conduct generalization experiments in the future. To R123:
  6. Method details. We will release source codes with all the parameters and add key parameters in Fig.1.
  7. More dataset details. The dataset contains 434 male and 323 female infants. The eligible participants were high-risk (premature birth, low birth weight, suspected or confirmed brain injury, chronic disease at birth, or genetic or genetic metabolic disease). Videos without parental permission were not included. Video recording was conducted according to the standard protocol [5]. We will add this information by simplifying some model descriptions in the introduction that overlap with the methodology. We apologize for not replying to all comments due to limited space. We will fix typos and other minor problems.
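Point 3 of the rebuttal above contrasts the VOC2007 and VOC2012 conventions for average precision when scoring temporal detections against an IoU threshold. As a rough illustration only (this is not the authors' or ActionFormer's evaluation code), a minimal sketch of temporal IoU and the two AP interpolation schemes:

```python
def temporal_iou(a, b):
    """IoU between two 1-D temporal segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(recalls, precisions, mode="voc2012"):
    """AP from (recall, precision) pairs ordered by descending confidence.

    voc2007: 11-point interpolation (max precision at recall >= t for
             t in 0.0, 0.1, ..., 1.0, averaged).
    voc2012: area under the monotonically decreasing precision envelope.
    """
    if mode == "voc2007":
        ap = 0.0
        for t in [i / 10 for i in range(11)]:
            ps = [p for r, p in zip(recalls, precisions) if r >= t]
            ap += (max(ps) if ps else 0.0) / 11
        return ap
    # voc2012: pad, make precision non-increasing from the right, integrate
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]
               for i in range(len(mrec) - 1))
```

Averaging such APs over several IoU thresholds (e.g. 0.1 to 0.5) gives the kind of mAP figures quoted in the rebuttal; the VOC2012 variant rewards a correct precision-recall envelope rather than a few high-confidence hits, which is why fragmented detections are penalized more heavily.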




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal has been successful, and the reviewers unanimously agree that the paper should be accepted. The authors are encouraged to address the comments in the final version.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    10



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The strength is that the weakly supervised method appears able to detect infant movements, and extensive experiments were provided. Some details were addressed in the rebuttal. Acceptance is recommended.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper presents an interesting weakly supervised method to detect infant movements. All initial concerns have been well addressed in the rebuttal, and all reviewers reached a consensus in acceptance post-rebuttal.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4


