
Authors

Miaomiao Cai, Mingxing Li, Zhiwei Xiong, Pengju Zhao, Enyao Li, Jiulai Tang

Abstract

Autism spectrum disorder (ASD) is one of the most common neurodevelopmental disorders; it impairs the communication and interaction abilities of patients. Intensive early intervention can effectively improve ASD symptoms, so the diagnosis of ASD in children receives significant attention. However, clinical assessment relies on experienced diagnosticians, which makes the diagnosis of ASD in children difficult to popularize, especially in remote areas. In this paper, we propose a simple yet effective pipeline to diagnose ASD in children, which comprises a convenient and fast video acquisition strategy and an advanced deep learning framework. In our framework, we first extract sufficient head-related features from the collected videos with a generic toolbox. Second, we propose a head-related characteristic (HRC) attention mechanism to adaptively select the most discriminative disease-related features. Finally, a convolutional neural network diagnoses ASD in children by exploring the temporal information in the selected features. We also build a video dataset, based on our video acquisition strategy, that contains 82 children to verify the effectiveness of the proposed pipeline. Experiments on this dataset show that our deep learning framework achieves superior performance in diagnosing ASD in children. The code and dataset will be available at https://github.com/xiaotaiyangcmm/DASD.
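
To make the three-stage pipeline from the abstract easier to picture, here is a minimal PyTorch sketch, assuming OpenFace-style per-frame feature vectors. All module names, shapes, and the SE-style reading of the HRC attention are illustrative assumptions by the editor, not the authors' code.

```python
import torch
import torch.nn as nn

class HRCAttention(nn.Module):
    """SE-style channel attention over head-related feature channels
    (an assumption about the HRC module, per Review #5's SENet comparison)."""
    def __init__(self, num_features, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(num_features, num_features // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(num_features // reduction, num_features),
            nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (batch, features, frames)
        weights = self.fc(x.mean(dim=2))        # squeeze over the temporal axis
        return x * weights.unsqueeze(2)         # re-weight feature channels

class SnippetClassifier(nn.Module):
    """Stages 2 and 3: attention over extracted features, then a 1-D CNN over time."""
    def __init__(self, num_features, num_classes=2):
        super().__init__()
        self.attn = HRCAttention(num_features)
        self.cnn = nn.Sequential(
            nn.Conv1d(num_features, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):                       # x: (batch, features, frames)
        x = self.cnn(self.attn(x)).squeeze(-1)  # (batch, 64)
        return self.head(x)

# Stage 1 (feature extraction with a generic toolbox such as OpenFace) runs
# offline; the model consumes per-frame feature vectors, e.g. (sizes hypothetical):
# logits = SnippetClassifier(num_features=49)(torch.randn(8, 49, 32))
```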

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_42

SharedIt: https://rdcu.be/cVRwt

Link to the code repository

https://github.com/xiaotaiyangcmm/DASD

Link to the dataset(s)

https://github.com/xiaotaiyangcmm/DASD


Reviews

Review #3

  • Please describe the contribution of the paper

    This work is a dataset release paper with an accompanying model that performs surprisingly well on such a small dataset. The work also suggests an acquisition strategy for collecting such datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The collected dataset is valuable for the ML community.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While the input to the model has a temporal dimension (i.e., it is a video), the architecture is not designed to take the variable length of videos into account (e.g., by using a recurrent model). Instead, the authors subsample the video frames to a static size of N (see the sketch below).
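
To make the reviewer's point concrete, here is a minimal sketch of the kind of fixed-size frame subsampling described above; it is purely illustrative, not the authors' implementation.

```python
import numpy as np

def subsample_fixed(frames: np.ndarray, n: int) -> np.ndarray:
    """Uniformly subsample a variable-length video of shape (T, ...) down to
    exactly n frames, discarding the remaining temporal resolution."""
    idx = np.linspace(0, len(frames) - 1, num=n).round().astype(int)
    return frames[idx]

# e.g. subsample_fixed(video, n=32) maps any length T >= 1 to a static size of 32;
# a recurrent (or transformer) encoder would instead consume all T frames.
```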

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The dataset and code will be released; all good.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Please avoid using vague and self-gratifying terms (such as “advanced deep learning framework”, where the term advance is not well-defined).

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The work introduces and releases a new dataset with accompanying model. They do a thorough job of analysing the model and its performance. They also switch different components in/out of the architecture to assess their impact.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    The paper presented a method for ASD diagnosis in children, based on the analysis of videos and extracting head-related information. The proposed method was evaluated using a unique dataset, which will be made publicly available as the authors promised. The experimental results showed superior performance as compared to baseline approaches.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper has a sound experimental setup and strong evaluation. Another major contribution of the paper is the dataset, which will be made publicly available.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper lacks technical novelty, as it utilizes existing methods or software (OpenFace) to extract the features, followed by a simple approach for feature reduction and classification using a regular CNN. However, the paper presents a strong evaluation.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    OK

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    As noted above, the paper lacks technical novelty: it utilizes existing methods or software (OpenFace) to extract the features, followed by a simple approach for feature reduction and classification using a regular CNN. Nevertheless, the experimental setup is sound and the evaluation is strong. Another major contribution of the paper is the dataset, which will be made publicly available; this is a huge PLUS, as the research community is in great need of such datasets.

    I just have a few comments:

    1. Why are the faces blurred in the figure? Is this due to a human subjects protection agreement? If so, how will you share the dataset publicly? Can you provide more information about the dataset's accessibility conditions and agreements?

    2. Although the paper is well-written, there are several typos here and there. Please do several rounds of proofreading before final submission.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the paper lacks technical novelty, it has sound experimental setup and strong evaluation.

  • Number of papers in your stack

    7

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #5

  • Please describe the contribution of the paper

    This paper addresses ASD detection in videos. To that end, a new dataset is collected, and the video acquisition strategy is described in detail. A new pipeline is also designed to detect ASD: OpenFace features are extracted from small video segments, a designed HRC module enhances the features, and the scores from different segments are aggregated to obtain a final result. Ablation studies verify the effectiveness of the proposed module.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is well organized.
    • It collects a dataset for ASD detection and claims that the dataset will be released.
    • A new pipeline is proposed for detection; it exploits OpenFace features and introduces HRC attention.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The paper says that frames are deleted if the landmark detection confidence is less than 0.75. If so, how is the number of consecutive frames N in each snippet maintained? Are empty frames or all-zero landmark features used?
    • There seems to be no description of the details of the dataset. How many videos are included in the dataset? Is this the same as the number of children? If not, are videos of the same child included in both the training and test splits?
    • It would be better to have an ablation study on the number of segments.
    • As a child with ASD may or may not respond to different moving directions, is it possible that a max operation would perform better for aggregating the scores from different segments?
    • I feel the technical novelty is marginal. Extracting face features for ASD has been explored in previous works, and the HRC attention module is very similar to the channel attention in SENet (see the sketch below).
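
For reference, here is the canonical squeeze-and-excitation channel attention from SENet (Hu et al.) that this comment compares the HRC module against; this is the standard formulation, not the paper's module.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Canonical squeeze-and-excitation channel attention (Hu et al., SENet)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # squeeze: global average pooling
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                          # x: (batch, C, H, W)
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c))       # excitation: per-channel weights
        return x * w.view(b, c, 1, 1)              # re-scale feature channels
```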
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Details of the framework/experiments are provided in the paper. If the dataset will be released, it would be enough for reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • There is a previous work [1] that also performs video-based autism detection. It would be better to discuss the relationship to it.

    [1] Machine Learning Based Autism Spectrum Disorder Detection from Videos

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work proposes a new dataset and a pipeline for video-based ASD detection. I think both are helpful for the community. However, the technical novelty is marginal, so I prefer weak reject as my rating at the current stage.

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    5

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The study proposes an interesting framework for video-based diagnosis of ASD. Following the reviewers' suggestions, temporal information could be considered to further improve the performance. Variable-length videos are not supported for now, but that may not be an issue if data acquisition is consistent.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3




Author Feedback

We thank the reviewers for their constructive comments. The major concerns are addressed as follows. We will incorporate them in the camera-ready version as necessary.

Meta: Variable-length videos are not supported for now. Reply: In Section 2.2, we design a temporal frame subsampling method to adapt to videos of arbitrary length. Therefore, our method can support variable-length videos.

Reviewer #5: How is the number of consecutive frames N in each snippet maintained if frames whose landmark detection confidence is less than 0.75 are deleted? Reply: We first delete the frames whose landmark detection confidence is less than 0.75 and then divide the remaining frames into T segments. Subsequently, we sample N consecutive frames from each segment as a snippet (see the sketch below).
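
A minimal sketch of this filtering-then-sampling procedure, assuming snippets are drawn at a random offset within each segment (the exact offset rule is not stated in the rebuttal) and that every segment retains at least N frames; the function name and signature are illustrative.

```python
import numpy as np

def sample_snippets(features, confidences, num_segments=4, snippet_len=32,
                    conf_thresh=0.75, rng=None):
    """Delete low-confidence frames, split the remainder into T segments, and
    draw one snippet of `snippet_len` consecutive frames from each segment.
    Assumes every segment keeps at least `snippet_len` frames."""
    rng = rng or np.random.default_rng()
    kept = features[confidences >= conf_thresh]       # drop unreliable landmarks first
    snippets = []
    for seg in np.array_split(kept, num_segments):    # T roughly equal segments
        start = rng.integers(0, len(seg) - snippet_len + 1)
        snippets.append(seg[start:start + snippet_len])
    return np.stack(snippets)                         # (T, snippet_len, feat_dim)
```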

Reviewer #5: There is no description of the details of the dataset. Are videos of the same child included in both the training and test splits? Reply: We show the details of the dataset in Table 2 of the supplementary material. Our dataset contains a total of 82 videos provided by 79 children, of whom 3 provide two videos each. Note that during cross-validation, the two videos from the same child are assigned to different subsets. Therefore, in a given round of cross-validation, one of two situations may arise: (1) both videos from the same child fall into the training set (or both into the validation set), or (2) one video is in the training set and the other is in the validation set.

Reviewer #5: There is no ablation study on the number of segments. Reply: We further conduct ablation experiments on the number of segments. Since the minimum number of frames in the dataset is 167 and the number of frames in each snippet is set to 32, the maximum number of segments is ⌊167/32⌋ = 5. Keeping all other parameters unchanged, we set the number of segments to 1, 2, 3, 4, and 5; the resulting accuracies are 89.07, 91.40, 92.59, 95.06, and 92.59 (%), and the corresponding F1 scores are 82.50, 86.48, 88.07, 91.90, and 88.24 (%). The results show that our framework achieves the best performance when the number of segments is set to 4.

Reviewer #5: Is it possible that a max operation would perform better for aggregating the scores from different segments? Reply: During the experiments, we observe that Step A and Step B in the video acquisition strategy occupy most of the acquisition time. Moreover, comparing Step A and Step B, the child's movement is the same except for the difference in direction. Therefore, the interactive ability demonstrated by the child is uniform throughout the video acquisition process, and we conclude that the average operation aggregates the scores from different segments better than the max operation (see the sketch below).
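
The two aggregation choices under discussion, as a minimal sketch: the mean matches the authors' argument that the child's interactive behaviour is uniform across segments, while the reviewer's max would reward a single strongly responsive segment. The function name is illustrative.

```python
import torch

def aggregate(snippet_logits: torch.Tensor, mode: str = "mean") -> torch.Tensor:
    """Fuse per-snippet logits of shape (T, num_classes) into one video-level
    prediction. "mean" matches the authors' choice; "max" is the alternative
    raised by Reviewer #5."""
    if mode == "mean":
        return snippet_logits.mean(dim=0)
    return snippet_logits.max(dim=0).values
```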


