
Authors

Tingting Chen, Yi Cheng, Jinhong Wang, Zhaoxia Yang, Wenhao Zheng, Danny Z. Chen, Jian Wu

Abstract

Effective approaches for accurately predicting the developmental potential of embryos and selecting suitable embryos for blastocyst culture are critically needed. Many deep learning (DL) based methods for time-lapse monitoring (TLM) videos have been proposed to tackle this problem. Although fruitful, these methods are either ineffective when processing long TLM videos, or need extra annotations to determine the morphokinetic parameters of embryos. In this paper, we propose Adaptive Key Frame Selection (AdaKFS), a new framework that adaptively selects informative frames on a per-input basis to predict blastocyst formation using TLM videos at the cleavage stage on day 3. At each time step, a policy network decides whether to use or skip the current frame. A prediction network then generates predictions using the morphokinetic features of the selected frames. We efficiently train and enhance the frame selection process by using a Gumbel-Softmax sampling approach and a reward function, respectively. Comprehensive experiments on a large TLM video dataset verify the performance superiority of our new method over state-of-the-art methods.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_43

SharedIt: https://rdcu.be/cVRwu

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a deep learning method to predict the viability for IVF of an embryo, based on microscope video time-lapses spanning 3 days, with the ground truth taken as an embryologist's evaluation at day 5/6. The base structure of the method is a spatial-temporal CNN+LSTM with additional positionally encoded features. The core novel contribution is a selection mechanism that decides which frames are relevant for the classification task, which shows an improvement in classification when compared to using all frames or other selection methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is clearly written. The frame selection mechanism is interesting and could be useful in other relevant MICCAI topics (e.g., surgical workflow segmentation / action recognition).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weakness is the limited analysis of the experimental results; in particular, it would be useful to expand on the frame selection results (also refer to the detailed comments). More qualitative results could be provided as supplemental material, including failure cases and interpretation of the corresponding selected frames.

    Additionally, there are no comments on limitations of the proposed method, current limiting challenges, and future research. Conclusions could be improved by incorporating such comments rather than just summarising the paper.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • The authors state they will release the code. Also, the description of the method is clear and simple enough to replicate the proposed architecture and training methodology.

    • The authors state they will release data in the reproducibility form. This would be great as to my knowledge there is no equivalent public, open data.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • The authors mention “kinetic parameters” as a data input, however, this is never explicitly described/defined. Is this just a frame index / timestamp? Are these manual annotations, or are they automatically extracted? If manual, then it means the method is not fully automatic which is not clear in the current submission. It would be important to clarify.

    • Table 1: Could you report the number of selected frames for competing methods? At least one of them is mentioned to do frame selection.

    • Table 2: I do not understand LSTM having #SF=32. From how the dataset is described, I would expect the number of frames to be in the thousands (every 5 min, over 3 days), so why is LSTM only using 32? Is there any subsampling?

    • Fig. 3: Images have visible person names on the top. I don’t know if this is sensitive information; should it be hidden?

    • Fig. 3: Could you display which frames (index/time) are being selected?

    • Qualitative evaluation: “AdaKFS can select a small number of informative frames of different embryo development stages, such as 2-5 cell and 8-cell stages”. A more in-depth analysis of the selected frames would be useful: showing more results, including failure cases (in suppl material). Additionally, it would be useful to have more quantitative statistics of specific events that are detected (I understand this is a lot of work though).

    • Can embryologists make an accurate prediction based on selected frames? I wonder if the selection mechanism could be useful just by itself and make embryo analysis faster / more convenient. Any comments?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My overall impression of the paper is positive. The selection mechanism is clearly explained and could have potential application in other video analysis problems targeted by MICCAI. I need a rebuttal to form a stronger opinion.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper proposes to adaptively select informative frames from the TLM video to improve blastocyst classification. A policy network with a Gumbel-Softmax sampling approach and a reward function is developed for frame selection. Experimental results on the in-house dataset show the effectiveness of the proposed method, which outperforms the SOTA across various evaluation metrics.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Novel method for key frame selection to improve automatic blastocyst formation prediction.
    • Promising results on the in-house dataset are achieved.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Some statements and experimental settings should be clarified.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Not good, as they evaluate the method on their in-house dataset but do not mention a release plan for either the dataset or the code

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • Do you plan to release the dataset as well as the code?
    • The definition of morphokinetics parameter should be clarified.
    • Several statements are imprecise: e.g., ‘using a small number of frames and their morphokinetic parameters, without any extra annotations’ in the method section. However, the method still requires the classification annotation for training the model. Appropriate qualifiers should be added.
    • What are the dimensions of f_t and k_t?
    • It seems that the numbers of selected frames for different sequences are different with the proposed method. It is better to illustrate the range of the total length of different sequences, as well as the range of the numbers of selected frames.
    • In the ablation study, how is it determined whether a frame is selected or skipped when using only the policy network?
    • It would be interesting to see the separate efficacy of the kinetic features. An ablation using kinetic features with only the LSTM could be added.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Interesting method with promising results

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    In this paper, the authors proposed a novel deep learning framework to adaptively select informative frames for blastocyst formation prediction using time-lapse monitoring (TLM) videos on D3.

    Extensive experiments were conducted on a large TLM video dataset to evaluate the proposed method; experimental results demonstrated its superiority over the latest state-of-the-art methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed framework is novel. It addresses two issues in existing methods: input videos need to be manually annotated; redundant and irrelevant information in TLM videos can overwhelm the informative ones.

    2. Extensive experiments on a large-size dataset were conducted, which includes comparison with state-of-the-art methods and ablation studies, to systematically evaluate the proposed method.

    3. The manuscript writing is very good and clear.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It would be nice if the authors could discuss potential limitations of the proposed method.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Dataset and codes are not publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. Although the authors implicitly indicate how the image features f_t and kinetic features k_t are generated in the framework shown in Figure 2, it would be better if the authors could explicitly mention that in the method section.

    2. How did the authors tune the hyper-parameters for different methods in experiments?

    3. Page 6, ‘We uniformly sample T = 32 frames …‘. Was the same sampling process performed in the testing period? Will the sampling rate affect the prediction performance?

    4. Page 6, last line: The authors may want to cite a reference for ‘ImageNet-pretrained weights‘.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed framework is novel. It addresses two issues in existing methods: input videos need to be manually annotated; redundant and irrelevant information in TLM videos can overwhelm the informative ones. Extensive experiments on a large-size dataset were conducted, which includes comparison with state-of-the-art methods and ablation studies, to systematically evaluate the proposed method. The manuscript writing is very good and clear.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper describes a deep learning method that uses adaptive selection of informative frames for blastocyst formation prediction from time-lapse monitoring (TLM) videos. The paper is well written and the results are very promising. However, the reviewers have a few concerns. Please address these, especially the questions about whether the experimental validation is sufficient, the release of the dataset, and the limitations of the proposed method.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5




Author Feedback

Common Questions: Morphokinetic parameters? (1) The morphological images in the video can be considered as the morphological parameters, and the kinetic parameters are the time durations from fertilization to the various embryo development stages. We gave this definition in lines 12-14, Para 3, Sect. 1. (2) We use the frame index as the kinetic parameters, as described in the caption of Fig. 2. Since the images are captured every 5 min, the exact kinetic parameters should be [frame index * 5]. However, we found no difference between using the frame index and the frame index * 5. Thus, we encode the frame index as the kinetic features. The kinetic parameters are not manually extracted. (3) The dimensions of f_t and k_t are both 2048, and we described how they are generated in the “Implementation Details” subsection. We will improve the description of the related parts to make them clearer.
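The rebuttal states that the frame index is encoded into a 2048-d kinetic feature k_t, but does not specify the encoding. Review #1 mentions "positionally encoded features", so the transformer-style sinusoidal encoding below is an assumed illustration, not the authors' actual code:

```python
import numpy as np

def encode_frame_index(t, dim=2048):
    """Sinusoidal positional encoding of a frame index (assumed encoding)."""
    i = np.arange(dim // 2)
    freq = 1.0 / (10000.0 ** (2.0 * i / dim))  # geometric frequency ladder
    enc = np.empty(dim)
    enc[0::2] = np.sin(t * freq)  # even slots: sine components
    enc[1::2] = np.cos(t * freq)  # odd slots: cosine components
    return enc

k_t = encode_frame_index(12)  # kinetic feature for (hypothetical) frame 12
```

Under this scheme, multiplying the index by 5 only rescales the frequencies, consistent with the authors' observation that frame index and frame index * 5 perform identically.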

Input and selected frames: (1) The number of frames in a TLM video is about 700-900, which is too large to input to the model. Thus, for each video, we uniformly sample 32 frames as input; this operation is widely used in many other video analysis methods. We stated this in lines 5-6, Para 3, Sect. 3. All our methods use the sampled 32 frames as input, not only the LSTM. (2) The sampling rate may affect the results. We tried sampling 16, 32, and 64 frames, and 32 frames performed the best (64 frames are too many, while 16 frames are not enough and yield low performance). Moreover, we tried different sampling schemes (uniform/random/weighted sampling), but there was not much difference among them. (3) The testing period uses the same sampling process; that is, all our inputs (in training, validation, and testing) are 32 frames uniformly sampled from each video. The difference is that the 32 frames sampled in training are not fixed in each training iteration: we add a random offset, which improves the robustness of our model. (4) The numbers of selected frames range from 5 to 8. We will improve the dataset description.

To R1: Selected frames of other methods? [14] used 600 frames in its temporal model and 35 frames in its spatial model. [24] selected 3 focal-plane images to form a single ‘RGB’ image (not a video). We will report these numbers in the revised version.

Embryologists make predictions? Whether embryologists can make accurate predictions using the selected frames needs further exploration. According to the results of embryologists (the true labels) and clinical diagnosis practice, it is the model itself that learns which frames are useful for blastocyst formation prediction. As Fig. 3 shows, the selected frames are indeed informative, including frames of different embryo development stages and key frames with abnormal cleavage.

Thanks for your nice comment. We will hide the text information at the top of the images and display the indices of the selected frames. We will also give more analysis of the selected frames (including failure cases) in the revised version, and add it to the supplementary material (if permitted).

To R2: Extra annotations? We stated in lines 23-17, Para 3, Sect. 1 that known methods need extra annotations (such as cell stages) to include the morphokinetic parameters. Our method is fully automated, using only the blastocyst or non-blastocyst label, and introduces morphokinetic parameters without any other annotations. We will make the statements clearer.

Frame selection using only the policy network? The policy network directly determines whether to skip or use a frame, and is trained with Gumbel-Softmax sampling. The only supervision is the final classification label.
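The per-frame use/skip decision trained with Gumbel-Softmax sampling can be sketched as below. This is a minimal NumPy illustration of the general Gumbel-Softmax trick for a binary choice, not the authors' implementation; the logit values and temperature are made up:

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Sample a relaxed (soft) one-hot vector via the Gumbel-Softmax trick."""
    rng = rng or np.random.default_rng()
    # Gumbel(0,1) noise makes argmax of (logits + noise) a categorical sample
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = np.exp((logits + gumbel) / tau)
    return y / y.sum()  # temperature tau controls how close to one-hot

# Hypothetical per-frame logits from a policy head: [use, skip].
rng = np.random.default_rng(0)
soft = gumbel_softmax(np.array([2.0, -1.0]), tau=0.5, rng=rng)
use_frame = int(np.argmax(soft))  # hard decision (straight-through at train time)
```

Because the soft sample is differentiable in the logits, the policy network can be trained end-to-end from the final classification loss alone, which matches the rebuttal's statement that the classification label is the only supervision.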

To R3: Hyper-parameters? (1) For the known methods, we use the parameter setups from the original papers. (2) For the ablation study, the hyper-parameters were determined empirically and experimentally: we empirically chose several feasible values for a parameter and then found the best one through experiments. Once determined, the parameter is used in all the ablation methods.


