
Authors

Ali Mottaghi, Aidean Sharghi, Serena Yeung, Omid Mohareri

Abstract

Automatic surgical activity recognition enables more intelligent surgical devices and a more efficient workflow. Integration of such technology in new operating rooms has the potential to improve care delivery to patients and decrease costs. Recent works have achieved promising performance on surgical activity recognition; however, the lack of generalizability of these models is one of the critical barriers to the wide-scale adoption of this technology. In this work, we study the generalizability of surgical activity recognition models across operating rooms. We propose a new domain adaptation method to improve the performance of the surgical activity recognition model in a new operating room for which we only have unlabeled videos. Our approach generates pseudo labels for the unlabeled video clips it is confident about and trains the model on augmented versions of those clips. We extend our method to a semi-supervised domain adaptation setting where a small portion of the target domain is also labeled. In our experiments, our proposed method consistently outperforms the baselines on a dataset of more than 480 long surgical videos collected from two operating rooms.
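No code repository is linked below, so as a concrete illustration of the pseudo-labeling loop the abstract describes, here is a minimal self-training step in PyTorch. This is an editor's sketch, not the authors' implementation; the fixed confidence threshold, the weak/strong augmentation pairing, and all names are assumptions.

    import torch
    import torch.nn.functional as F

    def pseudo_label_step(model, clip_weak, clip_strong, threshold=0.9):
        # Pseudo-label a weakly augmented view of unlabeled target-OR clips,
        # keep only confident predictions, and train on a strongly augmented
        # view of the same clips (threshold and tensor shapes are assumptions).
        with torch.no_grad():
            probs = F.softmax(model(clip_weak), dim=-1)    # [B, num_classes]
            confidence, pseudo_labels = probs.max(dim=-1)  # [B], [B]
            mask = (confidence >= threshold).float()       # confident clips only

        logits = model(clip_strong)
        per_clip = F.cross_entropy(logits, pseudo_labels, reduction="none")
        # Average the loss over the confident clips only.
        return (per_clip * mask).sum() / mask.sum().clamp(min=1.0)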

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16449-1_51

SharedIt: https://rdcu.be/cVRXp

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a domain adaptation method for surgical action recognition.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed pseudo label sampling is somewhat helpful;
    2. experimental results are good;
    3. presentation is clear and well-organized.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The technical novelty is weak;
    2. Code is not available even though the reproducibility checklist responses are checked;
    3. Only one dataset is used.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Code is not available even though the reproducibility checklist responses are checked.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The paper is an incremental extension of AdaMatch, which is properly cited. Compared with AdaMatch, the contributions are 1) queued predictions; 2) video-level augmentation; and 3) pseudo label sampling. However, the authors also remove the random logit interpolation from AdaMatch.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The first and second contributions are clearly trivial and straightforward. As for the third contribution, pseudo label sampling, it is clearly functionally similar to the distribution alignment proposed by AdaMatch. While the authors compared their method with AdaMatch in Table 2, I still encourage the authors to provide ablation studies on which one (distribution alignment vs. pseudo label sampling) is the key.

    2. Moreover, I suspect that the slight disadvantage of AdaMatch compared with the proposed method may be attributed to the prediction cache (first contribution). The authors are encouraged to also apply this trick to AdaMatch to fully justify the effectiveness of pseudo label sampling.

    Overall, the technical novelty of this paper is weak, and the experimental results do not properly justify the necessity of the proposed pseudo label sampling.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    The authors have proposed a method to overcome the lack of generalization of models trained to recognize surgical activity across different operating rooms. The authors propose to adapt a model originally trained on the source domain by predicting pseudo-labels on the target domain and using them to retrain the target model on augmented versions of the pseudo-labeled clips. The pseudo-labels are generated on the target domain from the source model's most confident predictions. The level of confidence is based on a threshold determined for each class during model training on the source domain, taking into account the class imbalance. The authors evaluated the domain adaptation strategy by only providing labels in the source domain (unsupervised) and by also providing labels for a small portion of the samples in the target domain (semi-supervised). Both non-temporal and temporal features are used for surgical activity recognition. A minimal sketch of the per-class thresholding is given below.
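    As an illustration of the per-class confidence thresholds described above, the following sketch estimates one threshold per class from held-out source-domain predictions so that minority classes are not filtered out by a single global cutoff. It is an editor's reading, not the authors' code; the quantile rule and the fallback default are assumptions.

        import numpy as np

        def per_class_thresholds(probs, labels, num_classes,
                                 quantile=0.8, default=0.9):
            # probs: [N, num_classes] softmax scores on held-out source data;
            # labels: [N] ground-truth classes. One threshold per class keeps
            # rare classes from being drowned out by a single global cutoff.
            thresholds = np.full(num_classes, default)
            for c in range(num_classes):
                class_conf = probs[labels == c, c]
                if len(class_conf) > 0:
                    thresholds[c] = np.quantile(class_conf, quantile)
            return thresholds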

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper presents several strengths: the review of the state of the art is complete, the method is well explained, the validation compares against state-of-the-art methods, and an ablation study is performed.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weakness of the paper is the data description. There is no information about the types of surgeries or the annotation protocol.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The information provided will allow the reproducibility of this paper. The release of the dataset will be a plus to allow further comparison.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. The description of the dataset could be improved. For example, there is no information about the 28 types of surgery. Did they all come from the same specialty, such as gynecology or digestive surgery? What are the 10 phases and how are they defined? How many observers annotated the data? How were the data merged if there were several annotations for the same surgery? What instructions were given to the observers?
    2. The authors did not discuss the limitations of their model. Are some phases more difficult to recognize than others?
    3. The authors report the execution time of their model (supplementary material) but not of other models. Are the execution times similar? And if not, does the performance increase justify the use of the proposed method?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    8

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The weaknesses of this paper are minor; it is very clear and provides all the information needed to understand and reproduce the work. Moreover, it addresses an important issue in workflow recognition methods: the lack of generalization.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    8

  • [Post rebuttal] Please justify your decision

    The majority of the issues have been clarified by the authors.



Review #3

  • Please describe the contribution of the paper

    This paper tackles the problem of phase recognition from external ceiling-mounted cameras. The proposed approach aims to address the problem of generalization from one operating room to another. The authors explore the use of both unlabelled and labelled data from the target OR.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • generated a big dataset
    • proposed a model for domain adaptation
    • well written
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • limited technical contribution
    • limited information on the dataset, such as procedure type, video duration, clinical team, etc.
    • not sure how balanced the dataset is, and only reporting accuracy and mAP might not be enough; I would expect to see precision, recall, and F1 score
    • experiment parameters are given in the supplementary material, and it is unclear whether it will be published with the paper
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Experiment parameters are given in the supplementary materials, and it is unclear whether these will be published with the paper. The model being trained and assessed on a private dataset makes it difficult to reproduce the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The paper misses information on the dataset, and other evaluation metrics that are often used in the literature should be reported.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Limited technical contribution, lack of information on the dataset, and I would like to see more metrics.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    4

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    3

  • [Post rebuttal] Please justify your decision

    The technical contribution is limited, and the method is assessed on a private dataset.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper presents a semi-supervised domain adaptation method for surgical activity recognition of OR data. The approach is validated on a dataset of 480 surgical videos from two OR settings and compared to baseline approaches, demonstrating improved generalizability of surgical activity recognition across operating rooms with distinct environments.

    The main criticisms of the work are related to limited technical novelty and contribution, experimental results not justifying the methodology, dataset description/limitations, and concerns around reproducibility. The following points should be addressed in the rebuttal:

    • Clarification and further explanations on the main technical contributions of the paper, and how they are justified by the experimental results.
    • Missing details/information regarding the dataset, and how balanced it is
    • Missing details regarding experimental parameters and the model, and its limitations
  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7




Author Feedback

We thank reviewers for the constructive feedback on our work. We first address the question on technical novelty, and then additional reviewer comments below.

  • Technical Novelty (R1, R3) We would first like to clarify R1 and R3’s concerns with respect to technical novelty. Specifically, our work is the first to study semi-supervised domain adaptation in untrimmed video action recognition, a significantly more challenging task than the image classification addressed by prior works (e.g., AdaMatch). Our setting of untrimmed video action recognition (1) requires handling video inputs, which limits the feasible batch size (4 or 8) and hence the ability to estimate prediction score distributions, whereas prior image classification works rely on large batches (1024); and (2) requires handling highly imbalanced data, which causes model collapse when using the pseudo labeling strategies of prior image classification works. Our method addresses the first challenge by introducing prediction queues that maintain better estimates of score distributions (see the sketch following this feedback), and the second through pretraining and sampling strategies over both video clips and pseudo labels to prevent model collapse. We also present effective video augmentation strategies for this problem setting. We argue that demonstrating how to effectively perform semi-supervised domain adaptation in long untrimmed videos is an important contribution that significantly advances the field compared to previous works focusing only on image classification. Furthermore, our work significantly advances capabilities for surgical workflow analysis in long videos, a new and emerging domain area of high interest to the MICCAI community. We hope that this clarifies the concerns and that R1 and R3 would be willing to revise their ratings in light of it. We will also revise the paper to more clearly articulate these contributions.

  • Reproducibility (R1, R2) We have included architecture and implementation details in the experiments section and the supplementary material. The TimeSformer model is used as our backbone, and its hyperparameters are borrowed from the original paper [2]. We further analyze the hyperparameters and discuss the GRU model in the supplementary material.

  • Dataset (R2, R3) Our dataset is captured in a medical facility with two operating rooms equipped with the da Vinci Xi surgical system. Our 484 videos include 28 types of procedures performed by 16 surgeons/teams. These videos are on average 2 hours long and are individually annotated with 10 clinically significant activity classes needed to measure OR workflow efficiency. Our classes are highly imbalanced; for example, the Patient Prep class contains 10 times more frames than the Robot Docking class. We will add this to the paper.

  • R1 (Relative importance of contributions) We clarify that in Table 2 we have included ablation studies showing the relative importance of both components as requested. Specifically, using only pseudo label sampling gives an mAP of 80.78 and using only distribution alignment gives an mAP of 82.13, while using both results in an mAP of 83.71.

  • R2 (Model limitations) Please refer to the section on the dataset for further information. Since our dataset is imbalanced, it is harder to recognize minority classes than dominant ones. We use pseudo label sampling to mitigate this.

  • R3 (Evaluation metric) We clarify that we report results both for a solely clip-based model and for the full model. For the clip-based model, we only want to assess how meaningful the features are, so we evaluate it in a completely balanced setting and report only accuracy. For the full model, the data is imbalanced due to the untrimmed setting, so we report mAP. This is standard for untrimmed activity recognition (also used in “Temporal Convolutional Networks for Action …” by Lea et al. and “Hollywood in Homes …” by Sigurdsson et al.) and summarizes the precision-recall trade-off information requested. We add a full precision-recall curve to the supplementary material.
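The prediction queues mentioned in the technical novelty response can be made concrete with a short sketch. The code below is an editor's assumption about how such a queue works, not the authors' implementation: it keeps a FIFO cache of recent softmax predictions so that class-score distributions can be estimated across many small video batches, and applies AdaMatch-style distribution alignment on top. Queue length, eps, and all names are hypothetical.

    from collections import deque
    import numpy as np

    class PredictionQueue:
        # FIFO cache of recent softmax predictions. With video batches of only
        # 4-8 clips, a single batch cannot estimate the class-score
        # distribution, so estimates are accumulated across many steps.
        def __init__(self, maxlen=1024):
            self.queue = deque(maxlen=maxlen)

        def update(self, batch_probs):
            self.queue.extend(batch_probs)  # batch_probs: [B, num_classes]

        def class_distribution(self):
            return np.mean(np.stack(list(self.queue)), axis=0)

    def align(target_probs, source_queue, target_queue, eps=1e-6):
        # AdaMatch-style distribution alignment: rescale target pseudo-label
        # scores by the ratio of source to target class distributions,
        # then renormalize so each row sums to one.
        ratio = source_queue.class_distribution() / (
            target_queue.class_distribution() + eps)
        aligned = target_probs * ratio
        return aligned / aligned.sum(axis=-1, keepdims=True)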




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The topic of workflow recognition model generalization is of interest to the CAI community, the paper is well-motivated, and validation is performed on a large dataset (including performance comparison to SOTA and an ablation study). While some of the reviewers' concerns regarding the data description and missing details/information about the dataset have been addressed by the rebuttal, other concerns regarding technical novelty and contribution still remain. lw: could be accepted considering the topic. I thought the rebuttal did a decent job in responding to the concerns on technical innovation and evaluation (data).

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    13



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    After carefully reviewing the paper, the reviews, the meta-reviews, and the rebuttal, an accept is recommended. The paper still has a few weaknesses, but the problem addressed is important, as activity recognition is performed on untrimmed videos.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    10



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have not adequately addressed all of the reviewers’ comments in their rebuttal, as there are still concerns about the technical contribution of this work.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR



Meta-review #4

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Although the majority vote amongst ACs resulted in “reject”, there was a large discrepancy between reviewers regarding the rank and justification of their decisions. Therefore, the PCs assessed the paper reviews, including the meta-reviews, the rebuttal, and the submission. Although the novelty of the technical contribution was the main criticism, the paper has merits, including a novel application and a thorough evaluation. The topic of surgical activity recognition across operating rooms is interesting and unexplored in the existing literature. Though the technical contribution is not strongly justified in terms of domain adaptation, studying model generalizability on surgical video for the task of activity recognition is important. Further considering that the dataset is large (484 full-length surgical videos), standardized (da Vinci Xi surgical systems), and representative (collected from 16 surgeons), experimental validation of surgical video at this scale is impressive. Therefore, this paper has a high chance of generating discussions and inspiration in CAI. In summary, the PCs agree with the convincing arguments of the positive reviewer and AC; therefore, the decision is accept.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR


