Authors

Jay N. Paranjape, Shameema Sikder, Vishal M. Patel, S. Swaroop Vedula

Abstract

Surgical tool presence detection is an important part of the intra-operative and post-operative analysis of a surgery. State-of-the-art models, which perform this task well on a particular dataset, however, perform poorly when tested on another dataset. This occurs due to a significant domain shift between the datasets resulting from the use of different tools, sensors, data resolution etc. In this paper, we highlight this domain shift in the commonly performed cataract surgery and propose a novel end-to-end Unsupervised Domain Adaptation (UDA) method called the Barlow Adaptor that addresses the problem of distribution shift without requiring any labels from another domain. In addition, we introduce a novel loss called the Barlow Feature Alignment Loss (BFAL) which aligns features across different domains while reducing redundancy and the need for higher batch sizes, thus improving cross-dataset performance. The use of BFAL is a novel approach to address the challenge of domain shift in cataract surgery data. Extensive experiments are conducted on two cataract surgery datasets and it is shown that the proposed method outperforms the state-of-the-art UDA methods by 6%. The code can be found at https://github.com/JayParanjape/Barlow-Adaptor
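
As a rough illustration of the kind of objective the abstract describes, the sketch below computes a Barlow Twins-style cross-correlation loss between a batch of source-domain features and a batch of target-domain features. It is a simplified, pure-Python sketch under stated assumptions, not the authors' implementation (which lives in the linked repository). The function names, the `off_diag_weight` value, and the toy feature batches are all hypothetical.

```python
import math


def standardize_columns(batch):
    """Return the columns of an N x D batch, each scaled to zero mean and unit std."""
    n, d = len(batch), len(batch[0])
    columns = []
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        std = std if std > 1e-12 else 1.0  # guard against constant features
        columns.append([(x - mean) / std for x in col])
    return columns  # D lists, each of length N


def cross_correlation(source, target):
    """D x D cross-correlation matrix between source and target feature batches."""
    if len(source) != len(target) or len(source[0]) != len(target[0]):
        raise ValueError("source and target batches must have the same shape")
    n = len(source)
    s_cols = standardize_columns(source)
    t_cols = standardize_columns(target)
    return [[sum(a * b for a, b in zip(sc, tc)) / n for tc in t_cols]
            for sc in s_cols]


def barlow_style_alignment_loss(source, target, off_diag_weight=0.005):
    """Pull the diagonal of the cross-correlation matrix toward 1 (alignment)
    and the off-diagonal entries toward 0 (redundancy reduction).

    off_diag_weight is an illustrative value, not one taken from the paper.
    """
    c = cross_correlation(source, target)
    d = len(c)
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(d))
    off_diag = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return on_diag + off_diag_weight * off_diag


# Toy batches: 4 samples, 2 decorrelated feature dimensions (hypothetical data).
source_feats = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
aligned_target = [row[:] for row in source_feats]
flipped_target = [[-x for x in row] for row in source_feats]

print(barlow_style_alignment_loss(source_feats, aligned_target))
print(barlow_style_alignment_loss(source_feats, flipped_target))
```

In this toy setting the loss is lowest when the two batches carry identical, decorrelated features and grows when the per-dimension correlations move away from 1. In an actual UDA pipeline a term of this kind would be combined with a supervised loss on the labelled source data.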

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43907-0_70

SharedIt: https://rdcu.be/dnwdR

Link to the code repository

https://github.com/JayParanjape/Barlow-Adaptor

Link to the dataset(s)

https://ieee-dataport.org/open-access/cataracts


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces a new Barlow Adaptor method for unsupervised domain adaptation (UDA). The driving application is cataract surgery video. The method outperforms state-of-the-art methods by 6%.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well written and easy to follow. The validation presented in Table 3 supports the conclusions of the paper.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Overall, the improvements presented are nice but relatively small. The authors introduce a feature alignment metric called BFAL as part of the work. The conclusion states that this metric can be easily extended to other network layers.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provide information about parameters used. They do not provide information about how to obtain the source code for the Barlow Adaptor/BFAL.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Please provide information about the software that you used. How was it written and is it available?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The complaints raised are relatively minor. Overall, this is a good paper in a niche application domain.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper leverages an unsupervised domain adaptation (UDA) method called Barlow Adaptor for lowering the domain shift in surgical tool classification for cataract surgeries. Barlow Adaptor incorporates CORAL loss and Barlow Feature Alignment Loss (BFAL) to learn class-aware, aligned as well as sparse feature representations between two domains. The results suggest an improvement over state-of-the-art UDA methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The authors suggest a new architecture named Barlow Adaptor using available tools in UDA such as CORAL loss and BFAL to promote sparse and unique features between source and target while only using source labels.
    • The designed architecture shows improvement with both CNN-based and transformer-based backbones on the task of cataract tool classification with domain adaptation between the D99 and CATARACTS datasets.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The paper brings together multiple SOTA components while not offering any technologically novel components or modifying the available components w.r.t the task at hand.

    • The paper lacks proper motivation and related work w.r.t. cataract tool classification. Specifically, considering the minimal application and methodological novelty, the method is only applied to a single task in cataract surgery, while a diverse set of tasks on public cataract surgery datasets is available, e.g. phase segmentation for the Cataract-101 dataset* and CATARACTS, and semantic segmentation for CaDIS**.

    • Although qualitative results suggest improvement in tool classification using the proposed method, the improvement is not consistent across all setups, e.g. for ViT in D99->CATARACTS the CORAL-loss-only setup works significantly better than the proposed setup.

    • There is no other comparison across domain adaptation benchmarks (even in other domains, e.g. CT->MRI) to highlight the impact of the suggested work. Considering the use of a non-public dataset and the limited set of results and evaluations, proper validation of the results is even more difficult.

    • As the suggested method promotes feature adaptation between the two domains, studies for analyzing the quality of feature adaptation would be crucial to justify the impact of Barlow adaptor.

    * Schoeffmann, K., Taschwer, M., Sarny, S., Münzer, B., Primus, M.J. and Putzgruber, D., 2018, June. Cataract-101: video dataset of 101 cataract surgeries. In Proceedings of the 9th ACM multimedia systems conference (pp. 421-425).

    ** Grammatikopoulou, M., Flouty, E., Kadkhodamohammadi, A., Quellec, G.E., Chow, A., Nehme, J., Luengo, I. and Stoyanov, D., 2019. Cadis: Cataract dataset for image segmentation. arXiv preprint arXiv:1906.11586.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    One of the datasets is not publicly available. There are no results on public benchmarks while this possibility exists. Therefore, there is limited reproducibility of the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • Please first refer to weaknesses.
    • The related work section is not comprehensive and does not cover the most recent works in the field. Also, reference [26], which is an integral component in the architecture design, is not mentioned in related work.
    • The motivation behind specific architectural components is not properly justified, e.g. the projector.
    • It can be observed that the improvement of suggested architecture compared with previous SOTA is very limited. In this scenario, a cross-validated ablation study for proper evaluation of the improvement or evaluation on other benchmarks or tasks would help in justifying the performance of the model.
    • Lastly, it is claimed that BFAL promotes sparse domain-agnostic features unique to surgical tools, while there is no feature explainability study or qualitative/quantitative analysis of the learned features in general.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper investigates an interesting problem in surgery tool classification. There are however some major shortcomings: 1) Limited technical and application novelty 2) Limited experiments and results: Inconsistent quantitative improvement, limited evaluation and comparison specifically with public benchmarks, generalization to other tasks, etc. 3) Lack of feature adaptation analysis.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #5

  • Please describe the contribution of the paper

    The paper proposes a new method to solve the unsupervised domain adaption problem for instrument classification in cataract surgery. Experimental results show the effectiveness of the method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well-written and easy to follow.
    2. The authors proposed a new model for UDA in instrument classification.
    3. The comparison and ablation experiments are reported to verify the effectiveness.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The proposed loss for feature alignment is not new. It is based on prior work Barlow Twins [26]. The motivation for using this loss is a little bit weak, and the main contribution should be more clearly clarified.
    2. Some technical details and experimental settings are unclear.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    One of the used datasets is public, and the authors claim that their code will be publicly available upon acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. The proposed BFAL enforces equal features for images from different domains, which may cause the model to miss important information in the images, especially the features related to different instruments. This could affect the downstream instrument recognition task. Therefore, the use of BFAL seems to rely on relatively strong assumptions and the authors should provide a more detailed explanation.

    2. In Figure 2, the features generated by the Projector are used in BFAL calculation, but the gradient expression shown in the figure appears to be incorrect.

    3. Some technical details are not clear: (a) the structure of the Projector; (b) the feature dimensions used in the BFAL calculation. Does BFAL also require high-dimensional features to achieve better results, as in [26]?

    4. In the experiment, 14 classes are used for classification. Are those images that do not contain these categories also used for training?

    5. The D99 dataset proposed in [7,9] contains 99 cataract videos, but the authors claim they use 105 videos in this work, please give more information on the data usage.

    6. The result of Target Only (Resnet50), D99, in Table 2 is inconsistent with the table in Fig.1, showing 57% and 58%, respectively. Besides, the format of the data results should be consistent.

    7. From Table 1 in the supplementary, instruments (11) and (12) have low results in Target Only but perform well in the UDA methods. Does this mean that there is not enough data on these two types in the training set? The dataset splitting and distribution should be discussed.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper adopts a new loss for feature alignment across different domains. The experiments are complete, as comparing methods and ablation studies are included. However, the explanation of the method is not clear enough and the technical details are not clear.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper proposes a Barlow Adaptor with a Barlow Feature Alignment Loss for unsupervised domain adaptation of surgical tool presence detection. The method is interesting and contributes to performance improvement on experimental datasets (D99 and CATARACTS). In the rebuttal, the authors should clarify the technical novelty and detailed advantages over general UDA methods.




Author Feedback

1) Technical Novelty and Advantages (Meta reviewer, R3 Q1, R5 Q1) - Barlow Twins [26] is a self-supervised learning (SSL) technique that works on augmentations of the same image and is not related to domain adaptation (DA). Here, we utilize a similar construct to align features between images from different datasets. However, directly using it for Unsupervised DA (UDA) does not work, as it enforces similar features between different surgical tools, as also mentioned by R5. As a UDA setting does not have target labels, applying it between images of the same tools is not possible. Hence, the novelty lies in adapting it correctly for UDA. In our paper, we propose a minimax game between the CE loss (which encourages feature differentiation) and the BFAL (which encourages similarity) to utilize the strictness and redundancy reduction properties of BFAL to account for the problem mentioned above. This method is advantageous over existing UDA methods not only because it outperforms them but also because the minimax approach is beneficial in the cataract surgery domain. As mentioned in the last paragraph of Section 3, the cataract surgery domain has multiple features that are dataset (surgeon, hospital) specific but are not required for tool classification. Our approach discourages learning these features that are prone to domain shift. Thus, the technical novelty of this paper lies in correctly adapting Barlow Twins to the UDA task and leveraging domain-specific properties for better performance.

2) Technical Details (R5 Q3,4,5,7) - The projector is a linear layer with 100 hidden units, which is smaller than the features required by [26]. High feature dimensions are avoided to prevent overfitting on smaller datasets. During training, we exclude images without any instrument among the 14 classes, following the existing literature on cataract instrument classification. The authors of the referenced papers for D99 confirmed that 6 videos didn’t have surgical phase or skill labels but had instrument labels. Hence, we include those 6 videos in our analysis, making the total 105. To address the underrepresentation of classes 11 and 12, we ensure a similar proportion across the training, validation, and test sets using a stratified split.

3) R3 Q2,4 - [3] and [22] motivate classification as a task in the surgical domain, while [8] motivates it with respect to cataract specifically. We didn’t find much work addressing DA in this application, which is the main task of this paper and is a clinically relevant task, as described by [16]. We do not reference [26] in the related work section because it is an SSL method and is not directly related to DA. Cataract-101 is used for phase segmentation and CaDIS is used for semantic segmentation, which are different from instrument classification. We agree that using Barlow Adaptor for segmentation is interesting future work.

4) R3 Q3 - ViT generally performs poorly on smaller datasets, and our method shows better results with ResNet architectures which are shown to be better suited for this task. The lower microaccuracy mentioned may also stem from inherent class imbalance. Since microaccuracy is dominated by the dominant classes, macroaccuracy could be considered a better metric since it is averaged with equal weights over all classes. For this metric, in every setting, our method works better than the others.

5) Generalizability (R3 Q4) - We tested BFAL on the general computer vision Office-Home dataset, where it beats SHOT [Liang et al. ICML’20] by 2% and DANN [5] by 4% on macro accuracy, while performing on par with the recent UDA SOTA, CDTrans [24]. Cross-dataset performance between the MR35h and SARTAJ MRI datasets shows a 1.3% improvement over CORAL and 2% over MMD. We will add these observations in the revised version.

6) Typos (R5 Q2,6) - We will correct the typos in the figures as mentioned. Fig 1 should show 57 instead of 58, and in Fig 2, backprop for the Projector should use BFAL instead of CE.
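
The micro versus macro accuracy distinction raised in point 4 can be illustrated with a small, hedged example. The sketch below assumes a simplified single-label setting, with micro accuracy taken as the fraction of all samples classified correctly and macro accuracy as the unweighted mean of per-class accuracies. The paper's exact definitions for multi-label tool presence may differ, and the class names and predictions here are made up for illustration.

```python
from collections import defaultdict


def micro_accuracy(y_true, y_pred):
    """Fraction of all samples predicted correctly (dominated by frequent classes)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracy, so every class counts equally."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical imbalanced data: 9 frames of a common tool, 1 of a rare tool.
y_true = ["forceps"] * 9 + ["needle"]
# A degenerate classifier that always predicts the common tool.
y_pred = ["forceps"] * 10

print("micro:", micro_accuracy(y_true, y_pred))
print("macro:", macro_accuracy(y_true, y_pred))
```

Here the always-predict-the-majority classifier looks strong under micro accuracy but is penalised under macro accuracy, because the rare class it never detects counts as much as the common one.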




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal addresses some points on technical novelty and technical details. Overall, two reviewers give positive ratings and one reviewer is negative. As a UDA paper for the specific application of instrument classification in cataract surgery, this is a decent work regarding method design and experimental validation. There is also no overclaiming of technical contributions in the context of this paper. In the final version, if possible, the authors are encouraged to add a comparison with existing general UDA methods, so that this work also has some impact in a broader scope.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors present an unsupervised domain adaptation method for surgical tool presence detection to address the challenges of domain shift in cataract surgery data. The paper is well-written, the results on comparison and ablation experiments are promising, and the topic is well-motivated and of interest to the community. The main concerns from reviewers regarding technological novelty, proper motivation with regards to the selected task (tool classification), generalizability, and additional details related to the implementation have been addressed by the rebuttal.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal addressed the most critical concerns raised by reviewers, such as technical novelty and details. Though concerns remain about marginal improvement and application significance, given the contributions of the paper, I recommend acceptance.


