
Authors

Xinkai Zhao, Yuichiro Hayashi, Masahiro Oda, Takayuki Kitasaka, Kensaku Mori

Abstract

Semantic segmentation is an important component of intraoperative guidance in laparoscopic surgery. However, acquiring and annotating laparoscopic datasets is labor-intensive, which limits research on semantic segmentation of laparoscopic images. In this paper, we address the Domain-Adaptive Semantic Segmentation (DASS) task, which requires only computer-generated simulated images and unlabelled real images to train a laparoscopic image segmentation network. To bridge the large domain gap between generated and real images, we propose a Masked Frequency Consistency (MFC) module that encourages the network to learn frequency-related information of the target domain as additional clues for robust visual recognition. Specifically, MFC randomly masks some of the high-frequency information of the image to enhance the consistency of the network’s predictions for low-frequency images and real images. Extensive experimental results show that the proposed MFC module can be flexibly inserted into existing DASS frameworks and improves their performance. Our approach performs comparably to a fully supervised learning method on the CholecSeg8K dataset without using any manual annotation.
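The masking step described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation (the linked code repository is the reference for that). The function name `mask_high_frequencies` and the parameters `mask_ratio`, `patch`, and `keep_radius` are assumptions made for illustration and are not the paper's hyperparameters. The sketch moves an image to the frequency domain, always keeps a small low-frequency disc, randomly zeroes patches of the remaining spectrum, and transforms back.

```python
import numpy as np

def mask_high_frequencies(image, mask_ratio=0.5, patch=8, keep_radius=16, rng=None):
    """Randomly zero patches of the spectrum outside a low-frequency disc.

    All parameter names and defaults are illustrative assumptions.
    `image` is an (H, W) or (H, W, C) float array.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]

    # Coarse random keep/drop grid, upsampled so each cell covers patch x patch bins.
    grid_h, grid_w = -(-h // patch), -(-w // patch)  # ceiling division
    keep = rng.random((grid_h, grid_w)) >= mask_ratio
    keep = np.repeat(np.repeat(keep, patch, axis=0), patch, axis=1)[:h, :w]

    # Always keep a disc around the centre of the shifted spectrum (low frequencies).
    yy, xx = np.ogrid[:h, :w]
    low = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= keep_radius ** 2
    keep = keep | low

    # Centre the zero frequency, apply the mask, and transform back.
    spectrum = np.fft.fftshift(np.fft.fft2(image, axes=(0, 1)), axes=(0, 1))
    if image.ndim == 3:
        keep = keep[..., None]  # share one mask across channels
    masked = spectrum * keep

    # The random mask is not conjugate-symmetric, so the inverse transform has an
    # imaginary part; keeping only the real part is a simplification of this sketch.
    return np.fft.ifft2(np.fft.ifftshift(masked, axes=(0, 1)), axes=(0, 1)).real
```

In a consistency setup, a network's prediction on the masked image would be compared with a prediction on the unmasked real image; how that loss is formed is left out here because the abstract does not specify it.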

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43907-0_63

SharedIt: https://rdcu.be/dnwdK

Link to the code repository

https://github.com/MoriLabNU/MFC

Link to the dataset(s)

http://opencas.dkfz.de/image2image

https://www.kaggle.com/datasets/newslab/cholecseg8k

http://camma.u-strasbg.fr/datasets


Reviews

Review #3

  • Please describe the contribution of the paper

    This paper proposes a novel masking frequency consistency (MFC) module, inspired by masked image modeling and masked image consistency, to address the task of domain-adaptive semantic segmentation of laparoscopic images. The evaluation on public datasets demonstrates the effectiveness and superiority of the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The authors innovatively tackle the task of domain adaptive semantic segmentation for laparoscopic images by utilizing computer-simulated images and real unlabeled images. (2) To alleviate the substantial domain shift between simulated and real images, the authors propose a novel masking frequency consistency (MFC) model that narrows the gap between different domains. MFC is the first UDA method to apply masking strategy in the frequency domain. (3) The current experimental results demonstrate its effectiveness and transferability, although there may be some room for improvement.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) The only major innovation in the methodology is the MFC module, while other aspects such as the use of teacher-student training paradigm and exponential moving average (EMA) for consistency regularization are relatively conventional. (2) Due to the presence of only one core contribution, the paper lacks ablation experiments. Nevertheless, it would be beneficial to provide some comparative analysis experiments to justify the selection of certain hyperparameters, such as r, b, h, and w, and elucidate the rationale behind their choices. (3) In order to demonstrate the transferability of their proposed MFC strategy, the authors showcase the results of DeepLabV2 and SegFormer in Table 2. However, this may be insufficient and unreasonable. Firstly, there should be experiments with different segmentation network architectures in Table 1 as well. Secondly, merely replacing one network architecture is inadequate. There should be 3-4 different segmentation network architectures based on CNN and Transformer. Lastly, the latest network in the DeepLab series is DeepLabV3+ (although it is also considered an older architecture). Using V2 is not meaningful due to its outdated nature.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper demonstrates good reproducibility: the authors state their intention to open-source the code, and the evaluation is conducted on publicly available datasets.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    In response to the identified weaknesses, here are some potential suggestions for modifications: (1) Perform analytical experiments to explicate the motivations and reasons behind the selection of each hyperparameter, along with providing possible analytical explanations. (2) To establish the transferability of the proposed method, supplement comprehensive experiments on multiple UDA tasks, utilizing more recent network architectures and covering both CNN and Transformer-based networks. (3) Consider adding some domain generalization (DG) methods, such as AADG: Automatic Augmentation for Domain Generalization on Retinal Image Segmentation, to the related works section. If possible, provide comparisons with some DG methods for better context.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper’s primary contribution, innovative points, and practicality are interesting, and the reproducibility is relatively good, albeit some additions and modifications may be necessary.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    In this paper, the authors propose a novel masking frequency consistency (MFC) module to reduce the domain gap between generated and real images. By applying a masking strategy on the frequency domain, the model achieves comparable results to the fully supervised learning method. The method of the paper is novel, and the authors conduct many experiments about the model.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The authors train the model using only computer-generated simulated images and unlabeled real images, which reduces the cost of the laparoscopic dataset annotation. (2) Due to the large domain gap between generated and real images, the authors propose a novel masking frequency consistency (MFC) module. The masking of high-frequency regions facilitates the transfer of knowledge learned in the generated images to the real images and improves the consistency of the network’s predictions for low-frequency images and real images. (3) The idea and framework of the article are clear and easy for readers to understand.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) Some of the experimental details are not perfect, including the method of determining the parameter values for the MFC method, and the related ablation experiments. (2) This paper does not compare and analyze the difference in final prediction performance between masked and unmasked high-frequency information.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    In the abstract, the authors state that the relevant program code will be uploaded to GitHub, which will facilitate method reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The method of the article is innovative and achieves relatively good results. However, the following doubts remain: (1) Has the Masked Frequency Consistency method been previously used in other fields? If so, please provide more references. (2) What is the value of the parameter alpha when the teacher network is updated? Does the value change for different tasks? Please further explain the effect of different values on the network performance. (3) How are the parameters r, b, w, and h determined in the MFC method? Please provide the relevant ablation experiments and explain how robust the model is.
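For context on question (2), a teacher network in this kind of setup is commonly updated as an exponential moving average (EMA) of the student weights. The sketch below shows that generic update on plain NumPy arrays. The function name `ema_update`, the dictionary layout, and the default `alpha=0.999` are assumptions for illustration and are not values reported by the authors.

```python
import numpy as np

def ema_update(teacher, student, alpha=0.999):
    """Generic EMA step: teacher <- alpha * teacher + (1 - alpha) * student.

    `teacher` and `student` map parameter names to arrays of equal shape.
    The default alpha is a hypothetical value, not taken from the paper.
    """
    for name, student_param in student.items():
        teacher[name] = alpha * teacher[name] + (1.0 - alpha) * student_param
    return teacher
```

A value of alpha close to 1 makes the teacher change slowly and smooths out noise in the student, while a smaller alpha lets the teacher track the student more closely.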

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The main factors for scoring include the following: (1) Innovativeness of the method: the method proposed by the authors is effective in reducing the domain gap. (2) Accuracy of the results: the prediction results of the model are better than most of the current methods. (3) Reliability of the model: the authors provide less explanation in this regard (disadvantage).

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    The paper proposes a method for Domain-Adaptive Semantic Segmentation (DASS) of laparoscopic images. The aim is to train a segmentation network using only computer-generated simulated images and unlabeled real images, without the need for manual annotation. To bridge the gap between generated and real images, the paper introduces a Masked Frequency Consistency (MFC) module that encourages the network to learn frequency-related information of the target domain as additional cues for robust recognition. The experiments show that the proposed approach achieves comparable results to fully supervised learning method on the CholecSeg8K dataset, which is a benchmark dataset for this task.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper proposes a novel module for UDA, namely MFC, which bridges the gap between generated and real images.
    2. The experimental results show that MFC outperforms other UDA methods and is comparable to fully supervised methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. While the paper presents a novel approach for Domain-Adaptive Semantic Segmentation, the use of frequency-related information and consistency regularization is not new in the field of medical imaging.
    2. More ablation studies are needed. For instance, the authors should compare their method with other methods that can be used to obtain images with low-frequency information, such as Gaussian blur.
    3. The impact of the masking strategy of MFC should be further analyzed.
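To make the Gaussian blur baseline in point 2 concrete, the sketch below applies a Gaussian low-pass filter in the frequency domain. It is only an illustration of the alternative the reviewer mentions, not an experiment from the paper. The function name `gaussian_lowpass` and the default `sigma` are assumptions, and the filter uses circular (wrap-around) boundary handling because it is implemented with the FFT.

```python
import numpy as np

def gaussian_lowpass(image, sigma=2.0):
    """Blur a 2D image by multiplying its spectrum with a Gaussian response.

    A spatial Gaussian with standard deviation `sigma` (in pixels) has the
    frequency response exp(-2 * pi^2 * sigma^2 * f^2), with f in cycles per pixel.
    """
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies
    response = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    return np.fft.ifft2(np.fft.fft2(image) * response).real
```

Unlike random masking of spectrum patches, this filter attenuates high frequencies smoothly and deterministically, which is the contrast such an ablation would probe.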
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This paper provides enough detail for reproduction.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. More ablation studies are recommended to provide more insight into the effectiveness of the MFC module.
    2. The paper focuses on laparoscopic images, but the proposed motivation could be applied to other medical imaging modalities. The authors could discuss the generality of the algorithm and verify its effectiveness in a wider range of modalities to further demonstrate the potential impact of the proposed approach.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty of the paper is limited, but the experimental results are satisfactory. More ablation studies are needed to improve this paper.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The three reviewers generally agree that the proposed method for Domain-Adaptive Semantic Segmentation of laparoscopic images is interesting and innovative. The use of computer-generated simulated images and unlabeled real images to train the model is a strength. The proposed masking frequency consistency module is seen as a novel approach that bridges the gap between different domains using a masking strategy in the frequency domain. However, there are some weaknesses identified, including imperfect experimental details, lack of ablation studies, and limited comparison with other methods. Reviewer 1 rates the paper as a weak accept with interesting merits that slightly weigh over weaknesses, while Reviewer 2 recommends acceptance with a score of 6, citing the paper’s innovativeness, accuracy of results, and some reliability concerns. Reviewer 3 also recommends acceptance with a score of 5, suggesting potential modifications and additional experiments to further improve the paper’s contribution and practicality. Overall, taking into account the positive assessments and recommendations of all three reviewers, the paper is recommended for provisional acceptance.




Author Feedback

We are grateful to all the reviewers and ACs for their positive feedback and insightful comments. We will carry out the suggested experiments and update our code repository with the results. Furthermore, we agree with the reviewers that extending our MFC method and experimenting with more datasets would be valuable for future work. We appreciate all the suggestions and will incorporate them into the camera-ready version and a future journal version.


