
Authors

Lin Li, Jingyi Liu, Shuo Wang, Xunkun Wang, Tian-Zhu Xiang

Abstract

Trichomoniasis is a common infectious disease with high incidence caused by the parasite Trichomonas vaginalis, increasing the risk of getting HIV in humans if left untreated. Automated detection of Trichomonas vaginalis from microscopic images can provide vital information for diagnosis of trichomoniasis. However, accurate Trichomonas vaginalis segmentation (TVS) is a challenging task due to the high appearance similarity between the Trichomonas and other cells (e.g., leukocyte), the large appearance variation caused by their motility, and, most importantly, the lack of large-scale annotated data for deep model training. To address these challenges, we elaborately collected the first large-scale Microscopic Image dataset of Trichomonas Vaginalis, named TVMI3K, which consists of 3,158 images covering Trichomonas of various appearances in diverse backgrounds, with high-quality annotations including object-level mask labels, object boundaries, and challenging attributes. Besides, we propose a simple yet effective baseline, termed TVNet, to automatically segment Trichomonas from microscopic images, including high-resolution fusion and foreground-background attention modules. Extensive experiments demonstrate that our model achieves superior segmentation performance and outperforms various cutting-edge object detection models both quantitatively and qualitatively, making it a promising framework to promote future research in TVS tasks.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_7

SharedIt: https://rdcu.be/cVRvr

Link to the code repository

N/A

Link to the dataset(s)

https://github.com/CellRecog/cellRecog

https://zenodo.org/record/6545146


Reviews

Review #1

  • Please describe the contribution of the paper

    In this paper, the authors propose TVNet for Trichomonas vaginalis segmentation in microscopy images. The proposed method is built from a high-resolution fusion (HRF) module and a foreground-background attention (FBA) module. Extensive experiments on the private TVMI3K dataset show the effectiveness of the proposed method, which outperforms several image segmentation methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The proposed HRF and FBA modules are indicated to be effective via ablation studies.

    • The proposed method is making an early attempt on the Trichomonas Vaginalis segmentation in microscopy images.

    • The overall paper is clearly presented and easy to follow.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • There are no experiments on public datasets, which limits the reproducibility of the proposed method.

    • The HRF module is similar to channel- and spatial-wise attention mechanisms, which have been proposed in the following publications:

    L. Chen, et al., “SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning”, CVPR 2017, pp. 5660-5667.

    S. Woo, et al., “CBAM: Convolutional Block Attention Module”, ECCV 2018, pp. 3-19.

    • The paper lacks computational complexity analysis on the proposed method in comparison with other image segmentation methods.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Experimental settings are presented in the paper. However, all experiments are conducted on a private dataset, which is not available during review.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • In Table 1, is there a specific policy behind the definition of a small object, i.e., ratio < 0.1?

    • There is related work (see below) also focusing on Trichomonas vaginalis analysis using deep learning methods; please include a discussion of and comparison with it:

    X. Wang, et al, “Trichomonas vaginalis Detection Using Two Convolutional Neural Networks with Encoder-Decoder Architecture”, Applied Sciences, 11(6), 2738, 2021.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper is clearly written. However, reproducibility is limited since there are no experiments on public datasets. In addition, the overall method lacks novelty.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors present a new large-scale annotated dataset for the segmentation of Trichomonas vaginalis on microscopy images named TVMI3K, together with a novel deep neural network called TVNet used as baseline. TVNet is a Res2Net-like architecture with five levels of features, high-resolution fusion modules, a neighbor connection decoder and foreground-background attention modules. The proposed baseline performs favorably with respect to nine state-of-the-art image segmentation models.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The publication of a new large-scale annotated dataset is of interest to many biomedical computer vision developers and poses new challenges to the existing methods.

    The proposed baseline architecture combines very recent modules in order to improve the segmentation results. Such modules are justified using an ablation study.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    References to previous work on Trichomonas Vaginalis segmentation/detection are missing. For instance, check “Wang, X., Du, X., Liu, L., Ni, G., Zhang, J., Liu, J. and Liu, Y., 2021. Trichomonas vaginalis Detection Using Two Convolutional Neural Networks with Encoder-Decoder Architecture. Applied Sciences, 11(6), p.2738.” and its corresponding dataset: https://github.com/wxz92/Trichomonas-Vaginalis-Detection

    In the comparison with the state of the art, the number of execution trials and hyperparameter exploration is unclear.

    In the ablation study, the Dice and IoU values are missing.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors do not mention the range of hyper-parameters considered nor the method to select the best hyper-parameter configuration. They only specify some of the hyper-parameters used to generate results. The exact number of training and evaluation runs (iterations or epochs) is not provided.

    A description of the hardware infrastructure used is provided but nothing is mentioned about the deep learning framework used nor the code availability.

    There is no analysis of situations in which the method failed.

    There is no description of the memory footprint nor an average runtime for each result, or estimated energy cost.

    There is no analysis of statistical significance of reported differences in performance between methods.

    The results are not described with central tendency (e.g. mean) & variation (e.g. error bars).

    The specific evaluation metrics and/or statistics used to report results are correctly referenced.

    There are no details of train / validation / test splits nor details on how baseline methods were implemented and tuned.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    An effort should be made with regard to reproducibility and evaluation. More specifically, the authors should provide a better description of the range of hyper-parameters considered for every method (including those of the state of the art), the number of training and evaluation runs, the validation split and validation results, etc. In that sense, I recommend following the good practices proposed by Dodge et al. (“Show your work: Improved reporting of experimental results”, 2019).

    On page 2 there is a typo: “deep learning techniques has not yet been well-studied” should read “deep learning techniques have not yet been well-studied”

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty of the proposed dataset is clear and the baseline results are promising, although they should be confirmed with a proper description of the hyperparameters used for every method.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The contribution of this paper is two-fold. First, a large dataset is created with annotations at different levels. Second, a new method is proposed for TVS.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Both contributions (dataset and segmentation method) are significant: The dataset is very large and extensively annotated, including common annotations like object-level segmentation masks, but also more detailed attributes at image level (labels like “multiple objects”, “out-of-view”) and object level (labels like “complex shape”, “out-of-focus”). The segmentation method outperforms latest methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The description of the segmentation method should be improved (see “detailed & constructive comments” along with some minor issues).

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper contains enough details for the reproduction of results, although the segmentation method should be described in more detail.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Major issues: In the description of the segmentation method (Sec. 3), details about NCD are missing. There should be a high-level summary about how NCD [4] works. A comparison with the partial decoder component [31] should be made in the introduction, not here.

    The overall presentation of the method should also be improved. For example, in Sec. 3.1 and Fig. 2 it is unclear what $P_1, … P_6$ are, since they are only mentioned later in Sec. 3.3. In Fig. 2 the arrows for $P_3$ to $P_6$ are also confusing since it is unclear where they originate, and where the arrows for $P_1$ and $P_2$ are (or, whether $P_1$ and $P_2$ exist).

    Minor issues: Sec. 3.3: Why does the weak foreground region $F^2_{i+1}$ contain boundary information? Does “weak” mean that the foreground feature is not strong enough? If so, then the boundary information there might be inaccurate.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    8

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Both dataset and the segmentation method are very convincing.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper was reviewed by three experts in the field. All the reviewers agree that the paper is generally well written, with the methods being easy to understand and follow. Some minor issues need to be addressed in the final version, including more details of the algorithm and more experimental analysis. The proposed dataset is highly valuable for the community. Please make sure to integrate the points raised by all reviewers and make the dataset link public when preparing the final version.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3




Author Feedback

We thank the meta-reviewer and the reviewers for their efforts in handling our submission and reviewing the paper. We are encouraged that they find our method technically effective (R1), our dataset interesting (R2), our work significant with convincing results (R2, R3), and our paper well written (R1). We respond (Re) to each comment below.

Reviewer #1 (R1)

Q1. Lack of experiments on public datasets. Re. To our knowledge, there is no officially released, open-source, large-scale annotated dataset for TVS. Although Wang et al. carried out related work, their method is video-based and adopts optical flow (video temporal cues) for TVS, which differs slightly from our focus. Moreover, the authors have not released their complete annotated dataset or model code. The annotated result videos they released (without the original videos or separate annotation files) are far from sufficient to support other researchers in model training and comparative studies. To this end, we constructed the large-scale TVMI3K dataset, with rich annotations including object-level masks, boundaries, and attributes, with the expectation of advancing the field. Most importantly, we will open-source our entire dataset and source code when the paper is officially accepted, to facilitate further research and algorithm evaluation in this field. The dataset is currently publicly available at https://zenodo.org/record/6534086.

Q2. The novelty of the method. Re. In addition to the newly collected TVMI3K dataset, we propose a simple and well-performing baseline model that integrates edge cues and foreground-background exploration (rather than focusing on the design of complex modules) for this task.

Q3. HRF vs. CBAM. Re. We first integrate high-resolution edge features into the backbone features of each level to enhance feature representation. In the specific implementation, we adopt the attention model (i.e., CBAM) to extract critical information from the backbone features, which are aggregated with edge cues.

Q4. Computational complexity. Re. The number of parameters (M) and FLOPs (G) of our model are 154.9 and 97.6, respectively. A detailed comparative analysis will be added to the final version.

Q5. The definition of small objects. Re. To our knowledge, there are two ways to define small objects: by relative scale [1] and by absolute scale [2]. In our dataset, the absolute pixel area of the smallest object is 1848, which is greater than 32*32 [2]. Thus, we follow the setting of [1] and use the relative scale to define small objects (ratio ≤ 0.1). References: [1] Camouflaged object detection. CVPR, 2020. [2] R-CNN for small object detection. ACCV, 2016.

Reviewer #2 (R2)

Q1. The number of execution trials and hyper-parameter exploration. Re. We describe the parameter settings in Sec. 4.1 (last part). Hyper-parameters were set based on experience from related work and our own experiments. The parameters of the comparison methods follow the original papers or released code. Due to page limitations, we will add more hyper-parameter exploration and analysis to our journal version.

Q2. More experimental analysis, such as failure cases and statistical significance. Re. Thanks for your valuable suggestions. Due to page limitations, we will add more experimental analysis to our journal version.

Q3. Dice and IoU in the ablation study. Re. We achieve 0.251, 0.328, 0.271, and 0.376 mDice, and 0.163, 0.228, 0.185, and 0.276 mIoU, for settings (a), (b), (c), and (d), respectively.

Reviewer #3 (R3)

Q1. The description of the method. Re. We will clarify our method in the final version, including NCD and the definition of $P_i$.

Q2. Do weak foreground regions contain boundary information? Re. In our experiments, boundary uncertainty leads to inaccurate segmentation. The weak foreground area is generally located in the boundary area between object and background. FBA aims to mine object and boundary cues from weak foreground regions to improve segmentation.
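The relative-scale rule for small objects discussed in the rebuttal (Q5) can be sketched as follows; this is an illustrative helper, not code from the paper, and the function name and example image dimensions are assumptions:

```python
# Sketch of the relative-scale definition of a "small object" (ratio <= 0.1):
# an object is small if its mask area covers at most 10% of the image area.
# Helper name and image size below are hypothetical, for illustration only.

def is_small_object(mask_area_px: int, image_width: int, image_height: int,
                    threshold: float = 0.1) -> bool:
    """Return True if the object's area ratio is at most `threshold`."""
    ratio = mask_area_px / (image_width * image_height)
    return ratio <= threshold

# The rebuttal states the smallest object in TVMI3K has an absolute area of
# 1848 px; in an assumed 1024x768 image its relative scale is ~0.0024,
# so it is classified as small under the relative-scale rule.
print(is_small_object(1848, 1024, 768))  # -> True
```

By contrast, an absolute-scale rule (e.g., area < 32*32 px) would not classify that same object as small, which is why the rebuttal adopts the relative-scale definition.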


