
Authors

Wei Huang, Xiaoyu Liu, Zhen Cheng, Yueyi Zhang, Zhiwei Xiong

Abstract

Deep learning-based methods for mitochondria segmentation require sufficient annotations on Electron Microscopy (EM) volumes, which are often expensive and time-consuming to collect. Recently, Unsupervised Domain Adaptation (UDA) has been proposed to avoid annotating on target EM volumes by exploiting annotated source EM volumes. However, existing UDA methods for mitochondria segmentation only address the intra-section gap between source and target volumes but ignore the inter-section gap between them, which restricts the generalization capability of the learned model on target volumes. In this paper, for the first time, we propose a domain adaptive mitochondria segmentation method via enforcing inter-section consistency. The key idea is to learn an inter-section residual on the segmentation results of adjacent sections using a CNN. The inter-section residuals predicted from source and target volumes are then aligned via adversarial learning. Meanwhile, guided by the learned inter-section residual, we can generate pseudo labels to supervise the segmentation of adjacent sections inside the target volume, which further enforces inter-section consistency. Extensive experiments demonstrate the superiority of our proposed method on four representative and diverse EM datasets. Code and models are available at https://github.com/weih527/DA-ISC.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_9

SharedIt: https://rdcu.be/cVRvt

Link to the code repository

https://github.com/weih527/DA-ISC

Link to the dataset(s)

https://www.epfl.ch/labs/cvlab/data/data-em/

https://mitoem.grand-challenge.org/


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose to conduct domain adaptive mitochondria segmentation by enforcing inter-section consistency. They align both segmentation results and inter-section residuals predicted from source and target volumes via adversarial learning. The validations show that their method outperforms leading methods on several datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The exploration of inter-section consistency for this task stands for its novelty;
    2. The work presents an extensive experimental section.
    3. The paper is clear and well written.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) It seems that the performance of the Oracle and NoAdapt for adaptation from MitoEM-R to MitoEM-H in Table 2 is low. 2) It seems less informative to conduct the ablation study on adaptation from VNC III to Lucchi (Subset1).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors listed “yes” for both code and pre-trained models. In this case, it can be an easy task for both training and testing. If the reproduction was only based on the descriptions in the paper, it could be somewhat difficult.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    1) Please check the performance of the NoAdapt for adaptation from MitoEM-R to MitoEM-H in Table 2. 2) It seems less informative to conduct the ablation study on adaptation from VNC III to Lucchi (Subset1). 3) Discussions about the limitations of the study and future work are recommended. In general, the authors should emphasise more the real benefits of the method and give some general points and suggestions to authors working in this field.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors address domain adaptive mitochondria segmentation by enforcing inter-section consistency. The exploration of inter-section consistency for this task stands for its novelty. The work presents an extensive experimental section. Moreover, the paper is clear and well written. There are also some issues in this study. For example, the performance of the NoAdapt for adaptation from MitoEM-R to MitoEM-H in Table 2 is exceptionally low, which may mislead readers.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    The authors present a method for successfully applying a model trained on one EM dataset to another EM dataset. This domain transfer allows the reuse of existing ground truth annotations. The authors compare their results with previously published methods for the same task and other UDA methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The authors evaluate their method on multiple datasets and compare it with an extensive set of methods that goes beyond those developed for this task
    • The authors show that their method outperforms existing methods
    • The paper is written clearly
    • The authors include ablation studies
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The authors developed a 2D model for a 3D task. 3D models have been used for organelle (synapse, mitos, …) segmentation and detection since at least 2017 and vastly outperform 2D models. The method presented by the authors relies on comparing predictions on 2D slices; however, it is not clear whether this would work for the more appropriate 3D task. Further, one could argue that by having access to a “larger field of view” in the third dimension, their comparison to existing methods is not precise. It is not clear whether the access to adjacent slices alone would have created the observed performance increase.
    • The presented datasets are not representative for current EM datasets. Two of the datasets are < 20um^3 in volume.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method appears reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • Please avoid exaggerated language (“for the first time”, “as a rescue”)
    • The term “gaps” is not clear at first. Please provide a better explanation early in the paper.
    • The evaluation would be better done object-wise and not segmentation-wise as has been done by Kreshuk et al, 2014; Dorkenwald et al, 2017 and others. Object-wise evaluations better reflect the performance as it relates to the downstream use-case
    • Please use a different color combination in Fig 2 to allow red-green blind readers to parse this figure.
    • The dataset pairings in the evaluation minimize differences between paired datasets, reducing the information that can be gleaned from them. In the interest of better evaluations please consider different pairings such as MitoEM-R + VNC III.
    • The work by Januszewski et al, 2018 (CycleGANs) addresses a similar problem for neuron segmentations and should be at least mentioned.
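
    The difference between object-wise and segmentation-wise (pixel-wise) evaluation raised above can be illustrated with a minimal sketch. It is a generic illustration and not the protocol of the cited works: the function names, the 4-connectivity, the greedy matching, and the 0.5 IoU matching threshold are all assumptions made for this example.

    ```python
    from collections import deque


    def label_components(mask):
        """Return the 4-connected foreground components of a 2D binary mask
        as a list of sets of (row, col) pixels."""
        rows, cols = len(mask), len(mask[0])
        seen = [[False] * cols for _ in range(rows)]
        components = []
        for r in range(rows):
            for c in range(cols):
                if mask[r][c] and not seen[r][c]:
                    # Breadth-first flood fill from an unvisited foreground pixel.
                    queue = deque([(r, c)])
                    seen[r][c] = True
                    pixels = set()
                    while queue:
                        y, x = queue.popleft()
                        pixels.add((y, x))
                        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and mask[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                    components.append(pixels)
        return components


    def object_scores(pred, gt, iou_threshold=0.5):
        """Object-level (precision, recall): a ground-truth object counts as
        detected if an unmatched predicted object overlaps it with
        IoU >= iou_threshold (greedy matching, an assumption of this sketch)."""
        pred_objs = label_components(pred)
        gt_objs = label_components(gt)
        matched = set()
        true_pos = 0
        for g in gt_objs:
            for k, p in enumerate(pred_objs):
                if k in matched:
                    continue
                if len(g & p) / len(g | p) >= iou_threshold:
                    matched.add(k)
                    true_pos += 1
                    break
        precision = true_pos / len(pred_objs) if pred_objs else 0.0
        recall = true_pos / len(gt_objs) if gt_objs else 0.0
        return precision, recall


    # Two ground-truth objects; the prediction finds the large one only.
    gt = [[1, 1, 0, 0],
          [1, 1, 0, 0],
          [0, 0, 0, 1]]
    pred = [[1, 1, 0, 0],
            [1, 0, 0, 0],
            [0, 0, 0, 0]]
    precision, recall = object_scores(pred, gt)  # (1.0, 0.5)
    ```

    In this toy case the pixel-wise IoU is 3/5, while the object-wise recall of 0.5 makes it explicit that one of the two objects was missed entirely.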
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is based on a 2D model. 3D models have been the state of the art for this task for several years and should be used to show the effects of the introduced methods on actual applications. Given the comparison to other 2D models, it could still be argued that this is not a sole reason for rejection. However, the fact that there is ample reason to believe that a 3D model would not work with the proposed inter-section consistency means that this work will have limited impact.

  • Number of papers in your stack

    3

  • What is the ranking of this paper in your review stack?

    4

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    After considering the author’s rebuttal I changed my opinion to “weak reject.”

    I appreciate the authors’ responses to my concerns.

    In their response the authors argue that their model should not be regarded as a traditional 2D model because it takes an additional slice into account. I agree. This echoes one of my main points of weakness: The comparison to the other presented 2D methods is difficult because of the differences in how the data was presented to predict a given voxel (“field of view”).

    Even though 2.5D models improve over 2D models the SOTA on the MitoEM challenge is a 3D model, as the authors point out. Therefore, it is not clear why in their additional experiment (comparing the SOTA 3D model to theirs - both without UDA), theirs comes out ahead. It is odd that they chose a dataset combination for this experiment that was not used in the paper (Lucchi -> MitoEM-H). The Lucchi dataset is small and might be in fact too small to train a large (3D) model - it could have overfit which would make a worse performance on another dataset plausible. It is not clear what point the authors are proving with this analysis. It is also not clear why the authors could not have made the same comparison with their version of UDA included in both models. This would have addressed all my concerns.

    The authors rightly point out that the ablation experiments show that their performance improvements go beyond the impact by the additional slice fed into the network. I changed my opinion based on that.



Review #3

  • Please describe the contribution of the paper

    A new method is proposed for unsupervised domain adaptation (UDA) of 3D mitochondria segmentation in EM images. The inter-domain alignment design leverages adversarial training to align the segmentation output space. The authors designed a residual decoder and discriminator, where “residual” refers to the differences between the prediction maps from two sections of the same volume. This enforces simultaneous alignment between the prediction maps of two sections from the same 3D volume as well as between the residual prediction maps from the target domain and the corresponding masks from paired source domain sections. For intra-domain alignment, they designed an inter-section consistency loss for the target domain to penalize differences between the prediction map of each section and its corresponding pseudo-label, generated by subtracting the residual prediction map from the prediction map of the paired section of the same target volume. The method is validated on four mitochondria datasets.
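
    The residual and pseudo-label mechanism summarized above can be illustrated with a minimal sketch. It assumes that the residual is the element-wise absolute difference between two maps and that "subtracting" the residual is likewise an absolute difference followed by thresholding. The function names, the 0.5 threshold, and the binary cross-entropy form of the loss are illustrative assumptions, not taken from the paper or its released code.

    ```python
    import math


    def residual(map_a, map_b):
        """Element-wise absolute difference between two 2D maps (assumed residual)."""
        return [[abs(a - b) for a, b in zip(row_a, row_b)]
                for row_a, row_b in zip(map_a, map_b)]


    def pseudo_label(pred_i, pred_residual, threshold=0.5):
        """Estimate a binary label for the paired section j by removing the
        predicted residual from the prediction of section i, then thresholding."""
        diff = residual(pred_i, pred_residual)
        return [[1 if v >= threshold else 0 for v in row] for row in diff]


    def consistency_loss(pred_j, pseudo_j, eps=1e-7):
        """Mean binary cross-entropy between the prediction of section j and
        its pseudo-label (the loss form is an assumption of this sketch)."""
        total, count = 0.0, 0
        for row_p, row_t in zip(pred_j, pseudo_j):
            for p, t in zip(row_p, row_t):
                p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
                total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
                count += 1
        return total / count


    # Source domain: the residual target comes from two annotated sections.
    label_i = [[0, 1], [1, 1]]
    label_j = [[0, 1], [0, 1]]
    residual_target = residual(label_i, label_j)  # [[0, 0], [1, 0]]

    # Target domain: no labels, so section j is supervised by a pseudo-label.
    pred_i = [[0.1, 0.9], [0.8, 0.9]]
    pred_res = [[0.0, 0.1], [0.9, 0.1]]
    pseudo_j = pseudo_label(pred_i, pred_res)  # [[0, 1], [0, 1]]
    ```

    For binary masks the absolute difference behaves like XOR, so applying the true residual to one section recovers the other exactly; with predicted maps the result is only an estimate, which is why it serves as a pseudo-label rather than ground truth.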

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The design of inter-section consistency leverages intra-domain information, which is intuitive and effective. Although enforcing prediction consistency in general is a common idea for 3D image segmentation, it seems not to have been leveraged for UDA of 3D images yet.
    • Both the extensive experiments on four datasets as well as the ablation experiment seem solid and convincing.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • UDA network design needs more justification and clarification. Please see comment section Point 2 for details.
    • Descriptions about UDA in the introduction and method section are a bit confusing and inaccurate.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Overall it seems feasible to reproduce the results. Some information selected as “yes” in the reproducibility report is missing: mean, variance, statistical significance and failure case analysis. Another question is whether the authors will publish their code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. It seems to me that there are a few inaccurate descriptions about UDA:
      1.1. In the introduction section on Page 2, it is ambiguous and misleading to describe some of the unsupervised domain adaptation (UDA) approaches as “… aligning segmentation results … to make the prediction of mitochondria on the target volume similar with that on the source volume”. With UDA it is not the exact prediction of the image volumes from the two domains that is made “similar”, but the distribution of features (or selected summary statistics of the features) extracted from the two domains with the domain adapted model. Though this alignment can be achieved via various methods, including aligning the output space, in the end we just need the adapted segmentation model that extracts aligned features for both domains. Could the authors rephrase and clarify the description?
      1.2. Related to 1.1, on Page 4, Methodology section under “Prediction Consistency”: “… we enforce the target predictions to be similar with the source predictions …” This description is confusing. I would suggest the authors rephrase it. For example, the described approach can be described as “enforcing the distribution of target layout to be similar with the distribution of source layout”.
      1.3. In the “related works” section on Page 2, the description of UDA literature for EM applications is not accurate: the “pseudo label-based” approaches also seek to align the features of two domains in order to learn domain-invariant features, albeit via entropy-based self-supervised training. Thus such a definition overlaps with the authors’ definition of “domain alignment-based” UDA approaches. It is therefore confusing to categorize UDA approaches into “pseudo label-based” and “domain alignment-based”.
    2. Looking at Supplementary Fig 1, the authors designed two separate segmentation encoder branches for two image sections from the same volume with a certain z-step, respectively. Separate decoder branches are also depicted in this image.
      2.1. It is misleading in the main manuscript that these two separate segmentation decoders (including the classifier layer for segmentation) are labeled with the same color and both denoted “Seg. Decoder”. A related concern is that the segmentation discriminators for each of these two decoders are also labeled the same in Fig. 1 and denoted the same in the loss term (Equation 2).
      2.2. There is no justification of why two separate decoder branches are adopted. Since the layout of the segmentation masks between the two sections should be from the same distribution, it seems that one shared decoder (as well as a shared discriminator) can serve the purpose and the additional decoder complicates the model. I would suggest that the authors add their justification for such a design.
      2.3. Which exact network modules are used for inference? Can any of the encoder and decoder branches be used for inference?
    3. “inter-section gap” belongs to intra-domain gap, which has been studied in multiple reports (such as the baseline used in this manuscript DA-VSN and Pan et al “Unsupervised intra-domain adaptation for semantic segmentation through self-supervision” CVPR 2020). Considering the framework includes both inter-domain alignment and intra-domain alignment, I would suggest that the authors provide such elaboration/clarification to make the design easier for readers to relate to domain adaptation.
    4. UDA holds great promise for the MICCAI community, but one open question is how to avoid the heavy dependency on target domain annotations as a validation set for UDA model selection. I would encourage the authors to provide insights in this regard in their future work and help the community seek a way to make UDA more feasible/useful in practice, e.g. clinical applications.
    5. Minor: 5.1. Source domain and segmentation map are both denoted as “s”. Although the former was denoted by upper case and the latter by lower case, it can be confusing to the readers. Could another letter be used to denote segmentation maps? 5.2. In Table 2, “AdaptSegNet[4]” –> “AdaptSegNet[21]”; DANN reference is not [21] and is missing?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Developing models under the constraint of data scarcity is of great interest to the community. To this end, the authors propose a simple, elegant yet effective approach for improving UDA for 3D EM mitochondria segmentation. Such an approach potentially can be extended to other 3D imaging modalities and applications. The experimental validation is solid and extensive.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    7

  • [Post rebuttal] Please justify your decision

    I kept my decision after reviewing the authors’ feedback and the other reviews. I thank the authors for clarifying their model design and their plan for including justifications and clarifications in the revised version.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors proposed a novel mitochondria segmentation method using inter-section consistency. The reviewers agreed that the paper has some merits, i.e., the novelty of leveraging inter-section consistency, clear writing, and extensive experiments on multiple datasets. However, the reviewers also pointed out several critical issues, summarized below.

    1) Justification of 2D segmentation rather than 3D (3D segmentation is considered SOTA). 2) Possible flaws in the experimental results (the Oracle and NoAdapt for adaptation from MitoEM-R to MitoEM-H in Table 2) 3) Justification and clarification of UDA network design.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR




Author Feedback

We thank R1 and R3 for proposing “accept” and “strong accept”. We also appreciate R2 and AC for giving us an opportunity to clarify the major concerns as below. Due to the space limit, we will directly address some detailed comments in the revised version.

R2: Justification of 2D segmentation rather than 3D. Reply: We have three points here. (1) Indeed, 3D models generally outperform 2D models. However, recent studies show that 2.5D models leveraging 2D convolutions to explicitly model the inter-section information even outperform 3D models for video and volumetric data segmentation [1]. Our work follows this trend and thus cannot be regarded as a traditional 2D model. (2) Since domain adaptation (DA) of 3D models is rarely studied, we remove the DA strategy in our model to conduct a fair comparison with a SOTA 3D model (the MitoEM challenge winner [8]). When trained on Lucchi and tested on MitoEM-H, the IoU score of [8] is 0.427 while ours is 0.514 (higher is better). (3) Ablation results in Table 3 justify that our performance gain over traditional 2D models is NOT due to “access to adjacent slices alone”. Setting ii can be regarded as a 2D UDA method with two adjacent image inputs, while its performance is still quite limited (IoU=0.571) compared with our method (IoU=0.687). [1] Gonda F, et al. Parallel separable 3D convolution for video and volumetric data understanding. In BMVC (2018).
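
For readers unfamiliar with the IoU scores quoted above, the metric can be sketched as follows. This is a minimal illustration on 2D binary masks stored as nested lists; the function name `iou` and the convention of returning 1.0 when both masks are empty are assumptions of this example, not details from the paper's evaluation code.

```python
def iou(pred, gt):
    """Intersection over union (Jaccard index) of two binary masks."""
    intersection = 0
    union = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            intersection += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0


pred = [[1, 1],
        [0, 0]]
gt = [[1, 0],
      [1, 0]]
score = iou(pred, gt)  # one shared pixel out of three foreground pixels
```

A score of 1.0 means the predicted and annotated foreground coincide exactly, and 0.0 means they do not overlap at all.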

R1: Possible flaws in the experimental results. Reply: As described in the caption of Table 2, the values of the ‘Oracle’ and ‘NoAdapt’ on MitoEM-H/R are directly obtained from [23]. We have double checked the results and can confidently confirm that there are no flaws here. Since the mitochondrial structure in Human is more complex than that in Rat, domain adaptation from R to H is more difficult than that from H to R. Therefore, it is reasonable that the scores of ‘R->H’ are lower than those of ‘H->R’. This can also be confirmed by the results of the MitoEM challenge winner [8], where the AP-75 score is 0.917 on R but 0.828 on H.

R3: Justification and clarification of UDA network design. Reply: The encoder and the segmentation (Seg.) decoder in our framework are adopted from [15]. The Seg. decoder consists of two shared convolutions and two separate convolutions. This design has two reasons: (1) the shared convolutions can implicitly learn the association of two adjacent images; (2) the separate convolutions can decode separate features for each input image and output two segmentation maps simultaneously. If the decoder is fully shared, the segmentation network will not be able to achieve these two goals at the same time. In the manuscript, we omit the two separate convolutions for simplicity. We will add explanations in the revised version. On the other hand, the Seg. discriminator is indeed fully shared since the layout of segmentation maps is from the same distribution. During the inference phase, we adopt the trained encoder and the entire Seg. decoder to output two segmentation maps simultaneously.

R2: The presented datasets are not representative. Reply: We respectfully disagree. For the task of mitochondria segmentation, the Lucchi dataset published in 2013 is a classic dataset and is widely used in many studies including supervised learning [8] and domain adaptation (DA) [17]. We adopt this dataset to make a fair comparison with other DA methods. The MitoEM dataset, published in the Large-scale 3D Mitochondria Instance Segmentation Challenge at ISBI 2021 [22], is the latest dataset with a relatively large size of 30um^3.

R1: Less informative to conduct ablation from VNC III to Lucchi (Subset1). Reply: To facilitate the comparison with existing works, we conduct our ablation experiments from VNC III to Lucchi following [17].

R3: Inaccurate descriptions, definitions, and citations. Reply: Thanks for your valuable suggestions. We will carefully check and revise these issues in the revised version.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors addressed the reviewers’ concerns in the rebuttal reasonably well. Although the paper has one weak reject (4), it is upgraded from reject (3). Moreover, the other two reviewers support accepting this paper. I agree with the reviewers and recommend accepting this paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal reasonably addressed concerns on 2D vs 3D segmentation, potential flaws in experiments and UDA network design. Two reviewers suggest acceptance. One reviewer moved the score from reject to weak reject after rebuttal. Overall, this is a good paper whose merits outweigh its weaknesses.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors propose to conduct domain adaptive mitochondria segmentation by enforcing inter-section consistency.

    Two reviewers are in favour. The negative reviewer appreciated the rebuttal but is still not convinced about acceptance. However, this is outweighed by the positive reviews.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    9


