
Authors

Qi Chen, Mingxing Li, Jiacheng Li, Bo Hu, Zhiwei Xiong

Abstract

3D mitochondria segmentation in electron microscopy (EM) images has achieved significant progress. However, existing high-performing learning-based methods typically rely on extensive training data with high-quality manual annotations, which are time-consuming and labor-intensive to obtain. To address this challenge, we propose a novel data augmentation method tailored for 3D mitochondria segmentation. First, we train a Mask2EM network for learning the mapping from the ground-truth instance masks to real 3D EM images in an adversarial manner. Based on the Mask2EM network, we can obtain synthetic 3D EM images from arbitrary instance masks to form a sufficient amount of paired training data for segmentation. Second, we design a 3D mask layout generator to generate diverse instance layouts by rearranging volumetric instance masks according to mitochondrial distance distribution. Experiments demonstrate that, as a plug-and-play module, the proposed method boosts existing 3D mitochondria segmentation networks to achieve state-of-the-art performance. In particular, the proposed method brings significant improvements when training data are extremely limited. Code will be available at: https://github.com/qic999/MRDA_MitoSeg.
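The abstract describes the 3D mask layout generator only at a high level. The following is a minimal sketch of one way such a rearrangement could work; it is not the authors' implementation (their code is at the repository linked in the abstract). It assumes that each mitochondrion is available as a cropped boolean 3D mask and that the "distance distribution" is a list of nearest-neighbour centroid distances measured on real annotations. The function name `rearrange_instances` and all of its parameters are hypothetical.

```python
import numpy as np

def rearrange_instances(instances, volume_shape, distances, max_tries=100, seed=0):
    """Paste cropped boolean 3D instance masks at random positions in an empty volume.

    Hypothetical sketch: for every instance, a minimum centroid spacing is drawn
    from `distances` (assumed to be empirical nearest-neighbour distances, in voxels).
    Placements that overlap or violate the spacing are retried up to `max_tries`
    times; instances that never fit are skipped.
    Returns an int32 label volume (0 = background, k = k-th placed instance).
    """
    rng = np.random.default_rng(seed)
    labels = np.zeros(volume_shape, dtype=np.int32)
    centroids = []
    for mask in instances:
        min_dist = float(rng.choice(distances))
        for _ in range(max_tries):
            # Random corner such that the whole crop fits inside the volume.
            corner = np.array([rng.integers(0, v - m + 1)
                               for v, m in zip(volume_shape, mask.shape)])
            region = tuple(slice(c, c + m) for c, m in zip(corner, mask.shape))
            window = labels[region]  # basic slicing returns a view into `labels`
            if np.any(window[mask]):
                continue  # would overlap an already placed instance
            center = corner + np.argwhere(mask).mean(axis=0)
            if centroids and min(np.linalg.norm(center - c) for c in centroids) < min_dist:
                continue  # closer than the sampled spacing
            window[mask] = len(centroids) + 1  # writes through the view
            centroids.append(center)
            break
    return labels
```

The resulting label volume would then be the input to a mask-to-image network such as Mask2EM; this sketch ignores size and morphology priors and any constraint from the surrounding cellular context.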

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_4

SharedIt: https://rdcu.be/cVRvo

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a novel data augmentation method to improve the segmentation of mitochondria in 3D electron microscopy (EM) images. The method is based on (1) a pix2pix-like network to generate realistic 3D EM images from mask (label) images which is trained in an adversarial fashion, and (2) a 3D mask generator that produces realistic mitochondria labels using size, distance and morphological priors from the training set.

    The presented method boosts the performance of state-of-the-art segmentation networks (that use traditional data augmentation) on a public dataset, especially when training data is scarce.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    To increase the usually limited training data of 3D EM segmentation datasets, the authors propose an original method based on state-of-the-art image generation models adapted to 3D. The mask layout generator is a simple and effective solution to produce more diverse but realistic masks, from which synthetic EM images can be later generated. This is an interesting approach that opens the door to also generate synthetic images on other domains.

    The results of using this data augmentation method are evaluated against the state of the art in the field of mitochondria segmentation on EM volumes and its impact is analyzed with a proper ablation study and by simulating the scarcity of training labeled data.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The method is only tested on a single public dataset, while others (Kasthuri++, VNC, etc.) are publicly available as well. Moreover, the selected dataset (Lucchi) is isotropic, while many EM datasets present a lower resolution in the z direction. This problem should be taken into account for a more generalist solution.

    In the comparison with the state of the art, the number of execution trials and hyperparameter exploration is unclear.

    There is no information about execution times, which would be very interesting for the final user.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors do not mention the range of hyper-parameters considered nor the method to select the best hyper-parameter configuration. They only specify some of the hyper-parameters used to generate results. The exact number of training and evaluation runs (iterations or epochs) is not provided.

    A description of the computing infrastructure used (hardware and software) is provided together with an analysis of situations in which the method failed (too limited data).

    There is no description of the memory footprint nor an average runtime for each result, or estimated energy cost.

    There is no analysis of statistical significance of reported differences in performance between methods.

    The results are not described with central tendency (e.g. mean) & variation (e.g. error bars).

    The specific evaluation metrics and/or statistics used to report results are correctly referenced.

    There are no details of train / validation / test splits nor details on how baseline methods were implemented and tuned.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    An effort should be made with regard to reproducibility and evaluation. More specifically, the authors should provide a better description of the range of hyper-parameters considered, the number of training and evaluation runs, the validation split and validation results, etc. In that sense, I recommend following the good-practice guidelines proposed by Dodge et al. (“Show your work: Improved reporting of experimental results”, 2019).

    If possible given the short review processing time, I recommend testing the method on other datasets as well.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty of the proposed method is clear and the results are promising, although the presentation of the results should be improved and the use of more datasets is recommended.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    The authors propose to boost 3D mitochondria segmentation by synthesizing images from synthetic instance layouts. They use one publicly available dataset for validation and show the benefit of the proposed data augmentation strategy.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is fairly well written, albeit the method and presentation of results can be improved.

    The authors synthesize more diverse images by producing more synthetic instance layouts.

    The paper performs rigorous analysis with ablation tests.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors fail to review previous studies on learning the mapping from instance masks to real EM images, although previous methods only synthesized 2D images.

    The authors fail to investigate the effect of post-processing. When comparing with other methods, it would be better to state which methods use the post-processing strategy in this study. For example, did the 3D U-Net and 3D U-Net (w/ ours) in Table 1 use the same post-processing?

    From the results in Table 2, it seems that the model trained with the proposed method and 1/16 of the data outperforms the model trained without the proposed method but with 1/8 of the data. Note that the second model uses the same amount of data as the first (due to your specific composition of each training batch), but all of it is real. It would be interesting to interpret this result.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors answered “yes” for both code and pre-trained models. In that case, both training and testing should be easy to reproduce. If reproduction relied only on the descriptions in the paper, it could be somewhat difficult.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    It would be better to review previous studies on learning the mapping from instance masks to real EM images; there are previous methods for synthesizing 2D images.

    It would be interesting to show the results of models trained only on synthesized images.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The investigation of mask-rearranging data augmentation is interesting and seems useful. The paper is neat and principled.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    In this paper the authors propose a generative adversarial modeling approach to augment data for 3D mitochondria segmentation in EM images. They first trained a pix2pix GAN model to synthesize realistic mitochondria EM images from instance masks. To increase the diversity of appearance of the synthesized mitochondria images, they designed a pipeline to rearrange instances of mitochondria in the 3D masks and then feed these rearranged masks as input to the image synthesis network. The generated images are mixed with real images and used for training the segmentation network. The authors validate the method on one public dataset, both with access to all training data and with reduced numbers of training examples, outperforming several baseline methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The proposed generative module is independent of, and thus can be integrated into, any segmentation network, which can be leveraged to benefit various existing and new segmentation network designs.
    • The authors designed and conducted experiments with decreasing numbers of accessible annotations and demonstrated the increasing superiority of the proposed model over the baseline methods. This type of experiment, addressing the data scarcity issue, is very valuable to the community.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The proposed method is a straightforward extension of the 2D version to generating synthetic 3D mitochondria images, directly utilizing the pix2pix setup. The authors designed a mitochondria mask rearrangement technique, but this is a general engineering tweak with incremental contribution: taking the pix2pix GAN as an example, at test time one can use any hand drawing as the mask input to the generator, with the objects arranged in any location the user wants. Such a general and already existing technique is not specific to mitochondria image generation, so the first contribution the authors state seems a bit weak. In addition, there are other concerns about the mask rearrangement technique; please see the comments section.

    • The experiments are limited to one mitochondria dataset with two correlated volumes, which makes the second contribution (experimental validation) not particularly strong as well. Although the authors demonstrated improvements over previous works, it is hard to be fully convinced about the superiority of the proposed approach with results from one dataset. It will be highly valuable for the authors to validate the approach in other public mitochondria EM segmentation datasets, such as the rat and human datasets from mitoEM (Wei et al., MICCAI, 2020) and potentially also VNC III (Gerhard et al, Figshare, 2013).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No repo link is provided in the manuscript, but the authors stated in the reproducibility response that related code will be released. Considering that the key parameters of mask rearrangement are not clearly stated (please see the comments section), it would be helpful to provide code or a more complete description.

    In the reproducibility response the authors stated “yes” for the following list of information but did not include corresponding information in the manuscript: mean, variation, statistical significance of experimental results; runtime, memory footprint; failure cases. Could the authors at least report mean and variation of the results?

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. On Page 2, Paragraph 2, the authors state that “… aims to synthesize enough diverse mitochondria EM training data …”. The authors designed mask rearrangement techniques in order to generate diverse images. The proposed mask rearranging strategy takes into consideration the size, number and relative spacing of the mitochondria in real images, which makes a lot of sense. But there are concerns as follows:

    1.1. Key details and supporting evidence for the mask rearranging strategies are missing: (1) the authors studied the actual size distribution and distance distribution of mitochondria, but none of these statistics are included in the manuscript to support the design of the two described strategies; (2) key details, including the number and size distribution of mitochondria in the synthesized images, are missing. This also raises the following questions: are there other characteristics of the mitochondria images (e.g., the overall density of mitochondria) that need to be matched in order to synthesize realistic-looking mitochondria images? Does matching these characteristics to those of real images matter for improving segmentation performance?

    1.2. The relative location of mitochondria and biologically important structures in the background is ignored. In natural scene images, or even fluorescence images that target only specific cellular components, a foreground instance can appear in various locations in the scene, even unlikely ones (this can be justified since a natural scene image can be synthetic, like an advertisement image). In mitochondria images, however, the background has biological meaning. One question is: when the orientation and location of mitochondria are changed by the rearrangement technique, how is the background filled? Do the synthesized biological structures in the background look realistic? Does a realistic background matter for improving segmentation performance?

    1.3. Evaluation of the quality of the generated images is missing. It is common and standard practice to have experts evaluate image quality. In addition, considering the above-mentioned concerns regarding the mask rearrangement technique, it would be highly valuable if experts with domain knowledge of mitochondria EM images could validate how realistic the generated images are.

    2. On Page 7, the authors showed that the proposed approach “can improve the segmentation performance by alleviating the false and missed detection cases, respectively.” This is a generic statement, and it is not very clear to me how the two examples in Fig. 2 fully demonstrate how the proposed method improves segmentation. It would be great for the authors to provide more insight and reasoning in this regard, and potentially to show more examples. In addition, please study failure cases: what types of failure modes are there? Is there any failure mode of the proposed method that is not present in the baselines?

    3. In Section 3.2 “Learning from limited training data”, it is described that “… decreasing the training volume on the z-axis, which are one-half, one third, one-fourth, and one-fifth, respectively”. How does the selection of the sub-volume affect the results (for example, which “half” of the volume is used)? It seems that only one consecutive block along the z-axis is selected here, but would selecting multiple consecutive blocks spread along the z-axis, each with a smaller depth (i.e., fewer image sections along the z-axis), preserve more data diversity? For example, to reduce a 3D volume of 100 images by half, would selecting image sections No. 1-25 and 51-75, instead of No. 1-50, preserve more example diversity?

    4. Confusing method description: 4.1. On Page 3, the authors stated that “we apply 5 times downsampling with 3×3×3 convolution of stride 2, so the maximum downsampling rate is 32.” However, Figure 1 of the supplementary material shows only 4 downsampling operations. 4.2. In Section 3.1, the size d×h×w of each training example is 32×256×256, so why is it changed to 33×306×306 on Page 7 (“Learning from limited training data”)? I suggest that the authors explicitly clarify and explain the reasoning behind their design choices.

    5. For future work, I would also suggest the authors consider more recent generative models for conditional image generation, such as InstaGAN (Mo et al, ICLR, 2019), SPADE (Park et al, CVPR, 2019), SEAN (Zhu et al, CVPR, 2020) etc.

    6. Typos and grammar: 6.1. Page 2 Paragraph 2 “… obtaining considerable accurate segmentation annotations”. Would it be “… obtaining a considerable number of accurate …”? 6.2. In Page 2 Paragraph 2 and a few other places in the text, “perceptual realistic 3D EM images” could be “perceptually realistic…”.
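Comment 3 contrasts a single contiguous block of z-sections with several shorter blocks spread along z. A minimal sketch of both selection schemes follows, assuming 0-based slice indices; the helper `block_slice_indices` is hypothetical and not from the paper. It reproduces the reviewer's example of keeping half of a 100-section volume.

```python
def block_slice_indices(num_slices, keep, num_blocks=1):
    """Return `keep` z-indices as `num_blocks` equal contiguous runs, evenly spaced.

    num_blocks=1 gives a single contiguous block (the first `keep` sections);
    larger values spread the same budget of annotated sections along z.
    """
    if keep % num_blocks != 0:
        raise ValueError("keep must be divisible by num_blocks")
    run = keep // num_blocks           # sections per contiguous run
    stride = num_slices // num_blocks  # distance between the starts of two runs
    if run > stride:
        raise ValueError("runs would overlap or exceed the volume")
    return [b * stride + i for b in range(num_blocks) for i in range(run)]

contiguous = block_slice_indices(100, 50)      # sections 1-50 (1-based)
spread = block_slice_indices(100, 50, 2)       # sections 1-25 and 51-75 (1-based)
```

With a NumPy array laid out as (z, y, x), `volume[spread]` would select the corresponding sections; both schemes use the same annotation budget and differ only in how much of the z-range they cover.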

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The topic of data augmentation is of interest to the community and the experiment on reduced number of annotations is intriguing. However, there are concerns about incremental novelty on methodology as well as limited experiments and validation.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    I kept my decision after reviewing the authors’ feedback and the other reviews. I thank the authors for adding validation on an additional dataset. However, I don’t think the authors convincingly addressed my concerns about the limited methodological novelty: extending 2D pix2pix to 3D and the engineering tweak of mask rearrangement are incremental at best. Concerns about ignoring the biologically relevant structures in the image background, as well as the lack of proper evaluation of the quality of the synthesized images, were not addressed.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper introduces a new data augmentation method for mitochondria segmentation from electron microscopy (EM) images. The core idea is to use a GAN-based image translation method to generate realistic EM images from given mask images, which supplement limited training data.

    The reviewers agreed that the proposed method is useful and interesting, and can be generalized to other domains. However, there are also several concerns that should be addressed in the rebuttal, summarized as follows:

    • Insufficient validation: tested on only a single dataset
    • No details about model training (epochs, hyperparameters, etc) and experiment (running time, memory footprints, etc)
    • Statistical significance analysis is missing
    • Effect of post-processing
  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4




Author Feedback

We thank all reviewers for their valuable time and comments. Our replies to the major concerns are given below. Due to space limits, we will directly address some detailed comments in the revised version.

R1&R3: Insufficient validation, tested on only a single dataset. Reply: We conducted a quick test on the mitoEM-rat dataset mentioned by R3, using Res-UNet-R as the backbone under data-scarce conditions, i.e., 1/64 (100×1024×1024) and 1/640 (40×512×512) of the real data for training. The main results, reported below, confirm the effectiveness of our method again.

Data  | MAP      | MAP (w/ ours)
1/64  | 84.8±0.8 | 86.1±1.1
1/640 | 59.2±2.1 | 64.4±0.9

R1&R3: No details about model training and experiment. Reply: We train all Mask2EM networks for 20 epochs and crop 4000 patches online in every epoch, which costs about 24-25 hours with 1 TITAN XP GPU. We train 300K iterations for the segmentation network with our method, which costs about 7-8 days with 2 TITAN XP GPUs. As for the other training hyperparameters, we follow pix2pix and Res-UNet-R, which are consistent across all experiments. We use a 24GB memory footprint for both synthesis and segmentation. We will supplement these details in the revised version. Other details can be found in our code, which will be released later.

R1&R3: Statistical significance analysis is missing. Reply: We select the top 5 models of each method to analyze the statistical significance (mean±standard deviation).

Method               | DSC      | JAC      | AJI      | PQ
V-Net                | 93.4±0.2 | 87.7±0.2 | 86.9±0.4 | 83.4±0.3
UNet++               | 93.7±0.2 | 88.2±0.3 | 87.2±0.9 | 84.3±0.8
Res-UNet-R           | 93.7±0.2 | 88.2±0.3 | 87.9±0.4 | 84.7±0.3
V-Net (w/ ours)      | 93.9±0.1 | 88.6±0.2 | 87.9±0.2 | 84.7±0.4
UNet++ (w/ ours)     | 94.1±0.2 | 89.0±0.3 | 88.7±0.3 | 85.3±0.4
Res-UNet-R (w/ ours) | 94.5±0.1 | 89.6±0.2 | 89.2±0.4 | 86.1±0.2

R2: Effect of post-processing. Reply: All experiments in the testing stage use the same post-processing and hyperparameter configuration, which equally improve the results of both baselines and our method.

R2: Better to review previous studies on synthesizing 2D EM images. Reply: Thanks for the suggestion. We will give more comprehensive reviews in the revised version.

R2: Interpretation of results in Table 2. Reply: From Table 2, we can see that as the amount of real data gradually decreases, the improvement of our method over the baseline shows an increasing trend. This indicates that our method provides a more diverse data distribution by rearranging masks. Even when we use the same amount of real data or less, we can obtain more diverse samples and thus better segmentation results.

R3: Incremental novelty on methodology and detailed technical problems. Reply: As the first work to introduce the existing 2D synthesis technique into 3D mitochondria image generation, we believe this work has its own merits in opening a new solution track for an important task. As the reviewer pointed out, the proposed method “makes a lot of sense”, but there are many detailed technical problems that deserve further study along this line (which we think also justifies the value of this work from the other side). We greatly appreciate the reviewer’s suggestions, and we are willing to do more profound research on these interesting problems in the future.

R3: The figure and text mismatch. Reply: We missed a 3D Conv with stride 2 in the figure, which should be added before the first 3D Conv with stride 1.
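Reviewer 3's comment 4.1 and this reply both hinge on simple downsampling arithmetic: each stride-2 convolution roughly halves every spatial dimension, so five of them give a total rate of 2^5 = 32, while four give only 16. A small sketch, assuming isotropic stride 2 with a 3×3×3 kernel and padding 1 (the exact padding and any anisotropic striding are not stated on this page), traces the 32×256×256 input patch through the encoder:

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output length of one convolution along one axis (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1

def downsample_shapes(shape, num_stride2_convs):
    """Feature-map shapes after each stride-2 3x3x3 convolution, input shape first."""
    shapes = [tuple(shape)]
    for _ in range(num_stride2_convs):
        shapes.append(tuple(conv_out(s) for s in shapes[-1]))
    return shapes

for n in (4, 5):
    print(n, "convs: rate", 2 ** n, "->", downsample_shapes((32, 256, 256), n)[-1])
# 4 convs: rate 16 -> (2, 16, 16)
# 5 convs: rate 32 -> (1, 8, 8)
```

Under these assumptions, the stated rate of 32 requires five stride-2 stages, which is consistent with the reply that one stride-2 convolution is missing from the figure.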

R3: Data size is confusing. Reply: The size of the cropped images input to the network is 32×256×256, while 33×306×306 is the size of the whole 1/42 real-data subset.
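The data fractions quoted in the rebuttal can be checked against the full training volumes. A quick sketch, assuming the standard training volumes of 400×4096×4096 voxels for MitoEM-Rat and 165×768×1024 voxels for the Lucchi dataset (these sizes are not given on this page), computes how many times larger each full volume is than the quoted subset:

```python
from math import prod

def subset_fraction(full_shape, subset_shape):
    """Return N such that the subset holds about 1/N of the full volume's voxels."""
    return prod(full_shape) / prod(subset_shape)

MITOEM_RAT_TRAIN = (400, 4096, 4096)  # assumption: standard MitoEM-Rat training split
LUCCHI_TRAIN = (165, 768, 1024)       # assumption: standard Lucchi training volume

print(subset_fraction(MITOEM_RAT_TRAIN, (100, 1024, 1024)))     # 64.0
print(subset_fraction(MITOEM_RAT_TRAIN, (40, 512, 512)))        # 640.0
print(round(subset_fraction(LUCCHI_TRAIN, (33, 306, 306)), 1))  # 42.0
```

Under these assumptions, 33×306×306 is about 1/42 of the Lucchi training volume; its 33 sections correspond to one-fifth of the 165 sections along z, and the rest of the reduction comes from the smaller in-plane crop.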




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors provided additional results and addressed most of the reviewers’ comments reasonably well. Reviewer 3 is still not convinced about the technical novelty, which is the weakest point of this paper. However, the paper has merits that outweigh this weakness, so I recommend acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper may lack novelty, but the proposed solution is efficient and well explained, and it may lead to further improvements in the future. The authors replied adequately to the main concerns, provided they include these changes in the camera-ready version. With these changes, acceptance would be possible.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6


