
Authors

Tao Hu, Hayato Itoh, Masahiro Oda, Yuichiro Hayashi, Zhongyang Lu, Shinji Saiki, Nobutaka Hattori, Koji Kamagata, Shigeki Aoki, Kanako K. Kumamaru, Toshiaki Akashi, Kensaku Mori

Abstract

Automatic segmentation of substantia nigra (SN), which is Parkinson’s disease-related tissue, is an important step toward accurate computer-aided diagnosis systems. Conventional methods for SN segmentation depend heavily on such limited modalities of magnetic resonance imaging (MRI) as neuromelanin and quantitative susceptibility mapping, which require longer imaging times and are rare in public datasets. To enable a multi-modal investigation for SN anatomic alterations based on medical bigdata researches, the need for automated SN segmentation arises from commonly investigated T2-weighted MRIs. To improve the performance of the automated SN segmentation from a T2-weighted MRI and enhance the model generalization for cross-center researches, this paper proposes a novel test-time normalization (TTN) method to increase the geometric and intensity similarity between the query data and the model’s trained data. Our proposed method requires no additional training procedure or extra annotation for the unseen data. Our results showed that our proposed TTN achieved a mean Dice score of 71.08% in comparison with the baseline model’s 69.87% score with in-house dataset. Additionally, improved SN segmentation performance was observed from the unseen and unlabeled datasets.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16449-1_70

SharedIt: https://rdcu.be/cVRXI

Link to the code repository

https://github.com/MoriLabNU/TTN_for_SN_segmentation

Link to the dataset(s)

https://openneuro.org/datasets/ds003653/versions/1.0.0

https://human.brain-map.org/mri_viewers/data


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors use two vanilla 3D U-Net models sequentially to segment the substantia nigra in MRI. The model is a coarse-to-fine cascaded network: the first network generates an ROI that the second network uses to segment the substantia nigra. Furthermore, at the inference stage, the authors propose a test-time normalization to boost segmentation accuracy. The final segmentation is the average probability over the input images produced by the test-time normalization. Validation is performed using an atlas-based metric.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors proposed a novel way to use test-time normalization. An affine transformation and histogram matching are used to normalize the input query. 84 cases were used for training, 52 for testing, and 20 for validation.
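    The histogram-matching step mentioned above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `match_histogram` and the use of plain empirical CDFs are assumptions made for illustration, and the affine registration step is omitted.

```python
import numpy as np


def match_histogram(query, reference):
    """Remap intensities of `query` so its empirical CDF follows `reference`.

    Hypothetical helper for illustration only; both inputs are arrays of any
    shape (for example 3D MRI volumes).
    """
    query = np.asarray(query, dtype=float)
    reference = np.asarray(reference, dtype=float)

    # Unique intensities, their counts, and the index of each voxel's value.
    q_values, q_inverse, q_counts = np.unique(
        query.ravel(), return_inverse=True, return_counts=True
    )
    r_values, r_counts = np.unique(reference.ravel(), return_counts=True)

    # Empirical cumulative distribution functions of both images.
    q_cdf = np.cumsum(q_counts) / query.size
    r_cdf = np.cumsum(r_counts) / reference.size

    # For each query quantile, look up the reference intensity at that quantile.
    matched_values = np.interp(q_cdf, r_cdf, r_values)

    # Put the remapped intensities back in the original voxel layout.
    return matched_values[q_inverse].reshape(query.shape)
```

    The mapping is monotonic, so the rank order of voxel intensities in the query is preserved while its intensity distribution is pulled toward that of the reference image.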

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The contribution is merely incremental. 3D U-Net, sequential networks, and test-time normalization are well-known techniques. The in-house dataset is not adequately described. How many experts labeled the in-house dataset? The experimental section must be expanded. The average Dice index is not enough to assess the performance of the proposal, especially when the difference is < 2% (Table 1).

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Two datasets are publicly available. The experimental part sounds technically correct, but is short.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Authors need to improve and extend Experiments and compare their proposal with SOTA methods. Authors need to improve Conclusions. They only summarized the paper.

    Minor corrections:
    Page 2: "deep learning technique have achieved" -> "deep learning techniques have achieved"
    Page 2: "Inspired by the work of Alice" -> "Inspired by the work of Le Berre et al."
    Fig. 1: "prposed" -> "proposed"
    Section Methods: "squentially" -> "sequentially"
    Section 4: "dice" -> "Dice"

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    2

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Marginal contribution. No comparison of the proposal against SOTA methods. No conclusions.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    Authors have addressed most of my comments. However, I still believe their contribution is insufficient for MICCAI standards.



Review #2

  • Please describe the contribution of the paper

    This paper introduces substantia nigra segmentation from T2-weighted imaging since these scans tend to be more readily available in large open-source datasets. The authors show that using a test-time normalization (TTN) method can help increase segmentation of substantia nigra (SN) accuracy. The same model was used on several different combinations of preprocessed data, such as using histogram matching and an asymmetric loss (ASL) function on the training set, to test which procedures worked best for getting the best SN segmentation. Ultimately, TTN with ASL, was shown to be the best method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper is extremely detailed in writing every step taken to create the model, run the model, and even the values for each hyper-parameter used and tuned. This makes the methods highly reproducible. Although the model was trained on in-house data, which is not accessible to others outside the lab, anyone trying to reproduce the method could use the 2 open-source databases of images the paper mentions, to train and try out the model.

    2. This is a novel approach to getting accurate segmentations of the substantia nigra without using an atlas or without using imaging such as a neuromelanin scan. T2w, as mentioned by the authors, is not typically an imaging modality used for this kind of segmentation, however, this is an image that is typically acquired, clinically, next to the T1, and therefore is more readily available in open-source datasets.

    3. This definitely has clinical relevance, as being able to segment the substantia nigra and potentially perform some quantitative metrics such as size or any other relevant clinical feature metrics can be of great value.

    4. For the most part, every step of the pipeline is explained really well. The reasoning behind including steps, such as histogram matching, is also explained really well in the ‘Discussion’ section.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    N/A

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    1. In-house data was used to train the model. Open-sourced data sets were used for testing generalization of the methods

    2. The authors give very detailed descriptions of the methods used which therefore makes the methods highly reproducible.

    3. This is mentioned in strengths as well, even though in-house images are used for training the model, the other 2 publicly available datasets that the authors used for generalization of the method can be used to reproduce the results.

    4. Also, speaking to the data, the authors were able to generalize their model to 2 publicly available datasets

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    This was a solid paper, there weren’t too many things to comment on, except for a few suggestions…

    1. It may be helpful, as far as reproducibility, to include metrics such as repetition time and echo time for the T2-weighted images collected in-house.

    2. At the end of the methods section, the authors mention proposing a post-processing re-threshold procedure, this should be explained. Maybe give a few sentences on what exactly was done regarding re-thresholding, post processing.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    8

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This was a solid paper that gave a very detailed description of the method. The authors considered previous methods and implemented methods they thought could increase accuracy of their model. They even generalized to publicly available data. They used an atlas in this case because publicly those datasets did not provide annotations. The novelty is not only in the approach but in the fact that they used T2-weighted images to do the segmentation, knowing that this is an image that is typically available. Other groups have tried to solve this problem using the T2-weighted image, however, this does not take away from the novelty of this method.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    This paper presents a test-time normalization method using affine registration and histogram matching to improve the model generalization of substantia nigra segmentation at inference time. It is said to result in increased segmentation accuracy and an estimate of model uncertainty. The reported results tend to be better than the SOTA in terms of mean Dice score and on unseen datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • A novel TTN method based on spatial and intensity attributes for accurate SN segmentation and better model generalization.
    • Prior atlas-based likelihood estimation to examine the segmentation output on unlabeled datasets.
    • Fair comparison of the proposed work against the research being done in this application.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Some areas of the paper are unclear and need to be clarified, e.g., the post-processing re-threshold to maximize H^. How is N=10 for the support set chosen? Could this instead be set based on some pre-defined Dice threshold? In the qualitative evaluation, the authors claim TTN helps the model to identify SN regions better without showing the GT. To this end, a labeled sample could have been chosen. Further, it is unclear from Fig. 3 that “the estimated uncertainty maps indicated larger oscillations in the boundaries of SN”. I see these oscillations throughout the segmented regions. Also, a color bar would be useful here.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Authors have done a good job in providing experimental details and the employed datasets to help reproduce their work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • Qualitative evaluation section can be improved by choosing examples to compare against the gold standard.
    • Some sentences can be phrased better, specifically the opening sentences of sections, e.g., Section 2.1, “To further boost the SN segmentation…”, and similarly Section 2.2. Writing these in active voice will help the reader understand the essence of each section better.
    • Please explain how a large computation time can be handled during the inference stage moving forward.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • Although this work could be potentially beneficial for diagnosing one of the neurodegenerative disorders, the novelty is not substantial enough to be acceptable considering MICCAI standards. Further, the qualitative results do not clearly showcase the essence of this approach in segmenting the desired structures.
  • Number of papers in your stack

    3

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper describes a method for substantia nigra segmentation that includes test-time normalization for robust performance. The reviewers acknowledged practical value of this work, but also raised questions related to novelty and clarity of the paper. The authors should provide their feedback on these issues in the rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7




Author Feedback

We thank all the reviewers for their constructive feedback. To clarify the points identified as unclear, we give the following replies to the comments.

1) Methodological novelty (to Reviewer #1): One of our major contributions is the proposal of the support-set-based test-time normalization (TTN) rather than a specific neural network. While previous test-time augmentation (pTTA) methods like Ref. [11] are designed for specific training settings, the proposed TTN is relatively independent of the training settings. Therefore, unlike recent pTTA methods, the proposed TTN is applicable to any trained network and is independent of the augmentation methods adopted in the prior training. It also works when no augmentation was used. To the best of our knowledge, the proposed support-set-based TTN is a new approach. Our second contribution is the proposed template-based normalized likelihood for post-processing and empirical evaluation of segmentation.

2) SOTA comparison (to Reviewer #1): We regard the popular and recent test-time augmentation method (pTTA) [11] as the SOTA of TTA techniques. pTTA was developed for similar segmentation tasks (lesions in the brain) and comes with a theoretical formulation, making it a solid representative of TTA techniques in medical applications. Table 1 shows the improvement of the proposed TTN over pTTA. Besides, compared with pTTA, the proposed TTN imposes no additional constraints on augmentation procedures, allowing quick integration with pre-trained segmentation models.

3) Annotations (to Reviewer #1): The manual annotations were performed by a board-certified radiologist with 10 years of experience specializing in neuroradiology, and then checked and confirmed by two other experts.

4) Motivation of the evaluation with Dice (to Reviewer #1): The target SN is tiny, and the SN volume has been regarded as an important biomarker. The Dice coefficient has been found effective for measuring volumetric overlap and tiny outliers [a]; the Dice index is therefore suitable for showing the validity of SN segmentation. HD95 (mean±std) also indicates the improvements: 2.19±0.69 (ASL+TTN), 2.71±0.85 (ASL+pTTA), 2.26±0.81 (ASL).

5) Scan setting of T2W (to Reviewer #2): Thanks. The major scan parameters for our in-house dataset were as follows: 3200/564 ms repetition time/echo time; 3.86 mm spacing between slices; 256 × 240 mm field of view; 0.8 mm slice thickness.

6) Selection of N (to Reviewer #3): N was selected based on experience with pTTA. In this paper, 10 is shown to be close to the smallest number at which performance converges, and is therefore expected to offer a trade-off between stable performance and lower computation time. Yes, soft Dice can be an alternative.

7) Qualitative evaluation (to Reviewer #3): We agree that showing the ground truth would improve the clarity of Fig. 3. But since we intend to stress the enhanced generalization ability of the proposed method, samples from unseen datasets are more appropriate. Additionally, by applying the proposed re-threshold procedure, the segmented SN can be guaranteed in the sense of a higher likelihood derived from the prior template, providing empirical references for the qualitative evaluation. Thanks for your comment about the uncertainty maps. The ‘jet’ color bar was actually used, with warmer colors indicating higher uncertainty. We will include it. The previous description indeed caused the mentioned misunderstanding. A more accurate expression is that “The regions with vague boundaries tend to present higher uncertainty”.

8) Computation-time handling (to Reviewer #3): An efficient way to reduce computation time is to set a smaller N (8 suggested). A second is to employ a lightweight registration algorithm, for example, a single-resolution algorithm and down-sampled MRI pairs.

9) Unclear descriptions and typos (to all): We will edit and refine them.

[a] http://insightsoftwareconsortium.github.io/SimpleITK-Notebooks/Python_html/34_Segmentation_Evaluation.html
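
The support-set averaging discussed in replies 1) and 6) and the Dice overlap discussed in reply 4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the released code: `normalize` and `predict_proba` are hypothetical callbacks standing in for the normalization of a query toward one support image and for the trained network, predictions are assumed to be returned in the query's own space, and the voxel-wise standard deviation is used only as a simple stand-in for the uncertainty map.

```python
import numpy as np


def ttn_predict(query, support_set, normalize, predict_proba, threshold=0.5):
    """Average predictions over N normalized copies of one query image.

    `normalize(query, support)` and `predict_proba(image)` are hypothetical
    callbacks supplied by the caller (assumptions, not the authors' API).
    """
    # One probability map per support image.
    probs = np.stack(
        [predict_proba(normalize(query, support)) for support in support_set]
    )
    mean_prob = probs.mean(axis=0)    # averaged probability map
    uncertainty = probs.std(axis=0)   # spread across the N predictions
    return mean_prob >= threshold, mean_prob, uncertainty


def dice_score(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return 2.0 * np.logical_and(pred, target).sum() / denom
```

With this structure, lowering N as suggested in reply 8) only shortens the list passed as `support_set`; the trained model itself is untouched.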




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal addressed most of the concerns.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    10



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors proposed a test-time normalization method and received diverse opinions during the initial review. Although the work is sound and the rebuttal has addressed a number of concerns, the major concern remains its limited novelty for the MICCAI standard.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The main and most concerning issues raised by the reviewers were the limited novelty and insufficient comparison with competing methods. The rebuttal has not convincingly addressed these concerns. Given the numerous existing TTA methods, it is still unclear what the methodological contribution of the proposed one is. A comparison to only a single other method ([11]) is presented and the entire experimental setup is limited, given that [11] has not been designed for the particular in-house dataset used. Finally, given that the main contribution is the TTN, which the authors claim to be independent of the training setting, I would expect evaluation on other (preferably public) datasets.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR



Meta-review #4

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Both reviewer and AC recommendations on this paper were split with a large divergence. The PCs thus assessed the paper reviews, meta-reviews, the rebuttal, and the submission. While novelty and clarity were noted as the main concerns to be addressed, the primary AC has found that these concerns were adequately addressed during rebuttal and recommended acceptance. While some expressed reservations about the contribution of the work, others expressed enthusiasm and support for the novelty and clinical relevance of the work. The reviewers especially appreciated that the presented method was trained on the in-house data but evaluated for generalization on two public datasets. The PCs agreed with the convincing arguments of the reviewers and felt that the weaknesses as pointed out were outweighed by the strengths. The final decision on the paper is thus accept.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR


