
Authors

Junyan Lyu, Pengxiao Xu, Fatima Nasrallah, Xiaoying Tang

Abstract

Whole brain segmentation is vital for a variety of anatomical investigations in brain development, aging, and degeneration. It is nevertheless challenging to accurately segment fine-grained brain structures due to low soft-tissue contrast. In this work, we propose and validate a novel method for whole brain segmentation. By learning ontology-based hierarchical structural knowledge with a triplet loss enhanced by a graph-based dynamic violate margin, our method mimics experts’ hierarchical perception of brain anatomy and captures the relationships across different structures. We evaluate the whole brain segmentation performance of our method on two publicly accessible datasets, namely the JHU Adult Atlas and CANDI, respectively possessing fine-grained (282) and coarse-grained (32) manual labels. Our method achieves mean Dice similarity coefficients of 83.67% and 88.23% on the two datasets. Quantitative and qualitative results demonstrate the superiority of the proposed method over representative state-of-the-art whole brain segmentation approaches. The code is available at https://github.com/CRazorback/OHSR.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43901-8_37

SharedIt: https://rdcu.be/dnwDL

Link to the code repository

https://github.com/CRazorback/OHSR

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The major contribution of this work is the design of a segmentation method that learns hierarchical structural knowledge via a triplet loss with a graph-based dynamic violate margin.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of this work lies in its use of ontology-based hierarchical brain anatomy for segmentation. By using a triplet loss enhanced with a graph-based dynamic violate margin, the method effectively captures relationships across different brain structures.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) While the triplet loss is reasonable and should help distinguish between different brain structures, it is designed on feature embeddings rather than on segmentation network outputs. As a result, segmentation results could still contain gaps (voxels that do not belong to any class) and noisy outputs (neighboring voxels assigned to different labels).

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors’ report on reproducibility appears to be reasonable.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The triplet loss is designed on feature embeddings rather than on segmentation network outputs. As a result, segmentation results could still contain gaps (voxels that do not belong to any class) and noisy outputs (neighboring voxels assigned to different labels).

    The authors could also consider imposing the triplet loss on the segmentation network outputs.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed segmentation model based on hierarchical structural knowledge and triplet loss is reasonable and novel. Experiments are well designed and convincing.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper presents a new whole-brain segmentation method via an ontology-based hierarchical structural relationship. The authors propose a methodology for constructing the brain hierarchy graph and for selecting the margin in the triplet loss function. The proposed module is incorporated into a 3D U-Net for segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The idea of using the brain hierarchy to guide whole-brain segmentation is interesting. The authors implement their idea based on graph construction, which is reasonable. Such a design is able to simulate the human perception of brain parcellation, and experimental results demonstrate its effectiveness.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors evaluate their method on two public datasets. However, experimental results indicate that the improvement from their method is relatively small when combined with a stronger baseline (nnU-Net, for instance). Also, the improvement on the simpler dataset (CANDI) is not as significant as on the JHU Adult dataset. In terms of experimental setting, the JHU Adult Atlas dataset is very small, yet the authors do NOT use cross-validation, which significantly weakens the credibility of the results.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Code link is provided in the paper. Public datasets are used to evaluate the algorithms.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    This paper presents a new whole-brain segmentation method via an ontology-based hierarchical structural relationship. The authors propose a methodology for constructing the brain hierarchy graph and for selecting the margin in the triplet loss function. The proposed module is incorporated into a 3D U-Net for segmentation. The method is evaluated on two public datasets and achieves state-of-the-art performance. Overall the paper is interesting, and the motivation behind its methodology is reasonable. However, several concerns should be addressed:

    • The JHU Adult Atlas dataset is very small, yet the authors do NOT use cross-validation, which significantly weakens the credibility of the results. Since the dataset is so small, the current split may not reflect the real performance on the dataset.

    • How is the hyperparameter $\lambda$ selected? Is there any motivation or experimental support behind this choice?
    • On the CANDI dataset, why do the authors use a different hierarchical relationship (if I understand correctly) from the one used for the JHU dataset?
    • In Table 3, why do the authors not use OHSR (Type II) when evaluating on the vanilla U-Net?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea of using the brain hierarchy to facilitate whole-brain segmentation is reasonable and natural, and the proposed method outperforms previous ones. However, the experimental setting on the JHU Adult Atlas dataset significantly weakens the credibility of the results.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper presents a novel approach that utilizes the ontology-based hierarchical structural relationship (OHSR) of human brain anatomy as prior knowledge for whole brain segmentation. The method encodes the OHSR into a voxel-wise embedding space and conducts deep metric learning to separate contextually dissimilar voxels using a triplet loss with a dynamic violate margin.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The paper proposes an innovative approach of incorporating OHSR into deep learning segmentation models for whole brain segmentation. (2) The method achieves state-of-the-art performance on two publicly accessible datasets. (3) The paper contains well-designed experiments and insightful ablation studies, enhancing the credibility of the proposed method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) The paper lacks details about the computational efficiency of the proposed method. (2) Although the authors have provided the hyperparameters and training details, a detailed analysis of the training process and time complexity would be beneficial. (3) Training and validation are confined to the same dataset with limited data, without testing across sites.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors state that the code will be made available on GitHub.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    (1) Could the authors provide more information about computational efficiency, such as time complexity and GPU memory usage? (2) Despite the rapid development of deep learning-based methods, atlas-based methods (given their stability) are still commonly used for processing fine-grained brain structures, especially in non-human primates. How does the proposed method compare to commonly used approaches for fine-grained brain structures, such as those based on single- or multi-atlas registration? (3) Only 8 in-site test cases imply possible bias in the results and model overfitting.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a novel approach to whole brain segmentation that incorporates ontology-based hierarchical structural knowledge and achieves superior performance over SOTA methods on two publicly accessible datasets. The paper is well written and the methodology is clearly explained. However, the test set is small, which may not fully demonstrate the generality of the method.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors have addressed all the issues.



Review #4

  • Please describe the contribution of the paper

    The authors propose a deep learning-based method for brain parcellation with a large number of semantic classes (up to 282) in MR images using a novel loss function. The proposed loss function enforces that the feature vectors of voxels belonging to similar semantic classes are also similar, where the similarity of the semantic classes is described by their hierarchical relationship within a graph capturing the ontological relationship between classes. The authors evaluated their method on two publicly available datasets and compared it with state-of-the-art methods, with the results showing that the proposed method improves upon the state of the art.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The introduction and description of the method is well written and easy to follow.

    The authors address an interesting research question, namely how to use prior knowledge about the relationship of classes in the presence of a large number of classes. Typically, class labels are simply encoded using a one-hot encoding, which makes all classes orthogonal in target space. Consequently, confusing similar classes produces the same error as confusing very different classes. For example, confusing the amygdala with the 3rd ventricle has the same impact on the loss function as confusing the amygdala with the hippocampus. The proposed loss function adds another loss term to the feature representation of voxels, enforcing that similar classes also produce similar feature vectors, so that very distant classes in the ontology are confused less often. The authors also propose a sound method for measuring the similarity of classes in an ontological hierarchy.

    Hinton et al. [1] have shown that the similarity between classes can greatly help network training. In their paper, they learn the similarity between classes using a large network, where the similarity is defined as the distance between class logits. Training directly on logits encodes knowledge that, e.g., a 7 looks more similar to a 1 than to an 8, and incorporating that knowledge into the loss function by training against the logits helps hand-written digit classification. In contrast, the authors of this paper describe a method to use prior knowledge about the relationship between classes without the need to train a much larger model to learn those relationships, with similar effectiveness, which makes the method more practically usable for segmentation tasks with limited training data and a large number of classes.

    [1] Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. “Distilling the knowledge in a neural network.” arXiv preprint arXiv:1503.02531 (2015).
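
    To make this concrete, here is a minimal, hedged sketch of a triplet loss whose violate margin scales with the hop distance between classes in the ontology graph. The function name, the `hop_pos`/`hop_neg` inputs (per-sample graph hop counts, passed as tensors), and the `base_margin` value are illustrative assumptions, not the paper's actual interface.

    ```python
    import torch.nn.functional as F

    def graph_margin_triplet_loss(anchor, positive, negative,
                                  hop_pos, hop_neg, base_margin=0.2):
        # anchor/positive/negative: (B, D) voxel embeddings.
        # hop_pos/hop_neg: (B,) tensors of ontology-graph hop counts between
        # the anchor's class and the positive/negative voxel's class.
        # Dynamic violate margin: classes farther apart in the hierarchy
        # require a larger separation in embedding space.
        margin = base_margin * (hop_neg - hop_pos).clamp(min=1).float()
        d_pos = F.pairwise_distance(anchor, positive)  # nearby classes: pull together
        d_neg = F.pairwise_distance(anchor, negative)  # distant classes: push apart
        return F.relu(d_pos - d_neg + margin).mean()
    ```

    Under such a margin, confusing the amygdala with the 3rd ventricle (many hops apart) is penalized more heavily than confusing it with the hippocampus, which is exactly the asymmetry that one-hot encoding lacks.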

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The biggest weakness of the paper is the description of the evaluation and of how the comparison with the state of the art was done. The method was compared against U-Net, QuickNAT, and nnU-Net, but the caption of Table 1 suggests that the proposed method was not compared against the state of the art directly, but against a reimplementation. Reimplementing and reproducing methods is prohibitively difficult due to the large number of hyperparameters, choices, augmentation techniques, and implementation details, which are often not provided due to size limitations. The best way to achieve a fair comparison is either to compare methods on the same dataset and let each author tune their method for the data and submit the results, or to use the implementation provided by the authors. Especially in the case of nnU-Net, I see no good reason not to use the available implementation. Using the original nnU-Net implementation instead of a reimplementation would have made the comparison much stronger. As it stands, it cannot be judged whether the proposed method as a whole is an improvement over the state of the art, which is something the authors claim in the paper.

    I cannot retrace how the numbers in Table 2 (which shows the comparison with the state of the art on the second dataset) were produced. The paper states that those numbers were taken from the original papers, which is a fine approach, but I cannot find those numbers in the papers cited. Paper [20] did not evaluate their method on the CANDI dataset. Likewise, QuickNAT was not evaluated on CANDI in the referenced paper (reference 22). The authors did do a comparison in a different paper (reference 21), but they only used a subset of CANDI and only for testing; the method was trained on different data. I can find some of the numbers of Table 2 in the ACEnet paper (reference 15), but the authors of [15] performed the evaluation using a reimplementation of the compared methods and also did not use the original implementations or values provided by the method authors. Not knowing how the numbers were generated greatly limits the trustworthiness of the comparison. Please reference the correct papers or provide a link to a website where the numbers are taken from. Otherwise, comparing against a reimplementation always leaves room for errors in the reimplementation in favor of the proposed method.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The triplet loss is a summation over the set T, but the set T is not defined.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    “we split the dataset following a previous study” Please summarize this briefly for readers not familiar with that study. The authors of [15] mention that they randomly split the dataset into training, validation, and test sets without providing the exact splits. Did you use the same splits, or did you only follow the same procedure (same set sizes) with different cases per set?

    “More dataset and implementation details are provided in the supplementary material.” The supplementary material available to me only provided more details on the evaluation (Fig. 1 - Fig. 3) but no additional details on the implementation. Was this intentional?

    “we use the identical 3D U-Net as the ‘3d fullres’ configuration in nnU-Net” I interpret this as the same network architecture but not the same training code. If correct, why wasn’t the nnU-Net framework used for training? To get the exact same architecture, you would have to run at least the planning phase of the nnU-Net training anyhow to get the right patch size, batch size, and network parameters.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is interesting to the MICCAI community and the method seems novel and suitable to incorporate prior knowledge about the relationship between classes. The biggest weakness is the evaluation or the description of the evaluation. It’s not clear to me how the numbers were produced, which is important to back up the claim made in the paper that the proposed method is superior to the state of the art.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    Thank you for clarifying how the comparison was done. I appreciate that you used the implementation of nnU-Net directly to run the training. This, in my opinion, is the best way to generate a comparable baseline for datasets where no baselines are currently available. As for the other methods, please indicate that the original U-Net implementation is 2D and that the other methods are reimplementations. From the ACENet paper:

    “We directly compared our method with state-of-the-art competing deep learning methods on the three datasets with the same model training and test settings, including SD-Net (Roy et al., 2017), 2D U-Net (Ronneberger et al., 2015), QuickNAT V2 (Roy et al., 2018), and 3D U-Net (Çiçek et al., 2016). All these methods were implemented with the same network architectures as reported in their corresponding papers, except that 256 filters were used in the 3D U-Net instead of 1024 for reducing the computational cost.” To me, this reads like they used reimplementations, so please refer to the results as results produced by the authors of the ACENet paper instead of as results reported in the original papers, just for clarity.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The novelty (learning ontology-based hierarchical brain anatomy in a segmentation task) is well received by the reviewers. There are still some points to address, mostly regarding the experiments, especially the fairness of the comparison with other methods and the significance of the accuracy improvement achieved by the proposed method. Detailed parameters and experimental settings should be provided, as pointed out by the reviewers. The rebuttal should include answers to the possibility of having unassigned voxels (Reviewer 1) and to the concern of Reviewer 4 regarding the reimplementation of existing methods for comparison.




Author Feedback

R1: Imposing triplet loss on outputs to address unassigned voxels We consider introducing hierarchical information into the output space as part of our future efforts. Specifically, we will utilize a modified DiceCE loss, ensuring two important properties: 1) For each voxel, if a node on its corresponding tree is labeled as positive, all its parent nodes in the brain hierarchy graph should also be labeled as positive. 2) For each voxel, if a node on its corresponding tree is labeled as negative, all its child nodes should also be labeled as negative.
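
As a rough illustration of these two properties (a sketch under assumed representations, not the authors' planned implementation), the following pass propagates per-node scores up the hierarchy so that no descendant can score above its ancestors:

```python
import numpy as np

def enforce_hierarchy(scores, ancestors):
    # scores: (num_nodes,) positive score per hierarchy node for one voxel.
    # ancestors[i]: indices of ALL ancestors of node i in the brain
    # hierarchy graph. Both names/shapes are illustrative assumptions.
    out = scores.copy()
    for i, anc in enumerate(ancestors):
        for a in anc:
            # Property 1: a positive node forces its ancestors positive,
            # so each ancestor is raised to at least the descendant's score.
            out[a] = max(out[a], out[i])
    # Property 2 follows: every ancestor now scores at least as high as each
    # of its descendants, so a negative (low-scoring) node cannot have a
    # positive descendant once the scores are thresholded.
    return out
```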

R2&R3: Cross-validation on JHU dataset We compare nnU-Net with OHSR using 3-fold cross-validation on the JHU dataset. OHSR consistently outperforms nnU-Net, with mean DSCs of 85.70% and 85.12%, respectively.

R2: Hyperparameter lambda Lambda is empirically set to 0.5 and follows a cosine annealing policy. This choice is motivated by prioritizing the optimization of voxel-wise segmentation results to first establish a stable feature space. Once the between-structure semantic similarity is appropriately defined, the weight of the triplet loss gradually increases, refining the feature space using the hierarchical information.
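
One plausible reading of this schedule (a sketch only; the feedback does not give the exact formula) is a cosine ramp that keeps the triplet-loss weight near zero early in training and anneals it up to lambda = 0.5:

```python
import math

def triplet_loss_weight(step, total_steps, lam_max=0.5):
    # Near 0 at the start (the DiceCE term dominates and stabilises the
    # feature space), smoothly increasing to lam_max by the end of training.
    return lam_max * 0.5 * (1.0 - math.cos(math.pi * step / total_steps))
```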

R2: Different hierarchical relationship in CANDI CANDI has labels defined differently from those of the JHU dataset, reflecting different ontology levels. For instance, the “left cortical white matter (WM)” in CANDI encompasses multiple clinically defined cortical WM structures, similar to a level-2 structure in the JHU dataset. Consequently, we can only group the labels in CANDI into WM, GM, and CSF categories.

R2: OHSR (Type II) on vanilla U-Net We apologize for the omission. Type II (77.99) achieves superior performance compared to Type I (77.34) when applied to vanilla U-Net, which aligns with the results obtained from nnU-Net.

R3: Computational efficiency The network parameter counts of vanilla U-Net, QuickNAT, ACENet, nnU-Net, and OHSR are 6.44×10^6, 3.55×10^6, 4.14×10^6, 1.89×10^7, and 1.89×10^7, respectively. We also assess the computational overhead of OHSR when utilizing nnU-Net as the backbone. Although the training time per step increases from 0.49s to 0.52s, the inference time is unaffected since OHSR only constrains the training process.

R3: Multi-atlas segmentation results on JHU dataset We compare the SOTA multi-atlas segmentation method PICSL, available in the ANTs software, with OHSR on the JHU dataset. The mean DSCs of PICSL and OHSR are 80.03% and 83.67%, respectively.

R4: Reproducibility of compared methods For the JHU dataset, U-Net and QuickNAT are reproduced based on their original repositories, as no DL-based methods have previously been tested on this dataset. We meticulously tune the hyperparameters to ensure optimal performance for these compared methods. As for nnU-Net, we evaluate the results derived from its official implementation. The training command is “nnUNet_train 3d_fullres nnUNetTrainerV2_noMirroring”, which guarantees full reproducibility. Regarding CANDI, all results, except those from nnU-Net, are directly copied from the ACENet paper. For OHSR and nnU-Net, we employ the same data split to ensure fair comparisons.

R4: Split of CANDI The official repository of ACENet does not provide the dataset split information. Hence, we follow the paper and randomly split the dataset into training (60%), validation (10%), and testing (30%) sets. To ensure robustness, we perform this random split ten times. The mean and standard deviation of the DSC over these ten splits are 88.37% and 6.32%, respectively, even surpassing the results in the manuscript.
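
For reference, the repeated-split protocol described above could look like the following sketch (ratios per the feedback; the function name and ID handling are illustrative):

```python
import numpy as np

def random_split(subject_ids, seed, train=0.6, val=0.1):
    # One random 60/10/30 split of subject IDs; repeat with ten different
    # seeds and report the mean and std of the test-set DSC across repeats.
    ids = np.random.default_rng(seed).permutation(subject_ids)
    n_tr, n_va = int(train * len(ids)), int(val * len(ids))
    return ids[:n_tr], ids[n_tr:n_tr + n_va], ids[n_tr + n_va:]
```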

R4: Implementation details We apologize for not providing more implementation details in the manuscript, as only figures and tables are permitted in the supplementary material. We would like to clarify that we use hyperparameters identical to those specified in the “debug.json” file of nnU-Net. All implementation details of OHSR and the other compared methods will be included in the GitHub repository.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Reviewers’ critiques, which were mostly about experimental details and fair comparisons with SOTA methods, were addressed in the rebuttal. Two reviewers increased their ratings after reading the rebuttal. There is a consensus among reviewers on the paper’s novelty (learning ontology-based hierarchical brain anatomy in segmentation).



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    All reviewers agreed on the novelty of the work. The authors have addressed the concerns on evaluation and R4 has increased the score to 5. Congratulations!



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper presents a whole-brain segmentation method via an ontology-based hierarchical structural relationship. The reviewers agreed that the work is novel; however, they raised some concerns related to the experiments and the comparisons. These concerns have been convincingly addressed in the rebuttal. In the camera-ready version, please include the details, clarifications, and discussion provided in the rebuttal.


