
Authors

Jiawei Liu, Fuyong Xing, Abbas Shaikh, Marius George Linguraru, Antonio R. Porras

Abstract

Automatic anatomical segmentation and landmark localization in medical images are important tasks during craniofacial analysis. While deep neural networks have been recently applied to segment cranial bones and identify cranial landmarks from computed tomography (CT) or magnetic resonance (MR) images, existing methods often provide suboptimal and sometimes unrealistic results because they do not incorporate contextual image information. Additionally, most state-of-the-art deep learning methods for cranial bone segmentation and landmark detection rely on multi-stage data processing pipelines, which are inefficient and prone to errors. In this paper, we propose a novel context encoding-constrained neural network for single-stage cranial bone labeling and landmark localization. Specifically, we design and incorporate a novel context encoding module into a U-Net-like architecture. We explicitly constrain the network to capture context-related features during representation learning so that pixel-wise predictions are not isolated from the image context. In addition, we introduce a new auxiliary task to model the relative spatial configuration of different anatomical landmarks, which serves as an additional regularization that further refines network predictions. The proposed method is end-to-end trainable for single-stage cranial bone labeling and landmark localization. The method was evaluated on a highly diverse pediatric 3D CT image dataset with 274 subjects. Our experiments demonstrate superior performance of our method compared to state-of-the-art approaches.
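The abstract does not spell out the internals of the context encoding module; the authors' rebuttal describes it as a channel-wise attention mechanism that upweights dense features. As a rough illustration only, the sketch below assumes a squeeze-and-excitation-style gate (global average pooling, a two-layer MLP, a sigmoid) applied to a bottleneck feature map. The function name, the (C, D, H, W) layout, and the reduction ratio `r` are hypothetical, and the sketch omits the landmark-displacement-map supervision the authors use to guide the encoding.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(features, w1, w2):
    """Squeeze-and-excitation-style channel gating (hypothetical sketch).

    features: (C, D, H, W) bottleneck feature map
    w1: (C // r, C) and w2: (C, C // r) weights of a two-layer MLP
    Returns the reweighted features and the per-channel weights.
    """
    # Squeeze: global average pooling over the three spatial axes
    squeezed = features.mean(axis=(1, 2, 3))          # (C,)
    # Excitation: MLP with ReLU, then sigmoid so each weight lies in (0, 1)
    hidden = np.maximum(w1 @ squeezed, 0.0)           # (C // r,)
    weights = sigmoid(w2 @ hidden)                    # (C,)
    # Scale: multiply every channel of the dense feature map by its weight
    return features * weights[:, None, None, None], weights

rng = np.random.default_rng(0)
C, r = 8, 2
feats = rng.standard_normal((C, 4, 4, 4))
out, w = channel_attention(feats,
                           rng.standard_normal((C // r, C)),
                           rng.standard_normal((C, C // r)))
print(out.shape, w.shape)  # (8, 4, 4, 4) (8,)
```

Because the weights come from globally pooled statistics, each channel is rescaled according to information from the whole volume rather than from a local neighborhood, which is the general sense in which such gates inject global context.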

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_28

SharedIt: https://rdcu.be/cVRZa

Link to the code repository

https://github.com/cuMIP/ctImage

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a method for cranial bone segmentation and landmark localization based on the well-known U-Net architecture.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method is interesting and appears to perform well. The authors provide a thorough analysis of their experiments, with both ablation studies and p-value tests. The discussion of related work is adequate.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The methodology section could be written a bit better, with some sections being too concise for an easy understanding of the method; I assume this is the product of heavy editing to fit the prescribed page limit. However, some questions naturally arise from the description of the method.

    • Why is the context module only taking into account features from the bottleneck stage of the U-Net? It is well known that the most crucial part of a U-Net is the skip connection; I would assume more interesting performance and information could be extracted if the skip connection information was also incorporated.

    • Why were the landmarks regressed as heat maps and not as coordinates ?

    Moreover, there is a comparison with only one other method from the literature. It would be helpful if the authors compared and discussed the benefits and limitations of their method relative to other papers.
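    Reviewer #1's second question contrasts two common landmark formulations; the rebuttal states the authors chose heatmap regression, citing [21]. The snippet below is a generic illustration of the heatmap side, not the authors' code: it assumes an isotropic 3D Gaussian target with an arbitrary `sigma` and decodes a prediction by taking the argmax voxel. A coordinate-regression network would instead output the three numbers per landmark directly.

```python
import numpy as np

def gaussian_heatmap(shape, center, sigma=2.0):
    """Target heatmap: a 3D Gaussian peaked (value 1.0) at voxel `center`."""
    grids = np.indices(shape)                               # (3, D, H, W) voxel coordinates
    c = np.asarray(center, dtype=float).reshape(3, 1, 1, 1)
    sq_dist = ((grids - c) ** 2).sum(axis=0)                # squared distance to the landmark
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def decode_argmax(heatmap):
    """Recover integer voxel coordinates from a (predicted) heatmap."""
    idx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return tuple(int(i) for i in idx)

hm = gaussian_heatmap((16, 16, 16), center=(5, 9, 12))
print(decode_argmax(hm))  # (5, 9, 12)
```

One commonly cited argument for heatmaps is that the target stays spatially aligned with a fully convolutional output, whereas direct coordinate regression must map dense features to a handful of numbers; plain argmax decoding, however, is limited to voxel precision.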

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper appears to meet reproducibility criteria.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    See above

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper appears to be interesting and could potentially be beneficial to the community

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    This paper proposes a context encoding-constrained neural network for single-stage skull bone segmentation and landmark detection from cranial CT images. The authors designed a context encoding module based on U-Net for feature learning that considers the global image context, and introduced an auxiliary regression task that models the relative spatial configuration of the anatomical landmarks to improve landmark detection. The method was evaluated on pediatric 3D CT images, showing superior performance compared with related methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors used displacement vector maps to learn landmark context to improve representation learning.
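    The reviews do not give the exact definition of these displacement vector maps, so the following is a generic sketch under stated assumptions: each map stores, for every voxel, the vector from that voxel to one landmark, optionally normalized by the volume size. The function name, the normalization, and the (L, 3, D, H, W) layout are hypothetical choices, not taken from the paper.

```python
import numpy as np

def displacement_maps(shape, landmarks, normalize=True):
    """Per-voxel displacement vectors to each landmark (hypothetical sketch).

    shape: (D, H, W) of the volume
    landmarks: (L, 3) landmark voxel coordinates
    Returns (L, 3, D, H, W); entry [l, :, z, y, x] is the vector
    from voxel (z, y, x) to landmark l.
    """
    grids = np.indices(shape).astype(float)                          # (3, D, H, W)
    lm = np.asarray(landmarks, dtype=float)[:, :, None, None, None]  # (L, 3, 1, 1, 1)
    disp = lm - grids[None]                                          # broadcast to (L, 3, D, H, W)
    if normalize:
        # Divide each axis by the volume extent so every component lies in [-1, 1]
        extent = np.asarray(shape, dtype=float)[None, :, None, None, None]
        disp = disp / extent
    return disp

maps = displacement_maps((8, 8, 8), [[2, 3, 4], [6, 1, 0]], normalize=False)
print(maps.shape)            # (2, 3, 8, 8, 8)
print(maps[0, :, 2, 3, 4])   # [0. 0. 0.] (zero vector at the landmark itself)
```

Maps of this kind tell every voxel where it sits relative to each landmark, which is one plausible way to encode the "global anatomical location and orientation of the cranial base" mentioned in the rebuttal.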

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It is unclear what the challenges of automatic skull bone segmentation and landmark detection from cranial CT images are. Compared with related works (i.e., Refs. 14, 18, 19, 20), the task of this paper is less difficult, as the skull structure is generally easy to extract from CT images and the number of localized landmarks reported in this paper is very limited. Besides, only a few relevant methods were tested for this relatively simple task.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It is not difficult to reproduce the method of this paper with the released code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    1. The qualitative evaluation results should be reported.
    2. More relevant methods should be included in the comparison evaluation.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It is unclear what the challenges of automatic skull bone segmentation and landmark detection from cranial CT images are. Compared with related works (i.e., Refs. 14, 18, 19, 20), the task of this paper is less difficult, as the skull structure is generally easy to extract from CT images and the number of localized landmarks reported in this paper is very limited. Besides, only a few relevant methods were tested for this relatively simple task.

  • Number of papers in your stack

    3

  • What is the ranking of this paper in your review stack?

    5

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    In this study, the authors developed a novel method for single-step segmentation and landmark localization of the cranial bones. The authors achieved this by incorporating a context encoding network into the U-Net architecture. This context encoding mechanism helps capture image-related feature information so that pixel-level predictions are not isolated from the global image context. In addition, the authors added an auxiliary task that models the relative spatial relationships between different anatomical landmarks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Segmentation and landmark annotation are tedious processes with high inter-rater variability and are prone to errors. While previous studies use a multi-step process, this study performs segmentation and landmark detection jointly, in a single step, on 3D images.

    The context encoding attention mechanism is further augmented by landmark displacement maps that guide the learning of context. Together with the spatial relationships among all the other landmarks, the final objective of the model is formulated as a weighted sum of all the regularizing terms, thus enforcing contextual learning in a novel way.

    The experimental design is well set up and the authors have also evaluated the contribution of the context encoding mechanism through an ablation study.

    Moreover, the paper is well written, with the information presented in a clear and logical manner, and the methods and results are well explained.
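    Review #3 describes the final objective as a weighted sum of regularizing terms and later asks the authors to list the weighting parameters. A weighted multi-task objective of this kind can be sketched as follows; the term names and all numeric values are hypothetical placeholders, not the paper's settings.

```python
def total_loss(terms, weights):
    """Weighted sum of per-task loss terms (hypothetical term names).

    terms:   mapping from term name to its scalar loss value
    weights: mapping from term name to its weighting factor
    """
    missing = set(terms) - set(weights)
    if missing:
        # Fail loudly rather than silently dropping an unweighted term
        raise KeyError(f"no weight for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in terms.items())

# Illustrative values only; neither the names nor the numbers come from the paper
terms = {"labeling": 0.40, "heatmap": 0.10, "context": 0.05, "displacement": 0.20}
weights = {"labeling": 1.0, "heatmap": 10.0, "context": 0.5, "displacement": 1.0}
print(total_loss(terms, weights))
```

Keeping the weights in one explicit mapping makes it straightforward to report the full search grid from which the final values were selected, which is what the reviewer requests.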

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There are no major weaknesses present in the paper. However, the paper lacks visualization of the outputs generated along with the comparisons with other networks.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducibility is highly likely, as the novel contextual encoding mechanism is well explained in detail. Furthermore, implementation details such as training parameters, software and hardware details are reported as per the best practices.

    However, access to the implementation code, along with information on the pre/post-processing of the data, would further increase reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The authors should clarify if any pre/post processing was performed for the images and the labels, along with any augmentation strategies that were utilized (if any).

    Moreover, the authors should specify the list of all the weighting parameters from which the final values were empirically selected. Finally, the authors should consider adding visualizations of the images and their corresponding predicted segmentations and landmarks for all the methods compared.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The single-step joint prediction of segmentation and landmarks architecture with contextual encoding mechanism proposed by the paper is highly applicable in many medical imaging tasks.

    At the same time, the paper explained the novel mechanism with clarity.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This work presents a multi-task learning algorithm for simultaneous cranial bone segmentation and landmark localization from pediatric CT images. The idea, while not entirely novel, is relevant to this specific application. Changes in the skull during growth make this task more challenging in infants and children than in adults. Still, as one reviewer mentions, CT skull segmentation is relatively easy compared to other anatomical structures, and the work does not highlight the specific challenges of its dataset in more detail. Moreover, comparison to state-of-the-art methods, both pre-deep-learning and deep-learning, is not extensive: only one state-of-the-art method for skull segmentation is compared here. Additionally, the landmark localization task is not compared with state-of-the-art methods at all.

    The idea of using context is not new; one may argue that context is already captured in a standard UNet through the skip connections and the multi-resolution hierarchy. While the authors show that their context encoding is beneficial compared to a standard UNet, it is still not fully convincing where the performance improvement comes from, especially because the context encoding is applied solely at the bottleneck layer of the UNet. This layer holds the coarsest-resolution feature maps, yet the module aims to model landmark displacement fields whose defining landmarks are specified at the highest resolution. This is counterintuitive. Finally, no limitations of the proposed method are discussed, and the work is hard to reproduce since the only dataset evaluated appears to be private. Showing generalization of the idea to another public dataset would have been beneficial.

    Taking reviewer assessments into account, this work is to be invited for a rebuttal. Authors should clarify the issues raised by reviewers and especially focus on the issues raised above.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6




Author Feedback

We appreciate the comments from the Reviewers (R1, R2 and R3).

Challenges of the task (R2). We agree that it might not be very challenging to perform binary bone segmentation from CT images. Our method, however, does not perform a binary segmentation of the skull. Instead, it aims to identify the calvaria (the part of the cranium that encloses the brain) via identification of landmarks at the cranial base [2], and to assign a different label to each of the five major cranial bones in the calvaria. Bone separation at the cranial sutures is often hard to identify, and hence bone labeling is a more challenging task because of the small size of the sutures, the limited image resolution, and the large cranial suture variability (both in size and Hounsfield intensities) between pediatric patients at various stages of development, especially in the first two years of life. This is evidenced by the differential performance of our network between the two tasks: a Dice score of 87.02% on the binary segmentation task (we will add this to Table 2) and of 81.96% on bone labeling. Moreover, although we aim to locate only four landmarks, the glabella and opisthion are usually classified as Bookstein type III and II landmarks, respectively. This means that their accurate placement is highly challenging because of their difficult anatomical definition and identification. We will clarify these points in the revision of Section 1.

Comparison with more methods (R1 and R2). In addition to the comparison presented in our manuscript, we compared with the implementations from [19] and [20]. However, the 2D scheme adopted in [19] prevented the identification of bones that look symmetrical in 2D slices (e.g., the left and right frontal bones look the same in sagittal slices). Moreover, cranial base landmarks cannot be reliably identified using only 2D slices (we obtained an average error of 7.29 ± 3.77 voxels), and the long short-term memory model used in [19] for closely spaced landmarks would not be applicable to our small number of widely spaced cranial landmarks. Hence, significant modifications would be necessary to adapt [19] to our application. In addition, we could not obtain successful results by implementing [20]. Although we contacted the authors and asked for their implementations, we did not receive a response. None of the previous implementations were available to us.

Context encoding (R1). We would like to clarify that the purpose of our context-encoding module and attention mechanism is to highlight global context-related features to enhance the extraction of relevant information for both bone labeling and landmark localization. Its purpose is not to recover accurate landmark coordinates in the bottleneck. Instead, our channel-wise attention mechanism upweights dense features that are indicative of the global anatomical location and orientation of the cranial base encoded as landmark displacement maps, which is important information for both landmark detection and bone labeling. We will clarify this by adding “Our context encoding module upweights global features indicative of the global anatomical location and orientation of the cranial base encoded as landmark displacement maps” to Section 2.2.

Heatmap regression (R1). We chose a heatmap-based regression because it has shown great performance in landmark localization [21], as mentioned in Section 2.1.

Reproducibility and private data (R2 and R3). Since private health information is present in the images (e.g., the face), HIPAA regulations forbid sharing our data. However, as indicated in the submission, we will make our model and code publicly available upon paper acceptance so other researchers can evaluate the performance of our model on other datasets.

Qualitative results (R2 and R3). Due to the page limit, we cannot provide prediction examples and qualitative comparisons in the manuscript. We will provide qualitative results in the supplementary materials and together with our implementation.
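The rebuttal's argument about task difficulty rests on the gap between binary Dice (87.02%) and bone-labeling Dice (81.96%). To make the distinction concrete, here is a minimal, generic sketch, not the authors' evaluation code: binary Dice collapses all bones into one foreground mask, while labeling Dice is computed per bone and averaged. Averaging over labels and the empty-mask convention are assumptions; the paper may aggregate differently.

```python
import numpy as np

def dice(pred, target):
    """Dice coefficient between two boolean masks."""
    pred, target = np.asarray(pred, bool), np.asarray(target, bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(2.0 * np.logical_and(pred, target).sum() / denom)

def mean_label_dice(pred, target, labels):
    """Dice averaged over individual labels (e.g., one label per cranial bone)."""
    return float(np.mean([dice(pred == k, target == k) for k in labels]))

# Toy 1D example: 0 = background, 1 and 2 = two adjacent bones
target = np.array([0, 1, 1, 2, 2, 0])
pred   = np.array([0, 1, 2, 2, 2, 0])   # one voxel near the "suture" given the wrong bone
print(dice(pred > 0, target > 0))             # 1.0: foreground is still perfect
print(mean_label_dice(pred, target, [1, 2]))  # about 0.73: labeling is penalized
```

A voxel assigned to the wrong neighboring bone near a suture still counts as correct foreground in the binary score but lowers the labeling score, which is consistent with the authors' point that labeling is the harder task.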




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The key strength of this work is a task-specific pediatric bone labeling and landmark localization tool that features context encoding. The authors addressed reviewer concerns regarding the difficulty of the task and clarified their context encoding strategy. All reviewers agree on the merit of this work after the discussion.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal addressed the concerns about the challenges of the task and the context encoding component. All reviewers recommend acceptance after the rebuttal.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The strength of the paper is the proposal of a novel context encoding-constrained neural network for single-stage cranial bone labeling and landmark localization. The authors’ rebuttal clearly addressed the reviewers’ concerns, as reflected by the change in review opinions.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2


