
Authors

Dazhou Guo, Jia Ge, Ke Yan, Puyang Wang, Zhuotun Zhu, Dandan Zheng, Xian-Sheng Hua, Le Lu, Tsung-Ying Ho, Xianghua Ye, Dakai Jin

Abstract

Visible lymph node (LN, short axis ≥5mm) assessment and delineation in thoracic computed tomography (CT) images is an indispensable step in radiology and oncology workflows. The high demand for clinical expertise and the prohibitive labor cost motivate automated approaches. Previous works focus on extracting effective LN imaging features and/or exploiting anatomical priors to help LN segmentation. However, performance in general suffers from low recall/precision due to LN’s low contrast in CT and tumor-induced shape and size variations. Given that LNs reside inside the lymph node station (LN-station), it is intuitive to directly utilize the LN-station maps to guide LN segmentation. We propose a stratified LN-station and LN size encoded segmentation framework by casting thoracic LN-stations into three super lymph node stations and subsequently learning the LN size variations. Four-fold cross-validation experiments on the public NIH 89-patient dataset are conducted. Compared to previous leading works, our framework produces significant performance improvements, with an average of 74.2% (a 9.9% increase) in Dice score and 72.0% (a 15.6% increase) in detection recall at 4.0 (a reduction of 1.9) false positives per patient. When directly tested on an external dataset of 57 esophageal cancer patients, the proposed framework demonstrates good generalizability and achieves 70.4% in Dice score and 70.2% in detection recall at 4.4 false positives per patient.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16443-9_6

SharedIt: https://rdcu.be/cVRyc

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    This paper proposes a new approach based on an LN-station-specific and size-aware LN segmentation framework. As a result, high-performance LN segmentation could be achieved with good generalizability.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Stratified LN-station and LN size encoded segmentation are validated based on the LN size variations. Three super LN stations and a learning framework were proposed and showed high performance in metrics compared with previous approaches.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Three super LN stations are used in this paper but it is not clear how many super stations are optimal in general.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It would be better if the code were available to confirm the whole procedure proposed in the paper, although the paper is well written and appears reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The proposed approach improves the LN segmentation performance for the corrected test data of actual esophageal cancer patients. The effectiveness of the original idea is shown in the experimental evaluations. Since the segmentation performance is not perfect, it would be better to show some failure examples together with their causes.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The three super LN stations and the learning framework are the most important ideas of the paper. The proposed model, consisting of super-station-based stratified encoders, size-aware decoder branches and a post-fusion module, achieves high segmentation performance. It is shown that the new approach improves the segmentation performance further.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper describes a novel way to segment thoracic lymph nodes (LNs): first combining LN-stations to form 3 “super-stations”, then differentiating between large and small LNs, and finally going through a post-fusion module to generate the final prediction. This is an interesting framework made to guide the learning, and it succeeded in improving the performance compared to other methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The formulation of the framework is interesting. It seems intuitive to group the LN-stations to form “super-stations” to try to help the learning. The same holds for the large/small differentiation. This “pre-processing” of the data seems to have helped the learning.

    The use of an external dataset to validate the result is also interesting, as it has a different resolution compared to the training data.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    One would be tempted to ask why a CNN would not be able to figure out by itself the grouping into super-stations or the large/small differentiation.

    While the criterion for the super-station grouping is intuitive, perhaps the authors should try different criteria and see the change in performance. The same applies to the large/small differentiation, for which values other than 10mm could be tried.

    The datasets used in external testing are substantially different in terms of slice spacing, as these are 5mm vs 1.2mm in training data. Also, these are all CT scans from patients with esophageal cancer, and I am not sure if there is any bias because of this.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Adequate.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    For the CT scans, the slice thickness may have an effect. It is unknown if these were contiguous slices or overlapping slices.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    There does not seem to be enough contribution for acceptance. The method, however, has good results compared to others.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #4

  • Please describe the contribution of the paper

    To overcome the difficulties of segmenting visible lymph nodes (LNs) from CT images, a novel LN-station-specific and size-aware LN segmentation framework is proposed, which can explicitly utilize the LN-station priors and learn the LN size variance. A two-stage learning process is proposed: first, thoracic LN-stations are segmented and then grouped into 3 super lymph node stations, and a multi-encoder deep network is designed to learn LN-station-specific LN features; second, to learn LN’s size variance, two decoding branches are proposed to concentrate on learning the small and large LNs, respectively. Validated on the public NIH dataset and further tested on the external esophageal dataset, the proposed framework demonstrates high LN segmentation performance while preserving good generalizability.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Thoracic LN-stations are first segmented and then grouped into 3 super lymph node stations. A multi-encoder deep network is designed to learn LN-station-specific LN features.
    2. To learn LN’s size variance, two decoding branches are proposed to concentrate on learning the small and large LNs, respectively.
    3. Validated on the public NIH dataset and further tested on the external esophageal dataset, the proposed framework demonstrates high LN segmentation performance while preserving good generalizability.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. In short, this paper is not well organized, and the writing needs thorough improvement. The English of the manuscript should be significantly polished before further consideration for acceptance.
    2. The architectural settings are barely discussed. The reason for choosing nnUNet blocks should be explained, and the structure of the nnUNet block should be briefly described.
    3. The images in the manuscript are low-quality and should be replaced with high-quality ones. Moreover, some illustrations with unclear intent should be redrawn. For example, the contents of Fig. 2 are too small.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code for this work was not provided, so reproducibility is slightly weaker.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    To overcome the difficulties of segmenting visible lymph nodes (LNs) from CT images, a novel LN-station-specific and size-aware LN segmentation framework is proposed, which can explicitly utilize the LN-station priors and learn the LN size variance. A two-stage learning process is proposed: first, thoracic LN-stations are segmented and then grouped into 3 super lymph node stations, and a multi-encoder deep network is designed to learn LN-station-specific LN features; second, to learn LN’s size variance, two decoding branches are proposed to concentrate on learning the small and large LNs, respectively. Validated on the public NIH dataset and further tested on the external esophageal dataset, the proposed framework demonstrates high LN segmentation performance while preserving good generalizability. This study provides a meaningful approach, but there are several weaknesses:

    1. In short, this paper is not well organized, and the writing needs thorough improvement. The English of the manuscript should be significantly polished before further consideration for acceptance.
    2. The architectural settings are barely discussed. The reason for choosing nnUNet blocks should be explained, and the structure of the nnUNet block should be briefly described.
    3. The images in the manuscript are low-quality and should be replaced with high-quality ones. Moreover, some illustrations with unclear intent should be redrawn. For example, the contents of Fig. 2 are too small.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    To overcome the difficulties of segmenting visible lymph nodes (LNs) from CT images, a novel LN-station-specific and size-aware LN segmentation framework is proposed, which can explicitly utilize the LN-station priors and learn the LN size variance. Validated on the public NIH dataset and further tested on the external esophageal dataset, the proposed framework demonstrates high LN segmentation performance while preserving good generalizability. Although no novel structure or theory is proposed in this study, the results are strong on both datasets. Thus, I suggest accepting it after major revision.

  • Number of papers in your stack

    7

  • What is the ranking of this paper in your review stack?

    6

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper presents an LN segmentation method that integrates information on LN station stratification and size. The authors are expected to further clarify the contribution in the rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4




Author Feedback

We thank all reviewers for their comments, especially in noting that our paper proposes an interesting/intuitive/novel idea to segment lymph nodes (LNs), achieves high performance on a public dataset, and shows good generalizability on an external dataset. We emphasize our main contributions and address individual concerns as follows.

Main contributions: 1) We are the first deep learning work to explicitly exploit LN-stations to tackle the challenging LN segmentation problem. 2) Motivated by clinical insights that LNs in different stations exhibit different levels of identification uncertainty and that enlarged LNs often yield different patterns and shapes from smaller ones, we design a new deep segmentation architecture with multiple encoding paths, each focusing on learning the LN features in a specific LN Super-Station (SS), and two decoding branches that concentrate on learning the small and large LNs explicitly. 3) Extensive experiments demonstrate significantly improved performance on a public NIH dataset and an external testing dataset, compared to a diverse set of leading approaches.

Q1: Different criteria to group SS. In our paper, we form three LN SS (stations 1-4, 5-9 and 10-14) based on oncologists’ clinical experience. We have also tried other grouping criteria, with results listed below: – Two SS: stations 1-9 (mediastinal), 10-14 (lung); – Five SS: stations 1-4 (superior), 5-6 (aortopulmonary), 7 (subcarinal), 8-9 (inferior mediastinal), 10-14 (lung); – Six SS: stations 1 (supraclavicular), 2-4 (superior mediastinal), 5-6 (aortopulmonary), 7 (subcarinal), 8-9 (inferior mediastinal), 10-14 (lung). On the NIH dataset, the DSC for two, five, and six SS is 70.9%, 73.8%, and 72.6%; Recall is 67.8%, 71.5%, and 70.2%; FP-PW is 4.6, 4.0, and 4.1. Two SS lead to the lowest performance. Five SS achieve slightly lower performance than our adopted three SS. With six SS, the performance drops clearly. Although LN’s intra-class variations (e.g., texture, shape) could be well captured by more SS encoders, too many encoders would introduce more learnable parameters and increase the model optimization difficulty, leading to decreased performance.
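
The grouping schemes compared in Q1 can be written down as a small lookup table. The sketch below is illustrative only and is not the authors' code: the dictionary `SUPER_STATION_SCHEMES`, the function `super_station_of`, and the use of plain integer station IDs 1-14 (ignoring any left/right or sub-station labels) are assumptions made for this example. The station ranges and group names are taken from the rebuttal text; the three-SS groups are keyed by their station ranges because the rebuttal does not name them.

```python
# Illustrative lookup of the LN super-station (SS) groupings listed in Q1.
# Hypothetical names throughout; station IDs are assumed to be integers 1-14.
SUPER_STATION_SCHEMES = {
    2: {"mediastinal": range(1, 10), "lung": range(10, 15)},
    3: {"1-4": range(1, 5), "5-9": range(5, 10), "10-14": range(10, 15)},
    5: {
        "superior": range(1, 5),
        "aortopulmonary": range(5, 7),
        "subcarinal": range(7, 8),
        "inferior mediastinal": range(8, 10),
        "lung": range(10, 15),
    },
    6: {
        "supraclavicular": range(1, 2),
        "superior mediastinal": range(2, 5),
        "aortopulmonary": range(5, 7),
        "subcarinal": range(7, 8),
        "inferior mediastinal": range(8, 10),
        "lung": range(10, 15),
    },
}


def super_station_of(station: int, n_groups: int = 3) -> str:
    """Return the name of the super-station that contains `station`."""
    for name, stations in SUPER_STATION_SCHEMES[n_groups].items():
        if station in stations:
            return name
    raise ValueError(f"station {station} is outside the assumed range 1-14")


if __name__ == "__main__":
    # Show where stations 1, 7 and 12 land under each grouping scheme.
    for n in sorted(SUPER_STATION_SCHEMES):
        print(n, [super_station_of(s, n) for s in (1, 7, 12)])
```

Under this reading, station 7 shares a group with stations 5-9 in the adopted three-SS scheme, while the five- and six-SS schemes isolate it as its own subcarinal group.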

Q2: LN-size stratification criteria. In addition to 10mm, we have tried 7, 15, and 20mm sizes. DSC for these setups is 73.4%, 74.3%, and 73.9%. The performance is comparable when using the 10mm and 15mm criteria. Hence, we simply follow the RECIST guideline [20] and use 10mm as our size stratification criterion.
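
The size stratification in Q2 amounts to a single threshold rule. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that LN size means the short-axis diameter in millimetres and that an LN exactly at the threshold is treated as large; neither detail is stated in the rebuttal. The names `size_group` and `DEFAULT_THRESHOLD_MM` are hypothetical.

```python
# Illustrative size stratification for routing LNs to the small/large branch.
# Assumptions (not confirmed by the source): size is the short-axis diameter
# in millimetres, and an LN exactly at the threshold counts as "large".
DEFAULT_THRESHOLD_MM = 10.0  # the criterion adopted in the rebuttal


def size_group(short_axis_mm: float, threshold_mm: float = DEFAULT_THRESHOLD_MM) -> str:
    """Return "large" or "small" for one LN given its short-axis diameter."""
    if short_axis_mm <= 0:
        raise ValueError("short-axis diameter must be positive")
    return "large" if short_axis_mm >= threshold_mm else "small"


if __name__ == "__main__":
    for diameter in (6.0, 10.0, 14.5):
        print(diameter, size_group(diameter))
```

Passing `threshold_mm=7.0`, `15.0`, or `20.0` would reproduce the alternative stratifications mentioned in Q2.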

Q3: Can a CNN learn station grouping and LN size differentiation by itself? The context-based CNN, by its nature, cannot explicitly learn the LN-station grouping or LN-size stratification. Therefore, even the strong baseline nnUNet only reaches a DSC of 58.2% and a Recall of 55.3% at 6.2 FP-PW (much inferior to our method). LN segmentation is an extremely difficult yet clinically desirable task, so a suitable way to decompose its segmentation complexity would bring great performance benefits.

Q4: External dataset bias. Our method shows good generalizability in external testing despite large spacing differences in training and external datasets and different cancer types. In contrast, other methods experienced a large performance drop. The robustness of our method might come from the fact that segmenting LNs in a much-confined station region is more achievable than searching LNs from the entire CT image.

Q5: Failure cases. Under-segmentation along the z-direction for LNs in the inferior mediastinal region is observed. Reasons might be: 1) unclear boundaries of the inferior mediastinal LNs, and 2) most LNs are relatively short in the z-direction, so the model might be biased toward the majority average.

Q6: We will polish the writing and improve the image quality of the figures.

Q7: We adopted the default setup of the “3D full-res” version of nnUNet because of its leading performance in a wide range of challenges. The structure of the nnUNet block will be described in the later version.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper focuses on the challenging task of lymph node segmentation with a novel architecture of stratified encoders and size-aware decoders. The architecture seems effective, as demonstrated by the large leap in the experimental results and by its generalizability to an in-house dataset. The rebuttal effectively addresses the concerns from reviewers.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This article proposes a new method to segment thoracic lymph nodes by integrating information from their stratifications and station sizes. It is an interesting framework designed to guide learning and provides high segmentation performance. The use of an external dataset to evaluate the results is also interesting, showing the robustness of the proposed method. The authors’ responses in the rebuttal help clarify some critical points. My proposition is therefore “acceptance”.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    9



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The key strength of this work lies in the challenging task it aims to tackle, i.e., lymph node segmentation, which is tricky due to the lack of clear shape, intensity, or texture priors. While the proposed architecture is not necessarily “surprising” and does not provide significant originality in its methodological contribution, it still seems to be well adapted to this challenging task. Another strength is its experimental evaluation, where the proposed method shows strong benefits compared to state-of-the-art methods and also generalizes from the publicly available dataset on which the method was developed to an external validation set with different imaging parameters (severely differing slice thickness). There were concerns about some details of the experimental evaluation; however, one reviewer mentions that the author rebuttal clarified these aspects, with which I agree. Overall, this work seems promising and, in line with reviewer opinion (two out of three reviewers vote for acceptance), I think it is valuable to be presented at MICCAI.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7


