
Authors

Wookjin Choi, Navdeep Dahiya, Saad Nadeem

Abstract

Spiculations/lobulations, sharp/curved spikes on the surface of lung nodules, are good predictors of lung cancer malignancy and hence, are routinely assessed and reported by radiologists as part of the standardized Lung-RADS clinical scoring criteria. Given the 3D geometry of the nodule and 2D slice-by-slice assessment by radiologists, manual spiculation/lobulation annotation is a tedious task and thus no public datasets exist to date for probing the importance of these clinically-reported features in the SOTA malignancy prediction algorithms. As part of this paper, we release a large-scale Clinically-Interpretable Radiomics Dataset, CIRDataset, containing 956 radiologist QA/QC’ed spiculation/lobulation annotations on segmented lung nodules from two public datasets, LIDC-IDRI (N=883) and LUNGx (N=73). We also present an end-to-end deep learning model based on multi-class Voxel2Mesh extension to segment nodules (while preserving spikes), classify spikes (sharp/spiculation and curved/lobulation), and perform malignancy prediction. Previous methods have performed malignancy prediction for LIDC and LUNGx datasets but without robust attribution to any clinically reported/actionable features (due to known hyperparameter sensitivity issues with general attribution schemes). With the release of this comprehensively-annotated CIRDataset and end-to-end deep learning baseline, we hope that malignancy prediction methods can validate their explanations, benchmark against our baseline, and provide clinically-actionable insights. Dataset, code, pretrained models, and docker containers are available at https://github.com/nadeemlab/CIR.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16443-9_2

SharedIt: https://rdcu.be/cVRx8

Link to the code repository

https://github.com/nadeemlab/CIR

Link to the dataset(s)

https://zenodo.org/record/6762573


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose an end-to-end trainable deep learning architecture for combined lung nodule malignancy classification as well as vertex-wise spiculation and lobulation classification from thoracic CT scans. The architecture is a slightly extended version of Voxel2Mesh. The extension mainly focuses on adding the classification heads. Moreover, the authors provide lung nodule segmentation masks, spiculation/lobulation annotations, and area distortion maps for the publicly available LIDC and LUNGx data sets. Those data sets are also used to quantitatively evaluate the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Novel idea for lung nodule malignancy classification
    • Special focus on challenging vertex-wise spiculation and lobulation classification
    • Derived masks and annotations will be made publicly available
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Extensions of the original Voxel2Mesh architecture appear to be minimal (limited novelty on that end)
    • Results are hard to assess as no baseline numbers or comparisons are provided
    • Description of the architecture and the training process could be improved
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors will make the masks/annotations that were used publicly available. The same is true for the code and pretrained models. Most of the parameters that are important for training are given. I, therefore, believe that the results presented will be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    While I really like the contribution, I see two major problems with the paper in its current form:

    (1) The actual novelty is hard to assess, as the paper is not really clear on how much the proposed architecture differs from a vanilla Voxel2Mesh network. It seems that the major difference between Voxel2Mesh and the pipeline used here is the addition of the classification heads to perform (global) malignancy classification as well as vertex-wise spiculation and lobulation classification. I don’t think that this would actually be a major problem, as the proposed application scenario seems to be novel, but the authors should then make this more clear. I am also wondering how the Voxel2Mesh part is trained. Is the whole pipeline trained end-to-end using the BCE loss? That seems unlikely, but I am not able to find additional information in the paper. It would be interesting to learn, for example, how the authors accurately segment the spiculations, which would most likely be smoothed out by Voxel2Mesh. Is this achieved by just removing that part of the loss? Is that what the authors refer to in Sec. 2.1 when saying “We did not apply regularization terms to the deformations to capture their irregular and sharp surfaces.”?

    (2) I find it hard to assess the results of the quantitative evaluation. The authors neither compare their approach to any baselines nor discuss their numbers with respect to existing work. While I understand that (most likely) no comparable approaches exist for vertex-wise spiculation and lobulation classification, the performance of the malignancy classifier should have been discussed and compared to other methods. The authors indicate in Sec. 5 that “Although the segmentation performance was better than the previous deep learning methods, […]”. However, I cannot find any comparisons to other DL methods in the paper.

    Minor comments:

    • Fig. 2 is not referenced in the paper and at least parts of its caption should be moved to the main text of Sec. 2
    • The first two sentences of Sec. 4 are not really results
    • In terms of reproducibility it would be better to directly mention the GPU(s) being utilized instead of saying that “Nvidia HPC clusters” were used
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I like the idea, but I’m not entirely sure (1) if there is enough novelty and (2) if the evaluation supports the authors’ claims.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors’ rebuttal kind of confirms my initial assessment: (1) the methodological novelty is very limited, and (2) the main contribution is the annotated data set. The (baseline) results provided for NoduleX in the rebuttal are a nice addition and should be added to the paper in case of acceptance. I’m still leaning towards ‘weak accept’, as such data sets are extremely valuable for the community and the effort to curate them should be rewarded.



Review #3

  • Please describe the contribution of the paper
    • almost 1000 annotations of lung nodules for open datasets, including segmentation masks and classification w.r.t. spiculation
    • approach to segment lung nodules, classify spiculations, and estimate malignancy in one network architecture
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    I like the idea of the presented approach: very clean solution and includes explainable features for radiologists. Plus: the application of lung nodule analysis is clinically very relevant.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Evaluation not convincing: some smaller issues (see comments below), and the malignancy prediction results do not look competitive with the state of the art (e.g. NoduleX).

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Apparently, the authors have not understood how to fill out the checklist: all questions are answered with yes, yet most of the information is missing in the paper (e.g. only average values are given for comparison with no measure of variation, and there are no tests for statistical significance). On a positive note, the datasets and code will be released, which should answer many of the open questions.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    • “Voxel2Mesh for multi-objects, this paper single object”: unclear what this means or what the challenge is.
    • Which 32 mesh features are used?
    • Sec 3.1: unclear how the classification of peaks was done - by radiologists?
    • Generally a good description of the experiments with many details.
    • Sec 4, about Table 2: “On the external LUNGx testing dataset (N=70), the hybrid voxel classifier model does better in terms of both the metrics for all three classes.” -> The Jaccard index for nodules is actually worse.
    • Table 3: The LIDC-PM results are surprisingly good, better than on the training set?!? The LUNGx results are much worse; this might indicate open issues, and a discussion on this would be nice.
    • No comparison with other state-of-the-art methods w.r.t. malignancy prediction.
    • Fig. 3 is not that convincing, unfortunately - is this a typical example?
    • Sec 5: “the segmentation performance was better than the previous deep learning methods”: I could not find such a comparison in the paper.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Not an easy decision, but in the end I did not find sufficient evidence for the impact of the new approach, as performance was a mixed bag overall.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors could resolve most of my concerns regarding the non-competitive results vs. NoduleX (which apparently were due to different test cohorts), so I have changed my opinion to weak accept.



Review #4

  • Please describe the contribution of the paper

    The authors propose a method combining shape features and deep features (both based on Voxel2Mesh) in an end-to-end model for the automatic prediction of lung cancer malignancy. The motivation for the use of Voxel2Mesh is to extract spiculations on the lung nodule surface, which are predictive of malignancy, without smoothing the contours. The authors therefore extend Voxel2Mesh to a multi-class problem to segment nodules and classify vertices into 3 classes: nodule, spiculation, and lobulation. Spiculation annotations are performed on the public LIDC-IDRI dataset for training/validation (annotations and code will be made publicly available upon acceptance). The models are evaluated on one internal and one external test set.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The annotated spiculations will be shared publicly. The method is simple and makes good use of the existing Voxel2Mesh method with a good design (multi-class Voxel2Mesh and malignancy prediction). Mesh classification and final malignancy prediction results are reported using only mesh features vs. mesh + deep (encoder) features.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There is no comparison with the best state-of-the-art results on these datasets. The benefit of the deep features is not evident from the results, particularly on the external test set.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Public dataset and annotations will be shared, together with the code and model weights.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    • I think the title is not well suited to the paper: there is no mention of lung nodules, and it is not a toolkit that is proposed.
    • Although mentioned, it is not clear from the beginning that the method is end-to-end; I first thought it was a mistake. It could be worth briefly mentioning early in the paper how it is trained end-to-end.
    • I would tend to contest the motivation in the abstract (“tend to smooth out … making subsequent outcomes prediction difficult”): deep models can capture this without necessarily needing precise segmentations.
    • Some typos to fix, e.g. “to classy”.
    • Maybe the Voxel2Mesh part, which is most of Fig. 2, could be clearly separated in the figure from the novelty, i.e., taking the encoder features for the malignancy prediction.
    • The results seem promising, yet I agree with the limitations stated by the authors on the vertex-level classification. (This comment could be merged with the Limitations and Future Work section.)

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is simple and well designed. The results are difficult to assess without comparison with state of the art methods.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    It is unclear from the response how the authors will modify the paper to clarify the different points, mainly the evaluation and comparison. The response could have benefited from 1) grouping reviewer comments and summarizing them before answering, 2) identifying the reviewers, and 3) stating what will be modified in the revision.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    A challenging borderline paper; a rebuttal period may give the authors a chance to clarify the innovations. Mixed reviews, with slightly more positive feedback.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4




Author Feedback

Our main contribution is the release of almost 1000 radiologist QA/QC’ed spiculation/lobulation annotations on segmented lung nodules for two public datasets, LIDC (with visual radiologist malignancy RM scores for the entire cohort and pathology-proven malignancy PM labels for a subset) and LUNGx (with pathology-proven size-matched benign/malignant nodules to remove the effect of size on malignancy prediction). The purpose of the multi-class Voxel2Mesh extension is to provide a good baseline for end-to-end deep learning lung nodule segmentation, peak classification (lobulation/spiculation), and malignancy prediction; Voxel2Mesh is, to our knowledge, the only published method that preserves sharp peaks during segmentation, hence its use as our base model. We do not claim novelty for this extension. For R4, we can change the title to “End-to-End Deep Learning Lung Nodule Segmentation/Classification and Malignancy Prediction.”

The primary motivation of this work comes from our collaborators in radiology inquiring about the importance of clinically-reported Lung-RADS features such as spiculation/lobulation in SOTA malignancy prediction. Previous methods have performed malignancy prediction for the LIDC and LUNGx datasets but without robust attribution [1] to any clinically reported/actionable features. This motivated us to annotate clinically-reported features at the voxel/vertex level on public lung nodule datasets (using negative area distortion computed via spherical parameterization to annotate spiculations/lobulations on meshes [7], followed by radiologist QA/QC) and to relate these to malignancy prediction (bypassing the “flaky” attribution schemes [1]). With the release of this comprehensively-annotated dataset, we hope that previous malignancy prediction methods can also validate their explanations and provide clinically-actionable insights.
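
As a rough illustration of this annotation recipe (not the authors' released pipeline), the per-vertex area distortion can be computed once a spherical parameterization of the nodule mesh is available; strongly negative values mark candidate spiculation/lobulation peaks that are then QA/QC’ed by the radiologist. The function names, the face-averaging scheme, and the threshold below are assumptions.

```python
# Illustrative sketch only: per-vertex (log) area distortion between a nodule
# mesh and its spherical parameterization. The parameterization itself (e.g. a
# conformal map to the unit sphere) is assumed to be given, with identical
# face connectivity. Names and the threshold are hypothetical.
import numpy as np

def triangle_areas(verts, faces):
    """Areas of all triangles for a mesh with (V, 3) vertices and (F, 3) faces."""
    v0, v1, v2 = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    return 0.5 * np.linalg.norm(np.cross(v1 - v0, v2 - v0), axis=1)

def vertex_area_distortion(verts, sphere_verts, faces, eps=1e-12):
    """Average log(area on sphere / area on mesh) over the faces incident to
    each vertex; strongly negative values flag peak candidates."""
    face_distortion = np.log((triangle_areas(sphere_verts, faces) + eps) /
                             (triangle_areas(verts, faces) + eps))
    distortion = np.zeros(len(verts))
    counts = np.zeros(len(verts))
    for k in range(3):                      # accumulate over the 3 face corners
        np.add.at(distortion, faces[:, k], face_distortion)
        np.add.at(counts, faces[:, k], 1)
    return distortion / np.maximum(counts, 1)

# Usage (hypothetical threshold): peak candidates for radiologist QA/QC.
# peaks = vertex_area_distortion(verts, sphere_verts, faces) < -2.0
```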

NoduleX (http://bioinformatics.astate.edu/NoduleX) reported results only on the LIDC RM cohort, not the PM subset. When we ran the NoduleX pre-trained model on the LIDC PM subset, the AUC, accuracy, sensitivity, and specificity were 0.68, 0.68, 0.78, and 0.55, respectively, versus 0.71, 0.63, 0.74, and 0.53 for our method. On LUNGx, the AUC for NoduleX was 0.67 vs. 0.71 for ours. MV-KBC [2] (implementation not available) reported the best malignancy prediction numbers, with 0.77 AUC on LUNGx and 0.88 on LIDC RM (NOT PM). The Jaccard index (training/validation) for nodule segmentation on a random LIDC split via UNet, FPN, and Voxel2Mesh was 0.775/0.537, 0.685/0.592, and 0.778/0.609, respectively, and for peak segmentation it was 0.450/0.203, 0.332/0.236, and 0.459/0.456.

The model was trained end-to-end using the following total loss (with default Voxel2Mesh weights):

loss = 1 * bce_loss [malignancy classification]
     + 1 * ce_loss [vertex classification]
     + 1 * chamfer_loss [nodule mesh] + 1 * chamfer_loss [spiculation mesh] + 1 * chamfer_loss [lobulation mesh]
     + (0.1 * laplacian_loss + 1 * edge_loss + 0.1 * normal_consistency_loss) [regularization]

We have added extra deformation modules to the mesh decoder to capture peaks and classify them into lobulations/spiculations. The mesh decoder deforms the input sphere mesh to segment the nodule; the deformation is controlled by the Chamfer distance between the vertices on the mesh and the ground-truth nodule vertices, with additional regularization terms (Laplacian, edge, and normal consistency). The mean cross-entropy loss between the final mesh vertices and the ground-truth vertices is used for vertex classification. Finally, a fixed-size feature vector per vertex is extracted and classified as benign/malignant using softmax classification with two fully connected layers, evaluated with a binary cross-entropy loss.

[1] Bansal, et al. “Sam: The sensitivity of attribution methods to hyperparameters.” CVPR, 2020.

[2] Xie, et al. “Knowledge-based collaborative deep learning for benign-malignant lung nodule classification on chest CT.” IEEE TMI (2018): 991-1004.
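
For concreteness, here is a minimal sketch of the combined loss described in the rebuttal above, assuming a PyTorch/PyTorch3D-style implementation; it is not the authors' released code. The dictionary keys, tensor shapes, and the choice to apply the regularization terms to the predicted nodule mesh are illustrative assumptions (the rebuttal lists the regularization as a single bracketed term without naming the mesh it applies to).

```python
# Illustrative sketch only (not the authors' code): the total loss from the
# rebuttal, written with PyTorch / PyTorch3D primitives.
import torch.nn.functional as F
from pytorch3d.loss import (chamfer_distance, mesh_edge_loss,
                            mesh_laplacian_smoothing, mesh_normal_consistency)

def total_loss(pred, target):
    """pred/target are hypothetical dicts:
       pred["malignancy_logit"]   (B,)      malignancy logits
       pred["vertex_logits"]      (N, 3)    per-vertex class logits
       pred["nodule"|"spiculation"|"lobulation"]  predicted Meshes per class
       target["malignancy"]       (B,)      float labels
       target["vertex_labels"]    (N,)      long labels
       target["<class>_points"]   (B, P, 3) ground-truth surface points
    """
    # 1 * bce_loss [malignancy classification]
    loss = F.binary_cross_entropy_with_logits(pred["malignancy_logit"],
                                              target["malignancy"])
    # 1 * ce_loss [vertex classification]
    loss = loss + F.cross_entropy(pred["vertex_logits"], target["vertex_labels"])
    # 1 * chamfer_loss for each of the nodule/spiculation/lobulation meshes
    for cls in ("nodule", "spiculation", "lobulation"):
        cd, _ = chamfer_distance(pred[cls].verts_padded(),
                                 target[cls + "_points"])
        loss = loss + cd
    # (0.1 * laplacian + 1 * edge + 0.1 * normal consistency) regularization;
    # applying it to the predicted nodule mesh only is an assumption here.
    nodule = pred["nodule"]
    loss = loss + 0.1 * mesh_laplacian_smoothing(nodule)
    loss = loss + 1.0 * mesh_edge_loss(nodule)
    loss = loss + 0.1 * mesh_normal_consistency(nodule)
    return loss
```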




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    A successful rebuttal period and a valuable data set, but the authors need to add substantial information to clarify the data sets and the other questions raised by the reviewers. Acceptance is suggested.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    All reviewers are leaning towards acceptance after evaluating the authors’ rebuttal letter. With the authors’ commitment to incorporate the reviewers’ feedback in their revised manuscript, I recommend accepting this paper for MICCAI publication.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Considering that the major contribution is the annotated dataset, the weakness of limited methodological novelty can be regarded as a secondary concern. Thus I recommend acceptance, while the authors would need to add sufficient information according to the reviews.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7


