
Authors

Yin Luo, Wei Liu, Tao Fang, Qilong Song, Xuhong Min, Minghui Wang, Ao Li

Abstract

Accurately classifying the histological subtype of non-small cell lung cancer (NSCLC) using computed tomography (CT) images is critical for clinicians in determining the best treatment options for patients. Although recent advances in multi-view approaches have shown promising results, discrepancies between CT images from different views introduce various representations in the feature space, hindering the effective integration of multiple views and thus impeding classification performance. To solve this problem, we propose a novel method called cross-aligned representation learning (CARL) to learn both view-invariant and view-specific representations for more accurate NSCLC histological subtype classification. Specifically, we introduce a cross-view representation alignment learning network which learns effective view-invariant representations in a common subspace to reduce multi-view discrepancies in a discriminability-enforcing way. Additionally, CARL learns view-specific representations as a complement to provide a holistic and disentangled perspective of the multi-view CT images. Experimental results demonstrate that CARL can effectively reduce the multi-view discrepancies and outperform other state-of-the-art NSCLC histological subtype classification methods.
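A minimal PyTorch-style sketch of the two-branch design the abstract describes: a shared encoder produces view-invariant features across the three CT views while per-view encoders produce view-specific ones, and their concatenation forms the holistic representation used for classification. All module names and sizes here are hypothetical; the paper's actual encoders use residual blocks and additional alignment, orthogonality, and reconstruction losses.

```python
# Illustrative sketch only, not the authors' implementation.
import torch
import torch.nn as nn

def tiny_encoder(feat_dim=128):
    # Stand-in for the paper's residual-block encoders.
    return nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(16, feat_dim))

class CARLSketch(nn.Module):
    def __init__(self, feat_dim=128, n_views=3, n_classes=2):
        super().__init__()
        self.shared = tiny_encoder(feat_dim)                   # view-invariant
        self.specific = nn.ModuleList(
            [tiny_encoder(feat_dim) for _ in range(n_views)])  # view-specific
        self.classifier = nn.Linear(2 * n_views * feat_dim, n_classes)

    def forward(self, views):  # views: list of (B, 1, H, W) tensors
        inv = [self.shared(v) for v in views]
        spec = [enc(v) for enc, v in zip(self.specific, views)]
        return self.classifier(torch.cat(inv + spec, dim=1)), inv, spec

# Usage: one axial, sagittal, and coronal patch per case.
views = [torch.randn(4, 1, 128, 128) for _ in range(3)]
logits, inv, spec = CARLSketch()(views)
```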

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43904-9_35

SharedIt: https://rdcu.be/dnwHe

Link to the code repository

https://github.com/candyknife/CARL

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    The authors propose a multi-view model for classifying Non-Small Cell Lung Cancer into its histological subtypes. The proposed model focuses on learning view-invariant features via a common encoder, enforcing a low discrepancy between features extracted from the sagittal and coronal views and those from the axial (principal) view. Additionally, the proposed model extracts view-specific features through three separate encoders to supplement the view-invariant ones. All these features are combined into a holistic representation of the lesion that is employed for its classification.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The proposed model, considering the underlying nature of the medical images (CT with higher in-plane resolution), aims at embedding a more comprehensive understanding of the considered lesions without requiring a large 3D model.
    • The results are reported based on 5-fold cross-validation and demonstrate the ability of the proposed model to outperform both classical DL models and other multi-view solutions in the state of the art.
    • The presented ablation study further supports the authors’ claim that combining view-specific and view-invariant characteristics improves the classification model’s discriminative power.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The model requires as input the patch containing the lesion to be classified in the three views, and from what is described, this step is not automated. Therefore, in an application scenario, the clinician still has to go through the entire scan and identify the lesion. Automating the patch creation would further push the applicability of the proposed solution as an end-to-end support tool.

    • In the description of the data employed, there is no mention of the considered resolutions, either in-plane or along the acquisition axis. The authors state that during pre-processing the volumes were resampled to a common resolution, which is not provided. How discrepant is the in-plane resolution from the one along the acquisition axis? Would the proposed model still be applicable when considerable differences are present?

    • The authors do not explain how many patches per view are needed/used to achieve the classification. It is unclear whether one slice per view is enough (and if so, what the selection criterion is) or whether the model requires all the slices containing the lesion. It would be interesting to see an ablation study on model performance as the number of slices per view varies.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors state that they will release the codebase of the proposed work, and they use a public dataset, so reproducibility seems adequate.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    In general, the paper is well-written and easy to follow, although some missing details hinder reproducibility. For example, the authors do not describe in depth how the encoder is built beyond mentioning residual blocks, nor do they state the resolution to which the CT volumes were resampled. The authors also state that processing multi-view slices is lighter than a 3D DL model, but they do not compare the number of model parameters or the training time against the selected baselines. Additionally, I understand the exclusion of the cases with no segmentation, since it would be impossible to recover the lesion location. However, it is unclear why the authors excluded the cases that present contouring inaccuracies, since the model does not exploit the segmentations for the classification task. For the selected baselines, especially the other multi-view approaches, it would be helpful to report whether they were designed for a preferred input resolution and whether it is close to the one employed in this paper.

    Additionally, it could be interesting to consider work developed for the multi-view classification of breast cancer, or to test CARL on that type of tumor, since it likewise benefits from multi-view integration, e.g.: “Lopez, Eleonora, et al. “Multi-View Breast Cancer Classification via Hypercomplex Neural Networks.” arXiv preprint arXiv:2204.05798 (2022).” or “Khan, Hasan Nasir, et al. “Multi-view feature fusion based four views model for mammogram classification using convolutional neural network.” IEEE Access 7 (2019): 165724-165733.”

    As further improvements related to the previously described weaknesses, I would recommend better describing the employed resolutions and mismatch thereof and the structure of the input data required for the classification.

    As minor notes: the classification loss is denoted L_cla in Fig. 1, becomes L_cls in Eqns. 5 and 6, and returns to L_cla in Fig. 2(b) and Tab. 2. Additionally, in Sec. 3.1 there are two places where the dataset is called NSCLS-TCIA instead of NSCLC-TCIA. Finally, Fig. 2 could be improved, since its low resolution makes it hard to distinguish the different lines.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed paper is well structured and describes the proposed solution in detail; the solution also achieves good results against different state-of-the-art works under 5-fold cross-validation. Additionally, the authors present an ablation study to further support the introduction of the different losses.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    (1) This paper proposes a novel method called cross-aligned representation learning (CARL) to learn both view-invariant and view-specific representations for more accurate NSCLC histological subtype classification. (2) The authors introduce a cross-view representation alignment learning network which learns effective view-invariant representations in a common subspace to reduce multi-view discrepancies in a discriminability-enforcing way. (3) CARL learns view-specific representations as a complement to provide a holistic and disentangled perspective of the multi-view CT images. (4) Experimental results demonstrate that CARL can effectively reduce the multi-view discrepancies and outperform other state-of-the-art NSCLC histological subtype classification methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) A novel cross-aligned representation learning method called CARL is proposed for NSCLC histological subtype classification. The idea is pretty novel. (2) They employ a view-specific representation learning network to learn view-specific representations as a complement to the view-invariant representations. (3) They conduct experiments on a publicly available dataset and achieve superior performance compared to the most advanced methods currently available. (4) The writing of this paper is good and the structure is excellent.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) How are the values of \alpha, \beta, and \gamma determined? Only based on the experimental results?

    (2) Figure 2 could be clearer.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I think this paper can be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    See the main weaknesses of the paper.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See the main strengths and weaknesses of the paper.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors have answered my concerns. I think this paper can be accepted.



Review #4

  • Please describe the contribution of the paper

    In their manuscript, the authors propose a novel framework for multi-view NSCLC subtype classification. Notably, they involve both common, as well as view-specific projections and augment the model by various auxiliary tasks such as latent space alignment and image reconstruction.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The manuscript is overall well-written, succeeds in stating a clear motivation, and succinctly emphasizes its contributions. The manuscript structure is clear and easy to follow and the results look promising.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Regarding the method, I felt that some of the model parts lack intuition and that their value should be demonstrated more clearly.

    Regarding the evaluation, I felt that the manuscript currently does not succeed in clearly demonstrating the superiority of the method at hand, especially as the comparison algorithms have been a) self-implemented and b) not built for the task at hand. The method therefore might benefit from a comparison to state-of-the-art work. Moreover, this comparison should comprise an inferential statistical evaluation, as this would allow assessing the significance of this superiority.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors agree to publish their training and evaluation code on acceptance, which greatly improves reproducibility. In contrast to the reproducibility form, the manuscript does not include a measure of variation, such as error bars.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    As stated above, the manuscript is overall well-written, succeeds in stating a clear motivation, and succinctly emphasizes its contributions. However, I felt that in its current form the manuscript might benefit from some modifications regarding the methodological description and the evaluation section, which I would like to point out as follows:

    (Newly introduced references are written as [rX].)

    ####################### Methodology #######################

    • Currently, I felt that the overall value of some of the introduced components is not completely clear, which especially comprises the L_dsim loss from Eq. 3, in which the cross-entropy term clearly dominates the encoder similarity constraint with a weight of 110:1. As such, the overall value of the L_sim loss remains questionable. This is even more the case as the weight alpha, according to Sec. 2.5, is set to 6*10^-4; hence, the weight of L_dsim is a thousand times smaller than the orthogonality or reconstruction loss (a sketch of this weighting follows the list below). Consequently, the results in Tab. 2 indicate only a small value of this loss, and the significance of this difference felt rather unclear. Could the authors add some sentences facilitating a better intuition for this weighting?

    • Secondly, I was not certain about the usefulness of the shared encoder part, as the neighborhood relation between different perspectives of the same image is only given in the middle of the image. Still, it seems that the different views are concatenated as channels of the same input. I would strongly recommend including a specific evaluation of the contribution of the various lanes in the framework.
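    To make the weighting issue above concrete, here is a minimal sketch of the loss composition as this review describes it; the function names are hypothetical, the similarity term is a simple MSE placeholder, and the actual terms in the paper may differ.

```python
# Hedged sketch of the weighting discussed above: lam ~ 110 makes the
# cross-entropy term dominate the similarity term inside L_dsim (Eq. 3 as
# described), while alpha = 6e-4 in the total loss (Eq. 6 as described)
# scales L_dsim far below the orthogonality and reconstruction terms.
import torch.nn.functional as F

def l_dsim(logits, labels, f_axial, f_sag, f_cor, lam=110.0):
    ce = F.cross_entropy(logits, labels)
    sim = F.mse_loss(f_sag, f_axial) + F.mse_loss(f_cor, f_axial)
    return lam * ce + sim                     # CE dominates at ~110:1

def total_loss(dsim, orth, rec, cla, alpha=6e-4, beta=1.0, gamma=1.0):
    return cla + alpha * dsim + beta * orth + gamma * rec
```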

    ####################### Evaluation #######################

    • Regarding the evaluation, my main point addresses the choice of comparison algorithms. First, the authors chose to reimplement the presented methods, which complicates the reproduction of the results and has the potential to introduce errors. Secondly, the choice of multi-view algorithms felt unfortunate, as two of the approaches [8,18] have been trained for shape detection, and the other two [11,22] aimed at other clinical modalities (mammography) and/or diseases (COVID). I would recommend comparing against directly related work, such as [r1]. Third, with respect to the ablation study, I felt that the authors should have included all potential variants, such as L_dsim + L_rec, etc., as this would have better highlighted the value of each component.

    • While the results are overall promising, I missed inferential statistical measures, such as standard deviations, error bars, or confidence intervals. I would recommend adding confidence intervals, e.g., using bootstrapping [r2] (sketched below).
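    A minimal sketch of the recommended bootstrap confidence interval, assuming per-case labels and predicted scores are available as NumPy arrays; this uses the simple percentile bootstrap rather than the BCa variant of [r2].

```python
# Percentile-bootstrap 95% CI for the AUC: resample whole cases with
# replacement and skip degenerate resamples containing a single class.
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        if len(np.unique(y_true[idx])) < 2:
            continue
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    return np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
```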

    ####################### Minor #######################

    • Regarding Fig. 1, I would recommend including as much information in the caption as needed to understand the figure. This should include relevant abbreviations and components (CMD, D_v, L_orth, L_dsim, …), and further give an overview of the data processing.
    • Regarding Eq. 2, could the authors add some more intuition regarding the non-uniformity of this loss, i.e., why was a main view chosen rather than aligning all views with each other? (A CMD-style sketch of such an alignment follows this list.)
    • I felt that Eq. 5 might simply be omitted, stating that a cross-entropy loss has been used.
    • Currently, the result section contains longer assessments of the achieved results (“These results demonstrate that CARL […]”, “More importantly, CARL […]”). I felt that this rather belongs to the discussion section.
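    Since Fig. 1 mentions CMD among its components, the alignment of Eq. 2 plausibly resembles a central moment discrepancy computed against the main view; the following is only an illustrative sketch of that idea (normalization constants of the original CMD formulation omitted), not the paper's exact loss.

```python
# Sketch of a CMD-style alignment (after Zellinger et al., 2017): match the
# means and the first k central moments of two feature batches of shape (B, D).
import torch

def cmd(x, y, k=5):
    mx, my = x.mean(0), y.mean(0)
    d = (mx - my).norm()
    cx, cy = x - mx, y - my
    for i in range(2, k + 1):
        d = d + ((cx ** i).mean(0) - (cy ** i).mean(0)).norm()
    return d

# Aligning only the side views to the axial (main) view, as Eq. 2 appears to do:
# l_align = cmd(f_axial, f_sagittal) + cmd(f_axial, f_coronal)
```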

    ####################### References #######################

    [r1] Li, H., Song, Q., Gui, D., Wang, M., Min, X., & Li, A. (2022). Reconstruction-assisted feature encoding network for histologic subtype classification of non-small cell lung cancer. IEEE Journal of Biomedical and Health Informatics, 26(9), 4563-4574.

    [r2] Efron, B. (1987). Better bootstrap confidence intervals. Journal of the American Statistical Association, 82(397), 171-185.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the results felt promising, I think that they currently lack an evaluation that clearly demonstrates their significance. The methodology of the model components is mostly state-of-the-art, therefore the main impact of the work would be a clear demonstration of the value of this specific combination.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    I felt that the authors successfully addressed some of my concerns by adding extended ablation experiments, comparing against additional state-of-the-art work, and adding inferential statistical measures. The rating was adapted accordingly.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Proposes a novel method called cross-aligned representation learning (CARL) to learn both view-invariant and view-specific representations for more accurate NSCLC histological subtype classification.

    • Excellent context on previous work provided, contributions well explained
    • Cross validation results only on public dataset, no statistical analysis performed
    • Very marginal improvement over SOTA
    • Ablation analysis provided, though empirically set parameters not well justified.




Author Feedback

We thank all the reviewers for valuable comments. We will revise the paper accordingly in the final version.

  1. Performance and statistical analysis (R1, R4): We apologize for not presenting the results with sufficient statistical analysis. Standard deviations, 95% CIs, and the DeLong statistical test for AUC comparison were conducted. Notably, the DeLong test revealed that our method achieved a significant improvement in AUC over almost all compared methods (p-value < 0.05). The only exception is [22] (marginally insignificant p-value of 0.12), probably because this method showed unstable performance and an extremely large variance in AUC. DeLong p-values per compared method: [3]: 0.001, [6]: 0.017, [4]: 0.010, [13]: 0.011, [23]: 0.031, [9]: 0.001, [18]: 0.002, [8]: 0.001, [22]: 0.120, [11]: 0.004.

  2. Optimization (R3, R4): Apologies for the confusion caused by our weight setting. In Eq. 3, we observed that the cross-entropy (CE) term is a hundred times smaller than L_sim, so we conducted a grid search for the weight lambda of the CE term in the range 10^2-10^3 to balance the magnitudes of the CE term and L_sim. In Eq. 6, to normalize the scale of L_dsim, which is much larger than the other terms, we introduced a scaling factor S = 0.001 and performed a grid search for the weights alpha, beta, and gamma in the ranges 0.1S-S, 0.1-1, and 0.1-1, respectively.

  3. Data pre-processing (R2): 1) To ensure data quality, we removed samples with inaccurate contours or blurry boundaries caused by breathing artifacts under the guidance of clinicians, following the exclusion criteria in [9][13]. 2) The CT data from NSCLC-TCIA has an in-plane resolution of 1mm×1mm and a slice thickness of 0.7-3.0mm. We resampled the data using trilinear interpolation to a common resolution of 1mm×1mm×1mm. Then one 128×128-pixel slice was cropped from each view based on the center of the tumor (a sketch of this pipeline follows this list). Due to time limitations, we did not investigate the effect of large intra- and inter-plane resolution differences or of varying the number of slices used per view. These factors will be considered in future work.

  4. Experimental setup and evaluation (R2, R4): 1) We used the publicly available code of the comparison methods and implemented the models ourselves for methods without code. All multi-view methods used the same structure of input data. 2) As requested, we tested Li et al. (JBHI.2022.3192010) under the same experimental setup, and the averaged AUC was 0.809±0.044. The result showed that even with a quite simple reconstruction module and a routine MSE loss, CARL still yielded slightly better performance than Li et al., which instead relies heavily on a carefully designed network architecture and dedicated loss functions for reconstruction enhancement. Meanwhile, we found a marginal difference (<1% AUC) between the performance of CARL without reconstruction and the full model, suggesting that our multi-view learning framework can be further strengthened by utilizing more sophisticated reconstruction schemes in future work. 3) We used floating point operations (FLOPs) to compare the computational complexity of our method (0.9 GFLOPs) and the 3D method [9] (48.4 GFLOPs), showing that our multi-view method requires fewer computational resources. 4) We extended our ablation study to different combinations of losses, and the results will be presented in the final version of our paper. The evaluation suggested that while each single loss already contributed to performance improvement, combinations of losses can further enhance classification results. 5) A specific evaluation was performed by removing the shared encoder and the related loss, resulting in a remarkable decrease in AUC (0.778±0.050) compared with the full model. The DeLong test confirmed the significance of this performance difference between the two models (p-value of 0.003), showing that the shared encoder and L_dsim play a crucial role in alleviating view divergences.
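A hedged sketch of the pre-processing described in point 3, assuming the volume is a NumPy array with known voxel spacing and a tumor center in voxel coordinates; boundary padding and intensity normalization are omitted, and all names are hypothetical.

```python
# Resample a CT volume to 1 mm isotropic resolution with trilinear (order=1)
# interpolation, then crop one 128x128 slice per view around the tumor center.
import numpy as np
from scipy.ndimage import zoom

def make_views(volume, spacing_zyx, center_zyx, size=128):
    vol = zoom(volume, zoom=spacing_zyx, order=1)  # new size = old * spacing (mm)
    cz, cy, cx = (int(round(c * s)) for c, s in zip(center_zyx, spacing_zyx))
    h = size // 2
    axial = vol[cz, cy - h:cy + h, cx - h:cx + h]
    coronal = vol[cz - h:cz + h, cy, cx - h:cx + h]
    sagittal = vol[cz - h:cz + h, cy - h:cy + h, cx]
    return axial, coronal, sagittal
```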




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Proposes a novel method called cross-aligned representation learning (CARL) to learn both view-invariant and view-specific representations for more accurate NSCLC histological subtype classification. Rebuttal presents detailed statistical analysis showing significant (even if marginal) improvements, explanation of optimization, details on data processing, as well as additional experiments against SOTA including analysis of complexity. Concerns have been comprehensively addressed.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper has great merit; all reviewers agree that the authors have addressed all the concerns and that the paper should be accepted.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The work proposes a multi-view model to learn both view-invariant and view-specific representations. After the rebuttal, the reviewers reached a consensus about its acceptance.


