
Authors

Hongyan Xu, Dadong Wang, Arcot Sowmya, Ian Katz

Abstract

Basal cell carcinoma (BCC) is a prevalent and increasingly diagnosed form of skin cancer that can benefit from automated whole slide image (WSI) analysis. However, traditional methods that utilize popular network structures designed for natural-image datasets such as ImageNet may result in reduced accuracy due to the significant differences between natural and pathology images. In this paper, we analyze skin cancer images using the optimal network obtained by neural architecture search (NAS) on a skin cancer dataset. Compared with traditional methods, our network is more applicable to the task of skin cancer detection. Furthermore, unlike traditional unilaterally augmented (UA) methods, the proposed supernet, Skin-Cancer net (SC-net), considers the fairness of training and alleviates the effects of evaluation bias. We use the SC-net to fairly treat all the architectures in the search space and leverage evolutionary search to obtain the optimal architecture for a skin cancer dataset. Our experiments involve 277,000 patches split from 194 slides. Under the same FLOPs budget (4.1G), our searched ResNet50 model achieves 96.2% accuracy and 96.5% area under the ROC curve (AUC), which are 4.8% and 4.7% higher than those with the baseline settings, respectively.
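
For illustration, the pipeline the abstract describes (train a fairness-aware supernet, then run evolutionary search under a FLOPs budget) can be sketched roughly as below. This is a toy sketch, not the authors' code: the architecture encoding, the FLOPs proxy, and the fitness function are all invented placeholders.

```python
# Toy sketch of budget-constrained evolutionary architecture search.
# Everything here (encoding, FLOPs proxy, fitness) is a placeholder.
import random

WIDTHS = [32, 64, 128, 256, 320]      # toy per-stage channel choices

def sample_arch(n_stages=4):
    return tuple(random.choice(WIDTHS) for _ in range(n_stages))

def flops(arch):                      # toy FLOPs proxy, scaled to "GFLOPs"
    return sum(arch) / 256.0

def fitness(arch):                    # stand-in for supernet validation accuracy
    return -abs(sum(arch) - 480)      # toy objective with a single sweet spot

def mutate(arch):                     # resample one stage's width
    i = random.randrange(len(arch))
    return arch[:i] + (random.choice(WIDTHS),) + arch[i + 1:]

def evolutionary_search(budget=4.1, pop=50, generations=20):
    # Seed with random architectures that respect the FLOPs budget.
    population = [a for a in (sample_arch() for _ in range(pop))
                  if flops(a) <= budget]
    for _ in range(generations):
        # Keep the fittest half, breed the rest by mutation,
        # and re-enforce the FLOPs constraint on offspring.
        parents = sorted(population, key=fitness, reverse=True)[:pop // 2]
        children = [mutate(random.choice(parents)) for _ in range(pop // 2)]
        population = parents + [c for c in children if flops(c) <= budget]
    return max(population, key=fitness)

print(evolutionary_search())
```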

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43987-2_26

SharedIt: https://rdcu.be/dnwJI

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    In this study, the neural architecture search (NAS) approach was used to identify the optimal network structure for the skin cancer detection task.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    In this study, the neural architecture search (NAS) approach was used to identify the optimal network structure for the skin cancer detection task.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The authors limited their method to optimizing a network for the skin cancer detection task only. Performance on other tasks should be validated to confirm the generalization of the proposed method.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Reproducibility is hard to predict because evolutionary search does not always yield similar results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

The authors limited their method to optimizing a network for the skin cancer detection task only. Performance on other tasks should be validated to confirm the generalization of the proposed method.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method is not new enough.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes a classification method to detect basal cell carcinoma on histological images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This work presents an innovative approach to optimize Convolutional Neural Network (CNN) architecture, achieving great results in differentiating cancer patches. The study’s contribution lies in developing an automated method to identify the most suitable architecture for the given task. This approach leads to a more efficient and effective design of CNN models, improving the accuracy of cancer detection. The study’s results demonstrate the success of this approach and provide a promising direction for further research in the field.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It is important to consider the potential bias that may arise when multiple patches are extracted from the same tissue slide in a medical image analysis experiment. The samples taken from different patients can exhibit significant variations, and classifying them without separating the patients can introduce biases in the analysis.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The article includes details of the architecture and its hyperparameters, which makes reproduction possible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The project execution is good, but more information is needed about how the data was separated for training and testing. It is important to know if the data from the same patient was in both sets, as this can cause the model to be too optimistic in its performance. This can lead to a model that performs well in the training data but poorly in real-world situations. It’s essential to make sure the data separation process is randomized and includes patients’ information to improve the model’s reliability.
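
For context, a patient-level split of the kind the reviewer asks about can be done with scikit-learn's GroupShuffleSplit; the sketch below uses made-up placeholder arrays rather than the paper's data.

```python
# Sketch of a patient-level train/test split; all arrays are placeholders.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

patches     = np.arange(1000)                 # stand-in for patch features
labels      = np.random.randint(0, 2, 1000)   # stand-in for BCC labels
patient_ids = np.random.randint(0, 50, 1000)  # one id per source patient

# Grouping by patient guarantees no patient appears in both sets.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(patches, labels, groups=patient_ids))

assert set(patient_ids[train_idx]).isdisjoint(patient_ids[test_idx])
```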

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

My recommendation is due to the fact that this paper successfully achieved its goal, and the novel optimization method used to find the CNN architecture is a significant contribution to the field. The paper’s approach stands out for its innovative and effective method of optimizing CNN architecture, which enhances the accuracy of the results.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

The paper proposes a new deep learning architecture, Skin-Cancer net (SC-net), which uses neural architecture search to analyze skin tumor images for automated whole slide image (WSI) analysis. The results suggest that the proposed network outperforms traditional methods in skin tumor detection because it takes training fairness into account and mitigates the effects of evaluation bias. The experiments involved 277,000 patches split from 194 slides and, under the same FLOPs budget (4.1G), achieved 96.2% accuracy and 96.5% area under the ROC curve (AUC), 4.8% and 4.7% higher than the baseline settings, respectively.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

1) The work addresses the problem of skin cancer classification and, most importantly, considers that current methods rely on models pre-trained on natural images rather than medical images. This problem is addressed by using NAS to find the optimal network settings for the given task. 2) The experiments are well justified and well performed. However, since the dataset acquired and used is imbalanced, the experimental setup needs to be explained in more detail, as described in my detailed comments below.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

1) From Table 1, we can see that the dataset is imbalanced: there are ~130k positive and ~90k negative patches for training, and ~30k positive and ~22k negative patches for testing. Given the experimental results, I suggest that the authors add an appropriate metric for judging the method on this imbalanced dataset. In particular, it would be preferable to add another metric, such as the F-measure or balanced accuracy, to determine whether the proposed method is indeed better than the others (see the sketch after this list).

    2) In the tables and the experimental results in general, it is not clear to me how the reported metrics were calculated. First, are they the standard metrics, per class, or the average per class? If it is the average, how was it calculated? In any case, the information on how the metrics were calculated is missing and should be pointed out.

3) The heatmaps generated in Figure 3 are important for giving the method some explanatory power; their role should be made explicit.
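
For reference, both metrics the reviewer requests in point 1) are available off the shelf; below is a minimal sketch using scikit-learn, with placeholder arrays standing in for the real predictions.

```python
# Minimal sketch of imbalance-robust metrics; arrays are placeholders.
import numpy as np
from sklearn.metrics import f1_score, balanced_accuracy_score

y_true = np.array([1, 1, 1, 1, 1, 1, 0, 0])  # imbalanced ground truth
y_pred = np.array([1, 1, 1, 1, 1, 0, 0, 1])  # model predictions

f_measure    = f1_score(y_true, y_pred)                   # harmonic mean of precision and recall
balanced_acc = balanced_accuracy_score(y_true, y_pred)    # mean of per-class recall
print(f"F-measure: {f_measure:.3f}, balanced accuracy: {balanced_acc:.3f}")
```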

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Based on what the authors report and on the Reproducibility Checklist, I believe the reproducibility details are adequate. Addressing my concerns about the experimental part described above would further benefit the paper’s reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

Dear authors, your paper is very well motivated and the experiments are comprehensive and well executed. However, I have some concerns regarding the reported data, each listed in the “Weaknesses” section; in particular, all details of the experimental evaluation need to be clarified. I also note some typos/minor concerns found while reading the manuscript:

    • abstract, line 8: “the skin cancer dataset” -> “a skin cancer dataset” because it has not been presented yet;
    • abstract, line 9: what is intended with “more applicable”?
    • page 7, caption of table 2: how many GPUs were used?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

I find the work well justified and adequate in its description and execution of the experiments. However, a number of details (listed in the “Weaknesses” section) are missing or unaddressed and must be provided and justified to make the manuscript fully acceptable.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

The authors addressed most of my concerns, in particular the one regarding data splitting. Using a NAS strategy in this context can be considered a moderate novelty.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The reviewers raised mostly experiment-related concerns, such as the evaluation of model generalization, model bias arising from the imbalanced patch set, and the choice of evaluation metrics.




Author Feedback

We are grateful for your constructive feedback and will clarify and expand on details in our revised manuscript accordingly.

To Reviewer #1:

Q1: Experiments on other tasks are needed to verify the generalization of the model.

A1: We evaluated our model’s generalization on MedMNISTv2 (specifically the ChestMNIST and DermaMNIST subsets) and on CIFAR-10, following the MedMNISTv2 paper protocols. Compared against the benchmarks reported in the MedMNISTv2 paper, the results are:

Model             | ChestMNIST AUC | ChestMNIST ACC | DermaMNIST AUC | DermaMNIST ACC
ResNet-18         | 76.8           | 94.7           | 91.7           | 73.5
ResNet-50         | 76.9           | 94.7           | 91.3           | 73.5
auto-sklearn      | 64.9           | 77.9           | 90.2           | 71.9
AutoKeras         | 74.2           | 93.7           | 91.5           | 74.9
Google AutoML     | 77.8           | 94.8           | 91.4           | 76.8
s_ResNet50 (ours) | 79.2           | 95.5           | 93.1           | 77.8

On the CIFAR-10 dataset:

Model             | ACC  | AUC
MetaPruning       | 94.2 | 94.7
AutoSlim          | 94.0 | 94.3
ori_ResNet50      | 93.4 | 93.8
s_ResNet50 (ours) | 94.7 | 95.2

Our s_ResNet50 outperformed the original ResNet-50 on all datasets, with a 2.3% and 1.8% higher AUC on ChestMNIST and DermaMNIST, respectively, and 1.3% on CIFAR-10. These results demonstrate the model’s robust generalization.

Q2: The proposed method is not new enough.

A2: Our work presents two significant innovations: ① we employ a neural architecture search (NAS) strategy tailored specifically to the skin cancer detection task, rather than relying on existing model architectures; ② we introduce SC-net, a framework that guarantees equitable treatment of all architectures within the search space, diminishing the impact of evaluation bias and promoting fairness in training. Additionally, we adopt an evolutionary search methodology within SC-net to identify the most suitable architecture for our dataset.

To Reviewer #2:

Q1: How was the data separated for training and testing? Is data from the same patient in both sets?

A1: We carefully separated the training and testing data, ensuring that no patient’s data was present in both sets. We enforced a strict division by patient to minimize bias, providing a true reflection of our model’s performance.

To Reviewer #3:

Q1: It is recommended to add a metric to judge the quality of the method applied to the imbalanced dataset.

A1: Following your suggestion, we calculated the F-measure for our models and the existing methods:

Method               | F-measure
Tian et al.          | 92.2
Hekler et al.        | 92.8
Jiang et al.         | 91.4
MetaPruning          | 93.3
AutoSlim             | 93.0
ori_ResNet50         | 90.5
s_ResNet50 (ours)    | 95.2
ori_MobileNetV2      | 86.2
s_MobileNetV2 (ours) | 90.7

The results show that our models achieve superior F-measure scores compared to the other evaluated methods.

Q2: How were the reported metrics calculated?

A2: Each experiment was performed three times and we report the mean value. Metrics were calculated as follows: accuracy = (TP + TN) / (total observations); sensitivity = TP / (TP + FN); specificity = TN / (TN + FP); AUC = area under the receiver operating characteristic curve. We will clarify this in the revised manuscript.

Q3: The heatmaps are important to give the method some explanatory power.

A3: The heatmaps highlight the model’s ability to identify BCC regions accurately, thereby enhancing our method’s interpretability. We will clarify this role in our revised manuscript.

Q4: Some typos/minor concerns: abstract, line 9: what is intended with “more applicable”? Page 7, caption of Table 2: how many GPUs were used?

A4: “More applicable” refers to SC-net’s customization via NAS, which ensures optimal performance specifically for skin cancer detection. Our experiments used two NVIDIA RTX A6000 GPUs.
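
For reference, the metric definitions in A2 map directly onto standard tooling; below is a minimal sketch using scikit-learn, with placeholder arrays standing in for the paper's actual predictions.

```python
# Minimal sketch of the metric definitions in A2; arrays are placeholders.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0])                  # ground-truth labels
y_score = np.array([0.9, 0.2, 0.8, 0.4, 0.6, 0.1, 0.7, 0.3])  # model scores
y_pred  = (y_score >= 0.5).astype(int)                        # hard predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy    = (tp + tn) / (tp + tn + fp + fn)  # (TP + TN) / total observations
sensitivity = tp / (tp + fn)                   # TP / (TP + FN)
specificity = tn / (tn + fp)                   # TN / (TN + FP)
auc         = roc_auc_score(y_true, y_score)   # area under the ROC curve
```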




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

Although one of the reviewers raised their score, R#2 was not satisfied with the rebuttal and maintained the negative rating.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

Great study that reviewers and meta-reviewers find both novel and well done.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors provided a good rebuttal, and one of the reviewers increased their score. As a result, the final score is among the higher ones in my pool.


