
Authors

Suraj Mishra, Yizhe Zhang, Li Zhang, Tianyu Zhang, X. Sharon Hu, Danny Z. Chen

Abstract

Automatic classification of pigmented, non-pigmented, and depigmented non-melanocytic skin lesions has garnered much attention in recent years. However, imaging variations in skin texture, lesion shape, depigmentation contrast, lighting condition, etc. hinder robust feature extraction, affecting classification accuracy. In this paper, we propose a new deep neural network that exploits input data for robust feature extraction. Specifically, we analyze the convolutional network’s behavior (field-of-view) to find the location of deep supervision for improved feature extraction. To achieve this, we first perform activation mapping to generate an object mask, highlighting the input regions most critical for classification output generation. Then the network layer whose layer-wise effective receptive field matches the approximated object shape in the object mask is selected as our focus for deep supervision. Utilizing different types of convolutional feature extractors and classifiers on three melanoma detection datasets and two vitiligo detection datasets, we verify the effectiveness of our new method.
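
The following is a minimal, illustrative sketch (not the authors' code) of the pipeline described in the abstract: compute a CAM, threshold it into an object mask, approximate the object size, and select the layer whose layer-wise effective receptive field (LERF) best matches that size for deep supervision. The ResNet-18 backbone, the 80th-percentile threshold, and the `lerf_sizes` table are stand-in assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()
x = torch.randn(1, 3, 224, 224)                       # dummy input image

# 1) Class activation map from the last conv block (fc weights x feature maps).
feats = {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(out=o))
logits = model(x)
cls = logits.argmax(dim=1)
w = model.fc.weight[cls]                              # (1, C) weights of predicted class
cam = torch.einsum("bc,bchw->bhw", w, feats["out"])   # (1, h, w) activation map
cam = F.interpolate(cam[:, None], size=x.shape[-2:], mode="bilinear")[:, 0]

# 2) Threshold the CAM into an object mask and approximate the object size.
mask = cam > cam.flatten(1).quantile(0.8, dim=1)[:, None, None]
diameter = 2.0 * (mask.sum().float() / torch.pi).sqrt()   # equivalent-circle diameter

# 3) Select the layer whose LERF is closest to the object size (sizes assumed here).
lerf_sizes = {"layer1": 35.0, "layer2": 70.0, "layer3": 140.0, "layer4": 280.0}
target_layer = min(lerf_sizes, key=lambda name: abs(lerf_sizes[name] - diameter.item()))
print(f"object ~{diameter.item():.0f} px -> deep supervision at {target_layer}")
```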

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16431-6_68

SharedIt: https://rdcu.be/cVD7o

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors present a Class Activation Map based deep supervision method for training skin image classification models. They highlight that, due to the absence of object-level labels, models are restricted to image-level information and feedback for most classification problems. They propose the use of effective receptive fields as a deep supervision mechanism to solve this problem to some extent and boost classifier performance. They also present a layer-wise effective receptive field determination strategy to make the model invariant to the size of the object within the field of view. The resulting model is tested on several skin image datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Overall, the paper is well written and the presentation is satisfactory.

    The lack of object level feedback in classification problems is an important problem to tackle.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper reads as though the authors applied existing methods from the literature to a novel application.

    The presentation lacks clarity in a few places.

    The use of CAM from the last conv layer in order to determine the L_{target} in LERF may not be robust.

    The isotropic nature of the LERF is not well justified.

    The Conclusions section is very short and lacks discussion.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It is not clear what type of images the in-house dataset contains: dermoscopy, close-up clinical, etc. An analysis of situations in which the method failed is not given (the authors said yes to this in the reproducibility statement).

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The methods are not novel. The ERF and LERF methods are adapted from the literature. The paper reads as though the authors applied existing methods from the literature to a novel application.

    The presentation lacks clarity in a few places.

    The use of CAMs from the last conv layer to determine the L_{target} in LERF may not be robust. CAMs are known to be problematic in finding the lesion area (or diagnostically critical area) in many skin lesion classification applications, and therefore there is no guarantee that the LERF selection will also be based on a diagnostically critical area.

    The isotropic nature of the LERF is not well justified. For skin images this assumption may not hold. Similarly, the unimodal assumption may not hold for skin images: as seen in Figure 3, first and third rows, skin lesions may contain multifocal areas of high response (or high diagnostic importance).

    The inference procedure is not well explained. Does the LERF have any significance during inference?

    Page 7, 2nd sentence: “To neutralize variations…”. This statement is not clear. Please elaborate further on the issue and the proposed solution.

    Overall, the table captions are too short and do not convey enough information about the tables.

    What do V and R stand for in column C of Table 2?

    The message that the authors would like to convey via Figure 4 is not clear. Are the CAMs after deep supervision better, more precise, or something else?

    The Conclusions section is very short and lacks discussion.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The presented problem is important, especially in the explainable AI sense. Even though the authors’ approach has some points that require further explanation and evaluation, the reviewer thinks that the contribution is valuable.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper shows that adding deep supervision to the layer whose effective receptive field size approximately matches the average object size improves the performance of a skin lesion classification model.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The idea is novel, particularly in the area of skin lesion classification.
    2. The study is performed on multiple datasets and performance gains are reported on all of them.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. While the study shows improvements, it is not clear if the improvements are indeed significant (no confidence intervals or statistical tests for example).
    2. While the average performance seems to increase, it is not clear if it also introduces more variability in performance; for example, does the model perform significantly worse on cases where the object size is too small or too large (vs. competing models)? There is no analysis presented which stratifies the performance by object size.
    3. Due to 2 above, it is not clear if the method is generalizable to different datasets, particularly if the average size of the object varies across different datasets.
    4. Additionally, because of the dependence on object size, it is not clear how the method will work with multiple classes of different sizes. Indeed, for ISIC 2018, the best result reported in the paper with deep supervision (0.701) falls far short of the best result (0.845).
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducible

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. I appreciate the fact that the method was tested on multiple different datasets and the method shows consistent improvement in each case.
    2. Adding confidence intervals to the presented results will add context to the presented metrics and give us a hint of the variability of the results.
    3. Since the key idea is based on object sizes, I suggest adding some analysis by stratifying the test group by sizes of the objects and analyzing the performance dependence on object size.
    4. It would be more convincing if, instead of training and testing on the same dataset, you trained on one and tested on another (for example, the private and public vitiligo datasets).
    5. For the LERF computation, 20 iterations are used. Adding some information on the relative variability across the 20 iterations (maybe in the supplementary material) would be useful, as it would show the robustness of the LERF computation.

    Minor comment:

    1. In the introduction, the citation for ISIC 2018 is missing.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think the method is interesting and it shows some improvement, but I am worried that the performance will depend heavily on the size of the objects (compared to standard methods) and the method will not generalize well to cases where the sizes of the objects are variable, or if there are multiple objects of different sizes, etc.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes a supervision generation method for skin lesion classification, where the overall framework comprises a LERF module, morphological object size approximation, and deep supervision. The experimental results on several skin lesion datasets show improved performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method can provide accurate lesion localizations through activation mapping. The obtained quantitative results are competitive or better than baseline methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Comparisons to other activation mapping methods, such as Grad-CAM, are lacking.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method can be reproduced.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    It would be better to provide localization maps from other methods such as Grad-CAM and Score-CAM.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The whole framework is complete and the experiments are extensive. The accurately localized lesions are helpful for assisting clinical diagnosis.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper proposes a Class Activation Map based deep supervision method for skin lesion classification. Also, a layer-wise effective receptive field determination strategy is designed to improve the classification scores. The experiments on several skin lesion datasets show that the presented method obtains competitive performance. However, there are some recommendations from the reviewers. The authors should revise the manuscript according to the reviewers’ comments.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2




Author Feedback

We thank all the reviewers for their valuable feedback and suggestions. We address the major review comments below.

R1+R3: CAM may not be robust. We agree with the reviewers’ observations. We will explore techniques like Grad-CAM [1] and Score-CAM [2] in future work for improved robustness.

R1: Does the LERF have any significance during inference? No. Only the main output is used as the final output during inference.
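
A minimal sketch (a generic two-head setup assumed for illustration, not the authors’ released code) of the point above: the deep-supervision (auxiliary) head contributes only during training, while inference returns the main classifier output alone.

```python
import torch.nn as nn

class DeeplySupervisedClassifier(nn.Module):
    def __init__(self, backbone_lo, backbone_hi, num_classes=2):
        super().__init__()
        self.lo, self.hi = backbone_lo, backbone_hi   # split at the deeply supervised layer
        self.aux_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                      nn.LazyLinear(num_classes))
        self.main_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                       nn.LazyLinear(num_classes))

    def forward(self, x):
        mid = self.lo(x)                              # features at the supervised layer
        out = self.main_head(self.hi(mid))            # main classification output
        if self.training:
            return out, self.aux_head(mid)            # auxiliary output: training loss only
        return out                                    # inference: main output only
```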

R2: Missing citation of ISIC 2018. Thanks for pointing this out. We will address it in the final manuscript.

R1+R2: Multiple classes of different sizes. We agree with the reviewers’ concern regarding the single target lesion assumption. Note that in all five skin lesion datasets used in our experiments, the majority of the images contain a single continuous object of interest. We plan to extend our method to handle multiple target classes of different sizes in future work, based on the data-driven deep supervision segmentation method for multiple object sizes in [3].

R1+R2: LERF robustness. Following [4], the LERF calculation experiments were averaged over 20 iterations.
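
A rough sketch (an assumption, not the released code) of the gradient-based effective receptive field estimation following Luo et al. [4], averaging the absolute input gradient over 20 random inputs as stated above: back-propagate from the center position of a chosen layer’s feature map to the input. The backbone split and the size proxy are illustrative choices.

```python
import torch
from torchvision.models import resnet18

model = resnet18(weights=None).eval()
layer = torch.nn.Sequential(model.conv1, model.bn1, model.relu,
                            model.maxpool, model.layer1, model.layer2)

erf = torch.zeros(224, 224)
for _ in range(20):                                  # 20 iterations, as stated above
    x = torch.randn(1, 3, 224, 224, requires_grad=True)
    fmap = layer(x)                                  # (1, C, H, W) feature map
    h, w = fmap.shape[-2] // 2, fmap.shape[-1] // 2
    fmap[:, :, h, w].sum().backward()                # gradient from the center position
    erf += x.grad.abs().sum(dim=1)[0]                # accumulate input-gradient magnitude
erf /= 20

# One possible proxy for the effective size: extent of the above-threshold region.
thresh = 0.05 * erf.max()
rows = (erf > thresh).any(dim=1).nonzero()
print("approx. LERF extent:", (rows.max() - rows.min() + 1).item(), "pixels")
```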

R1: Table captions are too short and do not convey enough information; what do V and R stand for in column C of Table 2? We apologize for the confusion. In column C of Table 2, V and R stand for VGG-type and ResNet-type classifiers, respectively. We will modify the captions in the final manuscript.

R1: In-house dataset details. We apologize for not providing enough details about the in-house dataset. To maintain anonymity during the review process, self-citation was avoided. The experimental dataset used in [5] is studied as the in-house dataset in this work.

The in-house dataset consists of images from retrospective consecutive outpatients obtained by the dermatology department of Qingdao Women and Children’s Hospital (QWCH) in China. The data acquisition effort was approved by the institutional review board of QWCH (QFELL-YJ-2020-22 protocol). For each patient with suspected vitiligo (e.g., pityriasis alba, hypopigmented nevus), three to six clinical photographs of the affected skin areas were taken by medical assistants using a Canon EOS 200D point-and-shoot camera (as described in [5]).

We extracted 11,404 lesion images covering 1,132 patients with suspected vitiligo. A total of 5,971 images with insufficient quality or duplicate lesions were excluded. The remaining 5,433 images (including 2,685 close-up and 2,748 Wood’s lamp ones) from 989 patients were provided to two board-certified dermatologists with 10 and 20 years of clinical experience. The dermatologists classified these images into two classes (vitiligo or not vitiligo) using only image-based information. Unanimous consensus was reached for 2,201 close-up images, which formed the experimental set. The experimental set of [5] was further processed to discard images with duplicate lesions (13 images were discarded). The remaining 2,188 images were divided into train (1,227), validation (308), and test (653) sets.

[1] Selvaraju, R.R., Cogswell, M., Das, A., et al., 2017. Grad-CAM: Visual explanations from deep networks via gradient-based localization. ICCV, pp. 618-626.
[2] Wang, H., Wang, Z., Du, M., et al., 2020. Score-CAM: Score-weighted visual explanations for convolutional neural networks. CVPR Workshops, pp. 24-25.
[3] Mishra, S., Zhang, Y., Chen, D.Z., et al., 2022. Data-driven deep supervision for medical image segmentation. IEEE Transactions on Medical Imaging, doi: 10.1109/TMI.2022.3143371.
[4] Luo, W., Li, Y., Urtasun, R., et al., 2016. Understanding the effective receptive field in deep convolutional neural networks. NeurIPS, pp. 4905-4913.
[5] Zhang, L., Mishra, S., Zhang, T., et al., 2021. Design and assessment of convolutional neural network based methods for vitiligo diagnosis. Frontiers in Medicine, section Dermatology, Vol. 8, Article 754202. doi: 10.3389/fmed.2021.754202.


