
Authors

Kyungsu Lee, Haeyun Lee, Thiago Coutinho Cavalcanti, Sewoong Kim, Georges El Fakhri, Dong Hun Lee, Jonghye Woo, Jae Youn Hwang

Abstract

Federated learning (FL) has emerged as a promising technique in the field of medical diagnosis. By distributing the same task through deep networks on mobile devices, FL has proven effective in diagnosing dermatitis, a common and easily recognizable skin disease. However, in skin disease diagnosis, FL presents challenges related to (1) generalization rather than personalization and (2) limited utilization of mobile devices. Despite its improved comprehensive diagnostic performance, skin disease diagnosis should aim for personalized diagnosis rather than centralized and generalized diagnosis, due to personal diversities and variability, such as skin color, wrinkles, and aging. To this end, we propose a novel deep learning network for personalized diagnosis in an adaptive manner, utilizing personal characteristics in diagnosing dermatitis in a mobile- and FL-based environment. Our framework, dubbed as APD-Net, achieves adaptive and personalized diagnosis using a new model design and a genetic algorithm (GA)-based fine-tuning method. APD-Net incorporates a novel architectural design that leverages personalized and centralized parameters, along with a fine-tuning method based on a modified GA to identify personal characteristics. We validate APD-Net on clinical datasets and demonstrate its superior performance compared to state-of-the-art approaches. Experimental results demonstrate that APD-Net improved personalized diagnostic accuracy by 9.9% in dermatitis diagnosis, making it a promising tool for clinical practice.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43898-1_37

SharedIt: https://rdcu.be/dnwBx

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper suggests a federated-learning based approach that addresses skin diseases. Moreover, the authors publish a novel fluorescence dataset that covers common skin diseases (rosacea, eczema etc.)

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is generally well-written and covers an important topic of health care that is of high practical concern. The figures help the understanding of the paper, and the paper seems to cover enough information to implement the algorithm for an actual application. The analysis features an ablation study that aims to quantify how much the proposed algorithms GA & DP affect diagnostic quality and ultimately shows that the integrated approach yields a better accuracy while also shortening the time needed to fine-tune. The plots and illustrations are comprehensive, legible and help to get the gist of the paper.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There could be more introduction w.r.t federated learning. This would help people outside the field to more easily understand how the approach works. A justification why such a “simpler” approach would not work would also be desirable.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper describes how the approach can be implemented for end-user devices. Having access to the actual code would further improve the reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    There is little to criticise about the paper; a more thorough treatment of FL models would help the general understanding.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper features a federated-learning based approach for various skin-diseases. The approach is well-explained both by the text itself and by the illustrations. The ablation study shows that the specific combination of the submethods is necessary to reach the reported model fidelity.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper
    1. The development of a mobile- and FL-based learning (APD-Net) for skin disease diagnosis that achieved superior performance on skin disease diagnosis for public and custom datasets.
    2. The introduction of a customized GA for APD-Net, combined with a corresponding network architecture, resulting in improved personalized diagnostic performance as well as faster prediction time.
    3. The provision of a new fluorescence dataset for skin disease diagnosis containing 2,490 images for four classes, including Eczema, Dermatitis, Rosacea, and Normal. This dataset is made publicly available for future research in the field.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The proposed approach achieves adaptive and personalized diagnosis using a new model design and a genetic algorithm (GA)-based fine-tuning method, which improves diagnostic performance as well as faster prediction time.
    • The authors provide a new fluorescence dataset for skin disease diagnosis containing 2,490 images for four classes, including Eczema, Dermatitis, Rosacea, and Normal. This dataset is made publicly available for future research in the field.
    • The paper presents a comprehensive evaluation of the proposed approach on both public and custom datasets, demonstrating its superior performance compared to other state-of-the-art methods in skin disease diagnosis.
    • The authors provide a clear and concise description of their methodology and results, making it easy for readers to understand their approach and its implications for personalized skin diagnosis.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The proposed approach was evaluated on a relatively small dataset and may not generalize well to larger and more diverse datasets. Additionally, the proposed approach may require significant computational resources to train and fine-tune the deep learning network, which could limit its practicality in real-world settings. Finally, while the proposed approach achieved superior performance on skin disease diagnosis compared to other state-of-the-art methods, it is unclear how it would perform in clinical settings where human dermatologists are involved in the diagnosis process.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Not claimed.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • The proposed approach shows promising results in personalized skin diagnosis using federated learning. However, it would be beneficial to evaluate its performance on larger and more diverse datasets to further validate its effectiveness and generalizability.
    • The proposed approach requires significant computational resources to train and fine-tune the deep learning network. It may be helpful to explore ways to optimize the computational efficiency of the approach without sacrificing its diagnostic performance.
    • While the proposed approach achieved superior performance on skin disease diagnosis compared to other state-of-the-art methods, it is unclear how it would perform in clinical settings where human dermatologists are involved in the diagnosis process. Future work could explore ways to integrate the proposed approach with human expertise to improve diagnostic accuracy and efficiency.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors could consider providing more detailed explanations of their methodology and results to help readers better understand their approach and its implications for personalized skin diagnosis. Additionally, they could discuss potential limitations or areas for improvement of their approach to encourage further research in this field.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #5

  • Please describe the contribution of the paper

    This paper proposes a personalization approach using genetic algorithm for federated learning. The method is evaluated on skin cancer classification and outperforms related (non-FL) work.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well-written and easy to follow.

    2. The idea of personalization in federated learning using evolutionary algorithms is interesting and well-motivated.

    3. The method outperforms (non-FL) related work.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The work is not novel. The combination of evolutionary algorithms for the personalization step in federated learning has been proposed before.

    2. The literature review is very limited, specifically in the federated learning area. There are a lot of personalization approaches that the authors neither cite, nor compare against. Here, I list some of them: [a] Xu, Jinjin, et al. “A federated data-driven evolutionary algorithm.” Knowledge-Based Systems 2021. [b] Mou, Yongli, et al. “Optimized Federated Learning on Class-Biased Distributed Data Sources.” ECML PKDD 202. [c] Yeganeh, Yousef, et al. “FedAP: Adaptive Personalization in Federated Learning for Non-IID Data.” DeCaF 2022. [d] Tan, Alysa Ziying, et al. “Towards personalized federated learning.” IEEE Transactions on Neural Networks and Learning Systems (2022). [e] Shamsian, Aviv, et al. “Personalized federated learning using hypernetworks.” ICML 2021.

    3. The method is only compared to non-federated approaches. Since the main contribution of the paper is in personalization, the method should be compared against other personalization methods (see above) in FL.

    4. The authors mention the method is evaluated on two datasets (HAM10K and ISIC). However, the two datasets are almost identical.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper seems to be reproducible using the information in the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    See weaknesses.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    2

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The evaluation is limited.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Strengths:

    • The paper effectively introduces a federated learning-based approach for skin disease diagnosis and presents a novel fluorescence dataset, offering practical value in the healthcare field.
    • The proposed APD-Net model demonstrates superior performance in skin disease diagnosis across various datasets, showcasing its adaptive and personalized diagnostic capabilities.
    • The inclusion of a new fluorescence dataset for skin disease diagnosis and the comprehensive evaluation of the proposed approach contribute to its strength and reliability.
    • The paper is well-written, presenting the methodology and results in a clear and understandable manner.

    Weaknesses:

    1. Insufficient introduction to federated learning: The paper would benefit from providing a more detailed explanation of federated learning to aid readers unfamiliar with the field. Additionally, a justification for why a “simpler” approach would not suffice is necessary.
    2. Limited evaluation on a small dataset: Concerns arise regarding the generalizability of the proposed approach due to its evaluation on a relatively small dataset. The computational resource requirements may also limit its practicality, and the performance in clinical settings involving human dermatologists remains unclear.
    3. Lack of novelty: The work lacks novelty, as the combination of evolutionary algorithms for personalization in federated learning has been proposed previously. The literature review, particularly in the area of federated learning, is limited and overlooks other relevant personalization approaches.
    4. Limited comparison to relevant methods: The proposed approach is only compared to non-federated approaches, neglecting its main contribution of personalization within federated learning. It should be compared to other personalization methods specifically designed for federated learning.
    5. Dataset similarity: The evaluation on two datasets, HAM10K and ISIC, which are almost identical, raises concerns about the dataset’s diversity and representativeness.

    Feedback:

    1. Evaluation on larger and diverse datasets: To validate the effectiveness and generalizability of the proposed approach, it is recommended to evaluate its performance on larger and more diverse datasets.
    2. Optimization of computational resources: Exploring methods to optimize the computational efficiency of the approach without compromising diagnostic performance would enhance its practicality.
    3. Integration with human expertise: Future work could focus on integrating the proposed approach with human expertise in clinical settings to improve diagnostic accuracy and efficiency through collaborative decision-making frameworks.




Author Feedback

  1. Related works and comparative models (Rvw #1/ Rvw #5-3-2,3)

We acknowledge the lack of sufficient descriptions of prior work and comparative analysis of FL models in our manuscript. Although the initial manuscript included relevant points, the comparisons were unfortunately omitted due to page limits. So, as suggested, we conducted experiments and included prior work [b, d, and e] that Rvw3 specifically noted. We will include the experiments, literature studies, and provide simple illustrations as shown below:

where the performance compared to our framework with respect to Table 3, Fig 5, and prediction time. We will also update the related work and experimental results accordingly.

  2. Datasets (Rvw #3-6-1,3/Rvw #5-3-4/Meta-1,3)

We agree that HAM is a subset of ISIC with similar characteristics. However, in a clinical setting, it is possible that two different individual hospitals may not exhibit distinctly different data features. There are cases where similar characteristics can be observed depending on certain conditions, such as similar patients or devices. Considering the conditions, we included HAM in our experimental settings. However, we have verified that the data in HAM and ISIC do not perfectly overlap.

Additionally, our paper mentioned the use of an additional “7pt” and a “custom dataset”. We are confident that using four datasets can demonstrate the performance of our model in common skin diseases, leading to its generalization. Furthermore, we obtained the custom data under IRB approval and successfully applied our framework in a real-world clinical scenario. By applying our framework to a real-world dataset, we explicitly demonstrate its potential for appropriate use in real-world clinical settings.

  3. Computational Complexity (Rvw #3-6-2/ Meta-2)

One of the contributions of our paper is that we significantly shorten the fine-tuning time by using an evolutionary/genetic algorithm. Since the GA method is free from the usage of GPU resources, we could not directly compare it with deep learning models on measures such as the number of parameters and computing resources. However, we compared the relative prediction time in a mobile environment (see Fig. 5). Experimental results show that when compared with domain adaptation (Y. Gu) or test-time adaptation (Lee) networks for personalization, our framework achieved high diagnostic performance with high efficiency, reducing the fine-tuning and prediction time by about 25%. Additionally, our GA method applies simple methodologies in the computing architecture field, such as multi-threading and scheduler techniques, allowing fast fine-tuning and predictions even in mobile environments. We anticipate that by incorporating computing performance enhancement methodologies from related fields or employing lightweight techniques like quantization, it is possible to achieve even faster prediction speeds. In the manuscript, we will add this as future work.
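
The gradient-free personalization step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy linear model, the mean-squared-error fitness, and every name (`ga_finetune`, `shared_w`, `personal_w`) are assumptions made for illustration. The shared weights stay frozen, as centralized parameters would, and a small genetic algorithm searches only over a personal offset vector using plain CPU arithmetic.

```python
import random


def predict(shared_w, personal_w, x):
    """Linear score from frozen shared weights plus a personal offset (toy model)."""
    return sum((s + p) * xi for s, p, xi in zip(shared_w, personal_w, x))


def fitness(shared_w, personal_w, data):
    """Negative mean squared error on one user's local data (higher is better)."""
    err = sum((predict(shared_w, personal_w, x) - y) ** 2 for x, y in data)
    return -err / len(data)


def ga_finetune(shared_w, data, pop_size=20, generations=60, sigma=0.1, seed=0):
    """Search for a personal offset with selection, crossover, and mutation.

    Hypothetical helper: no gradients are computed and shared_w is never changed.
    """
    rng = random.Random(seed)
    dim = len(shared_w)
    # The all-zero gene is the unpersonalized model; keep it as a baseline individual.
    population = [[0.0] * dim] + [
        [rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=lambda g: fitness(shared_w, g, data), reverse=True)
        elite = population[: pop_size // 4]  # selection: elites survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]    # uniform crossover
            child = [g + rng.gauss(0.0, sigma) for g in child]  # Gaussian mutation
            children.append(child)
        population = elite + children
    return max(population, key=lambda g: fitness(shared_w, g, data))
```

Because the elites are carried over unchanged, the best fitness never decreases from one generation to the next, so the returned offset is at least as good on the local data as the unpersonalized model. Whether this resembles the fitness function or gene encoding used in APD-Net cannot be determined from the rebuttal text.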

  4. Novelty (Rvw #5-3-1)

The strengths of our paper would be beyond the simple combination of EA and FL, and can be summarized as follows: (1) The organic and novel design of the network structure for EA and novel mathematical modeling of Genes, (2) a novel mathematical design of fitness function for EA and network architecture, (3) superior diagnostic performance in various skin diseases including a real-world application, (4) acquisition and publication of the custom clinical dataset for skin disease in a real-world clinical scenario, and (5) implementation of mobile-based application designed for low computational cost yet high efficiency for EA and FL. We kindly request the reviewers to carefully evaluate the novel mathematical modeling approach for integrating the EA and FL models. We believe our approach goes beyond a simple combination and offers unique contributions to the field.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors’ efforts in incorporating the additional analyses and addressing the reviewers’ suggestions have greatly improved the quality and relevance of the work. We recommend the acceptance of the paper.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper has several major weaknesses, especially limited novelty in methodology and limited comparisons with related/previous methods using small datasets. Although 2 reviewers gave a good score, their confidence in their reviews is low, and they also noted several important weaknesses of the work. Reviewer #5 correctly pointed out that this paper ignored a large body of work in the area of personalized federated learning, which was neither discussed nor compared with. This is a big flaw and thus this work is outside of the SOTA of the topic. It is therefore below the standards of acceptance.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper presents APD-Net, an innovative mobile- and Federated Learning (FL)-based approach to skin disease diagnosis. It also introduces a custom Genetic Algorithm (GA) for APD-Net, aligned with a suitable network architecture, resulting in enhanced personalized diagnostic performance and reduced prediction time. The study is further enriched by the introduction of a new fluorescence dataset for skin disease diagnosis, encompassing 2,490 images across four classes: Eczema, Dermatitis, Rosacea, and Normal, which is publicly available for future research.

    The manuscript is commendable for its multiple strong points: it not only proposes a pioneering FL-based approach for skin disease diagnosis and provides a distinctive fluorescence dataset, but also demonstrates superior diagnostic performance of the proposed APD-Net model across diverse datasets. The paper is well-structured and articulately written, presenting the methodology and results in a lucid manner.

    Nonetheless, there are specific limitations that warrant attention. A more detailed introduction to FL would better assist readers less familiar with the field, and justification for why more rudimentary alternatives are inadequate is necessary. The method’s evaluation is based on a relatively small dataset, which may cast doubt on the generalizability of the proposed approach. The resource requirements could potentially constrain its practical applicability, and its performance in a clinical setting involving human dermatologists remains ambiguous. There are also questions regarding the study’s novelty, given the previous introduction of a blend of evolutionary algorithms for personalization in FL. Furthermore, the paper could benefit from a more thorough literature review, particularly regarding personalization approaches in the FL field. The method’s comparison to non-FL methods overlooks the need for comparison to other personalization methods specifically designed for FL. Finally, concerns about the diversity and representativeness of the chosen datasets arise due to their similarity.

    In their rebuttal, the authors addressed these concerns. They defended their choice of datasets, arguing that data from two different hospitals in a clinical setting might not exhibit markedly different features, and the datasets used (HAM and ISIC) do not perfectly overlap. They asserted that the use of four datasets, including an additional “7pt” and a “custom dataset,” could effectively showcase the model’s performance in diagnosing common skin diseases, thus aiding its generalization. They also underscored the real-world applicability of their framework, demonstrating its potential in practical clinical scenarios.

    Concerning computational complexity, the authors emphasize their contribution in significantly reducing fine-tuning time through a GA. They illustrated the relative prediction time in a mobile environment and demonstrated that their framework achieved a roughly 25% reduction in fine-tuning and prediction time compared to other networks. They also suggested the possibility of achieving even faster prediction speeds through the adoption of computing performance enhancement methodologies.

    Addressing concerns about the novelty, the authors contended that their paper’s strengths transcend a simple combination of EA and FL. They underscored the novel design of the network structure for EA, the mathematical modeling of Genes, superior diagnostic performance in various skin diseases, acquisition and publication of a custom clinical dataset, and the creation of a mobile-based application designed for low computational cost and high efficiency for EA and FL.

    Given the substantial strengths of the paper, its significant application in healthcare, and the authors’ thoughtful rebuttals and proposed revisions, I am persuaded to support the acceptance of the paper.


