Authors

Nicolas Wagner, Moritz Fuchs, Yuri Tolkach, Anirban Mukhopadhyay

Abstract

Although deep federated learning has received much attention in recent years, progress has been made mainly in the context of natural images and barely for computational pathology. However, deep federated learning is an opportunity to create datasets that reflect the data diversity of many laboratories. Further, the effort of dataset construction can be divided among many. Unfortunately, existing algorithms cannot be easily applied to computational pathology since previous work presupposes that data distributions of laboratories must be similar. This is an unlikely assumption, mainly since different laboratories have different staining styles. As a solution, we propose BottleGAN, a generative model that can computationally align the staining styles of many laboratories and can be trained in a privacy-preserving manner to foster federated learning in computational pathology. We construct a heterogenic multi-institutional dataset based on the PESO segmentation dataset and improve the IOU by 42% compared to existing federated learning algorithms. An implementation of BottleGAN is available at https://github.com/MECLabTUDA/BottleGAN.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16434-7_2

SharedIt: https://rdcu.be/cVRqZ

Link to the code repository

https://github.com/MECLabTUDA/BottleGAN

Link to the dataset(s)

https://zenodo.org/record/1485967

Reviews

Review #1

Please describe the contribution of the paper

The paper proposes a method to account for stain heterogeneity across different sites whilst training using Federated Learning. The method includes a generative model, known as BottleGAN,
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The proposed method only includes 1x1 convolutions, which appears to be well-suited to the task of stain normalization. This is an interesting adaptation of more generic architectures to this problem, with benefits of fewer parameters to be shared. The Many to one to Many formulation of the GAN seems to be a novel approach that offers benefits over other approaches that do not scale as efficiently with more stain varieties. The displayed results certainly seem to be compelling when compared to simpler FedAvgM approaches.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Despite the authors' claims, prior work has shown decent results for FL in non-IID settings (e.g. arXiv:2009.01871v3). Different types of non-IID data could affect FL training differently, and the paper does not really explore this. Little detail is provided on the actual training task: labels and annotations are mentioned, but it is not clear what role they play in training. Regarding “…architectures like U-Net probably process a pixel differently depending on its position within a crop”: U-Nets often ingest larger tiles than they predict to get around this. The staining styles appear to be imposed artificially on the client images, which undermines the credibility of the results, especially with small numbers of WSIs at each client site. Very little detail is given on the actual training (e.g. local epochs, batch sizes, tools used, loss functions, optimizers). There is some comparison with FixMatch and FedAvg, but many factors are at play in this setup, and more rigorous ablation studies would have made the paper much stronger. I am not sure about the need for the central server to have its own public dataset. What would happen if each of the clients simply used this too? Having both a local and a central training cycle could leave all the client hardware underutilised for long periods.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

They say that the code will be made available, but from the paper itself it would be difficult to reproduce anything. The dataset used is already public.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

I think that the original motivation for the paper is reasonable, but much of the evidence provided to back this up is somewhat convenient/selective. There are many stain-normalisation techniques and data augmentation techniques out there which are orthogonal to Federated Learning and it is not clear what inadequacies these might have that BottleGAN does not (and why). This is always the risk when combining techniques (GAN, SSL, FL) - that you obfuscate where the unique benefit is coming from. What is unfortunate is that there might be something useful and interesting in this work, but it has not been sufficiently teased out and rigorously tested.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

4
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

A combination of too many missing pieces and weak assertions. Although the paper makes for a reasonable read, it does not seem to meet the standards I have come to expect of MICCAI publications.
Number of papers in your stack

5
What is the ranking of this paper in your review stack?

5
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

Not Answered
[Post rebuttal] Please justify your decision

Not Answered

Review #2

Please describe the contribution of the paper

This paper proposes BottleGAN for unsupervised stain normalization and integrates it into WA-based FL. In the experiments, the method outperforms conventional FL algorithms.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This work introduces a GAN-based architecture for staining style transfer and combines it with federated learning for cross-laboratory training in a privacy-preserving manner.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. Since the proposed BottleGAN network can explicitly transfer staining styles, what is the advantage of integrating BottleGAN into WA-based FL?
2. The paper states that the BottleGAN network learns staining style transfer with linear growth. I suggest adding a training-time comparison with other GAN networks.
3. In Section 3.2, the authors state that the proposed architecture is entirely independent of the size of the input image. How is this achieved? This needs clarification.
4. The paper addresses staining style normalization, so why was IOU chosen as the evaluation criterion? The manuscript gives no justification.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors have addressed reproducibility: they will release all code related to this work if it is accepted.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
1. Since the proposed BottleGAN network can explicitly transfer staining styles, what is the advantage of integrating BottleGAN into WA-based FL?
2. The paper states that the BottleGAN network learns staining style transfer with linear growth. I suggest adding a training-time comparison with other GAN networks.
3. In Section 3.2, the authors state that the proposed architecture is entirely independent of the size of the input image. How is this achieved? This needs clarification.
4. The paper addresses staining style normalization, so why was IOU chosen as the evaluation criterion? The manuscript gives no justification.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

novelty of idea and experiments
Number of papers in your stack

5
What is the ranking of this paper in your review stack?

4
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

Not Answered
[Post rebuttal] Please justify your decision

Not Answered

Review #3

Please describe the contribution of the paper

This paper presents BottleGAN, a generative model for computationally aligning the staining styles of many laboratories. The purpose is to apply deep federated learning in computational pathology to create datasets that reflect the diversity of many laboratories. This is expected to provide a vast amount of training data for deep networks, which is a prerequisite for computer-aided diagnosis, prognostication and assessment.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The idea of applying the federated learning paradigm to computational pathology can be of great practical importance. In this regard, the authors offer a solution to the major obstacle: how to handle the different, privacy-protected staining protocols of laboratories that are supposed to cooperate in creating the large datasets necessary for training deep networks.

The main novelty is the architecture of BottleGAN, which follows a Many-One-Many paradigm. This allows staining style transfer between clients using only two generators and two discriminators, in strong contrast to existing solutions such as those based on Stain-GAN and Star-GAN, which for K clients require K and K^2 generator-discriminator pairs, respectively. The generator is based on 1x1 convolutions without pooling or skip connections. Since no long-distance correlations between pixels are modeled, the architecture is independent of the size of the input image. The two discriminators have different roles: the first decides whether an image is destained or reference stained, and the second decides on the particular staining style.

The main strength of the proposed concept is that it makes the federated learning paradigm applicable to computational pathology by solving the problem that different laboratories (clients) have their own private (non-shareable) staining styles.
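
The scaling argument behind the Many-One-Many design can be illustrated with a simple count. The sketch below is not taken from the paper: it only counts directed style-to-style mappings under the assumption that a pairwise scheme needs one mapping per ordered pair of styles, while a hub scheme routes every translation through one shared reference style. The function names are hypothetical, and the counts are not the generator-discriminator counts quoted above for specific architectures.

```python
def pairwise_mappings(k: int) -> int:
    """Directed mappings if every ordered pair of k styles gets its own translator."""
    return k * (k - 1)


def hub_mappings(k: int) -> int:
    """Directed mappings if each of k styles is translated to and from one reference style."""
    return 2 * k


if __name__ == "__main__":
    # Print how both counts grow with the number of staining styles.
    for k in (2, 5, 10, 50):
        print(f"k={k}: pairwise={pairwise_mappings(k)}, hub={hub_mappings(k)}")
```

The pairwise count grows quadratically and the hub count linearly, which is the kind of growth difference the review refers to.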
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The main strength of the proposed FL BottleGAN concept seems to be also its main weakness (in a relative sense). The concept requires a shareable public dataset of reference stained or destained whole slide images owned by a server. It is not certain whether this assumption can be fulfilled in a real-world, privacy-concerned scenario.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors promised availability of the code. Details related to the network architecture are provided in a supplement associated with the paper.

The proposed concept is evaluated on the public PESO dataset of prostate specimens. It contains 102 hematoxylin and eosin stained whole slide images with corresponding segmentation masks.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

The idea of applying the federated learning paradigm to computational pathology can be of great practical importance. In this regard, the authors offer a solution to the major obstacle: how to handle the different, privacy-protected staining protocols of laboratories that are supposed to cooperate in creating the large datasets necessary for training deep networks.

The new, original architecture of the Bottle-GAN staining-destaining network is also a novel and creative contribution. Its demonstration of transforming different staining styles to a reference staining style and, afterward, destaining is impressive.

Thus, to bring the proposed concept closer to application in practice, the authors should try to evaluate it on more datasets. In particular, in my view, it is important to verify whether the assumption that a shareable public dataset of reference stained or destained whole slide images (WSIs) owned by a server is available holds in practice. It is not certain whether this assumption can be fulfilled in a real-world, privacy-concerned scenario. This is also important because it is not totally clear to me whether the publicly shareable dataset has to be composed of destained WSIs, reference stained WSIs, or both.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper presents BottleGAN, a generative model for computationally aligning the staining styles of many laboratories. The purpose is to apply deep federated learning in computational pathology to create datasets that reflect the diversity of many laboratories. This can be of great practical importance. In this regard, the authors offer a solution to the major obstacle: how to handle the different, privacy-protected staining protocols of laboratories that are supposed to cooperate in creating the large datasets necessary for training deep networks.
Number of papers in your stack

5
What is the ranking of this paper in your review stack?

2
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

6
[Post rebuttal] Please justify your decision

As I commented in my review, the main weakness of the proposed FL BottleGAN concept seems to be its requirement that a shareable public dataset of reference stained or destained WSIs owned by a server exists. It is not certain whether this assumption can be fulfilled in a real-world, privacy-concerned scenario. After carefully reading the rebuttal, I did not see that the authors answered my concern. That is why my decision characterizes the concept as having a moderate weakness.

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The paper proposes BottleGAN, a generative model that can computationally align the staining styles of many laboratories and can be trained in a federated learning manner. Whilst reviewers find the study interesting, particularly the many-one-many style transfer setting and 1x1 convolutions, they have concerns about 1) the role of the central server and how to manage the hardware during local and central training (R1, R3), 2) the advantage of integrating BottleGAN into WA-based FL and the training time comparison (R2), and 3) why IOU was chosen for evaluation (R2). The authors need to address these issues in their rebuttal.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

4

Author Feedback

We thank all the reviewers and the meta reviewer for their kind reviews (especially for appreciating the many-one-many style transfer concept) and constructive comments.

@MR@R1@R3 Central server & hardware management & data availability. Without a central server, all clients would have to distill all staining styles redundantly. This is a very computationally intensive step. Realistically, we do not expect every client to have access to large GPU servers. In our architecture, even clients with sub-par hardware can participate since they only need to train their local BottleGAN once. In addition, the communication overhead would significantly increase since each client would have to send its local BottleGAN to every other client instead of once to the server. This communication scheme is much more challenging to implement in practice.
We simulate that only a percentage of all clients participate in every round. This is a common assumption in FL literature to allow clients not to wait in an idle state for the next round but to participate only when available. In our experiments, we used a single WSI as the public dataset. Such a WSI, according to our experience, will always be available through teaching examples alone and does not need to be destained.
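
The partial-participation scheme described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes FedAvg-style aggregation in which the server averages client parameters weighted by local sample counts, and the helper names `sample_clients` and `weighted_average` are hypothetical.

```python
import random
from typing import List


def sample_clients(client_ids: List[int], fraction: float, rng: random.Random) -> List[int]:
    """Pick a random subset of clients for one round (at least one client)."""
    k = max(1, round(fraction * len(client_ids)))
    return rng.sample(client_ids, k)


def weighted_average(updates: List[List[float]], num_samples: List[int]) -> List[float]:
    """Average flat parameter vectors, weighting each client by its sample count."""
    total = sum(num_samples)
    dim = len(updates[0])
    return [sum(u[i] * n for u, n in zip(updates, num_samples)) / total for i in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    participants = sample_clients(list(range(10)), fraction=0.3, rng=rng)
    print("clients in this round:", participants)
    # Two toy clients with 1 and 3 local samples, respectively.
    print(weighted_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Clients that are not sampled in a round simply skip it, so no client has to wait in an idle state for the next round.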

@MR@R2 BottleGAN into WA-based FL & IOU. In the absence of widely accepted SOTA metrics of accuracy, we consider success in downstream tasks the best performance indicator for stain normalization. Segmentation of histological objects with WA-based FL is an example of such a downstream task (extremely common from a clinical perspective), with IoU being a typical metric. Additionally, as working with large WSI datasets can be very inconvenient, we showed that BottleGAN can also be used online and does not require an offline generation of large datasets beforehand. Importantly, BottleGAN is a flexible building block of virtually any other FL algorithm or downstream task.
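
For reference, IoU for a binary segmentation mask is the number of pixels where prediction and ground truth are both foreground, divided by the number of pixels where at least one of them is foreground. The sketch below is a plain-Python illustration, not the evaluation code used in the paper. Returning 1.0 when both masks are empty is an assumed convention.

```python
from typing import List

Mask = List[List[int]]  # mask[row][col] is 0 (background) or 1 (foreground)


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0


if __name__ == "__main__":
    prediction = [[1, 1], [0, 0]]
    ground_truth = [[1, 0], [1, 0]]
    print(iou(prediction, ground_truth))
```

In the example, one pixel is shared and three pixels are foreground in at least one mask, so the score is one third.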

@MR@R2 Run times. It is challenging to compare runtimes reliably as we would need to pick one evaluation metric as a stop target missing out on the other metrics. To overcome this inherent challenge, we compared the evaluation metrics after a pre-specified runtime. This demonstrates the advantages of the linear growth in the number of mappings of BottleGAN in contrast to other studied architectures (outlined in Table 2).

@R1@R2 Size independence. We only use 1x1 kernels and a global embedding. With this approach, we guarantee that single pixels are always processed uniformly, independent of the position in a processed patch. Importantly, U-Nets fail to provide this guarantee, which, in our opinion, is a vital property necessary for the reliability of the model.
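
The position-independence property of 1x1 kernels can be checked with a small plain-Python sketch. It is an illustration only: it models a single 1x1 convolution as the same linear map applied to every pixel's channel vector and leaves out the global embedding mentioned above. The names `conv1x1` and `crop` and the toy weights are assumptions, not part of BottleGAN.

```python
from typing import List

Pixel = List[float]
Image = List[List[Pixel]]  # image[row][col] holds the channel values of one pixel


def conv1x1(image: Image, weight: List[List[float]], bias: List[float]) -> Image:
    """Apply one shared linear map (weight, bias) to every pixel independently."""
    def apply(pixel: Pixel) -> Pixel:
        return [sum(w * x for w, x in zip(row, pixel)) + b for row, b in zip(weight, bias)]
    return [[apply(pixel) for pixel in row] for row in image]


def crop(image: Image, top: int, left: int, height: int, width: int) -> Image:
    """Cut a rectangular patch out of the image."""
    return [row[left:left + width] for row in image[top:top + height]]


if __name__ == "__main__":
    # Toy 4x4 image with 3 channels per pixel.
    image = [[[float(r), float(c), float(r + c)] for c in range(4)] for r in range(4)]
    weight = [[1.0, 0.0, 2.0], [0.5, 0.5, 0.0], [0.0, 1.0, 1.0]]
    bias = [0.0, 1.0, -1.0]
    full_then_crop = crop(conv1x1(image, weight, bias), 1, 1, 2, 2)
    crop_then_full = conv1x1(crop(image, 1, 1, 2, 2), weight, bias)
    print(full_then_crop == crop_then_full)
```

Because each output pixel depends only on the input pixel at the same location, transforming the whole image and then cropping gives the same patch as cropping first, for any patch size or position.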

@R1 Sources of non-IID. We put emphasis on the simulation of real-world conditions, namely different amounts of data and different amounts of labels among participating clients, different client participation schemes, typical inter- and intra-institutional variability of staining styles, common artifacts in DP, and clients with only non-labeled data. These non-IID assumptions are much stronger than usually tested in other works on FL and cover most sources of variation that have been identified, e.g., by Schömig-Markiefka et al. [26].

@R1 Staining styles. We use staining styles from a subspace of the real ones and apply ‘unnatural’ nonlinear changes, which vigorously tests the capabilities of BottleGAN in extreme setups. These extreme cases are rather common real-world implementations.

@R1 Other methods. We test against the U-Generator, i.e. the SOTA in stain normalization given by StainGAN, and use common data augmentations and digital pathology artifacts to test against the SOTA for SSL, FixMatch. Hence, we expect to have compared against a reasonably broad spectrum. We want to emphasize our focus on federated computing and that non-federated methods cannot be applied canonically.

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The rebuttal addresses most of the concerns, and I also feel the merits of the paper outweigh the weaknesses. Therefore, I recommend accepting the paper.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

<5

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper explores federated learning and GANs for stain normalization. It tackles the practical problem that each laboratory probably has its own privacy-protected staining protocol. The proposed method is computationally efficient and performs well in downstream tasks. The rebuttal has addressed most of the reviewers’ concerns, but some remain, such as how to ensure that the prerequisite shareable public dataset is available in practice. It is also suggested that the authors include more details about the training for reproducibility.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

5

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper presents a novel BottleGAN method to address stain normalization for digital pathology via federated learning. In the first-round review, the concerns focused on local and global hardware management and the rigor of the experimental design. I think the rebuttal addressed most of the concerns. A limitation of this work is that the experimental evaluation covers very few baseline methods. However, as this study is one of the pioneering works in this problem setting, the limited experiments might be acceptable. Thus, my recommendation leans towards acceptance.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

6

Federated Stain Normalization for Computational Pathology