
Authors

Gagandeep B. Daroach, Savannah R. Duenweg, Michael Brehler, Allison K. Lowman, Kenneth A. Iczkowski, Kenneth M. Jacobsohn, Josiah A. Yoder, Peter S. LaViolette

Abstract

The latent space of a generative adversarial network (GAN) may model pathologically-significant semantics with unsupervised learning. To explore this phenomenon, we trained and tested a StyleGAN2 on a high quality prostate histology dataset covering the prostate cancer (PCa) diagnostic spectrum. Our pathologist annotated synthetic images to identify learned PCa regions in the GAN latent space. New points were drawn from these regions, synthesized into images, and given to a pathologist for annotation. 77% of the new points received the same annotation, and 98% of the latent points received the same or adjacent diagnostic stage annotation. This confirms the GAN network can accurately disentangle and model PCa features without exposure to labels in the training process.
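The two agreement figures in the abstract can be read as an exact-match rate and a within-one-stage rate over ordered diagnostic categories. The following is a minimal sketch of that calculation, assuming the stages are coded as integers ordered by severity. The function name and the toy labels are hypothetical and are not taken from the paper.

```python
import numpy as np

def agreement_rates(cluster_stage, annotated_stage):
    """Return (exact, within_one) agreement for integer-coded ordinal stages."""
    cluster_stage = np.asarray(cluster_stage)
    annotated_stage = np.asarray(annotated_stage)
    # Distance between the stage a point was drawn from and the stage assigned on review.
    diff = np.abs(cluster_stage - annotated_stage)
    exact = float(np.mean(diff == 0))        # same annotation
    within_one = float(np.mean(diff <= 1))   # same or adjacent stage
    return exact, within_one

# Toy labels, made up for illustration only (not the paper's data).
source_stage = [0, 0, 1, 1, 2, 2, 3, 3]
review_stage = [0, 1, 1, 1, 2, 3, 3, 1]
exact, within_one = agreement_rates(source_stage, review_stage)
```

The within-one rate only makes sense if the categories have a meaningful order, which is why the integer coding is stated as an assumption.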

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16434-7_39

SharedIt: https://rdcu.be/cVRrW

Link to the code repository

https://github.com/NVlabs/stylegan2

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    This work trained and tested StyleGAN2 on a prostate histology dataset to generate new prostate cancer images. These images were drawn from the GAN latent space, and the authors demonstrate that the latent space learned by the GAN can accurately disentangle and model prostate cancer features without exposure to labels in the training process.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The paper is well written and easy to follow. (2) The authors asked an independent pathologist to evaluate the generated images, which provides a professional opinion from a clinical perspective. (3) The paper discusses the limitations of the work.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The major weakness of this paper is the unclear clinical application of the approach. Generating new pathological images using GANs is not new. Although the paper demonstrates that the latent space learned by StyleGAN2 is able to disentangle different prostate cancer features, it is not clear to me how this technique should be used or whether it is beneficial in the clinic. Some potential applications might be: (1) using the generated images for downstream tasks such as PCa classification or segmentation; however, some previous studies show that this approach is not very effective; (2) treating the technique as a way to anonymize patient data; however, the agreement between the pathologist's labels for the generated images and the latent clusters is not good enough, especially given the large discrepancy for G3 and G4FG.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provide some details of the training process, and the StyleGAN2 code is publicly available; however, the training dataset appears to be unavailable to the public.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The clinical application of this approach is not clear. I would suggest that the authors provide some evidence that this approach can benefit clinical tasks.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The major consideration for the recommendation is the lack of clinical application of this approach.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    4

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    Thanks to the authors for the response. The authors provide several potential clinical applications for the proposed technique. Given the exploratory nature of the work and the technical focus of MICCAI, I think the response is reasonable. Therefore, I would raise my score to weak accept. Nevertheless, the reproducibility of the work is still challenging, and the authors do not answer directly whether their method can be reproduced on a public dataset such as PANDA.



Review #3

  • Please describe the contribution of the paper

    A GAN was trained, and the network accurately modelled prostate cancer features without exposure to labels in the training process.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. A novel approach to produce and evaluate learnt latent features in PCa;
    2. An evaluation by a pathologist with annotations;
    3. Challenging indication (PCa) and relevant problem (synthetic data generation and validation).
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The dataset is tiny, especially the evaluated one (160 256x256 patches, right?);
    2. The paper would benefit from a public repository with the code; otherwise it would be hard to reproduce;
    3. Why not validate the results on public data (e.g., PANDA Kaggle competition data)?
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Without the code and validation on public data (PANDA dataset), it would be hard to validate the results. Also, some technical details are missing; for example, how was the GAN trained?

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    It would be great if it were possible to validate the method on public data (PANDA dataset).

    Also, some technical details are missing; for example, how was the GAN trained? More details on the training process would be useful to share, ideally with the code.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think this is a very challenging indication (prostate cancer) and an important problem: generating synthetic data and having it validated by pathologists.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #4

  • Please describe the contribution of the paper

    This paper describes an experiment showing that the latent space of a deep generative model (StyleGAN2) can contain structures that reflect clinically relevant information (here, Gleason score grading). The model was trained in an unsupervised manner on image tiles (digital pathology). Random samples from the learned latent space were generated and annotated in a first round. From these landmarks, cluster regions in latent space were estimated via PCA. To validate the approach, the authors then generated samples from the latent clusters and had a pathologist annotate them. They found considerable agreement in the respective grades (exact match or neighboring category).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper describes a very interesting approach
    • The finding that there seems to be “semantic” structure in the latent space of an unsupervised generative model that can in principle be used for downstream classification opens some interesting directions for further research for digital pathology.
    • For a first proof-of-principle study, I think the paper has some encouraging results.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The paper as it stands doesn’t give any direct insight (e.g. via visualization) into the latent space structure or its geometry, which would be very interesting to see.
    • Not all technical details or general ideas behind the methods are explained well enough.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It would be nice if the trained model could be made available, if possible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • A general comment: I think not all choices or ideas are clearly explained in the paper. It would be good to check the manuscript in general and try to explain all ideas and assumptions as clearly as possible.

    • Could you briefly explain why you used annotated samples in the first place to train the model? Was the annotation necessary to have a balanced dataset for the unsupervised training part? (Maybe I have missed it while reading)

    • p.2 “Finding the latent point from an image – whether real or synthetic – is known as GAN inversion. While recent work improves GAN inversion [19,7,22], we found these approaches not pixel accurate on histology.” Could you explain this a bit more clearly? What was the problem with those images?

    • p.2 “Considering all the points sharing the same category, we apply principal component analysis (PCA) to describe the variation of these points within a unimodal latent cluster.” What was the (geometric) intuition behind this idea? Could you briefly explain it?

    • p.6 “Each latent point was truncated toward the mean of entire latent space using factor of psi=0.6 [15], reducing the number of unrepresentative features within the cluster while preserving diversity.” Also here, could you briefly explain why you do this?

    • p.6 “In addition to the Z channel, the StyleGAN network has a random noise channel that influences the layout of features within the image [8]. In generating images within a category, the noise channel was fixed so that the layout of glands, nuclei, etc, in the images would remain fixed while the classification of the images changed.” A few brief explanations here would help: What are the different components of the model, and what kind of information do (we think) they capture?

    • p.7 “Second, the GAN provides a quantitative approach to comparing Gleason grades. Gleason patterns are often arranged from least-cancerous to most-cancerous tissues. The categories confused in Figure 2 (b) follow this same scheme.” This is a very interesting finding. Have you tried to visualize the geometry of the latent space (e.g. via dimensionality reduction) to see if it fits your interpretation?
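To make the PCA and truncation questions above concrete, here is a minimal sketch of one way such a pipeline could look. It is not the authors' implementation: it assumes each category's annotated latent points are summarized by a mean plus principal axes, that new points are drawn as Gaussian coefficients along those axes, and that truncation is a linear pull toward a global center with factor psi. All function names and the toy 8-dimensional data are hypothetical.

```python
import numpy as np

def fit_cluster(points):
    """Summarize one category's latent points by mean, principal axes, and per-axis std."""
    points = np.asarray(points, dtype=float)
    mean = points.mean(axis=0)
    centered = points - mean
    # Rows of vt are the principal directions of the centered points.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    std = s / np.sqrt(max(len(points) - 1, 1))
    return mean, vt, std

def sample_cluster(mean, axes, std, n, rng):
    """Draw n new latent points from a Gaussian aligned with the principal axes (assumed model)."""
    coeffs = rng.standard_normal((n, len(std))) * std
    return mean + coeffs @ axes

def truncate(points, center, psi=0.6):
    """Pull points toward a center; psi=1 leaves them unchanged, psi=0 collapses them."""
    return center + psi * (points - center)

# Toy data standing in for annotated latent points of one category.
rng = np.random.default_rng(0)
annotated = rng.standard_normal((50, 8)) + 3.0
mean, axes, std = fit_cluster(annotated)
new_points = sample_cluster(mean, axes, std, n=5, rng=rng)
truncated = truncate(new_points, center=np.zeros(8), psi=0.6)
```

Geometrically, the mean locates the cluster and the principal axes describe its elongation, so a single Gaussian is only a reasonable summary if the cluster is unimodal, as the quoted passage assumes.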

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I like the idea in principle. I think the paper as it is needs to be improved in terms of clarity.

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Multiple concerns were raised with regard to the clinical utility of the proposed work, as well as the lack of details in the experimental design. Please clarify these aspects in the rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    12




Author Feedback

One reviewer raised concerns about the clinical applicability of GANs. We address this first before addressing the concerns about reproducibility expressed by all three reviewers.

Quoting the concern: “Clinical application of this approach is not clear. I would suggest the author to provide some evidence to show that this approach can be beneficial to some clinical tasks.”

This view is understandable. As the reviewer noted, GANs may not add significant value as data augmentation techniques. Yet evolving evidence shows GANs will contribute to clinical AI in other ways. We elaborate here on those mentioned in our submission.

The clinical applications stem from our core contribution: pathologist validation of the latent space clusters. Knowing these latent space representations of different cancer grades will allow us to model the transition between them, potentially exposing mechanisms of cancer progression unknown to science.
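As an illustration of what modeling a transition between two grade clusters could mean in practice, the sketch below linearly interpolates between two latent points, for example two cluster means. This is an assumption made for illustration: the rebuttal does not specify an interpolation scheme, and the function name and toy vectors are hypothetical. Each row of the result would then be passed to the generator to synthesize an image.

```python
import numpy as np

def interpolate_latents(start, end, steps):
    """Return `steps` latent points evenly spaced on the line from start to end."""
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    # Column vector of interpolation weights from 0 to 1 inclusive.
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - t) * start + t * end

# Toy 4-dimensional cluster means, made up for illustration.
mean_low_grade = np.zeros(4)
mean_high_grade = np.ones(4)
path = interpolate_latents(mean_low_grade, mean_high_grade, steps=5)
```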

GANs learn the structure of the image space without expensive expert annotations. The GAN was trained on tens of thousands of images, yet only a few hundred pathologist annotations were needed to cluster the latent space. The unsupervised nature of training implies the GAN must engineer features automatically. We confirmed that the learned correlations agree with standard prostatic pathological practice (the Gleason system), and we recommend future investigation of additional learned correlations to boost clinical methods, as discussed in our conclusion. Also, the ability to interpolate may enable new 3D histology interpolation schemes, which will need accurate GAN interpolation.

Critical to interpretable AI, GANs can visually demonstrate human-interpretable variations to an image that change classification decisions. Unlike a CNN producing the same latent code for many images, a GAN produces a unique image for every point in its latent space. When making a medical decision based on a latent space, pathologists have more than a set of arbitrary numbers. Instead, they can explore feature correlations in both latent space (via tuning across subspaces defined by covariance) and histology space (by producing a full image) to draw conclusions with real patient data. The strength of this process is boosted with larger cohorts and pathologist latent annotations, as mentioned in our limitations. As separate studies improve GAN inversion, real histology can be directly inverted into this latent space and analyzed.

Capturing abstract underlying structures, GANs show promise as systems capable of image domain translation. For example in stained histology, discrete diagnostic latent spaces between stain modalities may exist. In digital pathology, AI systems could translate MRI images into histology images through the GAN latent space.

Each of the reviewers expressed concerns about different aspects of reproducibility in the paper. The largest concerns were that the code and data used in the paper be publicly available.

As stated in the paper, the primary code-base for the GAN and its training is publicly available: “The StyleGAN2 Configuration F network architecture and training loop were selected from the author’s [15] Tensorflow source code without any adjustments to the parameters”. The network topology, optimizer, learning rate, loss functions, validation metrics, multi-GPU training loop, and data preparation scripts were cloned from this code-base. We made slight changes to the data scripts for our Python dictionary image set definitions. The training required exclusive use of an NVIDIA DGX-1 compute station in the Milwaukee School of Engineering supercomputer for 2 weeks.

Our addition to this code-base is our technique for representing clusters in latent space. The pseudo-code for doing this is provided by eq. 1 in the paper.

The whole slide image histology training dataset comes from a Medical College of Wisconsin lab group and is to remain private at this time. The trained model cannot be shared at this time, due to its ability to reproduce the training dataset.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    While some clinical applications were provided in the rebuttal, it is still not clear how the proposed approach will impact patient care. The applications listed in the rebuttal seem speculative, partly because GANs are known to “create” new data which may not be validated against “true” pathology. Also, as pointed out by the other ACs, the concerns surrounding reproducibility are not sufficiently addressed. My final rating is based on these grounds.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    10



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors’ rebuttal mitigates the reviewers’ concerns about the clinical relevance of the work. However, another major concern, the reproducibility of the work, remains. The authors provided a few more details on model training, but did not clarify why the public PANDA dataset was not used for validation. There is no promise to share the source code for the work.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    10



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The novelty is considered limited to empirically exploring GAN latent spaces to reveal pathologically semantic information. The rebuttal addressed the clinical relevance of the work; however, it failed to address the reproducibility concerns raised by the reviewers. The work is not solid enough for MICCAI.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR



Meta-review #4

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    There was a divergence of recommendations between the reviewers and ACs. While the AC recommendations were unanimous in rejection, the reviewers expressed consensus in supporting acceptance, including one reviewer who raised their score to support acceptance after the rebuttal. The PCs thus assessed the paper reviews, meta-reviews, the rebuttal, and the submission. One primary concern raised before the rebuttal was the clinical application of the presented work, which the reviewer felt was adequately addressed and raised their score to reflect that. The remaining main concern was reproducibility of the method, which constituted the main reason for the rejection recommendations by the ACs. The PCs felt that the authors gave an understandable response to the reproducibility consideration, including the availability of the base code and the rationale for why the private dataset and the trained models cannot be released at the moment. While this is less than ideal, the PCs agreed that the issue of reproducibility is outweighed by the merits of the paper, such as the importance and challenging nature of the problem, the novelty of the method, and the inclusion of pathologists in the evaluation, as appreciated by the reviewers. The final decision for the paper is thus accept.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR


