Authors
Matteo Pennisi, Federica Proietto Salanitri, Giovanni Bellitto, Simone Palazzo, Ulas Bagci, Concetto Spampinato
Abstract
Generative Adversarial Networks (GANs) have demonstrated their ability to generate synthetic samples that match a target distribution. However, from a privacy perspective, using GANs as a proxy for data sharing is not a safe solution, as they tend to embed near-duplicates of real samples in the latent space. Recent works, inspired by k-anonymity principles, address this issue through sample aggregation in the latent space, with the drawback of reducing the dataset by a factor of k. Our work aims to mitigate this problem by proposing a latent space navigation strategy able to generate diverse synthetic samples that may support effective training of deep models, while addressing privacy concerns in a principled way. Our approach leverages an auxiliary identity classifier as a guide to non-linearly walk between points in the latent space, minimizing the risk of collision with near-duplicates of real samples. We empirically demonstrate that, given any random pair of points in the latent space, our walking strategy is safer than linear interpolation. We then test our path-finding strategy combined with k-same methods and demonstrate, on two benchmarks for tuberculosis and diabetic retinopathy classification, that training a model using samples generated by our approach mitigates drops in performance, while keeping privacy preservation.
Link to paper
DOI: https://doi.org/10.1007/978-3-031-43898-1_41
SharedIt: https://rdcu.be/dnwBB
Link to the code repository
https://github.com/perceivelab/PLAN
Link to the dataset(s)
N/A
Reviews
Review #3
- Please describe the contribution of the paper
The authors suggest PLAN, a strategy for privacy preserving synthesis of data using generative adversarial networks based on latent walks. They propose a training strategy enforcing privacy, class consistency and diversity of generated data. They illustrate the effectiveness of their approach on two datasets.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is very well written with detailed description of the method, evaluation and procedures.
- Privacy is an important topic especially in combination with generative models that are known to embed real samples
- The proposed formulation to synthesize privacy preserving samples is novel and outperforms current methods
- They address the shortcomings of the current approach to privacy in generative models and show superior performance of the proposed method on two datasets
- The performance of the model is close to real training data while being privacy preserving
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- No major weaknesses
- Please rate the clarity and organization of this paper
Excellent
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
Reproducibility reported by the authors is accurate
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
- How are initial points sampled? E.g. how is it ensured that they are in the same class so that interpolating between them while ensuring class consistency makes sense?
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
7
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Well written paper with no to minor weaknesses concerning a highly relevant topic to the community. They perform a well devised evaluation of the proposed method showing performance loss mitigations of previous methods while preserving privacy.
- Reviewer confidence
Confident but not absolutely certain
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Review #4
- Please describe the contribution of the paper
The paper presents a strategy for GANs that aims to generate diverse synthetic samples addressing privacy concerns. Their method uses an auxiliary identity classifier that guides a non-linear walk between points in the latent space, aiming to reduce the risk of revealing near-duplicates of real samples.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper is well written with some minor errors. The method is interesting and could potentially impact privacy preservation in a fundamental manner. They use two medical image datasets: tuberculosis classification and diabetic retinopathy classification. I think Fig 2 is an interesting diagram that shows the method is following a trajectory in the latent space that could potentially have more quality and be better for the downstream tasks.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
The literature review is very limited, and it focuses only on the manipulation of image generation in the latent space; however, the authors do not show how their method compares with methods like differential privacy. The authors claimed that the generated images are of high quality, but while the results in Table 1 support this claim for the classifier, FID does not fully confirm their hypothesis. The idea is interesting, but the selling point of the paper is the production of high quality images while preserving privacy for a downstream task; however, it is compared with neither image generator models nor privacy preserving techniques. Therefore the empirical support is limited. The authors reported LPIPS in Fig 2; why not report LPIPS and other quality metrics in the tables? It appears that there is enough space for that.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The information that the authors provide is satisfactory; however, the code is not provided.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
More image generation methods and privacy preserving methods need to be added to the results section. Further metrics and, if possible, expert reports could be beneficial. Fig 2 could be part of the supplementary material.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
3
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
It is not compared with the relevant works that pursue the same objectives.
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Review #5
- Please describe the contribution of the paper
The paper proposes a novel GAN-based method to generate synthetic datasets that preserve privacy. They propose a non-linear sampling in the latent space to avoid near-duplicates of real samples. The authors evaluate their method on classification tasks for tuberculosis and diabetic retinopathy.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is very well motivated, well written and easy to follow.
- The paper addresses an issue (data privacy) that is of importance in medical applications.
- The proposed method is easy to implement into an existing GAN framework.
- The method is well described and evaluated. A comparison to other methods is performed.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The authors state that “To the best of our knowledge, no method has tackled the problem from a privacy-preservation standpoint”. However, in my understanding, [15] tackles a similar problem. Where is the difference?
- It is not clear why the “Equidistance” objective (Equation 1) leads to the desired output. Can we be sure that equidistant points in the latent space produce synthetic images that differ enough? If so, why?
- We train a network for identity classification. How many images do we have that share the same identity? What happens if we only have one image per identity?
- In Table 1, what does the “accuracy” score refer to? Is this the accuracy of phi_down? Please indicate this.
- What scores are reported in Table 2? Please indicate this in the caption.
- Please rate the clarity and organization of this paper
Excellent
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The datasets used are publicly available. The hyperparameters and training details are described in the paper. They promise to release the code.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
- Please refer to the points listed under “weaknesses”.
- The proposed method was tested where the generative model is a GAN. Could it also be applied if the generative model is a denoising diffusion model? Please comment on this.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
6
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper tackles a problem that is of importance in medical applications. It is well presented and well evaluated, only with minor flaws that are addressed under “weaknesses”. The results seem to be convincing.
- Reviewer confidence
Confident but not absolutely certain
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Primary Meta-Review
- Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
The work is tackling an important challenge of data privacy in medical applications. The proposed idea is interesting and novel and has been well described and evaluated. The authors are suggested to clarify on its comparison against [15] and other methods in differential privacy, and also to clarify the reason for not comparing with image generator models and privacy preserving techniques. The authors would also need to provide more details on Eq.1 and provide some further clarifications on the results in Table 1.
Author Feedback
We would like to thank the reviewers (R3, R4, R5) and meta-reviewer for their valuable feedback. We are pleased to acknowledge their satisfaction with the overall quality of the paper. However, reviewers raised a few concerns that the meta-reviewer asked us to address in the rebuttal:
R3 (Strong accept)
- Initial points sampling. Since the GAN is conditioned, we know beforehand that the initial and final points share the same class. The navigation strategy then guarantees that, along the trajectory, the class remains the same through the class-consistency loss.
R4 (Reject)
- Comparison to differential privacy methods. Our approach stands out by offering an alternative methodology to traditional privacy preservation techniques like differential privacy. While differential privacy focuses on incorporating privacy guarantees during the training process, our approach takes a post-hoc perspective as it aims to preserve privacy by manipulating the latent space of a pre-trained GAN. Furthermore, methods based on differential privacy cannot be applied to medical images as they disentangle identity and non-identity attributes from the learned latent spaces and only the non-identity attributes are preserved. While this may make sense on face images, in medical images there is not a clear distinction between the two sets of attributes.
- High quality image generation. Similarly, our approach is not meant to be a new generative model. The idea revolves around navigating the latent space of any pre-trained GAN, by manipulating the latent vectors in order to control the generated images’ characteristics while ensuring privacy preservation. This allows us to retain the high visual quality as demonstrated in Tab. 2, where the FID score after applying our approach closely matches the FID score of the original GAN. In summary, the main advantages of our approach w.r.t. existing privacy preserving and generative methods lie in its post-hoc nature and its suitability for being applied to medical data. It offers a flexible and adaptable approach to privacy preservation that can be applied to all generative methods that learn a latent space, such as diffusion models.
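To make the idea of navigating a latent space concrete, the following is a minimal sketch of the linear-interpolation baseline that the abstract compares against. It is not the authors' implementation: the 2-D latent space, the `near_duplicate` point and the hand-picked `detour` path are all hypothetical, chosen only to show how a straight path can pass through a latent code that a non-linear path avoids.

```python
import math


def linear_path(z_start, z_end, num_points):
    """Return num_points latent vectors evenly spaced on the segment z_start -> z_end."""
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    path = []
    for i in range(num_points):
        t = i / (num_points - 1)  # interpolation weight in [0, 1]
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


def min_distance_to(path, point):
    """Smallest Euclidean distance between any path point and a given latent point."""
    return min(math.dist(z, point) for z in path)


# Toy 2-D latent space; every value below is illustrative only.
z_a, z_b = [0.0, 0.0], [4.0, 0.0]
near_duplicate = [2.0, 0.0]  # hypothetical latent code of a memorized real sample

straight = linear_path(z_a, z_b, 5)
# Hand-picked non-linear path (an assumption; a real method would optimize it).
detour = [[0.0, 0.0], [1.0, 1.0], [2.0, 1.5], [3.0, 1.0], [4.0, 0.0]]

straight_gap = min_distance_to(straight, near_duplicate)
detour_gap = min_distance_to(detour, near_duplicate)
print(straight_gap, detour_gap)
```

In this toy setup the straight path lands exactly on the hypothetical near-duplicate, while the detour keeps a positive margin. Euclidean distance in latent space is used here purely for illustration; the paper relies on an identity classifier to guide the walk.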
R5 (Accept)
- The paper in [15] (published in a minor venue) proposes a face de-identification approach using generative deep neural networks. Although our work acknowledges its relevance in the related works section, there are notable distinctions between our method and it. The goal of [15] is to achieve k-anonymity for face de-identification, where facial features are obscured to protect individual identity and prevent face recognition. This is accomplished through k-anonymity applied to a combined dataset of original images and a proxy dataset, generating “unrecognizable” versions of faces. In contrast, our method focuses on privacy preservation in image generation by enforcing specific properties on samples from a latent space to retain class information for downstream classification tasks. Furthermore, the scope of our method extends beyond face de-identification, as it is applicable to a broader range of image generation tasks in the medical domain.
- Clarification on Tab. 1. Yes, the table reports the results of phi_down. We will clarify it in the paper.
- Equidistance on Eq. 1. The equidistance property, as described in Eq. 1, aims to prevent the collapse of all trajectory points into a single trivial point, ensuring essential generation variability for the downstream task. While it is not necessary for images along the trajectory to differ significantly, it is crucial that they do not appear identical. Additionally, transitioning between points in the latent space inherently leads to a mixing of visual features from the two points.
- Identity classification with one single image per identity. We apologize for not being clear enough. Our phi_id is already trained in the worst case using only one image per identity.
- Scores in Tab. 2 represent classification accuracies.
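The equidistance argument in the rebuttal above can be illustrated with a small sketch. Eq. 1 is not reproduced on this page, so the penalty below (variance of consecutive step lengths) is an assumed stand-in rather than the paper's formulation; the function names and the toy trajectories are hypothetical.

```python
import math


def step_lengths(path):
    """Euclidean distances between consecutive points of a trajectory."""
    return [math.dist(p, q) for p, q in zip(path, path[1:])]


def equidistance_penalty(path):
    """Variance of consecutive step lengths; zero when all steps are equal.

    Assumed stand-in for an equidistance objective, not the paper's Eq. 1.
    """
    steps = step_lengths(path)
    mean = sum(steps) / len(steps)
    return sum((s - mean) ** 2 for s in steps) / len(steps)


# Fixed endpoints; only the three intermediate points differ between paths.
z_a, z_b = [0.0, 0.0], [4.0, 0.0]
even = [z_a, [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], z_b]
collapsed = [z_a, [2.0, 0.0], [2.0, 0.0], [2.0, 0.0], z_b]  # intermediates collapse to one point

even_penalty = equidistance_penalty(even)
collapsed_penalty = equidistance_penalty(collapsed)
print(even_penalty, collapsed_penalty)
```

Because the endpoints are fixed and distinct, a trajectory whose intermediate points collapse onto a single location necessarily has unequal steps, so a penalty of this kind is positive for it and zero for the evenly spaced one. This matches the stated purpose of preventing collapse; it says nothing about how different the decoded images are.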
Post-rebuttal Meta-Reviews
Meta-review # 1 (Primary)
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
The rebuttal has well addressed the concerns raised by reviewers and meta-reviewer, and the clarifications on the comparison against other approaches are clear. The method itself is interesting and novel and the rebuttal resolves the concerns that have been raised. Acceptance is therefore recommended.
Meta-review #2
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
The paper proposes a method for generation of diverse synthetic images from the latent space of a GAN in a privacy-preserving manner. I believe that the method is novel and I enjoyed reading the paper. The rebuttal does a good job of highlighting major differences to [15] and also clarifies the major concerns raised by R4 regarding the lack of comparison to other differential privacy methods and the image quality aspects.
Meta-review #3
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
Pros:
- Topic: Privacy is an important topic especially in combination with generative models that are known to embed real samples.
- Novelty: The proposed formulation to synthesize privacy preserving samples is novel
- Style: The paper is very well written with detailed description of the method, evaluation and procedures.
Cons:
- Reference: The literature review is very limited, missing some important methods.
- Results: The authors claimed that the generated images are of high quality, but FID does not confirm their hypothesis fully.
After Rebuttal:
- The authors failed to convince the reviewer who gave the low score, but to me, the clinical need and novelty are sufficient for a conference paper.
- The two positive reviews are generally consistent in acknowledging the contribution of this work.