
Authors

Fangda Li, Zhiqiang Hu, Wen Chen, Avinash Kak

Abstract

Immunohistochemical (IHC) staining highlights the molecular information critical to diagnostics in tissue samples. However, compared to H&E staining, IHC staining can be much more expensive in terms of both labor and the laboratory equipment required. This motivates recent research demonstrating that the correlations between the morphological information present in the H&E-stained slides and the molecular information in the IHC-stained slides can be used for H&E-to-IHC stain translation. However, due to a lack of pixel-perfect H&E-IHC ground-truth pairs, most existing methods have resorted to relying on expert annotations. To remedy this situation, we present a new loss function, Adaptive Supervised PatchNCE (ASP), to directly deal with the input-to-target inconsistencies in a proposed H&E-to-IHC image-to-image translation framework. The ASP loss is built upon a patch-based contrastive learning criterion, named Supervised PatchNCE (SP), and augments it further with weight scheduling to mitigate the negative impact of noisy supervision. Lastly, we introduce the Multi-IHC Stain Translation (MIST) dataset, which contains aligned H&E-IHC patches for 4 different IHC stains critical to breast cancer diagnosis. In our experiments, we demonstrate that our proposed method outperforms existing image-to-image translation methods for stain translation to multiple IHC stains. All of our code and datasets are available at https://github.com/lifangda01/AdaptiveSupervisedPatchNCE.
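To make the abstract's description concrete, here is a minimal numpy sketch of a supervised PatchNCE-style contrastive loss with optional per-location weights. The function name, the temperature value, and the weighting interface are illustrative assumptions by the editor, not the authors' implementation (the official code is at the repository linked above).

```python
import numpy as np

def supervised_patchnce(f_gen, f_gt, weights=None, tau=0.07):
    """InfoNCE over patch embeddings: for each location i, the positive is
    the ground-truth embedding at i; all other locations act as negatives.
    `weights` optionally down-weights noisy (inconsistent) locations."""
    # L2-normalize so dot products become cosine similarities
    f_gen = f_gen / np.linalg.norm(f_gen, axis=1, keepdims=True)
    f_gt = f_gt / np.linalg.norm(f_gt, axis=1, keepdims=True)
    logits = f_gen @ f_gt.T / tau                    # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_patch = -np.diag(log_prob)                   # cross-entropy, positives on diagonal
    if weights is None:
        weights = np.ones(len(per_patch))
    return float((weights * per_patch).sum() / weights.sum())
```

The ASP idea is then to supply smaller `weights` for locations suspected to be inconsistent, so that severely mismatched ground-truth patches contribute less to training.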

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43987-2_61

SharedIt: https://rdcu.be/dnwKh

Link to the code repository

https://github.com/lifangda01/AdaptiveSupervisedPatchNCE

Link to the dataset(s)

https://github.com/lifangda01/AdaptiveSupervisedPatchNCE


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces Adaptive Supervised PatchNCE (ASP) to directly deal with the input-to-target inconsistencies in a proposed H&E-to-IHC image-to-image translation framework. Also, the Multi-IHC Stain Translation (MIST) dataset is introduced, which contains aligned H&E-IHC patches for 4 different IHC stains.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. A novel loss for image-to-image translation is introduced;
    2. The loss can automatically recognise patch locations that are inconsistent and adapt the SP loss so that severely inconsistent patch locations have less effect on training;
    3. Release of a dataset for the community;
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. It would be great to see some confidence intervals / variation in the evaluation metrics, if possible. Maybe some sort of cross-validation?
    2. Ideally, it would be great to have some pathologists’ assessment of the results.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Since all the code and datasets will be made public, there is a good chance the paper is reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. It would be great to see some confidence intervals / variation in the evaluation metrics, if possible. Maybe some sort of cross-validation?
    2. Ideally, it would be great to have some pathologists’ assessment of the results.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Interesting novel loss and release of a dataset for the community;

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes a novel loss function, Adaptive Supervised PatchNCE (ASP), to deal with the inconsistencies in imperfect H&E-IHC image pairs. It also applies weight scheduling to mitigate the negative impact of noisy supervision. Finally, the MIST dataset is introduced, which contains 4 groups of aligned H&E-IHC pairs.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Clear description of the technical challenges and their motivation. The proposed ASP loss is suitable and useful for such situations.
    2. A public dataset containing 4 different IHC stains, which would advance analysis in the related field.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Lack of visualizations from other baselines. Presenting metrics alone is not enough to show the advantage on an image-generation task.
    2. Limited clinical significance. The generated IHC images help pathologists little due to their low quality and interpretability.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It is reproducible, since the method is clearly described and the code will be public.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. Please give some example outputs from other methods. It is important to present your results alongside others visually for a straightforward comparison.
    2. I would like to know whether the proposed loss function yields consistent improvements on other networks; for example, apply the loss function to PyramidPix2Pix.
    3. In Table 1, CycleGAN performs well on the BCI (HER2) dataset but very poorly on the MIST datasets. What causes this large difference?
    4. Since the H&E-IHC pairs are not pixel-perfect, can existing pairwise metrics like SSIM really reflect the quality of the generated images?

    I am open to raising my score if my concerns are addressed clearly.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Clear description, but insufficient visualization results and experiments.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors show that the results can be improved consistently with different backbones. I think it is an effective method.



Review #4

  • Please describe the contribution of the paper

    The paper proposes a method for translation from H&E to IHC without simultaneous labels on the same slides. They use an image-to-image translation framework with a shared encoder. The method uses a contrastive loss to minimize the difference between encodings of corresponding H&E and IHC patches and to maximize the difference across other patches. A positive pair comes from the same location across sequential tissue slices. Since sequential slices are known to often suffer inconsistencies, the authors propose a scheduled weighting of patch pairs based on their embeddings’ cosine similarity. They evaluate the proposed method on several datasets using SSIM, Perceptual Hash Value (PHV), FID, and Kernel Inception Distance (KID) scores and show improved performance over other translation methods.
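The scheduled weighting the reviewer summarizes can be sketched as follows. This is an editor's illustration under stated assumptions: the mapping from cosine similarity to a weight and the linear warm-up schedule (the hypothetical `progress` and `gamma` parameters) are plausible choices, not the paper's exact rule.

```python
import numpy as np

def adaptive_weights(f_gen, f_gt, progress, gamma=4.0):
    """Illustrative weighting: pairs whose embeddings disagree (low cosine
    similarity) get smaller weights, and the down-weighting is phased in
    over training via `progress` in [0, 1] (0 = uniform weights)."""
    f_gen = f_gen / np.linalg.norm(f_gen, axis=1, keepdims=True)
    f_gt = f_gt / np.linalg.norm(f_gt, axis=1, keepdims=True)
    cos = (f_gen * f_gt).sum(axis=1)              # per-location similarity
    trust = ((cos + 1.0) / 2.0) ** gamma          # map [-1, 1] -> [0, 1], sharpen
    return (1.0 - progress) + progress * trust    # interpolate uniform -> trust
```

Early in training (`progress` near 0) all pairs are weighted uniformly; as training progresses, pairs whose embeddings disagree are increasingly down-weighted, matching the intuition that the network's own embeddings eventually become reliable enough to flag inconsistent ground-truth pairs.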

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper adopts the contrastive loss approach to learn to translate between different histopathology modalities when the modalities come from different sequential slices that may not be well aligned at all locations.
    • To address the issue of imperfectly aligned modalities, the authors propose a weighting scheme based on the learnt embeddings.
    • From the qualitative and quantitative results, the proposed approach seems effective.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The method seems to learn that darker tumor regions are all associated with the brown staining. One concern with such approaches is how biologically correct the results are, versus hallucinating qualitatively good images. In the medical domain, the accuracy of the generation is particularly important, since analyses, research, and diagnoses could depend on it. The presented evaluation metrics do not assess this, since they are not based on pixel-wise ground truth or any manual annotation.

    • One way to evaluate is a categorical evaluation: for example, divide the patches into none, medium, and high levels of staining, perform a stain deconvolution on the generated and ground-truth IHC, and compare the amount of staining in each category.

    • I’m not a pathology expert, but it is not clear how well this will generalize when immune biomarkers are used rather than epithelial ones, since epithelial cells are usually easier to differentiate in the H&E image, whereas different immune cells look the same in H&E and are differentiated based on the protein biomarkers they express.

    • Qualitative results are missing comparisons to other baselines.

    • A discussion of failure cases, reliability, and biological correctness would be valuable.

    • How much inconsistency is tolerated in the training set?
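The categorical evaluation suggested in the second bullet above could be prototyped as follows. This editor's sketch assumes ImageJ's standard H-DAB stain vectors for the color deconvolution; the optical-density threshold and the bin boundaries are arbitrary placeholders that would need calibration.

```python
import numpy as np

# Ruifrok-Johnston stain vectors for H-DAB, as in ImageJ's preset
# (rows: hematoxylin, DAB, computed residual channel).
STAINS = np.array([[0.650, 0.704, 0.286],
                   [0.268, 0.570, 0.776],
                   [0.711, 0.423, 0.562]])

def dab_fraction(rgb, dab_thresh=0.15):
    """Fraction of pixels whose DAB concentration exceeds `dab_thresh`
    (an arbitrary example threshold). rgb: (H, W, 3) uint8 image."""
    od = -np.log10(np.clip(rgb.astype(np.float64), 1, 255) / 255.0)
    conc = od.reshape(-1, 3) @ np.linalg.inv(STAINS)  # per-pixel stain concentrations
    return float((conc[:, 1] > dab_thresh).mean())    # column 1 = DAB channel

def staining_category(rgb, bins=(0.05, 0.30)):
    """Bin an IHC patch into 'none' / 'medium' / 'high' by DAB-positive area."""
    frac = dab_fraction(rgb)
    if frac < bins[0]:
        return "none"
    if frac < bins[1]:
        return "medium"
    return "high"
```

Comparing the category assigned to each generated patch against that of its ground-truth counterpart would yield a confusion matrix over staining levels; as the authors note in their feedback, such binning can be sensitive to the chosen thresholds.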

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method seems reproducible. The authors also promise to release the code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Please refer to weaknesses (Q6).

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please refer to strengths (Q5) and weaknesses (Q6).

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The rebuttal has addressed most of my concerns. However, I am still not convinced about the failure of categorical evaluation. Also, still I am not sure how well the proposed method will perform when the staining is for immune cells rather than tumor cells.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper introduces an adaptive supervised patchNCE approach to address inconsistencies in imperfect HE-IHC image pairs. The reviewers acknowledged the value of utilizing public datasets with four different IHC stains, appreciated the clear problem definition, and found the writing to be clear and concise. However, they expressed concerns regarding the applicability of the method in real clinical practice, the potential for image hallucination, and the comprehensiveness of the evaluation. For detailed feedback, please refer to the reviewer comments. It is crucial for the author to address these major concerns in order to provide further insights and ensure the method’s practicality and reliability.




Author Feedback

We thank the reviewers for their invaluable feedback. Here we provide our responses to the common concerns mentioned in the meta-review, as well as to the concerns of the individual reviewers.

Common Concerns

  1. Applicability. Regarding the applicability of the method in real clinical practice, note that a strong correlation has already been demonstrated between the H&E-stained tissue structures and their IHC +/- labels. For example, [1] showed that the molecular information important for lung cancer subtype classification can be predicted by a CNN directly from H&E-stained slides with pathologist-level accuracy.
  2. Lack of comparisons with other baselines. In the additional space allowed for the final draft, we will include visual comparisons with the images generated by the CycleGAN and Pix2Pix baselines. These comparisons will illustrate the extent of false morphological alterations produced by those baselines. We believe that these hallucinations are a significant reason for why our framework performs much better quantitatively.
  3. The metrics presented. We included SSIM for its popularity in previous works. However, on account of its shortcomings, and to take advantage of the fact that the corresponding regions in a GT pair are highly likely to share the same diagnostic label, our paper also included results based on the PHV (Perceptual Hash Value) metric [2], which relies on a pretrained feature extractor to evaluate the generated images at multiple levels of a semantic hierarchy. PHV complements SSIM, especially in the presence of inconsistent pairs, in two ways: (1) by calculating feature distances at deeper layers (e.g., layers 3 and 4), PHV focuses more on high-level semantics than on pixel-level differences; and (2) PHV uses hashing to lower its sensitivity to dissimilarities above a certain threshold, as would be the case for GT pair inconsistencies. Therefore, we believe that the metrics used in our paper are representative of the true correctness of the outputs despite GT inconsistencies. Additionally, as suggested by R4, we considered using categorical metrics, such as assigning images into bins based on their percentages of IHC-positive cells. However, the results turned out to be over-sensitive to the choice of thresholds used, both for determining the IHC-positiveness of a pixel and for the bin assignment of the whole image.

Individual Concerns

[R1] Cross-Validation: We acknowledge that CV would be a valuable additional test for our framework. Nonetheless, since the improvements made possible by our approach have been demonstrated on five different datasets with four different IHC stains, we hope that the reviewer will see these results as constituting sufficient experimental validation of our work.

[R3] Performance of CycleGAN: As stated in our response to Common Concern 2, CycleGAN is prone to hallucinations caused by (we believe) the absence of paired supervision. Similarly, the low tolerance of the paired supervision in the Pix2Pix baselines towards GT inconsistencies is the reason for their poor performance.

[R3] ASP on other networks: We plugged ASP into both Pix2Pix and PyramidPix2Pix and tested them on our MIST-HER2 dataset. For Pix2Pix, the new avg. PHV and FID values are 0.5371 and 124.5, respectively; for PyramidPix2Pix, the values are 0.5149 and 96.7. Using ASP improves the performance of both frameworks.

[R4] Tolerance of inconsistency: Unfortunately, there is no established criterion for quantifying the inconsistencies between the input H&E and the GT IHC images. Such tissue-level inconsistencies, as shown in Fig. 2(a)(c), are rather common in our data (in nearly every GT pair).

References
[1] Coudray, et al. “Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning.” Nature Medicine, 2018.
[2] Liu, et al. “Unpaired stain transfer using pathology-consistent constrained GANs.” IEEE TMI, 2021.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper received three positive reviews after the rebuttal. My previous concerns, as well as the reviewers’ concerns, have been largely addressed.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The concerns raised by the reviewers have been mostly addressed in the rebuttal, resulting in R3 revising their score positively. However, in my personal opinion, it seems necessary to compare with more recent image translation techniques, especially those that preserve shape while altering only the texture. Since all reviewers gave “Weak accept”, it would be appropriate to accept the paper if the points presented in the rebuttal are incorporated into the manuscript.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper presents a method to synthesize breast cancer IHC marker images (Ki67, ER, PR, HER2) from H&E input. Since the images are not restained/co-registered and are in fact from neighboring sections with no clear alignment, there is a lot of room for hallucination, which undermines the clinical utility of this work. The dataset, which is another contribution, could easily be extracted from other public datasets such as the Breast ACROBAT challenge dataset, since these are just roughly aligned crops, so the significance of this contribution is low. Moreover, rather than focusing on just one marker and clearly showing the impact of the method, similar to the BCI work, Ki67, ER, PR, and HER2 were all thrown into the mix, hiding any significant contribution.


