
Authors

He Li, Yutaro Iwamoto, Xianhua Han, Lanfen Lin, Hongjie Hu, Yen-Wei Chen

Abstract

Anomaly detection using an unsupervised learning scheme has become a challenging research topic. Unsupervised learning requires only unlabeled normal data for training and can detect anomalies in unseen testing data. In this paper, we propose an unsupervised liver lesion detection framework based on generative adversarial networks. We present a new perspective that learning anomalies positively affects learning normal objects (e.g., liver), even if the anomalies are fake. Our framework is trained on normal and pseudo-lesion data, where the pseudo-lesion data are generated by augmenting normal data. We train our framework to learn to predict normal features by transferring normal and augmented data into each other. In addition, we introduce a discriminator network containing a U-Net-like architecture that extracts local and global features effectively for providing more informative feedback to the generator. Further, we also propose a novel reconstruction-error score index based on the image gradient perception pyramid. A higher error-index score indicates a lower similarity between input and output images, which indicates that lesions are detected. We conduct extensive experiments on different datasets for liver lesion detection. Our proposed method outperforms other state-of-the-art unsupervised anomaly detection methods.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_21

SharedIt: https://rdcu.be/cVRY3

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The main contribution of this work, applied to anomaly (tumour) localization, is a new form of data augmentation where lesions are simulated on healthy liver slices in CT. Each simulated lesion is a downscaled, randomly oriented, and jittered liver from some other slice/volume. An ablation study shows that this greatly improves lesion localization. By applying a CycleGAN for image-to-image translation between healthy and diseased images, applying a multi-scale gradient magnitude similarity deviation loss, and applying a UNet discriminator, performance is further improved.
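    The lesion simulation summarized above can be sketched roughly as follows. This is only an illustration of the idea as described in this review (a downscaled, reoriented, jittered copy of a liver from another slice pasted onto a healthy slice); the function names, the integer `factor`, the restriction to quarter-turn rotations, the additive intensity `jitter`, and the unconstrained paste position are all assumptions, since the exact parameters are not given in the paper.

```python
import random

def downscale(img, factor):
    """Nearest-neighbour downscale of a 2D list by an integer factor."""
    return [row[::factor] for row in img[::factor]]

def rotate90(img, k):
    """Rotate a 2D list clockwise by k quarter turns."""
    for _ in range(k % 4):
        img = [list(col) for col in zip(*img[::-1])]
    return img

def paste_pseudo_lesion(target, donor, donor_mask, rng, factor=4, jitter=0.1):
    """Paste a shrunken, rotated, intensity-shifted donor liver onto `target`.

    Returns the augmented slice and a binary mask of the pasted pixels.
    All parameter choices here are hypothetical, not the authors' settings.
    """
    k = rng.randrange(4)                      # random orientation (assumed: 90-degree steps)
    patch = rotate90(downscale(donor, factor), k)
    patch_mask = rotate90(downscale(donor_mask, factor), k)
    shift = rng.uniform(-jitter, jitter)      # intensity jitter (assumed additive)
    h, w = len(target), len(target[0])
    ph, pw = len(patch), len(patch[0])
    top = rng.randrange(h - ph + 1)           # random position (assumed unconstrained)
    left = rng.randrange(w - pw + 1)
    out = [row[:] for row in target]          # leave the input slice untouched
    lesion = [[0] * w for _ in range(h)]
    for i in range(ph):
        for j in range(pw):
            if patch_mask[i][j]:
                out[top + i][left + j] = patch[i][j] + shift
                lesion[top + i][left + j] = 1
    return out, lesion
```

    A real implementation would presumably restrict the paste position to the liver region of the target slice and use arbitrary rotation angles; both are left out here to keep the sketch short.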

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The proposed lesion simulation strategy is novel, simple, and apparently very useful. It should not be difficult to adopt for other tasks.
    • The paper is clear and the ablation study is fairly convincing.
    • The method is compared against multiple anomaly localization methods on two similar datasets.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The training strategy and the exact jittering and other parameters used to simulate lesions are not detailed, which impedes reproducibility.
    • While the data augmentation is novel, the UNet-based discriminator and image-to-image translation for anomaly localization have uncited prior work. The discriminator has been proposed in multiple works, including “A U-Net based discriminator for generative adversarial networks” by Schonfeld et al. Anomaly localization works that rely on image-to-image translation between healthy and diseased data include: (1) “Towards annotation-efficient segmentation via image-to-image translation” by Vorontsov et al.; (2) “Visual feature attribution using Wasserstein GANs” by Baumgartner et al.; (3) “Pathology segmentation using distributional differences to images of healthy origin” by Andermatt et al.
    • While the proposed method shows great performance, the metric is not defined. It is the AUC of a ROC curve but what is the measure over which you vary the operating point?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Please describe the training strategy, including the optimizer, the optimizer’s hyperparameters, the number of epochs used, the learning rate schedule (if any), and any other details required to reproduce the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • How is the AUC computed? What is the measure over which you vary the operating point?
    • Is AUC computed considering per-pixel detection?
    • The anomaly map looks like a pretty good segmentation! Please evaluate the Dice score with respect to the reference segmentation masks so that this fully unsupervised method could be compared to supervised and semi-supervised methods.
    • Please discuss other (tumor) anomaly localization works that rely on image-to-image translation between healthy and diseased data, such as: (1) “Towards annotation-efficient segmentation via image-to-image translation” by Vorontsov et al.; (2) “Visual feature attribution using Wasserstein GANs” by Baumgartner et al.; (3) “Pathology segmentation using distributional differences to images of healthy origin” by Andermatt et al.
    • Is the unaltered CycleGAN architecture used in this work?
    • Thanks for the ablation study; please add a test without SSIM and possibly a test with non-multi-scale GMSD.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A good, simple data augmentation trick that makes a big difference in at least one task (anomaly localization - so possibly useful for segmentation, too) is a valuable contribution to the community because it is easy to adopt.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    “The existing works mentioned by reviewers use healthy and real diseased data (semi-supervised, weakly supervised), whereas we proposed an unsupervised method and use healthy and pseudo-diseased data, without providing the ground truth images while training.”

    • The proposed method still relies on image-to-image translation between two domains, normal (healthy) and pseudo-diseased (diseased), to help with anomaly localization, similar to the uncited works I mentioned before. The proposed method is similar to this prior work but differentiates itself by generating pseudo-diseased images instead of using real diseased images. This branch of image-to-image GAN based anomaly localization is ignored. The prior work is similar enough that it should be referenced, with the key differences highlighted.

    • It would be interesting to compare (a) training with pseudo-diseased images to (b) training with real diseased images or (c) training with a combination of both. Although I expect that pseudo-diseased images are not as realistic as real diseased images, I suspect that they are realistic enough that performance would be c > a > b, since many more pseudo-diseased images can be generated than diseased slices could be extracted.



Review #2

  • Please describe the contribution of the paper

    This paper proposes an unsupervised liver lesion detection method. Unlike standard unsupervised anomaly detection models, which are based on autoencoders trained on normal images only, the authors propose an autoencoder-like architecture that reconstructs a lesion-free image from an input image containing lesions. As in standard anomaly detection models, the reconstruction error between the input and the output outlines the lesion locations. The first step is to train a CycleGAN-like architecture on pairs of normal and pseudo-lesion images. Pseudo-lesion images are derived by adding “lesions” to the corresponding normal slice. Once the model is trained, the generator that produces lesion-free images serves as the anomaly segmentation model. The other generator of the CycleGAN model serves for data augmentation. The proposed architecture is evaluated on different datasets: the public LiTS dataset containing 131 CT scans and a private dataset of 90 CT scans.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • An ablation study is performed to evaluate the contribution of the different loss terms.
    • A comparison with state-of-the-art methods is performed.

    • Use of the gradient magnitude similarity deviation (GMSD) in both the consistency loss term and for the computation of the reconstruction error index.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • The idea of using a CycleGAN model to translate normal data to pathological data and vice versa was proposed in Sun et al. (JBHI 2020) for a neuroimaging application, the segmentation of glioma on the BraTS dataset. This reference should be added and discussed.
    • Some important methodological details are lacking:
      1) The use of a U-Net-based discriminator should be clarified. It is not clear from Fig. 1 what the output of this discriminator is. U-Net-like architectures usually output images with dimensions similar to those of the input image, which is different from standard GAN discriminators that output a label (real or fake). The authors mention “This enables the discriminator to learn both global and local differences between real and fake images” but it is not clear how. The whole of Section 2.2 should be reworded, as it contains typos that hinder understanding.
      2) The authors should provide the backbone architectures of the different networks (generator, discriminator).
      3) A clear definition of the loss terms, including the general GAN loss (different variants exist) and the discriminator loss, should be provided. As stated above, the introduction of the GMSD consistency loss term is interesting but requires clarification, for instance regarding its differentiability, which is not obvious.
      4) GAN training is likely to be challenging; the authors should clarify the stopping criterion and describe the validation dataset, if any.
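    On the GMSD point raised above, a single-scale sketch may help show what is being computed. It follows the commonly used gradient magnitude similarity deviation definition (Prewitt gradients, a per-pixel similarity map, then its standard deviation); the constant `c`, the `eps` inside the square root, and the omission of any pyramid or downsampling are assumptions and are not taken from the paper. Every step is a composition of linear filters, products, divisions, and square roots, so it should be usable with automatic differentiation as long as the square roots are kept away from zero, which is what `eps` is for.

```python
import math

# Prewitt kernels with the 1/3 scaling used in the usual GMSD formulation.
PREWITT_X = [[1 / 3, 0.0, -1 / 3] for _ in range(3)]
PREWITT_Y = [[1 / 3] * 3, [0.0] * 3, [-1 / 3] * 3]

def gradient_magnitude(img, eps=1e-12):
    """Gradient magnitude on the interior pixels of a 2D list of floats."""
    h, w = len(img), len(img[0])
    mag = []
    for i in range(1, h - 1):
        row = []
        for j in range(1, w - 1):
            gx = gy = 0.0
            for a in range(3):
                for b in range(3):
                    v = img[i - 1 + a][j - 1 + b]
                    gx += PREWITT_X[a][b] * v
                    gy += PREWITT_Y[a][b] * v
            # eps keeps the square root away from zero (assumed safeguard).
            row.append(math.sqrt(gx * gx + gy * gy + eps))
        mag.append(row)
    return mag

def gmsd(ref, dist, c=0.0026):
    """Standard deviation of the gradient magnitude similarity map.

    `c` is a stabilising constant; the default is an assumed value for
    intensities scaled to [0, 1].
    """
    m_r, m_d = gradient_magnitude(ref), gradient_magnitude(dist)
    gms = [(2 * a * b + c) / (a * a + b * b + c)
           for row_r, row_d in zip(m_r, m_d)
           for a, b in zip(row_r, row_d)]
    mean = sum(gms) / len(gms)
    return math.sqrt(sum((g - mean) ** 2 for g in gms) / len(gms))
```

    A multi-scale variant, as the paper's pyramid presumably is, could average this quantity over progressively downsampled copies of both images.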

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors answered that they will make the code available which is not mentioned in the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    -Please see above

    • Some details regarding the implementation of the SOTA algorithms (e.g., f-AnoGAN) should be provided, at least in a supplementary section, to improve the soundness of the comparison.
    • The rationale of the method used to create pseudo-lesions should be motivated. Why did the authors not consider simpler shapes, e.g., spherical lesions? These pseudo-lesions have a normal liver pattern; how do these patterns compare to lesional patterns?
    • The authors should state whether the AUC is computed at the voxel level and whether any processing is performed on the reconstruction maps (e.g., clustering, removal of small clusters).
    • The paper should be proofread by a native English speaker.
    • Fig. 1 does not appear to be correct: should X and Y look similar, i.e., have the same background?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty is moderate, the paper lacks too many methodological details, which impairs the soundness of the study.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    5

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    The paper describes a data augmentation approach to improve lesion detection. By learning to generate liver CT images with lesions from non-lesion images (via CycleGAN), a UNet segmentation network is trained to perform anomaly detection, i.e., detect lesion regions from the input images. Evaluation is performed on public (LiTS) and proprietary datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The key strength of the paper is the clever application of unsupervised learning/unpaired image translation techniques to lesion identification. The approach can also be extended to other region identification tasks in medical imaging, beyond lesions. The method is compared to several popular anomaly detection frameworks, and a detailed ablation study of various components of the system is presented.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It seems like the main weakness of the approach is the lack of detailed insight into when the method will succeed and will fail. It would be important to report statistics and examples of failure and success cases of lesions the method can handle. Please see detailed feedback section.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Most details of the networks and datasets are available. Please also include details about hyperparameters and image preprocessing steps for both CycleGAN and the main anomaly detection network.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    An inherent limitation of the CycleGAN method is that it may not learn realistic lesions, or may learn a very polarized distribution of outputs. It would be important to analyze and discuss the types, sizes, and characteristics of lesions that can and cannot be detected with the proposed approach. It also seems like the liver regions that are input to the algorithm are segmented from the CT. It would again be important to explain that the method is a proof-of-concept, and addresses a specific problem in the more general pipeline for lesion detection.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper shows how to use an unsupervised image translation technique (CycleGAN) to address the lack of paired data for accurate lesion detection. The AUC and other results show a demonstrated improvement of the proposed technique over other anomaly estimation baselines.

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    5

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The reviews pointed out a number of missing references and comparisons that may be helpful to include. While there exist other works that apply unpaired image translation in the medical image domain, the proposed method seems to be the only one that uses healthy->pathology translation as a form of data augmentation. In addition, the use of a local, pixel-wise discriminator is interesting and novel.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper introduces CycleGAN-based image translation as a data augmentation method. This is applied to adding lesions to healthy images. All reviewers commented positively on the technical contributions, novelty, ablation studies and comparisons to SOTA methods. Similarly, all reviewers were concerned about the lack of technical details in parts of the paper. A particular concern was the details about the U-Net discriminator and the lack of discussion of existing work in this direction. Another reviewer criticized the lack of detailed insights in the paper. While the reviewers identified similar strengths and weaknesses, they weighed them differently, with two reviewers leaning to accept and one to reject. The authors should address concerns about the U-Net discriminator in particular, the lack of details, insights, and links to existing works, and other concerns.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5




Author Feedback

Thank you for your valuable comments. Our itemized responses to the questions are as follows:

Comment 1: How is the AUC computed? (Reviewer 1, Reviewer 2) Response: The detection is based on the anomaly score (a measure to classify the image into normal or abnormal), which is defined in Eq. (7) of our paper. The AUC is computed by adjusting the threshold of the score.
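
The threshold sweep described in this response can be illustrated with a short sketch. It assumes one anomaly score and one binary label (1 = abnormal) per image; the function name and inputs are hypothetical, and the score itself (Eq. (7) of the paper) is not reproduced here.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve obtained by sweeping a threshold over scores.

    scores: anomaly scores, higher means more abnormal.
    labels: 1 for abnormal, 0 for normal.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one normal and one abnormal sample")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    auc = 0.0
    i = 0
    while i < len(pairs):
        thr = pairs[i][0]
        # Lower the threshold to `thr`: every sample scoring >= thr is flagged.
        while i < len(pairs) and pairs[i][0] == thr:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr, fpr = tp / pos, fp / neg
        # Trapezoidal rule between consecutive operating points.
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_tpr, prev_fpr = tpr, fpr
    return auc
```

For example, `roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])` gives 0.75, because three of the four abnormal/normal pairs are ranked correctly.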

Comment 2: Please evaluate the Dice score so that this fully unsupervised method could be compared to supervised methods. (Reviewer 1) Response: Thanks for your valuable comments. By binarizing the detection results, the proposed method can be considered a fully unsupervised segmentation method. The Dice score of our method is 0.435, which is comparable to the Dice score (0.478) of U-Net, a widely used supervised method for medical image segmentation.
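
As a rough illustration of the binarize-then-score step mentioned in this response, the sketch below thresholds an anomaly map and computes the Dice coefficient against a reference mask. The threshold value and the function names are assumptions; the response does not say how the detection results were binarized.

```python
def binarize(anomaly_map, threshold):
    """Turn a 2D anomaly map into a binary mask (1 where value > threshold)."""
    return [[1 if v > threshold else 0 for v in row] for row in anomaly_map]

def dice(pred, ref):
    """Dice coefficient 2|P ∩ R| / (|P| + |R|) for two binary 2D masks."""
    inter = sum(p & r for rp, rr in zip(pred, ref) for p, r in zip(rp, rr))
    total = sum(map(sum, pred)) + sum(map(sum, ref))
    # Convention (assumed): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2 * inter / total
```

For instance, thresholding `[[0.9, 0.2], [0.7, 0.1]]` at 0.5 yields `[[1, 0], [1, 0]]`, which has a Dice of 0.5 against the reference `[[1, 1], [0, 0]]`.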

Comment 3: Please discuss other (tumor) anomaly localization works that rely on image-to-image translation between healthy and diseased data. (Reviewer 1) Response: The existing works mentioned by reviewers use healthy and real diseased data (semi-supervised, weakly supervised), whereas we proposed an unsupervised method and use healthy and pseudo-diseased data, without providing the ground truth images while training.

Comment 4: The authors should provide the network backbone architectures and hyperparameters. (Reviewer 2, Reviewer 3) Response: We would like to apologize for not including the above-mentioned information in the paper because of the space constraints. If the paper is accepted, we will add these details in the supplementary material.

Comment 5: The use of a U-Net-based discriminator should be clarified. (Reviewer 2) Response: The conventional discriminator performs an image-based two-class classification. It extracts features from the whole image and classifies it as either fake or true. The conventional discriminator uses only global information, while our U-Net-based discriminator performs pixel-wise classification for local discrimination and uses both global and local information for discrimination.
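
One way to read this response is that the discriminator is trained with two terms, one on a whole-image decision and one on a per-pixel real/fake map. The sketch below shows such a combined loss under that reading. The two-output layout follows the U-Net discriminator design cited by Reviewer 1 and is an assumption here, as are the equal weighting of the two terms and the use of binary cross-entropy; the paper's actual loss is not stated in the rebuttal.

```python
import math

def bce_with_logit(logit, target):
    """Numerically stable binary cross-entropy for one logit and a 0/1 target."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def unet_discriminator_loss(global_logit, pixel_logits, is_real):
    """Global (image-level) term plus mean local (pixel-level) term.

    global_logit: single real/fake logit for the whole image (assumed encoder head).
    pixel_logits: 2D list of per-pixel real/fake logits (assumed decoder head).
    """
    target = 1.0 if is_real else 0.0
    global_term = bce_with_logit(global_logit, target)
    flat = [v for row in pixel_logits for v in row]
    local_term = sum(bce_with_logit(v, target) for v in flat) / len(flat)
    return global_term + local_term
```

The per-pixel term is what gives the generator spatially resolved feedback: a fake image that is globally plausible is still penalized wherever individual regions look unrealistic.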

Comment 6: The rationale of the method proposed to create pseudo-lesions should be motivated. Why did the authors not consider simpler shapes, e.g., spherical lesions? (Reviewer 2) Response: We have performed experiments using both a simple spherical pseudo-lesion and the proposed irregularly shaped pseudo-lesion. The detection accuracy with the proposed pseudo-lesions is much better than with the simple spherical shape because real lesions are not perfectly spherical, and our method is robust to the irregularity of lesions.

Comment 7: Did you perform any processing on the reconstruction maps (e.g., clustering, removal of small clusters)? (Reviewer 2) Response: We have not performed post-processing on the reconstruction maps.

Comment 8: Fig. 1 is not correct; should X and Y look similar, i.e., have the same background? (Reviewer 2) Response: The reviewer’s comment is correct. We wanted to showcase that our proposed method is an unpaired image-to-image translation method in training, where X represents normal datasets, and Y represents pseudo-lesion datasets. We will modify the figure as per the reviewer’s comment.

Comment 9: It would be important to analyze the different types of lesions that can and cannot be detected with the proposed approach. (Reviewer 3) Response: Thanks for the reviewer’s comment. The AUC results for different types of lesions are as follows: 0.904 for cysts, 0.578 for focal nodular hyperplasia (FNH), 0.817 for hepatocellular carcinoma, 0.790 for hemangioma, and 0.863 for metastases. The AUC results for different lesion sizes are as follows: 0.895 for lesions with a diameter equal to or greater than 5 cm, and 0.651 for lesions with a diameter less than 5 cm. Improving the detection accuracy for FNH and small lesions will be the focus of our future research work, and the results will be included in our final paper.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Two of the reviewers remain positive about the paper after the rebuttal. They point out that there is novelty to the work and they also mention the strength of the experiments section. The reviewer who voted to reject did not update their review after the rebuttal; however, their main criticisms were about lack of details and clarity which I think the rebuttal has mostly addressed.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I am happy that the authors answered the questions

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper proposes to use CycleGAN with unsupervised image translation for anomalous lesion detection. The experimental results on AUC and other metrics demonstrate significant improvements of the proposed method over other anomaly detection methods (although more anomaly detection methods could be compared). Given the technical contribution and effective performance, I vote for acceptance, although more details and rationale on pseudo-lesion generation should be added in the final version. In addition, the title could be revised to reflect the technical contribution.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8


