Authors
Shawn Mathew, Saad Nadeem, Arie Kaufman
Abstract
Automated analysis of optical colonoscopy (OC) video frames (to assist endoscopists during OC) is challenging due to variations in color, lighting, texture, and specular reflections. Previous methods either remove some of these variations via preprocessing (making pipelines cumbersome) or add diverse training data with annotations (which is expensive and time-consuming). We present CLTS-GAN, a new deep learning model that gives fine control over color, lighting, texture, and specular reflection synthesis for OC video frames. We show that adding these colonoscopy-specific augmentations to the training data can improve state-of-the-art polyp detection/segmentation methods as well as drive the next generation of OC simulators for training medical students. The code and pre-trained models for CLTS-GAN are available on the Computational Endoscopy Platform GitHub (\url{https://github.com/nadeemlab/CEP}).
Link to paper
DOI: https://link.springer.com/chapter/10.1007/978-3-031-16449-1_49
SharedIt: https://rdcu.be/cVRXo
Link to the code repository
https://github.com/nadeemlab/CEP
Link to the dataset(s)
N/A
Reviews
Review #1
- Please describe the contribution of the paper
The paper presents a method for creating synthetic colonoscopy images using CT imaging, which the authors term CLTS-GAN. CLTS-GAN is trained using an unsupervised approach and allows texture and lighting to be disentangled. The authors also demonstrate the utility of CLTS-GAN in improving segmentation in colonoscopy images.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- impressive augmentation results that create realistic-looking colonoscopy images, generated using an image-to-image translation framework that can transfer textures from real colonoscopy images to synthetic ones created by 3D modelling
- the authors present a clear clinical task and two routes for use: in training and in improving segmentation methods
- a well-thought-out unsupervised method making use of many useful concepts for applying textures and lighting to colonoscopy images:
- adversarial loss
- cyclic loss
- identity loss
- a novel architecture allowing lighting and texture disentanglement
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- training and implementation specifics are limited: what preconditioning was done? What frameworks were used to implement the methods?
- the algorithm training explanation could be clearer, particularly figure 1
- the main advance is that texture and lighting are effectively transferred onto virtual colonoscopy frames; this is a small advance over StyleGAN, and the novelty beyond that could be better explained
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The dataset and the code will be released, and the methods have been well described, so the work should be reproducible. The dataset is small, so the performance may differ with different imaging.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
- figure 1: F and G should have arrows in and out; without them, the flow is confusing to follow
- for simulation of colonoscopy, do the augmentations need to conform to physical constraints? If a video fly-through of a virtual colon is made, do the video and the individual frames look like a coherent set?
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
6
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
- a clear clinical need is identified
- an interesting method for transferring lighting and texture style separately from real to synthetic images
- good results
- the manuscript is mostly well written, but some parts could be clearer.
- Number of papers in your stack
5
- What is the ranking of this paper in your review stack?
3
- Reviewer confidence
Somewhat Confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
7
- [Post rebuttal] Please justify your decision
The authors addressed my concerns
Review #2
- Please describe the contribution of the paper
This paper proposes a one-to-many image-to-image data augmentation model for colonoscopy whereby image attributes such as colour, lighting, texture, and specular reflection are controlled by means of 1D vectors and 2D matrices.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Control of features for data augmentation using the proposed approach generating good results
- Evaluation of performance on other models by introducing augmentations resulting from this work showing improved results
- Code will be available upon publication
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Overall, the mathematical description supported by Fig. 1 is intuitively correct, but I think there are a few mistakes that make the reading very confusing. I will list some of these in the detailed comments below.
- It is unclear from the paper how the user will get satisfactory results when sampling randomly from a uniform distribution
- While differences can clearly be appreciated for colour and lighting, attributes such as texture and specular reflection are not noticeable in the examples shown in the paper. I think the paper lacks a quantitative approach to measuring this.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
Source code will be available upon publication
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
- When creating the VC data similar to [18], is this done using Blender? If not, please describe the details of how this is done. Also, it is unclear what the inverse square fall-off property is for or where you modify it; please explain more precisely what it means and why it is used.
- Mathematical description, Fig. 1 and text:
- In the second sentence of Sec 4, isn’t it related to G rather than F? And later on, F instead of G?
- Define G_im, G_cl and G_ts in Eq 2
- “The discriminator has the network use random …” is very confusing
- After Eq 7, “G may ignore Z_ts” is confusing since G only takes OC images as input
- Define I in L_text
- “Two VC images are passed to F” is confusing since F only receives one
- In the definition of L_t, shouldn’t it be F rather than G? F does receive as inputs the image, z_cl, and z_ts
- When setting values for the parameters, did you mean lambda_cyc rather than lambda_T? Overall, how are these values found/set? Are there any experiments to support their choice?
- Define VC at the beginning of Sec 3
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
5
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The definition of the model is at times very confusing across the figure, text, and equations
- Number of papers in your stack
4
- What is the ranking of this paper in your review stack?
4
- Reviewer confidence
Somewhat Confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
Not Answered
- [Post rebuttal] Please justify your decision
Not Answered
Review #4
- Please describe the contribution of the paper
The authors propose a method for the augmentation of colonoscopy images by changing the color, lighting, texture, and specular reflections of the original images, realized by controlling noise parameters for color/lighting and texture/speculars, respectively. The framework maps optical colonoscopy (OC) images to virtual ones (VC), extracting color/lighting and texture/specular feature representations (G). A second generator (F) generates OC from VC using color/lighting and texture/specular noise. The framework is trained end-to-end using a combination of GAN loss, cycle-consistency loss, and additional regularization losses. The authors evaluate their method in the context of data augmentation for three datasets.
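For clarity, a minimal sketch of the two-generator cycle described above, assuming PyTorch-style modules; the naming (G, F, z_cl, z_ts) follows the review, while all internals, shapes, and the loss weight are placeholder assumptions rather than the authors' implementation:

```python
# Sketch of the forward cycle and its consistency loss; G and F are
# assumed to be trained nn.Modules with the signatures shown below.
import torch
import torch.nn as nn

def forward_cycle(G: nn.Module, F: nn.Module, oc: torch.Tensor):
    """One forward cycle: OC -> (VC, noise codes) -> reconstructed OC."""
    vc, z_cl, z_ts = G(oc)          # VC image + color/lighting + texture/specular codes
    oc_rec = F(vc, z_cl, z_ts)      # second generator maps VC + codes back to OC
    return vc, oc_rec

def cycle_loss(oc: torch.Tensor, oc_rec: torch.Tensor, lam: float = 10.0):
    """L1 cycle-consistency term (adversarial/identity losses omitted)."""
    return lam * torch.mean(torch.abs(oc - oc_rec))
```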
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The authors present a sophisticated and flexible framework for manipulating colonoscopy data. The proposed framework allows a user to gain control over the relevant image capture settings after acquiring the videos and map from one image to many other appearances. Not only the data augmentation but also the training application has great potential.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
As the augmentations are performed at the frame level, temporal consistency of the frames cannot be guaranteed. Therefore, the augmentation technique is only valid for frame-based downstream tasks. Especially for training applications, the temporal coherence of the generated video is crucial for realism. This is only rudimentarily discussed in the paper.
The proposed framework has a lot of potential; however, the authors only evaluate one specific use case of the method. A more extensive evaluation would strengthen the work. Furthermore, a discussion of the different options for how to use the framework for augmentation (see the last comment in the detailed feedback section) and also for training applications and colonoscopy simulation (using the VC data as reference) would clearly improve the presentation of the work.
Even though the method is very interesting and relevant, the presentation of the proposed method lacks clarity (in part because there are many different options for how to use the framework and the presented approach is very complex). Instead of showing a lot of different applications of the framework without context, it might have been better, given the limited space of a MICCAI submission, to focus on one application and explain it in more detail.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The authors promise to publish the code and pretrained models upon acceptance. With only the explanations from the paper it would be very complicated to implement the presented approach.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
The abbreviation “VC” is never explained (I assume it is virtual colonoscopy).
The authors should make clear already in section 3 why the VC frames are needed and what they are used for. While reading the paper, it is confusing that they are introduced in section 3 without any context.
The authors should explain what the “number of matrices” in z_ts is.
The authors should explain the forward and backward cycle in the paper and how exactly a train step is implemented. Furthermore, it should be explained how exactly the non-corresponding OC and VC frames are used in the framework and how they are fed to the model in the training process.
How do the authors determine the number of epochs? How is the quality of the generated samples assessed?
There is a typo in the caption of figure 4.
The authors should explain why a novel dataset based on VC is introduced even though there are no corresponding ground truth images available. Why did the authors not use an existing synthetic dataset?
The authors illustrate different use cases of the proposed method in qualitative figures, which include:
- passing an input image to F along with randomly sampled z_ts and z_cl (figure 4)
- extracting z_cl from reference images with G and then passing it together with an input image to F (figure 3)
- using colon-specific color and lighting together with polyp-specific textures and speculars, as in the evaluation of the presented work
- furthermore, combining input images and CLTS latent vectors from different datasets (figure 4) would also be possible
However, only one specific configuration is evaluated and discussed in the paper. The authors should at least discuss the different options of using their framework for augmentation.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
4
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The authors present an interesting and flexible approach for the manipulation of OC images. The paper is well written but hard to follow (also due to the complexity of the presented approach). A more structured way of presenting the use cases of the proposed framework would improve the paper.
- Number of papers in your stack
7
- What is the ranking of this paper in your review stack?
3
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
5
- [Post rebuttal] Please justify your decision
The rebuttal addresses most concerns raised by the reviewers and the authors promise to improve the figures and parts of the manuscript to make the explanations clearer. While there are still some shortcomings in the presentation of the proposed method, the topic is interesting and novel. Additionally, for better understanding I recommend that the authors mention already in the abstract that the proposed CLTS-GAN is a CycleGAN variant.
Primary Meta-Review
- Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
The paper proposes a method for creating synthetic colonoscopy images in which image attributes like color, lighting, texture, and specular reflection can be augmented in a controlled way. The idea is interesting and the presentation somewhat clear, but more clarity and a more structured presentation are needed, as the paper is hard to follow. I invite the authors to submit a rebuttal to address the comments raised by the reviewers.
- What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).
5
Author Feedback
Thank you to the reviewers (R1, R2, R4) and the area chair for giving us the opportunity for rebuttal.
We presented CLTS-GAN for generating disentangled variations/augmentations of color, lighting, texture, and specular reflection attributes in individual optical and virtual (CT) colonoscopy frames. These augmentations were then shown to improve the performance of the SOTA model in the most clinically important polyp detection/segmentation use case.
Since CLTS-GAN augments the above four attributes at the individual frame level, the purpose of showing virtual colonoscopy (VC) frame augmentations was simply to allude to the fact that large real optical colonoscopy (OC) variations can be generated from coarse virtual colonoscopy mesh renderings. Relevant to the polyp detection/segmentation use case, artificial protrusions simulating flat and pedunculated polyps can be added to the VC mesh and then translated to real OC variations via CLTS-GAN, again for the explicit purpose of data augmentation. Since there is no temporal component in our current model, we cannot generate a temporally consistent OC simulation for a VC frame sequence (like OfGAN [22]). Adding the temporal component is left for future work.
Implementation details, libraries, and preconditioning will be fully described in the released code. The number of epochs and the loss weights for the model were based on the hyperparameters reported in the CycleGAN and XDCycleGAN [17] papers. As per R1’s request, arrows will be added to Figure 1 to clarify the flow, and boxes will be added to highlight the changes in texture and specular reflection (please also see supplementary Figure 3 for large variations in texture/specular reflection).
For R2, the VC dataset was created in Blender, and the inverse square fall-off property (which can be turned ON in Blender) allows for more physically realistic lighting, as explained in [15]. The comment about how users will get satisfactory results when sampling randomly from a uniform distribution can be clarified for the testing phase: the testing distribution is the same as the one used in training.
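For context, inverse-square fall-off is the standard physical lighting model in which the light received from a point source diminishes with the square of the distance to it; in the notation below (ours, not the paper's), E is the irradiance at distance d from a source of intensity I:

```latex
E(d) = \frac{I}{d^{2}}
```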
For R4, unpaired (OC-VC) image-to-image translation via CycleGAN variants handles the complexities in real OC images much better than models trained solely on synthetic datasets (with paired/ground-truth images) or solely on OC datasets, as shown in [17]; this merits the use of the unpaired OC-VC dataset in this paper. In this paper, we focused on the most immediate clinical use case of polyp detection/segmentation, in which only color/lighting augmentation was used (texture augmentation modifies the polyp features and degrades performance). Other use cases (not of immediate clinical relevance) are depth inference [15,17] and fold detection [16]. We hypothesize that the full gamut of color-lighting-texture-specular augmentations can be used in these scenarios to improve performance. The “number of matrices” in z_ts refers to the 2D matrices sampled from a uniform distribution, as illustrated below.
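As a concrete illustration of the structure of these noise inputs (z_cl a 1D vector, z_ts a stack of 2D matrices, both drawn from a uniform distribution, per the paper's description), with all dimensions below being hypothetical placeholders:

```python
# Hypothetical shapes; only the structure (1D vector vs. a stack of 2D
# matrices, uniform sampling) follows the paper's description.
import torch

n_mat, h, w = 4, 256, 256       # "number of matrices" in z_ts and their size (assumed)
z_cl = torch.rand(8)            # 1D color/lighting code, uniform on [0, 1)
z_ts = torch.rand(n_mat, h, w)  # 2D texture/specular matrices, uniform on [0, 1)
```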
For R2’s point-by-point comments:
“In the second sentence of Sec 4, isn’t it related to G rather than F? and later on F instead of G?” The sentence mentioned in Section 4 has a typo; F and G are swapped.
“Define G_im, G_cl and G_ts in Eq 2.” G_im, G_cl, and G_ts represent the individual outputs of the network G.
“‘The discriminator has the network use random …’ is very confusing.” This will be replaced with “The discriminator compares random noise vectors and vectors produced by F.”
“After Eq 7, ‘G may ignore Z_ts’ is confusing since G only takes OC images as input.” This is a typo; G is supposed to be F.
“Define I in L_text.” I denotes the images.
“‘Two VC images are passed to F’ is confusing since F only receives one.” This will be clarified to: “F is applied to two different images, I.”
“In the definition of L_t, shouldn’t it be F rather than G? F does receive as inputs the image, z_cl, and z_ts.” Yes, this is a typo; it will be changed to F.
“Define VC at the beginning of Sec 3.” VC stands for virtual colonoscopy; it will be defined there.
Post-rebuttal Meta-Reviews
Meta-review # 1 (Primary)
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
The authors have addressed the major concerns raised by the reviewers. I recommend acceptance. The reviewers’ comments and the changes promised by the authors in the rebuttal must be addressed in the camera-ready version.
- After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.
Accept
- What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).
4
Meta-review #2
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
The rebuttal has convincingly addressed the main criticisms, in particular from R4. All three reviewers acknowledge the merits of the paper and the AC concurs that this is a useful, timely and relevant work for MICCAI, which can be accepted.
- After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.
Accept
- What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).
NR
Meta-review #3
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
The reviewers, myself included, think that the novelty of the approach is interesting and the qualitative results very convincing. They also vote for acceptance of the paper after considering the authors’ rebuttal, a recommendation I follow.
- After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.
Accept
- What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).
4