Authors
Mohamed A. Hassan, Brent Weyers, Julien Bec, Jinyi Qi, Dorina Gui, Arnaud Bewley, Marianne Abouyared, Gregory Farwell, Andrew Birkeland, Laura Marcu
Abstract
Incomplete surgical resection with residual cancer left in the surgical cavity is a potential sequela of Transoral Robotic Surgery (TORS). To minimize such risk, surgeons rely on intraoperative frozen section analysis (IFSA) to locate and remove the remaining tumor. This process may lead to false negatives and is time-consuming. Mesoscopic fluorescence lifetime imaging (FLIm) of tissue fluorophore emission (i.e., collagen and the metabolic co-factors NADH and FAD) has demonstrated the potential to demarcate the extent of head and neck cancer in patients undergoing surgical procedures of the oral cavity and the oropharynx. Here, we demonstrate the first label-free FLIm-based classification using a novelty detection model to identify residual cancer in the surgical cavity of the oropharynx. Due to highly imbalanced label representation in the surgical cavity, the model employed solely FLIm data from healthy surgical cavity tissue for training and classified residual tumors as an anomaly. FLIm data from N=22 patients undergoing upper aerodigestive oncologic surgery were used to train and validate the classification model using leave-one-patient-out cross-validation. Our approach identified all patients with positive surgical margins (N=3) confirmed by pathology. Furthermore, the proposed method reported a point-level sensitivity of 0.75 and a specificity of 0.78 across the optically interrogated tissue surface for all N=22 patients. The results indicate that the FLIm-based classification model can identify residual cancer by directly imaging the surgical cavity, potentially enabling intraoperative surgical guidance for TORS.
Link to paper
DOI: https://doi.org/10.1007/978-3-031-43996-4_56
SharedIt: https://rdcu.be/dnwP0
Link to the code repository
N/A
Link to the dataset(s)
N/A
Reviews
Review #1
- Please describe the contribution of the paper
This paper discusses the deployment of mesoscopic fluorescence lifetime imaging (FLIm) and machine learning to identify residual cancer in the surgical cavity during transoral surgery. The authors use a one-class classifier called Generalized One-class Discriminative Subspaces (GODS) that can recognize cancer samples as an anomaly. With this, they also show an image overlay system that can help visualize the location of detected residual cancer for additional resection by the surgeon.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- A novel approach is presented for residual cancer detection that seemingly takes advantage of existing surgical infrastructure.
- The use of GODS for detecting cancer (as an anomaly) is very interesting; I personally have not read of something like this before.
- The experiments and conclusions of the paper are easy to follow, and the paper is well written. A supplementary document is also provided, which is useful.
- The visualization provided (particularly in Fig. 2) demonstrates what can be considered a cancer map for the surgeon, and I think this extension of the classification and imaging will have significant clinical impact.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Some parts of the methodology are unclear / difficult to follow: there are a few points that leave the reader with questions. For example, in Figure 1, the order of each step is unclear, and I am confused whether the final image overlay is the result of the in vivo or ex vivo tissue scan. On page 3, there is a mention that the cumulative distribution transform (CDT) is used to transform the fluorescence decay curves; is this standard for this type of data? Why was this chosen?
- The overall picture is missing a bit: I like how the authors extend the contribution of GODS classification of this imaging to the overlay system, but I'm missing the whole story. I'm left with questions like: did the operator have to teleoperate an exhaustive scan of the cavity? How long did this take? In the image, only a small area is annotated; is this sufficient for margin characterization? How do we characterize "uninvolved benign tissue"?
- Results and baselines: I think the comparison to a binary classification model would fit better in the manuscript (rather than the supplemental material). As this seems to be the natural choice for a two-class classification problem (cancer vs. healthy tissue), I think this would help justify the need to use GODS. I am also curious if there are methods for data augmentation with FLIm; could this help improve the class imbalance?
- Missing some relevant background information and details: the background of this paper is strong, but I think it's missing a reference to other cavity scanning modalities and techniques. The justification for FLIm specifically is unclear: what is the advantage of this specific modality for this specific surgery?
- Results and contribution claims: it is mentioned that the test set contains both healthy and cancerous samples, but what is the distribution of this data? Without this detail, it's hard to say whether the reported accuracy, sensitivity, and specificity indicate that this classification approach is viable or a good choice. Again, a comparison to binary classification and data augmentation to overcome class imbalance would also help indicate the significance of GODS.
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
This paper does not seem to be easily reproducible because it does not appear that the data is available (likely beyond the contributors' control). The robot used is described, but the software used to create the visualization is not discussed. The GODS network is also explained, but there is no reference to an existing implementation or the code used.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
Overall, I think this manuscript shows a very interesting approach for margin detection in TORS. In the weaknesses section, I’ve outlined a few questions that should be clarified if the manuscript is accepted.
One significant change that would be nice to see is better justification as to why GODS makes sense for this problem. In the motivation and background it seems that class imbalance is the only challenge associated with FLIm imaging. There are solutions available to address class imbalance that do not rely on one-class classification. It would also be nice to see a comparison of the reported accuracy and results to any existing literature. Are simple linear models sufficient for this kind of problem?
Another suggestion would be a paragraph at the end of the introduction that helps illustrate the idea for the whole system. How is this robotic margin detection system going to look? Why would this benefit the surgeon compared to standard pathological assessment?
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
5
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Overall, I think that this paper demonstrates a novel classification approach for cancer detection in TORS while showcasing how clinical implementation would look for such a system. I selected weak accept because I think with a few modifications (such as those suggested in the weakness section) the manuscript can be strengthened and will add value to the MICCAI proceedings.
The biggest issue I see is the lack of a clear motivation for using GODS, and proper comparisons to baseline binary classification approaches with methodology in place to address class imbalance. I also think that a high-level description of the overall vision for the full system is needed.
- Reviewer confidence
Confident but not absolutely certain
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
6
- [Post rebuttal] Please justify your decision
I believe that this paper should be accepted given the changes and rebuttal response. The authors explain some much-needed details about the methodology. Upon reading the other review comments, it seems that this was the concern most shared among reviewers. The authors' responses to my other comments answer my questions as well.
With this, the manuscript would greatly benefit from these clarifications being added and addressed in the text. The authors do not clearly indicate whether or not they intend to revise the manuscript (other than the last response to R4 and the last sentence) to better clarify the points raised by the reviewers. The meta-reviewer makes an excellent point about the registration method used for in vivo to ex vivo samples, which should also be addressed in the revised manuscript.
Review #2
- Please describe the contribution of the paper
The authors used label-free fluorescence lifetime imaging (FLIm) to visualize the surgical cavity and detect residual cancer during Transoral Robotic Surgery (TORS). Due to the highly imbalanced data in the surgical cavity (more healthy tissue than cancer tissue), they developed an anomaly detection model. The anomaly detection model is a Generalized One-class Discriminative Subspaces (GODS) classification model trained with healthy surgical cavity tissue only. Their GODS model was able to detect all three patients with residual cancer. They compared their GODS model with a one-class SVM and a robust covariance model and found better performance with GODS.
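For context, a minimal sketch of this kind of one-class setup, using scikit-learn's OneClassSVM and EllipticEnvelope (robust covariance) as stand-ins for GODS, which has no public scikit-learn implementation; the feature matrices, dimensions, and distributions below are hypothetical, not the paper's data:

```python
# Hypothetical sketch: train one-class models on healthy-only FLIm features,
# then flag points that fall outside the learned "healthy" region as tumor.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)
X_healthy = rng.normal(0.0, 1.0, size=(5000, 12))        # healthy-class features only
X_test = np.vstack([rng.normal(0.0, 1.0, size=(200, 12)),  # unseen healthy points
                    rng.normal(3.0, 1.0, size=(20, 12))])  # shifted "anomalous" points

for name, model in [("one-class SVM", OneClassSVM(nu=0.05, gamma="scale")),
                    ("robust covariance", EllipticEnvelope(contamination=0.05))]:
    model.fit(X_healthy)              # fit on the single (healthy) class
    pred = model.predict(X_test)      # +1 = inlier (healthy), -1 = outlier (tumor)
    print(name, "flagged", int((pred == -1).sum()), "points as anomalous")
```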
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The authors proposed a one-class classification model for residual cancer detection in the surgical cavity using fluorescence lifetime imaging during Transoral Robotic Surgery. The paper is clear and well written with good experimental details for reproducibility, although some are still missing. It demonstrated the clinical feasibility of FLIm in surgery.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Pathology labels provide ground truth for training the GODS model. Therefore, measuring the accuracy for mapping pathology labels to in vivo images in the surgical cavity is crucial. The authors didn’t conduct such experiments.
- There is a lack of experimental and method details:
2.1 It is not clear how many FLIm points are used for training, validation, and testing, respectively.
2.2 How were the aggregated pathology labels spatially registered to in vivo images in the surgical cavity?
2.3 What was the penalty factor in the GODS model?
2.4 What are the requirements for computation resources? How long does it take to generate a prediction for one point and for one image?
- Sensitivity and accuracy are not good evaluation metrics for an imbalanced dataset.
- The authors should compare GODS with a deep learning-based model. The two other models compared are also statistical models.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The paper gives a good amount of detail about the experiments. However, there is a lack of experimental and method details:
1.1 It is not clear how many FLIm points are used for training, validation, and testing, respectively.
1.2 How were the aggregated pathology labels spatially registered to in vivo images in the surgical cavity?
1.3 What was the penalty factor in the GODS model?
1.4 What are the requirements for computation resources? How long does it take to generate a prediction for one point and for one image?
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
- The authors used novelty detection, anomaly detection, and outlier detection interchangeably in the paper. It would be easier to follow if they used a single term.
- The paper can be better organized. For example, ‘The CDT of the concatenated decay curves is computed as follows … of the normalized CDF [13].’ should be moved to where CDT was first mentioned.
- The authors should discuss the time from acquiring images to generating predictions during surgery and whether it meets clinical needs.
- The authors should discuss the clinically required sensitivity and specificity. Does the GODS model meet this clinical need?
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
6
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper is clear and well written with good experimental details for reproducibility, although still missing some. The classification model they proposed worked reasonably well. The results can be improved by using better metrics for imbalanced data and comparing with a deep learning-based model.
- Reviewer confidence
Confident but not absolutely certain
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Review #4
- Please describe the contribution of the paper
Summary: This paper aims to address the difficulties in creating FLIm-based classification models to identify residual cancer at the resection margin in vivo by proposing a label-free method. The choice of a semi-supervised GODS method allows the authors to utilize a very imbalanced dataset and learn the tissue's "normal" and "abnormal" FLIm signatures. This model is then applied to in vivo data to detect residual positive margins by detecting cancer as "abnormal".
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Impact / clinical feasibility: determining patients with PSM in real time is of great importance. Reducing recurrence is extremely important, and this work could be a great guide toward addressing this problem.
- Great work on the augmented overlay onto the field of view.
- The figures are insightful, Figure 1 being a really great way of displaying all the different data types used in this study.
- Simplicity: the paper uses methods that are easily repeatable by other studies, and it also presents a method that requires no positive cancerous data, which starts to address the common issue of class imbalance in cancer datasets.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
Weakness:
- Cohort size: I have concerns about the cohort size, more specifically the very small number of positive patients used in this study.
- Cross-validation / train & test definition: it is unclear how the patients were separated during training or how the results reflect the leave-one-patient-out cross-validation. It is also unclear what exactly is being tested. Again, in Table 1 it seems that there are 15 patients with PSM, but the test set seems to have only 3 patients, so is the testing data for the in vivo detection or for the ex vivo tissue labelling? Are all of those 15 patients used for the one-class training even though they might have cancerous pixels?
- Results: are the accuracy and sensitivity metrics calculated per FLIm point? As mentioned in Section 3, all models detected the presence of residual tumors, but the metrics are for the entire surface; I therefore understood these as metrics for positively identifying pixels on the in vivo mask. If this is the case, please add a sentence to clearly state it.
- Table 2 measures the accuracy for the three positive patients. Were those patients in the leave-one-out cross-validation set? If so, they have already been seen, and therefore this table does not represent a true test set.
In general, it took a few reads of several paragraphs in the methods and results sections to understand what data was being used when and how. It is great that there are several different data types that were acquired for this study to help enlarge the dataset, but there needs to be either a table or better clarity around the data and patient flow.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
This study is relatively reproducible. They have made no indication that their data will be made public but given the tools and methodology described, the data could be recreated at another institution. The biggest weakness in the reproducibility seems to be the lack of information regarding the division of the data.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
Specific Comments:
- Introduction: the authors write "Our proposed approach identified all patients with PSM compared to the IFSA reporting a sensitivity of 0.5". After reading the discussion, I understand that IFSA reports a sensitivity of 0.5, but on first pass it sounds as though the proposed approach has a sensitivity of 0.5, which is not a great metric and raises some questions.
- Table 1: this clearly contains important information but is difficult to read. It looks as though there are both 7 and 19 patients with base-of-tongue data points; I assume one is for the ex vivo tissue scans and the other for the margin scans? The setup / titles of this table need to be reworked.
- Section 2.2: "Human patients enrolled in this study after obtaining": typo; should read "patients were enrolled".
- Section 3: "average sensitivity of 0.75±0.02 (see Fig. 2)": typo; I think you meant Table 2.
- Figure 2: If the authors have the time and the data, it would be great to have the false positives and false negatives added to this figure. It would help visualize the problems explained in the discussion section and would also demonstrate to readers how these scans were labelled. Personally I feel it would also help to give the columns and rows titles in the figure so that a reader can quickly grasp what is being displayed.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
4
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This paper is, at first pass, well written and easy to follow. The authors address important topics both clinically and technically, such as imbalanced datasets and the in vivo recognition of positive margins. However, their test set of positive patients is very small (N=3), too small to draw strong conclusions from. Additionally, in the methods and results sections, the data flow and separation during training, validation, and testing are very confusing, making it difficult to properly interpret the claimed result metrics. With reworking of these sections per the reviewer comments, I think the manuscript could be much improved.
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
6
- [Post rebuttal] Please justify your decision
The authors have addressed all of our comments. There are still concerns about the size of the cohort, and therefore the weight of the accuracy and performance of the model on the positive cases (n=3), but that is not addressable at this time.
Primary Meta-Review
- Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
This paper presents a novel method to detect remnant tumors in patients undergoing TORS. The authors propose to use the GODS model to classify the remnant tumor. The paper addresses a clinically significant problem and can potentially improve the detection of remnant tumors in patients. The results are also encouraging, albeit on a relatively small dataset of patients with remnant tumor. All reviewers received the paper positively and had a few questions regarding the methods. One of the reviewers’ concerns was the choice of the GODS method compared to other classification methods in the literature. The reviewers also commented on using better metrics for handling the class imbalance problem. It is also not clear to me how the FLIm images of the intraoperative surgical cavity were accurately mapped to the frozen section images of the remnant tissue excised from the surgical cavity. There is no simple method of registering the two images. The details of this are missing in the paper. This is an excellent paper overall, and I would encourage the authors to address the reviewers’ comments in the rebuttal phase.
Author Feedback
We would like to express our appreciation for the thorough review and constructive critique of our paper.
R1: GODS one-class classification, considered among the state of the art in the field, was selected because it generalizes well from a small dataset compared to existing deep learning models. Furthermore, one-class classification simplifies the classification problem by focusing solely on learning the characteristics of the original dataset.
R1: To tackle the class imbalance issue, we explored different resampling and cost-sensitive learning methods. However, we encountered challenges such as model underfitting or overfitting. To address this, we are currently evaluating synthetic sample-generating techniques like GANs, ADASYN, and SMOTE to generate artificial samples of the minority class. However, modifying the dataset with synthetic samples introduces additional complexities. Nevertheless, we plan on investigating these in future work.
R1: To our knowledge, the in vivo label-free optical examination of positive surgical margins (PSM) in the surgical cavity in TORS procedures (or any other head and neck procedure) has not been reported. While a few studies [8] reported detection of PSM in head and neck procedures, these were conducted in ex vivo resected specimens and entailed the use of fluorescence probes (exogenous fluorophores) [8].
R1: Our label-free FLIm system is uniquely designed to integrate with TORS, enabling surgeons to utilize our technology for intraoperative in vivo examination of regions of interest. The FLIm contrast originates from intrinsic tissue properties. Cancer tissue has different fluorescence lifetime characteristics than healthy tissue due to biochemical and metabolic changes (Page 2).
R2 (3.1): The point-level labels used for training and evaluating the model are derived from the annotated H&E sections from frozen and fixed resected tissue. The process involves manual co-registration of annotated H&E and in vivo images as previously published, which will be included in section 2.2. In brief, the protocol follows the standard histopathological evaluation by a pathologist. The resulting annotations are then transferred to the image of the ex vivo gross specimen. Based on tissue landmarks, the labels are mapped to the in vivo image of the surgical field.
R2 (3.2): The total numbers of healthy and cancer FLIm points are listed in Table 1. We measure 115 FLIm data points ('pixels') per second, and a scan lasts on average 60 seconds, resulting in approximately 6,900 points per scan. During the LOOCV, all FLIm data points of the patient in the testing set are excluded from the training set.
R2 (6.3): Sensitivity and specificity are common metrics for assessing classification on imbalanced data because they are not affected by the prevalence of the disease, whereas accuracy is. An alternative is balanced accuracy.
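A minimal sketch of these metrics with hypothetical point-level labels: sensitivity and specificity are each computed within one true class, so class prevalence cancels out, and balanced accuracy is their mean:

```python
# Hypothetical counts chosen to mirror the imbalance discussed above.
import numpy as np
from sklearn.metrics import confusion_matrix, balanced_accuracy_score

y_true = np.array([1] * 25 + [0] * 1000)                    # 25 cancer, 1000 healthy
y_pred = np.array([1] * 19 + [0] * 6 + [0] * 780 + [1] * 220)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # recall on the cancer class: 19/25 = 0.76
specificity = tn / (tn + fp)   # recall on the healthy class: 780/1000 = 0.78
print(sensitivity, specificity, balanced_accuracy_score(y_true, y_pred))
# balanced accuracy == (sensitivity + specificity) / 2, independent of prevalence
```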
R4 (6.2): Table 1 presents the dataset of 22 patients, with 19 cases classified as clear margins (residual cancer = no) and 3 cases with positive surgical margins (residual cancer = yes). The limited number of positive surgical margins is a characteristic of this clinical problem. Previous studies referenced in this manuscript report PSM rates ranging from 10% to 30% in head and neck cancer surgeries (see pages 1 and 2). Please note that the classification is performed for every FLIm point, and there are 170,535 healthy points and 2,451 cancer points in the dataset.
R4 (3.2): Leave-one-patient-out cross-validation assesses the model’s effectiveness for each patient individually. During training, the models were exclusively trained with point-level healthy labels to enable generalization to the single class. Subsequently, during testing, point-level healthy and cancer labels were introduced to evaluate the model’s performance.
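A minimal sketch of this protocol, assuming hypothetical arrays and using scikit-learn's LeaveOneGroupOut with a one-class SVM standing in for GODS: train on the healthy points of the training patients only, then test on all points of the held-out patient:

```python
# Hypothetical data: X = FLIm features, y = point-level labels (0 healthy, 1 cancer),
# patient_ids = which patient each point belongs to.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 12))
y = rng.integers(0, 2, size=600)
patient_ids = np.repeat(np.arange(6), 100)     # 6 patients, 100 points each

for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=patient_ids):
    healthy_train = train_idx[y[train_idx] == 0]           # healthy points only
    model = OneClassSVM(nu=0.05, gamma="scale").fit(X[healthy_train])
    pred = (model.predict(X[test_idx]) == -1).astype(int)  # outlier (-1) -> cancer
    acc = (pred == y[test_idx]).mean()
    print(f"held-out patient {patient_ids[test_idx][0]}: point accuracy {acc:.2f}")
```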
R4 (6.5): We will add false positives (FP) and false negatives (FN) to Fig. 2.
R1, R2, and R4 point out some minor corrections, which have been incorporated.
Post-rebuttal Meta-Reviews
Meta-review # 1 (Primary)
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
The authors have adequately addressed the reviewers’ comments. There is sufficient value in the paper for acceptance to MICCAI.
Meta-review #2
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
Overall, I found this paper to be quite interesting, with potential for high impact for in-situ characterization of remnant cancerous tissue. The rebuttal seems to have satisfied most of the reviewers' concerns with regard to the chosen metrics and missing details in the methods. My concern would be how the authors plan to integrate all of these elements in the revised manuscript given the extent of these additions. Still, I would lean to the positive side given the unanimity of the supportive reviews.
Meta-review #3
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
All reviewers agree that the authors have adequately addressed their original concerns and that the paper addresses a clinically significant problem in an interesting way which would be of interest to the community.