
Authors

Sarah M. Muller, Sundaresh Ram, Katie J. Bayfield, Julia H. Reuter, Sonja Gestewitz, Lifeng Yu, Mark O. Wielpütz, Hans-Ulrich Kauczor, Claus P. Heussel, Terry E. Robinson, Brian J. Bartholmai, Charles R. Hatt, Paul D. Robinson, Craig J. Galban, Oliver Weinheimer

Abstract

Air trapping (AT) is a frequent finding in early cystic fibrosis (CF) lung disease detectable by imaging. The correct radiographic assessment of AT on paired inspiratory-expiratory computed tomography (CT) scans is laborious and prone to inter-reader variation. Conventional threshold-based methods for AT quantification are primarily designed for adults and less suitable for children. The administered radiation dose plays a particularly important role in children. Low dose (LD) CT is considered the established standard in pediatric lung CT imaging, but ultra-low dose (ULD) CT is also technically feasible and requires comprehensive validation. We investigated a deep learning approach to quantify air trapping on ULDCT in comparison to LDCT and assessed structure-function relationships by cross-validation against multiple breath washout (MBW) lung function testing. A densely connected convolutional neural network (DenseNet) was trained on 2-D patches to segment AT. The mean threshold from radiographic assessments, performed by two trained radiologists, was used as ground truth. A grid search was conducted to find the best parameter configuration. Quantitative AT (QAT), defined as the percentage of AT in the lungs detected by our DenseNet models, correlated strongly between LD and ULD. Structure-function relationships were maintained. The best model achieved a patch-based DICE coefficient of 0.82 evaluated on the test set. AT percentages correlated strongly with MBW results (LD: R = 0.76, p < 0.001; ULD: R = 0.78, p < 0.001). A strong correlation between LD and ULD (R = 0.96, p < 0.001) and small ULD-LD differences (mean difference -1.04 ± 3.25%) were observed.
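To make the reported quantities concrete, here is a minimal sketch (not the authors' code; the mask arrays are hypothetical) of how quantitative air trapping (QAT) and a patch-based DICE coefficient are commonly computed from binary masks:

```python
import numpy as np

def qat_percentage(at_mask: np.ndarray, lung_mask: np.ndarray) -> float:
    """Quantitative air trapping: percentage of lung voxels flagged as AT."""
    lung = lung_mask.astype(bool)
    at = at_mask.astype(bool) & lung
    return 100.0 * at.sum() / lung.sum()

def dice(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-8) -> float:
    """DICE overlap between predicted and ground-truth binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    return 2.0 * np.logical_and(pred, truth).sum() / (pred.sum() + truth.sum() + eps)

# Toy 2-D patch example with hypothetical masks.
lung = np.ones((32, 32), dtype=bool)
truth = np.zeros_like(lung); truth[8:20, 8:20] = True
pred = np.zeros_like(lung); pred[9:21, 9:21] = True
print(f"QAT (ground truth): {qat_percentage(truth, lung):.1f}%")  # ~14.1%
print(f"DICE: {dice(pred, truth):.2f}")                           # ~0.84
```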

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43898-1_42

SharedIt: https://rdcu.be/dnwBC

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    Quantifying air trapping on lung CT is a key step in the characterization of diseases such as cystic fibrosis. Deep learning based methods to quantify air trapping can overcome standard threshold-based limitations for evaluating gas trapping. The authors of this study posit that a deep learning model to quantify air trapping using ultra low dose CT (ULDCT) can perform as well as one using low dose CT (LDCT) in a cohort of 52 children with CF. They designed a U-Net-based architecture with a patch-based training protocol to compare automatic air trapping predictions to radiologist ground truth. They found that the ULD model performed similarly to the LD model as measured by DICE. They also found a strong correlation between ULD and LD, and a strong correlation of both with MBW.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper is an important work that illustrates the utility of deep learning for the automatic quantification of gas trapping in patients with lung disease. The current standard, low attenuation area below -856 HU, uses a fixed threshold that may not be optimal for both adult and pediatric patients. Therefore, a deep learning approach for quantifying air trapping that performs well for ultra low dose CT protocols could be useful for evaluating gas trapping, particularly in younger patients.
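    For context, the conventional density-based measure referred to here is simply the fraction of expiratory lung voxels below a fixed cutoff. A minimal sketch (assuming a HU-valued expiratory volume and a binary lung mask as hypothetical inputs; not the paper's implementation):

    ```python
    import numpy as np

    def threshold_air_trapping(exp_hu: np.ndarray, lung_mask: np.ndarray,
                               cutoff: float = -856.0) -> float:
        """Percentage of expiratory lung voxels below a fixed HU cutoff (e.g. -856 HU)."""
        lung = lung_mask.astype(bool)
        trapped = (exp_hu < cutoff) & lung
        return 100.0 * trapped.sum() / lung.sum()
    ```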

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors provide a number of interesting results, but no clear objectives are detailed in the introduction section. Further, there was no statistical comparison between the models, and therefore it is not clear whether there is a difference in performance.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Details regarding the models, algorithms, and datasets used were provided. The code related to this work was not made available. Some details related to the reported experimental results were also provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Section 1: Introduction

    • “Conventional threshold-based methods for air trapping quantification (e.g. the −856 HU threshold on expiratory CT) are density-based and primarily designed for adults [6]. They depend on the CT protocol in use and the constitution of the patient.” The reference that is provided does not support the statement that threshold-based methods are not suitable in children. If there is no support for this statement, please focus on the radiation dose aspect of the motivation.
    • There is no related work section and therefore it is unclear what the novelty is of this study. Are there any other studies that have investigated deep learning for quantifying gas trapping?

    Section 2: Methods

    2.2 Post-processing -“The mean threshold from radiographic assessments, performed separately by two trained radiologists, was used as ground truth.” -Please provide more detail regarding how the Radiologists performed the radiographic assessment. Do you mean that the radiologists performed manual segmentations?

    2.3 DenseNet Architecture and Training - It is not clear which slices of the lung (which area) were selected for use, and how and why the slices were selected. Overall, it is not clear how many patches were selected from each scan, how many images were considered as input, or what the image sizes were.

    • If your DenseNet has a common U-Net Structure, why is it considered a DenseNet and not a U-Net?
    • Why was the model trained for 100 epochs? This is a very large number of epochs, which will usually cause overfitting, especially on smaller datasets that are not augmented. Did you implement early stopping based on your validation data?
    • “Only patches containing at least 50% of lung tissue are considered to only include the most informative patches.” How much data did this leave you with? And how is this going to work in the future if you have to classify air trapping in patches that have <50% lung tissue? This is selective data filtering.
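    For reference, the ≥50% lung-tissue patch filter quoted in this comment could look roughly like the following (a sketch with hypothetical inputs, not the authors' code):

    ```python
    import numpy as np

    def select_lung_patches(patches, lung_masks, min_lung_fraction=0.5):
        """Keep only patches whose lung-mask coverage reaches min_lung_fraction."""
        kept = []
        for patch, mask in zip(patches, lung_masks):
            if mask.astype(bool).mean() >= min_lung_fraction:
                kept.append(patch)
        return kept
    ```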

    Section 3: Results

    • In section 3: “For each model of each parameter combination obtained from cross-validation, the DICE coefficient was computed over the patches from the test set” Does that mean 11 patients were used as an external testing cohort and the 5-fold cross-validation was performed on the remaining patients? If so, what do the mean and std of the test result represent?
    • Stride size should not be a hyperparameter that is tested. By increasing the stride size you decrease the amount of data you have available and therefore may alter the results to strictly include the middle-most or dense-most slices while ignoring the outermost patches.
    • The slight variations in your DICE scores could be due to natural changes in performance between training runs; because significant differences were not tested between combinations of test scores, it is difficult to determine which combination is best.
    • I found the results a bit confusing to follow - are you testing for an improvement in performance from the addition of the inspiration scan, or an improvement in performance from the use of ULDCT in comparison to LDCT? Please clarify.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is an important research question with interesting results. However, some clarification is required.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This work investigates the utility of a segmentation model to quantify regions of air trapping on low dose (LD) and ultra-low dose (ULD) chest CT in children with cystic fibrosis. Lung air trapping comprises the principal pathophysiology of cystic fibrosis. It can be estimated from CT scans and aid assessment of disease severity, but traditional Hounsfield Unit (HU) cutoffs are not reliable at lower doses, which are preferable in children. The authors demonstrate good DICE scores compared to quantification by trained radiologists on LD and ULD (0.80-0.81) and excellent agreement between LD and ULD (R = 0.962). There was good correlation with the clinical gold standard, lung clearance index (LCI) (R = 0.76-0.78). If validated prospectively and across multiple centers (different scanners/geographic regions), this could be a helpful tool in assessing disease trajectory in patients with cystic fibrosis.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The strengths include a focus on an unmet clinical need: a reliable, easy-to-perform quantification of air trapping in children with cystic fibrosis using ultra-low dose CT. While the lung clearance index calculated from multiple-breath nitrogen washout tests is a gold standard test, it is not widely available. Using CT as a means of measuring disease instead of LCI also has the advantage of pinpointing specific bronchi most susceptible to mucus plugging and could help target treatment.

    The authors provide a rigorous review of the performance of their model across various model architecture parameters, and also compare correlation and differences with previously described models. This helps to clearly show the potential impact of the model.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    As the authors describe, there are no reliable HU cutoff values for LD and especially ULD CT studies, and so it is not clear how the radiologists assessed areas of air trapping for ground truth. It would be helpful if there were inter-reader comparisons and more detail on how the radiologists defined air trapping and what segmentation tools were employed. It is not very surprising that the DenseNet model had the best DICE as compared to the radiologist interpretations, because this is the ground truth that trained the model.

    The lung clearance index appears to be the clinical gold standard test, and Table 2 shows the best correlation for both LD and ULD as being the -856 HU cutoff value. It would be important to discuss this point and clarify the advantages of the DenseNet method if a simple cutoff may suffice in estimating LCI.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    There is no code available nor data or annotations available for download and inspection. Given the relative lack of details on the annotation method this would be challenging work to reproduce.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    I congratulate the authors on addressing an excellent, unmet clinical need using an automated model to quantify disease severity in cystic fibrosis that would otherwise require substantial time and resources. Providing a better understanding of the segmentation routine performed by the radiologists, and addressing why the model trained on these annotations does not correlate with LCI as well as the well-described thresholding, would strengthen this work.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper addresses a valid clinical need for children with cystic fibrosis and describes the performance of their air trapping prediction model across various possible parameters and in comparison with established methods and a gold standard clinical test.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    Air trapping is an important sign often seen in both obstructive and restrictive lung diseases. In this paper, the authors present a deep learning framework for estimating air trapping in ultra low-dose pediatric chest CT scans (ULD-CTs). The ground truth thresholds for air trapping were determined by two trained radiologists. The authors demonstrate that the radiological thresholds are more suitable for pediatric scans and that the conventional thresholds may not be appropriate. Based on this premise, the authors used paired inspiratory-expiratory CT patches to segment air trapping. Although the technical novelty of this work is low, the clinical problem that this work attempts to tackle is immensely important.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strengths of this paper include:

    1. The paper presents a solution to an important problem - determining AT in pediatric ULD CTs. The experiments were conducted using an 82% dose reduction (compared to low-dose CTs). This makes the findings of this work valuable
    2. The authors present exhaustive qualitative as well as quantitative results
    3. The results were also compared with MBW, which further adds to the significance of this study
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weaknesses are:

    1. The results are generated for a very small dataset
    2. There is very low technical contribution presented in this study - as the methods used are not novel
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Fair

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. The authors should consider validating their model in an external cohort
    2. The authors should try more complicated image segmentation methods and see if they improve the overall performance
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the technical contribution is fairly low, the clinical importance of this work may outweigh that. The validation against multiple-breath washout is also insightful, the clinical idea is novel, and the results look promising (but should be subjected to further validation in larger cohorts).

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    Although we would want the authors to have a more technically novel contribution, the clinical relevance of this work is significant.



Review #4

  • Please describe the contribution of the paper

    The paper aims to investigate the potential of using a deep learning approach to quantify air trapping (AT) on ultra-low dose (ULD) computed tomography (CT) scans compared to low dose (LD) scans in children with early cystic fibrosis (CF) lung disease. Conventional threshold-based methods for AT quantification are primarily designed for adults and less suitable for children, and the administered radiation dose is a particular concern in pediatric imaging. The authors trained a densely connected convolutional neural network (DenseNet) to segment AT on 2-D patches of CT scans. The mean threshold from radiographic assessments, performed by two trained radiologists, was used as ground truth. The quantitative AT (QAT) was defined as the percentage of AT in the lungs detected by the DenseNet models and was found to correlate strongly between LD and ULD scans. The structure-function relationships were maintained and were cross-validated against multiple breath washout (MBW) lung function testing.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper aims to investigate the potential of using a deep learning approach to quantify air trapping (AT) on ultra-low dose (ULD) computed tomography (CT) scans compared to low dose (LD) scans in children with early cystic fibrosis (CF) lung disease. The topic of study is interesting. The structure is clear.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors’ contribution to the field was not clearly stated, and from a technological perspective, there is no novel contribution as the authors utilized established techniques. In terms of medical contribution, the authors presented results that were comparable between LD and ULD CT scans. However, the generalizability of these findings is limited due to a lack of external cohorts, a small internal cohort, no data augmentation techniques to avoid overfitting, and imprecise ground truth extraction methods acknowledged by the authors. The authors need to provide more explicit details on their approach to extracting ground truth data for both LD and ULD scans. Additionally, the use of only two internal experts in a study where the authors themselves acknowledge the variability and laborious nature of CT scan assessment may introduce bias.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It is reproducible. However, more detail about the way the authors extracted the ground truth is needed.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Despite the intriguing topic, there are several concerns regarding the paper. Firstly, the authors’ contribution to the field lacks clarity, and from a technological standpoint, there is no new or innovative contribution since the authors utilized established techniques. It is imperative for the authors to validate their claims on other topologies to establish the generalizability of their findings.

    In terms of medical contribution, the study presented comparable results between low dose (LD) and ultra-low dose (ULD) CT scans. However, the generalizability of these findings is limited due to the absence of external cohorts, a small internal cohort, no use of data augmentation techniques to avoid overfitting, and imprecise ground truth extraction methods that the authors acknowledged. Therefore, the authors must provide more explicit details on their approach to extracting ground truth data for both LD and ULD scans.

    Moreover, the study’s reliance on only two internal experts in a field where CT scans’ variability and laborious nature are acknowledged may introduce bias. The figures can be improved, and Figure 4 could be presented as a small table. Additionally, the authors must compute other segmentation metrics, such as sensitivity and specificity, to validate their network’s performance.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors’ contribution to the field was not clearly stated, and from a technological perspective, there is no novel contribution as the authors utilized established techniques. In terms of medical contribution, the authors presented results that were comparable between LD and ULD CT scans. However, the generalizability of these findings is limited due to a lack of external cohorts, a small internal cohort, no data augmentation techniques to avoid overfitting, and imprecise ground truth extraction methods acknowledged by the authors. The authors need to provide more explicit details on their approach to extracting ground truth data for both LD and ULD scans. Additionally, the use of only two internal experts in a study where the authors themselves acknowledge the variability and laborious nature of CT scan assessment may introduce bias. The major factor was that the authors did not deliver enough evidence for the comparison between LD and ULD scans; thus, not even the medical contribution can stand.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    2

  • [Post rebuttal] Please justify your decision
    1. The reviewers noted that the contribution of the paper was not stated clearly, the related work section fell short, and the objective was not explicit.

    The authors responded to this concern; they need to update the manuscript to present this information clearly. However, the primary issue with the study is that it is a preliminary investigation, and therefore the evidence cannot be generalized, despite the importance and interest of the idea. An external out-of-distribution dataset is needed (as the internal i.i.d. cohort is too small for an efficient cross-validation evaluation).

    2. The reviewers expressed concerns regarding the missing evaluation on an external cohort.

    In response, the authors explained that the dataset used in the study was unique. Children were scanned at both inspiration and expiration using two different scan protocols, without leaving the CT table. This resulted in four scans for each patient and limited the availability of patients for inclusion in the study. Additionally, a wide range of multiple breath washout indices was generated. The uniqueness of the dataset explains why the model could not be easily tested on an independent dataset, as there is no comparable dataset available. However, this explanation did not address the issue of generalizing the AI approach, which is crucial for ensuring the robustness of the results.

    3. The reviewers raised questions about the ground truth generation.

    The authors addressed these concerns and acknowledged the need to clearly include this information in the manuscript.

    To summarize, the authors need to validate their findings in a small out-of-distribution (OOD) external cohort. This verification is crucial to ensure the generalizability of the claims made in this preliminary study.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors propose to use deep learning for quantification of air trapping in low-dose and ultra low-dose CT of children. This work is a good example of how medical image computing can be used to address clinical needs. The paper is well prepared and easy to follow. Although not methodologically novel, the work is solid and rigorous. However, the reviewers still have some questions that require addressing.

    Strengths

    • Reviewers agree that the work addresses an unmet clinical need.
    • Authors include extensive experiments to evaluate the robustness of their approach.

    Weaknesses

    • The methodological contribution of the work is relatively limited.
    • There is no evaluation in an external cohort.

    For the rebuttal, it would be good to address the reviewers’ comments (while refraining from including additional experiments), and particularly

    • highlight your contribution (Rev. 4)
    • address the concerns of Rev. 4 regarding the ground truth




Author Feedback

We thank all reviewers and the Area Chair for their feedback. In the following, we aim to address the major concerns raised by the reviewers.

  • The reviewers noted that the contribution of the paper was not stated clearly, the related work section fell short and the objective was not explicit. In 2021, Ram et al. [12] presented a deep learning approach to segment air trapping (AT) trained on low dose (LD) CT. It achieved a good AT quantification with respect to the ground truth derived from an algorithm developed by Goris et al. [4] for generating subject-specific thresholds. The structure-function relationships for the deep learning approach were investigated by Bayfield et al. (Deep Learning Improves the Detection of Ultra-Low-Dose CT Scan Parameters in Children with Cystic Fibrosis. International Conference of the American-Thoracic-Society, 2021), on the same ultra-low dose (ULD)-LD CT dataset, which we now used to train our model. The authors observed that the percentage of AT, detected by the model from Ram et al., did not correlate with pulmonary function test results. In a research letter just published in the European Respiratory Journal, Robinson et al. address the urgency of reliable AT quantification on ULDCT. In our study, we investigated the influence of dose reduction on AT quantification. We aimed to achieve a good AT segmentation on ULD as well as LDCT while maintaining structure-function relationships. In contrast to the aforementioned preceding studies, despite an 82% dose reduction, we were able to show comparable quantitative AT values, describing the percentage of AT in the lungs, detected by our model, for ULD and LDCT. Achieving a comparable AT quantification with an 82% reduced dose is an enormous benefit. This holds true especially for children with a chronic lung disease where recurrent CT scans are performed as part of the monitoring routine. The fact that performing CT scans in children to deliver cumulative doses of about 50 mGy could almost triple the risk of leukemia and doses of about 60 mGy could triple the risk of brain tumors (Pearce et al. Radiation exposure from CT scans in childhood, Lancet, 2012) underscores the relevance of our work.

  • The reviewers expressed concerns regarding the missing evaluation on an external cohort. We would like to point out that the dataset used in our study is a unique dataset. Children were scanned at inspiration and expiration, with two different scan protocols, without leaving the CT table. This results in 4 scans for each patient and explains the limited availability of patients to be included in the study. In addition, an extensive range of multiple breath washout indices was generated. The particularity of the dataset clarifies why the model could not easily be tested on an independent test dataset since there is none available obtained in a comparable manner.

  • The reviewers raised questions about the ground truth generation. We used the mean threshold from radiographic assessments, performed separately by two trained chest radiologists. First, the radiologist loaded the inspiratory and corresponding expiratory CT scan in our in-house software. After loading, the scans were displayed next to each other, and the radiologist could go through each of them individually. The segmentation was not drawn manually by the radiologist. Instead, we used a patient-specific threshold T. An AT map was generated by classifying all expiratory CT voxels < T as AT. Using an integrated slider functionality, the radiologist was asked to choose T for each patient such that the AT map best describes the trapped air. Since a manual AT assessment is very time-consuming, the slider-based approach provides a good trade-off between time consumption and accuracy. With this technique, we are able to guarantee high ground truth quality, since two trained radiologists selected a personalized threshold for each patient and no generic method was used, unlike in Ram et al. [12].
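    To make the described procedure concrete, here is a minimal sketch of how a patient-specific threshold T chosen with the slider would be turned into an AT map (assuming an expiratory HU volume and a binary lung mask as hypothetical inputs; this is not the in-house software itself):

    ```python
    import numpy as np

    def at_map_from_threshold(exp_hu: np.ndarray, lung_mask: np.ndarray, T: float) -> np.ndarray:
        """Binary AT map: expiratory lung voxels with HU below the patient-specific threshold T."""
        return (exp_hu < T) & lung_mask.astype(bool)

    # Ground truth per patient: the mean of the two radiologists' chosen thresholds,
    # e.g. T_gt = 0.5 * (T_reader1 + T_reader2), then at_map_from_threshold(exp_hu, lung, T_gt).
    ```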




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Taking into account the original reviews, the rebuttal, and the response of the reviewers to the rebuttal, I think that the work is at too early a stage to warrant publication at MICCAI. Because the authors only show experiments on a very specific and small-scale problem, it is unclear how general the proposed method is and what its contribution to the field could be.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This work seems to have intended to address an important clinical need with a deep learning solution. The rebuttal has clarified a few concerns raised by reviewers, but the concern about validation only on a small, internal dataset could not be addressed, given that no additional experiments are allowed. Though the technical novelty of the work is limited, the clinical problem that it intends to address is likely to bring some merit to the community. It could be of interest to readers beyond the methodology itself.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I do not have the domain knowledge to judge the practical value and importance of this problem and the impact of the proposed solution.

    The paper is well-written, but the technical novelty is very limited and there are no comparisons with other methods. Results look good, but some of the results are almost obvious and not very interesting, such as those in the first paragraph on page 6.

    By the way, how did you normalize the images between 0 and 1? Did you account for outlier voxels? Otherwise normalization would be unstable.
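    One common way to handle this is to clip the HU range (or intensity percentiles) before scaling to [0, 1], so that outlier voxels cannot stretch the dynamic range. A hedged sketch (the paper's actual preprocessing is not specified here; the window values are assumptions):

    ```python
    import numpy as np

    def normalize_hu(volume: np.ndarray, hu_min: float = -1024.0, hu_max: float = 0.0) -> np.ndarray:
        """Clip to a fixed HU window so outlier voxels (e.g. metal, air outside the
        patient) cannot destabilize the scaling, then map linearly to [0, 1]."""
        clipped = np.clip(volume, hu_min, hu_max)
        return (clipped - hu_min) / (hu_max - hu_min)
    ```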

    Results in Table 2 are important and seem to suggest the proposed method does not work well. However, the authors do not properly discuss this.


