
Authors

Zhihao Li, Jiancheng Yang, Yongchao Xu, Li Zhang, Wenhui Dong, Bo Du

Abstract

Pulmonary nodules and masses are crucial imaging features in lung cancer screening that require careful management in clinical diagnosis. Despite the success of deep learning-based medical image segmentation, the robust performance on various sizes of lesions is in high demand, not only for lung nodules. Thus, we propose a multi-scale neural network with improved performance to address this limitation. Specifically, we introduce an adaptive Scale-aware Test-time Click Adaptation method that utilizes effortlessly obtainable lesion clicks as test-time cues to enhance segmentation performance, particularly for large lesions. The proposed method can be seamlessly integrated into existing networks. Extensive experiments on both open-source and in-house datasets consistently demonstrate the effectiveness of our method over CNN and Transformer-based segmentation methods.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43898-1_65

SharedIt: https://rdcu.be/dnwBZ

Link to the code repository

https://github.com/SplinterLi/SaTTCA

Link to the dataset(s)

https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=1966254

https://lndb.grand-challenge.org/Data/


Reviews

Review #1

  • Please describe the contribution of the paper

    Detection of pulmonary nodules and masses is essential in diagnosing lung cancer. Although deep learning models for medical image segmentation have been very successful, segmentation remains challenging when nodule and mass sizes vary widely. To address this problem and improve segmentation performance, especially for large lesions, this paper proposes a novel solution: the Scale-aware Test-time Click Adaptation (SaTTCA) method, which uses easily obtainable lesion clicks as test-time cues. The method can also be integrated into other networks without changing their architecture. Comprehensive experiments on public and private datasets show that the proposed method is more effective than some CNN and Transformer-based segmentation methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Due to the usage of lesion clicks to adapt the network parameters according to the scale-aware click during the testing, the model architecture does not have to be changed. Therefore, it can be combined or adapted with the other networks at the instance level. Easy integration makes it effortless to apply to different medical image segmentation tasks. Also, this method has overcome the issue of poor segmentation performance due to the imbalanced dataset with large-scale nodules and masses. This allows the network to achieve high recall and accuracy for large-scale lesions.

    Extensive, computationally expensive experiments have been carried out, and they show that the network is effective.

    The implementation details are explained clearly. It is also helpful that the paper states whether each dataset it uses is public or private.

    Broad and understandable information is provided for readers. The number of datasets used to train the model is satisfactory. Two public datasets and one private dataset are used, with many CT scans, including nodules and masses. Several evaluation metrics are reported: the volume-based Dice Similarity Coefficient (DSC), the surface-based Normalized Surface Dice (NSD), and recall rate.
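To make the volume-based metrics mentioned above concrete, here is a minimal Python sketch of the Dice Similarity Coefficient and a voxel-wise recall for binary masks. It is an illustration only, not the paper's evaluation code. The function name `dice_and_recall` and the flat 0/1 list representation are assumptions, and the paper's recall rate may well be defined per lesion rather than per voxel. NSD is left out because it needs surface distances and a tolerance that are not stated on this page.

```python
def dice_and_recall(pred, target):
    """Return (dice, recall) for two flat sequences of 0/1 voxel labels.

    Hypothetical helper for illustration; masks are assumed to be
    flattened to one dimension and to have the same length.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    # True positives: voxels marked as lesion in both masks.
    tp = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)

    # Convention assumed here: two empty masks count as a perfect match.
    dice = 2.0 * tp / (pred_sum + target_sum) if (pred_sum + target_sum) else 1.0
    recall = tp / target_sum if target_sum else 1.0
    return dice, recall


if __name__ == "__main__":
    print(dice_and_recall([1, 1, 0, 0], [1, 0, 1, 0]))
```

With one overlapping voxel out of two predicted and two reference voxels, both values come out as 0.5.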

    The formulas are written in a clear format. It is also a strength that the method takes into account that wrong clicks introduce erroneous information at some voxels during adaptive click adjustment. They developed a mapping function to generate masks adaptively based on the size of nodules and masses, which has a linear relationship with the side length. The masks degenerate into a single voxel if the nodule or mass is small. If the size is bigger, the axial and side lengths of bounding boxes follow a nonlinear quadratic relationship.

    One of the other strengths that I particularly appreciated was the visual aids such as graphics, tables, and other presentations. They were well-designed, easy to comprehend, and visually appealing.

    The experimental part includes comparisons with many different models, which makes the evaluation fairly comprehensive.

    Overall, the proposed method improves the segmentation performance of pulmonary nodules and masses, which are crucial imaging features in lung cancer screening.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Although the “Click Adaptation” method is referred to frequently, neither the term nor the method itself is explained clearly. For instance, the term “obtainable lesion clicks” is used, yet no definition or example is given. Similarly, other terms used throughout the paper could be briefly defined where they are introduced. Furthermore, the reasons behind some statements could be explained in more detail. For instance, a short explanation of why the accuracy of the 3D nodule segmentation model is prone to decline significantly would help readers who are not experts in the field.

    Because the experiments are expensive, reproducibility may be limited. If only the “Click Adaptation” part is considered, reproducing it on other networks is much more likely to succeed. It would still be tricky, however, because a lot of data is involved, including private and public datasets. Additionally, the proposed method is complex, involving multiple components, such as a multi-scale neural network and test-time click adaptation, which could make it difficult for other researchers to replicate or build upon the work.

    There is an error in Part B of Figure 2, where four different parameters are shown with different colours, but the name of the fourth parameter is not shown even though the names of the other three are written in the graph.

    This paper only compared the proposed method with TransBTS and nnUNet architectures regarding segmentation performance without comprehensively comparing it to other state-of-the-art nodule and mass segmentation methods.

    Lastly, the paper needs to discuss comprehensively the limitations of the proposed method or potential errors or biases in the evaluation part.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Due to the expensive experiments, there may not be much reproducibility. However, if only the adaptation of the “click” part is considered, it could be possible. However, it would still be difficult because a lot of data, including private data, has been used. Additionally, the proposed method is complex, involving multiple components, such as a multi-scale neural network and test-time click adaptation, which could make it difficult for other researchers to replicate or build upon the work.

    Also, since the model code is not publicly available, this will significantly affect the reproducibility or reusability of the methods. It will be challenging to produce all those identical inferences since they tested the model with many different subsets of the same data.

    On the other hand, one of the datasets used is publicly available, which helps reproducibility. Also, since altering the architecture of the network is not required to implement this method, the model can be developed and adapted to other imaging tasks too. Similar techniques can be developed for use in different fields.

    Lastly, the proposed method contains multiple complicated components, including a multi-scale neural network and test-time click adaptation, which could make it difficult for other researchers to replicate or build upon the work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    This paper proposes a novel solution to the challenging problem of segmenting pulmonary nodules and masses of various sizes in medical images, which is crucial in diagnosing lung cancer. The proposed method, Scale-aware Test-time Click Adaptation (SaTTCA), uses easily obtainable lesion clicks as test-time cues to adapt the network parameters according to the scale-aware click during testing without requiring any changes to the network architecture. The proposed method overcomes the issue of poor segmentation performance caused by an imbalanced dataset with large-scale nodules and masses, allowing the network to achieve high recall and accuracy for large-scale lesions.

    The proposed method was extensively tested using two public and one private dataset, which have many CT scans with nodules and masses. Various evaluation metrics were used to assess the segmentation performance, including volume-based Dice Similarity Coefficient (DSC), surface-based Normalized Surface Dice, and recall rate. The outcomes of the SaTTCA method were compared with various CNN and Transformer-based segmentation methods, demonstrating that it is more effective, especially for large lesions.

    The paper provides comprehensive information about the implementation details, including a mapping function to generate masks adaptively based on the size of nodules and masses, which has a linear relationship with the side length. The masks degenerate into a voxel if the size of nodules or masses is small. In contrast, if the size is bigger, the axial and side lengths of bounding boxes follow a nonlinear quadratic relationship.

    The experimental results of the method were promising in improving the segmentation performance of pulmonary nodules and masses. However, some fundamental terms, such as “Click Adaptation,” were not clearly defined, making it difficult for readers not experts in the field to understand. The terms used during paragraphs to describe the whole story could have been briefly described and identified. Furthermore, some reasons for the conditions could be explained more. For instance, a short explanation of why the accuracy of the 3D nodule segmentation model is prone to decline significantly would be better for readers who are not experts in the field.

    Part B of Figure 2 displays four parameters in different colours. However, the fourth parameter is unnamed, even though the names of the other three are indicated in the figure, indicating an error.

    Additionally, some limitations and potential errors or biases of the proposed method were not thoroughly discussed, and a comprehensive comparison with other state-of-the-art methods needed to be provided.

    Although the proposed method’s expensive experiments may limit reproducibility, it can be combined or adapted with the other networks at the instance level, which makes it easy to apply to different medical image segmentation tasks. However, the complex nature of the proposed method, involving multiple components such as a multi-scale neural network and test-time click adaptation, may make it challenging for other researchers to replicate or build upon the work. Moreover, since the model code is not publicly available, the reproducibility or reusability of the method could be affected.

    Overall, the proposed method shows promise in improving the segmentation performance of pulmonary nodules and masses, and its success with large lesions can also be transferred to other fields. However, further research is needed to address the limitations and biases of the proposed method and make it more reproducible and accessible to other researchers.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method for improving pulmonary nodule and mass segmentation performance through click adaptation is effective and can be easily integrated with other networks. The method addresses the issue of poor segmentation performance due to imbalanced datasets and achieves high recall and accuracy for large-scale lesions. In my opinion, the paper provides extensive testing and evaluation metrics with well-designed visual aids.

    However, some terminology and explanations could be more precise, and reproducibility may be difficult due to the data’s complexity. The paper also has limitations, such as limited comparisons to other state-of-the-art methods and a need for a comprehensive discussion on limitations and potential errors in evaluation.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors did not respond to one of my critical remarks regarding Figure 2. They did conduct a comparative analysis of their model against several other models. The paper will be acceptable if the minor errors are addressed in the final version.



Review #2

  • Please describe the contribution of the paper

    (1) The manuscript presents a novel approach, Scale-Aware Test-Time Click Adaptation (SaTTCA), that leverages lesion clicks to adapt the parameters of a network’s normalization layer during testing. (2) The method expands the clicks into ellipse masks for test-time adaptation to enhance the segmentation accuracy of large-scale nodules and masses. (3) The experimental results, obtained on two publicly available datasets as well as an internal dataset, demonstrate that the proposed method surpasses the performance of existing approaches utilizing various backbones.
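The expansion of a click into an ellipsoid mask, described in point (2), can be sketched as follows. This is a hedged illustration rather than the authors' implementation. The function `ellipsoid_mask` and its parameters are hypothetical, the ellipsoid is assumed to be axis-aligned, and the radii are passed in directly because the paper's size-dependent mapping (its Eq. (1)) is not reproduced on this page. A radius of zero along every axis reduces the mask to the clicked voxel, mirroring the single-voxel case the reviews mention for small lesions.

```python
import math


def ellipsoid_mask(shape, center, radii):
    """Return the set of (z, y, x) voxels inside an axis-aligned ellipsoid.

    shape  : (D, H, W) of the ROI (assumed layout)
    center : clicked voxel (z, y, x)
    radii  : per-axis radii in voxels; how they depend on lesion size
             is NOT taken from the paper and must be supplied by the caller
    """
    def term(v, c, r):
        # A zero radius only admits the centre coordinate on that axis.
        if r == 0:
            return 0.0 if v == c else math.inf
        return ((v - c) / r) ** 2

    # Restrict the search to the bounding box, clipped to the ROI.
    ranges = [
        range(max(0, c - int(r)), min(size, c + int(r) + 1))
        for size, c, r in zip(shape, center, radii)
    ]
    cz, cy, cx = center
    rz, ry, rx = radii
    voxels = set()
    for z in ranges[0]:
        for y in ranges[1]:
            for x in ranges[2]:
                if term(z, cz, rz) + term(y, cy, ry) + term(x, cx, rx) <= 1.0:
                    voxels.add((z, y, x))
    return voxels


if __name__ == "__main__":
    print(len(ellipsoid_mask((5, 5, 5), (2, 2, 2), (1, 1, 1))))
```

Returning a set of coordinates keeps the sketch free of array libraries. A real pipeline would more likely build a boolean volume of the ROI shape.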

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This manuscript proposed a plug-and-play scale-aware test-time click adaptation method based on lesion clicks as test-time cues to improve segmentation performance, and its effect is also verified in extensive experiments.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The description of section 2.4 test-time optimization is somewhat unclear. “After adaptively adjusting voxel C_i to ellipsoid M_i” is not described and explained.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code and pre-trained model will be made available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    (1) The multi-scale encoder takes images with different ROIs as input, which may be helpful for classification, but the misaligned features may interfere with segmentation performance. It would be better to compare it to other multi-scale structures. (2) Section 2.4 on test-time optimization needs revision for better understanding.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite some unclear descriptions of the method, the proposed test-time click adaptation is novel and demonstrated through extensive experiments.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    I still have some concerns about the design of the proposed multi-scale encoder. It seems a little bit unreasonable to me. Concatenating images with different FOVs could make the tumor appear misaligned in the feature level, leading the model to extract inaccurate structure. Although the results have shown improvement, it would be great if the manuscript could include more explanation on this topic. Of course, I want to point out that this does not affect the contribution of SaTTCA, which I find to be an interesting paper.



Review #3

  • Please describe the contribution of the paper

    The authors propose a multiscale neural network to improve the segmentation performance on large lung nodules and masses. In more detail, they propose a multiscale encoder to combine feature maps from three different scales, which are then given to a decoder (either convolution-based or transformer-based) to output the segmentation. The segmentation is then refined using an ellipsoid mask (obtained from the lesion click) that masks the initial segmentation and guides the update of the network parameters at test time.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The adaptation of the normalization-layer parameters is performed at test time, thus not requiring labeled data. Moreover, the proposed approach does not restrict the class of supported backbone architectures.

    • Not including the click information in the training of the network but employing it directly at test time does not affect the adaptability performance at test time and can be beneficial when a high data imbalance is present.
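The idea in the two points above, updating only normalization parameters at test time from a click-derived cue, can be illustrated with a deliberately tiny toy. This is not the paper's network or loss. It assumes a one-dimensional list of features, a single scale `gamma` and shift `beta` standing in for a normalization layer's affine parameters, a sigmoid output, and a loss that only pushes voxels inside the click mask toward foreground. All names (`normalize`, `click_loss`, `adapt_affine`) are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def normalize(features, eps=1e-5):
    """Zero-mean, unit-variance normalisation of a flat feature list."""
    mean = sum(features) / len(features)
    var = sum((f - mean) ** 2 for f in features) / len(features)
    return [(f - mean) / math.sqrt(var + eps) for f in features]


def click_loss(normed, click_mask, gamma, beta):
    """Mean negative log-likelihood of 'foreground' over masked voxels."""
    idx = [i for i, m in enumerate(click_mask) if m]
    if not idx:
        raise ValueError("click mask is empty")
    return -sum(math.log(sigmoid(gamma * normed[i] + beta)) for i in idx) / len(idx)


def adapt_affine(normed, click_mask, gamma=1.0, beta=0.0, lr=0.1, steps=20):
    """Gradient descent on gamma and beta only; everything else stays fixed."""
    idx = [i for i, m in enumerate(click_mask) if m]
    if not idx:
        return gamma, beta
    for _ in range(steps):
        g_gamma = g_beta = 0.0
        for i in idx:
            p = sigmoid(gamma * normed[i] + beta)
            # d(-log p)/d(logit) = p - 1 for a foreground target.
            g_gamma += (p - 1.0) * normed[i]
            g_beta += (p - 1.0)
        gamma -= lr * g_gamma / len(idx)
        beta -= lr * g_beta / len(idx)
    return gamma, beta


if __name__ == "__main__":
    feats = normalize([0.1, 0.2, 0.9, 1.0, 0.15])
    mask = [0, 0, 1, 1, 0]
    print(click_loss(feats, mask, 1.0, 0.0))
    print(click_loss(feats, mask, *adapt_affine(feats, mask)))
```

Because only two scalars are updated, no ground-truth labels or architectural changes are needed, which is the property the reviewer highlights. In a real 3D network the same pattern would apply per channel in each normalization layer.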

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • From what I have understood, the authors deal directly with the ROI containing the mass to be segmented and not with the entire CT volume, at least for training. Conversely, at test time, the authors seem to imply that the pre-trained network segments the lesion from the whole CT volume, making the task more complex. This should be clarified, also to better understand the baseline results obtained. Is this solution considered a second step after a first mass localization step?

    • The proposed segmentation network seems to be trained pretty standardly, not introducing specific solutions to tackle the class imbalance or pushing the segmentation network to focus more on matching the contour voxels. It would be interesting to explore the ability of the proposed solution to refine the segmentation accuracy for large masses further when starting from a more accurate result.

    • In the proposed evaluation, a comparison with other work in the literature is missing, along with a detailed description of the configuration of [7] and [25]. I would also consider looking at the following: “Zhu, Ling, et al. “HR-MPF: high-resolution representation network with multiscale progressive fusion for pulmonary nodule segmentation and classification.” EURASIP Journal on Image and Video Processing 2021.1 (2021): 1-26.” “Agnes, Sundaresan A., and Jeevanayagam Anitha. “Efficient multiscale fully convolutional UNet model for segmentation of 3D lung nodule from CT image.” Journal of Medical Imaging 9.5 (2022): 052402-052402.”

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors state that they will release the code upon acceptance and two out of three datasets are open source so reproducibility should be ok, although some minor details are missing.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The paper is generally well-written, and the topic is interesting, even if details are missing. In Sec. 2.1, the authors define D, H, W as the dimensions of the original input CT volume, but then it is said that the center of the lesion is at D/2, H/2, W/2, which seems to imply that the center of the lesion always corresponds to the center of the CT volume; I would consider checking the notation. In Sec. 2.4, the last sentence is not completely clear. Is L_click given by the cross entropy and the Dice loss? The train/val/test split ratio is reported only for the in-house dataset. Have the volumes been isotropically resampled? From Fig. 3(b), it seems that the proposed solution works well on the in-house data and, as stated by the authors, less well on LNDb since it contains few masses, which is also reflected in the figure. What is not completely clear to me is that, after SaTTCA, the LIDC points also seem to be very close to their previous location in the “before” graph; I would double-check this, since it is a bit confusing given that Tab. 2 shows that recall improves after SaTTCA. All the tests are carried out using 3D network architectures for the segmentation task; it would also be interesting to explore the achievable improvements on a 2D segmentation network and then reconstruct the 3D segmentation. Minor fixes:

    • In Fig. 1(b), one label is missing (Mass), the Medium label is incorrectly divided into two lines, and the Micro label is misspelled.
    • In the references, 23 and 24 report the same work, just in two different citation formats; remove one.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed solution can be interesting, but a better evaluation would be beneficial. The proposed evaluation contains some confusing results, with the proposed solution having the best impact on the in-house dataset. A better comparison with the state of the art would make the work stronger. Indeed, the authors do not propose a comparison with state-of-the-art work and do not provide a clear description of the baselines.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper received three reviews with two accept and one reject. From the detailed comments, this is a paper with mixed opinions. So we would invite authors for rebuttal.




Author Feedback

All reviewers appreciate the novelty and versatility of the proposed SaTTC. In the following, we reply to the major concerns and suggestions.

  1. Comparison experiments (R1, R2, R3) Reply: We have compared with two recent and most related SOTA click-based methods [18, 19] in Tab. 1. As suggested, we conduct more comparison experiments on the datasets LIDC (D1), LNDb (D2), and In-House (D3) using the metrics DSC (M1), NSD (M2), and recall (M3). The results are organized as below:

     Method |D1M1|D1M2|D1M3|D2M1|D2M2|D2M3|D3M1|D3M2|D3M3|
     P-UNet |75.1|92.2|76.8|71.3|95.7|80.8|80.8|93.4|86.0|
     +SaTTC |76.0|93.4|78.0|73.4|97.3|81.3|81.4|94.6|87.9|
     UNet++ |74.4|91.9|76.6|68.8|93.5|73.9|77.1|91.9|78.0|
     +SaTTC |75.5|92.9|77.5|69.8|94.5|74.1|78.4|92.9|83.3|
     Deul-L |75.3|92.5|78.1|69.1|94.0|79.6|78.0|92.4|82.4|
     +SaTTC |76.0|93.4|79.1|69.9|94.9|80.9|79.5|93.7|85.8|

     Based on these experiments, SaTTC consistently improves all corresponding baseline methods, further proving its effectiveness and versatility.

  2. 2D experiment (R3) Reply: As suggested, we train a 2D UNet with Deul-L (UNetDL) on slices of the 3D volumes and conduct the same evaluation as for the 3D methods. Not surprisingly, the 2D results are worse than the 3D results. Yet, SaTTC still consistently performs better:

     Method |D1M1|D1M2|D1M3|D2M1|D2M2|D2M3|D3M1|D3M2|D3M3|
     UNetDL |72.8|86.2|74.2|58.9|82.4|76.8|74.6|88.2|77.7|
     +SaTTC |73.4|88.3|75.8|60.3|84.7|78.2|76.1|90.2|80.2|

  3. Better performance improvement on in-house dataset and reproducibility (R1, R3) Reply: As depicted in Tab. 1 and 2, SaTTC consistently outperforms the baseline methods on both public datasets, which contain few masses (diameter > 30mm), whose segmentation is not well studied in previous works. We aim to accurately segment all pulmonary nodules and masses. The in-house dataset contains more masses, which still suffer from the imbalance issue. SaTTC effectively alleviates this imbalance issue (appreciated by R1 and R3), yielding a larger performance improvement. We will release the source code for reproducible research.

  4. Unclear description (R1, R2, R3) Reply:
    • Input (R3): The network input is the localized ROI with shape (64, 96, 96).
    • Details in Sec. 2.4 (R2, R3): The adjustment of voxel C_i to ellipsoid M_i is described in Sec. 2.3 and Eq. (1). Sec. 2.4 is dedicated to describing the loss function of SaTTC. L_Click is given by the sum of the first two terms in Eq. (3).
    • Data and implementation details (R3): The train/val/test sets are divided into 7/1/2 on all the three datasets. Each ROI input is resampled to isotropic spacing of 1 mm.
    • Limitation (R1): SaTTC slightly increases inference time. For lesions with highly irregular shapes, some voxels in M_i may not belong to the foreground lesion, leading to some inappropriate adaptation.
    • Inaccurate nodule segmentation (R1): Compared to nodule detection in terms of 3D box, voxel-wise 3D nodule segmentation is much more challenging.
  5. LIDC performance in Fig.3(b) (R3) Reply: As depicted in Tab. 2, SaTTC achieves consistent improvement on all datasets. Though LIDC has 105 masses, only 23 masses (mostly within 30-35mm) are in its test set. Therefore, despite the improvement, it is not very significant in Fig. 3(b).

  6. Minor issues (R3) Reply: We will correct them in the final version.
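The rebuttal states that the network input is a localized ROI of shape (64, 96, 96). As a hedged illustration of what extracting such an ROI around a lesion location might involve, the sketch below computes crop bounds centered on a given voxel and shifts them to stay inside the volume. The function `roi_bounds`, the (z, y, x) axis order, and the shift-to-fit border handling are assumptions; the authors' actual localization and padding strategy is not described on this page.

```python
def roi_bounds(center, volume_shape, roi_shape=(64, 96, 96)):
    """Return [(start, stop), ...] per axis for an ROI around `center`.

    Hypothetical helper: the window is centred on the voxel where possible
    and shifted inward when it would cross a volume border.
    """
    bounds = []
    for c, size, roi in zip(center, volume_shape, roi_shape):
        if roi > size:
            raise ValueError("ROI is larger than the volume along one axis")
        start = c - roi // 2
        # Clamp so that the whole window lies inside [0, size).
        start = max(0, min(start, size - roi))
        bounds.append((start, start + roi))
    return bounds


if __name__ == "__main__":
    print(roi_bounds((100, 256, 256), (300, 512, 512)))
    print(roi_bounds((10, 20, 500), (300, 512, 512)))
```

Shifting the window rather than padding keeps every ROI at the fixed input size without inventing voxel values, at the cost of the lesion no longer being exactly centered near the borders.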




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    First of all, I am not so sure that this problem needs to be solved. With sufficient training, deep learning can solve this type of lesion/tumor segmentation fairly straightforwardly. No need for additional complexity … second, from Table 1, the numerical results or segmentation improvement over the standard nnUNet is very small, probably without any difference in clinical indications.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The work presents a test-time, single-click-based adaptation to perform pulmonary nodule segmentation interactively. It is based on a pre-trained multi-scale neural network and performs the adaptation without modifying the network weights. An nnUNet and a transformer-based backbone are used to study the approach on various datasets, including comparisons with other click-based adaptation strategies. From my point of view, the presented approach is innovative and interesting, and seems to make a difference in nodule segmentation. A drawback is the lack of comparison to dedicated fully automated supervised nodule segmentation approaches; however, the presented idea is still interesting to the MICCAI community in my opinion. The rebuttal has clarified a number of issues that reviewers commented on, such as the comparison to other adaptation methods or inconsistencies in the paper. Given that these issues will be revised, I tend to vote for accepting the paper.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes to extend an automatic lung nodule segmentation network with user clicks and multi-scale feature input. The paper is mostly well written and easy to understand. The proposed method, though seemingly trivial, shows reasonable performance when compared to the baseline methods. It would be helpful if the authors could show a time comparison in the final version. The test-time adaptation requires extra computation, which may slow down the whole process.


