
Authors

Yuchen Wang, Zhongyu Li, Xiangxiang Cui, Liangliang Zhang, Xiang Luo, Meng Yang, Shi Chang

Abstract

Ultrasound examination is widely used in the clinical diagnosis of thyroid nodules (benign/malignant). However, the accuracy relies heavily on radiologist experience. Although deep learning techniques have been investigated for thyroid nodule recognition, current solutions are mainly based on static ultrasound images, with limited temporal information used, which is inconsistent with clinical diagnosis. This paper proposes a novel method for the automated recognition of thyroid nodules through an exhaustive exploration of ultrasound videos and key-frames. We first propose a detection-localization framework to automatically identify the clinical key-frame with a typical nodule in each ultrasound video. Based on the localized key-frame, we develop a key-frame guided video classification model for thyroid nodule recognition. Besides, we introduce a motion attention module to help the network focus on significant frames in an ultrasound video, which is consistent with clinical diagnosis. The proposed thyroid nodule recognition framework is validated on clinically collected ultrasound videos, demonstrating superior performance compared with other state-of-the-art methods.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_23

SharedIt: https://rdcu.be/cVRvO

Link to the code repository

https://github.com/NeuronXJTU/KFGNet

Link to the dataset(s)

https://github.com/NeuronXJTU/KFGNet


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper investigates a key-frame guided video classification model for thyroid nodule recognition and diagnosis. The overall framework contains two parts. The first part is for key-frame localization: a detection-localization network (based on Faster-RCNN) is trained to localize the frames with clinically typical thyroid nodules in dynamic ultrasound videos. The second part is an ultrasound video classification network (based on a lightweight 3D ConvNet) for thyroid nodule classification/diagnosis. By making use of the adjacent N (N=32) frames of an ultrasound video, the video classification network can take advantage of temporal information for a more precise classification. The authors have collected over 3000 clinical thyroid ultrasound videos labelled by three radiologists for the experiments.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. The idea of making use of ultrasound video clips to guide the CAD classification task. A video clip is the successive 32 frames centered on a key-frame that has been detected to contain a clinically typical nodule.

    2. There are ablation experiments to evaluate the effectiveness of each network module: the usage of key-frames, the motion attention, and the 3D SPP.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The manuscript gives a brief review of thyroid nodule classification methods in the Introduction section. There have been publications on ultrasound video-guided CAD methods for detecting other organs/tissues. It is therefore suggested to also discuss related video-guided CAD works such as [U-LanD]: M. H. Jafari et al., “U-LanD: Uncertainty-Driven Video Landmark Detection,” IEEE Transactions on Medical Imaging, vol. 41, no. 4, pp. 793-804, April 2022, doi: 10.1109/TMI.2021.3123547.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The deep network structures used have been described in detail, including the number of layers, the kernel size of each layer, etc. The experiments are conducted on a self-collected ultrasound dataset with 3000 videos. The reproducibility checklist states that the model and code will be released if this paper is accepted.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    1. The number of frames to be taken around a key-frame has been empirically selected as 32. It is recommended to give some explanation of this choice, in terms of computational cost, performance, etc. Besides, I am curious what happens if the frame indices of two detected key-frames differ by no more than 32: would some action be needed to handle such cases?

    2. As illustrated in Figure 1, the frame index of a detected nodule is hard-encoded as a feature component of the nodule information. Is this kind of hard-encoding suitable? Would the variation of ultrasound video lengths affect this feature component?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well-written, easy to follow, and has practical value. The ablation experiments are solid.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    The manuscript presents an automated localization approach for key-frame identification in thyroid US videos, combined with a motion attention module in order to focus more on the significant frames.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. A novel framework for thyroid nodule recognition using US video.
    2. The proposed method is addressing an important clinical problem.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. No information regarding the computation time is provided.
    2. There is no discussion on the feasibility of implementing the proposed approach for real-time clinical application.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    A clear explanation of the implementation steps is provided. A private dataset is used for the evaluation and no link is provided to access the dataset. Moreover, no information about the data collection condition (equipment, …) is provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. Section 2, Key-frame localization: As Faster-RCNN is not introduced before, an explanation of the main characteristics of this network and why you selected it as your detection model is missing from the manuscript. Also, it is not clear whether you adapted it to your application or used it as originally proposed. Please clarify this point.
    2. Please explain what IOU similarity is.
    3. Section 3, Dataset: Please clarify how the dataset’s size has changed after data augmentation.
    4. Section 3, Experimental results, Table 1: It is not clear whether the listed state-of-the-art methods also used the same dataset as you or not. Please clarify this.
    5. Section 4, Conclusion: As there is no information regarding the computation time, it is difficult to evaluate the performance of the proposed approach in the context of real-time application in clinical settings. I highly recommend adding a discussion regarding this point and also focusing more on the clinical significance of the proposed approach.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The manuscript proposes a novel framework for thyroid nodule detection in US videos that can act as a CAD tool in clinical settings. However, more evaluation and adjustment is necessary.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    The paper first proposes a detection-localization framework to automatically identify the clinical key-frames with typical nodules in each ultrasound video. Based on the localized key-frames, the authors develop a key-frame guided video classification model for thyroid nodule recognition. Besides, the authors introduce a motion attention module to help the network focus on significant frames in an ultrasound video, which is consistent with clinical diagnosis.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper, motivated by the actual clinical situation, extends deep learning based thyroid nodule recognition methods to common B-scan ultrasound videos. The proposed method with the motion attention mechanism achieves higher classification performance on a self-collected dataset compared to baseline methods, verifying the effectiveness of the method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The Motion Attention module only has incremental innovation as compared to the similar temporal attention module in [2], using motion speed to replace video brightness as the radiologists’ attention indicator.
    2. Because the baseline methods may perform worse on the self-collected dataset, additional experiments should be conducted to compare the baseline methods with the proposed method on other datasets, e.g., the datasets reported in the baseline method papers.
    3. The authors should provide a complete description of the data collection process, such as descriptions of the experimental setup, device(s) used, image acquisition parameters, subjects/objects involved, instructions to annotators, and methods for quality control.
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors should provide a complete description of the data collection process, such as descriptions of the experimental setup, device(s) used, image acquisition parameters, subjects/objects involved, instructions to annotators, and methods for quality control.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. The Motion Attention module only has incremental innovation as compared to the similar temporal attention module in [2], using motion speed to replace video brightness as the radiologists’ attention indicator.
    2. Because the baseline methods may perform worse on the self-collected dataset, additional experiments should be conducted to compare the baseline methods with the proposed method on other datasets, e.g., the datasets reported in the baseline method papers.
    3. The authors should provide a complete description of the data collection process, such as descriptions of the experimental setup, device(s) used, image acquisition parameters, subjects/objects involved, instructions to annotators, and methods for quality control.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents limited innovation as compared with paper [2], by using a similar attention module and changing the imaging method from CEUS to B-scan US.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    4

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper received mixed reviews and is recommended for rebuttal. The two positive reviews are quite supportive of accepting the paper, e.g., “The manuscript proposes a novel framework for thyroid nodule detection in US videos that can act as a CAD tool in clinical settings. However, more evaluation and adjustment is necessary.”

    In the rebuttal, please address:

    1. “The Motion Attention module only has incremental innovation as compared to the similar temporal attention module in [2], using motion speed to replace video brightness as the radiologists’ attention indicator.
    2. Because the baseline methods may perform worse on the self-collected dataset, additional experiments should be conducted to compare the baseline methods with the proposed method on other datasets, e.g., the datasets reported in the baseline method papers.
    3. The authors should provide a complete description of the data collection process, such as descriptions of the experimental setup, device(s) used, image acquisition parameters, subjects/objects involved, instructions to annotators, and methods for quality control.”
  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    11




Author Feedback

We thank the reviewers for their constructive comments. They highlighted that our method is “novel” and “addressing an important clinical problem” (R#2), that the experiments are “verifying the effectiveness of the method” (R#3), and described our paper as “well-written, easy to follow, and has practical value” (R#1). Here we address the main points in their reviews.

Novelty against [2] (R#3 Q5-1) 1) Inspired by the clinical US examination, we measure frame similarity to quantify motion speed, which is straightforward and consistent with radiologists’ attention. In comparison, [2] uses frame brightness to quantify attention on CEUS data, which may not be reliable with low-quality US data; even for correct cases, only “the temporal attention of 40.4% cases matches with radiologists’ experience”, as reported in [2]. 2) Both our motion attention and the TAM in [2] derive from Zheng et al. (CVPR’18). We extend this module to B-scan US with motion speed, while [2] extends it to CEUS with brightness. 3) Our main contribution is that this is the first work on automated key-frame localization, i.e., from nodule detection and ROI representation to key-frame localization. A key-frame guided network is then developed for thyroid nodule recognition. Notably, [2] also employed key-frames as extra information for video classification, but their key-frames were localized by experts. Moreover, our work has great practical value for the CAD of thyroid nodules.
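The frame-similarity-based motion attention described above can be sketched as follows. This is an illustrative assumption, not the paper's exact formulation: the similarity measure (one minus mean absolute difference) and the softmax weighting are stand-ins for whatever the authors actually use.

```python
import numpy as np

def motion_attention(frames):
    """Sketch: per-frame attention weights from inter-frame similarity.

    frames: (T, H, W) grayscale US clip with values in [0, 1].
    High similarity between consecutive frames means slow probe motion,
    where radiologists tend to dwell on a suspicious nodule.
    """
    T = frames.shape[0]
    sim = np.empty(T)
    sim[0] = 1.0  # no predecessor; treat the first frame as fully similar
    for t in range(1, T):
        sim[t] = 1.0 - np.abs(frames[t] - frames[t - 1]).mean()
    # Softmax mapping from similarity to attention weights that sum to 1.
    w = np.exp(sim - sim.max())
    return w / w.sum()

# Toy usage: a clip with zero motion between frames 4 and 5
# gets maximal attention on frame 5.
rng = np.random.default_rng(0)
clip = rng.random((10, 8, 8))
clip[5] = clip[4]
att = motion_attention(clip)
```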

Comparison on baseline datasets (R#3 Q5-2 & R#2 Q8-4) The classification baseline C3D was originally developed for natural images/videos (action recognition, etc.). Those datasets are quite different from US videos, which may make them unsuitable for comparison in MICCAI. We also checked the CEUS dataset used in [2], but the authors did not release their data. We will release our B-scan US videos once the paper is accepted. This will be the first public video-based dataset for thyroid nodule detection and classification.

Data collection process (R#3 Q5-3 & R#2 Q7) We followed a strict data collection procedure. We collected over 4000 B-scan US videos from 2020/04 to 2021/12 at 3 hospital health examination centers, followed by data cleaning to filter out videos of bad quality. We cropped the videos to remove patient and device information. The devices include SAMSUNG MEDISON H60, HS50, and X60. All three devices have linear-array probes with a frequency of 7.5 MHz. We only use videos captured in the cross-sectional direction of the left or right side of the thyroid. All videos were annotated by 2 radiologists with over 10 years’ experience, and all annotations were checked by a 3rd radiologist with over 20 years’ experience. Difficult/disagreement samples were decided by the 3 experts together. The video data will be released once the paper is accepted.

Other video-based work (R#1 Q5) The suggested “U-LanD” work is helpful and also supports our key-frame idea. We will discuss it in the revised version.

Choice of frame number (R#1 Q8) Our US videos are 150-300 frames long. We set T=32 as a trade-off between computational time and capturing nodule morphological changes across multiple frames. The frame index is normalized to the range 0 to 1 to indicate the relative position of the current frame, so that different video lengths can also be handled.
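A minimal sketch of this clip extraction and index normalization; the boundary handling (clamping the window to the video bounds) is an assumption, not a detail given in the rebuttal.

```python
def clip_around_keyframe(num_frames, key_idx, T=32):
    """Sketch: pick T frame indices centered on the key-frame, clamped to
    the video bounds, plus the key-frame's normalized position feature.

    num_frames: total frames in the video (150-300 in the paper's data).
    key_idx:    index of the localized key-frame.
    """
    # Clamp the window so all T indices stay inside [0, num_frames).
    start = max(0, min(key_idx - T // 2, num_frames - T))
    indices = list(range(start, start + T))
    # Frame index unified to [0, 1], so different video lengths are handled.
    rel_pos = key_idx / (num_frames - 1)
    return indices, rel_pos

# Usage: a key-frame near the start of a 200-frame video.
indices, rel = clip_around_keyframe(200, 10, T=32)
```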

Computation time (R#2 Q5 & R#2 Q8-5) For a 5-10 s video, our current computation time is around 15-30 s on a 2080Ti GPU. The nodule detection in each frame is the most time-consuming step. However, the detection time can be greatly reduced by using a lightweight detection model. Besides, our classification stage is fast (0.1 s per video). Our framework is feasible for real-time CAD.

Answer to R#2 Q8-1 & Q8-2 & Q8-3

  1. We tested several detection backbones, among which Faster-RCNN achieves the best performance (AP50: 77.34%).
  2. IOU similarity computes the intersection over union between the detected nodules in the current frame and those in the key-frame.
  3. Online data augmentation is used in our work, which performs transformations on the data during each training epoch, so the stored dataset size does not change.
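The IOU similarity in point 2 is a standard box-IoU computation, which can be sketched as follows; the (x1, y1, x2, y2) box format is an assumption for illustration.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle (empty if boxes are disjoint).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# e.g., comparing a nodule box in the current frame with the key-frame's box:
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))  # 1 / 7
```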




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    From the rebuttal, I agree with the following assessment:

    “3) Our main contribution is that this is the first work on automated key-frame localization, i.e., from nodule detection and ROI representation to key-frame localization. A key-frame guided network is then developed for thyroid nodule recognition. Notably, [2] also employed key-frames as extra information for video classification, but their key-frames were localized by experts. Moreover, our work has great practical value for the CAD of thyroid nodules.”

    The data collection protocol is sufficient (although different probes may be more desirable, overall it is good enough), and the experimental results are convincing.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I think the novelty of this paper is on par with the typical MICCAI paper that takes an algorithm and adds a novel step or modification to work with medical images. Very few MICCAI papers propose entirely original methods. Assuming that the authors add some details about data collection, I vote to accept.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This manuscript provides methods for keyframe localization and nodule recognition, and the authors provide full responses to reviewer comments. The work done has clinical significance, and the research content meets the requirements of the conference.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1


