Authors
Qingbo Kang, Jun Gao, Kang Li, Qicheng Lao
Abstract
Masked autoencoder (MAE) has attracted unprecedented attention and achieved remarkable performance in many vision tasks. It reconstructs randomly masked image patches (known as the proxy task) during pretraining and learns meaningful semantic representations that can be transferred to downstream tasks. However, MAE has not been thoroughly explored in ultrasound imaging. In this work, we investigate the potential of MAE for ultrasound image recognition. Motivated by the high noise-to-signal ratio characteristic of ultrasound imaging, we propose a novel deblurring MAE approach that incorporates deblurring into the proxy task during pretraining. The addition of deblurring helps pretraining better recover the subtle details present in ultrasound images, thus improving performance on the downstream classification task. Our experimental results demonstrate the effectiveness of our deblurring MAE, which achieves state-of-the-art performance in ultrasound image classification. Overall, our work highlights the potential of MAE for ultrasound image recognition and presents a novel approach that incorporates deblurring to further improve its effectiveness.
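The MAE masking step mentioned above (clarified later in the author feedback: with a 75% ratio, only the unmasked patches are fed to the ViT encoder) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions (a single-channel 224×224 frame and 16×16 patches); `patchify` and `random_masking` are hypothetical names, not the authors' code, which is not linked on this page.

```python
import numpy as np


def patchify(img: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W) image into non-overlapping patch x patch tiles.

    Returns an (N, patch * patch) array, with patches in row-major order.
    """
    h, w = img.shape
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    tiles = img.reshape(h // patch, patch, w // patch, patch)
    return tiles.transpose(0, 2, 1, 3).reshape(-1, patch * patch)


def random_masking(patches: np.ndarray, mask_ratio: float, rng: np.random.Generator):
    """Hide a random subset of patches, MAE style (illustrative sketch).

    Returns the visible patches (the only encoder input), their indices, and a
    boolean mask over all patches (True = masked, i.e. where the
    reconstruction loss would be computed).
    """
    n = patches.shape[0]
    n_keep = int(n * (1.0 - mask_ratio))
    keep_idx = np.sort(rng.permutation(n)[:n_keep])
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False
    return patches[keep_idx], keep_idx, mask


# Hypothetical usage: one 224 x 224 grayscale frame, 16 x 16 patches, 75% masking.
rng = np.random.default_rng(0)
frame = rng.random((224, 224))
visible, keep_idx, mask = random_masking(patchify(frame, 16), 0.75, rng)
print(visible.shape, int(mask.sum()))  # (49, 256) 147
```

With 16×16 patches a 224×224 frame yields 196 patches, so a 0.75 ratio leaves 49 visible patches for the encoder, and the decoder would predict the remaining 147.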
Link to paper
DOI: https://doi.org/10.1007/978-3-031-43907-0_34
SharedIt: https://rdcu.be/dnwcL
Link to the code repository
N/A
Link to the dataset(s)
N/A
Reviews
Review #1
- Please describe the contribution of the paper
The authors used masked autoencoders (MAE) to deblur ultrasound images and tested the effectiveness of the deblurring process on a thyroid nodule classification task. They used an asymmetric encoder-decoder, with a ViT as the encoder and a transformer as the decoder. They compared various deblurring techniques with vanilla MAE.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The most interesting result from this research is that in-domain self-supervised pre-training yields better results than pre-training on natural images.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
The number of pre-training epochs is quite large. This might not be appropriate for applications requiring dynamic training and translation.
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The authors have agreed to release their code on GitHub. Any publicly available data can be tested using their code. A detailed summary of the hyperparameter settings and the ablation study is provided in the supplementary material, which ensures the reproducibility of the paper.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
It is very interesting to see deblurring used in MAE. Although the concept is tested on a classification task, it would be beneficial to look for an immediate clinical application. I recommend testing the algorithm on US images of varying quality obtained from different machines, without additional blurring.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
7
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper is well organized and written. The appropriate references have been cited. A large dataset has been used to evaluate this work. The utility of SSL pre-training has been demonstrated in thyroid nodule classification. The use of deblurring-based pre-training is well justified. The ablation studies on the encoders and transfer learning approaches are presented in detail.
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
7
- [Post rebuttal] Please justify your decision
The authors have provided clarification for comments. I recommend this paper for acceptance.
Review #2
- Please describe the contribution of the paper
In this paper, a novel method based on the deblurring operation, named deblurring MAE, is proposed. The method integrates the low signal-to-noise ratio characteristic of ultrasound images into the autoencoder pre-training framework through a deblurring proxy task, which helps the model better learn ultrasound image features. Experimental results demonstrate the superiority of the proposed method over MAE in classification tasks.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Introducing deblurring operations into a self-supervised learning framework is a novel approach that fully considers the low signal-to-noise ratio characteristic of ultrasound images.
Multiple comparative and ablation studies have demonstrated the effectiveness of the proposed method.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
The description of the dataset is not clear. The data section only provides the total amount of data, and the data splitting method for the classification task is not specified.
Performance validation on public datasets is missing. The difference between Denoising MAE, MAE, and the proposed method in this paper is large, and it needs to be verified whether the same conclusion can be drawn on other publicly available ultrasound datasets.
In the Results section, it is mentioned that “In addition, we still add the denoising MAE for comparison, although it has proved to be ineffective for ultrasound images.” but there is no detailed explanation or reference to support this claim.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
Probably. The method description is clear. It seems the code will be made public in the future. The experimental dataset used in the study is private.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
I suggest adding a detailed description of the data and conducting experiments on public datasets to check that the conclusions hold.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
5
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The proposed method is relatively simple, but it fully exploits the low signal-to-noise ratio characteristic of ultrasound images. Experimental results demonstrate the effectiveness of the proposed method. However, among the compared methods, Denoising MAE performs significantly worse than both MAE and the proposed method, and the conclusion would be more convincing if supported by experiments on publicly available ultrasound classification datasets.
- Reviewer confidence
Confident but not absolutely certain
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Review #3
- Please describe the contribution of the paper
This paper extends and improves the previously proposed masked autoencoder (MAE) for ultrasound images, applied to thyroid nodule classification. MAE is a self-supervised learning method initially proposed for natural images. MAE pretrains the backbone network by reconstructing masked images. The authors improve upon MAE by including an extra deblurring step in the reconstruction process. This is based on the observation that ultrasound images have a low signal-to-noise ratio, and the deblurring step helps the model identify finer details more accurately.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The deblurring idea is novel, simple, and based on the good intuition that ultrasound images generally have a low signal-to-noise ratio.
The comparison is made with many different methods.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
While I liked the idea, the contribution and novelty are marginal. The title doesn't match what is delivered: given the title, I expected to see results on more ultrasound datasets benefiting from deblurring.
There is no good literature review of related methods such as MAE and denoising MAE. The MAE literature could be reviewed better and in more detail. What are the benefits of MAE over the recent SimCLR method? Why use MAE in the first place?
It is unclear to me what the difference is between denoising MAE and deblurring MAE. It is essential to review the most closely related works.
The authors mention that they also apply blurring during downstream supervised training to prevent distribution shift. However, blurring removes subtle information from ultrasound images. Wouldn't it be better to train the networks in a supervised manner (linear probe or fine-tuning) without blurring? What would the results be in that case?
What does a 75% masking ratio mean? Does it mean 75% of the image is masked? What value is used in the masked regions?
The authors did not report standard deviations for their results, and most of the results differ only marginally. I suggest using cross-validation, at least for MAE, denoising MAE, and deblurring MAE.
Please add some of the details from the text to the figure captions. It is not obvious whether Figure 1 shows linear-probe or fine-tuning results. Figure 2 is confusing as well; I suggest dividing it into two, one for blur methods and one for blur intensity.
Considering that ACC and F1 are threshold-dependent metrics, have the authors looked at AUROC?
Dataset preparation, preprocessing steps, the labeling process, and the patch extraction process all need detailed descriptions, which are missing. Are all the pretraining images unlabeled, or did the authors just not use the labels?
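Reviewer 3's point about threshold dependence can be made concrete. ACC and F1 are computed after thresholding the model's malignancy score, so they move with the threshold, whereas AUROC measures how well the scores rank malignant above benign cases across all thresholds. A minimal NumPy sketch follows; the function names are hypothetical and the toy numbers in the note below are made up for illustration, not results from the paper.

```python
import numpy as np


def accuracy_f1(y_true, scores, threshold=0.5):
    """Accuracy and F1 (positive = malignant) after thresholding the scores."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(scores) >= threshold
    tp = np.sum(y_pred & y_true)
    fp = np.sum(y_pred & ~y_true)
    fn = np.sum(~y_pred & y_true)
    acc = np.mean(y_pred == y_true)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom > 0 else 0.0
    return float(acc), float(f1)


def auroc(y_true, scores):
    """Threshold-free AUROC: the probability that a random positive scores
    higher than a random negative (ties count 1/2), via pairwise comparison."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))
```

With labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, F1 is 2/3 at a 0.5 threshold but 0.8 at 0.3, while AUROC stays 0.75 because it does not depend on the threshold. The pairwise form is O(n_pos · n_neg) and is meant for clarity; a rank-based computation scales better on larger test sets.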
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
Code and dataset are not available, so the paper is not reproducible.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
Please try to address the weaknesses as extensively as you can. In addition, there are many typos that I didn't mention.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
3
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The number of weaknesses is quite large, and this is the major factor in my decision.
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Primary Meta-Review
- Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
The paper received mixed reviews, with two accept recommendations (one strong, one weak) and one reject recommendation. The idea of improving ultrasound image recognition via deblurring is generally recognized by the reviewers, and the paper provides a number of evaluations with several datasets. The area chairs considered the paper and the reviewers' comments, and agreed on the following concerns: (1) the key differences/contributions w.r.t. closely related work should be clarified, i.e., why MAE+deblurring for pretraining makes a significant difference compared with existing SSL methods; (2) details about the dataset, the processing pipeline, the protocol, etc. should be clarified; (3) comparisons with closely related methods should be given, and all conclusions should be supported by experiments or references; and (4) besides the Gaussian blurring used in Section 3.3, how do other blurring methods perform: similarly or much worse? Based on the reviewers' comments, the authors are invited to provide a rebuttal addressing these questions.
Author Feedback
We thank all reviewers for the constructive comments. Our response is as follows:
- Why MAE+deblurring vs. other SSL methods (MR, R2, R3): MAE pretraining, a generative SSL approach, outperforms other SSL methods such as SimCLR and MoCo v3 in the non-medical domain. In our work, we also find that vanilla MAE achieves the best downstream ACC (0.8725), compared with SimCLR (0.8621) and MoCo v3 (0.8696). We therefore use MAE to validate our deblurring idea, given the inherently noisy nature of ultrasound images. The proposed deblurring MAE increases the difficulty of the proxy task and forces the model to recover subtle details that benefit downstream tasks. Note that our method can also be combined with other MAE variants such as ConvMAE and CAE. We will clarify this further in the revision.
- Comparisons with closely related methods, with references (MR, R3): Our work is closely related to [1], which studies denoising MAE for natural images. Our deblurring MAE, however, takes the opposite direction: denoising adds noise to a clean image and learns to remove it, whereas deblurring blurs the noisy ultrasound image and learns to sharpen it. In our experiments, denoising MAE is worse than vanilla MAE (0.7799 vs. 0.8754 in F1), which also motivates our deblurring MAE, which achieves 0.8848. In addition, [2] shows that MAE outperforms zoom-in, zoom-out, distortion, and decolorization operations. [1] Wu et al. Denoising Masked Autoencoders Help Robust Classification. ICLR 2023. [2] Tian et al. Beyond Masking: Demystifying Token-Based Pre-Training for Vision Transformers. arXiv 2022.
- Dataset details, process, and protocol (MR, R2, R3): The pre-training dataset has no labels. For fine-tuning, we use 4493 images (2576 benign and 1917 malignant), split into train/val/test sets with a 3:1:1 ratio. The labels were obtained from fine-needle aspiration or from senior radiologists. All images are fed directly into the model without patch extraction. We will include more details in the revision and release the code upon acceptance.
- Comparisons with other blurring methods (MR): Table 2 compares 6 blurring methods. We find that Gaussian blurring performs best and that most blurring methods are beneficial for pretraining; however, motion and defocus blurring are not preferable based on our experiments.
- Supporting experiments on public datasets (MR, R2): Following your suggestion, we conducted experiments on the TN-SCUI2020 dataset (note that this dataset is restricted to challenge use only). The experiments use the training set with a 3:1:1 split. Our deblurring MAE achieves an F1 of 0.8201, better than MAE (0.8093) and denoising MAE (0.7497). These results are consistent with the conclusions drawn from our private dataset.
- Pretraining epochs (R1): The number of pretraining epochs was chosen via transfer experiments, in which 12000 epochs gave the most stable and best performance. However, fewer epochs (~6k) still give substantially better performance than the baseline.
- Immediate clinical application and varying quality of US images (R1): Thanks for the valuable advice. We are extending our approach to other clinical applications (such as Hashimoto's disease) and to cross-domain scenarios (across machines and hospitals).
- Why denoising MAE is ineffective for ultrasound images (R2): This claim is based on our experiments (Table 1), which show that denoising MAE scores much lower than vanilla MAE.
- No blurring for downstream training (R3): Without blurring during downstream training, F1 decreases from 0.8848 to 0.8667. This is because our encoder only sees blurred images during pretraining.
- Mask ratio and value (R3): A 75% masking ratio means that 75% of the image patches are masked, and only the unmasked patches are fed into the ViT encoder (the masked regions are not used).
- Results of STDs and AUROC (R3): Our method is quite stable and robust. We will include STDs and AUROC.
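The feedback frames denoising and deblurring MAE as opposite directions, and the difference is easiest to see in how each proxy task builds its (input, reconstruction target) pair. The NumPy sketch below is illustrative only. It assumes single-channel images, additive Gaussian noise for the denoising variant, Gaussian blur for the deblurring variant (the best-performing blur in Table 2 according to the feedback), and that the deblurring target is the original, unblurred image. All function names and sigma/noise values are hypothetical, and the random patch masking that both variants apply afterwards is omitted.

```python
import numpy as np


def gaussian_kernel1d(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of a 2-D image with reflect padding."""
    k = gaussian_kernel1d(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="reflect")

    def conv(v: np.ndarray) -> np.ndarray:
        # 'valid' mode trims the padding back off, restoring the original size.
        return np.convolve(v, k, mode="valid")

    rows = np.apply_along_axis(conv, 1, padded)  # filter along each row
    return np.apply_along_axis(conv, 0, rows)    # then along each column


def denoising_pair(clean: np.ndarray, noise_std: float, rng: np.random.Generator):
    """Denoising proxy task: add Gaussian noise to a clean image; target = clean image."""
    return clean + rng.normal(0.0, noise_std, size=clean.shape), clean


def deblurring_pair(us_image: np.ndarray, sigma: float):
    """Deblurring proxy task: blur the (already noisy) ultrasound image;
    target = the original, sharper ultrasound image (assumed)."""
    return gaussian_blur(us_image, sigma), us_image


# Hypothetical usage on a random stand-in for a noisy ultrasound frame.
rng = np.random.default_rng(0)
frame = rng.random((64, 64))
blurred_input, target = deblurring_pair(frame, sigma=1.0)
print(blurred_input.std() < target.std())  # True: the blur smooths the random texture
```

Per the authors' response on downstream training, the same blur would also be applied to fine-tuning inputs, since the encoder only sees blurred images during pretraining.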
Post-rebuttal Meta-Reviews
Meta-review #1 (Primary)
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
This paper studied the ultrasound image classification problem, in which ultrasound images typically have low quality. The paper designed a deblurring MAE approach to perform feature pretraining and then applied it to ultrasound image classification. The method achieved promising results in comparisons with both supervised and SSL methods. The rebuttal provides additional details about the difference w.r.t. previous methods, and experimental details, addressing a number of the concerns of the reviewers.
Meta-review #2
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
The authors have effectively addressed concerns raised regarding the contribution of their work, dataset details, and comparison with other blurring methods. The concept of enhancing ultrasound image recognition through deblurring is well-received by the reviewers, and the paper includes comprehensive evaluations using multiple datasets. Based on these factors, I recommend accepting this paper.
Meta-review #3
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
The authors diligently addressed the reviewers’ concerns.