Authors

Bin Huang, Ziyue Xu, Shing-Chow Chan, Zhong Liu, Huiying Wen, Chao Hou, Qicai Huang, Meiqin Jiang, Changfeng Dong, Jie Zeng, Ruhai Zou, Bingsheng Huang, Xin Chen, Shuo Li

Abstract

Ultrasound imaging can vary in style/appearance due to differences in scanning equipment and other factors, resulting in degraded segmentation and classification performance of deep learning models for ultrasound image analysis. Previous studies have attempted to solve this problem by using style transfer and augmentation techniques, but these methods usually require a large amount of data from multiple sources and source-specific discriminators, which are not feasible for medical datasets with limited samples. Moreover, finding suitable augmentation methods for ultrasound data can be difficult. To address these challenges, we propose a novel style transfer-based augmentation framework that consists of three components: mixed style augmentation (MixStyleAug), feature augmentation (FeatAug), and mask-based style augmentation (MaskAug). MixStyleAug uses a style transfer network to transform the style of a training image into various reference styles, which enriches the information from different sources for the network. FeatAug augments the styles at the feature level to compensate for possible style variations, especially for small-size datasets with limited styles. MaskAug leverages segmentation masks to highlight the key regions in the images, which enhances the model’s generalizability. We evaluate our framework on five ultrasound datasets collected from different scanners and centers. Our framework outperforms previous methods on both segmentation and classification tasks, especially on small-size datasets. Our results suggest that our framework can effectively improve the performance of deep learning models across different ultrasound sources with limited data.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43987-2_5

SharedIt: https://rdcu.be/dnwJn

Link to the code repository

N/A

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

In this paper, the authors propose a data augmentation method for ultrasound images. By using style transformations, the authors propose mixed style augmentation, feature augmentation, and mask-based style augmentation. Experiments on five datasets demonstrate the effectiveness of the proposed methods.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The strengths of this paper are as follows:
- This paper examines data augmentation from three aspects to train CNNs on ultrasound images.
- The paper demonstrates the effectiveness of data augmentation based on style transformation for training multi-task networks of segmentation and classification.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
Many questions are raised due to the poor organization of the paper.
- In the introduction, the authors describe that different imaging processes degrade performance. It is unclear whether this is for different modalities or the same modality with different equipment. It would be understandable if the same modality and the same equipment could produce differences in medical imaging. The authors should explain carefully.
- In the introduction, the authors describe a general problem in medical imaging. Since the authors propose data augmentation for ultrasound images in this paper, the focus should be on ultrasound. If the discussion begins with general medical images, the authors should explain why the focus is on ultrasound images.
- This paper deals with data augmentation of multi-task networks for segmentation and classification in ultrasound images. This should be explained in the introduction.
- The reviewer cannot understand the description of MaskAug in Sect. 2.3.
Please rate the clarity and organization of this paper

Poor
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The following are detailed comments to the authors.
1. In Fig. 2, there are no inputs or outputs specified. Data augmentation generally produces variation in the training data. Fig. 2, on the other hand, contains a training source and a testing source. What do they indicate?
2. Does $A$ in Eq. (1) denote the feature map obtained after global average pooling? Also, in Eq. (1), is the same dynamic range noise given to the mean and variance?
3. Experiments to evaluate accuracy should be conducted using not only private datasets, but also public datasets.
4. Each dataset is divided into training and test. In general, we divide each dataset into training, validation, and test. What is the reason for not setting validation?
5. In addition to evaluation within the dataset, experiments should also be conducted on cross DB to demonstrate the generality of the data augmentation. Cross-validation should also be performed to reduce data bias.
6. Table II is referenced, but there is no Table II.
7. Since AutoAug is cited in the paper, a comparison with AutoAug would be helpful.
8. In the results of Table 1, why are LD1 and TD1 results for the training data?
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

The architecture of the multi-task network is described in the supplemental material, so it is possible to implement it. On the other hand, the parameters (e.g., $\alpha$) of the proposed data augmentation method are unknown, so the code must be publicly available in order to reproduce it. PyTorch is used for the implementation, but the version is not stated. The experimental environment is described.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

3
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

It is difficult to understand this paper correctly because of the poor organization of the paper, the lack of explanation of the proposed method in many parts, and the lack of clarity in the experimental conditions and results. So, I have rated this paper as reject.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

The key contributions of the paper include: 1) A combined style augmentation approach that incorporates information from various sources to enhance the training set, thereby improving the model’s performance. 2) Instead of augmenting in the original image domain, we suggested a feature-based augmentation method that adjusts the style at the feature level for more effective compensation of potential source variations. 3) To minimize the influence of irrelevant style information on ultrasound images during the style transfer process, we introduced a mask-based style augmentation strategy
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The authors have provided a thorough and detailed explanation of the proposed method, ensuring that readers can fully comprehend the underlying concepts and implementation. This clarity in presentation demonstrates the authors’ deep understanding of the subject matter and promotes accessibility for the target audience. A significant strength of the paper is its comparison with previous methods in the field. By critically evaluating and contrasting the proposed method with existing approaches, the authors effectively demonstrate the improvements and advantages offered by their work, thereby highlighting its potential impact on the field. The paper concludes with a well-organized and concise summary of the main findings, contributions, and implications of the study. This section effectively synthesizes the key points of the paper, allowing readers to appreciate the significance of the work and its potential to advance the current state of knowledge in the field. The paper is complemented by well-designed flowcharts and figures that effectively illustrate the discussed concepts and methodologies. These visual aids not only enhance the readers’ grasp of the content but also contribute to the overall professional appearance of the paper.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Insufficient Dataset Description: The authors have not provided an adequate description of the dataset utilized in the study, which may hinder the reader’s understanding of the context and the applicability of the proposed method. Furthermore, the lack of information regarding the preprocessing steps implemented in the research could adversely affect the reproducibility of the method, thereby limiting its potential adoption by other researchers.

Disorganized Introduction Section: The Introduction section of the paper appears to be lacking in organization and clarity, making it difficult for the reader to grasp the importance and motivation behind the proposed method. A well-structured Introduction that clearly outlines the research problem, its significance, and the authors’ approach to addressing it is crucial for setting the context and engaging the readers.

Inadequate Description of Previous Methods: While the paper compares the proposed method with earlier approaches in the field, the authors have not provided sufficient information about these previous methods. A brief but informative overview of the earlier approaches would have been valuable in helping the reader understand the limitations of existing methods and the reasons behind the development of the proposed method. This additional context would further emphasize the novelty and potential benefits of the authors’ work.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors have not provided an adequate description of the dataset utilized in the study, which may hinder the reader’s understanding of the context and the applicability of the proposed method. Furthermore, the lack of information regarding the preprocessing steps implemented in the research could adversely affect the reproducibility of the method, thereby limiting its potential adoption by other researchers
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

Insufficient Dataset Description: The authors have not provided an adequate description of the dataset utilized in the study, which may hinder the reader’s understanding of the context and the applicability of the proposed method. Furthermore, the lack of information regarding the preprocessing steps implemented in the research could adversely affect the reproducibility of the method, thereby limiting its potential adoption by other researchers.

Disorganized Introduction Section: The Introduction section of the paper appears to be lacking in organization and clarity, making it difficult for the reader to grasp the importance and motivation behind the proposed method. A well-structured Introduction that clearly outlines the research problem, its significance, and the authors’ approach to addressing it is crucial for setting the context and engaging the readers.

Inadequate Description of Previous Methods: While the paper compares the proposed method with earlier approaches in the field, the authors have not provided sufficient information about these previous methods. A brief but informative overview of the earlier approaches would have been valuable in helping the reader understand the limitations of existing methods and the reasons behind the development of the proposed method. This additional context would further emphasize the novelty and potential benefits of the authors’ work.

Addressing these weaknesses in the paper would greatly enhance its overall quality and increase its potential to make a meaningful contribution to the existing body of literature in the field
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

7
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

-
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

The paper presents an augmentation based on style transfer for improving segmentation and classification.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Well written. Good ablation experiments. Evaluated on multiple datasets collected under diverse imaging settings.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The p-values could be reported for Table 1. The method is computationally expensive, since it contains both a style network and a classification/segmentation network, but the Dice score improvement is negligible (I wonder whether it is even statistically significant).
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The network architecture and training are explained, but the dataset is not available online.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

It is not clear how part B (feature augmentation) is used. The features are translated, but how were the translated features used in the final network?
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

They have good contributions and the method is evaluated on diverse datasets.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The proposed data augmentation schemes would be useful for the ultrasound imaging community. More clarifications to the key sections of methodology would be needed.

Author Feedback

Thanks to the meta-reviewer and reviewers for your valuable comments on our work. We especially thank R2 and R3 for accepting the paper directly. We appreciate your recognition of:

Our contributions (Meta-Reviewer: “useful for the ultrasound imaging community”);

Sufficient experiments (R3: “Good ablation experiment” R2: “demonstrate the improvements and advantages”);

Good organization (R2: “detailed explanation of the proposed method” R2: “well-designed flowcharts and figures” R3: “well-written”).

Q1: Why focus on ultrasound (US) (R1). A1: US presents unique challenges compared to other modalities. 1) US is affected by speckle noise, which differs from common Gaussian noise. 2) Acoustic shadow in US can cause missing information in the image. These characteristics can lead to degraded performance in similar studies.

Q2: Clarification of the MaskAug description in Sect. 2.3 (R1). A2: To provide a clearer explanation, Fig. 2C illustrates the MaskAug pipeline. The simplified steps are as follows: 1) Select content/style images from the training/testing sources. 2) Use a trained network to generate ROIs for these images. 3) Feed the content image, style image and ROIs into the style transfer network. 4) Translate the intensity distribution of the ROIs in the content image to that of the style image.
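Step 4 of the MaskAug description can be illustrated with a minimal sketch. It assumes that "translating the intensity distribution" is approximated by matching the mean and standard deviation inside the ROI. This is a simple statistics-matching stand-in, not the authors' style transfer network, and the function name `masked_style_match` and the toy images are hypothetical.

```python
import numpy as np

def masked_style_match(content, content_roi, style, style_roi, eps=1e-6):
    """Match ROI intensity statistics of `content` to those of `style`.

    Assumption: mean/std matching stands in for the learned style transfer.
    Pixels outside `content_roi` are left unchanged.
    """
    content = content.astype(np.float64)
    style = style.astype(np.float64)
    c_vals = content[content_roi]
    s_vals = style[style_roi]
    out = content.copy()
    if c_vals.size == 0 or s_vals.size == 0:
        return out  # nothing to translate without both ROIs
    c_mean, c_std = c_vals.mean(), c_vals.std()
    s_mean, s_std = s_vals.mean(), s_vals.std()
    # Normalise the content ROI, then rescale to the style ROI statistics.
    out[content_roi] = (c_vals - c_mean) / (c_std + eps) * s_std + s_mean
    return out

# Toy example: a dark content image and a brighter style image (synthetic).
rng = np.random.default_rng(0)
content = rng.uniform(0.2, 0.4, size=(8, 8))
style = rng.uniform(0.6, 0.9, size=(8, 8))
roi = np.zeros((8, 8), dtype=bool)
roi[2:6, 2:6] = True  # hypothetical ROI; in the rebuttal it comes from a trained network
result = masked_style_match(content, roi, style, roi)
```

In this sketch only the masked region changes, which mirrors the stated goal of restricting the style translation to the key regions.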

Q3: Clarifications about Eq. 1 (R1). A3: $A$ is not the feature map after GAP, as shown in Fig. S1 of the supplementary materials. The dynamic range of the noise is the same for the mean and the variance.
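Eq. (1) itself is not reproduced on this page, so the following is only a rough sketch of feature-level style perturbation under stated assumptions: the channel-wise mean and standard deviation of a feature map are perturbed with additive uniform noise drawn from the same range for both statistics. The additive form, the uniform distribution, and the names `feat_style_perturb` and `noise_range` are assumptions made for illustration.

```python
import numpy as np

def feat_style_perturb(feat, noise_range=0.1, rng=None, eps=1e-6):
    """Perturb channel-wise statistics of a (C, H, W) feature map.

    Assumption: additive uniform noise in [-noise_range, noise_range] is
    applied to both the mean and the standard deviation of each channel.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True)
    normalized = (feat - mu) / (sigma + eps)
    # Same noise range for both statistics, as stated in the rebuttal.
    new_mu = mu + rng.uniform(-noise_range, noise_range, size=mu.shape)
    new_sigma = sigma + rng.uniform(-noise_range, noise_range, size=sigma.shape)
    new_sigma = np.clip(new_sigma, 0.0, None)  # keep the scale non-negative
    return normalized * new_sigma + new_mu

# Synthetic feature map with 4 channels.
rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 8, 8))
augmented = feat_style_perturb(feat, noise_range=0.1, rng=rng)
```

The output keeps the spatial content of each channel and changes only its first- and second-order statistics, which is the usual sense of "style" at the feature level.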

Q4: Clarifications about the use of training/testing sources, validation set, and the input/output in Fig.2 (R1). A4:

As stated in the 3rd sentence of Sect.3, our datasets are divided into training and testing sources. Images from training and testing sources are available during training, but only labels from training source are used for loss computation.

We randomly selected 20% of the training set as the validation set during training.

In Fig.2, inputs are the images from training and testing sources, and outputs are the segmentation and classification results.
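The 20% validation hold-out mentioned in A4 could be sketched as below. The rebuttal does not say whether the split is per image or per patient, nor whether a seed is fixed, so the per-sample split, the `seed` argument, and the function name `split_train_val` are assumptions.

```python
import random

def split_train_val(sample_ids, val_fraction=0.2, seed=0):
    """Randomly hold out a fraction of the training samples for validation."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # seeded shuffle for a repeatable split
    n_val = int(round(len(ids) * val_fraction))
    return ids[n_val:], ids[:n_val]

# Hypothetical training set of 100 sample IDs.
train_ids, val_ids = split_train_val(range(100))
```

If several images come from the same patient, splitting by patient ID rather than by image would avoid leakage between the two subsets.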

Q5: Why are the LD1 and TD1 results for the training data in Table 1 (R1)? A5: They are not results for the training data. As mentioned in the caption of Table 1 and the 4th sentence of Sect. 3, they represent the performance on the testing set from the training sources.

Q6: Lack of information on preprocessing steps (R2). A6: Preprocessing steps are described in the 5th sentence of Sect.3.

Q7: Use of part B in final network (R3). A7: Part B (FeatAug) is not used in final network, as stated in the 1st sentence of Section 2.2.

Q8: Conduct experiments using public datasets (R1). A8: We conducted experiments on a public thyroid dataset (DDTI). The DSC/AUROC values for BigAug, Hesse et al., UDA and our method are 0.526/0.488, 0.645/0.537, 0.470/0.565 and 0.680/0.614, respectively. These experiments indicate that our method achieves the best performance on both public and private datasets, further demonstrating its generality.
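For readers unfamiliar with the two metrics quoted as DSC/AUROC, a minimal sketch of their standard definitions follows. It is not the authors' evaluation code; the function names and the toy inputs are illustrative only.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice similarity coefficient for two flat binary (0/1) masks."""
    intersection = sum(p and t for p, t in zip(pred_mask, true_mask))
    total = sum(pred_mask) + sum(true_mask)
    return 1.0 if total == 0 else 2.0 * intersection / total

def auroc(scores, labels):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy inputs, not taken from the paper.
dsc = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])   # 2*1 / (2+1)
auc = auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])      # 3 of 4 pairs ranked correctly
```

An AUROC below 0.5, such as some of the baseline values quoted here, means the classifier ranks negatives above positives more often than not on that dataset.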

Q9: A comparison with AutoAug would be helpful (R1). A9: To address your concern, we applied AutoAug to our datasets and found that it further highlights the superiority of our method. The DSC/AUROC values for LD1, LD2, LD3, TD1 and TD2 using AutoAug are 0.939/0.871, 0.876/0.653, 0.914/0.678, 0.687/0.769 and 0.535/0.392, respectively. The results indicate that AutoAug performs worse than our method and support our conclusions.

Q10: p-values could be reported in Table 1, and a question about the negligible improvement in Dice score (R3). A10:

To compare our method with traditional augmentation, we conducted statistical analysis and observed significant improvements in DSC for TD1 and TD2 (p<0.05), as well as significant improvements in AUROC for LD1, LD2, LD3 and TD1 (p<0.05).

The DSC shows no significant improvement in LDs, as the liver is easy to segment, resulting in a high baseline. Although the increase in DSC is slight, our method improves both AUROC and DSC across all sources.
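The rebuttal does not name the statistical test behind these p-values. As one possible way to compare two methods evaluated on the same cases, a paired sign-flip permutation test is sketched below. The choice of test, the function name, and the per-case scores are all assumptions, and the numbers are synthetic rather than results from the paper.

```python
import random

def paired_permutation_test(scores_a, scores_b, n_resamples=10000, seed=0):
    """Two-sided paired sign-flip permutation test on the mean difference."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the sign of each paired difference is arbitrary.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if abs(flipped) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_resamples + 1)

# Synthetic per-case Dice scores for two methods on the same 12 cases.
baseline = [0.61, 0.58, 0.70, 0.66, 0.59, 0.63, 0.72, 0.60, 0.65, 0.57, 0.68, 0.62]
proposed = [0.66, 0.63, 0.73, 0.70, 0.65, 0.66, 0.75, 0.66, 0.69, 0.62, 0.71, 0.67]
p_value = paired_permutation_test(proposed, baseline)
```

A paired test is used in this sketch because both methods are scored on identical cases, so per-case differences carry the relevant information.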

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors’ feedback has addressed most of the reviewers’ comments, and the method description is much clearer now.

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The paper is interesting and the task is clinically relevant. However, in its current state the paper needs significant improvement in organization and clarity, and it lacks sufficient information about the dataset used and sufficient discussion of related work.

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper proposes a data augmentation strategy for ultrasound images based on style transfer. The method is interesting and tested on five different ultrasound datasets. The reviews were very mixed (strong accept to reject). The main criticism is about missing details and inadequacies in the data and method description. I agree with this. The data is not described at all and the tasks of segmentation and classification are not specified. What is classified here? Why is a multi-task network necessary? Unfortunately, the rebuttal is not convincing and opens even more questions (FeatAug is not used in the final network? The training and testing split is unclear. It seems that the test data is also used for validation, …). Therefore, I recommend rejection of the paper.

A Style Transfer-based Augmentation Framework for Improving Segmentation and Classification Performance across Different Sources in Ultrasound Images