
Authors

Jialang Xu, Yueming Jin, Bruce Martin, Andrew Smith, Susan Wright, Danail Stoyanov, Evangelos B. Mazomenos

Abstract

Automated quality assessment (AQA) in transoesophageal echocardiography (TEE) contributes to accurate diagnosis and echocardiographers’ training, providing direct feedback for the development of dexterous skills. However, prior works only perform AQA on simulated TEE data due to the scarcity of real data, which lacks applicability in the real world. Considering the cost and limitations of collecting TEE data from real cases, exploiting the readily available simulated data for AQA in real-world TEE is desired. In this paper, we construct the first simulation-to-real TEE dataset, and propose a novel Simulation-to-Real network (SR-AQA) with unsupervised domain adaptation for this problem. It is based on uncertainty-aware feature stylization (UFS), incorporating style consistency learning (SCL) and task-specific learning (TL), to achieve high generalizability. Concretely, UFS estimates the uncertainty of feature statistics in the real domain and diversifies simulated images with style variants extracted from the real images, alleviating the domain gap. We enforce SCL and TL across different real-stylized variants to learn domain-invariant and task-specific representations. Experimental results demonstrate that our SR-AQA outperforms state-of-the-art methods with 3.02% and 4.37% performance gain in two AQA regression tasks, by using only 10% unlabelled real data. Our code and dataset are available at https://doi.org/10.5522/04/23699736.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43996-4_15

SharedIt: https://rdcu.be/dnwOP

Link to the code repository

https://github.com/wzjialang/SR-AQA

Link to the dataset(s)

https://doi.org/10.5522/04/23699736


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors present a workflow that allows quality assessment of unlabeled transoesophageal echocardiography data based on annotated simulation data. Here, a method for domain adaptation is used.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Overall, the work presented is novel and the application is relevant to the MICCAI community. Using synthetic data to train/improve models that operate on real data that is difficult to obtain/annotate is certainly a most relevant problem. The presented method is outlined and explained in detail.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • While I certainly liked the paper and think it is relevant and of interest, I found it somewhat difficult to follow and would recommend some rethinking of the presentation. Furthermore, the evaluation leaves out a few, at least in my opinion, interesting aspects.
    • The paper contains a large number of abbreviations, making it difficult to follow
    • You state that you want to minimize the similarity of the features, shouldn’t this be maximize?
    • What does footnote 2 refer to?
    • The section regarding SCL is difficult to comprehend
    • I would urge dividing the real data into training and testing sets, to demonstrate that the trained network can also generalize to never-before-seen data/patients. I would recommend splitting the dataset by patient.
    • Furthermore, it would have been interesting to include an upper baseline, i.e. a model trained on labeled real data, in order to gauge how large (or small) the gap to training with labeled domain data actually is.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors state that they will provide code upon publication, ensuring reproducibility

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    See above

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See above

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    I really enjoyed the paper and found the method quite interesting. The authors addressed some of my concerns and promised to include numbers that further aid understanding, I therefore upgrade my score to “Accept”.



Review #2

  • Please describe the contribution of the paper

    This paper proposes an unsupervised regression network for the automatic quality assessment (CP and GI) of TEE based on domain style alignment/adaptation between simulated and real datasets

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • ablation study providing evidence that all components of the proposed model are important and investigating which layers are sufficient for UFS
    • good comparison with SOTA models
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • It is unclear why the quality assessment values in Fig. 1 are sometimes even higher for simulated data than for real cases.
    • By focusing entirely on the regression problem, it is difficult to appreciate the domain adaptation problem the authors provide a solution for. Only when some examples are shown in the appendix can one appreciate the images and the regression achieved on real data.
    • It is interesting that the authors decided to perform task-specific learning on the quality scores of the simulated data rather than the real data, given that at least GI is related to the quality of the US image, and especially since both datasets were annotated by experts.
    • The main criticism is that, despite outperforming SOTA models, the MSE values differ only at the third decimal place. Therefore, even if the reduction is ~5% compared to other models, the differences seem minimal.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducibility checklist is thorough

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • It is unclear what the effect is of applying UFS per batch (Eqs. 2 and 3), in groups based on the nine different views, or over the entire dataset.
    • Section 2 doesn’t mention anything about MSE loss.
    • Transesophageal or Transoesophageal?
    • The Table 3 caption refers to L_{SIM}; isn’t that L_{SCL}?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It contains novel aspects in their architecture for a very specific problem (regression) but improvements seem minor

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper describes an unsupervised domain adaptation method for automated quality assessment, learning from simulated transesophageal echocardiography (TEE) to real data. A layer-wise feature stylization approach is presented, and a task-specific loss is imposed to regularize the transfer.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The studied problem is very relevant in clinical practice. Automated quality assessment for TEE can offer great help in sonographer training practice.

    • The evaluation metric design is comprehensive given the materials presented in the supplementary.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The uncertainty-aware feature stylization module is not very convincing.
    • Performance improvement over existing art is limited.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors state that they will provide the dataset and the code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    My major concerns come from the following two parts.

    1. The uncertainty-aware feature stylization module is not very convincing, especially considering that the target problem is unsupervised domain adaptation. The layer-wise style transfer of equation (3), though aided by a multivariate Gaussian, is still a linear rescaling of the source features. Such feature handling can be very limited compared with generative adversarial approaches such as CycleGAN.

    2. From Table 1 and Table 2, the margin of the proposed model over the second-best one is very limited. Standard deviations and a hypothesis test need to be added to justify whether the improvement is significant.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is overall well organized, but additional numerical studies and results need to be added to justify the model performance. Additional explanation is needed to justify the rationale of the UFS module.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors have performed hypothesis testing to justify the significance of the model improvement compared with the second-best approach in the existing art. The result is satisfactory but not outstanding; most p-values are close to, but smaller than, 0.05. I think this paper is on the borderline. Given this numerical evidence as well as the authors’ feedback, I change my opinion to weak accept.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors present a regression task for automatic quality assessment of ultrasound data. The paper has received mixed reviews; therefore, entering the rebuttal phase seems appropriate. The authors should address the points raised by the reviewers. In particular, “the performance of the proposed model over the second best one is very limited” should be clarified, and “standard deviation and hypothesis test needs to be added to justify if the improvement is significant”, as recommended by R2.




Author Feedback

We are grateful to R1,R2,R3,AC for constructive feedback. Point by point answers follow.

Q1 Performance improvement-R2&R3&AC
A1.1 We normalized the CP and GI scores to [0,1], so the MSE values in Tables 1 and 2 are normalized for consistency. Given the actual ranges of CP ([0,100]) and GI ([0,4]), the MSE values must be multiplied by 100^2 and 4^2 respectively; thus the improvement ranges between ours and the 2nd-best method are [9, 22] for CP and [0.04, 0.14] for GI. We will update the tables with denormalized MSE for clarification.
A1.2 We repeated the experiments with different random seeds to obtain the mean and std of the denormalized MSE for all methods. Results below for ours and the 2nd-best method (settings: 100%/50%/30%/10%) show consistently smaller means with similar stds:
CP: ours 418±7/415±5/421±7/423±7, 2nd-best 434±6/425±4/427±5/431±6
GI: ours 0.73±0.03/0.73±0.03/0.75±0.01/0.71±0.05, 2nd-best 0.84±0.04/0.84±0.07/0.86±0.06/0.88±0.14
A paired t-test on the MSE results (ours vs. 2nd-best) gives p-values of 0.028/0.046/0.042/0.048 for CP and 0.041/0.098/0.048/0.039 for GI, all but one < 0.05, showing that the improvements are statistically significant. We will add this to the final paper.
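The denormalization arithmetic in A1.1 can be sketched as follows. This is an illustrative aid only: `denormalize_mse` and the example inputs are assumptions, not the paper's code or exact reported values; only the scaling rule (MSE scales with the squared score range) comes from the rebuttal.

```python
# If a score y in [lo, hi] is min-max normalized as y' = (y - lo) / (hi - lo),
# then squared errors shrink by (hi - lo)**2, so
#   MSE_raw = MSE_normalized * (hi - lo)**2.

def denormalize_mse(mse_norm: float, lo: float, hi: float) -> float:
    """Convert MSE computed on min-max-normalized scores back to raw units."""
    return mse_norm * (hi - lo) ** 2

# Illustrative values (not the paper's exact numbers):
# a normalized CP MSE of 0.042 maps to about 420 on the raw [0, 100] CP scale,
# and a normalized GI MSE of 0.045 maps to about 0.72 on the raw [0, 4] GI scale.
print(denormalize_mse(0.042, 0, 100))
print(denormalize_mse(0.045, 0, 4))
```

This is why the table values, which differ only at the third decimal place when normalized, correspond to double-digit CP differences once denormalized.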

Q2 Use of UFS-R2&R3
A2.1 Comparison with generative adversarial methods-R3: UFS disentangles style and content information, providing prior knowledge that helps the network retain semantic features (content and structure) while performing style transfer. GA methods have a larger feature space but need complex losses and architectures to generate real-stylized images that retain content. UFS is superior because: 1) in Table 1, SDAT and MDD are GA methods and our method outperforms them; 2) we also tried a CycleGAN, first trained with sim and real data to obtain real-stylized images, which were then fed into the encoder+regressor for AQA; the results are 468-CP and 0.90-GI, i.e. 50 and 0.17 worse than ours (418, 0.73; see A1.2); 3) UFS is pluggable.
A2.2 UFS per batch vs. per view/entire dataset-R2: Applying UFS per view requires view labels for the real data, contradicting the UDA setting. The performance of UFS per batch is comparable to UFS over the full dataset (418 vs. 426 for CP, 0.73 vs. 0.68 for GI). Per-batch UFS allows efficient training, as UFS over the full dataset processes all data in every iteration and needs more computation (13 vs. 22 ms/input).
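To make the "linear rescaling of feature statistics" discussed above concrete, here is a minimal sketch of channel-wise feature stylization with perturbed target statistics. This is not the paper's actual UFS implementation: the function name, the 0.1 noise scale, and the way uncertainty is estimated are illustrative assumptions; only the general idea (whiten the simulated feature with its own mean/std, then rescale with noisy real-domain statistics) reflects the technique under discussion.

```python
import numpy as np

def stylize(feat_sim, feat_real, rng, eps=1e-5):
    """Transfer real-domain style onto simulated content features.

    feat_sim, feat_real: (C, H, W) feature maps.
    Returns sim content rescaled with (perturbed) real channel statistics.
    """
    c = feat_sim.shape[0]
    sim = feat_sim.reshape(c, -1)
    real = feat_real.reshape(c, -1)
    # Channel-wise statistics of each domain.
    mu_s = sim.mean(axis=1, keepdims=True)
    sd_s = sim.std(axis=1, keepdims=True) + eps
    mu_r = real.mean(axis=1, keepdims=True)
    sd_r = real.std(axis=1, keepdims=True) + eps
    # "Uncertainty-aware" part (illustrative): sample target statistics around
    # the real-domain estimates, with spread tied to the sim/real gap.
    mu_t = mu_r + rng.normal(size=mu_r.shape) * np.abs(mu_r - mu_s) * 0.1
    sd_t = sd_r + rng.normal(size=sd_r.shape) * np.abs(sd_r - sd_s) * 0.1
    # Linear rescaling: whiten sim content, re-color with target statistics.
    return ((sim - mu_s) / sd_s * sd_t + mu_t).reshape(feat_sim.shape)

rng = np.random.default_rng(0)
out = stylize(rng.normal(size=(8, 4, 4)),
              rng.normal(loc=2.0, size=(8, 4, 4)), rng)
```

Note that, as R3 observes, this operation is affine per channel; it changes first- and second-order statistics but cannot synthesize new content the way a generative adversarial model can.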

Q3 Generalization & use of real data-R1&R2
A3.1 Generalization & upper baseline-R1: UDA methods (see references 4, 18, 23, 24 in the paper) typically use all target-domain data for testing. Note that the real data are already split w.r.t. patients. Table 1 (e.g. in the 10% split, 90% of the testing data is unseen) shows our method can generalize to unseen patients. As advised by R1, we also excluded the unlabeled real data (splits 50%/30%/10%) used for UDA from testing, and tested only on the remaining 50%/70%/90%. The results further prove that our method (440/430/432-CP, 0.86/0.83/0.60-GI) generalizes to unseen data and again outperforms the 2nd-best (464/448/442-CP, 0.93/0.86/0.79-GI). Fully supervised regression (encoder+regressor with 50%/30%/10% labeled real data) gives 456/428/476-CP and 0.79/0.64/0.65-GI. Our UDA method (see above) is comparable to this upper baseline, showing its potential and effectiveness.
A3.2 Performing TL on sim vs. real labels-R2: 1) Real labels cannot be used for UDA training; 2) in Fig. S1, real-stylized features maintain sim content/structure while acquiring different real styles, so performing TL on sim scores retains the information necessary for AQA; 3) the ablation results in Table 3 verify TL.

Q4 Clarifications-R1&R2
R1: SCL helps encode domain-invariant features by maximizing the similarity. Footnote 2 refers to Eq. 2.
R2: 1) We will highlight the domain gap between sim and real data (pointing to Fig. 1), link it to the AQA regression, and add that style difference is the main reason for the domain gap, since quality assessment is similar across domains. We will also add the MSE equation and change L_sim to L_scl. 2) Values in Fig. 1: Fig. 1 shows dataset examples, picked randomly for each view, to highlight the domain shift; there is no intended relationship between the CP/GI values. It is normal for sim data annotated by experts to have high scores.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper has received diverging reviews, however, after rebuttal, the Reviewer with the lowest rating (weak reject) changed his rating to weak accept. Therefore, all reviewers align now with their voting towards a tendency for acceptance. In my point of view, the authors addressed the points raised by the reviewers in their rebuttal and especially performed a statistical analysis of the significance of their results over existing work. I think this is a solid paper that can be accepted.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper presents the first simulation-to-real transoesophageal echocardiography dataset for AQA tasks and proposes an unsupervised domain adaptation method, demonstrating promising results compared to four SOTA approaches. The approach is novel, the topic and application is of interest to the community, the paper is well-written, and evaluation experiments are thorough.

    The comments from the reviewers regarding the rationale of the uncertainty-aware feature stylization module and additional details/justification for improvement over SOTA have been addressed by the rebuttal.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors did a good job of responding to critiques to improve the reviewers opinions to all be 5 or above.


