
Authors

Hamideh Kerdegari, Nhat Tran Huy Phung, Van Hao Nguyen, Thi Phuong Thao Truong, Ngoc Minh Thu Le, Thanh Phuong Le, Thi Mai Thao Le, Luigi Pisani, Linda Denehy, Vital Consortium, Reza Razavi, Louise Thwaites, Sophie Yacoub, Andrew P. King, Alberto Gomez

Abstract

Skeletal muscle atrophy is a common occurrence in critically ill patients in the intensive care unit (ICU) who spend long periods in bed. Muscle mass must be recovered through physiotherapy before patient discharge, and ultrasound imaging is frequently used to assess the recovery process by measuring muscle size over time. However, these manual measurements are subject to large variability, particularly since the scans are typically acquired on different days and potentially by different operators. In this paper, we propose a self-supervised contrastive learning approach to automatically retrieve similar ultrasound muscle views at different scan times. Three different models were compared using data from 67 patients acquired in the ICU. Results indicate that our contrastive model outperformed a supervised baseline model in the task of view retrieval with an AUC of 73.52%, and when combined with an automatic segmentation model it achieved a 5.7%±0.24% error in cross-sectional area. Furthermore, a user study survey confirmed the efficacy of our model for muscle view retrieval.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43907-0_15

SharedIt: https://rdcu.be/dnwcd

Link to the code repository

https://github.com/hamidehkerdegari/Muscle-view-retrieval

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose an approach to selecting consistent ultrasound views in longitudinal examinations using contrastive learning. The application is quantification of muscle volumes to assess muscle wasting.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This is interesting work, and a strength is that learning is self-supervised, hence it does not require labeled data for training.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    However, the applicability of this approach may be limited, as it is applied to 64x64 clips. This matrix size is a bit too small for accurate assessment of muscle CSA. The use of cross-sectional area agreement for validation is a measure of low sensitivity, as it does not measure the overlap of the regions. The approach was validated on a single image sequence, which does not provide much information about the generalizability of the approach.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper is reproducible, mainly because the authors provide a (redacted) URL of their source code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    This is an interesting paper because it studies the topic of self-supervised learning for a clinically relevant application. The approach, though, does not seem to have been validated extensively. This is an area for improvement, especially because it would provide information about the generalizability of this technique and its clinical usefulness.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Interesting work. There are some concerns about the resolution of data and the generalizability of the approach.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #6

  • Please describe the contribution of the paper

    This paper proposes the use of a network for ultrasound plane selection to ensure repeatability between acquisitions of the same structure made at different times. These longitudinal images are used to evaluate the progression of muscle atrophy in patients in critical care units who stay in bed for long periods. Their main contribution is addressing a problem not yet tackled and proposing an AI solution, with an accuracy similar to or even better than that of physicians, according to their qualitative study. It is important to remark that they train the network without annotations! Training uses data-augmented images (horizontal flipping and random cropping). For testing, doctors are asked to select the image that most resembles the one from the first time point. The method is a two-encoder architecture that projects features into a new domain, seeking to minimize the distance between similar frames and maximize the distance between dissimilar frames.
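    To make the training objective described above concrete, the following is a minimal sketch of a contrastive setup with an NT-Xent-style loss, assuming PyTorch; the `encoder` and `projector` names in the usage comment are hypothetical, and this is not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_anchor, z_positive, temperature=0.1):
    """NT-Xent-style loss: pull each anchor towards its positive (matching) view
    and push it away from the other samples in the batch, treated as negatives."""
    z_a = F.normalize(z_anchor, dim=1)        # (B, D) anchor embeddings
    z_p = F.normalize(z_positive, dim=1)      # (B, D) positive embeddings
    logits = z_a @ z_p.t() / temperature      # (B, B) scaled cosine similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)  # diagonal = positive pairs
    return F.cross_entropy(logits, targets)

# Hypothetical usage with an encoder plus projection head (names assumed):
# z1 = projector(encoder(frames_t1))          # reference frames
# z2 = projector(encoder(augmented_frames))   # horizontally flipped / cropped versions
# loss = contrastive_loss(z1, z2)
```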

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    One of their major strengths is that they will make the code available. They also propose “new” metrics to evaluate the predictions. One of them is to calculate the absolute area error between segmentations predicted by a segmentation network on the ground-truth image chosen by the physician and on the image chosen by the method (a minimal sketch of this metric is given after this answer). Their argument is, to me, totally valid: if the segmentation network was trained on good images, low-quality images will decrease the performance of the segmentation. In addition, they carry out a statistical study in which they show that although physicians choose the model image instead of the machine image, it is not representative to say that the method outperforms the physician; however, it is valid to say that it is similar to the physician and even better than a random choice.

    My last positive surprise is that the dataset contains images from two ultrasound machines, which allows generalization of the method to images of different quality.
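    As an illustration of the area-error metric mentioned above, here is a minimal sketch under assumptions: the same segmentation network is run on the physician-chosen frame and on the model-retrieved frame, and the relative absolute difference in cross-sectional area is reported. The `seg_net` callable and the pixel-to-area conversion are hypothetical, not the paper's released code.

```python
def absolute_area_error(seg_net, frame_gt, frame_retrieved, pixel_area_cm2=1.0):
    """Relative absolute CSA error between the physician-chosen (GT) frame and the
    frame retrieved by the model, both segmented by the same network (assumed to
    return a per-pixel probability map as a NumPy array)."""
    mask_gt = seg_net(frame_gt) > 0.5          # binary muscle mask, GT frame
    mask_ret = seg_net(frame_retrieved) > 0.5  # binary muscle mask, retrieved frame
    area_gt = mask_gt.sum() * pixel_area_cm2
    area_ret = mask_ret.sum() * pixel_area_cm2
    return abs(area_gt - area_ret) / area_gt
```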

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weaknesses of this paper are the following.

    Although they state that a correct calculation of the muscle area by a network is paramount when choosing the image, they do not establish how much error is acceptable for a chosen image to count as a valid diagnostic view.

    They compare their method with a classification architecture that they also implemented themselves. They could additionally compare with standard open-source classification methods with pre-defined hyperparameters; this would tell us whether classification methods are genuinely ill-suited to this task, rather than it being a matter of implementation and fine-tuning.

    It also seems to me that the state of the art is missing methods for selecting planes from videos, even if they do not involve different time points. Such methods discuss US image selection across different patients where the same structures have to be visualized with high quality: Chen et al., 2017, “Ultrasound standard plane detection using a composite neural network framework”; Wu et al., 2017, “FUIQA: Fetal ultrasound image quality assessment with deep convolutional networks”; Cheng et al., 2017, “Transfer learning with convolutional neural networks for classification of abdominal ultrasound images”.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    They will release the code, but they also give enough detail in the paper for the architecture to be reproduced. They provide all the information necessary to reproduce the training, such as the number of epochs, learning rate, batch size, number of images for train/val/test, image size, and even the framework used.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Thank you for addressing such an interesting problem. It is a topic that has bothered doctors for a long time, and the way you try to solve it is really interesting; more importantly, you also compare against the repeatability of the doctors. The statistical analysis is very useful. Thank you.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The problem addressed in this paper is a long-standing issue for doctors. I think the authors open a new field of discussion: can networks outperform doctors in repeatability for ultrasound plane selection? This work will open a lot of interesting discussions during the conference.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    The authors propose a method for finding and retrieving maximally similar ultrasound views from longitudinal rectus femoris ultrasound scans in ICU patients at risk of, or suffering from, muscle atrophy. The authors utilize a self-supervised method to learn representations without the need for manual annotations and demonstrate that this outperforms a baseline method. Furthermore, the authors demonstrate that views identified as similar tend to have similar cross-sectional area.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors provide some baseline insights into longitudinal ultrasound view matching that indicate that self-supervised methods could be a good fit for the problem. The biggest strength of this paper is the user study conducted with physicians, who indicated that the trained algorithm would be helpful in a clinical setting.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper does not describe sufficiently how major hyperparameters were chosen. The authors use a non-standard architecture for their encoder instead of, for example, a ResNet-18. This makes comparisons difficult; with a standard architecture, the features learned by the authors’ algorithm could have been compared with a standard transfer-learning setup based on a model trained on ImageNet. Section 4.1 mentions choosing hyperparameters based on a validation set, but it is unclear whether that only includes the parameters listed immediately afterwards. For example, the low number of augmentations is another unusual choice that raises questions about why it was made.

    The annotation procedure for the classification data needs to be explained further; “manual frame selection by doctors” is insufficient. How many doctors? How much did frame selection differ between doctors? What were the instructions given to the doctors? What are their backgrounds (radiologists, GPs, etc.)?

    The authors mention using a trained UNet model for segmentation. This model needs to be cited or introduced in significantly more detail. It is also unclear, given access to a model like this, why we wouldn’t use some intermediate embedding from this UNet to compare the features of the proposed model against.

    It is unclear why the authors chose this specific contrastive learning framework, given that another framework, DINO (arXiv:2104.14294), has explicitly been shown to have unusually good image retrieval performance.

    The evaluation of the method using similar area seems imprecise. The motivation of the paper is to retrieve similar ultrasound views in a population that is undergoing significant changes in muscle mass. As such, a large change in muscle mass between retrieved views may simply indicate a good response to physical therapy or an overall decline in constitution as a result of illness, and not a failure of the model to retrieve similar views. If getting views with maximally similar area were the goal, the authors could just use the UNet to segment each frame and pick the one with the most similar area. The user study is a much stronger motivation for the proposed method.
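    For reference, a minimal sketch of the segment-and-match baseline mentioned in the last point (with a hypothetical `unet` callable returning a probability map; this is a reviewer-suggested comparison, not something evaluated in the paper):

```python
import numpy as np

def retrieve_by_area(unet, reference_frame, candidate_frames):
    """Naive baseline: segment every candidate frame and return the one whose
    muscle area is closest to the area segmented in the reference frame."""
    ref_area = (unet(reference_frame) > 0.5).sum()
    areas = np.array([(unet(f) > 0.5).sum() for f in candidate_frames])
    return candidate_frames[int(np.argmin(np.abs(areas - ref_area)))]
```

    As the authors note in their feedback, retrieval that is deliberately invariant to area changes is the stated goal, so such a baseline would only serve as a point of comparison.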

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Besides a lack of justification for some of the architecture and hyperparameter choices, no major issues.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    For the motivation stated in the introduction: BIA seems like an easy analysis to do, even in an ICU setting?

    The authors chose the most basic way to select negative and positive samples. This is sufficient for an initial exploration of the topic, but in the future positive and negative samples could be chosen in a more informed manner. One obvious choice is not to include the frames directly before or after T1 as negative samples in the batch (a sketch of this appears after these comments).

    A better description of data splitting criteria for validation would be beneficial.

    Figure 3 is hard to parse and could be revised or at least described in more detail.

    The authors are not specific enough about how the ultrasound scans were acquired or about the length of the videos.

    The authors do not say whether patients were recruited prospectively or the analysis was done retrospectively.

    In Figure 4, the “Middle” label should be defined clearly.

    Figure 6 has misaligned axes.

    Is any quality control done for T1 frames (e.g. failure case: bad frame used) or are they picked entirely randomly?
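    As a sketch of the negative-sampling suggestion above (excluding the temporal neighbours of the anchor frame from the negative pool), under the assumption that frames are indexed within a single sweep; the function name and the `margin` parameter are hypothetical, not part of the paper:

```python
import random

def sample_negatives(frames, anchor_idx, n_negatives, margin=1):
    """Sample negative frames from the sweep, excluding the anchor and its
    immediate temporal neighbours (within `margin` frames), which are likely
    near-duplicates of the anchor and therefore poor negatives."""
    excluded = set(range(anchor_idx - margin, anchor_idx + margin + 1))
    pool = [i for i in range(len(frames)) if i not in excluded]
    return [frames[i] for i in random.sample(pool, n_negatives)]
```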

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper can easily be elevated to an accept by improving some of the presentation and conducting a few more comparisons to baseline methods. The user study is a main strength of the paper and should be a central part.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The three reviewers acknowledge that this work has merit, but they point to several issues that should be addressed. An important limitation is the validation and generalization of the proposed method. Other issues include a lack of detail about the selection of hyperparameters as well as the procedure followed for the manual frame selection. There are also highly relevant papers that are not mentioned/discussed and that should be added to the camera-ready version of the paper.




Author Feedback

We would like to thank reviewers and area chairs for their feedback and comments. We would like to address some of the points raised as follows:

Reviewer one: “An important limitation is validation and generalization of the proposed method. They asked about the applicability of this approach, as the clip size (64x64) is a bit too small for accurate assessment of muscle CSA. Also, the approach was validated on a single image sequence. This does not provide much information about the generalizability of the approach.” >>> Regarding the clip size, we tried different clip sizes such as 128x128 and 256x256, and a clip size of 64x64 works best with our model. A brief statement about this will be included in the final version of the paper. “The approach was validated on a single image sequence. This does not provide much information about the generalizability of the approach.” >>> We believe there was a misunderstanding about the use of sequences. Our approach was not validated on a single image sequence; instead, we validated our model on 67 patients with multiple (typically 4 to 6) sequences each. What was carried out for each sequence was to retrieve the frame corresponding to a reference frame acquired from that patient at a different time point. We have tried to explain this more clearly in the final version.

Reviewer two: “The paper does not describe sufficiently how major hyper-parameters were chosen.” >>> We will expand on this in the final version to improve reproducibility. “The annotation procedure of the classification data needs to be explained further; ‘manual frame selection by doctors’ is insufficient.” >>> We will address this in the final version of the paper. “If getting views with maximally similar area was the goal, the authors could just use the UNet to segment each frame and pick the one with the most similar area. The user study is a much stronger motivation for the proposed method.” >>> There might have been some confusion about the aim of the method. The goal is to retrieve the imaging plane (frame) from within a sweep sequence over the RF muscle that corresponds to the same physical location as the image acquired a few days before, in a way that is robust to changes in the cross-sectional area of the muscle. Indeed, the goal is not to be (mis)led by matching the same muscle area, since this might change, as the reviewer rightly points out. Instead, we want to learn, from data, features that are invariant to changes in muscle cross-section and consistent when the imaging plane is the same. This is precisely what experts would do when scanning. To this end, we developed a self-supervised method to retrieve the same view at different scan times. We have tried to clarify this crucial point in the final version.
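To make the retrieval objective described in this response concrete, the following is a minimal sketch of the inference step, assuming a trained PyTorch encoder that maps frames to embeddings; the function and variable names are hypothetical, and this is not the authors' released code. The frame of the later sweep whose embedding is most similar to the reference embedding is returned.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def retrieve_matching_frame(encoder, reference_frame, sweep_frames):
    """Embed the reference frame (earlier scan) and every frame of the later sweep,
    then return the sweep frame with the highest cosine similarity to the reference."""
    z_ref = F.normalize(encoder(reference_frame.unsqueeze(0)), dim=1)   # (1, D)
    z_sweep = F.normalize(encoder(torch.stack(sweep_frames)), dim=1)    # (N, D)
    sims = (z_sweep @ z_ref.t()).squeeze(1)                             # (N,) cosine similarities
    best = int(torch.argmax(sims))
    return sweep_frames[best], sims[best]
```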

Reviewer three: “There are also highly relevant papers that are not mentioned/discussed that should be added to the camera-ready version of the paper.” >>> The papers mentioned by the reviewer are focused on view classification, that is, identifying which standard view an image belongs to. This is a very different problem from the one at hand; if we consider that a view perpendicular to the length of the RF muscle is the standard cross-sectional view, any frame in a sweep video from the hip to the knee would be a version of that standard view. Indeed, clinicians will aim at roughly the center of the thigh, but within a few cm all views would be considered standard. What we want to do is, given a reference frame, find the frame at a later time that corresponds to the reference, regardless of whether it is a standard view or not. As a result, the proposed literature is not entirely relevant.

Many thanks and best regards.


