
Authors

Raja Ebsim, Benjamin G. Faber, Fiona Saunders, Monika Frysz, Jenny Gregory, Nicholas C. Harvey, Jonathan H. Tobias, Claudia Lindner, Timothy F. Cootes

Abstract

Osteophytes are distinctive radiographic features of osteoarthritis (OA) in the form of small bone spurs protruding from joints that contribute significantly to symptoms. Identifying the genetic determinants of osteophytes would improve the understanding of their biological pathways and contributions to OA. To date, this has not been possible due to the costs and challenges associated with manually outlining osteophytes in sufficiently large datasets. Automatic systems that can segment osteophytes would pave the way for this research and also have potential clinical applications. We propose, to the best of our knowledge, the first work on automating pixel-wise segmentation of osteophytes in hip dual-energy x-ray absorptiometry scans (DXAs). Based on U-Nets, we developed an automatic system to detect and segment osteophytes at the superior and the inferior femoral head, and the lateral acetabulum. The system achieved sensitivity, specificity, and average Dice scores (±std) of (0.98, 0.92, 0.71±0.19) for the superior femoral head [793 DXAs], (0.96, 0.85, 0.66±0.24) for the inferior femoral head [409 DXAs], and (0.94, 0.73, 0.64±0.24) for the lateral acetabulum [760 DXAs]. This work enables large-scale genetic analyses of the role of osteophytes in OA, and opens doors to using low-radiation DXAs for screening for radiographic hip OA.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16443-9_1

SharedIt: https://rdcu.be/cVRx7

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper deals with the automatic segmentation of osteophytes in hip dual X-ray absorptiometry scans (DXAs). The proposed approach uses U-Net.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The original idea of the paper is the automatic segmentation of osteophytes on the hip from DXA scans using neural networks.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The contribution is rather fair. Use of the existing U-Net to segment osteophytes on DXAs.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    A public database, the UK Biobank, is used. No proof of reproducibility is provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    This paper deals with the automatic segmentation of osteophytes in hip dual X-ray absorptiometry scans (DXAs). The proposed approach uses U-Net.

    The paper is well written. The contribution is rather fair.

    The contributions of the study are not well described in the paper. As contributions, only the performance of the proposed system is reported.

    The cropping method of the patches is not clear enough.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The contribution is rather fair. Direct use of the existing U-Net for the segmentation.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    The reviewer still doesn’t find enough novelty in the proposed study. See the following study:

    • https://doi.org/10.1016/j.bone.2021.116146



Review #3

  • Please describe the contribution of the paper

    The article presents a semi-automatic method for segmentation of hip osteophytes from DXA images. The first step of the method is manual localization of hip joint keypoints (which can be done automatically with the BoneFinder tool, as mentioned, for a subset of the data). The second step is a vanilla U-Net with minimal modifications applied to patches extracted around certain keypoints.
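
    As a concrete illustration of the two-step pipeline described above, a minimal sketch of patch extraction around a keypoint is given below; the patch size, keypoint names, and model handles are illustrative assumptions, not the authors' actual values.

```python
import numpy as np

def extract_patch(dxa, keypoint, size=128):
    """Crop a square patch centred on a (row, col) keypoint.

    `size` is an illustrative patch width, not the value used in the paper.
    The image is zero-padded so patches near the border keep a fixed shape.
    """
    half = size // 2
    padded = np.pad(dxa, half, mode="constant")
    r, c = int(keypoint[0]) + half, int(keypoint[1]) + half
    return padded[r - half:r + half, c - half:c + half]

# Hypothetical usage: one patch per anatomical site, each fed to its own U-Net.
# dxa = load_dxa(...)                      # 2D array
# keypoints = locate_keypoints(dxa)        # e.g. BoneFinder outline points
# patch = extract_patch(dxa, keypoints["lateral_acetabulum"])
# mask = unet_lateral_acetabulum.predict(patch[None, ..., None])
```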

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Osteophytes are one of the most distinct signs of osteoarthritis and they are included in different OA grading schemes (for hip, knee, hand, etc). Their clinical picture and overall etiology are not well understood. Since this research direction is covered in the literature very sparsely, this work provides new evidence on the feasibility of automatic osteophyte detection and quantification. Subsequently, the work and the developed method open opportunities for further epidemiological studies, at least with the UK Biobank data. These are the primary contributions of this work.
    2. The methodology applied in the study is principled at all steps (dataset creation, annotation, data standardization, model training, analysis) and is well shaped to approach the related clinical question.
    3. Large sample size.
    4. The article is very well structured and written in a clear language. The graphical materials are informative and sufficient.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. While the osteophyte detection performance is relatively high, the segmentation performance is rather low for downstream clinical applications (Dice score of 0.6-0.7). One of the potential reasons is that the applied segmentation method is very basic (essentially, a U-Net without hyper-parameter optimization, as stated by the authors). The overall methodological novelty of the work is unclear.
    2. Otherwise, no notable weaknesses.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Overall, I consider reproducibility of the study as high. Several remarks below.

    1. The exact model architecture is not given explicitly, even though it is understandable from the text to some extent.
    2. The authors have checked the code release in the checklist, however, there is no link to the source code in the article.
    3. Software versions are not specified for the critical components of the pipeline.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. Page 3: While most of the studies in the domain are mentioned, please add a citation to the recent study on segmentation of osteophytes from MRI - https://www.oarsijournal.com/article/S1063-4584(21)00484-2/fulltext . Also, please include those results in the Discussion.
    2. Page 5: It would be appropriate to elaborate on how the manual and automatic keypoint annotations differ. Perhaps, by showing a sample from each of the groups.
    3. Page 5: The authors say “Automatic point placements were available for a subset of all images”. Please, provide more informative details on how the subset was selected - certain cohort? / random? / etc.
    4. Page 5: Please, specify software versions for BoneFinder, Keras, and Tensorflow.
    5. Page 5: “optimized with Adam [28] (default parameter values used)”. Please, state explicitly the default parameter values.
    6. “dropout rate, with probability 0.3” -> “dropout rate of 0.3” / “dropout with probability 0.3”.
    7. Table 2: The authors use osteophyte detection Sensitivity/Specificity as the performance metric. However, it is not clear from the text what is the detection threshold (> 0 voxels? other?). Please, state explicitly.
    8. Table 2: The standard deviation of the Dice scores is considerable in comparison to the mean. For the reader to better understand the extent of the segmentation errors, it would be informative to additionally visualize either or both of: (I) the distribution of sample Dice scores - a histogram with (x) sample Dice score, (y) number of samples; (II) the distribution of sample Dice scores at different osteophyte severities - a scatter plot with (x) ground-truth osteophyte area, (y) sample Dice score (a minimal plotting sketch follows this list). Please consider adding those plots in the article or as supplemental material.
    9. Page 6: The authors make the hypothesis “could it be underfitting due to insufficient training examples?”, which eventually does not hold. This assumption sounds somewhat questionable in the first place, since the number of training/validation samples is actually the highest for “Lateral acetabulum”. I would suggest removing this point from the text.
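
    A minimal matplotlib sketch of the two plots suggested in point 8, using synthetic Dice scores and osteophyte areas purely for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical per-sample results; in practice these come from the test set.
rng = np.random.default_rng(0)
dice_scores = np.clip(rng.normal(0.65, 0.2, 200), 0, 1)
osteophyte_areas = rng.gamma(2.0, 30.0, 200)  # ground-truth area in pixels

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3.5))

# (I) Distribution of per-sample Dice scores.
ax1.hist(dice_scores, bins=20)
ax1.set_xlabel("Sample Dice score")
ax1.set_ylabel("Number of samples")

# (II) Dice score versus osteophyte severity (ground-truth area).
ax2.scatter(osteophyte_areas, dice_scores, s=10)
ax2.set_xlabel("Ground-truth osteophyte area (pixels)")
ax2.set_ylabel("Sample Dice score")

fig.tight_layout()
plt.show()
```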
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
      • Analysis of osteophytes (morphological and otherwise) is a rapidly emerging and important topic in the scope of understanding the clinical picture of osteoarthritis progression. The study provides new evidence on the performance of automatic methods in the task, thereby complementing the sparse existing literature on the topic. The presented results have a good potential to facilitate a discussion on the best practices in annotation of osteophytes, longitudinal studies of their morphology, etc.
      • The study is conducted in a very principled way (excellent justification and description of the method components and the performed steps).
      • The overall novelty of this study in terms of methodology is rather limited.
  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The discussion on the potential causes behind the relatively moderate segmentation performance, given by the authors in their rebuttal, is convincing. The article is focused (and delivers) on bringing new evidence for clinical and biomedical research. I consider the findings interesting for sharing with the community and see the rather limited methodological novelty as non-critical.



Review #4

  • Please describe the contribution of the paper

    The authors proposed 3 deep learning networks (U-Nets) for automatic segmentation of osteophytes in DXA scans at three different sites: the inferomedial femoral head, the superolateral femoral head, and the lateral acetabulum. The proposed framework provides good sensitivity and specificity for the detection of osteophytes, with average Dice metrics reported for segmentation accuracy.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Novel application: automatic segmentation of osteophytes from DXA scans is new and, if successful, could open up new avenues for screening populations at risk of total hip arthroplasty (THA).

    • Testing the method on the UKBB dataset with 41,160 left hip DXA scans is a strength. The sensitivity for detection of osteophytes was excellent (>95%), but the specificity was fair (>70%). The Dice metric was also fair (~0.65).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The proposed U-Nets require an a priori landmark detection step using the BoneFinder software. The authors mentioned on page 6 that landmark positioning error in this step may adversely affect the U-Nets' performance, especially in the lateral acetabulum (corresponding landmark point 78). Given the power of deep networks, would it be possible to learn directly from the raw DXA scans and integrate the landmark placement algorithm into the U-Net architecture?

    • 820 scans out of 41,160 subjects were excluded due to various reasons, including image quality. How did you assess the image quality?

    • The Dice metric for segmentation accuracy is fair, suggesting potential improvements would be possible by optimising the network further. You may report the Dice index for manual annotation between the two radiographers and discuss the proposed framework's utility in light of these results.

    Minor comments:

    • In figure 4 (second row), it is better to overlay the ground truth mask on the actual DXA scan.
    • Page 5, ‘with He normal initializer’ -> ‘with the normal initializer’?
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The results should be well reproducible. The authors employed the standard U-Net architecture for segmentation, and the initial landmark detection was carried out using the BoneFinder software, which is available upon request. The dataset is from UKBB, which is also available upon request.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • As discussed by the authors, the initial landmark detection step can adversely affect the U-Nets' performance. I would integrate the landmark placement step into the U-Net architecture to segment osteophytes directly from DXA scans.

    • Approximately 2% of scans were excluded due to poor image quality. Identifying these scans in a large dataset of over 40,000 scans could be demanding. Will you consider automatic image quality assessment?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is a well-written paper with an interesting topic and results, but the methodology is not quite novel so I rate this paper as ‘accept’ rather than ‘strong accept’.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Reviewers have mixed reviews on this paper. On one side, the paper presents a novel application of a well-established method, i.e., the U-Net, to segmentation of osteophytes from DXA scans. On the other side, the methodological contribution is rather fair. In the rebuttal, the authors are encouraged to address the following limitations:

    • It is not clear why U-Net is chosen, given the fact that the state-of-the-art methods achieve significantly better results than U-Net.
    • The reason why the achieved average Dice score is relatively low should be discussed.
    • An explanation (better with qualitative visualization) why U-Net(4) achieved better results than U-Net (3) should be presented.
  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5




Author Feedback

We thank the reviewers for their constructive comments and suggestions.

Why U-Net? (MR,R1,R3):
Our system is the first to automate hip osteophyte segmentation and presents a proof-of-concept. The focus of this work is on clinical and biomedical impact rather than methodological innovation. U-Nets are commonly used to provide a performance baseline for medical image segmentation. Recent work on nnU-Net (doi: 10.1038/s41592-020-01008-z) showed that a U-Net with carefully tuned parameters outperforms other architectures on several datasets. We acknowledge that there is room for improvement in terms of modelling and parameter optimisation to achieve better Dice scores. Segmentation of osteophytes on 2D imaging is a novel concept. Osteophytes are abnormal, so their mere presence is a sign of pathology; as such, most studies treat them as a binary presence or absence. The discrimination performance of our U-Net is high, enabling the system to be applied for binary grading of osteophytes (e.g. in genetic studies).
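
For reference, a minimal Keras sketch of the kind of vanilla U-Net discussed here, using the He-normal initialisation and dropout of 0.3 quoted in the reviews; the depth, filter counts, input size, and loss are assumptions rather than the authors' exact architecture.

```python
from tensorflow import keras
from tensorflow.keras import layers

def conv_block(x, filters):
    # Two 3x3 convolutions with He-normal initialisation, as in a plain U-Net.
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu",
                          kernel_initializer="he_normal")(x)
    return x

def build_unet(input_shape=(128, 128, 1), base_filters=16, depth=3):
    """Illustrative vanilla U-Net; input size, depth, and filters are assumptions."""
    inputs = keras.Input(shape=input_shape)
    x, skips = inputs, []

    # Contracting path.
    for d in range(depth):
        x = conv_block(x, base_filters * 2 ** d)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)

    # Bottleneck with dropout 0.3 (a detail quoted by the reviewers).
    x = conv_block(x, base_filters * 2 ** depth)
    x = layers.Dropout(0.3)(x)

    # Expanding path with skip connections.
    for d in reversed(range(depth)):
        x = layers.Conv2DTranspose(base_filters * 2 ** d, 2, strides=2,
                                   padding="same")(x)
        x = layers.Concatenate()([x, skips[d]])
        x = conv_block(x, base_filters * 2 ** d)

    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)  # per-pixel osteophyte mask
    model = keras.Model(inputs, outputs)
    # Adam with Keras defaults (lr=1e-3, beta_1=0.9, beta_2=0.999).
    model.compile(optimizer=keras.optimizers.Adam(), loss="binary_crossentropy")
    return model
```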

Why relatively low Dice scores? (MR,R3,R4): The achieved Dice scores, even if considered relatively low, are unprecedented in the literature for this application. Segmenting osteophytes is extremely challenging for humans and computers alike. For example, related work on segmenting knee osteophytes from MRI images (doi: 10.1016/j.joca.2021.02.429) achieved average Dice scores <0.5. In our work, we achieved Dice scores of 0.66±0.24 for the inferior femoral head (IFH), 0.71±0.19 for the superior femoral head (SFH), and 0.64±0.24 for the acetabulum (ACT). For comparison, manual inter- and intra-observer Dice scores on a subset of our data are IFH:0.72±0.16;SFH:0.65±0.12;ACT:0.65±0.14 and IFH:0.74±0.09;SFH:0.74±0.12;ACT:0.73±0.12, respectively, when including only cases where both observations agree in the presence of an osteophyte. When including all data (i.e. also including images where there is a difference in the identified presence of an osteophyte between observations), as we do for assessing our automatic osteophyte segmentations, then the manual inter- and intra-observer Dice scores drop to ≤0.55 and ≤0.68, respectively. Further, the minimum Dice score needed to achieve clinically relevant results is still not explored for this application, and we acknowledged the need for this insight in our paper.
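
For readers unfamiliar with the metric, the Dice scores discussed above compare a predicted binary osteophyte mask with the manual annotation; a minimal NumPy sketch (the empty-vs-empty convention below is an assumption):

```python
import numpy as np

def dice_score(pred, truth):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # convention assumed here: both masks empty -> perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / total

# Example: a prediction that covers the osteophyte plus some extra pixels.
truth = np.zeros((8, 8), dtype=bool); truth[2:5, 2:5] = True  # 9 pixels
pred = np.zeros((8, 8), dtype=bool); pred[2:6, 2:6] = True    # 16 pixels
print(round(dice_score(pred, truth), 3))  # 2*9 / (16+9) = 0.72
```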

Why U-Net(4) achieved better results than U-Net(3)? (MR,R3): U-Net(4) includes more augmentation (i.e. displacement) than U-Net(3) to compensate for point placement inaccuracies. We will include a representative example as supplemental material, showing a DXA scan with the difference between manual and automated annotations of point 78, and the outputs of the two U-Nets for the same input patch. In some cases, U-Net(4) is able to successfully segment an osteophyte while U-Net(3) fails to detect it.
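
The displacement augmentation described above amounts to randomly shifting the patch centre away from the keypoint during training so the network tolerates point-placement error; a hedged sketch, with illustrative patch size and shift range:

```python
import numpy as np

rng = np.random.default_rng()

def displaced_patch(dxa, keypoint, size=128, max_shift=10):
    """Crop a training patch whose centre is randomly shifted off the keypoint.

    `size` and `max_shift` (pixels) are illustrative values, not those used in
    the paper; larger shifts simulate larger point-placement errors at training time.
    """
    half = size // 2
    shift = rng.integers(-max_shift, max_shift + 1, size=2)
    padded = np.pad(dxa, half, mode="constant")
    r = int(keypoint[0] + shift[0]) + half
    c = int(keypoint[1] + shift[1]) + half
    return padded[r - half:r + half, c - half:c + half]
```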

Availability of automatic point placements (R3): The BoneFinder model we used was built on images taken from UK Biobank. Those images were excluded from our automatic point placement experiments so as not to include any training data in the analysis.

Image quality (R4): All images were visually inspected by a clinician prior to this work to exclude images showing movement artefacts and/or missing parts. We have started looking into automating image quality assessment for future UK Biobank releases.

Deep learning based integrated system (R4): Integrating the landmark placement into the U-Net architecture is an interesting suggestion that is worth exploring in the future. The low prevalence of osteophytes (<10%) could be a challenge for a DL system to learn accurate landmark placement in the area of osteophytes. We are building on a mature system for landmark placement, as our focus was on automating osteophyte detection/segmentation.

Finally, we would like to thank the reviewers for their minor corrections which we will consider when preparing the final version of the paper.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    As an application-oriented paper, I agree with Reviewer #3 that it is important to bring new evidence for clinical and biomedical research. Authors’ rebuttal sufficiently addressed reviewers’ concerns.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper presents a new application of a well-established method. Although all reviewers agree with the relevance of the work, novelty remains a major concern of the paper to be accepted by MICCAI.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper deals with the automatic segmentation of osteophytes in hip dual X-ray absorptiometry scans (DXAs). Points are manually located and a vanilla U-Net is then applied locally.

    One positive reviewer states: The discussion on the potential causes behind the relatively moderate segmentation performance, given by the authors in their rebuttal, is convincing.

    Hence there are two positive reviews, even though I agree with one reviewer stating: the contribution is rather fair.

    Lowest rated paper to go over the threshold in my pack.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    10


