
Authors

Khanh Nguyen, Huy Hoang Nguyen, Aleksei Tiulpin

Abstract

This paper tackles the challenge of forensic medical image matching (FMIM) using deep neural networks (DNNs). FMIM is a particular case of content-based image retrieval (CBIR). The main challenge in FMIM, compared to the general case of CBIR, is that the subject to whom a query image belongs may be affected by aging and progressive degenerative disorders, making it difficult to match data on a subject level. CBIR with DNNs is generally solved by minimizing a ranking loss, such as Triplet loss (TL), computed on image representations extracted by a DNN from the original data. TL, in particular, operates on triplets: anchor, positive (similar to anchor) and negative (dissimilar to anchor). Although TL has been shown to perform well in many CBIR tasks, it still has limitations, which we identify and analyze in this work. In this paper, we introduce (i) the AdaTriplet loss – an extension of TL whose gradients adapt to different difficulty levels of negative samples, and (ii) the AutoMargin method – a technique to adjust hyperparameters of margin-based losses such as TL and our proposed loss dynamically. Our results are evaluated on two large-scale benchmarks for FMIM based on the Osteoarthritis Initiative and Chest X-ray-14 datasets. The code allowing replication of this study has been made publicly available at https://github.com/Oulu-IMEDS/AdaTriplet.
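For readers unfamiliar with the ranking objective mentioned in the abstract, here is a minimal sketch of the standard Triplet loss on L2-normalized embeddings. This is a generic illustration of TL itself, not the paper's AdaTriplet implementation; the function name and margin value are assumptions chosen for the example.

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, margin=0.2):
    """Standard triplet loss on L2-normalized embeddings (illustrative sketch).

    f_a, f_p, f_n: 1-D embedding vectors of anchor, positive, and negative.
    Encourages d(anchor, positive) + margin <= d(anchor, negative).
    """
    # Normalize onto the unit hypersphere, as is common in metric learning.
    f_a, f_p, f_n = (v / np.linalg.norm(v) for v in (f_a, f_p, f_n))
    d_ap = np.linalg.norm(f_a - f_p)  # anchor-positive distance
    d_an = np.linalg.norm(f_a - f_n)  # anchor-negative distance
    # Hinge: zero loss once the negative is farther than the positive by `margin`.
    return max(0.0, d_ap - d_an + margin)
```

An "easy" triplet (negative already far from the anchor) yields zero loss and hence zero gradient, which is exactly the behavior the paper's AdaTriplet loss modifies for hard negatives.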

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_69

SharedIt: https://rdcu.be/cVVqq

Link to the code repository

https://github.com/Oulu-IMEDS/AdaTriplet

Link to the dataset(s)

https://nda.nih.gov/oai/

https://nihcc.app.box.com/v/ChestXray-NIHCC


Reviews

Review #2

  • Please describe the contribution of the paper

This paper improves the triplet loss by imposing a penalty on the “hard” triplets whose negative sample lies close to the positive sample and the anchor in the feature space. The penalty is implemented via gradients that adapt to the difficulty of negative samples. The proposed method greatly improved the matching results on the selected datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed method was mathematically proven in detail, and the main concept of the target problem and the ideas for resolving it were clearly described and visualized.
    2. The proposed method was shown to be effective on the selected datasets through reasonable experiments, and it makes a significant improvement with only minor modifications.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. As an alternative to the general loss functions in the field of metric learning, the proposed method should be evaluated on more mainstream architectures and datasets.
    2. Some expressions are confusing and need to be modified.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This paper is highly reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Some sentences are confusing and affect the readability. For example, in the Introduction, ‘Unlike general CBIR, longitudinal medical imaging data of a person evolves in time due to aging and the progression of various diseases’. In Section 2.4, ‘The prior work [22], considered incorporating an additional term that is minimized when a hard negative example is detected.’

    1. Some abbreviations need to be written in full with a proper explanation when they first appear, such as CV.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The methodology of this work is concise, elegantly simple, and highly portable. It achieves significant results.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This work proposes a new AdaTriplet loss, modified from the Triplet loss, to improve image matching on hard negative samples. It also proposes an AutoMargin method to adjust the margin hyperparameters during training.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The writing is very good. The problem is clear, the existing method is analyzed, and the proposed method is well motivated. The figures in the manuscript help a lot in understanding the work.
    2. Both theoretical analysis and experimental results are presented in the manuscript.
    3. Ablation studies show the effectiveness of the proposed method.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Minor:

    1. The purpose of AutoMargin is to choose the margin hyperparameter automatically. However, the AutoMargin method (equations 7 and 8) introduces new hyperparameters K_Δ and K_an. How should these two hyperparameters be selected?
    2. More explanation is needed for the statement “we want to increase the virtual thresholding angle between anchors and negative samples, which leads to the decrease of β(t)”. Is β(t) the threshold in the AdaTriplet loss that ensures f_n is far from f_a?
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code is provided so it is reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    This is great work. My questions regarding the AutoMargin method are listed above in the weaknesses section. The authors mentioned that they will test more models and datasets; I fully agree with that.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is novel, the analysis is comprehensive, the writing is clear, and the results are good.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    This paper investigates image retrieval for forensic medical image matching by introducing a new triplet objective with a corresponding margin adaptation method. In particular, the authors propose AdaTriplet, which combines the standard triplet loss with a simple regularization on anchor-negative distances, and AutoMargin, which uses distance statistics to automatically adapt both the standard triplet margin and the additional regularization margin parameter. The performance of these methods is evaluated on two FMIM benchmarks, showing convincing performance, especially as the subject time differences increase.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is very well written and structured, with the proposed methods themselves being well motivated. The performance of AdaTriplet + AutoMargin is quite convincing (see Section 5 for some issues with this particular aspect).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • While the FMIM problem is incredibly interesting, it is not entirely clear how AdaTriplet specifically tackles this problem, and it would be important to clarify/investigate this in more detail. Instead, it is proposed as a standalone novel Deep Metric Learning method. However, for this to work, a much more extensive discussion of other existing ranking objectives, and in particular negative mining strategies, has to be included, since AdaTriplet and AutoMargin share similarities with existing works such as margin loss (Wu et al., “Sampling Matters in Deep Embedding Learning”) or Smart Mining (Harwood et al., “Smart Mining for Deep Metric Learning”). The issue of less informative/reductive triplets can often already be addressed with a tuple mining approach.

    In general, there are many more recent approaches in Deep Metric Learning to account for. Given that classification-based approaches perform much worse than sample-based methods, it would be important to compare to stronger sample-based methods with tuple mining (cf. e.g. Roth et al., “Revisiting Training Strategies and Generalization Performance in Deep Metric Learning”, or Milbich et al., “Characterizing Generalization Performance under Out-of-Distribution Shifts in Deep Metric Learning” for a list of more out-of-distribution-capable extensions). In particular, the AutoMargin approach should have similar effects as adaptive mining methods such as Smart Mining (Harwood et al., “Smart Mining for Deep Metric Learning”) or PADS (Roth et al., “Policy-Adapted Sampling for Visual Similarity Learning”), of which at least one should be compared against to show whether AutoMargin performs competitively.

    • It would help the readability if some of the formulas and symbols are replaced or extended with full sentences/descriptions. Currently, the paper is very densely packed with a large collection of different variables and notations, which often requires multiple additional passes over various text passages.

    • Small note w.r.t. 2.2: The reason to operate on the hypersphere (i.e. with normalized embeddings) is the much better scalability to higher-dimensional representation spaces (see e.g. Wang et al., “Understanding Contrastive Representation Learning through Alignment and Uniformity”, or Roth et al., “Revisiting Training Strategies and Generalization Performance in Deep Metric Learning”), and not just the fact that the margin is better behaved.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    All relevant hyperparameters and pipeline settings are clearly listed, giving me no reason to doubt the reproducibility of the paper results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The paper is very well written and structured, and both AdaTriplet and AutoMargin make intuitive sense. Unfortunately, the paper disregards the large corpus of existing DML methods, in particular more recent sample-ranking objectives and tuple mining approaches, which effectively tackle what both methods aim to do. As such, the paper would heavily benefit from an extended experimental section which compares to some of the more recent sample-based objectives alongside respective tuple-mining methods.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Given the aforementioned points, I am currently opting for weak reject - while the paper is very well written, and both proposed methods are well motivated, the lack of comparison to the current Deep Metric Learning literature, as well as it not being clear to me how the proposed method in particular addresses the problem of FMIM make it hard to go for acceptance. I am however certainly open to have potential misunderstandings of mine pointed out during the rebuttal to update my score.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper introduces a new triplet loss with margin adaptation that is applied to image retrieval for forensic medical images. The new loss, named AdaTriplet, is a triplet loss regularised on hard negative samples, combined with AutoMargin, which uses distance statistics to adapt the triplet margin and the regularisation margin parameter. Experiments are based on two FMIM benchmarks and show convincing results. The paper has mostly received positive feedback, which can be summarised as: 1) the paper is well written; 2) the method is well motivated; 3) the performance is convincing; and 4) the theoretical analysis is nice. However, reviewers also identified a few negative points, such as: 1) a missing discussion of other negative mining strategies, such as margin loss (Wu et al., “Sampling Matters in Deep Embedding Learning”) or Smart Mining (Harwood et al., “Smart Mining for Deep Metric Learning”); 2) no comparison with recent sample-based methods with tuple mining (cf. e.g. Roth et al., “Revisiting Training Strategies and Generalization Performance in Deep Metric Learning”, or Milbich et al., “Characterizing Generalization Performance under Out-of-Distribution Shifts in Deep Metric Learning” for a list of more out-of-distribution-capable extensions); 3) no comparison with adaptive mining methods such as Smart Mining (Harwood et al., “Smart Mining for Deep Metric Learning”) or PADS (Roth et al., “Policy-Adapted Sampling for Visual Similarity Learning”); and 4) formulas and symbols that could be simplified to improve readability. Even with the negative comments, I believe the paper should be accepted at this stage. I recommend the authors address negative comments (1) and (4) for the camera-ready paper.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2




Author Feedback

We thank the reviewers for the provided feedback. We appreciate the acceptance of our work and provide clarifications to some of the comments.

R#2:
====
Comm.: Some sentences are confusing and affect the readability.
Resp.: We will improve the readability of the paper based on your feedback in the camera-ready version.

R#3:
====
Comm.: The purpose of AutoMargin is to choose the margin hyperparameter automatically. However, the AutoMargin method (equations 7 and 8) has new hyperparameters KΔ and Kₐₙ. How to select these two hyperparameters?
Resp.: We would like to address the questions about KΔ and Kₐₙ in AutoMargin. The two hyperparameters allow us to adaptively select triplets at the appropriate tails of the distributions, as shown in Fig. 2b and 2c, respectively. KΔ and Kₐₙ are positive integers, rather than non-negative real values like ε and β. The larger the two hyperparameters are, the fewer triplets will be selected during hard negative sample mining. In addition, since the differences ε(t, KΔ) − ε(t, KΔ+1) and β(t, Kₐₙ) − β(t, Kₐₙ+1) converge to 0 as KΔ and Kₐₙ go to infinity, respectively, KΔ and Kₐₙ do not need to be very large. Our empirical evidence showed that they should exceed 1, and simply setting them to either 2 or 4 yielded good results in our experiments. Finally, one can see that the results with AutoMargin (Supplementary Table 3) are less sensitive to the hyperparameter choice than an extensive grid search over the margin variables.

Comm.: More explanation on the statement “we want to increase the virtual thresholding angle between anchors and negative samples, which leads to the decrease of β(t)”? Is β(t) the threshold in the AdaTriplet loss that ensures fₙ is far from fₐ?
Resp.: It is true that β(t) enforces a threshold on the cosine similarity between fₙ and fₐ; we showed this in Supplementary Fig. 1.

R#4:
====
Comm.: It would help the readability if some of the formulas and symbols are replaced or extended with full sentences/descriptions.
Resp.: Thanks for the comment. We will improve the readability of the manuscript where possible in the camera-ready version.

Comm.: However, for this to work, a much more extensive discussion of other existing ranking objectives, but in particular negative mining strategies, has to be included…
Resp.: We would like to clarify that we used semi-hard negative mining for the triplet loss baseline. We will highlight this further in the camera-ready version.

Comm.: In general, there are many more recent approaches in Deep Metric Learning to account for; given that classification-based approaches perform much worse than sample-based methods, it would be important to compare to stronger sample-based methods with tuple mining…
Resp.: We agree with the reviewer and will extend our references to related work in the camera-ready version, if the page limit allows. However, we would like to point out that the methodology presented in the paper is not related to sampling, and the main claim of our paper is the new loss, as well as the adjustment of its hyperparameters.
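To make the tail-based behavior the authors describe concrete, here is a small hypothetical sketch of a margin statistic with the stated properties: a larger K selects a smaller tail of the batch distribution (fewer triplets mined), and the change between consecutive K values shrinks toward 0 as K grows. The mean-plus-scaled-spread statistic below is an assumption for illustration only; it is not the exact ε(t, KΔ) and β(t, Kₐₙ) of Eqs. (7)-(8) in the paper.

```python
import numpy as np

def automargin_sketch(values, k):
    """Illustrative tail-based margin over a batch of distance statistics.

    values: per-triplet statistics collected from the current batch.
    k: positive integer; larger k pushes the threshold deeper into the tail,
       and the step between k and k+1 shrinks as ~1/(k*(k+1)).
    NOTE: assumed formula for illustration, not the paper's Eqs. (7)-(8).
    """
    values = np.asarray(values, dtype=float)
    spread = values.max() - values.mean()  # distance from mean to the far tail
    # Threshold interpolates from the mean (k -> infinity) to the max (k = 1).
    return float(values.mean() + spread / k)
```

With this form, successive thresholds differ by spread / (k * (k + 1)), which matches the authors' remark that the differences converge to 0 and that moderate values like 2 or 4 already place the threshold close to its limit.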


