
Authors

Hanqing Chao, Jiajin Zhang, Pingkun Yan

Abstract

Regression plays an essential role in many medical imaging applications for estimating various clinical risk or measurement scores. While training strategies and loss functions have been studied for the deep neural networks in medical image classification tasks, options for regression tasks are very limited. One of the key challenges is that the high-dimensional feature representation learned by existing popular loss functions like Mean Squared Error or L1 loss is hard to interpret. In this paper, we propose a novel Regression Metric Loss (RM-Loss), which endows the representation space with the semantic meaning of the label space by finding a representation manifold that is isometric to the label space. Experiments on two regression tasks, i.e. coronary artery calcium score estimation and bone age assessment, show that RM-Loss is superior to the existing popular regression losses on both performance and interpretability.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_41

SharedIt: https://rdcu.be/cVVpW

Link to the code repository

https://github.com/DIAL-RPI/Regression-Metric-Loss

Link to the dataset(s)

https://www.kaggle.com/kmader/rsna-bone-age


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper considers learning a semantic representation space for medical image regression tasks. The authors propose a novel regression metric as a loss function and use it to learn a low-dimensional manifold, within the high-dimensional feature space, that matches the label space. The experiment section demonstrates that the proposed loss is better than existing state-of-the-art metrics.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is very well organized and written. It was a pleasure to read. It is clear and self-contained.
    • The RM-Loss proposed in Eqs. 2-3 is novel and interesting, as it captures interpretable representations and can be optimized with a small dataset.
    • The authors provide clear and in-depth analysis with ablations. The results are consistent and the performance achieved by the loss supports the claim about performance superiority.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Minor aspects of the paper require better clarification (see details in section 8. below)
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper looks reproducible. The hyper-parameters are provided in experiments. Analysis and ablation study is provided in the paper. The DNN architectures used in the paper are mentioned and cited.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • I found Eq. 6 somewhat ambiguous. Given a test sample x_t and its label y_t, why is the distance computed only on the semantic representations f_i of the training space? What is the relation between f_t and f_i? Do we need f_t during testing, or is f_p perhaps f_t?
    • In the supplementary material, the radius r used in Fig. 1 is less than 1. It is unclear why \epsilon=10. Is it a step? It is also unclear how the performance is evaluated between r and r+10 while r\in{0,1}.
    • In the proof of Lemma 1, what is D? In the case of a closed geodesic, the authors claim that uniqueness is violated. Could the authors elaborate a bit more? Does this condition break the bijection property, so that global isometry no longer holds? Does this mean obtaining multiple representations f in the manifold that match a single label representation, i.e. a many-to-one correspondence?
    • The ablation on the full CAC dataset is somewhat difficult to interpret. For example, not using the mask m was better than using m for the same values of sigma and alpha, and when sigma is infinite (the linear case), the results look comparable to the non-linear case. Why this behaviour?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My rating is based on the good quality of the presentation and the novelty.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    7

  • [Post rebuttal] Please justify your decision

    Considering all reviewers' comments and the rebuttal response, I decided to keep my rating.



Review #2

  • Please describe the contribution of the paper

    In this paper, the authors present a new metric loss specifically adapted to regression. The specificity of this loss is that it takes into account a semantic aspect related to the data. Results are obtained on two regression tasks based on medical images, provided by the RSNA Bone Age Assessment dataset and the NLST CAC score estimation dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed loss appears pertinent for medical applications in the results section. The methodology also appears mathematically solid.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The description of the methodology, as well as its practical justification, however lacks clarity. This makes the paper difficult to read and its methodological impact not clear enough for a conference like MICCAI. I would not accept this manuscript in its current form, but I would recommend that the authors bring more maturity and clarity to their work, as I believe it could have good potential. More specific comments are given below.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The source code will be released on GitHub.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • In section 1, the claim « To learn a meaningful representation space for regression, a loss should be able to … margin in loss functions » is central to justifying the methodology, but it is not sufficiently justified or even illustrated by a convincing example. I would recommend that the authors make the pertinence of this claim in medical imaging crystal clear before developing the methodology;

    • In section 1, the sentence « It guides the deep learning model to learn a low-dimensional manifold that has the same semantic meaning as the label, … » introduces the notion of « semantic meaning », which is not described and is far from obvious when talking about regression. What is the meaning of this notion?

    • In section 2.1, is a $d_t$-dimensional vector a one-hot encoding of the different labels? If so, why would binary vectors live in a Euclidean space?

    • It is impossible to know from Eq. (1) what is actually optimised. The authors should use a $\hat{\theta} = \arg\min_{\theta} … $ like formulation of the problem.

    • What is $l’$ in Eq. (2) and where is it used later?

    • Just before Eq. (2), how are the sample pairs selected in a training batch?

    • If I understand correctly, the loss is computed for a whole mini-batch. Could we use it to compare a single $f_i$ to a $y_i$ (which leads to the information that is usually backpropagated and then averaged over the mini-batch)? Maybe an algorithm explaining how to train a neural network using this loss would help in understanding how this loss can be used in practice.

    • The paper contains many typos. The authors should use a spell checker before submitting the paper, e.g.: « Various clinical risk or measurement …» « Resent studies …» «…  and 𝐸 a Euclidean … » …

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the methodology seems interesting, the method description and motivation clearly lack clarity.

  • Number of papers in your stack

    3

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    The main contribution of this paper is proposing a novel loss for medical image regression tasks that is the Regression Metric Loss (RM-Loss). This loss could decrease MAE and other indicators to make DNN more robust.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper provides an ablation analysis to clarify the results, and a visualization of the learned representation space on the provided dataset is given. The organization of this paper is good.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There is no significance test to validate the proposed loss, and it is not clear whether the results are stable across different runs.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Detailed hyperparameter settings and the GPU version are not provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The authors should pay more attention to the details, because there are some typos in the text, e.g. “Mean Squared Error (MES)”.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper proposes a novel loss to make DNNs more valid, but it lacks the validation analysis needed to establish the superiority of the proposed loss.

  • Number of papers in your stack

    3

  • What is the ranking of this paper in your review stack?

    5

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
    • An embedding approach with a new regression metric loss
    • There are concerns about the clarity of the writing and the justification of the work through examples. Those issues must be addressed
    • The following issues MUST be addressed:
      • There is some confusion about the notation:
      • “what is the relation between f_t and f_i? In fact, Do we need f_t during testing? or maybe f_p is f_t?”
      • “It is unclear how the performance is evaluated between r and r+10? while r\in{0,1}.”
      • “… margin in loss functions » is central to justify the methodology but is clearly not enough justified or even simply illustrated by a convincing example”
      • “What is $l’$ in Eq. (2) and where is it used later?”
      • There is no significance test to validate the proposed loss, and it is not clear whether the results are stable across different runs.
  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7




Author Feedback

We are grateful to the reviewers and AC for acknowledging our contributions and considering our work “novel and interesting”. The reviewers’ questions mainly regard the justification, notations, and significance test on results as summarized by the AC. This rebuttal clarifies these issues.

  1. Justification of the work (R2) The cited sentence, “… margin in loss functions”, is a comment on [7,13]. Previous work on classification tasks [2,16,17] showed that enforcing label space topological structures in the learned representation space can effectively enhance the interpretability of the representation. In contrast, our work deals with regression in medical image computing. In regression tasks, since the labels are continuous instead of categorical, those losses proposed for classification tasks cannot be directly applied. Some works [7,13] tried to adapt them to regression tasks by tuning the margin in the triplet loss. However, such adaptation lacks theoretical foundations and cannot fully explore the inherent structure of the continuous label space. Thus, we were motivated to propose a novel loss suitable for regression by enforcing the learned representation space to have the same topological structure as the label space. We started from this basic definition and derived the final loss. We demonstrated the effectiveness of the proposed Regression Metric Loss (RM-Loss) on two tasks, coronary artery calcium score estimation and bone age assessment.

  2. Confusion about notations We apologize that the manuscript contains some typos and caused confusion. The following correction and clarification will be added to our final submission.
    • (R1) In Eq.6, all f_p and f_a should be f_t (features of test sample t).
    • (R1) In Sup-Fig.1, \epsilon=0.01.
    • (R2) Eq.1 only defines the key term in the loss function. To minimize the loss function, the network’s parameters and the scale ‘s’ will be optimized.
    • (R2) In Eq.2, l’ is the loss function without m_ij for hard pair sampling. The definition of w_ij should have been in Eq.3 with D_ij. l’ is later transformed into L in Eq.5 as the final form of the proposed RM-Loss.
    • (R2) In Sec. 2.1, d_t denotes the dimension of the label, which may be multi-dimensional for the sake of generality. The label space is thus a d_t dimensional continuous Euclidean space, instead of a one-hot encoding space.
  3. Significance test & performance stability (R3) Excellent point! To evaluate the stability, we trained the model 5 times on the BAA task. Our RM-Loss achieved the mean (std) performance of 6.47(1.9e-2), 0.954(2.6e-4), 8.68(4.3e-2), 0.060(2.6e-3) for MAE, R2, D5, and RV, respectively. L1 loss (the best in all the baselines) got 6.62(4.8e-2), 0.951(1.1e-3), 8.78(5.4e-2), 0.062(1.3e-3). We also applied a one-sided unpaired t-test, which shows RM-Loss significantly outperformed L1 loss (p<0.014), except for RV (p=0.169). We will update these results in the final submission.

  4. Definition of “semantic meaning” (R2) In our work, semantic meaning means that the feature values and the relative differences between feature representations map meaningfully to the labels. For instance, in the task of BAA, each location in the feature space corresponds to a specific age and the distance between features denotes their difference in ages.

  5. Further clarifications of technical details
    • D in proof of Lemma 1 (R1) D/dt is the total derivative with respect to t.
    • Selection of training pairs (R2) For a mini-batch with n samples, all the n(n-1)/2 pairs will be used to calculate the RM-Loss. We will follow the suggestion to add an algorithm description to the paper to improve its clarity.
    • Hyperparameters (R3) r in Eq. 6 is 0.162 for BAA and 0.291 for CAC. Since r is calculated by the algorithm described in Sup-Fig 1, we didn’t count it as a hyperparameter. All the other hyperparameters are included in Sec. 3.1 and studied in Sec. 3.3.
    • GPU Type (R3) The GPUs used in all our experiments are NVIDIA A100.
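
The all-pairs strategy described under "Selection of training pairs" can be illustrated with a short sketch. It is not the authors' RM-Loss: the function name `pairwise_isometry_loss`, the toy data, and the plain averaging are assumptions made for illustration, and the weighting w_ij and the hard-pair mask m_ij mentioned in the rebuttal are deliberately left out. It only shows how all n(n-1)/2 pairs of a mini-batch are enumerated and how a scale `s` relates feature distances to label distances.

```python
import itertools
import math


def pairwise_isometry_loss(features, labels, scale=1.0):
    """Mean |scale * ||f_i - f_j|| - ||y_i - y_j||| over all n(n-1)/2 pairs.

    Simplified illustration only: the weighting w_ij and the hard-pair
    mask m_ij mentioned in the rebuttal are intentionally omitted.
    """
    pairs = list(itertools.combinations(range(len(features)), 2))
    total = 0.0
    for i, j in pairs:
        feature_dist = math.dist(features[i], features[j])
        label_dist = math.dist(labels[i], labels[j])
        total += abs(scale * feature_dist - label_dist)
    return total / len(pairs), len(pairs)


# Toy mini-batch (made-up numbers): 4 samples, 2-D features, 1-D labels.
labels = [[1.0], [2.0], [4.0], [7.0]]
# Features lie on a straight line, with distances exactly twice the
# corresponding label distances.
features = [[2.0 * y[0], 0.0] for y in labels]

loss, n_pairs = pairwise_isometry_loss(features, labels, scale=0.5)
print(n_pairs, loss)
```

In this toy batch the features are isometric to the labels up to the factor 0.5, so the discrepancy vanishes for that scale. In actual training, the rebuttal states that the scale is optimized together with the network parameters rather than fixed by hand.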

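The stability comparison in item 3 can be sanity-checked from the reported summary statistics alone. The sketch below computes a pooled-variance (Student) two-sample t statistic for the MAE figures. It rests on assumptions not stated in the rebuttal: that the reported standard deviations are sample standard deviations over the 5 runs, and that a pooled-variance test was used. The helper name `pooled_t_statistic` is hypothetical. Turning the statistic into a p-value would additionally require the t-distribution CDF, which is not computed here, so this does not reproduce the authors' reported p-values.

```python
import math


def pooled_t_statistic(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """Student's two-sample t statistic and degrees of freedom
    computed from summary statistics (equal variances assumed)."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * std_a ** 2 + (n_b - 1) * std_b ** 2) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / std_err, df


# MAE on the BAA task over 5 runs, mean (std), as reported in the rebuttal:
# RM-Loss 6.47 (1.9e-2) versus L1 loss 6.62 (4.8e-2).
t_mae, df = pooled_t_statistic(6.47, 1.9e-2, 5, 6.62, 4.8e-2, 5)
print(f"t = {t_mae:.2f}, df = {df}")
```

Since a lower MAE is better, a negative statistic favours RM-Loss; a one-sided test would then look at the lower tail of the t distribution with `df` degrees of freedom.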



Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
    • The main issues seem to be addressed and I vote to accept.
  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    na



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors successfully addressed the concerns of the reviewers during the rebuttal. This paper has merit in introducing a new regression metric loss. Overall the paper is well written. I would also vote for acceptance of this paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes a new metric loss to learn the semantic representation space for medical images. The reviews are positive overall. The concerns raised in stage 1 are addressed in the rebuttal. The authors MUST correct all the typos and unclear passages in the final submission.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1


