
Authors

Chaoyu Chen, Xin Yang, Ruobing Huang, Xindi Hu, Yankai Huang, Xiduo Lu, Xinrui Zhou, Mingyuan Luo, Yinyu Ye, Xue Shuang, Juzheng Miao, Yi Xiong, Dong Ni

Abstract

Regression learning is classic and fundamental for medical image analysis. It provides the continuous mapping for many critical applications, like the attribute estimation, object detection, segmentation and non-rigid registration. However, previous studies mainly took the case-wise criteria, like the mean square errors, as the optimization objectives. They ignored the very important population-wise correlation criterion, which is exactly the final evaluation metric in many tasks. In this work, we propose to revisit the classic regression tasks with novel investigations on directly optimizing the fine-grained correlation losses. We mainly explore two complementary correlation indexes as learnable losses: Pearson linear correlation (PLC) and Spearman rank correlation (SRC). The contributions of this paper are twofold. First, for the PLC on global level, we propose a strategy to make it robust against the outliers and regularize the key distribution factors. These efforts significantly stabilize the learning and magnify the efficacy of PLC. Second, for the SRC on local level, we propose a coarse-to-fine scheme to ease the learning of the exact ranking order among samples. Specifically, we convert the learning for the ranking of samples into the learning of similarity relationships among samples. We extensively validate our method on two typical ultrasound image regression tasks, including the image quality assessment and bio-metric measurement. Experiments prove that, with the fine-grained guidance in directly optimizing the correlation, the regression performances are significantly improved. Our proposed correlation losses are general and can be extended to more important applications.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_63

SharedIt: https://rdcu.be/cVVqk

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    1. This paper proposes two correlation-based loss functions for medical image regression tasks, where two complementary correlation indexes are explored as learnable losses. 2. The experimental results show that a simple network equipped with the proposed loss functions is effective on various medical image regression tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. This paper proposes two novel correlation losses, which are crucial for various image regression tasks.

    2. The presentation is acceptable and readers can follow this work easily.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. The novelty is limited since the two correlation losses have been presented before. 2. The experimental setting is insufficient; more comparison experiments and ablation studies should be designed to demonstrate the proposed method.

    3. Although the presentation is acceptable, it should still be improved further.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I didn’t check it

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    1. More comparison experiments and ablation studies should be designed to demonstrate the proposed method.

    2. The authors should improve the level of the presentation.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    1. The novelty is limited since the two correlation losses have been presented before.

    2. The experimental setting is insufficient; more comparison experiments and ablation studies should be designed.
  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    The authors proposed training networks for regression tasks using two loss functions based on Pearson linear correlation (PLC) and Spearman rank correlation (SRC). Since using a pure PLC is highly sensitive to outliers, the defined loss function splits normal samples and outliers and calculates PLC only on normal ones while calculating the L2 norm on outliers. In addition, they introduced a Coarse-to-Fine optimization strategy to ease the rank learning using SRC. The proposed method has been evaluated on image quality assessment and bio-metric measurement tasks using ultrasound images.
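
    The outlier-aware PLC idea summarized above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `pearson` and `robust_plc_loss` are hypothetical, the 10% outlier fraction is taken from comment 5 of this review, the two terms are simply added without weighting, and the mean and variance regularizers mentioned by another reviewer are omitted.

```python
import math

def pearson(xs, ys):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant sequence; treated as 0 in this sketch
    return cov / (sx * sy)

def robust_plc_loss(preds, targets, outlier_frac=0.1):
    """Hypothetical sketch: (1 - PLC) on normal samples plus mean squared error on outliers."""
    n = len(preds)
    # Sort sample indices by absolute error, largest first.
    order = sorted(range(n), key=lambda i: abs(preds[i] - targets[i]), reverse=True)
    k = int(n * outlier_frac)  # number of samples treated as outliers in this batch
    outliers, normals = order[:k], order[k:]
    plc = pearson([preds[i] for i in normals], [targets[i] for i in normals])
    l2 = sum((preds[i] - targets[i]) ** 2 for i in outliers) / k if k else 0.0
    return (1.0 - plc) + l2

# Made-up batch: nine perfect predictions and one gross error.
batch_preds = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
batch_targets = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(robust_plc_loss(batch_preds, batch_targets))
```

    In this toy batch the single gross error is routed to the L2 term, so it does not distort the correlation computed on the remaining nine samples.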

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • The study is well-motivated, and the manuscript is well-written. • The method and experiments are explained clearly. • The idea of using PLC as the loss function and making it robust to outliers, as well as providing a smoother cost function by introducing the Coarse-to-Fine optimization strategy seems interesting.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I do not see any major weaknesses as a conference paper; however, I do have some questions that I asked in the comments section.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    In the reproducibility checklist, the authors mentioned that the code has been made available or will be released if this work is accepted. It would be great if they release the code upon acceptance.

    Unfortunately, there are no details regarding how the authors acquired data. I assume that the dataset was dedicated to this study since there are no citations or download links. If it is the case, it can be mentioned in the manuscript explicitly.

    The validations seem well-documented except for concern regarding the reported batch size, which has been explained in detail in the comments.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. The samples that cause outlier predictions probably were outliers in the training set too. Could the authors comment on this assumption and explain what will happen if we simply exclude the outliers from the training set and then train the network using a correlation-based loss? And how the results will be different compared to the proposed method that splits normal and outlier predictions before calculating PLC? Although excluding some samples leads to a smaller training set, probably those samples were not so informative.

    2. The network was trained using an NVIDIA 2080 Ti GPU, which has 11GB of memory. Since the input size was 320x320, I was wondering whether a batch size of 160 could fit in the memory. Is there any chance that 160 is a typo, where the correct batch size is 16?

    3. Could the authors please comment on why in Table 1, the NIN and SoDeep methods yield much worse AE and RE while their performance is comparable to the other methods for the other cases?

    4. It is mentioned that “to the best of our knowledge, these medical regression studies mainly focus on learning the mapping among input and output for individual samples, but ignore the learning of the structured relationships over the dataset and among the samples.” However, we know that during training phase, the network looks at the whole training set at each epoch, meaning that the trained model considers all samples together. Could the authors comment on how a model trained using regular loss functions completely ignores the relationships among the samples?

    5. I appreciate the way that the authors introduce the correlation-based loss functions only when the network passes epoch #30 (ignoring the first few epochs). Probably because they pick the top 10% of samples with the largest difference between prediction and ground truth as outliers in each iteration. Therefore, in the first few epochs, it seems the differences are too noisy to let us select the correct top 10% since the network is not stable yet, and even non-outlier predictions may show large differences with ground truth.

    6. “reduce the distribution discrepancy at at global level.” –> Please remove the extra “at”

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea is interesting, and the method and validations are well-documented.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    7

  • [Post rebuttal] Please justify your decision

    I checked the code provided by the authors and read the feedback, as well as other reviews where insightful comments had been provided for further improving the manuscript. Still, I think the manuscript is enough for the conference version.



Review #4

  • Please describe the contribution of the paper

    The authors propose a method using their fine-grained correlation loss for regression tasks. The method contains two parts: Pearson linear correlation (PLC) training and ranking order training. In the first part, different from using L2 loss in ordinary regression, the authors use PLC, mean, variance as loss for normal samples, and L2 loss for auto-identified outliers. In the second part, the authors propose a ranking constraint on the similarity of features. The authors use the ratio of regression label as supervision information: 1) force the similarity close to the ratio, 2) force the difference of the similarity close to the difference of the ratio. The method is validated on image quality assessment (IQA) and bio-metric measurement(BMM) tasks on ultrasound images, achieving promising results.
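
    The two similarity constraints described above can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions, not the paper's formulation: cosine similarity stands in for the feature similarity, the label ratio is taken as smaller over larger so that it lies in (0, 1] for strictly positive labels, and the helper names `cosine`, `ratio`, `coarse_loss` and `fine_loss` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def ratio(yi, yj):
    """Label ratio in (0, 1]; assumes strictly positive labels (assumption of this sketch)."""
    return min(yi, yj) / max(yi, yj)

def coarse_loss(fi, fj, yi, yj):
    """Squared gap between the feature similarity and the label ratio of one pair."""
    return (cosine(fi, fj) - ratio(yi, yj)) ** 2

def fine_loss(fi, fj, fk, yi, yj, yk):
    """L1 gap between a similarity difference and the matching ratio difference.

    Expects a triplet whose labels are ordered yi > yj > yk.
    """
    sim_gap = cosine(fi, fj) - cosine(fi, fk)
    ratio_gap = ratio(yi, yj) - ratio(yi, yk)  # plays the role of an adaptive margin
    return abs(sim_gap - ratio_gap)

# Made-up 2-D features and labels, purely for illustration.
print(coarse_loss([1.0, 0.0], [0.0, 1.0], 4.0, 2.0))
print(fine_loss([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], 4.0, 2.0, 1.0))
```

    With the ratio defined this way, a triplet ordered yi > yj > yk always gives a positive `ratio_gap`, so the fine term pushes S(xi, xj) above S(xi, xk) by an amount that depends on the labels rather than on a fixed margin.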

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The organization of the paper is very good.
    2. Technical novelty: a. In the first part, the regression branch, the authors use PLC between prediction and target, mean, variance as loss for normal samples. This idea is straightforward since the PLCC is an evaluation metric for regression tasks, but using L2 loss for outliers can avoid misleading, which is a simple and effective idea. b. In the second part, the similarity rank branch, the authors combine contrastive learning methods and ranking order constraints to give a coarse-to-fine learning strategy. This strategy forces the features also in ranking order as the prediction. Instead of positive and negative sampling in classification, the authors use the ratio of regression label as supervision information, and use the difference of the ratio as an adaptive margin.
    3. The visualization in the supplementary explains how the method works intuitively.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Improper wording: a. In Section 2.2, both the coarse and fine losses are not related to the SRC, so it is not proper to name it “SRC loss”. The coarse loss is an L2 loss on similarity; the fine loss is a margin-based L1 loss on the difference of similarities and the difference of ratios. b. The fine loss is similar to the margin-based loss, but there are no positive or negative examples. So it is improper to name them ‘L_pos’ and ‘L_neg’. They stand for ascent and descent constraints in the ordered tuple. c. In the equation P(xi, xj) = [R(xi)/ R(xj)], the notation R(·) probably confuses readers. In eq. (2), the y_i is used to denote the target. They should be consistent.
    2. Unsound experiments: a. There are no experiments on public datasets and other image modalities. If the proposed method works, it should work on any dataset. The authors only evaluate their method on a private dataset of ultrasound images, which gives the impression that the authors are not confident in their method. b. The ablations for Lpos and Lneg are unnecessary. If R(xi) > R(xj) > R(xk), then S(xi, xj) > S(xi, xk) and S(xj , xk) > S(xi, xk) should both be satisfied. Using Lpos or Lneg separately cannot guarantee xi, xj, xk in ranking order. So it is unnecessary to run the ablation for the loss. c. There are no experiments for the hyperparameter α in Eq. (3). Readers may be curious about how this value influences performance. d. There are no ablations for the adaptive margin in the fine loss. Similarly, readers probably would like to know how the adaptive margin works compared with different fixed margins.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    According to the checklist, the authors will open-source their code in the future.

    In the paper, the authors also provide enough details for reproduction.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Suggestion:

    1. Use accurate words to describe the method.
    2. Evaluate the proposed method with public datasets and other modalities.
    3. Run experiments for hyperparameter α.
    4. Run experiments with different margins.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite the novelty and interpretability of the work, the unsound experiments and improper wording lead to a score of 4.

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors give additional results with different hyperparameters and on other datasets, which strengthen their points.

    But after reading the rebuttal and other review comments, I reviewed the manuscript again and found that the novelty of this paper is not as strong as I thought before. The correlation loss is not totally new, and the ranking loss, which is actually borrowed from contrastive learning, also already exists. This fact made me downgrade my rating for this manuscript.

    I would like to give a point of 4.5 if that is possible, otherwise, I would tend to give a 5.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper studies the regression problem by using two loss functions based on Pearson linear correlation (PLC) and Spearman rank correlation (SRC). Since using a pure PLC is highly sensitive to outliers, the defined loss function splits normal samples and outliers and calculates PLC only on normal ones while calculating the L2 norm on outliers.

    This paper received mixed review ratings. A major divergence in the review evaluations of this paper is its novelty. R#2 considers the novelty of this paper limited, while R#3 and R#4 consider the use of the two presented loss functions novel. The AC leans toward the evaluations by R#3 and R#4. But the AC also does not agree that the presented method is completely new, as correlation losses have been presented in existing works. So the authors are advised to address this in the rebuttal.

    Other concerns about this paper include the writing and the experimental settings. The authors should read the comments of R#3 and address them in the rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    9




Author Feedback

We thank all the reviewers (R) for reviewing and recognizing our work. Our novelty is clarified thoroughly. Required experiments and comparisons have been fairly conducted. Code is released and the writing will be improved.

Q1: Method Novelty. (MetaR1, R2, R4) A1: Our work has remarkable novelty regarding the methodology, results and generality. Previous methods underestimate the effect of outliers and sample similarity relationships, and are degraded by the strong noise and ambiguity in ultrasound (US). Our novel designs include: (1) We are the first to define and adaptively identify the outliers in PLC to enhance learning. It proves to be simple but effective (R3, R4) and general across different tasks/modalities. (2) We transform the rank correlation learning into a novel similarity learning. We are the first to prove the importance of sample similarity relationships in improving the Spearman rank learning. Our coarse-to-fine scheme further smooths the learning. (3) We report the best results on two large US datasets. Efficacy and generality are also observed on two more modalities/tasks (Q4). We hope to inspire the community with the released code.

Q2: Code. (R3, R4) A2: Our anonymized code is available at https://github.com/MICCAI-1280/FGCL. We implement both the direct regression and the regression-based 2D object detection.

Q3: Experimental setting. (MetaR1, R3, R4) A3: Details will be added in the final version: (1) For the α, large α means loose constraint for coarse rank. Small α means tight constraint and high sensitivity to noise. With the validation in IQA on 700 aortic arch (AA) US images, we got PLC=0.779 for α=0.1; PLC=0.786 for α=0.25; PLC=0.785 for α=0.5. We finally recommend α=0.25. (2) For the margin, our target attributes are continuous (R4), rather than binary class tags. The continuous and diverse sample similarity relationships need adaptive margins to match. With the 3-fold testing in IQA on AA US images, adaptive margin gets the best SRC=0.832, while SRC=0.811 for margin=0.1; SRC=0.813 for margin=0.3; SRC=0.799 for margin=0.5. We recommend adaptive margin and will add the details in final version. (3) For L_pos/L_neg ablations, our assumption is the same as R4. We only want to further quantify the impact of our bidirectional constraint.

Q4: Comparisons. (R2, R4) A4: We further extended and validated on: (a) bone age assessment in public X-ray Challenges [Ref1] with 0.7 month better than baseline, (b) 46 object detection in 2D prenatal US with 2% in mAP-50 better than baseline. Our efficacy and generality are supported by (1) released code (2) the details in journal version.

Q5: Outlier impact. (R3) A5: We define outliers at the local batch level, whereas outliers in the training set are global. Local outliers may not be global outliers. Excluding the global outliers definitely eases the training and can be complementary to our local outlier identification. Comparisons will be added in the journal version.

Q6: Data acquisition and batch size. (R3) A6: Experienced experts conducted the acquisition with US device vendors including Philips, GE and Samsung. Gestational age ranges from 20 to 30 weeks. For the batch size, the GPU consumption at 160 during training is 10561MB (10.32GB < 11GB).

Q7: Competitors. (R3) A7: SoDeep and NIN perform poorly in AE and RE because they only focus on the rank correlation between predictions and labels, while ignoring the value differences. This results in high correlation values, but poor AE and RE.
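
The point in A7, that a purely correlation-driven objective can reach high correlation while leaving large absolute errors, follows from the invariance of Pearson and Spearman correlation to positive scaling and shifting of the predictions. A minimal sketch with made-up numbers (not results from the paper; the helper names `pearson`, `ranks` and `spearman` are hypothetical and ties are not handled):

```python
import math

def pearson(xs, ys):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """Ranks starting at 1; assumes no tied values (enough for this illustration)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(xs, ys):
    """Spearman rank correlation computed as the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Illustrative labels and predictions that preserve order but are shifted and scaled.
labels = [20.0, 22.0, 25.0, 27.0, 30.0]
preds = [2 * y + 10 for y in labels]

plc = pearson(preds, labels)
src = spearman(preds, labels)
mae = sum(abs(p - y) for p, y in zip(preds, labels)) / len(labels)
print(plc, src, mae)
```

Both correlations are essentially perfect here, while the mean absolute error stays far from zero, which is consistent with pairing a correlation term with a value-based term such as L2.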

Q8: Structural relationship. (R3) A8: Regular loss functions often consider basic relationships among samples, like the MSE, but pay less attention to the structural relationships, like the clustering, ranking and relative distances among samples.

Q9: Naming and writing. (MetaR1, R2, R3, R4) A9: We will revise our final version, including typos, loss function naming and statement.

[Ref1] Halabi, Safwan S., et al. “The RSNA pediatric bone age machine learning challenge.” Radiology. 2019.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Part of the concerns raised by the reviewers have been addressed by the rebuttal, which should be included in the final paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Authors have addressed most comments by reviewers, particularly that of novelty. Their rebuttal has reassured R#3 towards acceptance, where authors had been asked to focus on the rebuttal.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper seems interesting; R4 raised their score after reading the rebuttal. My only concern is that the literature review especially regarding the used task is very limited and that there would be a lot of space left to acknowledge a lot of previous work that has been done to provide automated means for the automatic estimation of biometrics in prenatal ultrasound.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR


