
Authors

Shijia Zhou, Euijoon Ahn, Hao Wang, Ann Quinton, Narelle Kennedy, Pradeeba Sridar, Ralph Nanan, Jinman Kim

Abstract

The measurement of fetal thalamus diameter (FTD) and fetal head circumference (FHC) are crucial in identifying abnormal fetal thalamus development as it may lead to certain neuropsychiatric disorders in later life. However, manual measurements from 2D-US images are laborious, prone to high inter-observer variability, and complicated by the high signal-to-noise ratio nature of the images. Deep learning-based landmark detection approaches have shown promise in measuring biometrics from US images, but the current state-of-the-art (SOTA) algorithm, BiometryNet, is inadequate for FTD and FHC measurement due to its inability to account for the fuzzy edges of these structures and the complex shape of the FTD structure. To address these inadequacies, we propose a novel Swoosh Activation Function (SAF) designed to enhance the regularization of heatmaps produced by landmark detection algorithms. Our SAF serves as a regularization term to enforce an optimum mean squared error (MSE) level between predicted heatmaps, reducing the dispersiveness of hotspots in predicted heatmaps. Our experimental results demonstrate that SAF significantly improves the measurement performances of FTD and FHC with higher intraclass correlation coefficient scores in FTD and lower mean difference scores in FHC measurement than those of the current SOTA algorithm BiometryNet. Moreover, our proposed SAF is highly generalizable and architecture-agnostic. The SAF’s coefficients can be configured for different tasks, making it highly customizable. Our study demonstrates that the SAF activation function is a novel method that can improve measurement accuracy in fetal biometry landmark detection. This improvement has the potential to contribute to better fetal monitoring and improved neonatal outcomes.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43990-2_27

SharedIt: https://rdcu.be/dnwLC

Link to the code repository

https://github.com/DasuberVetLeonidas/SwooshActivationFunction

Link to the dataset(s)

https://hc18.grand-challenge.org


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper has proposed a Swoosh Activation Function (SAF) based regularization to optimize predicted heatmaps in landmark detection based algorithms. SAF regularization is architecture-agnostic, and has shown potential in improving accuracy in the measurement of fetal thalamus diameter (FTD) and fetal head circumference (FHC).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper introduces a regularization function that includes a new function, named the Swoosh Activation Function (SAF). Specifically, this function measures the similarity between the predicted heatmap and the ground truth heatmap, and tries to reduce the discrepancy between the two through regularization. It can be incorporated into different architectures and different applications that involve landmark detection.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The proposed SAF function (along with the choice of coefficients and optimum MSE) is rather arbitrary and its contribution in improving the measurement accuracy was not clearly demonstrated. The regularization function contains MSE functions besides SAF, so it was not evident which terms were dominating in the regularization.

    The coefficients and the optimum MSE value seem random. It was not clear how the optimum MSE was computed: only a single value was provided. A range of values indicated by an average and standard deviation will be more compelling.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The datasets used are publicly available. The function parameters are explained adequately ensuring reproducibility of the work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. The formulation is arbitrary and lacks proper justification. The authors should state what the x-axis and y-axis indicate, and explain how the coefficients a, b, and c are determined.

    2. “the optimal MSE between a pair of predicted heatmaps is 0.0061”: it wasn’t clear if this will be applicable for any fetal images, or particularly for fetal thalamus. Is it possible that for a case with clinical abnormality, the landmarks don’t match with the ones representative of normal cases? Will it be still relevant to use a SAF function to optimize the landmarks to lock the MSE to the given value of 0.0061?

    3. “The optimal MSE between a predicted heatmap and a zero matrix is half of the optimal MSE between a pair of predicted heatmaps.”: this statement was confusing. Also, the SAF regularization includes two terms to regularize the MSE between each predicted heatmap and a zero matrix. It was not clear why the difference with a zero matrix would be relevant.
    4. “SAF was configured by first computing the optimum MSE between predicted heatmaps, which was 0.0061 for both the FTD and HC18 datasets”: please explain how the optimum MSE was computed. Is it an average measure? What is the standard deviation?

    5. The SAF regularization includes five terms, among which the first two are the MSE between the ground truth and the predicted heatmaps. It would be very interesting to separate out the contribution of the pure SAF terms in improving the measurement accuracy.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea of using an empirical function to predict optimum heatmaps is interesting: the formulation and description of the functions can be improved. The coefficients and the optimum MSE value seem random. It was not clear how the optimum MSE was computed. If these limitations are addressed, it will be an interesting work for fetal biometric application.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes a simple and efficient mechanism for improving the performance of paired landmark detection, useful in tasks such as biometric measurements, by adding a novel regularization term that penalizes large deviations from expected mean-squared error values for the heatmaps corresponding to those landmarks. They test their approach on a fairly challenging landmark detection problem - the fetal thalamus diameter prediction. They compare their performance against the state-of-the-art BiometryNet by adding their regularization term to the loss function of BiometryNet. Their approach seems extensible and generalizable since it’s simply an augmentation to the loss function.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors propose a very simple extension to the BiometryNet model - the Swoosh activation function - that seems to improve the performance of the model significantly. It is rare to find such a simple and effective strategy to improve model performance in medical imaging. The experiments are extensive and overall this is something many readers can easily grasp and apply to their own models - so the impact can be significant.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It’s not a very ‘major’ weakness but the motivation for this work could be better articulated. For instance, they write that “BiometryNet is inadequate for FTD and FHC measurements because of its inability to account for fuzzy edges of these structures and the complex shape of the FTD structure“. Well, this is generally true for a lot of medical imaging - particularly with Ultrasound imaging, with lower spatial resolution than CT for instance. To me, it seems like this method is useful in any scenario (even other imaging modalities), where two landmarks can appear very visually/structurally similar. The regularization forces the neural network to learn to distinguish them.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    One dataset seems private and another public. Some clarification on whether or not the private set can be made available would be helpful. The model setup and training seems reasonably simple enough to replicate. The regularization term itself should be easy to implement and add to any existing modeling frameworks. Would be even better to have access to the full code in the final paper. Overall, I’d score them ‘good’ in terms of reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • I was wondering how this method generalizes to landmark detection problems that aren’t necessarily paired measurement landmarks - e.g., detecting landmarks of the heart (e.g., mitral valve, mitral annulus, aorta, LV apex, etc., depending on the 2D plane of interest). Also, if there are more than 2 landmarks, how would you recommend handling those? I think something like this could still be beneficial, but I am not sure how to handle a large number of landmarks, since there would be a very large number of combinations of pairs of landmarks.
    • It wasn’t obvious if you replace dynamic orientation detection (DOD) or just leave it as is and add the regularization term on top of it. In your critique of DOD, you mention that DOD isn’t effective due to high SNR, but as I mentioned above this is not a strong critique. If you’re replacing DOD, could you clarify why the regularization works better. If you’re not replacing it, there’s no need to critique it so strongly. You could provide a reason why your method is even better for orientation determination.
    • Why do you call the swoosh an ‘activation’ function? You’re not using this as a replacement for ‘relu’ or ‘sigmoid’ etc. You’re adding this as a regularization term to the loss function. I guess, a function is just a mathematical operation at the end of the day and whether we call it activation or regularization is not really that crucial and depends on how we end up using it. But still, using the right term may help increase the impact of your work.

    Minor errors (that I discovered)

    • Fig 2: annotationed -> annotated
    • 2.4, preprocessing: ‘aspect ration’ -> ‘aspect ratio’
    • Fig 4, 2nd last line: ‘Leftm’ -> ‘Left’
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper presents a nice idea with good results. It’s very simple and easily extensible. Also, what they’re trying to do is self-explanatory to some extent. Many other researchers could easily implement this and hopefully improve their results as well. I’ve deducted some points because they could motivate their idea better. They could also present some ideas on how to extend this to related problems - keypoint regression, cases with more than 2 landmarks, etc.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors proposed a novel Swoosh Activation Function designed to enhance the regularization of heatmaps produced by landmark detection algorithms.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Good organization of the paper
    • Authors proposed a novel swoosh activation function
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • There is only one contribution in the paper; I am not sure this is enough for a top-tier conference in the medical image computing field.
    • My main concern is that the paper lacks coverage of state-of-the-art works. The authors focused on only one past paper, BiometryNet. There is a lot of open-source code with different approaches for fetal biometry.
    • They do not provide any details about the annotation protocol.
    • There is no proper comparison with state-of-the-art methods (only EfficientNet and BiometryNet).
    • Lack of explainability of the proposed method.
    • Poor quality of the prose; the authors should proofread the paper before submission.
    • There is no ablation study.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Only hyperparameters are provided in the paper. I cannot see any relevant link to the training code or the dataset.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The authors should focus more on comparison with state-of-the-art methods, not only landmark detection-based approaches. Additionally, the authors should improve the quality of the prose before the final submission.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper lacks contribution, comparison with other state-of-the-art results, presentation of results against other SOTA models, and an ablation study.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    2

  • [Post rebuttal] Please justify your decision
    1. First of all, the authors did not address my concerns properly.
    2. Secondly, the presentation of the results is poor. We are not able to say that this method outperforms previous methods, since related works are not addressed properly.
    3. The authors only show results with BiometryNet (and its variants) and EfficientNet. This is not well explored.
    4. There is still no ablation study.
    5. I found several papers [1, 2, 3, 4] with better results for fetal head circumference (based on the publicly available HC18 dataset). The authors did not mention them.
    6. There is a lack of information about the in-house dataset, annotators, and annotation protocol.

    [1] Wang, Jinting, et al. “Ellipse guided multi-task network for fetal head circumference measurement.” Biomedical Signal Processing and Control 82 (2023): 104535.
    [2] Fiorentino, Maria Chiara, et al. “A regression framework to head-circumference delineation from US fetal images.” Computer Methods and Programs in Biomedicine 198 (2021): 105771.
    [3] Wang, Xin, Weibo Wang, and Xiaodong Cai. “Automatic measurement of fetal head circumference using a novel GCN-assisted deep convolutional network.” Computers in Biology and Medicine 145 (2022): 105515.
    [4] Moccia, Sara, Maria Chiara Fiorentino, and Emanuele Frontoni. “Mask-R2CNN: a distance-field regression version of Mask-RCNN for fetal-head delineation in ultrasound images.” International Journal of Computer Assisted Radiology and Surgery 16.10 (2021): 1711-1718.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper describes a new activation function for landmark prediction with deep learning networks, based on regularization to optimize predicted heatmaps. It is demonstrated on fetal ultrasound images. Experimental results on a significant dataset show the advantages of using this function over BiometryNet. The reviewers have raised insightful and relevant issues which the authors are advised to address. The most relevant are:

    1. Clarify how the optimum MSE was computed.
    2. Better articulate the motivation of the work with respect to BiometryNet. Refer the reader to the original paper for details and a review of the literature.
    3. Explain how the coefficients a, b, and c are determined.
    4. Comment on how the method can be useful for other measures and imaging modalities where two landmarks can appear very visually/structurally similar.




Author Feedback

We appreciate the valuable feedback provided by the reviewers and the meta-reviewer. Below are our point-by-point responses addressing their comments:

Comment 1 (C1), Determination of optimum MSE (MR, R1): In our study, we determined the optimum mean squared error (MSE) based on the size of the Gaussian distribution that represents each landmark in the ground truth heatmaps (Fig. 3 in our paper). Each landmark was a 19x19 matrix drawn from a Gaussian distribution centered at the landmark coordinates, with the peak assigned to 1. The MSE between the two ground truth heatmaps was 0.0061. This dataset configuration follows the standard implementation used in human pose estimation landmark detection (Xiao et al., 2018). Fig. 3B and C demonstrate how deviations from this optimum MSE between the predicted heatmaps can lead to incorrect and noisy heatmaps. We have revised the manuscript to provide a clearer description of the ground truth heatmap generation process and the calculation of the optimum MSE values.
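
The heatmap construction and optimum-MSE computation described above can be sketched as follows. Only the 19x19 Gaussian window with peak value 1 comes from the rebuttal; the 64x64 heatmap resolution and the landmark coordinates are hypothetical, so the computed MSE will differ from the paper's 0.0061. The sketch also makes concrete why the optimum MSE against a zero matrix is half the pairwise value: when the two Gaussian windows do not overlap, the pairwise MSE is just the sum of each heatmap's own energy.

```python
import numpy as np

def gaussian_heatmap(h, w, cx, cy, sigma=3.0):
    """Ground-truth heatmap: a Gaussian with peak value 1 at (cx, cy),
    truncated to a 19x19 window (6 * sigma + 1 with sigma = 3), following
    the standard pose-estimation recipe of Xiao et al. (2018)."""
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
    g[np.maximum(np.abs(xs - cx), np.abs(ys - cy)) > 3 * sigma] = 0.0
    return g

# Hypothetical 64x64 heatmaps holding two well-separated landmarks; the
# actual heatmap resolution and landmark coordinates are dataset-specific.
h1 = gaussian_heatmap(64, 64, cx=16, cy=32)
h2 = gaussian_heatmap(64, 64, cx=48, cy=32)

# The optimum MSE is the MSE between the two ground-truth heatmaps; its
# value depends on the heatmap resolution, so it will not equal 0.0061 here.
optimum_mse = float(np.mean((h1 - h2) ** 2))

# With non-overlapping Gaussian windows, the MSE between one ground-truth
# heatmap and a zero matrix is exactly half the pairwise optimum MSE.
mse_vs_zero = float(np.mean(h1 ** 2))
```

This is only an illustration of the recipe, not the authors' exact configuration.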

C2, Motivation and implementation with respect to BiometryNet (MR, R2): Avisdris et al. (2022) proposed BiometryNet ([7] in our paper) to address various fetal orientations in 2D ultrasound (2D-US) images; it has shown great performance in measuring fetal skull and femur bone biometrics and outperformed other landmark-based methods. However, BiometryNet cannot be directly used to measure fetal thalamus diameter (FTD) due to the complex shape of the guitar-shaped structure (GsS) used for FTD measurement, resulting in inaccurate localization of landmarks in our experiments. To address this limitation, we propose a Swoosh Activation Function (SAF) to regularize the heatmaps predicted by BiometryNet. By adding SAF, we enforce the MSE between pairs of predicted heatmaps, and between a predicted heatmap and a zero matrix, to be close to the optimal MSE values. This approach reduces the scattering of hotspots in the predicted heatmaps and prevents similar areas from being highlighted in pairs of heatmaps. We have revised the manuscript to provide a clearer explanation of our motivation and have added appropriate references to the BiometryNet paper.
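
A minimal sketch of how such a regularizer could be attached to a heatmap loss. A simple quadratic penalty stands in for the actual SAF, whose exact shape, coefficients a, b, c, and Min term are defined in the paper and its supplementary material; the function names and the weight are hypothetical. The five-term structure (two ground-truth MSE terms, one pairwise term, two zero-matrix terms) mirrors the loss described in Review #1.

```python
import numpy as np

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def saf_style_regularizer(h1, h2, m_opt=0.0061, weight=1.0):
    """Stand-in for SAF: penalize deviation of (i) the MSE between the two
    predicted heatmaps and (ii) the MSE between each predicted heatmap and
    a zero matrix from their optima (m_opt and m_opt / 2 respectively).
    The real SAF uses the paper's activation-function shape, not a quadratic."""
    zero = np.zeros_like(h1)
    pair_term = (mse(h1, h2) - m_opt) ** 2
    zero_terms = (mse(h1, zero) - m_opt / 2) ** 2 + (mse(h2, zero) - m_opt / 2) ** 2
    return weight * (pair_term + zero_terms)

def total_loss(pred1, pred2, gt1, gt2):
    # Two ground-truth MSE terms plus the SAF-style regularization on the
    # predicted pair: five terms in total.
    return mse(pred1, gt1) + mse(pred2, gt2) + saf_style_regularizer(pred1, pred2)
```

In a training framework this would be written with differentiable tensor operations (e.g., PyTorch) so that the penalty back-propagates into the network; NumPy is used here only to keep the sketch self-contained.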

C3, Determination of coefficients a, b, and c (MR, R1): In the revised manuscript, we have provided a clarified description of the coefficients: Coefficient a determines the slope of the SAF function around its minimum point in Quadrant 1 of the Cartesian coordinate system (x > 0), where the x-coordinate of the minimum point corresponds to the optimum MSE value. The slope of the SAF determines its regularization strength and is task-dependent: our proposed method's performance correlates negatively with stronger SAF on the FTD dataset, whereas the correlation is positive on the HC18 dataset. Coefficient b is deduced using Equation 1 (Supplementary data), which utilizes the x-coordinate of the minimum point and coefficient a. Coefficient c is determined by Equation 2 (Supplementary data) to ensure that the value of the Min term in SAF is 0.001.

C4, Generalizability to other imaging modalities (MR, R2): Our proposed SAF regularization method should work in other imaging modalities that require pair-wise landmark detection, e.g., detecting the mitral and aortic valves in 2D-US images of the heart, or detecting cranial sutures in CT images of the skull. Our formulation imposes no limitation regarding the imaging modality.

C5, Lack of contribution (R3): Our manuscript introduces a novel activation function SAF for landmark detection problems, specifically addressing the issue of predicted heatmaps highlighting similar areas. We demonstrate that SAF can effectively enforce an optimal level of dissimilarity between pairs of predicted heatmaps. Our study is the first to explore this innovative approach, highlighting its potential for improving landmark detection algorithms. We have provided additional justification for our contribution in response to comment 2 (C2).




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors addressed some of the issues raised by the reviewers. However, Reviewer 3 remains negative and, based on additional research, found methods with similar or better performance. So overall, I think the novelty is indeed limited despite the two other more positive reviews.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I think the authors addressed the remaining concerns well, and the idea is interesting within a very relevant application domain. It should be accepted.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have addressed all major comments summarized by the meta-reviews. The paper has merits and addresses a key problem. The authors compare with the SOTA in fetal biometry using landmarks (i.e., BiometryNet), which is the most relevant method to the proposed approach as well. An accept is recommended for this work.


