
Authors

Nil Stolt-Ansó, Julian McGinnis, Jiazhen Pan, Kerstin Hammernik, Daniel Rueckert

Abstract

Segmentation of anatomical shapes from medical images has taken an important role in the automation of clinical measurements. While typical deep-learning segmentation approaches are performed on discrete voxels, the underlying objects being analysed exist in a real-valued continuous space. Approaches that rely on convolutional neural networks (CNNs) are limited to grid-like inputs and not easily applicable to sparse or partial measurements. We propose a novel family of image segmentation models that tackle many of CNNs’ shortcomings: Neural Implicit Segmentation Functions (NISF). Our framework takes inspiration from the field of neural implicit functions where a network learns a mapping from a real-valued coordinate-space to a shape representation. NISFs have the ability to segment anatomical shapes in high-dimensional continuous spaces. Training is not limited to voxelized grids, and covers applications with sparse and partial data. Interpolation between observations is learnt naturally in the training procedure and requires no post-processing. Furthermore, NISFs allow the leveraging of learnt shape priors to make predictions for regions outside of the original image plane. We go on to show the framework achieves Dice scores of 0.87 ± 0.045 on a (3D+t) short-axis cardiac segmentation task using the UK Biobank dataset. We also provide a qualitative analysis of our framework’s ability to perform segmentation and image interpolation on unseen regions of an image volume at arbitrary resolutions.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43901-8_70

SharedIt: https://rdcu.be/dnwEn

Link to the code repository

https://github.com/NILOIDE/Implicit_segmentation

Link to the dataset(s)

https://www.ukbiobank.ac.uk/


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes an auto-decoder process for approximating the latent representation of previously unseen subjects based on their pairs of coordinate-image intensity values. The proposed framework allows for sampling of intensity and segmentation predictions from arbitrary coordinates in the volume. The method is evaluated on the UK-Biobank cardiac MRI short-axis dataset, and its segmentation scores and generalization properties are investigated.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper introduces a novel auto-decoder process for approximating the latent representation of previously unseen subjects using their coordinate-image intensity values.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    However, the approach is similar to other related works that utilize neural implicit functions (NIF) for medical image segmentation. While the paper demonstrates robust results, it lacks comparison with other related works, making it challenging to evaluate the extent of improvement made by the proposed method.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper is written clearly and has enough information to reproduce

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Compare results with similar NIF methods. Authors can also discuss this paper in the introduction: “Khan, Muhammad Osama, and Yi Fang. “Implicit Neural Representations for Medical Imaging Segmentation.” In Medical Image Computing and Computer Assisted Intervention–MICCAI 2022: 25th International Conference, Singapore, September 18–22, 2022, Proceedings, Part V, pp. 433-443. Cham: Springer Nature Switzerland, 2022.”

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes an auto-decoder process for approximating the latent representation of previously unseen subjects based on their pairs of coordinate-image intensity values. The proposed framework allows for sampling of intensity and segmentation predictions from arbitrary coordinates in the volume.
    The work shows improvement over other NIF methods and has promise to improve the performance.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper
    1. The authors propose NISF that addresses CNN limitations in handling sparse or partial inputs, high-dimensional domains, and different voxel spacing, by assuming a continuous space representation.
    2. The method learns a mapping from real-valued coordinate space to shape representation for segmentation in continuous spaces.
    3. NISF demonstrates valid performance on a cardiac segmentation task using the UK Biobank dataset.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper introduces a novel and interesting approach by applying neural implicit functions to the segmentation task.
    2. The assumption that the latent code optimized for image reconstruction under f_\phi produces accurate segmentation under f_\theta is both intuitively and empirically valid.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Although the approach is interesting, further experiments, such as comparing with a standard U-Net, would strengthen the results.
    2. Testing on publicly available datasets, like ACDC [1], would also be beneficial.
    3. Concerns arise regarding the normalization of coordinates to the range [0, 1] based on voxel’s relative position, as this method does not reflect voxel spacing information for MLP decoding.
    [1] O. Bernard, A. Lalande, C. Zotti, F. Cervenansky, et al. “Deep Learning Techniques for Automatic MRI Cardiac Multi-structures Segmentation and Diagnosis: Is the Problem Solved?” in IEEE Transactions on Medical Imaging, vol. 37, no. 11, pp. 2514-2525, Nov. 2018
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper appears to be reproducible, as it utilizes a publicly available dataset and presents a method that is relatively straightforward to implement. However, it would be better to have other datasets with segmentations from human experts.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    See 5 and 6.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The novelty of applying neural implicit functions for cardiac segmentation is promising and shows potential for other tasks.
    2. However, the experiments presented in the paper are limited and could be expanded for more comprehensive evaluation.
  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors present a method for medical image segmentation based on neural implicit functions (NIF). In contrast to existing neural network-based methods such as U-Nets and related methods that learn an image-to-image mapping, NIFs learn a mapping from real-valued voxel coordinates to voxel intensities (e.g., segmentation classes). This makes the representation independent of the size and sampling grid of the input image, and naturally handles missing data, and anisotropic and irregular sampling strategies. The method was evaluated in the context of cardiac MR chamber segmentation, showing promising results.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors present a novel approach for medical image segmentation based on neural implicit functions (NIFs). Although NIFs or the related compositional pattern-producing networks (CPPNs) have been described at least since 2007 [1], they have only recently shown practical use for modelling shape variations and image segmentation, and the full potential of those methods is still to be explored. This paper is an important milestone in popularizing these methods and will foster much needed further research in this area.

    The paper is well written and clearly states limitations and assumptions of the current method. E.g., “we make the following assumption: A latent code h optimized for image reconstruction under fφ will also produce accurate segmentations under fθ”. This is a key assumption of the method, which might not hold. Experimental results show that this assumption might indeed hold for the studied task, but the assumption requires further investigation. Stating it clearly helps in establishing new research directions, which is a key strength of the paper.

    While a lot of research has gone into understanding U-net-like CNNs for image segmentation, we know very little about what NIFs are capable of, and this paper shows the potential for image segmentation, while learning a shape prior and matching it to a given image.

    [1] Stanley, Kenneth O. “Compositional pattern producing networks: A novel abstraction of development.” Genetic programming and evolvable machines 8 (2007): 131-162.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It is not quite clear how far we are from seeing practical use. It’s by no means necessary to demonstrate results that are close to or improve upon the state-of-the-art, but knowing the gap in terms of accuracy and running time would provide guidance in how the field needs to move forward. How long does it take to infer an entire image? Is it practical already or does it require further research? Running backpropagation to find h might be costly, and evaluating every point as well. How does this method compare with CNN-based approaches in terms of running time? Making a statement about limitations due to running time might foster further research in this area, and stating a challenge is as important as stating advances when presenting a new methodology.

    The same is true for segmentation accuracy. The paper lacks a comparison to the state-of-the-art. While the numbers look promising and reasonable, understanding them requires some general understanding of the field. E.g., what are typical Dice scores for chamber segmentation in cardiac MR? A direct comparison to freely available out-of-the-box segmentation methods, such as the nnU-Net, would make it easier for the reader to relate the presented method to current methods. Again, there is no need to get close to the performance of the nnU-Net, but it is important to understand the gap.

    The proposed network architecture is not described in sufficient detail. While the overall class of network is stated (MLP) and the choice of activation functions is motivated, it’s not clear how many layers the network has and how many hidden units are in each layer. This would be beneficial to understand the complexity and size of the method and is also needed to reproduce the results in order to build on them in further research.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    As stated in the weaknesses, the network architecture is not fully described.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • Using all voxels of a time frame for training seems expensive. Neighboring voxels might contain a lot of redundant information. Exploring sparse sampling schemes might further help in the future to reduce training time.
    • “CNNs are restricted to data in the form of grids, and cannot easily handle sparse or partial inputs”. It’s true that CNNs don’t handle sparse or partial inputs naturally, but they can be extended to do so. In the end, a loss is calculated for every voxel, and a mask can be employed to remove the contribution of missing voxels to the overall loss.
    • What are typical sizes (d) of the latent representation vector h?
    • How does the paper relate to the work of Khan et al. [1], who published a paper with the title “Implicit Neural Representations for Medical Imaging Segmentation” last year at MICCAI? I’m actually torn on whether not mentioning this paper should be regarded as a major weakness. On the one hand, the title of the paper promises a NIF-based approach for image segmentation, which is the main contribution of this paper, so a natural question is “what is the contribution of this paper in light of last year’s paper?”. If novelty is the main strength, is this paper really novel when a paper with a similar approach was already published last year? I think that this paper is still novel, because the title of [1] is misleading. In fact, they did not describe a NIF or implicit neural representation method in their paper. If I understand their method correctly, the coordinate of a voxel is never part of the input of any function in the approach, whereas taking coordinates as input is the key idea of NIFs. Instead, they merely use a CNN to extract image features, use the coordinate of a voxel for feature vector look-up and then use a decoder on the feature vector (and not on the coordinates) to predict the class. I’m not sure if this counts as related work, because the actual method is quite different. However, the title of the paper [1] suggests a NIF method, so it might still be a good idea to clarify what the differences are, especially because both papers are submitted to the same conference.
    • “Coordinates are normalized to the range [0, 1] based on the voxel’s relative position.” How does the normalization work when the field of view changes? E.g., when the method was trained on large fields of view and applied to small ones. This changes the meaning of the normalized coordinates and might affect the output.
    • How is the intensity normalization done? Are the minimum and maximum of the input mapped to [0, 1], or are more robust percentiles used? Are values standardized using the mean and standard deviation? Pre-processing can have a big impact on the final result.
    • Reference 20 and 21 are the same paper.

    [1] Khan, Muhammad Osama, and Yi Fang. “Implicit Neural Representations for Medical Imaging Segmentation.” Medical Image Computing and Computer Assisted Intervention–MICCAI 2022: 25th International Conference, Singapore, September 18–22, 2022, Proceedings, Part V. Cham: Springer Nature Switzerland, 2022.
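
The masking idea raised in the second bullet above can be illustrated with a short sketch. It is not taken from the paper or from any cited method: the function name `masked_cross_entropy`, the array shapes, and the use of NumPy are assumptions made purely for illustration. The loss is still evaluated at every voxel of the grid, but voxels flagged as missing contribute nothing to the mean.

```python
import numpy as np

def masked_cross_entropy(logits, labels, mask):
    """Mean cross-entropy over observed voxels only.

    logits: float array of shape (..., C), one score per class and voxel.
    labels: int array of shape (...), the reference class per voxel.
    mask:   bool array of shape (...), True where the voxel was observed.
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log-likelihood of the reference class at every voxel.
    nll = -np.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]
    # Zero out missing voxels and average over the observed ones.
    weights = mask.astype(float)
    return float((nll * weights).sum() / max(weights.sum(), 1.0))
```

A network trained with such a loss still needs a value at every grid position as input (for example zeros at missing voxels), which is one way the grid assumption remains even when the loss is masked.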

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think the method is very novel for medical image segmentation and will inspire and foster further research in this area. The main weaknesses are around some details of the method. But for this paper, such details are less relevant, because it’s not incremental progress over another method, where the fine details usually play a large role. It’s about a paradigm shift in how to do segmentation, and it’s more important to understand the main idea and potential limitations and assumptions. And those are provided by the paper.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    All reviewers are positive about the novelty and presentation of the work. One reviewer mentioned comparison experiments with a U-Net. Congratulations!




Author Feedback

We would like to thank the reviewers for acknowledging the novelty that neural implicit functions bring to the field of image segmentation. It excites us to see that our appreciation for the potential research avenues offered by this field is shared. The constructive criticism provided was fair and well-informed. We address the highlighted issues below:

Abbreviations:
INR = Implicit Neural Representation
NIF = Neural Implicit Function
Note: the terms “function” and “representation” are often used interchangeably in the literature.

(1) Lack of mentioning “Implicit Neural Representations for Medical Imaging Segmentation” by Khan et al. (R1, R2): As pointed out by R2, the paper by Khan et al. does not describe an INR as defined in the literature. Their model segments on a continuous space via interpolation of a convolution feature grid and has no awareness of the predicted point’s coordinate. However, in our paper’s “Related Work” section, we claim that “… segmentation performed by CNNs … requires post-processing heuristics to extract smooth object surfaces.”, while Khan et al.’s work proves otherwise. We acknowledge that a reference to Khan et al. should be included based on this merit.

(2) Lack of baselines (R1, R2, R3): Despite the shortcomings of convolutional approaches listed in our paper, given the vast amount of attention they receive in deep learning research, matching segmentation performance was not our primary goal. A relevant cardiac segmentation paper [1] reports Dice scores of (0.94, 0.88, 0.90) on a similar segmentation task. Given our method’s Dice scores of (0.90, 0.82, 0.88), we did not want the overly metric-focused trend of our literature to divert attention from our paper’s main contributions, those being: the ability to predict out-of-plane regions of a volume, the handling of arbitrary input data topologies, and the scalability to high-dimensional domains, all of which cannot easily be measured against a convolution-based baseline. To the best of our knowledge, there are no similar NIF approaches that create mappings from image to segmentation (see point (1) of this rebuttal).
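
For readers less familiar with the metric, the comparison in point (2) can be made concrete with a small sketch. The `dice` helper below uses the standard overlap definition on binary masks and is an illustration only, not the evaluation code behind either set of numbers; the two score tuples are copied from the rebuttal text in the order given there.

```python
def dice(pred, target):
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    intersection = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

baseline = (0.94, 0.88, 0.90)  # Bai et al. [1], as quoted in the rebuttal
nisf = (0.90, 0.82, 0.88)      # NISF, as quoted in the rebuttal

# Per-class gap to the convolutional baseline, in Dice points.
gaps = [round(b - n, 2) for b, n in zip(baseline, nisf)]

# Unweighted mean over the three quoted classes.
mean_nisf = round(sum(nisf) / len(nisf), 2)
```

Under these numbers the per-class gaps are 0.04, 0.06 and 0.02, and the unweighted mean of the three NISF scores rounds to 0.87, which is in line with the figure in the abstract, although how the abstract's value was aggregated is not stated on this page.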

(3) Architecture details and inference time (R2): The MLP consisted of 8 layers with 128 units each. The latent vector had 128 trainable parameters. The optimization at inference took 3-7 minutes, while the final segmentation sampling took 1-4s (depending on volume shape).
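
The two inference stages mentioned in point (3) can be sketched as follows. This is a toy illustration under several assumptions that are not confirmed by the rebuttal: "8 layers" is read as 8 hidden layers followed by a linear output, ReLU activations are used, the latent code is concatenated to a 4-dimensional (x, y, z, t) coordinate, f_phi and f_theta are modelled as two separate networks, the segmentation head has 4 classes, and the weights are random stand-ins for a trained model. Only the sizes 8, 128 and 128 come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
COORD_DIM, LATENT_DIM = 4, 128   # (x, y, z, t) assumed; latent size from the rebuttal
HIDDEN, N_LAYERS = 128, 8        # from the rebuttal (read here as 8 hidden layers)
N_CLASSES = 4                    # assumption: background plus three structures

def init_mlp(in_dim, out_dim):
    """Random frozen weights standing in for a trained network."""
    dims = [in_dim] + [HIDDEN] * N_LAYERS + [out_dim]
    return [(rng.normal(0.0, np.sqrt(2.0 / a), (a, b)), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def forward(params, x):
    """ReLU MLP forward pass; also returns each layer's output for backprop."""
    acts = [x]
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
        acts.append(x)
    return x, acts

def grad_wrt_input(params, acts, grad_out):
    """Backpropagate an output gradient to the input; weights are untouched."""
    g = grad_out
    for i in reversed(range(len(params))):
        if i < len(params) - 1:
            g = g * (acts[i + 1] > 0)      # ReLU derivative
        g = g @ params[i][0].T
    return g

def with_latent(coords, h):
    """Concatenate each coordinate with the subject's latent code."""
    return np.concatenate([coords, np.tile(h, (len(coords), 1))], axis=1)

def fit_latent(recon_params, coords, intensities, steps=100, lr=1e-2):
    """Optimise only h so that f_phi(coords, h) matches observed intensities."""
    h = np.zeros(LATENT_DIM)
    losses = []
    for _ in range(steps):
        pred, acts = forward(recon_params, with_latent(coords, h))
        err = pred[:, 0] - intensities
        losses.append(float(np.mean(err ** 2)))
        g_in = grad_wrt_input(recon_params, acts, (2.0 * err / len(err))[:, None])
        h = h - lr * g_in[:, COORD_DIM:].sum(axis=0)   # plain gradient descent
    return h, losses

f_phi = init_mlp(COORD_DIM + LATENT_DIM, 1)            # intensity reconstruction
f_theta = init_mlp(COORD_DIM + LATENT_DIM, N_CLASSES)  # segmentation logits

coords = rng.uniform(0.0, 1.0, (256, COORD_DIM))       # sparse sample locations
h_true = rng.normal(0.0, 0.5, LATENT_DIM)              # synthetic "subject"
observed, _ = forward(f_phi, with_latent(coords, h_true))

# Stage 1: fit the latent code to the observed intensities (the slow step).
h_fit, losses = fit_latent(f_phi, coords, observed[:, 0])
# Stage 2: sample the segmentation at arbitrary coordinates (the fast step).
logits, _ = forward(f_theta, with_latent(coords, h_fit))
labels = logits.argmax(axis=1)
```

The sketch mirrors the cost split reported above: the first stage needs many forward and backward passes per subject, while the second is a single forward pass per query point.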

(4) Justification for image normalization (R2): The UK-Biobank has a standardized MR scanner across all subjects. The same types of tissues, as well as the background, are expected to share intensity values throughout the dataset. We assume all volumes to roughly share the same range of intensities and perform a simple min-max normalization to the range [0,1], as performed in similar literature [1]. Normalization approaches involving mean and variance were avoided as consistency is not guaranteed across the population.
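
As a minimal sketch of the options discussed in point (4), the snippet below contrasts the min-max scaling described above with the percentile-based variant that R2 asks about. The function names, the 1st/99th percentile defaults and the use of NumPy are assumptions for illustration; only the min-max mapping to [0,1] is stated in the rebuttal.

```python
import numpy as np

def min_max_normalize(volume):
    """Map the volume's minimum to 0 and its maximum to 1."""
    volume = np.asarray(volume, dtype=float)
    lo, hi = volume.min(), volume.max()
    if hi == lo:                       # constant volume: avoid division by zero
        return np.zeros_like(volume)
    return (volume - lo) / (hi - lo)

def percentile_normalize(volume, p_low=1.0, p_high=99.0):
    """Robust variant: rescale between two percentiles, then clip to [0, 1]."""
    volume = np.asarray(volume, dtype=float)
    lo, hi = np.percentile(volume, [p_low, p_high])
    if hi == lo:
        return np.zeros_like(volume)
    return np.clip((volume - lo) / (hi - lo), 0.0, 1.0)
```

With min-max scaling a single very bright voxel compresses all other intensities towards zero, which is the sensitivity the percentile variant is meant to reduce; the rebuttal's argument is that a standardized scanner makes such outliers less of a concern in this dataset.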

(5) Concerns regarding coordinate normalization (R2, R3): In cardiac MR imaging, a radiologist intrinsically aims to consistently center, orient, and scale the object of interest within the volume. Due to variations in real-world coordinate systems across subjects, relative voxel coordinates were preferable. Nonetheless, we recognize issues may arise from a [0,1] normalization of the coordinate system due to the variation in volume side ratios across subjects. However, we found these variations were not strongly pronounced. Additionally, we expect the latent prior to be capable of modeling minor aspect ratio deformations as a subject-specific feature. Assuming a NIF correctly models the segmentation of a distorted image, that segmentation can be mapped back to the image’s pre-normalization coordinates, undoing deformations introduced by normalization.
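
The coordinate handling in point (5) can be sketched as below. The convention `i / (n - 1)` for mapping a voxel index to [0,1], the helper names, and the example shapes and spacings are all assumptions for illustration; the rebuttal only states that relative voxel positions are normalized to [0,1] and that predictions can be mapped back to pre-normalization coordinates.

```python
def normalize_coords(index, shape):
    """Map a voxel index to relative coordinates in [0, 1] along each axis."""
    return tuple(i / (n - 1) if n > 1 else 0.0 for i, n in zip(index, shape))

def to_physical(norm_coord, shape, spacing):
    """Map relative coordinates back to physical offsets (same unit as spacing)."""
    return tuple(c * (n - 1) * s for c, n, s in zip(norm_coord, shape, spacing))

# Two hypothetical short-axis stacks with the same slice spacing but
# a different number of slices along the last axis.
shape_a, shape_b = (200, 200, 10), (200, 200, 12)
spacing = (1.8, 1.8, 10.0)   # hypothetical voxel spacing in millimetres

mid = (0.5, 0.5, 0.5)        # the same normalized query point in both volumes
offset_a = to_physical(mid, shape_a, spacing)
offset_b = to_physical(mid, shape_b, spacing)
```

The same normalized point lands at a different physical offset along the slice axis in the two volumes, which is the concern raised by R2 and R3; mapping predictions back through `to_physical` is one way to undo the distortion, as the rebuttal suggests.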

[1] - Bai, W. et al.: Automated cardiovascular magnetic resonance image analysis with fully convolutional networks. Journal of Cardiovascular Magnetic Resonance 20(1), 1–12 (2018)


