
Authors

Aline Sindel, Bettina Hohberger, Andreas Maier, Vincent Christlein

Abstract

In ophthalmological imaging, multiple imaging systems, such as color fundus, infrared, fluorescein angiography, optical coherence tomography (OCT), or OCT angiography, are often involved in the diagnosis of retinal disease. Multi-modal retinal registration techniques can assist ophthalmologists by providing a pixel-based comparison of aligned vessel structures in images from different modalities or acquisition times. To this end, we propose an end-to-end trainable deep learning method for multi-modal retinal image registration. Our method extracts convolutional features from the vessel structure for keypoint detection and description and uses a graph neural network for feature matching. The keypoint detection and description network and the graph neural network are jointly trained in a self-supervised manner using synthetic multi-modal image pairs and are guided by synthetically sampled ground truth homographies. Our method demonstrates higher registration accuracy than competing methods on our synthetic retinal dataset and generalizes well to our real macula dataset and a public fundus dataset.
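As a rough illustration of the self-supervised training setup summarized in the abstract, the sketch below shows how a synthetic ground-truth homography can be sampled by perturbing the image corners and used to warp one image of a training pair. This is not the authors' implementation; the corner-perturbation range, helper names, and image handling are hypothetical.

    # Minimal sketch: sample a random ground-truth homography for self-supervised training.
    # Hypothetical helper names and perturbation range; not the authors' code.
    import cv2
    import numpy as np

    def sample_homography(h, w, max_shift=0.1):
        # Perturb the four image corners by up to max_shift of the image size
        # and fit the homography that maps the original corners to the shifted ones.
        src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
        shift = np.random.uniform(-max_shift, max_shift, size=(4, 2)) * np.array([w, h])
        dst = (src + shift).astype(np.float32)
        return cv2.getPerspectiveTransform(src, dst)

    def make_training_pair(img):
        # Warp the image with a sampled homography H; because H is known exactly,
        # it can supervise keypoint correspondences without manual labels.
        h, w = img.shape[:2]
        H = sample_homography(h, w)
        warped = cv2.warpPerspective(img, H, (w, h))
        return img, warped, H  # H maps pixel coordinates from img to warped

A pair generated this way could then be passed to the detection/description and matching networks, with H providing the correspondence supervision described in the abstract.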

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16446-0_11

SharedIt: https://rdcu.be/cVRSR

Link to the code repository

N/A

Link to the dataset(s)

CF-FA dataset: https://sites.google.com/site/hosseinrabbanikhorasgani/datasets-1/fundus-fluorescein-angiogram-photographs--colour-fundus-images-of-diabetic-patients


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a self-supervised learning method for multi-modal retinal image registration. The proposed method consists of feature detection using RetinaCraquelureNet and feature matching using SuperGlue.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed method generally works for different imaging pairs such as CF-FA and IR-OCT-OCTA.
    2. The proposed method is unsupervised/self-supervised and does not require any manually labeled ground truth for training the network.
    3. Overall, the paper is well written with good illustrations.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The proposed method is only tested on narrow-field retinal images. More challenging cases need to be tested.
    2. While the paper validates the proposed method on three datasets, the baseline methods are limited. See other suggested baselines in “comments for authors”.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper provides sufficient implementation details.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Overall, the paper is well written and is of interest to the MICCAI community. A few comments to further improve the paper. (1) All test images are narrow-field, so the transformation can be approximately modeled by a homography. I suggest the authors test on more challenging cases, including ultra-widefield retinal images (for example, the PRIME-FP dataset https://dx.doi.org/10.21227/ctgj-1367), as secondary results.

    (2) The paper compares the proposed method with other deep learning based methods such as SuperPoint and GLAMPoints. Please also consider other conventional methods for retinal image registration, such as vessel-based, intensity-based, or traditional keypoint-based methods: [Vessel-based] Registration of multimodal fluorescein images sequence of the retina; [Intensity-based] Maximize mutual information between multi-modal images; [Keypoint-based] A partial intensity invariant feature descriptor for multimodal retinal image registration; [Keypoint-based] Alignment of challenging image pairs: Refinement and region growing starting from a single keypoint correspondence.

    (3) The synthetic images are obtained using CycleGAN, and the generated images may contain artifacts. How robust is the proposed method to such artifacts?

    (4) Will the IR-OCT-OCTA dataset be publicly available?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper has novel aspects and addresses a challenging problem (self-supervised learning/multi-modal registration). The results show improvements over prior work.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors present an end-to-end method that combines the RetinaCraquelureNet and SuperGlue networks into a single system to perform multi-modal retinal image registration. Training is performed with a synthetically generated multi-modal dataset of retinal images, in which warpings and noise are dynamically introduced into the images to represent image acquisition variances.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The authors provide an end-to-end method, combining the RetinaCraquelureNet (based on ResNet) and SuperGlue networks into a single system.
    • The method is built on top of two individually proven neural networks, with extra refinements and fine-tuning to improve both their individual and joint performance. So while it does not present a great innovation, it is a push in the right direction.
    • The paper is very clear and has a very detailed description of the system and the experiments.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The method is trained only on synthetic data, although this is quite understandable given the difficulty of obtaining a large, correctly manually annotated dataset spanning multiple modalities.
    • Synthetic data are created using homographies. This works mostly for images with a narrow field of view, as shown in works like [C. Hernandez-Matas, X. Zabulis and A. A. Argyros, “REMPE: Registration of Retinal Images Through Eye Modelling and Pose Estimation,” IEEE Journal of Biomedical and Health Informatics, vol. 24, no. 12, pp. 3362-3373, Dec. 2020, doi: 10.1109/JBHI.2020.2984483], where it was shown that utilizing an eye model is more accurate than a simple homography for images with a FOV over 40°.
    • While the method is geared to multi-modal registration, it would be interesting to see its performance in single-mode registration, as there exists at least one public dataset for evaluating registration performance on fundus images, the FIRE dataset, which was also utilized in the paper mentioned above.
    • The color retinal dataset utilized contains only low-resolution images (576 × 720); high-resolution images (at least 2500 × 2500) have been widely available for almost a decade, so such low-resolution images should not be considered sufficient anymore.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • The system is very well defined and detailed, although not publicly available.
    • The dataset is very well defined and detailed, although not available at this time (the authors indicate it will be made available upon acceptance).
    • The training and evaluation dataset partitions and the epochs utilized are well defined.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    I would like to congratulate the authors on the great description of the system, the dataset creation and the experiments. I consider this paper to be a good contribution to the field, even if not highly innovative. The main weaknesses I see are mostly related to the synthetic data utilized for the training of the method.

    • Synthetic data are created using homographies, which works mostly for images with a narrow field of view; more complex warping or, if possible, the utilization of an eye model would provide more realistic transformations, particularly when working with standard-FOV images.
    • The color retinal dataset utilized contains only low-resolution images (576 × 720); high-resolution images (at least 2500 × 2500) have been widely available for almost a decade, so such low-resolution images should not be considered sufficient anymore. I think the authors should aim to upgrade those images in that direction. Additionally, while not strictly necessary, a nice way to show the performance of the registration method on single-mode images would be to utilize the FIRE dataset for testing, as results of competing methods are readily available.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The system seems to be robust and a push in the right direction. However, it is not highly innovative, as it mainly relies on two already established neural networks with some fine-tuning on top. The utilization of homography as the only type of warping for the creation of the synthetic dataset and the utilization of low-resolution RGB images are weak points, but not enough to prevent the acceptance of this submission.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose to extract convolutional features from the vessel structure for keypoint detection and description and to use a graph neural network for feature matching to achieve multi-modal retinal image registration.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method is novel, the manuscript logic is clear, and the verification experiments are abundant.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The interpretation of symbols in the formula needs to be described in more detail.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method is reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. In the abstract, it is suggested to report quantitative experimental results.
    2. What is the meaning of the colored points in the image in the rightmost box in Fig. 1?
    3. What are the shortcomings of the existing methods introduced in the Introduction?
    4. What do x_ai and x^_pi represent in Formula (2), and what does d(x_ai, x^_pi) represent?
    5. What do N and M mean in Formula (3)? Do the M on the left and the M on the right have the same meaning?
    6. How are the hyperparameters in the framework determined?
    7. What is the reason for using different assessment measures in different experiments?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The research is meaningful, the logic is clear, and the experimental results are convincing.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Multi-Modal Retinal Image Registration Using a Keypoint-Based Vessel Structure Aligning Network

    This submission tackles the multi-modal rigid registration of vessel imaging in ophthalmology. Its originality resides in jointly learning the keypoint detection and the keypoint matching using mixed convolutional and graph convolutional modules. Training is performed on synthetic data using homographies and by fusing two existing pre-trained networks. Evaluation is on a private multi-modal dataset. While the methodological novelty may be considered limited, all reviewers indicate that, despite its lack of true novelty, the proposed rigid approach is a good contribution to the field of ophthalmology due to the clearly demonstrated performance improvement over comparable approaches in the field. The reviewers have made a few comments inquiring about specific limit cases (large field-of-view images, mono-modal settings), but these may not change the overall appreciation of the paper and could be considered in a further extension. For all these reasons, the recommendation is therefore towards Acceptance.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2




Author Feedback

We thank all reviewers (R1-R3) and the meta reviewer (MR) for their positive feedback and constructive comments. We will revise our manuscript accordingly. In the following, we would like to clarify some points raised by the reviewers:

- Dataset: For our work, we used two multi-modal retinal datasets, the public CF-FA dataset and our own non-public IR-OCT-OCTA dataset. Based on the images of both datasets, we created a synthetically augmented multi-modal dataset by translating real images to another modality using CycleGAN. We evaluated our method using the synthetic and both real datasets. We thank R1, R2, and MR for their suggestions of the two additional public retinal fundus datasets (R1: PRIME dataset (ultra-widefield CF-FA), R2: FIRE dataset (high-resolution single-mode CF)) for a possible further extension of our work in the future.

- Robustness of our method regarding potential artifacts in the synthetic images generated using CycleGAN (R1): The generated images may contain some artifacts, but by visual inspection we could confirm that the vessel structures are mapped to the same positions in the generated image and in the content image, which is important for the registration task. Since a real image is combined with a synthetic image to form a registration pair, the network sees both real and fake images during training and thus learns both data distributions. In our experiments, our method generalized well to the real retinal image test datasets.

- Comparison methods: R1 suggested also including conventional methods for retinal image registration, such as vessel-based, intensity-based, and traditional keypoint-based methods, in the comparison. We would like to clarify that we have already included a vessel segmentation based method and a conventional keypoint-based method in our experiments, but we thank R1 (conventional vessel-based, mutual information, region growing) and R2 (REMPE) for their additional suggestions. We have tested RootSIFT keypoints and descriptors based on a vessel segmentation obtained by UNet, and we have tested the conventional SURF+PIIFD+RPM method. Due to the poor results of SURF+PIIFD+RPM for CF-FA and its failure for IR-OCT-OCTA, we only included the CF-FA results of SURF+PIIFD+RPM in the supplementary material, with a reference in the main manuscript.

- What are the shortcomings of the existing methods introduced in the introduction (R3)? The described methods either address the parts of the registration pipeline (feature extraction, description, and matching) separately or are not specifically designed for the multi-modal retinal registration task. Our self-supervised end-to-end multi-modal registration method jointly learns a CNN for cross-modal keypoint detection and description of the retinal vessel structure and a graph neural network for descriptor matching. Both parts of our pipeline are jointly optimized, and our end-to-end method shows superior performance to competing methods.

- How are the hyperparameters determined (R3)? They were experimentally determined by evaluating different learning rates, batch sizes, and numbers of keypoints on the validation set.

- Thanks to R3 for the suggestions about refining the description of Fig. 1 and of the mentioned symbols of Equations 2 and 3, which we will revise in the manuscript.
In Equation 2, x_ai refers to the coordinates of the anchor of the positive pair, x^_pi to the transformed coordinates of the positive counterpart, and d(x_ai, x^_pi) is the reprojection error of the positive pair using the Euclidean distance d(x, y). In Equation 3, M is used twice; we will change one of the symbols to avoid any confusion.

- Thanks to R3 for raising the question about the different assessment measures used for the synthetic and real datasets. The reason is that we only have ground truth homographies for the synthetic dataset and only 6 manually labeled control point pairs for the real datasets.
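Written out explicitly, and under the assumption (not stated in this feedback) that x^_pi is the positive keypoint mapped into the anchor image by the ground-truth homography H, the distance term described above corresponds to:

    % Reconstruction of the reprojection-error term only, not of the full Equation 2.
    \[
      d\bigl(x_{a_i}, \hat{x}_{p_i}\bigr) = \bigl\lVert x_{a_i} - \hat{x}_{p_i} \bigr\rVert_2 ,
      \qquad
      \hat{x}_{p_i} = \pi\bigl(H \, x_{p_i}\bigr),
    \]

where \pi(\cdot) denotes dehomogenization of the projective coordinates; this is a plausible reading of the authors' description rather than a reproduction of their loss.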


