Authors

Changwei Wang, Rongtao Xu, Shibiao Xu, Weiliang Meng, Xiaopeng Zhang

Abstract

Since the morphology of retinal vessels plays a pivotal role in the clinical diagnosis of eye-related diseases and diabetic retinopathy, retinal vessels segmentation is an indispensable step for the screening and diagnosis of retinal diseases, yet it is still a challenging problem due to the complex structure of retinal vessels. Current retinal vessels segmentation approaches roughly fall into image-level and patches-level methods based on the input type, and each has its own strengths and weaknesses. To benefit from both input forms, we introduce a Dual Branch Transformer Module (DBTM) that can simultaneously and fully exploit the patches-level local information and the image-level global context. Besides, the retinal vessels are long-span, thin, and distributed in strips, which makes the square kernel of classic convolutional neural networks ill-suited, as it is mainly suitable for natural objects with bulk shape. To better capture context information, we further design an Adaptive Strip Upsampling Block (ASUB) to adapt to the striped distribution of the retinal vessels. Based on the above innovations, we propose a retinal vessels segmentation Network with Dual Branch Transformer and Adaptive Strip Upsampling (DA-Net). Experiments validate that our DA-Net outperforms other state-of-the-art methods on both DRIVE and CHASE-DB1 datasets.
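
To make the kernel-shape argument concrete, the following is a minimal sketch and not the authors' ASUB. It only compares the footprint of a square kernel with one-pixel-wide strip kernels that use the same number of taps. The function names, the four directions, and the 9-tap length are assumptions chosen for illustration; this page does not specify the block's actual strip length, orientations, or adaptive weighting.

```python
# Illustrative only: footprints of a square kernel vs. strip kernels.
# All names and sizes here are hypothetical, not taken from the paper.

def square_offsets(k):
    """Offsets (dy, dx) covered by a k x k square kernel centred at the origin."""
    r = k // 2
    return [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]


def strip_offsets(length, direction):
    """Offsets covered by a one-pixel-wide strip with `length` taps (odd)."""
    steps = {
        "horizontal": (0, 1),
        "vertical": (1, 0),
        "diagonal": (1, 1),
        "anti_diagonal": (1, -1),
    }
    sy, sx = steps[direction]
    r = length // 2
    return [(t * sy, t * sx) for t in range(-r, r + 1)]


def extent(offsets):
    """(height, width) of the bounding box of a footprint."""
    ys = [dy for dy, _ in offsets]
    xs = [dx for _, dx in offsets]
    return (max(ys) - min(ys) + 1, max(xs) - min(xs) + 1)


def mean_response(image, y, x, offsets):
    """Average of the pixels under the footprint at (y, x), with zero padding."""
    h, w = len(image), len(image[0])
    total = 0.0
    for dy, dx in offsets:
        yy, xx = y + dy, x + dx
        if 0 <= yy < h and 0 <= xx < w:
            total += image[yy][xx]
    return total / len(offsets)


# Toy 9 x 9 image containing a single horizontal "vessel" in row 4.
toy = [[1.0 if row == 4 else 0.0 for _ in range(9)] for row in range(9)]

square = square_offsets(3)                   # 9 taps, 3 x 3 footprint
h_strip = strip_offsets(9, "horizontal")     # 9 taps, 1 x 9 footprint
v_strip = strip_offsets(9, "vertical")       # 9 taps, 9 x 1 footprint

resp_square = mean_response(toy, 4, 4, square)
resp_h = mean_response(toy, 4, 4, h_strip)
resp_v = mean_response(toy, 4, 4, v_strip)
```

With the same nine weights, the horizontal strip covers the whole toy vessel, while the 3 x 3 square sees only three of its pixels and the vertical strip only one. The same setup hints at the concern raised in the reviews below: a straight strip has no such advantage at a bifurcation or along a tortuous vessel.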

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16434-7_51

SharedIt: https://rdcu.be/cVRsj

Link to the code repository

N/A

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

The authors of this paper propose a novel architecture for retinal vessel segmentation that combines image-level and patch-level information in a joint encoder-decoder model. Their approach combines regional and global features from the encoder and feeds them to a transformer module. The transformer output is then decoded with an adaptive, line-like strip upsampling block. Compared to existing work, they introduce the transformer module to fuse features from two different scales, and they modify the standard decoding part of existing architectures to take into account more context from the regions along the length of the vessels. The authors validate their approach on two widely used public databases, where it outperforms existing work in the literature.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

1) Methodology: They propose a novel architecture that fuses regional and global information using transformers. To the best of my knowledge, it is the first application and integration of a transformer module in an encoder-decoder-like architecture. Related work inspired by U-Net and Transformers is the PCAT-Unet model presented by Danny Chen et al. [Ref1] for retinal vessel segmentation. For the decoding part, they replace the standard square kernel with oriented line detectors that take into account context along the length of the vessels.
2) Reproducibility: The authors provide the MICCAI community with the tools necessary to reproduce the experiments and results. They provide the code implementation with the necessary dependencies, the evaluation of their method, and the dataset that was used in the study. These are supplemented with the trained weights of the proposed model, so the community can replicate the algorithm exactly.

[Ref1]: Danny Chen et al. “PCAT-Unet: UNet-like network fused convolution and transformer for retinal vessel segmentation”, PLOS one, 2022
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

1) The use of the proposed upsampling block (ASUB) is not sufficiently justified. It is not clear how well the strip upsampling block can capture the regional context in more complex vascular features, such as bifurcations, where the vessel deviates from a line-like shape. Another case where the use of the ASUB is not sufficiently justified is that of the smallest vessels, which deviate significantly in shape from straight, line-like structures and instead approximate curvilinear shapes. Pathology can also change their appearance, making them more tortuous.
2) The segmentation performance of the proposed method and that of existing work are relatively close (see the CHASE-DB1 database, Table 1). The standard deviation is not provided. Also, the authors do not provide a statistical significance analysis of their results.
3) The authors do not discuss the limitations of the method on more complex vascular features, such as junctions and bifurcations, or in cases of pathology, and they do not give examples of their segmentation of pathological images. For example, bifurcations consist of one parent and two daughter vessels, and the strip-like upsampling cannot capture the organization of this structure. The future research steps are also missing from the discussion section.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

It is a complete submission with the training and evaluation code, the dataset, and the pre-trained weights.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

1) How does the authors’ method compare in terms of methodology and performance with [Ref1]? Here, [Ref1] is: Danny Chen et al. “PCAT-Unet: UNet-like network fused convolution and transformer for retinal vessel segmentation”, PLOS one, 2022
2) Please replace the “Cytomegalovirus Retinitis [12]” reference with a more relevant paper. The cited work does not deal with this specific pathology.
3) The following sentence in section 1 is not very clear: “In contrast, patches-level methods make the geometric features … usually span multiple patches.” Please rephrase it.
4) In Fig. 4 (a), is the visualization of the strip convolutions correct? The authors propose a 4 pixel strip which seems to cover a significant area in the retina. Does each square of the strip cover that large an area, or that many pixels, in the image?
5) In ASUB, the authors propose a straight-line structural element to extract the context from the vessels. Could more complex vascular features, for example junctions, be approximated by more complex filters?
6) In Table 1, are the differences in the metrics statistically significant? Also, do the performance improvements in the results originate from better small vessel segmentation or from better large vessel boundary detection?
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The authors propose a novel architecture with transformers. Their method takes into account global and local information by processing regions and the whole image. Also the approach is supplemented with an additional methodological novelty that seems to be tailored to the vessel segmentation problem (ASUB block).
Number of papers in your stack

5
What is the ranking of this paper in your review stack?

1
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

The paper proposes DA-Net for retinal vessel segmentation. The design of the network addresses limitations of image-wise and patch-wise approaches by combining them in a single network with a dual branch transformer. Also, the paper presents an adaptive strip upsampling method to better mimic the tubularity of vessels. The proposed method is validated on two fundus image datasets, DRIVE and CHASE-DB1, and is shown to outperform previous methods.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper provides a new way to combine patch-wise and image-wise segmentation approaches in a single network for vessel segmentation, using a dual branch transformer, which makes the paper interesting. Also, the adaptive strip upsampling block seems well suited to the segmentation of vessels, which are elongated structures.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

No statistical analysis is given for the ablation studies to show whether the contributions of the proposed modules to the method's performance, which seem very small, are significant.
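
One way such an analysis could look, given per-image scores for two models on the same test images, is a paired sign-flip permutation test. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, the scores are made-up placeholders rather than results from the paper, and the exact enumeration is practical only for small test sets, since it visits 2^n sign patterns.

```python
from itertools import product


def paired_sign_flip_pvalue(scores_a, scores_b):
    """Two-sided exact p-value for the mean paired difference.

    Under the null hypothesis that the two models are interchangeable,
    the sign of each per-image difference is equally likely to be + or -.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    at_least_as_extreme = 0
    total = 0
    # Enumerate every assignment of signs to the paired differences.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Placeholder per-image scores (hypothetical, NOT taken from the paper).
full_model = [0.9, 0.8, 0.7, 0.6]
ablated_model = [0.8, 0.7, 0.6, 0.5]
p_value = paired_sign_flip_pvalue(full_model, ablated_model)
```

With only four paired images, even a perfectly consistent improvement cannot reach a p-value below 2/16, which illustrates why small ablation gains on small test sets are hard to certify without reporting per-image variation.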

Although using a shared encoder for image-wise and patch-wise segmentation seems to help keep the number of network parameters small, I have some concerns that it can reduce the information that can be extracted from a fundus image (which is downsized by a factor of N=4), so that small vessels would be lost due to downsampling, in contrast to previous work using full-size fundus images. This means the network does not use a purely image-wise approach. There is no ablation experiment provided to show whether using the dual branch improves performance over using only the patch-level branch or only the image-level branch.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors shared their code and weights of their network, which makes the paper reproducible.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

There is no justification given for why adaptive strip kernels are used in upsampling but not in feature extraction, where they might also improve the detection of vessels.

An ablation experiment showing whether the dual branch improves performance over using only the patch-level branch or only the image-level branch would improve the quality of the paper.

The authors stated that they resized all fundus images to 640x640 pixels. However, they did not provide any details on whether they cropped the images to a square before resizing; not doing so can change the representation of vessels in the images. Also, no information is given on whether patches are cropped in an overlapping fashion.
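
The two preprocessing choices in question can be sketched as follows. This is a hedged illustration, not the authors' pipeline: the helper names, the 160-pixel patch size, and the strides are assumptions, and only the 640-pixel target size comes from the comment above.

```python
def center_crop_box(width, height):
    """Largest centred square as (left, top, right, bottom).

    Cropping to this box before resizing keeps the aspect ratio, so vessels
    are not stretched when the image is scaled to a square target size.
    """
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def patch_origins(size, patch, stride):
    """Start coordinates of patches along one axis that cover [0, size).

    stride == patch gives non-overlapping tiles; stride < patch gives
    overlapping patches. The last patch is clamped to the image border.
    """
    if patch > size or stride <= 0:
        raise ValueError("need 0 < stride and patch <= size")
    origins = list(range(0, size - patch + 1, stride))
    if origins[-1] != size - patch:
        origins.append(size - patch)
    return origins


# Hypothetical settings: 640-pixel axis, 160-pixel patches.
tiles = patch_origins(640, 160, 160)       # non-overlapping
overlapping = patch_origins(640, 160, 80)  # 50% overlap
```

Under these assumed settings, tiling yields 4 patches per axis and the half-overlapping grid yields 7, so the overlap choice changes both the inference cost and how often a vessel that crosses a patch border is seen whole.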
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper presents a novel way of combining image-level and patch-level segmentation, which is shown to outperform previous methods. Also, the use of adaptive strip upsampling is well suited to the segmentation of elongated structures. However, there is insufficient evidence to judge whether the dual branches outperform a single branch.
Number of papers in your stack

5
What is the ranking of this paper in your review stack?

1
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

The paper proposes a patch and non-patch combination retinal vessel segmentation network, including a shared-weights U-net like encoder and decoder for patch and non-patch training and a transformer module for combining them in the latent space.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Overall, it is well written and well organized. The idea of tackling small vessels in a patch-wise way is very straightforward, yet the mechanism to combine patch and non-patch inputs is interesting. Additionally, this method seems to obtain decent experimental results. With the code promised to be released, I have no doubts about the methodology part of the paper.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Though the results seem to be excellent, the experimental results are not sufficient. Only two datasets are included.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Code is provided, and the dataset is open access.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

However, I still have a question. Have you tried the STARE dataset [A]? It is also a widely used dataset for demonstrating the effectiveness of a segmentation model. You should validate your method on it as well.

[A] Adam Hoover, Valentina Kouznetsova, and Michael Goldbaum, “Locating blood vessels in retinal images by piece-wise threshold probing of a matched filter response,” in Proceedings of the AMIA Symposium. 1998, p. 931, American Medical Informatics Association.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

7
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper is well written and the proposed method is novel. Furthermore, the code will be released.
Number of papers in your stack

4
What is the ranking of this paper in your review stack?

1
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

There is consensus that the paper proposes a novel architecture for vessel segmentation and that the results show improvement compared with prior art.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

1

Author Feedback

Thanks to all the reviewers for their professional and convincing comments.

Related work: The relevant work mentioned in the review will be cited and discussed.

Ablation experiment: Results using only image-level and patch-level inputs have been reported in Table 2, and they can be considered as ablation for different branches.

Typos and expression mistakes: We will make changes according to the comments of the reviewers to improve the readability of the paper.

Adaptation of ASUB to tortuous small vessels: We recommend the use of deformable convolutions and more recent transformer architectures to further improve performance, which will be discussed as future work in the paper.

DA-Net: Dual Branch Transformer and Adaptive Strip Upsampling for Retinal Vessels Segmentation