
Authors

Kunzi Xie, Yixing Yang, Maurice Pagnucco, Yang Song

Abstract

Image registration is an essential task in electron microscope (EM) image analysis, which aims to accurately warp the moving image to align with the fixed image, reducing the spatial deformations across serial slices introduced during image acquisition. Existing learning-based registration approaches are primarily based on Convolutional Neural Networks (CNNs). However, for the requirements of EM image registration, CNN-based methods lack the capability to learn global and long-range semantic information. In this work, we propose a new framework, Cascaded LST-UNet, which integrates a sharpening skip-connection layer with a Swin Transformer based U-Net structure in a cascaded manner for unsupervised EM image registration. Our experimental results on a public dataset show that our method consistently outperforms the baseline approaches.
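As a rough illustration of the sharpening skip-connection idea described in the abstract, below is a minimal sketch assuming a Sharp U-Net-style design in which encoder features pass through a fixed depthwise Laplacian-sharpening convolution before concatenation with decoder features. The kernel values, module name, and tensor shapes are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LaplacianSharpenSkip(nn.Module):
    """Hypothetical sharpening skip connection: encoder features are filtered
    with a fixed depthwise Laplacian-sharpening kernel before concatenation."""

    def __init__(self, channels: int):
        super().__init__()
        # 3x3 sharpening kernel (identity + Laplacian); exact values are an assumption.
        k = torch.tensor([[0., -1., 0.],
                          [-1., 5., -1.],
                          [0., -1., 0.]])
        # One copy of the kernel per channel -> depthwise convolution.
        self.register_buffer("kernel", k.expand(channels, 1, 3, 3).clone())
        self.channels = channels

    def forward(self, enc_feat: torch.Tensor, dec_feat: torch.Tensor) -> torch.Tensor:
        sharpened = F.conv2d(enc_feat, self.kernel, padding=1, groups=self.channels)
        return torch.cat([sharpened, dec_feat], dim=1)

# Usage: skip = LaplacianSharpenSkip(64); fused = skip(enc_features, dec_features)
```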



Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16446-0_30

SharedIt: https://rdcu.be/cVRTc

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper puts together a Cascaded Laplacian-sharpening Swin Transformer U-Net (LST-UNet) for deformable image registration of data slices from the MICCAI Challenge on (neural) Circuit Reconstruction from Electron Microscopy Images (CREMI). As these are preregistered data, synthetic deformations are applied for later recovery of motion.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • interesting use of Laplacian sharpening of skip connections
    • interesting use of Transformer-based Swin-UNet for capturing more long-range semantic information with the Transformer modules.
    • comparison to SOTA (designed for different purposes though)
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • synthetic deformations are suboptimal, as these only test the robustness and precision, but not the accuracy
    • unclear why adjacent EM slices (here with simulated motion) require registration - what is the application/question?
    • EM suffers from intrinsic motion artefacts (line shifts), which are not dealt with here
    • unclear why this method is “specifically designed for EM registration” - it appears to me to be quite generic in set-up for other 2D registration tasks?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • use of public challenge data
    • unclear if code is to be made available
    • no significance testing (not claimed in checklist but unclear why not carried out)
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    This is a nice methodological framework, tested on EM challenge data and generally compared well against SOTA methods, which however were designed for different purposes. Results seem moderately, but not significantly, better. Deformations were first simulated and then recovered, which is suboptimal. I would have thought that serial or cine registration applications would have been more interesting to investigate.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See above - nice methodological framework, questionable experiments and moderately convincing results.

  • Number of papers in your stack

    2

  • What is the ranking of this paper in your review stack?

    4

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper
    • The authors propose a Swin Transformer UNet with Laplacian sharpening in the skip connections to register TEM images by inferring the displacement map rather than the registered image. The sharpening filters seem appropriate for connectomics in TEM to preserve the semantic and structural information in the image.
    • They also propose applying the model inference twice, one pass after the other (cascade processing).
    • They compare their approach with other existing methodologies in the field, showing a better performance of their proposed architecture and the cascade approach.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Integration of an adequate filter (Laplacian sharpening) for the nature of the images to analyse in the skip connections.
    • The authors introduce the idea of fine-tuning the results by re-processing the output with the trained architecture (cascade processing).
    • The authors benchmark their proposal with existing state of the art methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The novelty of the proposed method is questionable: the authors choose an architecture already published in the literature to show its applicability in a different microscopy image processing task. Additionally, it is not clear from the paper whether their method works in 3D or in 2D. In the original Swin U-Net, the authors used 2D slices and proposed working on 3D as future work. An important contribution would be, for example, extending this method to 3D, but it is not clear whether the authors have already done so.
    • The authors speak about capturing the global information of TEM images to improve the registration. Indeed, they claim that this work is specifically designed for serial sections. Although they are using transformers, I think that if they are not analysing the global 3D information, such a statement can be confusing. The main reason is that in connectomics, when speaking about serial sections or global information, this refers to the 3D information contained across all the EM slices, not only what is observed in pairs of slices.
    • If the authors are indeed analysing 3D information, please make a clear statement in the text and indicate the number of slices entering the training batch.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • Please correct me if I’m wrong: in the reproducibility statement, the authors declare that they provide links to the code; however, I did not find any link, nor any statement in the paper indicating that the code will be openly available.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Typos & proposed corrections to be made:

    • Page 1: Lv et al. [12] proposed (Is it correct?)
    • Page 3: STN, please indicate that it means “spatial transformer network” and cite it.
    • Also, at the end of page 2, the authors indicate that they add a Laplacian sharpening skip connection but do not justify why until the description of the feature enhancements. To ease the reading, I think it would be nice to motivate why such an additional step is integrated in the architecture.
    • Figure 2: could you elaborate a bit more in the caption? For completeness, it is recommended to explain the meaning of the parameters in the proposed architecture (i.e., W, H, C, and so on) and how they were set.
    • Did you train the STN? How?
    • What is the size of the kernel used for SSIM?

    Comments:

    • Cascade approach: Is there any theoretical reason not to train the cascade approach end-to-end? Additionally, there will always be an error and a bias implicit in the trained architectures. How does this propagate in the cascade approach? Is there any reason to apply the cascade iteration only once? Would it make sense to apply it iteratively until there is no significant change in the output? (A minimal inference sketch of such repeated application is given after this list.)
    • The authors use images of size 448x448 to train the model. How does this relate to the receptive field of the network? Additionally, images are resized before training. Is there any specific reason to do so? Were the images also downsampled before analysing them with the compared state-of-the-art methods? A comment on this might be important, as in Figure 2 it looks like the proposed network is able to better resolve fine details compared with the UNet. This might be, first, because the resolution of the images was different in each approach, or because the proposed method is able to fine-tune the results. Is this a consequence of using the cascade approach?
    • How does the proposed method compare in terms of hyperparameters and memory consumption?
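    To make the cascade question above concrete, here is a minimal two-pass inference sketch. It assumes a trained registration network reg_net that predicts a pixel-space displacement field from a (moving, fixed) pair, and a warp helper based on grid sampling; whether the paper composes the fields or simply re-warps the intermediate image, and the exact network signature, are assumptions here.

```python
import torch
import torch.nn.functional as F

def warp(image: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp an image (N,1,H,W) with a displacement field (N,2,H,W) in pixel units."""
    n, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=image.device, dtype=image.dtype),
                            torch.arange(w, device=image.device, dtype=image.dtype),
                            indexing="ij")
    new_x = (xs + flow[:, 0]) / (w - 1) * 2 - 1   # normalise to [-1, 1]
    new_y = (ys + flow[:, 1]) / (h - 1) * 2 - 1
    grid = torch.stack([new_x, new_y], dim=-1)     # (N, H, W, 2), x-then-y order
    return F.grid_sample(image, grid, align_corners=True)

@torch.no_grad()
def cascade_register(reg_net, moving, fixed, passes: int = 2):
    """Apply the same trained network repeatedly, re-warping the moving image each pass."""
    warped = moving
    for _ in range(passes):
        flow = reg_net(warped, fixed)   # assumed signature: (moving, fixed) -> (N,2,H,W)
        warped = warp(warped, flow)
    return warped
```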
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think the authors did a great job describing the method and comparing it with state-of-the-art solutions, but I think the contribution lacks technical novelty.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors investigate the use of a Swin Transformer based U-Net (Swin-UNet) for registration of serial electron microscopy (EM) image slices. They further incorporate ideas from Sharp U-Net and cascaded registration approaches (see, for example, Quicksilver by Yang et al. 2017 and Recursive Cascaded Networks by Zhao et al. 2019) to further improve the registration. In addition, a similarity term consisting of two loss terms is utilized.
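    As a rough illustration of such a two-term similarity loss (combining a Pearson correlation coefficient term with an SSIM term, as discussed in the weaknesses below), here is a minimal sketch; the SSIM window, the uniform averaging filter, and the equal weighting are assumptions, not the values reported in the paper.

```python
import torch
import torch.nn.functional as F

def pcc(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Pearson correlation coefficient over all pixels."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / (xc.norm() * yc.norm() + eps)

def ssim(x: torch.Tensor, y: torch.Tensor, win: int = 11,
         c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> torch.Tensor:
    """Mean SSIM with a uniform window; images are (N,1,H,W) scaled to [0, 1]."""
    pad = win // 2
    mu_x = F.avg_pool2d(x, win, stride=1, padding=pad)
    mu_y = F.avg_pool2d(y, win, stride=1, padding=pad)
    var_x = F.avg_pool2d(x * x, win, 1, pad) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, win, 1, pad) - mu_y ** 2
    cov = F.avg_pool2d(x * y, win, 1, pad) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return (num / den).mean()

def similarity_loss(warped: torch.Tensor, fixed: torch.Tensor, alpha: float = 0.5):
    """Assumed equal weighting of the two terms; higher similarity -> lower loss."""
    return -(alpha * pcc(warped, fixed) + (1 - alpha) * ssim(warped, fixed))
```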

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Clear motivation and approach integrating three orthogonal ideas from related work.

    • Ablation study demonstrating improvements due to each architectural choice.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The paper proposes a methodologically simple idea of combining related work.

    • No standard deviations nor significance tests are reported for the results.

    • Use of both the correlation coefficient and structural similarity is not well motivated. No ablation study with regard to this choice of loss is included.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Publicly available dataset and simplicity of approach based on related work with available open source implementations should facilitate reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • The authors don’t specify how many of the 125 images were used for training, validation, and testing.

    • A reference to related work that also used a cascade of registration networks would be appropriate, e.g., Quicksilver by Yang et al. 2017 and Recursive Cascaded Networks by Zhao et al. 2019, though both differ from the simpler cascaded approach utilized in this work. Personally, I think Recursive Cascaded Networks (with the differences that weights are not shared and learning is not end-to-end, which is fine!) is the most appropriate reference.

    • Given that after an initial alignment the motivation for long-range dependencies is less compelling, would it be more plausible to use a standard CNN registration model for the second stage of the cascade?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors integrate ideas from related work and apply them to a particular application. Novelty is limited, but results of combining Swin-UNet with Sharp U-Net for image registration have not been reported before, to the best of my knowledge.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #5

  • Please describe the contribution of the paper

    The authors propose a transformer based network to register EM images. They used the CREMI dataset for the experiments and report results which outperform the baseline approaches.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of the paper is the fact that the authors used a transformer based network for an image registration task. The paper is very well written and easy to follow. The analysis is also very clear.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The major weakness of the paper is that it lacks timing comparisons for various algorithms compared.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have mentioned that the source code will be provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The authors should compare the run times of the various algorithms considered.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is very well written and the analysis seems thorough. The evaluation is also very convincing. The missing timing comparison was a major factor in my score.

  • Number of papers in your stack

    7

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Overall the paper was well received, with the reviewers appreciating the problem, approach, and experiments and leaning towards accepting the paper, but they still had several concerns, which I think can be addressed in a rebuttal. Several of the questions revolve around clarity and novelty. Overall, it seems like the paper straddles technical contribution and application, and may be of interest to part of the MICCAI community.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4




Author Feedback

We appreciate all reviewers and Area Chairs for your time and comments. In the final version, we will address a few presentation issues and include more related work as the reviewers suggested (R2, R4). We will also clarify the motivation for 2D registration of EM images (R1), describe the experimental settings more clearly and explain why we use the SSIM and PCC loss functions (R2). Also, we will explain the use of the cascaded mechanism (R2, R4) in more detail, clarify the 2D and 3D aspects of our experiments (R4) and include time consumption information (R5). In our future work, we will conduct more ablation studies with accuracy comparisons and noise detection. We will generalize the image registration method to other kinds of medical images (R1) and to 3D image registration (R4). We will also include statistical significance tests in the extended journal version (R1, R2).


