Authors

José Morano, Guilherme Aresta, Dmitrii Lachinov, Julia Mai, Ursula Schmidt-Erfurth, Hrvoje Bogunović

Abstract

Deep learning has become a valuable tool for the automation of certain medical image segmentation tasks, significantly relieving the workload of medical specialists. Some of these tasks require segmentation to be performed on a subset of the input dimensions, the most common case being 3D-to-2D. However, the performance of existing methods is strongly conditioned by the amount of labeled data available, as there is currently no data efficient method, e.g. transfer learning, that has been validated on these tasks. In this work, we propose a novel convolutional neural network (CNN) and self-supervised learning (SSL) method for label-efficient 3D-to-2D segmentation. The CNN is composed of a 3D encoder and a 2D decoder connected by novel 3D-to-2D blocks. The SSL method consists of reconstructing image pairs of modalities with different dimensionality. The approach has been validated in two tasks with clinical relevance: the en-face segmentation of geographic atrophy and reticular pseudodrusen in optical coherence tomography. Results on different datasets demonstrate that the proposed CNN significantly improves the state of the art in scenarios with limited labeled data by up to 8% in Dice score. Moreover, the proposed SSL method allows further improvement of this performance by up to 23%, and we show that the SSL is beneficial regardless of the network architecture. Our code is available at https://github.com/j-morano/multimodal-ssl-fpn.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43901-8_56

SharedIt: https://rdcu.be/dnwD4

Link to the code repository

https://github.com/j-morano/multimodal-ssl-fpn

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

(1) Proposes a 3D->2D segmentation CNN based on ReSensNet [20], extended with 3D->2D projective blocks. (2) Proposes a self-supervised learning strategy for 3D->2D models based on the reconstruction of modalities of different dimensionality.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

(1) This paper proposes a 3D->2D segmentation CNN based on ReSensNet [20], extended with 3D->2D projective blocks. (2) The self-supervised learning method based on reconstructing image pairs of modalities is interesting, but a similar idea has already been proposed in multi-modal reconstruction pre-training (MMRP) [7].
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

(1) In the introduction, the description of the limitations of IPN in [11, 12] is not correct, because the limitation of the patch-based strategy has been solved by IPN-V2 in [12]. (2) The comparison is insufficient. Why not compare with ReSensNet [20], which is similar to the proposed method? (3) In addition to Fig. 4, the quantitative comparison should be given in a table, because the differences between methods in Fig. 4 are difficult to distinguish.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

I think that reproducing this paper should be relatively easy, because it is mainly based on existing frameworks and ideas, and the code will also be provided.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

(1) Please clearly explain and compare the differences between the proposed method and existing methods, such as [20] and [7]. (2) The experimental analysis should be improved, for example with more qualitative results.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The network architecture and the SSL strategy are interesting, but similar work already exists.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

The authors propose a 3d-2d segmentation network for retinal OCT by efficiently leveraging self-supervised learning. The paper’s key contribution is proposing an adaptive pooling strategy for combining 3d and 2d blocks of the CNN network. This contribution facilitated label-efficient 3d to 2d segmentation. The authors evaluate the method on multiple datasets along with ablation studies to highlight the contribution.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

– The SSL approach taken in the paper shows an improvement in performance for the proposed 3d-2d network and for other methods from the literature

– The article is well written and easy to understand
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The application of the paper is novel, along with the ablation study. There are a few minor questions that need to be addressed for completeness of the paper.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors state that the code will be released on GitHub. The carbon footprint and the training time are also reported in the supplementary material.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
- Did the authors train different models for different datasets as the GA-M and GA-S dataset sizes vary?
- Which layers of the segmentation model are the weights shared from the reconstruction model?
- Is the model susceptible to the initial registration of the FA and OCT method?
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Well written paper with novel application
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

6
[Post rebuttal] Please justify your decision

My concerns about the paper were addressed. It is also nice to see results from additional experiments supporting the paper’s claims.

Review #3

Please describe the contribution of the paper

The authors proposed a label-efficient 3D-to-2D segmentation network based on self-supervised learning (SSL) via cross-modality reconstruction. The experimental results on the en-face-level segmentation of geographic atrophy (GA) and reticular pseudodrusen (RPD) in 3D optical coherence tomography demonstrate that the proposed network outperforms other 3D-to-2D segmentation methods and the proposed SSL strategy is also beneficial to other architectures.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The proposed method integrates the novel self-supervised learning (SSL) strategy using pre-trained cross-modality reconstruction into the 3D-to-2D segmentation network. The experimental results have demonstrated the effectiveness of SSL strategy on 3D-to-2D segmentation tasks and the paper is well-organized.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The training of the proposed network still has strict requirements on data which must contain one 3D imaging modality, another registered 2D imaging modality and manual annotations based on the 2D imaging modality. In addition, the paper has not verified the effectiveness of the proposed 3D-to-2D projective block.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The details of the method have been explained clearly and the code implementation will be available on GitHub, which should make the paper readily reproducible.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
1. The proposed network is based on ReSensNet. However, the authors have not compared the proposed network with ReSensNet and have not conducted an ablation study on the 3D-to-2D projective block.
2. In Training and evaluation details of Section 2, 1) “For reconstruction, models were trained on GA-M-N”. What’s GA-M-N? 2) “To evaluate the performance under label scarcity, we train with 5%, 10%, 20% and 100% of the data in GA-S, and 20% and 100%, in RPD-S”. However, I have not found any evaluation under label scarcity in the Supplementary materials.
3. In the first paragraph of Section 3, the sentence “However, in most cases, the differences have not been found statistically significant” seems to lack supporting statistical analysis.
4. Fig. 4 only illustrates the results of Lachinov et al. and the proposed method for RPD-S.
5. In Section 3, I suggest renaming “Ablation study” to “effect of reconstructed modality”, or incorporating the SSL effect into the ablation study, since the SSL effect also belongs to the ablation studies. In addition, there is a spelling mistake: “asses” -> “assess” in “We conducted ablation studies to asses the effect of …” and “To further asses the effect of the SSL, …”.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The proposed self-supervised learning (SSL) strategy is novel and interesting, and the paper is of good clarity and organization.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper presents an approach for 3D->2D segmentation where the input data is 3D but the segmentation is en-face 2D. Although the paper received favourable scores, the reviewers raised some critical comments, especially regarding the comparison with, and differences from, previous methods such as ReSensNet [20].

Strengths: 1) The paper is generally well organised and easy to follow. 2) The results show some improvement over earlier methods, though not quite consistently.

Weaknesses: 1) It is not clear what the difference is between the proposed method and the earlier ReSensNet [20] method. A performance comparison is missing as well. 2) The title seems confusing to me, especially the term “Cross Modality”. To me, there is no cross modality here. 3) Some results said to be in the supplementary material are actually missing. 4) Numeric results are not provided; instead, the authors provide plots in Fig. 4, and the differences between methods seem to be small for larger ratios (e.g., ratio = 1.0) in some results.

Author Feedback

We thank all reviewers and AC for the insightful comments, and for finding our SSL method interesting (R1-3), novel (R2-3) and effective (R2-3), and the paper well-written (R1-3). The reviewers’ concerns are addressed as follows.

– Differences between the proposed method and ReSensNet [20], and the performance comparison (AC,R1,R3). The main goal of the novel feature projection blocks (FPBs) is to make the architecture more versatile to different input sizes while keeping its structure. We thus did not initially conduct an ablation study for this change. In ReSensNet, each FPB has a final convolution that projects 3D features to 2D via a 3D kernel with the same depth as the input feature map. Thus, ReSensNet’s FPBs must be modified whenever the input dimension changes. Instead, our novel FPBs handle features of any size by replacing all those convolutions with 1x1x4 convolutions followed by an adaptive avg. pooling with output depth = 1. Still, we agree that the performance comparison between our network and ReSensNet is of interest. Below we present the Dice scores (%) obtained by both networks (ReSensNet; Ours) on GA datasets for different ratios of labeled data (0.05|0.1|0.2|1.0). Input was adapted to a fixed depth of 128. In each case, a Wilcoxon signed-rank test was performed (* p<0.1, ** p<0.05, *** p<0.001).

Dataset   Ratio 0.05        Ratio 0.1          Ratio 0.2          Ratio 1.0
GA-S-1    78±17; 80±16**    84±16; 85±18***    90±7; 91±8***      93±7; 93±6
GA-S-2    78±16; 80±16*     83±11; 84±11       86±10; 87±9*       88±8; 88±9
GA-M-S    72±23; 73±21      73±25; 77±24***    81±22; 83±21***    80±23; 85±21***

These results demonstrate that our network performs as well as or, more often, better than ReSensNet, while being more versatile. These results will be added to the final paper. ReSensNet also differs from our approach in that it is applied pixel-wise. Thus, each en-face pixel requires one forward pass (FP), making the prediction computationally inefficient. Instead, ours infers the final map in a single FP using the whole volume as input.
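As an illustration of the depth-agnostic projection described in this rebuttal item (a short convolution along depth followed by adaptive average pooling to output depth 1), here is a minimal pure-Python sketch. It is not the authors' implementation. It assumes a single channel, stride 1, no padding, and that the "4" in the 1x1x4 kernel lies along the depth axis. The names `project_column`, `project_volume`, and `constant_volume` are hypothetical. Because the kernel is 1x1 in the en-face directions, each depth column can be processed independently.

```python
from typing import List

# Assumption: a single-channel feature volume indexed as volume[d][h][w].
Volume = List[List[List[float]]]


def project_column(column: List[float], kernel: List[float]) -> float:
    """Collapse one depth profile to a single value.

    Step 1: valid 1D cross-correlation along depth with a short kernel.
    Step 2: average all responses, which is what adaptive average
            pooling with output depth 1 does for any input depth.
    """
    k = len(kernel)
    if len(column) < k:
        raise ValueError("column is shorter than the kernel")
    responses = [
        sum(kernel[j] * column[i + j] for j in range(k))
        for i in range(len(column) - k + 1)
    ]
    return sum(responses) / len(responses)


def project_volume(volume: Volume, kernel: List[float]) -> List[List[float]]:
    """Project a (depth, height, width) volume to a (height, width) map."""
    depth = len(volume)
    height = len(volume[0])
    width = len(volume[0][0])
    return [
        [
            project_column([volume[d][h][w] for d in range(depth)], kernel)
            for w in range(width)
        ]
        for h in range(height)
    ]


def constant_volume(depth: int, height: int, width: int, value: float) -> Volume:
    """Build a toy volume filled with one value (illustration only)."""
    return [[[value] * width for _ in range(height)] for _ in range(depth)]


# The same kernel handles two different input depths without modification.
kernel = [0.25, 0.25, 0.25, 0.25]
shallow = project_volume(constant_volume(8, 2, 3, 1.0), kernel)
deep = project_volume(constant_volume(32, 2, 3, 1.0), kernel)
```

By contrast, a kernel whose depth must equal the input depth, as the rebuttal describes for ReSensNet, would have to be redefined for each of the two toy volumes. In PyTorch, the analogous building blocks would presumably be `torch.nn.Conv3d` followed by `torch.nn.AdaptiveAvgPool3d`, although the exact layer configuration used by the authors is not stated here.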

– Confusing title, especially the term “cross-modality” (AC). We agree and will replace it with “inter-modal”.

– Missing results in the supplement (AC,R3). Fig. 4 shows all the results discussed in the paper, including those of the methods fine-tuned in label-scarce scenarios (ratio≤20%), hence no results are missing. We will rephrase the reference to the supplement to clarify this.

– Table in addition to Fig. 4 and significance tests (AC,R1,R3). Due to space limits, we could not include both graphical and table-based results. Given the large number of experiments, we think plots are easier to read, while being sufficient for discussion. That said, we can add a table to the supplement, additionally including the results of the significance tests of our model vs others.
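For readers unfamiliar with the Wilcoxon signed-rank test mentioned in the rebuttal, the following pure-Python sketch computes the signed-rank sums for paired per-case scores. It is an illustration only: the function name `signed_rank_sums` is hypothetical, the example scores are made up and are not results from the paper, and the p-value computation is omitted (a statistics library such as SciPy, via `scipy.stats.wilcoxon`, would normally be used for that).

```python
from typing import List, Tuple


def signed_rank_sums(a: List[float], b: List[float]) -> Tuple[float, float]:
    """Return (W_plus, W_minus) for paired samples a and b.

    Differences are taken as b - a. Zero differences are dropped, and
    tied absolute differences share the average of the ranks they span.
    """
    if len(a) != len(b):
        raise ValueError("samples must be paired")
    diffs = [y - x for x, y in zip(a, b) if y != x]
    # Indices of the differences, ordered by absolute value.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Made-up per-case Dice scores (%) for two hypothetical models.
baseline = [70, 82, 65, 90, 77, 88]
proposed = [74, 81, 71, 90, 80, 93]
w_plus, w_minus = signed_rank_sums(baseline, proposed)
```

A large imbalance between the two sums suggests that one model is better on most paired cases. Whether that imbalance is significant depends on the number of non-zero pairs.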

– Small performance differences for ratio=1 (AC,R1). Our work proposes an SSL method and CNN that are effective in scenarios with very few labeled data, as demonstrated by the results. However, it is expected that all models perform similarly if enough data is available, since all architectures are alike (U-Net based) and the training setting is identical. This is not in conflict with our contributions.
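The labeled-data ratios referred to throughout (0.05, 0.1, 0.2, 1.0) can be pictured with the short sketch below, which draws reproducible subsets of a training set. This is an assumption-laden illustration, not the authors' protocol: the source does not say how cases were selected or whether smaller subsets are nested in larger ones, and the function name `labeled_subsets` and the case identifiers are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def labeled_subsets(
    case_ids: Sequence[str], ratios: Sequence[float], seed: int = 0
) -> Dict[float, List[str]]:
    """Pick a nested subset of labeled cases for each ratio.

    Assumption: one seeded shuffle is shared by all ratios, so the
    5% subset is contained in the 10% subset, and so on.
    """
    shuffled = list(case_ids)
    random.Random(seed).shuffle(shuffled)
    subsets: Dict[float, List[str]] = {}
    for ratio in ratios:
        if not 0 < ratio <= 1:
            raise ValueError("ratio must be in (0, 1]")
        # Keep at least one case even for very small ratios.
        count = max(1, round(ratio * len(shuffled)))
        subsets[ratio] = shuffled[:count]
    return subsets


# Hypothetical training set of 40 labeled cases.
cases = [f"case_{i:02d}" for i in range(40)]
subsets = labeled_subsets(cases, [0.05, 0.1, 0.2, 1.0], seed=0)
```

With 40 cases, the four ratios yield 2, 4, 8, and 40 labeled cases respectively, which shows how little supervision the low-ratio settings provide.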

– Differences between the proposed approach and MMRP [7] (R1). MMRP has only been proven useful for localizing non-pathological structures on 2D fundus images, while our SSL method enables models to learn meaningful representations for the en-face 2D segmentation of different pathological structures in 3D OCT. Also, this is the first work proposing an SSL method for 3D➜2D models, studying the impact of the reconstructed modality, and approaching 3D➜2D reconstruction. Besides, our method can be easily extended to other tasks.

– Clarifications on some training and evaluation details (R2-3). We always used GA-M (misspelled “GA-M-N” in the paper) for pretraining. In target tasks, all weights are fine-tuned, so none are directly shared with the reconstruction model. For RPD-S, we picked only the best SOTA method.

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors have provided clarifications on the difference from ReSensNet, which seem reasonable. Additionally, they have provided a comparison with that method. Therefore, I would recommend this paper to be accepted. The authors are required to include the comparison with ReSensNet in the final paper.

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

According to the reviewer comments and the rebuttal, I think the authors have addressed the main issues raised. The authors have clearly stated the differences between the method and ReSensNet and have added comparative experiments. The authors also provided clarification on some details. The three reviewers unanimously agreed to accept the paper. Therefore, I recommend acceptance.

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

Each reviewer recommends acceptance, and the authors have responded to the issues raised.


Self-supervised learning via inter-modal reconstruction and feature projection networks for label-efficient 3D-to-2D segmentation