Authors

Mikhail Goncharov, Vera Soboleva, Anvar Kurmukov, Maxim Pisov, Mikhail Belyaev

Abstract

This paper introduces vox2vec, a contrastive method for self-supervised learning (SSL) of voxel-level representations. vox2vec representations are modeled by a Feature Pyramid Network (FPN): a voxel representation is a concatenation of the corresponding feature vectors from different pyramid levels. The FPN is pre-trained to produce similar representations for the same voxel in different augmented contexts and distinctive representations for different voxels. This results in unified multi-scale representations that capture both global semantics (e.g., body part) and local semantics (e.g., different small organs or healthy versus tumor tissue). We use vox2vec to pre-train an FPN on more than 6500 publicly available computed tomography images. We evaluate the pre-trained representations by attaching simple heads on top of them and training the resulting models for 22 segmentation tasks. We show that vox2vec outperforms existing medical imaging SSL techniques in three evaluation setups: linear and non-linear probing and end-to-end fine-tuning. Moreover, a non-linear head trained on top of the frozen vox2vec representations achieves competitive performance with the FPN trained from scratch while having 50 times fewer trainable parameters. The code is available at https://github.com/mishgon/vox2vec.
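
The two ideas in the abstract (a voxel representation built by concatenating pyramid levels, and a contrastive objective over voxels) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation, which lives in the repository linked in the abstract. The stride-2 layout per level, the nearest-cell lookup, the InfoNCE form of the loss, the temperature value, and all function names are assumptions made for illustration.

```python
import math

def voxel_representation(pyramid, voxel):
    """Concatenate the feature vectors that cover `voxel` at every level.

    Assumption: level l has stride 2**l, so voxel (i, j, k) falls into
    cell (i >> l, j >> l, k >> l) of that level (nearest-cell lookup).
    """
    i, j, k = voxel
    rep = []
    for level, fmap in enumerate(pyramid):
        rep.extend(fmap[i >> level][j >> level][k >> level])
    return rep

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: the positive is the same voxel seen in
    another augmented view, the negatives are different voxels."""
    a = l2_normalize(anchor)

    def sim(v):
        return sum(x * y for x, y in zip(a, l2_normalize(v))) / temperature

    logits = [sim(positive)] + [sim(n) for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[0]

def make_level(size, channels, seed):
    """Deterministic toy feature map of shape size^3 x channels."""
    return [[[[float((seed + x + 2 * y + 3 * z + c) % 5)
               for c in range(channels)]
              for z in range(size)]
             for y in range(size)]
            for x in range(size)]

# Toy pyramid: 4^3 cells with 2 channels, 2^3 with 4, 1^3 with 8.
pyramid = [make_level(4, 2, 0), make_level(2, 4, 1), make_level(1, 8, 2)]
rep = voxel_representation(pyramid, (3, 1, 2))
print(len(rep))  # 2 + 4 + 8 = 14 concatenated features
```

The loss is small when the anchor is closer to its positive than to any negative, and large otherwise, which is the behaviour the abstract describes as "similar representations for the same voxel" and "distinctive representations for different voxels".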

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43907-0_58

SharedIt: https://rdcu.be/dnwdF

Link to the code repository

https://github.com/mishgon/vox2vec

Link to the dataset(s)

https://www.synapse.org/#!Synapse:syn3193805/wiki/89480

https://zenodo.org/record/7262581/files/amos22.zip

https://flare22.grand-challenge.org/Dataset/

https://wiki.cancerimagingarchive.net/display/NLST/National+Lung+Screening+Trial

https://wiki.cancerimagingarchive.net/display/Public/NSCLC-Radiomics

https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=80969742


Reviews

Review #1

  • Please describe the contribution of the paper
    • The paper introduces a contrastive method for self-supervised learning of voxel-level representations. Specifically, the model is optimised to learn dense, patch-level representations: pixels of augmented image views from the same region of the original image should have similar representations, while different pixels should have dissimilar ones.

    • The model has been pre-trained on a large number of CT images of the thorax and abdomen, then evaluated on 22 segmentation tasks, with linear probing, non-linear probing, and fine-tuning, showing the effectiveness of vox2vec on linear and non-linear probing.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is well written and easy to follow.

    • The authors have conducted thorough experiments to validate the proposed idea, so the experimental evidence is solid.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The novelty of the idea is limited, as contrastive learning has been explored extensively by the community.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have promised to release code and models, so the work should be fully reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The paper is of good quality, despite the idea is not novel, I think it’s worth being accepted as it has demonstrated new results on linear and non-linear probing.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • paper writing.

    • solid experiment.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The study proposed a vox2vec framework which learns voxel-level representations of medical images via contrastive learning and stores the learned image features as vectors in pyramid levels. The authors pre-trained their vox2vec-FPN model on over 6500 CT scans spanning 6 datasets of the thorax and abdomen. They will make the model public, which could serve as a starting point for a variety of downstream tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) The voxel-level representation in pyramid form has the merits of being high-dimensional, fine-grained, and multi-scale. Such representations make the model suitable for the segmentation of different organs and tumors in full resolution. 2) The pre-trained vox2vec-FPN model, with frozen weights plus trainable heads, outperforms SOTA models with fewer trainable parameters. 3) The linear and non-linear probing regimes are novel in the medical imaging domain when evaluating dense self-supervised learning.
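
The "frozen weights plus trainable heads" setup in point 2 can be sketched as follows. This is a minimal illustration of linear probing on per-voxel features, not the paper's code: the function names, the feature dimension of 1008, and the class count of 14 are hypothetical values chosen only to show how small such a head is.

```python
def linear_probe_logits(features, weights, biases):
    """Apply one shared linear classifier to every voxel's frozen feature vector.

    features: list of per-voxel feature vectors (produced by a frozen backbone)
    weights:  one weight row per class
    biases:   one bias per class
    """
    return [
        [sum(w * x for w, x in zip(row, f)) + b for row, b in zip(weights, biases)]
        for f in features
    ]

def num_trainable_params(feature_dim, num_classes):
    """Only the head is trained: a weight matrix plus a bias vector."""
    return feature_dim * num_classes + num_classes

# Two voxels with 2-dimensional features, three classes.
logits = linear_probe_logits(
    features=[[1.0, 2.0], [0.0, 1.0]],
    weights=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    biases=[0.0, 0.5, -1.0],
)
print(logits)

# Hypothetical sizes: 1008-dimensional features, 14 classes.
print(num_trainable_params(1008, 14))
```

Because the backbone is frozen, the quality of the segmentation depends almost entirely on how linearly separable the pre-trained voxel features already are, which is why probing is used to compare SSL methods.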

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    In addition to the number of trainable parameters, FLOPs (floating-point operations) could be used to compare model complexity. This is not a weakness but a nice-to-have.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors will release the code and the pre-trained model. Looking forward to it.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    In addition to what the authors mentioned in their last paragraph, they could also explore the model’s potential on other downstream tasks such as few-shot learning and landmark detection.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Good method, strong results, clear statement. And the release of code and the pre-trained model on 6000+ scans is a plus.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This work proposes a framework for self-supervised learning (SSL) of voxel-level representations. The authors propose to use Feature Pyramid Network (FPN) to extract multi-scale representations for contrastive learning. The empirical evaluations on segmentation tasks show that it outperforms baseline methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed method is simple and effective.
    2. The experiments show that the proposed method outperforms baseline methods on multiple benchmarks.
    3. The writing is good and easy to follow.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Some details of pre-training are missing. For example, how many GPUs were used? How long does it take to pre-train the model?
    2. The authors did not compare the computational cost of pre-training with that of the baseline methods.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors agree to release the code in the checklist.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. The authors could evaluate their methods on other tasks, including detection and classification.
    2. The authors should include more details of pre-training, including the number of GPUs used and the pre-training time.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think the idea of using an FPN to extract multi-dimensional features for contrastive learning is simple and effective. The experiments show a performance gain, although some details of pre-training are missing.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    First, it introduces vox2vec, a framework designed for contrastive learning of voxel-level representations. By using a straightforward negative sampling strategy and storing voxel-level representations in a feature pyramid form, it achieves high-dimensional, fine-grained, multi-scale representations that are well-suited for segmenting various organs and tumors in full resolution.

    Second, it utilizes vox2vec to pre-train a Feature Pyramid Network (FPN) architecture on a diverse set of six unannotated datasets comprising over 6,500 CT images of the thorax and abdomen.

    Lastly, it compares the performance of the pre-trained model with baseline models on 22 segmentation tasks across seven CT datasets, employing three different setups: linear probing, non-linear probing, and fine-tuning. The results demonstrate that vox2vec achieves slightly superior performance to state-of-the-art models in the fine-tuning setup and significantly outperforms them in the linear and non-linear probing setups.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper’s writing is effective in communicating the research findings, and the figures are well-designed to illustrate the main concepts. Experiments are conducted extensively.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper has several weaknesses that need to be addressed. Firstly, the novelty of the proposed method is questionable as it appears to be a modification of contrastive learning applied to voxels instead of patches. Secondly, there is a lack of sufficient comparison with other pre-training methods like MoCo or SimSiam. This omission limits the assessment of the proposed method’s performance against existing alternatives. Lastly, an ablation study is missing, which would help identify the specific components of the model that contribute to its effectiveness.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors state that they will publish the code and weights.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Regarding the comparison with other pre-training methods (such as MoCo, SimSiam), it would greatly enhance the paper’s contribution and validity to include a comprehensive comparison. This would provide insights into the strengths and weaknesses of the proposed method compared to existing alternatives.

    Conducting an ablation study would be highly valuable for understanding the individual contributions of different components of the model. By systematically analyzing and reporting the results of removing or modifying specific parts of the model, the paper can demonstrate which elements are critical for achieving the reported performance.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please refer to weaknesses.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The reviewers recognized the strengths of the paper while criticizing its limited novelty, lack of evaluation in classification and detection, and insufficiency in comparisons and ablation studies. I suggest that you take the rebuttal opportunity to win over all reviewers by carefully addressing each of the weaknesses indicated by all three reviewers.




Author Feedback

We thank the reviewers for their thoughtful and generally positive feedback, with R1-3 ranking our work as top1 in their review pool. We are encouraged that R2, R3 find our method effective and R1, R4 view our experiments as solid and extensive. We appreciate that R1, R2 recognize the novelty of our evaluation approach in the context of medical image segmentation. We are pleased that R1-4 agree that our method outperforms SOTA self-supervised learning (SSL) models in medical images, specifically in linear/non-linear probing.

Below we address the reviewers’ concerns.

[Limited novelty] R1: “The paper is of good quality, despite the idea is not novel, I think it’s worth being accepted as it has demonstrated new results on linear and non-linear probing.” R4: “The novelty of the proposed method is questionable as it appears to be a modification of contrastive learning applied to voxels instead of patches.”

The novelty of our work is two-fold:

  • We propose a novel voxel-level SSL method that outperforms previous SOTA models by a large margin (see Tab 1, Fig 2). R2 and R3 agree with that.
  • As R1 and R2 recognize, we are the first to evaluate SSL models for medical image segmentation in linear/non-linear probing setups, a standard evaluation protocol in the SSL community [7,8,22]. As a result, this is the first work in medical imaging that significantly reduces the performance gap between end-to-end supervised segmentation and segmentation via frozen SSL model plus shallow supervised head.

Regarding the novelty of our method, adapting contrastive learning for producing voxels’ representations is not trivial. As we argue in Sec 3.2, contrastive learning requires representations to be sufficiently high-dimensional. Storing high-dimensional voxels’ representations in a plain feature map of high resolution is infeasible. Therefore in previous works authors apply contrastive learning at patch-level [6,24] and combine it with generative SSL [1,25]. As we state in Sec 1, par 6, the novelty of our method is that it produces high-dimensional voxels’ representations in the form of a feature pyramid. Tab 1 shows that this component is crucial for good model performance in linear/non-linear probing, unlike the generative component employed by [1,25].
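
The claim that a plain high-resolution feature map is infeasible while a pyramid is not can be checked with back-of-the-envelope arithmetic. The sketch below is an illustration under assumed settings, not figures taken from the paper: the patch side of 128 voxels, the per-level channel widths, the stride of 2 between levels, and float32 storage are all hypothetical.

```python
def plain_map_bytes(patch_side, channels, bytes_per_value=4):
    """Memory for one full-resolution feature map with `channels` values per voxel."""
    return patch_side ** 3 * channels * bytes_per_value

def pyramid_bytes(patch_side, level_channels, bytes_per_value=4):
    """Memory for a pyramid whose level l has stride 2**l and level_channels[l] channels."""
    total = 0
    for level, channels in enumerate(level_channels):
        side = patch_side // (2 ** level)
        total += side ** 3 * channels * bytes_per_value
    return total

# Hypothetical configuration, chosen only for illustration.
PATCH_SIDE = 128
LEVEL_CHANNELS = (16, 32, 64, 128, 256, 512)

# Same total representation size per voxel in both cases.
plain = plain_map_bytes(PATCH_SIDE, sum(LEVEL_CHANNELS))
pyramid = pyramid_bytes(PATCH_SIDE, LEVEL_CHANNELS)

print(f"plain map: {plain / 2**30:.2f} GiB")
print(f"pyramid:   {pyramid / 2**20:.2f} MiB")
print(f"ratio:     {plain / pyramid:.1f}x")
```

Under these assumed settings a single patch needs several gibibytes as a plain map but under 200 MiB as a pyramid, before counting gradients or batch size. Most of the channels sit at coarse levels with few cells, which is where the saving comes from.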

[Ablation study] R4: “An ablation study is missing, which would help identify the specific components of the model that contribute to its effectiveness.”

We ablate the main component of our method: storing voxels’ representations in a feature pyramid, see Tab 2, vox2vec-FPN and vox2vec-UNet. As R3 noticed, our method is simple, yet effective, and there are no other components to ablate.

[Comparison with other methods] R4: “There is a lack of sufficient comparison with other pre-training methods like MoCo or SimSiam.”

We compare our method with SwinUNETR [25] and TransVW [1], current and previous SOTA SSL models in medical imaging. In addition, the weights of these models are officially published, unlike models from [6,24,29]. To the best of our knowledge, no works have adapted MoCo and SimSiam for voxel-level SSL in medical images.

[Other downstream tasks] In the commentary sections, R2 and R3 suggest testing our model on additional downstream tasks like object or landmark detection, classification, and few-shot learning.

Following [25] we evaluate our method on semantic segmentation, the most widespread task in 3D medical imaging. R1, R4 consider our experiments extensive, as they cover 22 segmentation tasks of organs and tumors of different sizes. Our method, producing representations for individual voxels, is not intended for classification which requires image-level representations.

[Technical details] R2 and R3 asked for more information on models’ FLOPs and pre-training details. We pre-trained our vox2vec model on a single A100-40Gb GPU for 3 days. Its complexity is 115 GFLOPs compared to 391 GFLOPs of SwinUNETR [25]. We will include these details in the camera-ready version.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have addressed most of the concerns.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Overall, the reviewers recognized the strengths of the paper (interesting method, strong results, clear statement, and the release of the code and pre-trained model on 6000+ scans).



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal fails to adequately address the main concerns about limited novelty raised by the reviewers. SSL models for segmentation with linear/non-linear probing configurations are not new in the community. Moreover, due to a lack of comprehensive experimentation, it is difficult to assess the superiority of the proposed voxel-wise SSL over patch-level SSL. Lastly, the results on the BTCV dataset aren’t compelling, as there are insufficient comparisons on the online test set. Given these issues, the meta-reviewer recommends rejecting the paper.


