
Authors

Yejia Zhang, Pengfei Gu, Nishchal Sapkota, Danny Z. Chen

Abstract

Modern medical image segmentation methods primarily use discrete representations in the form of rasterized masks to learn features and generate predictions. Although effective, this paradigm is spatially inflexible, scales poorly to higher-resolution images, and lacks direct understanding of object shapes. To address these limitations, some recent works utilized implicit neural representations (INRs) to learn continuous representations for segmentation. However, these methods often directly adopted components designed for 3D shape reconstruction. More importantly, these formulations were also constrained to either point-based or global contexts, lacking contextual understanding or local fine-grained details, respectively (both critical for accurate segmentation). To remedy this, we propose a novel approach, SwIPE (Segmentation with Implicit Patch Embeddings), that leverages the advantages of INRs and predicts shapes at the patch level, rather than at the point level or image level, to enable both accurate local boundary delineation and global shape coherence. Extensive evaluations on two tasks (2D polyp segmentation and 3D abdominal organ segmentation) show that SwIPE significantly improves over recent implicit approaches and outperforms state-of-the-art discrete methods with over 10x fewer parameters. Our method also demonstrates superior data efficiency and improved robustness to data shifts across image resolutions and datasets. Code is available on GitHub.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43904-9_31

SharedIt: https://rdcu.be/dnwHa

Link to the code repository

https://github.com/charzharr/miccai23-swipe-implicit-segmentation/blob/master/README.md

Link to the dataset(s)

https://www.synapse.org/#!Synapse:syn3193805/wiki/89480

https://amos22.grand-challenge.org/

https://paperswithcode.com/dataset/kvasir-sessile-dataset

https://www.kaggle.com/datasets/balraj98/cvcclinicdb


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a novel approach to medical image segmentation called SwIPE (Segmentation with Implicit Patch Embeddings). SwIPE leverages the advantages of implicit neural representations (INRs) and predicts shapes at the patch level to enable both accurate local boundary delineation and global shape coherence. The method uses both patch and image embeddings to create continuous representations for segmentation, allowing for greater flexibility and accuracy in object shape recognition. The paper demonstrates that SwIPE significantly improves over recent implicit approaches and outperforms state-of-the-art discrete methods with over 10x fewer parameters.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Novel approach: The paper proposes a novel approach to medical image segmentation called SwIPE.

    2. Improved accuracy and efficiency: The paper demonstrates that SwIPE significantly improves over recent implicit approaches and outperforms state-of-the-art discrete methods.

    3. Clear presentation: The paper presents its ideas and results in a clear and concise manner.

    4. Sufficient evaluation: The paper provides extensive evaluations on two tasks to demonstrate the effectiveness of SwIPE compared to other methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Although the paper is novel, the experiments are incomplete, for example for 2D polyp segmentation: comparisons against previous state-of-the-art methods [1][2][3] are missing.

    1. For the 3D organ segmentation, why are ONLY 30 CT scans from AMOS used to evaluate robustness instead of the entire AMOS dataset?

    [1] Shallow Attention Network for Polyp Segmentation.
    [2] Adaptive Context Selection for Polyp Segmentation.
    [3] BoxPolyp: Boost Generalized Polyp Segmentation Using Extra Coarse Bounding Box Annotations.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Based on the information provided by the authors, the paper has been submitted with a completed reproducibility checklist. The authors have listed implementation details that can be followed easily.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Please refer to part 5 for more details. The experiments should be strengthened, and the writing and figure quality polished.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    N/A

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors present an implicit image segmentation method that is both data efficient and robust to input data shift. With SwIPE (Segmentation with Implicit Patch Embeddings), they combine image- and patch-based features to implicitly encode shape representations. The method is tested on two datasets in the field of medical image segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method uses fewer parameters than the standard explicit methods, while achieving comparable performance and allowing a much wider range of different sized images to be segmented without retraining or large performance losses.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The major weakness of the approach is its complexity. Although Fig. 1 helps a lot in understanding the architecture, the method was hard to follow from the text alone.

    Not being too familiar with implicit methods, I found it very unclear while reading Section 2 how the Dice loss is formulated in this case. A reference to Section 2.3, where the details follow, would have helped a lot in understanding the method.
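For readers with the same question: one way a Dice loss can be applied to an implicit model is to compute a soft Dice over the predictions at a set of sampled query points instead of over a full rasterized mask. The sketch below illustrates only that idea, under assumptions that are not taken from the paper (binary target, one logit per sampled point, a smoothing term `eps`, and the hypothetical name `soft_dice_loss`); the paper's own formulation is described in its Section 2.3, per the comment above.

```python
import numpy as np

def soft_dice_loss(logits, labels, eps=1e-6):
    """Soft Dice loss over a set of sampled query points (illustrative sketch).

    logits: (N,) raw decoder outputs, one per sampled point (assumption).
    labels: (N,) binary ground-truth occupancy at the same points.
    """
    probs = 1.0 / (1.0 + np.exp(-logits))     # sigmoid -> foreground probability
    intersection = np.sum(probs * labels)     # soft overlap with the target
    denom = np.sum(probs) + np.sum(labels)    # soft |prediction| + |target|
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)
```

Because only the sampled points enter the sums, the same loss applies whether the points come from a coarse grid, a fine grid, or random locations, which is the main difference from a mask-based Dice.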

    In Table 1, the authors present the results of 6 runs. This is quite confusing, as Section 3.1 states that (static?) train/val/test splits of 60:20:20 are used. What are these runs? Was cross-validation performed?

    Comparing Table 1 with Table 2 (middle), it seems that the method achieves better results on the second, unseen dataset (CT AMOS) than on the original one (CT BCV). Is that really the case?

    In Section 3.3, it took me a while to understand what the authors mean by, e.g., “(rows 1 to 6)” until I realized these are row numbers of Table 2. It would be clearer if the row indices directly followed the reference to Table 2.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Code and data are available; the training routine and hyperparameters are well described.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Interesting work with good results, but with some flaws in presenting the method (for details, please see the weaknesses).

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite the complexity of the method itself and the minor weaknesses that make the work hard to follow, the method and especially the achieved results are impressive, which is why I vote to accept the work.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper introduces a segmentation method (SwIPE) that uses a UNet encoding structure to encode features at multiple levels. It then concatenates these patch features with position and a global feature and uses them as input to an MLP (implicit neural representation) to estimate the segmentation. The authors demonstrate improved performance over a similar implicit neural representation method (IOSNet). They also run an ablation study on multiple model components.
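To make the data flow in this summary concrete, here is a minimal NumPy sketch of an implicit point decoder that concatenates a patch embedding, a global (image) embedding, and patch-local point coordinates, then maps them through a small MLP to one occupancy logit per point. The layer sizes, the two-layer ReLU MLP, the random weights, and the function names (`init_mlp`, `mlp_forward`, `decode_points`) are hypothetical illustrations, not SwIPE's actual architecture, which per the reviews also includes MEA and a separate global decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(in_dim, hidden_dim, out_dim):
    """Random weights for a hypothetical 2-layer MLP (illustration only)."""
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden_dim)), "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, (hidden_dim, out_dim)), "b2": np.zeros(out_dim),
    }

def mlp_forward(params, x):
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])  # ReLU hidden layer
    return h @ params["W2"] + params["b2"]

def decode_points(patch_emb, global_emb, local_coords, params):
    """Predict one occupancy logit per query point.

    patch_emb:    (C_p,) embedding of the patch containing the points.
    global_emb:   (C_g,) embedding of the whole image.
    local_coords: (N, d) point coordinates relative to the patch, in [-1, 1]^d.
    """
    n = local_coords.shape[0]
    # Broadcast the shared embeddings to every query point, then concatenate.
    feats = np.concatenate(
        [np.tile(patch_emb, (n, 1)), np.tile(global_emb, (n, 1)), local_coords],
        axis=1,
    )
    return mlp_forward(params, feats)[:, 0]  # (N,) logits

# Usage: 100 query points inside one 2D patch.
C_p, C_g, d = 32, 16, 2
params = init_mlp(C_p + C_g + d, 64, 1)
logits = decode_points(rng.normal(size=C_p), rng.normal(size=C_g),
                       rng.uniform(-1.0, 1.0, size=(100, d)), params)
```

Since the point coordinates are continuous inputs, the same decoder can be queried at any resolution, which is the property behind the resolution-robustness claims discussed in these reviews.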

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper outperforms prior works by adding some tweaks to implicit feature networks. Additionally, they provide in-depth analysis of FLOPs, efficiency, and data shifts.

    2. The most novel part of their work seems to be a useful addition to IFA (implicit feature alignment) [a]: a novel Multi-Stage Embedding Attention (MEA) in place of concatenation.

    I would recommend adding IFA [a] to the references.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The main weakness is the lack of an apparent motivation or reasoning for using patches, as convolutions are applied over the full image anyway.

    2. Why have a separate architecture for a global vector instead of just treating it as the final level of downscaling of a patch (as it is essentially that)?

    3. Comparing to IOSNet with more downsampling layers (e.g., 5 rather than its 3) would seem fairer, as SwIPE has 4 plus a global pooling layer.

    4. A comparison to the IFA [a] architecture could be useful, as it seems to address the same task. If there is a reason not to compare, please explain it.

    5. The patches are extracted after running a CNN over the whole image and are then downscaled to match the output dimension. The patches are then upsampled to match a final shape for the INR (implicit neural representation). This does not make sense and likely adds artifacts. Why not sample the location directly from each patch level?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method should be easy to reproduce from the details in the paper, and the code will be released on GitHub.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. Figure 1: “RBF” should read “RFB”.

    2. In the MEA ablation experiments (Table 3), where are 3, 4, and 5 applied? Is it the addition of the residual before MLP1, replacing the concatenation on page 5 before MLP2, or something else?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper has an extremely interesting component (MEA), which makes it promising. This component on its own likely has great performance benefits for implicit models. The drawbacks are that the overall model architecture seems more complicated than necessary, and the motivation for using patches is not clearly stated. This could possibly be resolved by detailing these points better in the paper. Additionally, the work of IFA is not mentioned or compared.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The paper gets results, but I am worried about the framing of patches; the novelty focus should lie in MEA and SPO. My responses to the rebuttal follow.

    “> Patches: …” The CNN encoder backbone E_b outputs feature maps at multiple scales, which the encoder neck E_n convolves/resizes to identical shapes (H/32 × W/32 × D). Each ‘pixel’ here is treated as a patch with no cropping, so I do not find describing these as patches useful; if anything, it is confusing. The only patch-relevant data is the location, p_i^P, passed into the patch decoder. Everything else is convolutional. SPO as regularization is the primary motivation here. I do not see any operation in this network that requires gathering patches, since they are really downsampled pixels. Table 3 (left) does not necessarily support patches; it supports multiscale features, as that is what z^p really is. IFA does the same thing, just calling them ‘relative coordinates’ instead of ‘patch coordinates’. R2 mentions complexity as a weakness, and I have a feeling the patch justification is overly complex.

    “> Separate Global Decoder: …” Fair, but what I am trying to ask is why have a separate global branch when you could instead have more downscaled feature maps.

    “> Resizing INR Features: …” Yes, but you are losing data by downsampling the features to a single pixel. It would likely improve performance to sample directly from the relevant location.

    “[R3] (IFA) & MEA: …” Your patch seems to be a pixel by the time you are using it. Yes, it represents a region, but so does any pixel in a CNN with downsampling. The paper does not mention cropping or gathering patches. I think the main reason your method outperforms IFA is the use of MEA, not the patches. MEA should be emphasized as the primary contribution, as the patch framing seems to distract. For example, how does IFA + MEA compare to yours (patches + MEA)? This would be the relevant ablation experiment.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This submission proposes a continuous model for image segmentation using recent implicit patch-based neural representations. The originality resides in the novel patch-based embeddings that move implicit neural representations towards predicting shapes at the patch level with global consistency. The evaluation is on two tasks: 2D polyp segmentation and 3D abdominal organ segmentation. The reviews are mixed, ranging from weak rejection to acceptance. The authors are therefore invited to address the following key concerns in a rebuttal:

    • Relation to implicit feature alignment - The proposed multi-stage embedding attention (MEA) may be related to implicit feature alignment (IFA) (R3). The authors could further clarify the methodological motivation for using patches and specify the relation between (and/or complementarity of) the proposed MEA and IFA.

    • Clarification of Evaluation - The description and interpretation of the evaluation may require clarification to avoid misunderstanding about which datasets were used, and how, in each experiment (R1, R2).




Author Feedback

We are grateful to the reviewers & AC for their constructive feedback and for recognizing the novelty & performance of our work. We also appreciate the suggestions for improving clarity (esp. R2) and will incorporate them into our revisions.

[R3] Motivation for Patches & Other Components

Patches: Recent segmentation (seg.) methods that utilize Implicit Neural Representations (INRs) adopt a shape feature vector that represents the shape information of either the entire image or a single point. The former may overlook boundary details, while the latter often struggles with globally coherent shapes (Sec. 1, paragraphs 3 & 4). Our adoption of patches & additional regularization mechanisms strikes a balance and strives toward both objectives. These patch-based features are conceptually distinct from convolutions, which involve an implicit bias for encoder feature extraction. The spatial granularity of the shape features (image vs. patch vs. point) used as input to INR decoding is a separate & crucial design decision. Here, INR with patches outperforms the other granularities (see Sec. 3.2); Tab. 3 (left) also supports this, as Dice notably increases after adding patch-based features.

Separate Global Decoder: Global shape coherence is the primary challenge when using patch-based shape features. A separate global MLP was one of many regularization strategies (see the ablations in Tab. 3 (left) for their effectiveness). In addition, the light global MLP added negligible overhead while facilitating shape understanding of whole structures.

Resizing INR Features: The final shape feature vector for a patch is obtained from a single grid position in the output of the encoder neck (E_n). Resizing it allows flexibility in determining the patch size for different tasks (in our evaluations we did not resize, since the patch coverage size was appropriate for our datasets). The resizing uses linear interpolation, and we observed no artifacts (similar to interpolating the shape feature vector in IOSNet).
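The rebuttal states that a patch's shape feature is read from a single grid position of the encoder-neck output. As a minimal sketch, assuming a 2D uniform grid in which each cell is one patch and patch-local coordinates normalized to [-1, 1] around the cell center (both assumptions, as is the hypothetical name `patch_lookup`), mapping continuous query points to their patch feature and local coordinate could look like this:

```python
import numpy as np

def patch_lookup(feature_grid, points):
    """Map query points to their patch feature and patch-local coordinates.

    feature_grid: (h, w, C) encoder-neck output; each grid cell is one patch (assumption).
    points:       (N, 2) query points in normalized image coords [0, 1]^2, as (y, x).
    Returns (N, C) patch features, (N, 2) local coords in [-1, 1], (N, 2) cell indices.
    """
    h, w, _ = feature_grid.shape
    grid_size = np.array([h, w])
    scaled = points * grid_size               # continuous position in grid units
    idx = np.floor(scaled).astype(int)
    idx = np.minimum(idx, grid_size - 1)      # points on the far border go to the last cell
    frac = scaled - idx                       # position inside the cell, in [0, 1]
    local = 2.0 * frac - 1.0                  # rescale to [-1, 1] around the cell center
    feats = feature_grid[idx[:, 0], idx[:, 1]]
    return feats, local, idx

# Usage on a toy 4x4 grid with 3 channels.
grid = np.arange(48, dtype=float).reshape(4, 4, 3)
pts = np.array([[0.0, 0.0], [0.625, 0.625], [1.0, 1.0]])
feats, local, idx = patch_lookup(grid, pts)
# idx -> [[0, 0], [2, 2], [3, 3]]; local -> [[-1, -1], [0, 0], [1, 1]]
```

All points that fall in the same cell share one feature vector and differ only in their local coordinate, which is the sense in which the feature describes a region rather than a single point.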

[R3] Implicit Feature Alignment (IFA) & MEA: IFA, akin to IOSNet, utilizes concatenated point-wise features from multiple scales, but with nearest-neighbor interpolation. In contrast, SwIPE's shape feature vector represents a patch region instead of a point. If we were to define our “patch” as a pixel, SwIPE would sample shape features in a way comparable to IFA. We initially implemented IFA but opted to compare against IOSNet instead since a) IFA performed on par with IOSNet for polyp seg. but worse on CT seg. (unsurprising, since IFA was proposed for 2D natural images), and b) we wanted to focus on medical imaging methods. We will add IFA results and a supplementary study on MEA's potential benefits over concatenation in IFA & IOSNet.
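For contrast, here is a minimal sketch of point-wise multi-scale sampling as described above for IFA/IOSNet: for each query point, take the nearest feature vector at every scale and concatenate them. The pixel-center convention and the hypothetical function name `nearest_multiscale_features` are assumptions for illustration; this is not the IFA reference implementation. With a single scale, the index computation matches a per-cell patch lookup, mirroring the rebuttal's remark that defining a “patch” as a pixel would make the two sampling schemes comparable.

```python
import numpy as np

def nearest_multiscale_features(feature_maps, points):
    """Point-wise multi-scale features via nearest-neighbor lookup (sketch).

    feature_maps: list of (h_k, w_k, C_k) arrays at different resolutions.
    points:       (N, 2) query points in normalized coords [0, 1]^2, as (y, x).
    Returns (N, sum_k C_k): the nearest feature vector from each scale, concatenated.
    """
    per_scale = []
    for fmap in feature_maps:
        h, w, _ = fmap.shape
        size = np.array([h, w])
        # Nearest pixel, assuming pixel i covers [i/h, (i+1)/h); clip the far border.
        idx = np.minimum(np.floor(points * size).astype(int), size - 1)
        per_scale.append(fmap[idx[:, 0], idx[:, 1]])
    return np.concatenate(per_scale, axis=1)

# Usage: a fine 8x8 map (2 channels) and a coarse 2x2 map (1 channel).
fine = np.ones((8, 8, 2))
coarse = np.arange(4, dtype=float).reshape(2, 2, 1)
pts = np.array([[0.1, 0.9], [0.5, 0.5]])
out = nearest_multiscale_features([fine, coarse], pts)
```

Each output row carries one value per scale at a single location, whereas a patch embedding is a single vector shared by every point in its region.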

[R1,R3] More Polyp Seg. Baselines (Sec. 3.2) – Results for SANet & IFA will be added to Tab. 1. We report SANet since it outperforms ACSNet (esp. on EndoScene, which has smaller polyps, as in our dataset), and we omit BoxPolyp since there are no weakly supervised labels. The Dice scores are 83.73% for SANet & 78.55% for IFA, which further demonstrates SwIPE's advantages over leading discrete & implicit methods.

[R1,R2,R3] Evaluation Clarifications (Sec. 3)

[R1] The data robustness study analyzes AMOS liver masks with models trained on BCV. “30 CT scans” referred to BCV (mentioned in Sec. 3.1); for AMOS, we adopted the setting in [1] and used 200 CTs from the AMOS training set. [R2] Some AMOS Dice scores in Tab. 2 were higher than the BCV scores in Tab. 1 because Tab. 2 reports AMOS liver seg., and the liver is relatively easier to segment than the average of the 13 BCV organs (stated in Sec. 3.3, last paragraph). [R2] The train/val/test split was kept constant across the 6 runs, each with a different seed. [R3] The IOSNet backbone contained the same number of downsampling layers (or stages) as SwIPE, which was empirically better in both tasks than the 3 downsampling layers in the original paper.

Refs [1] Y. Zhang et al., “..Debiasing Contrastive Learning with Spatial Priors..” BIBM’22




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal has clarified the major concerns on the evaluation. A general consensus exists among the reviews on the novelty of the proposed patch-based embeddings, which further improve implicit neural representations. New results were provided, although the rebuttal should focus on clarifying the choices made at submission rather than providing post-submission experiments. The less enthusiastic review also indicates “an extremely interesting, promising component”, with cues on improving the clarification of the contribution. These could be helpful for a journal extension. The scientific merit of the paper remains valid. For these reasons, the recommendation is towards acceptance.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper presents a novel framework for medical image segmentation with the design of SwIPE (Segmentation with Implicit Patch Embeddings), which learns continuous representations of foreground shapes at the patch level. Extensive evaluation studies have been conducted on two segmentation tasks (2D polyp and 3D abdominal organ), and the experimental results demonstrate that the proposed approach outperforms other state-of-the-art methods. The overall paper quality is good, and there is some novelty in the method design. The authors' rebuttal has provided good responses and addressed the reviewers' concerns regarding clarification and motivation. The final version should be revised to include these points.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The major concerns are partially addressed in the rebuttal. As all reviewers suggested, the strengths of the current version outweigh the weaknesses. I would strongly suggest that the authors carefully read all review comments for future work.


