Paper Info

Authors

Go-Eun Lee, Seon Ho Kim, Jungchan Cho, Sang Tae Choi, Sang-Il Choi

Abstract

We propose a novel text-guided cross-position attention module that aims to apply the multi-modality of text and image to position attention in medical image segmentation. To match the dimension of the text feature to that of the image feature map, we multiply the text features by learnable parameters and combine the multi-modal semantics via cross-attention. This allows a model to learn the dependency between various characteristics of text and image. Our proposed model demonstrates superior performance compared to other medical models that use image-only data or image-text data. Furthermore, we use our module as a region of interest (RoI) generator to classify inflammation of the sacroiliac joints. The RoIs obtained from the model contribute to improving the performance of classification models.
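The abstract describes two steps: lifting a text feature to the shape of the image feature map with learnable parameters, then fusing the two through cross-attention over spatial positions. The NumPy sketch below illustrates one way this could look. It assumes a DANet-style position attention module, a single pooled text vector (for example a CLIP text embedding), queries taken from the text-derived map, and keys and values taken from the image. The parameter names (`W_text`, `W_q`, `W_k`, `W_v`, `gamma`) and the query/key assignment are hypothetical, and the paper's actual design may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along one axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def text_guided_cross_position_attention(img_feat, text_feat, params):
    """img_feat: (C, H, W) image feature map; text_feat: (d_t,) text embedding.

    Hypothetical sketch: returns the fused (C, H, W) map and the (N, N) attention.
    """
    C, H, W = img_feat.shape
    N = H * W
    X = img_feat.reshape(C, N)  # one column per spatial position
    # Assumption: a learnable matrix lifts the text vector to a (C, N) map
    # so that it has the same dimensions as the image feature map.
    T = (params["W_text"] @ text_feat).reshape(C, N)
    Q = params["W_q"] @ T  # (C_r, N) queries from the text-derived map
    K = params["W_k"] @ X  # (C_r, N) keys from the image
    V = params["W_v"] @ X  # (C, N) values from the image
    # S[j, i]: how much position i contributes to output position j.
    S = softmax(Q.T @ K, axis=-1)
    # Weighted sum of values plus a residual, as in DANet's position attention.
    out = params["gamma"] * (V @ S.T) + X
    return out.reshape(C, H, W), S

# Usage with random weights (shapes only; no trained model is implied).
rng = np.random.default_rng(0)
C, H, W, d_t, C_r = 8, 4, 4, 16, 2
params = {
    "W_text": rng.normal(size=(C * H * W, d_t)) * 0.1,
    "W_q": rng.normal(size=(C_r, C)),
    "W_k": rng.normal(size=(C_r, C)),
    "W_v": rng.normal(size=(C, C)),
    "gamma": 0.5,
}
img = rng.normal(size=(C, H, W))
txt = rng.normal(size=d_t)
out, attn = text_guided_cross_position_attention(img, txt, params)
```

Lifting the text vector straight to a full `(C, H*W)` map keeps the sketch short, but its parameter count grows with the feature-map size. A practical module would likely project into a smaller space first.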

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43904-9_52

SharedIt: https://rdcu.be/dnwHx

Link to the code repository

N/A

Link to the dataset(s)

https://www.kaggle.com/datasets/aysendegerli/qatacov19-dataset

https://monuseg.grand-challenge.org/Data/

Reviews

Review #1

Please describe the contribution of the paper

The authors propose CPAMTG to effectively combine text information with image feature maps, aiming to improve segmentation performance in various medical imaging domains.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper introduces a new text-guided cross-position attention module (CPAMTG) that efficiently combines text information with image feature maps.
- The authors conducted comprehensive experiments on three medical datasets (MoNuSeg, QaTa-COV19, and SIJ dataset) to evaluate the performance of their proposed method.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- A key weakness of the paper is that the performance improvement achieved by the proposed method, CPAMTG, is relatively small compared to other methods. The difference in performance between the proposed method and other techniques is not significant enough to convincingly demonstrate the superiority of CPAMTG.
- The authors have not explicitly provided examples of the text annotations added to the slices in the sacroiliac joint (SIJ) dataset.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors are planning to provide the code via GitHub.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
- Describe the type of information included in the text annotations and provide examples.
- It is essential to report measures of statistical significance when comparing the performance of your proposed method to existing techniques.
- Including measures of spread, such as standard deviation or interquartile range, in the tables would provide additional information on the variability of the performance metrics.
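Reviewer #1's request for measures of spread could be met by reporting per-case scores as mean ± standard deviation and as median with interquartile range. The minimal NumPy sketch below assumes that per-image (or per-patient) scores are available. The function name and the example values are hypothetical and do not come from the paper.

```python
import numpy as np

def summarize_scores(scores):
    """Mean, sample std, median and IQR for a 1-D sequence of per-case scores."""
    s = np.asarray(scores, dtype=float)
    q1, q3 = np.percentile(s, [25, 75])  # linear interpolation (NumPy default)
    return {
        "mean": float(s.mean()),
        "std": float(s.std(ddof=1)),  # sample standard deviation (needs >= 2 cases)
        "median": float(np.median(s)),
        "iqr": float(q3 - q1),
    }

# Made-up per-case Dice scores, for illustration only.
example = summarize_scores([0.80, 0.82, 0.85, 0.90])
```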
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The technical novelty of the work presented in the paper appears to be quite limited, and the improvements in segmentation performance are minimal at best. Although the proposed text-guided cross-position attention module (CPAMTG) attempts to combine textual and image data to enhance medical image segmentation, the results seem insufficient to justify its adoption over existing techniques.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

This paper proposes a new text-guided cross-attention module for assisting medical image segmentation that is more effective at learning position information than existing text-image methods.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The motivation behind the study is compelling since knowing how to combine text and image information is critical for effective medical image segmentation. The paper selects “position information” as a means of bridging the gap between text and image modalities, which is a smart choice. The CPAM module’s design focuses on cross-attention to the position attention module and is well-executed. The extensive experiments conducted on multiple datasets (organs) support the paper’s conclusions, showing significant improvements with the CPAM module.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The paper lacks qualitative experiments, which could have shown which words in the text and which regions in the image were highlighted in deep learning predictions and whether they were well aligned.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Reproducibility is good.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

See weaknesses.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The novelty of this paper (e.g., choosing position information to bridge the gap between text and image modalities).
Reviewer confidence

Somewhat confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

5
[Post rebuttal] Please justify your decision

I appreciate the paper’s motivation and the thoughtful model design. However, I concur with Reviewer #3’s viewpoint regarding the rushed nature of the writing.

Review #3

Please describe the contribution of the paper

The authors propose a new text-guided cross-position attention module designed to apply the multi-modality of text and images to position attention in medical image segmentation.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The RoIs obtained from the model help to improve the performance of the classification model.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The formatting of this paper has so many obvious errors that I suspect the authors were in a hurry and did not check the whole paper before submitting. For example, the first reference is “ **: ****. **** (), **** (*)”.
Please rate the clarity and organization of this paper

Poor
Please comment on the reproducibility of the paper. Note that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Should be able to reproduce
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

This paper needs to be improved as a whole.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

2
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper is undoubtedly going to be rejected.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

5
[Post rebuttal] Please justify your decision

The writing of the paper needs to be polished before publication.

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper proposes a novel text-guided cross-position attention module to improve medical image segmentation performance. The authors conducted extensive experiments on three related datasets and showed that the method achieves very good results. However, there are also some issues. The paper was probably written in a hurry, and the writing and formatting are poor. The performance gain brought by the method is relatively small.

Author Feedback

We thank the reviewers for their time and effort in providing feedback. They have recognized our model’s novelty (R1, R2) and effectiveness (R1, R2) in learning multi-modal information. We’re also delighted that our proposed module for addressing the text-image gap was highlighted as a “smart choice” (R2). Furthermore, the comprehensive experimental results on multiple datasets (R1, R2) and improvements in classification performance (R3) were acknowledged.

[R3] Typos, strange formatting, and improper citation in the reference section. We will thoroughly refine and polish the entire paper in preparation for the camera-ready version. The reference ([1]) the reviewer mentioned pertains to our previous work. We have strived to strictly abide by the MICCAI guidelines, specifically Subsection 3.5, “Citing your own previous work,” in the document provided by MICCAI [A], and we marked the citation with asterisks as directed. [A] K. Wong, Preparing Manuscripts for MICCAI: Avoiding Desk Rejects, The MICCAI Society, 2022.

[R1, R2] Explanation and effect of text annotation. The text annotations added to the dataset consist of six types of sentences that provide information on the number, size, and location of the ilium, e.g., “There are two long iliums on either side.” and “There is a short ilium on right side.” An example of a text annotation is shown as text (T) in Fig. 1. In the camera-ready version, we will add this description of the text annotations to the main text.

[R1, R2] Comparison of segmentation performance. The main contribution of our approach lies in its ability to effectively segment target objects within medical images. Precise segmentation is particularly challenging when relying solely on pixel texture information, so our strategy uses textual information describing the target objects as a guiding factor. In Table 1, the X-ray (QaTa-COV19) and RGB (MoNuSeg) images are relatively straightforward in terms of texture, and target and non-target objects are easier to differentiate in them than in the MR images (SIJ). Our approach outperforms other methods especially on images that contain target objects with more complex shapes (SIJ). As shown in Table 1, across all datasets, our model and the LViT model, which both incorporate text information, perform better than the methods that do not use text information. Although our model performs similarly to LViT on X-ray and RGB images, it significantly outperforms LViT on the more challenging MR images.
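The comparison above rests on the overlap scores in Table 1. This page does not name those metrics, so the sketch below assumes Dice and IoU (Jaccard), which are common choices for this kind of comparison. It computes both with NumPy for one pair of binary masks. The function name and the `eps` smoothing term are illustrative only.

```python
import numpy as np

def dice_and_iou(pred, target, eps=1e-7):
    """Dice coefficient and IoU (Jaccard) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    p_sum, t_sum = pred.sum(), target.sum()
    union = p_sum + t_sum - inter
    # eps keeps both scores defined (and equal to 1) when both masks are empty.
    dice = (2 * inter + eps) / (p_sum + t_sum + eps)
    iou = (inter + eps) / (union + eps)
    return float(dice), float(iou)

# Toy 2x3 masks, for illustration only.
pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
dice, iou = dice_and_iou(pred, target)
```

For a single mask pair the two scores are linked by Dice = 2·IoU / (1 + IoU). Averaged over a dataset, however, they can rank methods differently, so reporting both is common.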

[R2] Qualitative experiment showing the relation between words in the text and regions in the image. We acknowledge that analyzing the correlation between the image and word spaces was challenging because our model uses CLIP's lossy (non-invertible) text encoder. However, the results in Table 1 and Fig. 3 suggest that incorporating text information enhances regional attention within the image feature map. In future work, we aim to analyze the spatial effect of each word using invertible text encoders to further refine our understanding of this interaction.

[R1] Report measures of statistical significance. We observed considerable differences in accuracy (0.8473 vs. 0.7748, p<0.001) and specificity (0.8503 vs. 0.7424, p<0.003) when comparing the automatically determined RoI patch with the manually set RoI patch. These significant differences clearly indicate the superior performance of the automatically configured RoI patch. Furthermore, the automatically set RoI patch consistently matched or exceeded the manual RoI patch in recall, precision, NPV, and F1 score. However, because our analysis focuses on diagnosing the presence or absence of a given condition, it is difficult to incorporate statistical measures such as standard deviation or interquartile range into it.
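The rebuttal reports accuracy, specificity, recall, precision, NPV, and F1, along with two p-values, but it does not say which test produced them. The stdlib-only sketch below computes those metrics from confusion-matrix counts. It then runs an exact McNemar test, one common way to compare two classifiers evaluated on the same cases. McNemar is our swap-in for illustration and is not necessarily the test the authors used. The function names, the choice of "inflamed" as the positive class, and all counts are hypothetical.

```python
from math import comb

def _ratio(num, den):
    # Return NaN instead of raising when a metric is undefined (zero denominator).
    return num / den if den else float("nan")

def binary_metrics(tp, fp, tn, fn):
    """Classification metrics from confusion-matrix counts (assumed positive = inflamed)."""
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": _ratio(tp, tp + fp),        # positive predictive value
        "recall": _ratio(tp, tp + fn),           # sensitivity
        "specificity": _ratio(tn, tn + fp),
        "npv": _ratio(tn, tn + fn),              # negative predictive value
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),  # equals 2PR / (P + R)
    }

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value for two classifiers on the same cases.

    b: cases only classifier A gets right; c: cases only classifier B gets right.
    Under the null hypothesis, b ~ Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Made-up counts, for illustration only (not the paper's data).
m = binary_metrics(tp=40, fp=10, tn=45, fn=5)
p = mcnemar_exact(b=2, c=12)
```

Because McNemar uses only the discordant cases (`b` and `c`), it needs the per-case predictions of both RoI settings, not just their aggregate scores.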

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The rebuttal has mostly addressed my concerns. Since all reviewers have recommended acceptance, I also vote to accept.

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors address my concerns. I agree with Reviewers 2 and 3 that, despite the major formatting issues, the paper is workable overall.

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The key weakness raised is the rushed writing. While that is not good, it is not a reason to reject the paper. I would be happy to accept if the authors thoroughly revise the writing, which they promise to do.


Text-Guided Cross-Position Attention for Segmentation: Case of Medical Image