Authors

Yutong Xie, Lin Gu, Tatsuya Harada, Jianpeng Zhang, Yong Xia, Qi Wu

Abstract

Masked image modelling (MIM)-based pre-training shows promise in improving image representations with limited annotated data by randomly masking image patches and reconstructing them. However, random masking may not be suitable for medical images due to their unique pathology characteristics. This paper proposes Masked medical Image Modelling (MedIM), a novel approach that is, to our knowledge, the first to mask and reconstruct discriminative areas guided by radiological reports, encouraging the network to explore stronger semantic representations from medical images. We introduce two mutual comprehensive masking strategies, knowledge word-driven masking (KWM) and sentence-driven masking (SDM). KWM uses Medical Subject Headings (MeSH) words unique to radiology reports to identify discriminative cues mapped to MeSH words and guide the mask generation. SDM considers that reports usually have multiple sentences, each of which describes different findings, and therefore integrates sentence-level information to identify discriminative regions for mask generation. MedIM integrates both strategies by simultaneously restoring the images masked by KWM and SDM for a more robust and representative medical visual representation. Our extensive experiments on various downstream tasks, covering multi-label/class image classification, medical image segmentation, and medical image-text analysis, demonstrate that MedIM with report-guided masking achieves competitive performance. Our method substantially outperforms ImageNet pre-training, MIM-based pre-training, and medical image-report pre-training counterparts. Codes are available at https://github.com/YtongXie/MedIM.



Link to paper

DOI: https://doi.org/10.1007/978-3-031-43907-0_2

SharedIt: https://rdcu.be/dnv9m

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a new masking method for image representation learning. Instead of masking randomly selected regions of the image, it uses radiology reports to choose the regions to mask, selecting those that contribute more to image reconstruction.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Novel method that uses radiology reports to guide which regions in the image should be masked, in the context of image representation and masked image modeling.
    • Comprehensive evaluation using other existing methods, and 3 large datasets.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Performance improvement is largely negligible for classification and segmentation tasks
    • It is not clear whether the marginal performance improvement is statistically significant
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper seems to be reproducible given the data and code are publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • The authors are commended on developing a new method, which uses radiology reports to guide image masking for image representation. The method has potential for a wide range of applications in medical imaging.
    • The writing of the paper can be improved. Examples of unclear sentences: “While the random masking strategy is commonly used in current MIM-based works, randomly selecting a percentage of patches to mask.” and “Noted that the back regions in the generated mask will be masked.”
    • It needs some clarification when only a portion of labelled data is used. Is this self-supervised for pretraining?
    • Table headers need more clarification
    • Results are not convincing that the new method is actually better. Showing if the results are statistically significant would strengthen the paper.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method is novel, and has potential for a wide range of applications. The results however are not convincing.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    MIM has achieved great success in the medical field, but its strategy of randomly selecting a certain percentage of patches for masking may not necessarily be the most suitable for medical images. This paper proposes the MedIM model, introducing two mask strategies that only mask and reconstruct meaningful features in medical images, align the semantic correspondence between medical images and radiology reports, and reconstruct the areas masked under the guidance of the learned correspondence.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) A new model, MedIM, is proposed, which is the first work exploring the potential of radiology reports in medical image masking generation, providing a new perspective for improving the accuracy and interpretability of medical image representations. (2) Two mutually integrated masking strategies, KWM and SDM, are proposed, which can effectively identify discriminative clues at the word and sentence levels to guide the generation of masks. (3) Experiments demonstrate that the model performs well in four X-ray-based downstream tasks. (4) The text structure is rigorous, the ablation experiment design is complete, and the logic is sound.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) The introduction section does not provide a sufficient background explanation. (2) The interpretation of the experimental results is inadequate.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The supplementary materials of this paper do not provide relevant explanations; the authors should provide evidence of reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    (1) The Approach section could provide a more detailed overview of the model framework. (2) In the Introduction section, further elaboration on the application of MIM in the medical field and other mask strategies is needed. (3) In Table 2, the comparison experiment between different pre-trained models lacks an explanation of the evaluation metrics. (4) In the experimental part of Figure 3, the paper mentions “we compare it with three counterparts, No masking, Random masking, and Low-activated masking.” However, there is a lack of explanation on how “Low-activated” is defined.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper demonstrates a good overall level of innovation and presents satisfactory results. However, more explanation is needed regarding its reproducibility. I would recommend a weak acceptance.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose MedIM, a masking approach utilizing radiological reports to guide mask generation during pre-training. They introduce two effective strategies, KWM and SDM, and demonstrate the performance in medical image analysis and retrieval tasks compared to competing methods. Numerical results show that the MedIM method outperformed other existing approaches.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This is an interesting and useful direction, particularly for tasks with very limited labels and samples. Multimodal fusion of both the text information and the images will be useful for many image-based real world disease diagnostic scenarios.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It is not very clear to me how the method guarantees that KWM and SDM can correctly localise (or align) the area referred to by the text content (e.g. cardiac) in the image, as it seems this fusion and alignment was done at the token level (from Fig 1).

    The metrics used in Table 1, although briefly introduced in the main content on page 7, are not clearly indicated in the table. I am not very sure which values in Table 1 represent which metric.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Didn’t see the code or data links in the submission system.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    It might be worth considering further filtering of the keywords for specific downstream tasks, as some of the keywords should NOT yield attention regions. For example, the keyword ‘vascular’ cannot indicate a specific attention region; rather, it is more like a mesh of vessels.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method provides a novel angle and methodology for image and text fusion on specific medical imaging tasks. I believe it is of interest to some groups of readers of this conference. Further justifications (as mentioned in the weakness block) need to be provided and the relevant content needs to be corrected before publication.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The exploration of this direction is both intriguing and valuable, especially in situations where there is a scarcity of labels and samples. The fusion of text information and images through multi-modal approaches holds great promise for real-world disease diagnostic scenarios. The paper demonstrates a good level of innovation and presents satisfactory results. However, there are several areas that require improvement and clarification before publication. These include: (1) providing a more detailed overview of the model framework; (2) adding in-depth details on the training process of the proposed model, as requested by the reviewers, to improve reproducibility (training data details, model details, etc.); and (3) addressing comments regarding the statistical robustness of the results.




Author Feedback

We appreciate the reviewers' great efforts and constructive comments. We are glad that all reviewers appreciate MedIM’s novelty and potential applications. For example, R1 says, “The proposed method is novel, and has potential for a wide range of applications”. R2 claims it is “the first work …”. R3 took it as “a novel angle and methodology …”. R2 and R3 are also satisfied with our performance. In what follows, we address the reviewers’ concerns and will incorporate improvements in our final version.

Q1-Detailed overview (PC, R2) MedIM is a pre-training approach that leverages radiological reports for mask generation from a novel angle (R3). It aligns semantic correspondences between images and reports, and reconstructs masked areas based on this learned correspondence. It uses two unique masking strategies. Knowledge Word-Driven Masking (KWM) focuses on MeSH word tokens during mask generation, matching MeSH words in reports and extracting their representations. High-activated masking is then used to remove the discovered MeSH attention regions. Sentence-Driven Masking (SDM) integrates sentence-level information from reports to identify discriminative regions for mask generation. For each report, a sentence is randomly selected, its representations are extracted, and high-activated masking is applied. MedIM combines both strategies, designing a decoder to restore images masked by KWM and SDM, enhancing medical visual representations. After pre-training, MedIM’s weight parameters can be transferred to downstream tasks.
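
The high-activated masking step described in Q1 can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the dot-product similarity, and the 50% mask ratio are all assumptions chosen for illustration. It scores each image patch token against one text representation (a MeSH word for KWM, or a sampled sentence for SDM) and masks the patches with the highest responses.

```python
from typing import List, Sequence


def patch_text_similarity(patch_tokens: Sequence[Sequence[float]],
                          text_vector: Sequence[float]) -> List[float]:
    """Dot-product response of each image patch token to one text vector.

    Hypothetical helper: the similarity actually used by MedIM may differ.
    """
    return [sum(p * t for p, t in zip(patch, text_vector))
            for patch in patch_tokens]


def high_activated_mask(scores: Sequence[float], mask_ratio: float) -> List[bool]:
    """Return True for the patches with the highest responses (to be masked)."""
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    num_masked = int(round(mask_ratio * len(scores)))
    # Rank patch indices from highest to lowest response.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    masked = set(ranked[:num_masked])
    return [i in masked for i in range(len(scores))]


# Toy example: four patches with 2-D embeddings and one text vector.
patches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
text = [1.0, 0.0]
scores = patch_text_similarity(patches, text)
mask = high_activated_mask(scores, mask_ratio=0.5)  # assumed ratio
```

Under the same assumptions, the "Low-activated" baseline would rank the scores in ascending order instead, so that the least responsive patches are masked.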

Q2-Experimental details (PC) To facilitate reproducibility and potential application, we will release our code upon publication. Q2.1-Table headers (R1) & Metrics (R2, R3) Table 1: Segmentation (Seg) performance is evaluated on the SIIM dataset using the Dice score. Multi-label classification (Cls) performance is assessed on the CheXpert dataset with mean AUC, and multi-class Cls performance on the COVIDx dataset with accuracy. Table 2: ‘R@k’ represents the recall of the corresponding image/report appearing within the top-k ranked images/reports, measuring retrieval performance [9]. Q2.2-Low-activated (R2) refers to masking the tokens exhibiting a low response in both KWM and SDM strategies.
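
For readers unfamiliar with two of the metrics named in Q2.1, the following is a small sketch of how the Dice score and R@k are commonly computed. The function names and the toy inputs are hypothetical, and returning 1.0 when both masks are empty is an assumed convention, not something stated by the authors.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice overlap between two flattened binary masks (values 0 or 1)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def recall_at_k(ranked_ids: Sequence[Sequence[int]],
                true_ids: Sequence[int], k: int) -> float:
    """Fraction of queries whose true match appears in the top-k results."""
    hits = sum(1 for ranked, true_id in zip(ranked_ids, true_ids)
               if true_id in ranked[:k])
    return hits / len(true_ids)


# Toy segmentation masks: one overlapping pixel, 2 predicted and 1 true.
dice = dice_score([1, 1, 0, 0], [1, 0, 0, 0])

# Toy retrieval: two queries, each with a ranked list of candidate ids.
rankings = [[3, 1, 2], [0, 2, 1]]
truth = [1, 1]
r_at_1 = recall_at_k(rankings, truth, k=1)
r_at_2 = recall_at_k(rankings, truth, k=2)
r_at_3 = recall_at_k(rankings, truth, k=3)
```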

Q3-Statistical robustness of results (PC, R1) Our pre-trained MedIM markedly surpasses random initialization, offering gains of at least 10.15% in Cls and 19.35% in Seg. It also outperforms other pre-training methods with minimum improvements of 0.77% in Cls and 2.01% in Seg, better than MGCA (NeurIPS’22)’s minimum gains of 0.3% in Cls and 0.2% in Seg. Through a two-tailed paired t-test comparing each method with MedIM across varying proportions of labeled data, we confirm the statistical significance of our improvements, as all P-values are under 0.01. The P-values for MedIM vs. Rand/IN/MAE/GLoRIA/MRM/MGCA are 0.0018/0.0005/0.0002/0.0033/0.0006/0.0013.
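
As a hedged illustration of the paired t-test mentioned in Q3, the sketch below computes the t statistic and degrees of freedom for two matched lists of scores. The function name and the scores are made up for the example and are not the paper's results. Converting the statistic to a two-tailed P-value requires the Student t distribution, which a statistics library such as SciPy (`scipy.stats.ttest_rel`) provides.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for a paired t-test of a vs. b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("t statistic is undefined when all differences are equal")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Made-up scores for two methods evaluated under the same three settings.
method_a = [2.0, 4.0, 6.0]
method_b = [1.0, 2.0, 3.0]
t_stat, dof = paired_t_statistic(method_a, method_b)
```

Each pair here stands for one evaluation setting (for instance, one proportion of labeled data) on which both methods were measured, which is what makes the test paired.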

Q4-Clarification on label usage (R1) Following standard self-supervised methodology, we only use labels when fine-tuning downstream tasks. During pretraining, our model utilizes only medical images and associated raw reports, without explicit labels.

Q5-MIM’s medical application & other masking strategies (R2) Masked Image Modelling (MIM) pre-training has been successful in the medical domain [24,5,20,4,11,6], for example in chest X-ray and CT analysis. However, random masking, used widely in MIM-based studies, may overlook key features in medical images that are critical for early diagnosis. Target-aware masking strategies such as MST and AttMask, popular in computer vision, utilize high/low attention responses but struggle with the intricacies of medical images. We propose using radiology reports to guide mask generation, which helps models emulate the expert's focus on essential report-related features, thus refining attention in medical image representation learning.

Q6-Alignment (R3) MedIM employs the base model [17] to semantically link medical images and reports through instance-/token-/disease-level alignments.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    MedIM introduces a fresh perspective to pretraining by utilizing radiology reports to generate masks. By establishing semantic correspondences between images and reports, the method effectively reconstructs masked regions. This approach is innovative and holds promise for various applications. The majority of the reviewers express satisfaction with the method’s performance, and the rebuttal successfully addresses significant concerns, such as statistical robustness and clarification of label usage. Considering these factors, I recommend accepting the paper.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The reviewers generally appreciate the novelty and potential of the proposed method for medical applications, and the results are considered satisfactory. While there are areas that require improvement, such as providing a more detailed overview and addressing concerns about statistical robustness and other masking strategies, the authors adequately address most of the concerns raised in the rebuttal. The overall positive feedback from the reviewers justifies accepting the paper. Nevertheless, the authors should incorporate the necessary improvements, including those promised in the rebuttal, and further consider the inclusion of other masking strategies.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors largely addressed the concerns.


