
Authors

Ashwin Raju, Micha Kornreich, Colin Hansen, James Browning, Jayashri Pawar, Richard Herzog, Benjamin Odry, Li Zhang

Abstract

Automated magnetic resonance imaging (MRI) pathology localization can significantly reduce inter-reader variability and the time expert radiologists need to make a diagnosis. Many automated localization pipelines only operate on a single series at a time and are unable to capture inter-series relationships of pathology features. However, some pathologies require the joint consideration of multiple series to be accurately located in the face of highly anisotropic volumes and unique anatomies. To efficiently and accurately localize a pathology, we propose a Multi-series jOint ATtention localization framework (MOAT) for MRI, which shares information among different MRI series to jointly predict the pathological location(s) in each MRI series. The framework allows different MRI series to share latent representations with each other, allowing each series to get location guidance from the others and enforcing consistency between the predicted locations. Extensive experiments on three knee MRI pathology datasets, including medial compartment cartilage (MCC) high-grade defects, medial meniscus (MM) tear and displaced fragment/flap (DF) with 2729, 2355, and 4608 studies respectively, show that our proposed method outperforms the state-of-the-art approaches by 3.4 to 8.0 mm on L1 distance, 6 to 27 percent on specificity and 5 to 14 percent on sensitivity across different pathologies.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43987-2_25

SharedIt: https://rdcu.be/dnwJH

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper
    1. Introduce a multi-contrast (different image orientations) framework with Transformer models for MRI processing
    2. Design a Transformer-based decoder to allow localization of pathology across MRI contrasts/modalities
    3. Test the proposed method on three knee pathologies
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Multiple SOTA methods were compared
    2. Integration of multi-contrast scans with different orientations for image processing
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The major contribution of the manuscript is to integrate multi-sequence data (different contrasts and orientation), but there is no validation on the benefit of doing so.
    2. The architecture shown seems to only take 2 contrasts while the experiment claims to use three different sequences (two different contrasts and three different orientations)
    3. Co-registration of different contrasts may mitigate the complexity of the presented solution
    4. The general techniques, including masked self-attention, are not novel as claimed, while the application-specific setup is questionable considering the mixed types of input data
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The private data included and the limited details on network architecture can affect the reproducibility of the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. The general technique is not well described, including the ground truth generation, rationalization of the techniques that meet the claims, and the pre-processing details.
    2. There seem to be other better (less complex) solutions if additional preprocessing is included, such as image co-registration and selection, etc for the same task.
    3. From Fig 1, it is hard to tell that the network can take more than 2 series while the author used 3 different sequences.
    4. Instead of a point-wise regression problem, maybe a bounding box setup will be more appropriate.
    5. No statistical tests were conducted to further confirm the comparison results
    6. There are no image demos for the input images, ground truths, and the results
    7. There is no validation/ablation studies regarding the benefit of including multiple image contrasts. With deep learning, maybe one contrast is sufficient
    8. The caption of Table 1 does not correspond to the content of the table (e.g., negative values, etc.)
    9. While the author claims the name of Masked self-attention, the concept and term have been used widely for transformer models
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. Lack of sufficient novelty
    2. The proposed framework may be better formed by including additional preprocessing (e.g., co-registration)
    3. The proposed architecture does not seem technically sound with respect to the input data described
  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    Fig 1 shows that the network can only take 2 series, while the authors used 3 different sequences (further modification of Fig 1 is not described in the rebuttal). From the rebuttal, it seems that the orientation of the images also won’t matter. However, if no rough linear co-registration is performed, this may require particular training strategies for point location regression, which are not mentioned. The influence of different image orientations should be discussed. Performance of a single-contrast setup should have been compared. Finally, there are no statistical tests to further confirm some of the metrics (e.g., L1).
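
    As an illustration of the kind of statistical confirmation requested here, the following is a minimal sketch of a paired sign-flip permutation test on per-study L1 errors from two methods. The function name, the number of resamples, and the toy error values are all hypothetical; nothing here is taken from the paper or the rebuttal.

```python
import random


def paired_permutation_test(errors_a, errors_b, n_resamples=10000, seed=0):
    """Two-sided sign-flip permutation test on paired per-study errors.

    Returns an approximate p-value for the null hypothesis that the two
    methods have the same mean error. (Hypothetical helper for illustration.)
    """
    if len(errors_a) != len(errors_b) or not errors_a:
        raise ValueError("need two non-empty, equally long error lists")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Under the null, the sign of each paired difference is exchangeable.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Toy, made-up L1 errors in mm for 12 studies (not real results).
baseline_errors = [10.0] * 12
proposed_errors = [5.0] * 12
p_value = paired_permutation_test(baseline_errors, proposed_errors)
```

    A paired test is used in this sketch because both methods would be evaluated on the same studies; a Wilcoxon signed-rank test would be a common alternative.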



Review #4

  • Please describe the contribution of the paper

    The paper presents a multi-series joint attention framework (MOAT) which enables the use of different MRI series to jointly predict pathological location(s) in each MRI series. Validated on three knee MRI pathology datasets. The authors demonstrate that MOAT outperforms existing techniques by 3.4-8.0 mm on L1 distance, 6-27% on specificity, and 5-14% on sensitivity for various pathologies.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Novel: The main novel contribution is the simultaneous use of multiple MRI series for pathology localization through Masked Self-Attention. The framework allows different MRI series to share latent representations with each other, which improves the accuracy. This is not dissimilar to how human radiologists read scans. Performance gain: The authors show that the proposed framework outperforms several state-of-the-art approaches by a significant margin in terms of L1 distance, specificity, and sensitivity across different pathologies (all knee MRIs). Good validation: The paper provides extensive experimental results on three knee MRI pathology datasets, including medial compartment cartilage (MCC) high-grade defects, medial meniscus (MM) tear, and displaced fragment/flap (DF), demonstrating the effectiveness and efficiency of the proposed framework. Clear methodology: The methodology is well-explained and well-presented, and the paper provides sufficient technical details to enable the replication of the experiments.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Missing negatives: Negatives are missing in both the validation and test sets; it would be interesting to see false-positive performance on scans completely free from pathologies added to Table 2.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have not provided the following “An analysis of situations in which the method failed.”

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Overall, the paper is very well presented and I have only a few comments. Fig. 1: The “pathology localization decoder” is a bit hard to read, and details of how it is implemented could be expanded upon in the caption. A brief explanation is given in Section 2.4, but I think it would be better to walk us through how it works using the figure via the caption. Eqn. 2: B is said to be “a mask to handle missing series and it shares the same equation as 3.”, but how exactly does it work for something that is missing? Is it not better to just assume 0 for missing cases?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper presented a novel and well-validated approach to localize pathologies, in this case validated on knee MRIs but easily applicable to other imaging. The one minor point is that the paper’s experimental results are based only on knee MRI datasets, which may not be representative of other pathologies or imaging modalities. Overall, the method is clear and solves a problem where multiple scan series are typically left unused in standard approaches. I believe this paper presented a strong novel idea that will prove to be useful in other areas in our community.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #6

  • Please describe the contribution of the paper

    This paper introduces a multi-series joint attention localization framework for MRI, which facilitates information sharing across different MRI series in order to collaboratively predict pathological locations within each series.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strengths of this paper are listed here,

    1. A new framework that enables the use of multiple MRI series at the same time and shares pathology information across different series through masked self-attention.
    2. A transformer-based decoder model is developed to predict consistent locations across series in an MRI study.
    3. The proposed method was evaluated on three knee pathology datasets and demonstrated its effectiveness and efficiency.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weaknesses from my side come from the 2.5 Loss function section. For example,

    • (a). In Eq. (6), the Huber loss is employed to penalize the predicted location. Why was this loss function chosen? Is it proposed by the authors? If not, please add some references.
    • (b). In Eq. (7), I found the $\lambda$ values for this margin loss. Could you please explain how you chose these values?
    • (c). In Eq. (9), the total loss function contains four different components, and it would be helpful to conduct an ablation study for each component. For the rest of this paper, please refer to item 9 in the following for revision.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility of this paper can be achieved by following the paper. It will be helpful if some sample implementation is provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. In Table 2, some methods are reported with *. What does this * mean?
    2. In Table 3, why was the CBAM method selected for comparison? Could you please give more explanation?
    3. In Table 4, I think the weight factor of 0.01 leads to the lowest distance 5.1. Could you please double-check it?
    4. In Page 2, top part, it is said: “… Hourglass-based methods can be overly resource-intensive when applied to 3D volumes…”. Please add some references to this statement.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It is a well-written paper, and the idea is clearly presented. The proposed method is evaluated on the knee MRI dataset and achieves the best performance compared with other state-of-the-art methods. Thus, I recommend accepting this paper.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    I read the authors’ responses, and I have decided to maintain my rating. Thank you.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This study focuses on improving pathology localization in multi-parametric MRI (referred to by the authors as multi-series MRI) using joint attention. Although decoupling the different series in the MRI is valuable, because each may provide complementary information, many aspects of this study remain unclear, which reduces the enthusiasm. Strengths include the independent consideration of the different series, evaluation on three different tasks, and comprehensive evaluation, including distance calculation, sensitivity and specificity. Weaknesses include lack of clarity regarding the labels (e.g., “location”: is it a point, is it a segmentation, what is an MCC defect), lack of visual representation of results and ground truth, and the Table 4 minimum distance. Moreover, as pointed out by reviewer 1, the registration between the different series is not discussed or considered (these multiple series in an MRI are acquired within the same acquisition, and are relatively well aligned). The study has merits, but the above-mentioned weaknesses need to be addressed.




Author Feedback

Need for Multiple Contrasts: Proton density (PD) images offer detailed anatomical information, particularly for evaluating menisci, due to their high sensitivity to fibrocartilage signals. Fluid-sensitive sequences like T2FS or PDFS are crucial for visualizing acute injuries, including edema, meniscal tears, ACL injuries, and cartilage defects. To encompass a wide range of studies in model development and testing, both PDFS and T2FS contrasts were included, because multiple image contrasts improve pathology diagnosis.

Need for Point-wise Regression: Our study used landmark point annotation for pathology localization, which proved more efficient than ROI box annotation, allowing subspecialty radiologists to label landmark points accurately while maintaining the visibility of anatomy/pathology. This method enabled annotators to label the entire extent of pathology with minimal annotation time. Furthermore, because it is not easy to define the exact extent of the three knee pathologies for many cases in this work, point-wise regression greatly alleviates label ambiguity. In addition, the smaller number of output variables in point-wise regression reduces the model complexity.

Self-attention vs Co-registration: Reviewer 1 asked whether co-registration could be used as a simpler alternative to our self-attention approach. Co-registering different sequences in an MR study comes with its own complexities, which stem from two challenges: 1) establishing correspondences with missing information due to the large slice spacing, especially for registration between sagittal and coronal sequences; 2) the necessity of non-rigid registration due to patient knee movements. If co-registration is used as a preprocessing step, the registration error would be added on top of the localization error. Moreover, even if all the sequences are well aligned, combining and utilizing signals from different MR sequences at the same spatial location is not trivial. The proposed joint-attention method addresses this difficulty through a seamless end-to-end training scheme.

Ablation Studies and Choice of Certain Components: R6 asked for clarification regarding the models chosen for the ablation study. CBAM was chosen because it uses a different attention mechanism, one that predates the more recently popularized self-attention approach in imaging. In addition, we employed the statistically robust Huber loss to mitigate the effect of outliers in our ground truth. This was motivated by previous studies, which highlighted significant inter-reader disagreement between expert annotators, as well as false positives ([anonymized], [https://link.springer.com/article/10.1007/s00330-009-1298-5]). This clarification has now been added to the manuscript, along with our un-anonymized reference. Regarding R6’s question about the selection of the different “lambda” hyperparameters, we based our decision on the performance metric obtained on the validation dataset.
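
To illustrate why the Huber loss is robust to outlying annotations, here is a minimal sketch of the standard Huber loss applied to predicted and annotated landmark coordinates. The threshold `delta`, the function name, and the example coordinates are assumptions for illustration; the paper's Eq. (6) and its actual settings are not reproduced here.

```python
def huber_loss(pred, target, delta=1.0):
    """Mean Huber loss over paired coordinates.

    Quadratic for residuals up to `delta`, linear beyond it, so a single
    badly placed annotation contributes far less than under squared error.
    (`delta` is an assumed value, not taken from the paper.)
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("need two non-empty, equally long coordinate lists")
    total = 0.0
    for p, t in zip(pred, target):
        r = abs(p - t)
        if r <= delta:
            total += 0.5 * r * r               # quadratic region
        else:
            total += delta * (r - 0.5 * delta)  # linear region
    return total / len(pred)


# Hypothetical (x, y, z) landmark in mm: two coordinates are close to the
# annotation, the third is far off.
predicted = [12.0, 30.5, 8.0]
annotated = [12.5, 30.0, 18.0]
loss = huber_loss(predicted, annotated)
```

In this toy case the residual of 10 mm contributes 9.5 to the sum, whereas half its square would contribute 50, which is the outlier-damping behaviour the rebuttal refers to.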

Handling Missing Series: In Equation 2, B becomes -infinity for the series that are missing, so after the softmax the attention weights for these series become 0.
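
The masking behaviour described for missing series can be sketched as follows: an additive bias of negative infinity on the attention scores of missing series drives their softmax weights to exactly zero. This is a generic illustration, assuming one scalar score per series; the function name and the toy scores are hypothetical, and the exact form of Equation 2 in the paper is not reproduced.

```python
import math

NEG_INF = float("-inf")


def masked_attention_weights(scores, series_present):
    """Softmax over per-series attention scores with an additive mask.

    The bias is 0 for available series and -inf for missing ones, so the
    missing series receive a weight of exactly 0 after the softmax.
    """
    bias = [0.0 if present else NEG_INF for present in series_present]
    biased = [s + b for s, b in zip(scores, bias)]
    top = max(biased)
    if top == NEG_INF:
        raise ValueError("at least one series must be present")
    # Subtracting the maximum keeps exp() numerically stable;
    # exp(-inf) evaluates to 0.0.
    exps = [math.exp(v - top) for v in biased]
    norm = sum(exps)
    return [e / norm for e in exps]


# Toy scores for three series; the second series is missing.
weights = masked_attention_weights([2.0, 1.0, 0.5], [True, False, True])
```

Compared with setting the missing features to 0, as suggested by R4, masking the scores removes the missing series from the normalization entirely, so the remaining weights still sum to 1.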

Minor Changes: We carefully considered all minor changes and sections requiring more detail, incorporating the suggestions from the reviewers.

As mentioned by R1, we have updated the supplementary paper with the visualization of the prediction and ground truth.

Figure 1 has been modified as per the recommendations of R1 and R4.

The “*” in Table 2, which confused R6, refers to models that were trained using the best hyper-parameters for each dataset; this has been clarified in the updated caption.

A typo in Table 4 has been corrected, and a reference has been added to the description of “Hour-glass methods” as highlighted by R6.

As requested by R4, negatives were added to the test dataset, and Table 2 was updated to include studies with negatives as well.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This is a great study, that we welcome to this year’s conference.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal helped to clarify some of the concerns about the paper. While there are still concerns from reviewer 1, the other two reviewers support the publication of the paper.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    After integrating all information and reading the paper (which is not at its best clarity), the AC tends to agree more with reviewer 1. Many places are unclear. There should be some pictures showing how the defect location is detected. Given the large amount of available data and an MRI protocol with limited pose flexibility, the defects should not be too hard to detect?

    “1. The general technique is not well described, including the ground truth generation, rationalization of the techniques that meet the claims, and the pre-processing details.
    2. There seem to be other better (less complex) solutions if additional preprocessing is included, such as image co-registration and selection, etc for the same task.
    3. From Fig 1, it is hard to tell that the network can take more than 2 series while the author used 3 different sequences.
    4. Instead of a point-wise regression problem, maybe a bounding box setup will be more appropriate.
    5. No statistical tests were conducted to further confirm the comparison results
    6. There are no image demos for the input images, ground truths, and the results
    7. There is no validation/ablation studies regarding the benefit of including multiple image contrasts. With deep learning, maybe one contrast is sufficient
    8. The caption of Table 1 does not correspond to the content of the table (e.g., negative values, etc.)
    9. While the author claims the name of Masked self-attention, the concept and term have been used widely for transformer models.”


