
Authors

Santiago Gómez, Daniel Mantilla, Brayan Valenzuela, Andrés Ortiz, Daniela D. Vera, Paul Camacho, Fabio Martínez

Abstract

Localization and delineation of ischemic stroke are crucial for diagnosis and prognosis. Diffusion-weighted MRI studies make it possible to associate hypoperfused brain tissue with stroke findings, observed from ADC and DWI parameters. However, this process is expensive, time-consuming, and prone to expert observational bias. To address these challenges, current approaches rely on deep autoencoder representations, but they are limited to learning from ADC observations only and are biased toward a single expert's delineation. This work introduces a multimodal and multi-segmentation deep autoencoder that recovers ADC and DWI stroke segmentations. The proposed approach learns independent ADC and DWI convolutional branches, which are further fused into an embedding representation. Then, decoder branches are enriched with cross-attention mechanisms and adjusted from ADC and DWI findings. In this study, we validated the proposed approach on 82 ADC and DWI sequences, annotated by two interventional neuroradiologists. The proposed approach achieved higher mean Dice scores of 55.7% and 57.7% for the ADC and DWI annotations by the training reference radiologist, outperforming models that only learn from one modality. Notably, it also demonstrated a proper generalization capability, obtaining mean Dice scores of 60.5% and 61.0% for the ADC and DWI annotations of a second radiologist. This study highlights the effectiveness of modality-specific pattern learning in producing cross-domain embeddings that enhance ischemic stroke lesion estimations and generalize well over annotations by other radiologists.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43901-8_74

SharedIt: https://rdcu.be/dnwEr

Link to the code repository

https://gitlab.com/bivl2ab/research/2023-cross-domain-stroke-segmentation

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors develop a multimodal convolutional segmentation network that jointly segments ischemic lesions in ADC maps and DWI volumes. The proposed approach used two different encoders to extract features from the two inputs which are then concatenated and the combined features were fed to both decoders. A cross-attention mechanism is used to provide attention to regions in the encoder features based on the combined features at every level / depth of the network. The model was trained on a single institute dataset and the segmentation performance was evaluated using dice score, precision and recall. In the ablation experiments, the proposed architecture performed better than the individual baseline segmentation models and other variants.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The architecture is relatively simple and uses additive attention for the skip features from the encoder based on the combined features. The ablation experiments help to understand how segmentation performance varies with architecture choices.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors provide no comparison with existing approaches for stroke segmentation on ADC maps and DWI volumes in the literature. Hence it is hard to evaluate the advantage of this network architecture and the reported dice scores are not high (Table 1). There was a grand challenge (Ischemic stroke lesion segmentation - ISLES’22[1]) for multimodal MRI infarct segmentation in acute and sub-acute stroke in MICCAI 2022. The ground truth lesion segmentation was provided for DWI only and the dice scores were reasonably high for inter-rater agreement (0.57 - 0.98). Please look into methods used in this challenge (ex., https://arxiv.org/pdf/2209.09546.pdf). Though the task may be slightly different, consider reporting performance on this dataset additionally for benchmarking purposes.

    1. Hernandez Petzsche, M.R., de la Rosa, E., Hanning, U. et al. ISLES 2022: A multi-center magnetic resonance imaging stroke lesion segmentation dataset. Sci Data 9, 762 (2022). https://doi.org/10.1038/s41597-022-01875-5
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Implementation details are missing; ex., number of filters at every level

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The term multitask architecture is misleading, the two are lesion segmentation tasks on two complementary inputs, consider rephrasing for clarity. Do not agree with what the authors refer to as architecture generalization study: training on the annotations of one reader and assessing the performance on the annotations from the second reader. The inter-rater variability is reasonably high (Fig.2 Kappa values). The models were trained and assessed on a single institute dataset. The multitask loss function is similar to a weighted binary cross entropy loss but the components seem to correspond to background voxels and it is unclear what corresponds to ground truth labels and model predictions.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The architecture seems like a close variation of Attention U-Net (Additive attention instead of multiplicative attention), there is no comparison with prior literature, making it hard to assess the value of this approach. As an application, the dice scores and sensitivities are not high.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The rebuttal addresses the concerns raised during review and the preliminary results look encouraging.



Review #2

  • Please describe the contribution of the paper

    In this paper, the authors propose an end-to-end deep learning architecture for acute ischemic stroke lesion segmentation from ADC and DWI imaging. This architecture leverages the principles of multi-modal and multi-task learning to address the challenges associated with inter-rater reliability, allowing for more accurate and robust estimations of the ischemic stroke lesion in both imaging modalities. The proposed model was compared to other variations of itself (by measures of DSC, precision, and sensitivity), outperforming variations that only learn from one modality to segment the ischemic stroke lesion. The authors conclude that by using complementary information from multi-modal imaging data, their proposed approach has the potential to improve the annotation of stroke lesions in clinical practice.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors propose a novel architecture that, although it resembles a typical encoder-decoder-like network, distinguishes itself with the integration of cross-attention-based skip connections. While sharing commonalities with normal attention-based modules utilized in other image segmentation tasks (dois: 10.1007/978-3-031-20233-9_41; arXiv:2110.08811v2), the cross-attention-based skip connections allow for the sharing of information between MRI modalities and, to the best of my knowledge, have not yet been employed in the domain of ischemic stroke lesion segmentation. In comparison to other multi-modal ischemic stroke lesion segmentation works, which predominantly use channel-wise convolutions to merge MRI modalities (dois: 10.1016/j.mri.2022.06.001; arXiv:1803.05848v1), the proposed approach is relatively simple to integrate within a single model, meaning that it is theoretically compatible with the many encoder-decoder-like networks that have been optimized in other publications for stroke lesion segmentation. The method is evaluated on annotations from different radiologists, which demonstrates the generalization capability of the proposed model.

    The authors have also taken care to perform an ablation study of their model by comparing it against multiple variations of itself. This allows us to analyze the contribution of the proposed cross-attention-based skip connections on the segmentation accuracy at different stages of the framework.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The technical novelty of the proposed model is being overshadowed by the analysis of the radiologist annotation agreement, which is not really a contribution on its own. Such an inter and intra-rater analysis between ischemic stroke lesions of ADC and DWI has been done before (doi: 10.1016/S0730-725X(03)00087-0). Thus, rather than making it the main finding, it should serve as a motivation for the proposed work. Stronger, or at least more direct evidence could be provided to highlight the benefit of incorporating multi-modal imaging/annotations. For example, performing a statistical analysis (e.g., paired t-test) to determine if there is a significant difference between the model variations is needed.

    Some aspects of the justification for the study seem to be at odds with its implementation. In particular, the authors state in the introduction that “approaches that are calibrated to one-expert annotations introduce the possibility of learning expert bias.” However, it does not appear that their proposed segmentation model was calibrated on multiple-expert annotations. Instead, annotations were used solely during the evaluation phase, which does not address the possibility of learning single-expert bias. In addition, some of the authors’ conclusions do not necessarily appear to be supported by the experimental results. For instance, the authors suggest in the results section that “the dual path proposed architecture (Dual V1) achieves the best score”. However, the variation Dual V3 (DSC = 0.62) vastly outperformed their proposed Dual V1 model (DSC = 0.58) for the DWI segmentations, while performing similarly for the ADC segmentations. The authors should provide further discussion to explain these discrepancies.

    Finally, the authors did not acknowledge the limitations of their study, and several aspects of the data preprocessing and experimental analysis are not sufficiently described.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have provided a comprehensive description of the model, which includes mathematical formulations of the loss functions and feature representations, as well as a clear graphical illustration. It would be helpful if they could provide a brief explanation of how they applied their data preprocessing in their experiments and provide code for their cross-attention module.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    To better align the experiment with its justification, I can think of one possible recommendation. If the author’s objective is to calibrate the model to include multiple-expert annotations, then I would expect to see different annotations being used during the training of the model. This can be achieved by randomly selecting annotations from either of the two radiologists during the training process and calculating the loss accordingly. An approach like this would ensure that the model is trained to learn from diverse perspectives and can generalize well to new data. Based on the high inter-rater agreement, I believe that this approach is unlikely to negatively impact the performance of the model. Otherwise, using the current setup, it is unclear how the authors are avoiding expert bias.
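A minimal sketch of the random annotation selection suggested above, assuming each training sample carries one mask per radiologist. The names `pick_annotation`, `build_epoch`, `R1`, and `R2` are hypothetical and are not taken from the authors' repository; the strings stand in for real image and mask arrays.

```python
import random


def pick_annotation(masks_by_rater, rng):
    """Choose one rater uniformly at random and return (rater_id, mask)."""
    rater_id = rng.choice(sorted(masks_by_rater))
    return rater_id, masks_by_rater[rater_id]


def build_epoch(samples, seed):
    """Pair every image with the mask of a randomly chosen rater."""
    rng = random.Random(seed)  # seeded so an epoch can be reproduced
    epoch = []
    for sample in samples:
        rater_id, mask = pick_annotation(sample["masks"], rng)
        epoch.append({"image": sample["image"], "mask": mask, "rater": rater_id})
    return epoch


# Toy data: placeholders for slices and their two expert delineations.
samples = [
    {"image": f"slice_{i}", "masks": {"R1": f"r1_mask_{i}", "R2": f"r2_mask_{i}"}}
    for i in range(6)
]
epoch = build_epoch(samples, seed=0)
```

Re-drawing the rater at every epoch (for example by changing the seed) would expose the model to both delineations of the same image over the course of training.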

    There are several areas where the study could benefit from providing more clarity. (1) The authors should provide the value of the class weight maps that were used to increase the contribution of lesion voxels. (2) It is unclear why control patients were included in the experiments, and whether the images were acquired at baseline or not. The study would benefit from providing information on the number of slices per patient, the spatial resolution of the images, and other relevant acquisition details. (3) Were additional preprocessing steps (e.g., registration, skull stripping) performed to the original images? (4) A more detailed description of the data augmentation steps implemented is missing from the study. (5) The authors should specify whether the evaluation metrics were computed for each 2D slice or for the patient as a whole, and which deep learning library was used to build models (e.g., PyTorch, TensorFlow).
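Point (5) matters because the two aggregation schemes can give noticeably different numbers. The sketch below is illustrative only: the masks are toy values, and the convention that two empty masks score 1.0 is an assumption, not something stated in the paper.

```python
def dice(pred, truth):
    """Dice overlap of two flat binary lists; 1.0 when both are empty (assumed convention)."""
    intersection = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 1.0 if total == 0 else 2.0 * intersection / total


def per_slice_mean_dice(pred_slices, truth_slices):
    """Average of the Dice scores computed independently on each 2D slice."""
    scores = [dice(p, t) for p, t in zip(pred_slices, truth_slices)]
    return sum(scores) / len(scores)


def per_patient_dice(pred_slices, truth_slices):
    """Single Dice score over all voxels of the patient volume."""
    flat_pred = [v for s in pred_slices for v in s]
    flat_truth = [v for s in truth_slices for v in s]
    return dice(flat_pred, flat_truth)


# Toy patient with three 4-voxel slices; the second slice has no lesion.
pred_slices = [[1, 1, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0]]
truth_slices = [[1, 1, 1, 0], [0, 0, 0, 0], [0, 1, 0, 0]]

slice_score = per_slice_mean_dice(pred_slices, truth_slices)
patient_score = per_patient_dice(pred_slices, truth_slices)
```

In this toy case the empty slice raises the per-slice mean relative to the per-patient score, which is why the aggregation level should be reported alongside the metric.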

    Regarding the interpretation of the authors’ results, I would strongly recommend incorporating statistical analyses to provide more insights into the experimental differences and to distinguish them from random error. For instance, a paired t-test could be conducted to examine the significance of differences between experimental conditions. Additionally, including a Bland-Altman plot could help to visualize the agreement between the methods being compared and support their findings, specifically when the authors say that “the proposed approach achieved adequate localization without overestimating the lesion….” Measures of volume similarity would help too.
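The two analyses recommended above can be sketched with the standard library alone. The per-case Dice values below are made up for illustration and are not results from the paper; the function names are hypothetical. The code returns the paired t statistic and the Bland-Altman bias with 95% limits of agreement. It does not compute a p-value, which would typically come from a statistics package such as SciPy's `scipy.stats.ttest_rel`.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


def bland_altman_limits(a, b):
    """Return (bias, lower, upper): mean difference and 95% limits of agreement."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd_diff, bias + 1.96 * sd_diff


# Illustrative per-case Dice scores for two model variants on the same cases.
dice_dual = [0.60, 0.55, 0.70, 0.65]
dice_unimodal = [0.50, 0.50, 0.60, 0.60]

t_value, dof = paired_t_statistic(dice_dual, dice_unimodal)
bias, lower, upper = bland_altman_limits(dice_dual, dice_unimodal)
```

A Bland-Altman plot would then draw each pairwise difference against the pairwise mean, with horizontal lines at `bias`, `lower`, and `upper`.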

    To further enhance the proposed approach, an interesting avenue for future research could be to explore the potential benefits of employing a cross-attention mechanism to combine the latent representations of both modalities, rather than simply concatenating them. Prior research on multi-modal fusion (dois: 10.1007/978-3-031-16443-9_14; 10.1007/978-3-031-16443-9_47) has shown that cross-attention can effectively facilitate this process, yielding notable improvements in performance.

    Apart from that, there are a few formatting errors that should be addressed:

    • Table I is hard to read. I would recommend reducing the size of the numbers if possible.
    • In section 2.3, “224^2” should read “224 x 224 pixels” instead.
    • To improve the clarity and organization of the paper, I recommend introducing the ablation studies in the experimental setup (section 2.3) rather than in the results section.
    • The running header title is suppressed due to length on odd-numbered pages.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper shows novelty in its particular approach to fuse information from multi-modal MRI datasets for a multi-task segmentation problem. However, the experimental evaluation of the proposed model is limited, and some changes to the manuscript are required to communicate the value of their experiments clearly. I see the potential for a very interesting paper, especially considering the effort that was obviously required to evaluate so many variations of the proposed method. Overall, the paper shows promise as a potential contribution in ischemic stroke lesion segmentation from multi-modal MRI.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    A multimodal and multitask deep autoencoder approach which recovers ADC and DWI stroke segmentations has been proposed. This study highlights the effectiveness of modality specific pattern learning in producing cross-domain embeddings that enhance ischemic stroke lesion estimations and generalization.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. The methodology is novel as it explores and models the complementary lesion annotations from DWI and ADC modalities.
    2. The authors claim that some challenging lesion cases can be segmented using the proposed approach.
    3. The proposed architecture generalizes the trained model with one reference radiologist and evaluates with respect to 2 radiologists’ annotations.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The platform on which the analysis is being conducted is not specified.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Mathematical computations and the description about the dataset are clearly mentioned in the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. Using optical distortions in data augmentation will change the lesions orientations and
    2. Is the architecture trained as a single-channel CNN architecture or with 2 channels (one with DWI and the other with ADC)?
    3. If the architecture is trained with a single channel, is there a possibility to train both DWI and ADC using 2 channels?
    4. What was the number of slices in each sequence? What was the final training image dataset size?
    5. There are variants of autoencoders; can using sparse autoencoders have any impact on the current study?
    6. Does the current study use axial, coronal and sagittal planes of DWI and ADC sequences?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is very well written, and the approach is novel and has clinical significance.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The reviewers have different opinions on the novelty of the work. Furthermore, the insufficiency of evaluation is also raised as a concern. Please address these comments in your rebuttal.




Author Feedback

The proposed approach exploits delineations from DWI and ADC to discover complementary radiological findings. We carried out a t-test analysis between different versions of the proposed approach, finding statistically significant differences between Unimodal and Dual models, while the different Dual models show statistically similar performance. This analysis was extended to compare the masks of the models with respect to expert radiologists. In this study, the masks of the Dual V1 model have the same distribution (p > 0.05) as the reference dice scores over both ADC and DWI. Similarly, the Dual V3 model masks follow the same distribution as the reference dice scores, but only on DWI. Complementarily, the Bland-Altman analysis shows that Dual models exhibit small differences clustered around the mean, with a mean difference closer to zero. In contrast, single models show more spread-out differences and a higher mean difference compared to Dual models.

To bring clarity to the paper: the architecture receives ADC and DWI inputs (each with size HxWx1), which are then processed by autoencoders with five convolutional blocks (with 32, 64, 128, 256, and 512 filters for both networks and 1024 filters in the bottleneck). Regarding training and evaluation, we included studies from 82 subjects (40 for training). Control patients with stroke symptoms were included to diversify tissue samples, potentially enhancing our models’ ability to segment stroke lesion tissue. Each study has between 20 and 26 slices, with voxel spacing ranging over x,y=[0.83-0.94] mm and z=[5.50-7.20] mm. To feed axial slices to the network, each slice was resampled to 224x224 px. Registration and skull stripping were not considered in this study. The class weight map values were set to 1 for non-lesion voxels and 3 for lesion voxels. The data augmentation includes adding random brightness and contrast, random flips, small rotations, and optical distortions. The TensorFlow implementation will be made available.
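To make the class weight map concrete, here is a minimal sketch that builds the 1/3 weighting described above and applies it to a voxel-wise binary cross-entropy. This is an assumption for illustration: the rebuttal states that the paper uses a weighted focal loss, whose exact form is not reproduced here, so the plain weighted cross-entropy below is a stand-in and not the authors' loss. The function names are hypothetical.

```python
import math

LESION_WEIGHT = 3.0      # weight for lesion voxels, as stated in the rebuttal
BACKGROUND_WEIGHT = 1.0  # weight for non-lesion voxels, as stated in the rebuttal


def weight_map(mask):
    """Per-voxel weights for a flat binary mask (1 = lesion, 0 = background)."""
    return [LESION_WEIGHT if v == 1 else BACKGROUND_WEIGHT for v in mask]


def weighted_bce(probs, mask, eps=1e-7):
    """Mean weighted binary cross-entropy (stand-in loss, not the paper's focal loss)."""
    total = 0.0
    for p, y, w in zip(probs, mask, weight_map(mask)):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(mask)


# Toy example: two background voxels, two lesion voxels, uninformative predictions.
mask = [0, 0, 1, 1]
probs = [0.5, 0.5, 0.5, 0.5]
weights = weight_map(mask)
loss = weighted_bce(probs, mask)
```

With these weights, an error on a lesion voxel contributes three times as much to the loss as the same error on a background voxel, which counteracts the small fraction of lesion voxels in each slice.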

Further evaluation of the architecture with public datasets can support the proposed network’s contribution. To this end, we adapted the architecture to leverage one mask and extended experiments with public datasets. We achieved a dice score of 0.34 (ADC+Tmax) and 0.36 (ADC+TTP+Tmax) in ISLES17, being the second-best architecture reported in this dataset. We also preliminarily validated one branch of the proposed approach with ISLES22, achieving a dice score of 0.61 over a 5-fold cross-validation. To obtain such results during this rebuttal period, we carried out experiments with a resolution of only 2 mm isometric and without an exhaustive hyperparameter search. A more careful validation will be carried out for the final version of the manuscript.

Finally, to be precise about the scope of the paper contribution, we agree that we study inter-rater variability, rather than constructing a generalizable architecture. We train with annotations from one expert and test with two experts. We decided to remove the analysis among radiologists as a contribution. The term “multitask architecture” will be replaced by “multi-segmentation architecture”. Also, we noticed that the formula for the weighted focal loss was missing an alpha (α=2) term. Regarding the performance of Dual models of the proposed approach, the Dual V1 was selected over the Dual V3 because of the consistent dice score in both ADC (V1=0.58 vs V3=0.53) and DWI (V1=0.6 vs V3=0.62) for both radiologists. However, the Dual V1 does not have the best performance in all experiments (DWI); hence, conclusions will be adjusted to be precise with the observations in the tables. Also, the final manuscript will include perspectives and related limitations of the current study. For instance, although this approach may be able to identify possible complementary radiological findings that lead to a better stroke delineation, it introduces potential challenges in scalability.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper addresses an interesting clinical application and all reviewers found the work clearly and well presented. The authors have addressed the concerns in the original review and R1 has raised the score from 3 to 5. Congratulations!



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes a multi-modal joint segmentation method using both ADC maps and DWI volumes. The rebuttal has addressed the major concerns well, and the merit of this paper makes it worthy of acceptance by MICCAI.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper presents a multimodal convolutional segmentation network that jointly segments ischemic lesions in ADC maps and DWI volumes. There were concerns about the novelty of the proposed method and the insufficient evaluation. However, it seems that the reviewers’ concerns were convincingly addressed in the authors’ feedback. In your camera-ready version please include the clarification, comparisons, and details provided in the rebuttal.


