
Authors

Zehua Ren, Yongheng Sun, Miaomiao Wang, Yuying Feng, Xianjun Li, Chao Jin, Jian Yang, Chunfeng Lian, Fan Wang

Abstract

Accurate segmentation of punctate white matter lesions (PWMLs) is fundamental for the timely diagnosis and treatment of related developmental disorders. Automated PWML segmentation from infant brain MR images is challenging, considering that the lesions are typically small and low-contrast, and the number of lesions may change dramatically across subjects. Existing learning-based methods directly apply general network architectures to this challenging task, which may fail to capture detailed positional information of PWMLs, potentially leading to severe under-segmentation. In this paper, we propose to leverage the idea of counterfactual reasoning coupled with the auxiliary task of brain tissue segmentation to learn fine-grained positional and morphological representations of PWMLs for accurate localization and segmentation. A simple and easy-to-implement deep-learning framework (i.e., DeepPWML) is accordingly designed. It combines the lesion counterfactual map with the tissue probability map to train a lightweight PWML segmentation network, demonstrating state-of-the-art performance on a real clinical dataset of infant T1w MR images. The code is available at https://github.com/ladderlab-xjtu/DeepPWML.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43904-9_22

SharedIt: https://rdcu.be/dnwG0

Link to the code repository

https://github.com/ladderlab-xjtu/DeepPWML

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper presents a new approach for detecting and segmenting Punctate White Matter Lesions (PWMLs) that builds upon existing methods that mainly rely on exploiting the structural information provided by T1-weighted (T1w) sequences. The authors propose to enhance these methods by adding new channels of information that provide attention mechanisms to the model.

    These new channels of information include: a) the output of the last layer of a model trained for segmenting cerebrospinal fluid (CSF), gray matter (GM), and white matter (WM), which is interpreted as the confidence in the segmentation and allows the model to guide itself in the inter-tissue regions, and b) a spatial probability map for the lesions. This map is automatically extracted using a simple and effective procedure inspired by counterfactual image generation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is easy to follow, and the ideas are well-presented. The authors have made an effort to ensure that the paper is understandable to a wide audience.

    • While the taxonomy of some of the ideas used is debatable (i.e., counterfactuals, uncertainty), the authors have opted for a simple approach to instantiate them. The paper serves as an example that, especially in the context of limited-length works, it is not always necessary to embark on complex developments supported by incomplete theories that often try to hide basic defects of a proposal behind hot-topic concepts in the field.

    • The proposed framework is easily reusable for other segmentation problems with sparse manifestations. Given the modularity of the proposed method, its extensions, particularly those related to uncertainty quantification and causality, are almost infinite and worthy of study.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • In my opinion, the biggest weakness of the paper lies in the validation.

    • The language used is often too vague and uncommon in the field of medical imaging (i.e., “cube”, “normal control data”, etc.).

    • The code is not available.

    • To avoid being overly redundant, I have expanded on the weaknesses and possible proposals to address them in the comments section to the authors.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Aside from some minor details about hyperparameterization, the work should be easily reproducible. However, having access to the code would be very beneficial to accelerate future work. The biggest barrier in this work is likely the implementation in TensorFlow, as the library is much less widespread in the community.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    About validation:

    • None of the metrics are reported with error estimates (e.g., mean ± standard deviation). Please include them.

    • Are the metrics patch(cube)-wise or subject-wise? To make a fair evaluation, they should be subject-wise. If they are not, could you please include subject-wise results?

    • Are the metrics used in detection, True Positive Rate (TPR), and Positive Predictive Value (PPV), at the lesion or voxel level? At the voxel level, they would be of little use for evaluation. At the detection level, neither metric includes information on True Negatives (TN), which suggests that only patches with PWML have been used, representing a serious problem. Please clarify if they are at the lesion level, include them if they are not, and also provide more informative metrics in this case, such as AUC or F1.

    • If similar datasets are available, it would be essential to analyze the model’s inference on them. These types of images are highly sensitive to different domain shifts.

    • If there are no more PWML datasets, it could be a good alternative to consider datasets where the manifestations appear quite similar, e.g., cortical lesions.

    • The results are compared with two methods specifically developed for the same problem, PWML segmentation. However, the model that probably best represents the state of the art for segmentation problems, nnUNet [1], is overlooked. Moreover, it could probably be very useful as a backbone. Please include this comparison in the results. By the way, why isn’t PPV given for “Refined Segmentation R-CNN”?

    • It is not specified whether the different reference methods or the implemented one use data augmentation techniques. It would be very informative to see the performance in these cases and at different stages of the pipeline; for example, what happens if the image is modified with the more traditional data-augmentation transformations until the classifier’s output changes, instead of using the conditional autoencoder?

    • Why does Figure 2 show results of the baseline method but not of “Refined Segmentation R-CNN”? It should be included.

    • Both Figures 2 and 3 should include examples where there is no lesion.

    On lax concepts in the text:

    • Treating the value given directly by the softmax output of the tissue-segmentation network as uncertainty can lead to error. After all, this magnitude is a normalization of the last layer’s outputs; it can be used as an approximation of per-voxel probability, but the literature on uncertainty and uncertainty quantification is nowadays much more advanced [2].

    • The same occurs with “counterfactual/counterfactually”; the term is becoming overloaded. Generally speaking, the usage can be justified because it starts from a real image (factual) that is modified under a supposed scenario where the lesion disappears or appears. However, the definition of a counterfactual given a model is considerably more complex, especially in the case of DL (this work can be quite helpful [3]). In fact, there are methods closer to a formal counterfactual in the literature [4-6], which, by the way, have not been cited, that can help clarify the language for the authors and the community.

    • The entire text uses “cubes” to describe the inputs, I assume to emphasize the same fixed size in all dimensions. However, I think the more appropriate term is still “patch.”

    • The sentence “The task is to mark every pixel…” is probably not needed to define the task at all, but if it is kept, it is better to use “classify” instead of “mark.”

    • Instead of using “normal premature infants’ images” and “normal data”, I would use “control premature infants’ images” and “control data” to avoid using “normal” as a descriptor for a group of individuals.

    • “…this probability map naturally contains brain tissue anatomy information…” is an overstatement

    • “CF” abbreviation is not defined.

    Misc.

    • Is there a minimum threshold of lesion volume or relative volume compared to total lesion volume to consider a patch as positive or negative when using a sliding window approach to form patches?
    • Is it possible to use this approach with fetal images? [7]

    [1] Isensee et al., Nature Methods, 2021. [2] Abdar et al., Information Fusion, 2021. [3] Monteiro et al., https://arxiv.org/abs/2303.01274, 2023. [4] Pawlowski et al., NeurIPS, https://arxiv.org/abs/2006.06485, 2020. [5] Gordaliza et al., https://arxiv.org/abs/2203.01668, 2022. [6] Reinhold et al., https://arxiv.org/abs/2103.03158, 2021. [7] Payette, K., et al., https://doi.org/10.1038/s41597-021-00946-3, 2021.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is an example of “less is more” that can be very useful, but a fair evaluation of the results is only possible once the authors resolve the doubts raised.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors have addressed my major concerns, and I am pleased to reconsider my decision. However, it is crucial for further studies to include lesion-based metrics and to clarify any vague concepts by referencing the relevant literature. These aspects should also be expanded upon for a more comprehensive understanding.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a deep neural network framework to segment punctate white matter lesions (PWMLs) in preterm infants from T1-weighted (T1w) brain MRI. PWML is a typical type of cerebral white matter injury in preterm infants, potentially leading to psychomotor development delay, motor delay, and cerebral palsy without timely treatment. The biggest challenge of PWML segmentation is that tissue-to-tissue and lesion-to-tissue contrasts are very low in T1w images due to the underlying immature myelination of infant brains, so automatic segmentation is prone to under-segmentation. To alleviate this problem, the authors propose a deep neural network framework named DeepPWML that leverages label-efficient counterfactual learning coupled with brain tissue segmentation. The proposed DeepPWML framework outperformed previously proposed methods by 5.98% in Dice similarity coefficient (DSC).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of the proposed framework is that it successfully produces better segmentation results of PWML. Segmenting PWML is hard due to the low tissue-to-tissue and lesion-to-tissue contrast in T1w brain MRI. The proposed DeepPWML employs clever tricks to resolve this “ambiguity” (i.e., due to low contrast) by providing the framework with counterfactual images and brain tissue segmentations.

    In general, the proposed DeepPWML framework is formed of four independent (yet simple) deep learning models: a tissue segmentation model (T-SEG), a patch-based classification model (CLS), a counterfactual generation model (CMG), and a PWML segmentation model (P-SEG). All of these are patch-based models (32 x 32 x 32). The T-SEG model performs brain tissue segmentation of the T1w brain MRI (in patches), whereas the CLS model classifies whether a patch contains PWML or not. Both T-SEG and CLS were trained independently. Once fully trained, CLS was frozen and used to train the CMG model. Once fully trained, the CMG model is able to generate counterfactual images of positive/negative patches (i.e., patches with/without PWML, respectively): positive T1w patches become classified as negative by the CLS model, and negative patches as positive. After T-SEG, CLS, and CMG were fully trained, P-SEG was trained using the original T1w brain MRI combined with the outputs of the T-SEG model (i.e., probability maps of brain tissue segmentation) and the CMG model (i.e., counterfactual images). At inference/testing time, all trained models are used to produce the segmentation of PWML.
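    For concreteness, below is a minimal, PyTorch-style sketch of this staged training order (CLS frozen while CMG learns to flip its decision, then P-SEG trained on the concatenated inputs). The module, function, and loader names (t_seg, cls, cmg, p_seg) and the unit loss weights are placeholders for illustration, not the authors' code.

```python
import torch

def train_cmg(cmg, cls, loader, opt_cmg, cls_loss_fn, l1=1.0, l2=1.0):
    """Stage 2: train the counterfactual-map generator against the frozen classifier."""
    cls.eval()
    for p in cls.parameters():
        p.requires_grad = False              # CLS was trained beforehand and is frozen here
    for t1w_patch, _, flipped_label in loader:
        cf_map = cmg(t1w_patch)              # residual activation map
        loss = (cls_loss_fn(cls(t1w_patch + cf_map), flipped_label)   # flip the CLS decision
                + l1 * cf_map.abs().mean() + l2 * cf_map.pow(2).mean())  # sparsity terms
        opt_cmg.zero_grad(); loss.backward(); opt_cmg.step()

def train_pseg(p_seg, t_seg, cmg, loader, opt_pseg, dice_loss_fn):
    """Stage 3: train the lightweight PWML segmenter on [T1w, tissue probabilities, CF map]."""
    for t1w_patch, lesion_mask, _ in loader:
        with torch.no_grad():
            tissue_prob = torch.softmax(t_seg(t1w_patch), dim=1)  # CSF/GM/WM probability maps
            cf_map = cmg(t1w_patch)
        x = torch.cat([t1w_patch, tissue_prob, cf_map], dim=1)
        loss = dice_loss_fn(p_seg(x), lesion_mask)
        opt_pseg.zero_grad(); loss.backward(); opt_pseg.step()
```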

    The evaluations were also done properly, with both quantitative and qualitative results provided. From both, it is clear that the proposed DeepPWML framework outperforms previously proposed methods/frameworks. Ablation studies were also conducted to see which parts of DeepPWML (i.e., T-SEG, CMG, or CLS) are important. It turned out that the combination of all modules performed better than the other possible combinations.

    Also, it is worth mentioning that the proposed framework could be used in other segmentation tasks for small lesions in medical imaging.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weakness of this paper is that some important information was not properly explained.

    1. Information regarding training was not complete. The authors only state that the train/validation/test ratio was 0.7/0.15/0.15, but there is no information on whether the ratio was at subject level or patch level, or how many patches were used/sampled in total.

    2. I personally think that the counterfactual (CF) map was not properly defined (i.e., it was suddenly referred to as CF) or explained. By this I mean that readers (including me) are not told what to expect of the CF map. My guess is that the CF map transforms a positive patch (with PWML) into a negative patch (without PWML) and vice versa. Is this correct? I hope the authors can be more specific on this matter, because counterfactual image generation is not (yet) common. Fig. 1, III (i.e., the CMG update) also does not help much in explaining the CF map. In fact, I was a little confused about why the CF maps in Fig. 1 (and also in Fig. 3) for both positive and negative patches have red dots/regions in them. What are they?

    3. All figures in the paper could be improved by adding some context/information to each figure, especially for the CF map: what are the red regions? Also, in Fig. 2, please add the citation for the “Baseline” method for readability.

    4. I personally think that it is misleading to refer to the brain tissue segmentation produced by T-SEG as “uncertainty” in the paper. While the tissue segmentation is given to P-SEG as a probability map, it technically provides location information to P-SEG, not uncertainty. Even if the authors were to argue that PWML is a different type of brain tissue, T-SEG was trained to segment cerebrospinal fluid (CSF), gray matter (GM), and white matter (WM), not PWML. Because of that, it is not correct to refer to the probability map produced by T-SEG as uncertainty information. Please change the “uncertainty” reference to something else (e.g., location information), especially in Table 1.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No concern

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Please see the weaknesses of the paper.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed framework is simple yet effective in segmenting small lesions in medical image analysis, which is quite common especially for brain lesions. I believe the proposed framework would be important/interesting for the MICCAI community.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors present a simple idea for the segmentation of PWMLs using generative counterfactual inference combined with the auxiliary tasks of brain tissue segmentation and disease classification. Compared with standard methods using general segmentation networks, the proposed method can learn fine-grained positional and morphological representations of PWMLs for accurate localization and segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. A multi-task segmentation method is proposed using generative counterfactual inference combined with the auxiliary tasks of brain tissue segmentation and disease classification.
    2. The proposed method captures fine-grained positional information for PWML localization and segmentation.
    3. Ablation studies were also included to investigate the effectiveness of the various modules.
    4. The proposed method also takes multiple inputs for improved performance on the PWML segmentation module.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. Lack of comparison of performance, with visualized results, between the proposed multi-task segmentation method and other multi-task segmentation methods in the literature, such as https://www.sciencedirect.com/science/article/pii/S2666956021000428 and https://www.frontiersin.org/articles/10.3389/fneur.2020.01008/full.

    2. Lack of description of results with central tendency (e.g., mean) & variation (e.g., error bars).
    3. Perhaps a clearer description/mathematical formulation of how the different losses were constructed/combined in the various modules would be helpful.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducible

    1. Training and Evaluation codes available
    2. Model description included
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    For future work, I would recommend

    1. Comparison of performance among the proposed and state-of-the-art multi-task segmentation methods.
    2. Extension to CT and other modalities.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A new method for fine-grained localization and segmentation is proposed with good results; more details on the state-of-the-art comparison and results with central tendency (e.g., mean) & variation (e.g., error bars) would be preferred.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors have adequately addressed the major concerns on more complete evaluation, baseline comparison, and implementation details. The paper is novel and contributes to the MICCAI community.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper proposes a series of approaches to improve the segmentation of PWMLs. As mentioned by the reviewers, the validation could be further improved, methodological details such as the training parameters could be added, and extra comparison results could be provided.




Author Feedback

We thank the reviewers for their in-depth reviews and appreciate their affirmation of our contributions. The main concerns are addressed below.

*[R1, R3]-Clarity of metrics & more complete evaluation: 1) To measure PWML segmentation performance, we used three metrics (Dice, TPR, & PPV), which were quantified at the subject level (i.e., voxel-wise). Since we formulated the application as a segmentation task rather than object detection, detection accuracies (e.g., in terms of TPR & PPV) were not quantified in this work. We agree that such measures are also helpful for the PWML application and will add them in future work. 2) We will update the quantitative results to present all metrics as “mean (std)”.

*[R1, R2]-Refine visualizations & update results of “Refined Segmentation R-CNN”: 1) The counterfactual/CF maps for both positive & negative patches have activations (marked in red in Figs. 1-3), as our method performs bidirectional transformations to stabilize counterfactual learning. We will refine the figures by including no-lesion examples & more detailed context. 2) Following the suggestions, we have conducted a more complete evaluation of “Refined Segmentation R-CNN” for comparison. Its quantitative performance is significantly lower than ours. We will complete the quantifications in Tab. 1 & add the respective visualizations in Fig. 2.

*[R1]-Compare to nnUNet & use it as backbone: As requested, we have trained an nnUNet to segment PWMLs. Its performance is significantly lower than that of our current backbone (DenseUNet), e.g., Dice of 0.40 (0.37) vs. 0.65 (0.24), probably due to unoptimized nnUNet settings in our specific task of small-lesion segmentation. The primary goal of this work is to verify the effectiveness of counterfactual learning in constructing a simple but effective framework for PWML segmentation. We will test other backbones more thoroughly in the future.

*[R3]-Compare to multi-task networks: We have further added a multi-task brain segmentation network (3D-MASNet: 10.1002/hbm.2617) for comparison. Results show that our method led to better results, e.g., Dice of 0.72 (0.17) vs. 0.70 (0.22).

*[R1, R3]-Potential evaluation on similar datasets/tasks: Following the insightful suggestions, we did find a relatively similar benchmark of white-matter multiple-sclerosis lesion segmentation from the Shifts Challenge 2022. However, direct inference of our trained model on this dataset and on the fetal data mentioned by R1 gave poor results, largely due to strong domain shifts. We will conduct more thorough training & evaluation for these tasks.

*[R1, R2]-Precise description of the tissue segmentation’s role: We agree that the ‘uncertainty’ expression is indeed not precise and that using the softmax output as uncertainty quantification is unreliable. We will replace it with ‘location information’ for clarity.

*[R1, R2]-Clarity of “counterfactual”: Counterfactual mapping, in our context, means learning a residual activation map for the conversion between normal and PWML patches. We will further detail the description and cite related references.

*[R1]-Code availability: We will publicly release our code in both TensorFlow and translated PyTorch versions; it is currently not available due to anonymization.

*[R1]-Threshold of lesion size in patches: Lesion size is not thresholded: if a patch contains at least one lesion voxel, it is treated as a positive patch.
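As a hedged illustration of this patch-labelling rule (a 32^3 sliding window, with a patch marked positive if it contains at least one lesion voxel), a minimal NumPy sketch follows; the function name, stride, and non-overlapping windows are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def label_patches(lesion_mask, patch_size=32, stride=32):
    """Return {patch corner: 0/1} labels for sliding-window patches of a binary lesion mask."""
    labels = {}
    depth, height, width = lesion_mask.shape
    for z in range(0, depth - patch_size + 1, stride):
        for y in range(0, height - patch_size + 1, stride):
            for x in range(0, width - patch_size + 1, stride):
                patch = lesion_mask[z:z + patch_size,
                                    y:y + patch_size,
                                    x:x + patch_size]
                labels[(z, y, x)] = int(patch.any())  # positive if >= 1 lesion voxel
    return labels
```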

*[R1, R2, R3]-Implementation details, e.g., data division, augmentation, & loss combination: 1) The train/val/test sets were split at the subject level, i.e., patches from one subject never appear in more than one set. 2) We did not use any morphology-changing data augmentation during training; thanks for the constructive suggestion, and we will examine this point in the future. 3) In our experiments, tissue segmentation used a voxel-wise cross-entropy loss, classification used a categorical cross-entropy loss, and counterfactual mapping combined a sparsity loss (L1 & L2 norms) with the classification loss. PWML segmentation used the Dice loss.
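To make the loss combination in point 3) concrete, here is a minimal PyTorch-style sketch of two of the components (a soft Dice loss for P-SEG and a sparsity-regularized classification loss for the counterfactual map); the function names and unit weights are assumptions, not the authors' exact formulation, and tissue segmentation/classification simply use standard cross-entropy (e.g., F.cross_entropy).

```python
import torch
import torch.nn.functional as F

def soft_dice_loss(logits, target, eps=1e-6):
    """Soft Dice loss for the PWML segmentation head (binary, voxel-wise)."""
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum()
    return 1.0 - (2.0 * inter + eps) / (prob.sum() + target.sum() + eps)

def cmg_loss(cf_map, cls_logits, flipped_label, l1=1.0, l2=1.0):
    """Counterfactual-mapping loss: classification term that flips the CLS decision,
    plus L1/L2 sparsity on the residual CF map (weights l1/l2 are illustrative)."""
    cls_term = F.cross_entropy(cls_logits, flipped_label)
    return cls_term + l1 * cf_map.abs().mean() + l2 * cf_map.pow(2).mean()
```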




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors provided more details of the method as well as comparisons with other methods. Most concerns from reviewers were addressed in the rebuttal.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal has addressed the reviewers’ concerns regarding the implementation details and further evaluation studies. The main issue is clarity. Overall, this paper has novelty, and the performance is promising. The final version should be revised to address the reviewers’ concerns.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal addressed the reviewers’ concerns. In the camera ready version please include details, discussions and results as provided in the rebuttal


