
Authors

Qinghua Zhang, Zhao Chen

Abstract

Semantic segmentation of whole slide images (WSIs) helps pathologists identify lesions and cancerous nests. However, training fully supervised segmentation networks usually requires plenty of pixel-level annotations, which are time-consuming and labor-intensive to obtain. Coming from tissues of different patients and containing large numbers of pixels, WSIs exhibit various patterns, resulting in intra-class heterogeneity and inter-class homogeneity. Meanwhile, most existing methods for WSIs focus on extracting a certain type of features, neglecting the relations between different features and their joint effect on segmentation. Therefore, we propose a novel weakly supervised network based on tensor graphs (WSNTG) for WSI segmentation. Using only sparse point annotations, it efficiently segments WSIs by superpixel-wise classification and credible node reweighting. To deal with the variability of WSIs, the proposed network represents multiple hand-crafted features and hierarchical features yielded by a pretrained convolutional neural network (CNN). In particular, it learns over semi-labeled tensor graphs constructed on the hierarchical features to exploit nonlinear data structures and associations. It gains robustness via the tensor-graph Laplacian of the hand-crafted features superimposed on the segmentation loss. We evaluated WSNTG on two WSI datasets, DigestPath2019 and SICAPV2. Results show that it outperforms many fully supervised and weakly supervised methods in WSI segmentation with minimal point annotations. The code is published at https://github.com/zqh369/WSNTG.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16434-7_25

SharedIt: https://rdcu.be/cVRrH

Link to the code repository

https://github.com/zqh369/WSNTG

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors address the well-known challenge of annotating medical data, in this case the annotation of large-scale whole slide images (WSIs) for semantic segmentation. To this end, they combine the idea of superpixels with sparse point annotations and graph networks. They propose a new architecture with three paths that follow different approaches (hand-crafted features + graph network, learned features + DL classification, learned features + graph network).
    The overall network architecture is evaluated using two public datasets. The authors conduct a comparison with other fully supervised and semi-supervised methods and include an ablation study on the data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • The authors propose an interesting approach that combines different techniques and ideas, such as learned and hand-crafted features, neural classifiers and graph classifiers. • The paper is generally well-written and easy to understand; I really enjoyed reading it (although there is still room for improvement). • The authors indicate that they will release the source code of the paper. This, and the fact that the data is public, leads to reproducible results.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • While the evaluation is done on two datasets, which is a clear plus, there are some shortcomings that reduce its value. For example, the comparison methods are not current state of the art but were published in 2019 (with one exception); only fractions of the datasets were used during training, and no measures of uncertainty are given. • The overall idea of using superpixels and graph networks is not new. The type of annotation has also been used before. (Of course, the authors do not claim this as a novelty.) • The authors used only parts of the training data due to hardware limitations. In general, the method seems to be rather computationally expensive compared to the benefit. • It is unclear how the meta-parameters were obtained.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Of course, the paper itself is not reproducible, as too many details are not described (likely due to the space limit). But the general ideas are well-reported and easy to grasp. The authors state that they will release the source code for the experiments, and the data used is public. This should guarantee a high degree of reproducibility for the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    (Major + Minor) Points that should be addressed within the review:

    1. The proposed method does have a significant number of hyper-parameters. How were they set? Using the optimal value over all experiments? Please comment (and, if possible, include this information in the paper).
    2. Please comment on the computational cost of the algorithm, especially since you were not able to use all training data. This is particularly interesting for the additional cost of l1, which adds only slightly to the overall performance.
    3. Please provide some measures of uncertainty for the experiments, for example by calculating a confidence interval using bootstrapping.
    4. In Table 1, the results of WSNTG are all bold, even where other results are higher. This should be changed, as it is very misleading.
    5. Too many variables are defined. Many of the variables defined in the paper are used only once, at their definition. This makes it hard to keep track of the important variables. Please reduce the number of variables.
    6. The paper is generally well-written. However, there is still room for improving the language. For example, the active voice is often used (e.g., “WSNTG does …”), which should be reformulated in the passive voice. Also, the article “the” is often used incorrectly.
    7. On page 5 there is a typo: I assume FAAM should be GAAM.

    While this is most likely too much for the revision, I would suggest addressing the following points before submitting this work again (to another conference or a journal):

    1. Using two datasets is already quite good, however, I would suggest including more of the publicly available datasets in this field. This would make the evaluation even stronger.
    2. Include more current state-of-the-art methods, for example YAMU by Samanta et al.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is interesting and well-written. While the main idea is not new, the proposed solution seems to be well-engineered with some additional tricks. The evaluation is generally good, but has some limitations in terms of comparison methods and computational cost.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    Again, I must state that I really enjoyed reading the paper. It is, after all, a solid research paper with interesting ideas. So I do not agree with Reviewer #3: I think the paper is well-written (although the topic is definitely not easy) and worth being presented. However, I still do not see that the paper stands out from the crowd, and the authors’ response to the reviews did not improve this impression. Nevertheless, after reading it again, and given that the authors really tried to address major points such as the missing uncertainty measures, I slightly improved my rating (also in comparison to two additional late reviews). In general: I recommend the acceptance of the paper, but given the high competition at MICCAI, I would fully understand if it were not accepted. In that case I would love to see it published at a different conference or in a journal, which might be even better suited for such a complex topic.



Review #3

  • Please describe the contribution of the paper

    This work proposes to 1) use superpixel classification to split patches into superpixels, 2) extract features per superpixel, and 3) define graphs, based on which multiple losses are defined in addition to the segmentation cross-entropy loss. One of the additional losses is node classification.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method seems to be novel, with a complex setting. The results of the proposed method are better than the baselines.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The explanation of the proposed method is hard to understand. The paper is not easy to read.

    For the tensor graph learning, it is hard to follow the explanation without clear definitions. For instance, “Graph adaptive and aggregation module (GAAM) learning combines and adjusts between different relations encoded in multi-relational graphs.” There is no equation to explain how the different tensors interact.

    Also, for the tensor graph, it is not clear how the edges are defined.

    Regarding the loss, it is defined as the trace of a matrix. As this is not a traditional loss such as cross-entropy, it would be better to explain the meaning of this loss or cite a reference.
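
    For reference, the standard trace-form regularizer in graph learning, which this loss presumably instantiates, reads as follows; this is the generic form, not necessarily the paper's exact variant:

```latex
% Standard graph Laplacian smoothness regularizer (generic form; the
% paper's exact variant may differ). For node features F and the
% Laplacian L = D - W of a graph with edge weights w_ij:
\[
\ell_{\mathrm{reg}}
  = \operatorname{tr}\!\left(F^{\top} L F\right)
  = \frac{1}{2} \sum_{i,j} w_{ij}\, \lVert f_i - f_j \rVert_2^2
\]
% i.e., it penalizes feature differences between strongly connected
% nodes, encouraging predictions to vary smoothly over the graph.
```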

    For the node reweighting, what does \hat{y}_m represent in the loss l_3?

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors state that the code will be released. The data is also public. However, as the method is not easy to understand, it may be hard to reproduce the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The authors should improve the writing to explain the method in more detail, including the reasoning behind some choices.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    2

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The results may be strong; however, the explanation of the methodology is very poor compared to other papers in the stack, and many concepts remain unclear after multiple re-readings.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    4

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    Proposed a weakly supervised network based on tensor graphs for segmentation of whole slide images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Please see below

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Please see below

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Yes

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    This paper proposes a weakly supervised network based on tensor graphs for the segmentation of whole slide images. It efficiently segments WSIs by superpixel-wise classification and credible node reweighting using only sparse point annotations. The proposed network represents multiple hand-crafted features and hierarchical features yielded by a pretrained CNN to deal with the variability of WSIs. Experiments were conducted on two benchmark datasets. The paper is easy to follow and understand, the writing is good, and the idea feels interesting; however, this is not my domain, so I cannot provide insightful domain-specific advice. I noticed there is no Related Work section, which would be very helpful for readers to understand the context and background; some recent works, even from different domains, such as Liu, Xien, et al., “Tensor graph convolutional networks for text classification,” Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, No. 05, 2020, could be discussed (e.g., intra- and inter-graph propagation). The experiments could consider more publicly available benchmark datasets and compare with more state-of-the-art methods, and more analysis is suggested for the ablation study.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Novelty and Experiment

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Not Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #4

  • Please describe the contribution of the paper

    In this paper, the authors propose a weakly supervised framework based on tensor graphs for WSI segmentation. The method uses sparse point annotations as supervisory information, extracts multiple features of the WSI, including hand-crafted features, low-level features, and high-level features, and learns the data associations between different types of WSI features by capturing contextual information in the network through tensor graphs. Besides, the authors propose dynamic credible node reweighting. The experimental results show that the proposed approach achieves comparable results on two challenging datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. This paper extracts multiple features and focuses on both details within WSIs and contextual information across different regions of WSIs. This improves segmentation performance.
    2. This paper uses tensor graphs to capture global information within the same WSI features and correlation information between different WSI features.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. In Section 2.1, the SLIC segmentation algorithm is used for WSI preprocessing in this framework. Compared with other segmentation algorithms, how can the robustness of SLIC with respect to the segmentation accuracy of cancerous regions be demonstrated?
    2. In Section 2.3, the network framework shows multiple tensor graphs. The meaning of each tensor graph and the definition of the nodes and edges in each tensor graph are not very clear.
    3. According to Table 2, the tensor-graph regularization based on the hand-crafted features does not greatly improve the segmentation performance of the network, and the article does not test the robustness of the network after it is added. Can it be proved that introducing this prior knowledge improves robustness?
    4. Table 1 and the description of the comparative experimental results in Section 3.3 are ambiguous. The Dice score of 71.47% described in the first part of Section 3.3 should be the value obtained on the DigestPath2019 dataset, but the corresponding number is not found in Table 1.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This paper is practically reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    1. In Section 2, the paper says “let …, where n = 1, 2, …, N, N is the number of WSIs”. The definition of N is not reflected in the formula.
    2. In Section 2.3, the term FAAM appears for the first time in the paper; does FAAM refer to a new unit or to the previously defined GAAM (i.e., is this a misprint)?
    3. The definition of 0 in Figure 1(a) does not match the description in the paper.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The decision will be made after the rebuttal.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper presents an architecture combining three different approaches: hand-crafted features, deep learning, and graph networks. The experimental results on two datasets demonstrate good performance. However, the reviewers raised consistent concerns, including outdated comparison methods, unclear implementation settings (e.g., hyperparameters), questions about reproducibility, and an explanation of the proposed method that is hard to follow. These issues should be addressed properly.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    10




Author Feedback

We thank the reviewers and the AC. We appreciate that R1, R2 & R4 found our paper well-written/satisfactory. Here are our responses.

Explanation of WSNTG (M, R1, R3, R4): As shown in Fig. 1, our method WSNTG uses superpixel classification to segment WSIs efficiently, tensor-graph Laplacian regularization to utilize hand-crafted features, and semi-labeled tensor-graph learning to exploit deep features and credible instances. It learns highly nonlinear data associations from multiple graphs built on different features and relations, instead of a single graph. Tensor-graph construction is described in S2.3, and the edges are defined by the adjacency matrices on P4 (R3, R4). The hand-crafted feature learning is not very time-consuming, as the features are only 5×1 vectors, which can be inferred from S2.2 (R1). It characterizes local color features. For targets that exhibit greater variability and finer details, its contribution to segmentation may be more pronounced (R1, R4). The Laplacian regularization l1 defined by the trace is widely used in graph learning (R3); reference [19] is cited on P4. For node reweighting, \hat{y}_m first appears in Line 22, S2.1 (R3). We will correct typos, remove redundant symbols, and adjust text styles in the tables to facilitate reading. A minimal sketch of the construction is given below.
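
To make this more concrete, below is a minimal sketch of one plausible reading: per-feature-type adjacency matrices from a Gaussian kernel on superpixel features, stacked into a tensor, with l1 summing the Laplacian trace terms over the slices. The kernel choice, the stacking, and all parameter values are illustrative assumptions, not the exact implementation in the paper.

```python
import numpy as np

def gaussian_adjacency(feats, sigma=1.0):
    # Dense adjacency from pairwise feature distances (assumed Gaussian kernel).
    # feats: (n_superpixels, d) array of per-superpixel features.
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def laplacian_trace(W, F):
    # Graph Laplacian smoothness term tr(F^T L F) with L = D - W.
    L = np.diag(W.sum(axis=1)) - W
    return np.trace(F.T @ L @ F)

# One adjacency slice per feature type (e.g., the 5x1 hand-crafted vectors);
# the stack of slices plays the role of the tensor graph in this sketch.
feature_sets = [np.random.rand(50, 5) for _ in range(3)]  # toy features
tensor_graph = np.stack([gaussian_adjacency(f) for f in feature_sets])
F = np.random.rand(50, 4)  # toy node embeddings / class scores
l1 = sum(laplacian_trace(W, F) for W in tensor_graph)
```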

Robustness (R4): SLIC (10.1109/TPAMI.2012.120) clusters similar pixels by multi-domain distances, which are more robust than single-domain distances. Its adherence to the boundaries of various objects, including cancerous regions, also reflects robustness. As for tensor-graph regularization, we could add noise to the WSIs or use different hand-crafted features to change the graphs, test its robustness, and evaluate the results by classification margin (10.1016/j.aiopen.2021.05.002).
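
For reference, superpixel generation of this kind can be reproduced with scikit-image's SLIC; the sketch below uses illustrative parameter values and a placeholder file name, not the paper's settings.

```python
import numpy as np
from skimage import io
from skimage.segmentation import slic, mark_boundaries

# "patch.png" is a placeholder for an RGB WSI patch; n_segments and
# compactness are illustrative values, not the paper's settings.
img = io.imread("patch.png")
labels = slic(img, n_segments=400, compactness=10.0, start_label=0)

# Each superpixel becomes one graph node; as a toy per-node feature,
# take the mean color of the pixels in each superpixel.
n_nodes = labels.max() + 1
mean_color = np.array([img[labels == k].mean(axis=0) for k in range(n_nodes)])

overlay = mark_boundaries(img, labels)  # visual check of boundary adherence
```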

Efficacy vs. efficiency (R1): Thanks to tensor-graph learning and credible node reweighting, WSNTG only needs sparse labels on a few patches of WSIs to be effective. It is sped up by MLPs that reduce feature dimensions, as shown in Fig. 1. It takes about 25 and 40 minutes per epoch for DigestPath2019 and SICAPV2, respectively. Further time could be saved by upgrading the hardware. Besides, training on selected patches is not problematic, as it is adopted by many other works (e.g., 10.1007/978-3-030-87237-3_34).

Reproducibility (M, R3): As mentioned in the Abstract, our code will be released. WSNTG has a clear structure with three branches trained end to end. It is reproducible, as agreed by R1 and R4.

Implementation details (M, R1): Related information is given in S3.1 and S3.2. The hyperparameters are defined on P7. Their values, set empirically, fit the benchmark datasets, as shown by the experimental results. Due to limited space, we will provide more details with our code.

More datasets (R1, R2): We are glad that M, R1 & R4 approved of the evaluation on DigestPath2019 and SICAPV2. We applied WSNTG to another benchmark, CAMELYON16, using its original data splits (camelyon17.grand-challenge.org/Data), and obtained OA: 98.91%, Dice: 86.32% and Jaccard: 79.23% using 0.005% of the pixels as sparsely labeled samples. The best results of the other weakly supervised models are OA: 98.94% by WESUP, Dice: 83.81% by SizeLoss, and Jaccard: 75.63% by SizeLoss. This shows that WSNTG generalizes well to new datasets.

Comparison (M, R1, R2): WSNTG outperformed 4 fully supervised models and 3 weakly supervised ones, including WESUP (2021). We will compare it with YAMU (2022), as suggested by R1.

Measures of uncertainty (R1): It is a good idea to evaluate the uncertainty of the indices. We performed bootstrapping (1000 replicates, significance testing at the 95% level) on the test data with the same settings for WSNTG as before. The resulting 95% confidence intervals (%) are:

           DigestPath2019    SICAPV2           CAMELYON16
OA         (95.80, 97.07)    (97.03, 98.82)    (97.95, 99.87)
Dice       (80.05, 85.86)    (63.96, 79.37)    (84.25, 91.91)
Jaccard    (70.41, 78.62)    (51.64, 69.55)    (73.85, 87.31)
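
A minimal sketch of a percentile bootstrap of this kind (1000 replicates, 95% level); resampling per-image scores is an assumption, and the numbers are toy values:

```python
import numpy as np

def bootstrap_ci(scores, n_boot=1000, alpha=0.05, seed=0):
    # Percentile bootstrap CI over per-image scores (e.g., Dice).
    rng = np.random.default_rng(seed)
    n = len(scores)
    means = [rng.choice(scores, size=n, replace=True).mean()
             for _ in range(n_boot)]
    return tuple(np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)]))

dice = np.array([0.82, 0.79, 0.88, 0.75, 0.84, 0.81])  # toy per-image scores
print(bootstrap_ci(dice))  # -> (lower, upper) 95% confidence interval
```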

Related Works (R2): They are reviewed in S1. The recommended papers will be cited.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I agree with the majority of reviewers that this paper has merit in its methodological contribution to the pathology application, and the experimental results on several datasets demonstrate better performance (although more SOTA method comparisons should be provided). Although the unclear descriptions and symbols make it hard to follow the idea exactly (some issues remain even after the rebuttal), given the above-mentioned merits, I am inclined to recommend acceptance. Furthermore, I strongly suggest the authors significantly improve the readability in the final version.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    11



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Interesting idea with good empirical results.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper is interesting, with a nice methodological contribution to digital pathology. I am leaning toward acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3


