Authors
Shiyi Du, Qicheng Lao, Qingbo Kang, Yiyue Li, Zekun Jiang, Yanfeng Zhao, Kang Li
Abstract
In breast radiology, pathological Complete Response (pCR) predicts the treatment response after neoadjuvant chemotherapy, and is therefore a vital indicator for both personalized treatment and prognosis. Current prevailing approaches for pCR prediction either require complex feature engineering or employ sophisticated topological computation, which are inefficient and yield only limited performance gains. In this paper, we present a simple yet effective technique that implements persistent homology to extract multi-dimensional topological representations from 3D data, making the computation much faster. To incorporate the extracted topological information, we then propose a novel approach that distills the extracted topological knowledge into deep neural networks with response-based knowledge distillation. Our experimental results quantitatively show that the proposed approach achieves superior performance, increasing pCR prediction accuracy from the previous 85.1% to 90.5% and reducing the topological computation time by about 66% on a public dataset of breast DCE-MRI images.
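A minimal sketch of one ingredient of the pipeline described in the abstract, assuming a sublevel-set filtration on a cubical complex in which voxels act as vertices joined to their 6 face neighbours. It computes only the 0-dimensional Betti curve (number of connected components at each threshold); the 1- and 2-dimensional curves need a full cubical-homology library. The function names are hypothetical and are not taken from the authors' repository.

```python
from collections import deque

import numpy as np

# 6-connectivity offsets (assumption): each voxel is adjacent to its face neighbours only.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def count_components(mask: np.ndarray) -> int:
    """Count 6-connected components of True voxels with a breadth-first search."""
    seen = np.zeros(mask.shape, dtype=bool)
    components = 0
    for start in zip(*np.nonzero(mask)):
        if seen[start]:
            continue
        components += 1  # unvisited voxel: a new component starts here
        seen[start] = True
        queue = deque([start])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in NEIGHBOURS:
                n = (z + dz, y + dy, x + dx)
                inside = all(0 <= n[i] < mask.shape[i] for i in range(3))
                if inside and mask[n] and not seen[n]:
                    seen[n] = True
                    queue.append(n)
    return components


def betti0_curve(volume: np.ndarray, thresholds) -> list:
    """Betti-0 curve of the sublevel-set filtration {voxel <= t} of a 3D volume."""
    return [count_components(volume <= t) for t in thresholds]
```

For example, `betti0_curve(np.array([0, 5, 0, 5, 0], dtype=float).reshape(1, 1, 5), [-1, 0, 5])` returns `[0, 3, 1]`: nothing is present below 0, three isolated low voxels appear at 0, and they merge into one component at 5.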
Link to paper
DOI: https://link.springer.com/chapter/10.1007/978-3-031-16434-7_6
SharedIt: https://rdcu.be/cVRq3
Link to the code repository
https://github.com/zoedsy/DK_Topology_PCR
Link to the dataset(s)
https://wiki.cancerimagingarchive.net/display/Public/ISPY1#20643859f2ec9d7881eb4a408ae1f347ea462beb
Reviews
Review #1
- Please describe the contribution of the paper
The authors use topological information and a student/teacher network to improve pathological Complete Response (pCR) prediction for breast cancer based on MRI.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The fusion of topology and the student/teacher network makes for compelling reading and the improvement in performance appears to be considerable.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
There is no statistical analysis of the results in Table 1.
The paper is difficult to follow and lacks clarity on several points.
- Please rate the clarity and organization of this paper
Poor
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
Low. The data used are public, but recreating this network and computing the topological priors is an advanced task for anyone.
If the code is made available, this will improve. But as the code would only cover the network, there would still be some serious barriers for other users.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
“distill the topological priors (i.e., Betti curve)” Are these really priors?
I do not understand why the authors need to write out Eqn. 2. It states L_{RKD} = L_{R}; couldn't that have been achieved more easily with a statement such as “L_{RKD} is the cross-entropy loss between the softmax outputs of the teacher and student models”?
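The one-sentence formulation suggested here, cross-entropy between the softmax outputs of a teacher and a student, fits in a few lines. This is a sketch of the standard soft-target distillation loss with an optional temperature `T`; the temperature and the function names are assumptions and are not the paper's Eqn. 2 (Hinton-style distillation additionally scales the loss by T², which is omitted here).

```python
import numpy as np


def log_softmax(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Row-wise log-softmax of logits / T, shifted by the row max for numerical stability."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def response_kd_loss(teacher_logits, student_logits, T: float = 1.0) -> float:
    """Mean cross-entropy H(p_teacher, p_student) over a batch of shape (batch, classes)."""
    p_t = np.exp(log_softmax(np.asarray(teacher_logits, dtype=float), T))
    log_p_s = log_softmax(np.asarray(student_logits, dtype=float), T)
    return float(-(p_t * log_p_s).sum(axis=-1).mean())
```

Usage: `response_kd_loss(teacher_logits, student_logits, T=2.0)`. The loss is smallest when the student's distribution matches the teacher's, where it equals the teacher's entropy.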
What is meant by “we use the middle half slices”? I will assume it means slices 15-45 of the 60 available slices; is this correct? The authors should clarify this in the text.
Figure 3 currently confuses me. Subfigure (a) shows, I think, the 3D 0-cycle, 1-cycle, and 2-cycle Betti curves. If this is the case, these images should be captioned or annotated as (a1), (a2), and (a3), with the caption explaining them.
Subfigures (b)-(d) then show different filtration levels for the Betti curves, but I do not know whether this is for just two subjects, one pCR and one non-pCR, or for multiple subjects.
It bothers me that the scales of the x & y axes are completely different. I think the x-scale should be the same in all three, even if it only goes up to 50 filtration steps. I think the y-axis might be better as a percentage rather than a total count. Just an opinion.
What is absolutely driving me crazy, and I can only complain about this because the authors provide the total counts, is the totals for the 0-cycles and 1-cycles. If I have N 0-cycles, I can have at most N(N-1)/2 1-cycles, because math. From Fig. 3(b) at filtration step 15, there appear to be fewer than 100 0-cycles, maybe fewer than 50. From Fig. 3(c) there are north of 20,000 1-cycles. (This is the case for both pCR and non-pCR.) I cannot square these two numbers.
Can the authors explain the discrepancy? Or am I on the wrong wavelength?
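The bound raised in this comment, which treats each 0-cycle as a vertex of a simple graph as the comment does, can be evaluated at the counts read off Fig. 3(b):

```latex
\binom{N}{2} = \frac{N(N-1)}{2}, \qquad
N = 100 \;\Rightarrow\; \frac{100 \cdot 99}{2} = 4950, \qquad
N = 50 \;\Rightarrow\; \frac{50 \cdot 49}{2} = 1225
```

Both values are far below the more than 20,000 1-cycles shown in Fig. 3(c).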
Grammar
“which has been shown high” -> “which has shown high” OR “which has been shown to have high”
“implemented by the PyTorch” -> “implemented within the PyTorch” OR “implemented using the PyTorch”
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
6
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
It is very interesting work.
- Number of papers in your stack
5
- What is the ranking of this paper in your review stack?
1
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Review #2
- Please describe the contribution of the paper
This paper proposes a deep learning method with topological priors for pCR prediction. The authors use DenseNet as the backbone prediction network and extract Betti curves as topological priors. They incorporate the extracted topological features into a linear layer and distill them into DenseNet. Compared with the previous method (TopoTxR), the key differences are: 1) distillation makes it possible to avoid computing topological features at inference time; 2) Betti curves are used instead of full persistent homology. A Betti curve is a weaker (less expressive) feature than persistent homology, but it seems sufficient for this task.
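The point that a Betti curve is less expressive than persistent homology can be made concrete: the curve at threshold t counts the persistence intervals [birth, death) alive at t, so different persistence diagrams can produce the same curve. A minimal sketch with made-up intervals, assuming half-open intervals:

```python
import math


def betti_curve(intervals, thresholds):
    """Count the intervals [birth, death) that contain each threshold t."""
    return [sum(1 for b, d in intervals if b <= t < d) for t in thresholds]


thresholds = [0, 1, 2, 3]
diagram_a = [(0, 2), (1, 3)]
diagram_b = [(0, 3), (1, 2)]

# Two different persistence diagrams yield the same Betti curve.
print(betti_curve(diagram_a, thresholds))  # [1, 2, 1, 0]
print(betti_curve(diagram_b, thresholds))  # [1, 2, 1, 0]
# An essential class (infinite death) is counted at every threshold after its birth.
print(betti_curve([(0, math.inf)], thresholds))  # [1, 1, 1, 1]
```

The curve keeps how many features are alive at each level but discards which birth is paired with which death.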
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Using topological features for breast cancer imaging is novel and can have a big impact. I really think this is a very good and underexplored research direction.
- The key advantage of the proposed method is the possibility of using topological information without topological computation at the inference stage. TopoTxR can be very expensive, as it computes not only persistence diagrams but also representative cycles. I think this is a nice motivation that should be stressed more and supported empirically.
- The empirical results reasonably demonstrate the performance gain, but I have some issues with the baselines, explained below.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The motivation for using Betti curves as topological features is a bit weak. As mentioned in the paper, they are more efficient to compute. However, in the distillation method the topological features are extracted only once and are used only in training, so it does not seem necessary to use the faster yet less expressive feature (Betti curves) instead of persistent homology. To strengthen the motivation, empirical evidence should be provided (e.g., the computational speed of Betti curves versus persistent homology). Ideally, the comparison should be made at the same resolution (currently the preprocessing reduces the input image resolution).
- The empirical comparison needs improvement. The authors cite the baseline results of TopoTxR from [Wang et al. 2021]. However, the original paper uses a much weaker backbone than DenseNet. Without using the same backbone, it is not clear whether the performance boost is due to the proposed contributions or to the more advanced backbone.
- The authors use the baselines DenseNet-CONCATE, DenseNet-MSE and DenseNet-KD to show that distillation is necessary. The main issue is that these baselines are not well designed. The authors use the logit output of a DenseNet to concatenate with, map to, or compare distributions with the topological features. But the logits (only 2-dimensional, if I understood correctly) have already lost too much information. These baselines should use the last-layer representation (e.g., the embedding before the last FC/softmax layer), not the logits.
- In general, I think the real benefit of the proposed method is efficiency, not performance gain. Even if the baselines, including TopoTxR with a DenseNet backbone, DenseNet-CONCATE, and DenseNet-KD, are slightly better than the distillation method, that is acceptable. The key is the time saved at inference.
- The 42 pCR and 116 non-pCR cases sum to 158, not the reported total of 162.
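The baseline design suggested in the third point, fusing topological features with the penultimate embedding rather than with the 2-dimensional logits, is sketched below with NumPy. The embedding width (1024), the Betti-curve length (3 × 50), the random stand-in features, and the randomly initialised linear head are hypothetical placeholders, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)


def concat_head(embedding: np.ndarray, betti: np.ndarray,
                weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Concatenate backbone features with Betti-curve features, then apply a linear classifier."""
    fused = np.concatenate([embedding, betti], axis=1)  # (batch, D + K)
    return fused @ weight + bias                        # (batch, num_classes)


batch, emb_dim, betti_dim, num_classes = 4, 1024, 3 * 50, 2
embedding = rng.standard_normal((batch, emb_dim))  # stand-in for penultimate DenseNet features
betti = rng.standard_normal((batch, betti_dim))    # stand-in for normalised Betti curves
weight = rng.standard_normal((emb_dim + betti_dim, num_classes)) * 0.01
bias = np.zeros(num_classes)

logits = concat_head(embedding, betti, weight, bias)
print(logits.shape)  # (4, 2)
```

The design choice is the fusion point: the classifier here sees 1024 + 150 inputs, versus 2 + 150 if the logits were used.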
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The reproducibility seems OK. The method is clearly explained. Ideally the code should be released upon acceptance along with hyperparameters.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
See the weakness section.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
5
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Overall, I like this paper. I think it is well motivated, but the authors are somewhat distracted from the main benefit (efficiency), and the empirical evaluation is currently not sufficiently convincing. I am happy to increase my score if my concerns are addressed.
- Number of papers in your stack
5
- What is the ranking of this paper in your review stack?
2
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Review #3
- Please describe the contribution of the paper
In this paper, the authors study the response to neoadjuvant chemotherapy in patients with breast cancer, measured as pathological Complete Response (pCR). Their main contributions are: 1) extracting persistent homology-based features from breast DCE-MRI images using Betti curves, which is allegedly less time-consuming; 2) using a response-based knowledge distillation approach, designing a teacher-student network that fuses features from a DenseNet and from a topology-based feature extraction block; 3) outperforming the state of the art by a 5% margin, thanks to the topological feature fusion.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1 - The main strength of the paper is the incorporation of topological features into a CNN model to outperform the state-of-the-art results in the literature. This is the first time a medical imaging study has incorporated both CNN features and topological features.
2 - The authors exhaustively report results against the state of the art as a benchmark, and they compare different knowledge distillation strategies as an ablation study.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1 – While the approach in this paper is novel, there are some ambiguities. The authors should clearly justify the claim that Betti-curve computation is “quicker or less time-consuming” than previous persistent homology-based approaches.
2 – The cubical complex filtration example and its definition are not very clear to the reader. While other types of filtration, such as simplicial complex filtrations on graphs, are intuitive in terms of the appearance and disappearance of homology groups, their cubical complex counterpart is not. Also, the term “structure” for what appears and disappears is not a good choice for homology groups.
3 – There are some state-of-the-art approaches to deep learning-based persistent homology presented at top machine learning venues, such as “PLLay: efficient topological layer based on persistent landscapes”, “PersLay: A simple and versatile neural network layer for persistence diagrams” and “Deep learning with topological signatures”. All these approaches circumvent the non-Hilbert-space nature of persistence diagrams by appropriate kernelization, whereas in this study it is not clear how Betti curves are used as plain features to be concatenated with CNN features.
4 – Although the authors claim that previous approaches require feature engineering, the normalization and adaptation of two very different sets of features in this study also amount to feature engineering (Betti curves clearly need some engineering before they can be used as features). In terms of pre-processing, however, the authors correctly keep the effort to a minimum.
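To illustrate point 3, one common way to map a persistence diagram into a vector space is the persistence landscape, which PLLay builds on. The sketch below evaluates the first k landscape functions on a grid; the grid, k, and the example intervals are illustrative assumptions, and intervals with infinite death would need to be truncated first.

```python
def tent(b: float, d: float, t: float) -> float:
    """Tent function of the interval (b, d): rises from b, peaks at the midpoint, falls to d."""
    return max(0.0, min(t - b, d - t))


def landscape(intervals, grid, k: int = 2):
    """Return k lists: layer j holds the (j+1)-th largest tent value at each grid point."""
    layers = [[0.0] * len(grid) for _ in range(k)]
    for i, t in enumerate(grid):
        values = sorted((tent(b, d, t) for b, d in intervals), reverse=True)
        for j in range(min(k, len(values))):
            layers[j][i] = values[j]
    return layers
```

For example, `landscape([(0, 4), (1, 3)], [0, 1, 2, 3, 4], k=2)` gives `[[0, 1, 2, 1, 0], [0, 0, 1, 0, 0]]`. Stacking the layers yields a fixed-length vector; the continuous landscapes live in L², a Hilbert space, which is the property contrasted here with raw diagrams and with Betti curves used as plain features.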
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
Good.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
As mentioned in section 5, the authors should clearly state the time complexity of Betti-curve computation compared with previous approaches. They should also explain why they did not include the deep learning-based PH approaches mentioned above in their work; those methods are more deep learning-friendly, whereas Betti curves are not inherently ready to use as features for classification or regression. It is strongly recommended to better explain and illustrate the cubical complex filtration and to show the appearance and disappearance of homology groups (when a node, square, or cube appears and dies).
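The appearance and disappearance requested here can be shown on the simplest cubical complex, a 1D signal whose samples are vertices and whose neighbouring samples are joined by edges. The sketch assumes a sublevel-set filtration and computes 0-dimensional persistence with union-find and the elder rule (when two components merge, the younger one dies). It is a toy example, not the paper's 3D pipeline.

```python
import math


def persistence_0d(signal):
    """0-dimensional persistence pairs (birth, death) of a 1D sublevel-set filtration."""
    order = sorted(range(len(signal)), key=lambda i: signal[i])
    parent = {}  # union-find forest over vertices already in the filtration

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for i in order:  # vertices enter in increasing filtration value
        parent[i] = i
        for j in (i - 1, i + 1):  # edges to neighbours that are already present
            if j in parent:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Roots are component minima; the root with the later birth dies now.
                young, old = (ri, rj) if signal[ri] > signal[rj] else (rj, ri)
                if signal[young] < signal[i]:  # skip zero-length pairs
                    pairs.append((signal[young], signal[i]))
                parent[young] = old
    roots = {find(i) for i in parent}
    pairs.extend((signal[r], math.inf) for r in roots)  # surviving components never die
    return sorted(pairs)
```

For `[0, 2, 1, 3]`, the local minimum at value 1 is born at 1 and dies at 2 when it merges with the older component born at 0, which persists forever, giving `[(0, inf), (1, 2)]`. In 2D and 3D the same bookkeeping applies to squares and cubes, with loops and voids adding higher-dimensional pairs.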
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
6
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Although there are some ambiguities, raised in section 5, the authors' approach to the vital task of predicting treatment response (pCR) is novel in that it incorporates both topological and CNN features. Topological features can be highly informative as a complement to other ubiquitous image-based features. The superior performance of the proposed model, backed by a clear benchmark comparison, is another reason for the reviewer to recommend this paper.
- Number of papers in your stack
2
- What is the ranking of this paper in your review stack?
5
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Primary Meta-Review
- Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
The paper proposes using topological priors for pCR prediction. The reviewers are consistent about the strengths of the paper, such as its novelty and performance gain.
- What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).
2
Author Feedback
First and foremost, many thanks to all the reviewers and the meta-reviewer for your constructive comments on this paper, which we believe will improve this work. Next, we apologize for not clarifying some points, such as the statistical analysis of the results in Table 1, which will be explained in more detail in the final version. We will also address the linguistic issues our reviewers have mentioned and correct the errors caused by our carelessness. Regarding Reviewer #1's comment on the x-scales of some diagrams: the x-scales of the diagrams are the same, but we truncated them to display the important part of each diagram. Regarding Reviewer #2's comment on the weaker backbone of the baseline: it is true that the original paper uses a much weaker backbone than DenseNet. We therefore compare our proposed network with a plain DenseNet backbone to show that part of the performance boost is due to the proposed contributions. Regarding Reviewer #3's comments on ambiguities in this paper, we will more clearly justify the computational complexity of Betti curves. Finally, we are very grateful for all your suggestions and will try to improve our final version according to your comments.