Authors

Peter Hirsch, Caroline Malin-Mayor, Anthony Santella, Stephan Preibisch, Dagmar Kainmueller, Jan Funke

Abstract

Tracking all nuclei of an embryo in noisy and dense fluorescence microscopy data is a challenging task. We build upon a recent method for nuclei tracking that combines weakly-supervised learning from a small set of nuclei center point annotations with an integer linear program (ILP) for optimal cell lineage extraction. Our work specifically addresses the following challenging properties of C. elegans embryo recordings: (1) Many cell divisions as compared to benchmark recordings of other organisms, and (2) the presence of polar bodies that are easily mistaken as cell nuclei. To cope with (1), we devise and incorporate a learnt cell division detector. To cope with (2), we employ a learnt polar body detector. We further propose automated ILP weights tuning via a structured SVM, alleviating the need for tedious manual set-up of a respective grid search. Our method outperforms the previous leader of the cell tracking challenge on the Fluo-N3DH-CE embryo dataset. We report a further extensive quantitative evaluation on two more C. elegans datasets. We will make these datasets public to serve as an extended benchmark for future method development. Our results suggest considerable improvements yielded by our method, especially in terms of the correctness of division event detection and the number and length of fully correct track segments. Code: https://github.com/funkelab/linajea

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_3

SharedIt: https://rdcu.be/cVRvn

Link to the code repository

https://github.com/funkelab/linajea

Link to the dataset(s)

https://doi.org/10.5281/zenodo.6460303

https://doi.org/10.5281/zenodo.6460375

http://celltrackingchallenge.net

Reviews

Review #1

Please describe the contribution of the paper

In this paper, the authors propose an incremental extension of Linajea [14], a state-of-the-art tracker of cells in embryonic image data, to improve its tracking performance by employing a ResNet18-based cell state classifier [21] and to facilitate its hyper-parameter fine-tuning via structured SVM [8]. The proposed method has been quantitatively evaluated using three different sets of C. elegans embryo recordings and achieved insignificant improvements in the DET and TRA scores (third and fourth decimal digits as reported in Tables 1 and 2) when compared to Linajea.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Despite its incremental methodological novelty over Linajea, the idea of hyper-parameter fine-tuning to alleviate the need for identifying a suboptimal hyper-parameter configuration using grid-search seems to be promising. However, it is unclear whether such a strategy is viable for other types of embryonic image data, or time-lapse cell image data in general. The authors also promised to make two fully annotated datasets publicly available along with the paper, with the aim of accelerating further algorithmic developments in this area.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The main shortcomings of this work are (i) incremental methodological novelty of the proposed method that only combines previous concepts ([14], [21] and [8]) into a single framework; (ii) insignificant superiority of the tracking performance in terms of DET and TRA compared to Linajea [14] that is named as JAN-US in the Cell Tracking Challenge; and (iii) very limited performance comparison with the state-of-the-art methods. Based on the scores listed in Tables 1 and 2 and available on the Cell Tracking Challenge website for the JAN-US (i.e., original Linajea) method (http://celltrackingchallenge.net/participants/JAN-US/), it does not seem the integrated cell state classifier would improve the tracking performance by any statistically or practically significant margin. Furthermore, it is unclear why the top-performing methods for the Fluo-N3DH-CE datasets (i.e., KTH-SE (1) and KIT-Sch-GE (1) publicly available on the Cell Tracking Challenge website) were not taken as the baselines for the mskcc-confocal and nih-ls recordings.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

It is a pity that the mskcc-confocal and nih-ls datasets and the results achieved over them were not made privately available for the peer-review purposes. Furthermore, this reviewer did not find any relevant information about the computational aspects of the proposed method (i.e., hardware requirements, execution time, memory footprint, etc.) in the paper.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
Besides the comments given in the box #5, there are other points that deserve additional attention:
- It seems that the proposed method is not listed on the Cell Tracking Challenge website. It is therefore questionable whether it really ranked first at the time of paper submission, especially when its predecessor (i.e., Linajea or JAN-US in the Cell Tracking Challenge terminology) achieved practically the same scores as those reported in Section 3 (2nd paragraph).
- It is unclear whether the cell state classifier can deal with temporal gaps in cell detections, and thus correct imperfections of the cell detector employed.
- The description of the evaluated performance measures is unclear, which makes the numbers listed in Tables 1 and 2 difficult to interpret. For example, the last reported value of FPdiv in Table 2 (i.e., 0.046) does not give any integer number after being multiplied by 18 (i.e., by the number of evaluated configurations). The formal definitions of individual measures need to be provided. Furthermore, please clarify where the 16% reduction of manual curation comes from (Section 3, 3rd paragraph) when comparing Elephant and the proposed method.
- There is no information about the annotation protocol followed for the nih-ls dataset. Furthermore, it is unclear whether multiple manual curations of the Starrynite results for the mskcc-confocal dataset were prepared and subsequently fused to reduce the subjectivity and error-proneness of the final reference annotations.
- The numbers listed in Tables 1 and 2 are mostly similar across the evaluated approaches. Are the differences statistically or practically significant?
- As the proposed method shall be used as a baseline for the nih-ls dataset, it is unfortunate that the cell state classifier did not account for apoptotic cells. How much would the performance of the proposed method improve when training the ResNet18 backbone for that class too?
Other remarks:
- Table 2: There is a typo (cls -> csc) in the table caption.
- To this reviewer’s understanding of the Cell Tracking Challenge format, two test sequences per dataset can be downloaded from the Cell Tracking Challenge website (i.e., the test data is public).
- The FP decrease from 3.7 to 2.5 in Table 2 does not seem to be dramatic. Please tone this claim down.
- Check the spelling of Linajea across the paper.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

3
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The work presented in this paper suffers from (i) incremental methodological novelty; (ii) insignificant superiority of the tracking performance compared to the baseline (Linajea); and (iii) very limited performance comparison with the state-of-the-art methods.
Number of papers in your stack

4
What is the ranking of this paper in your review stack?

2
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

5
[Post rebuttal] Please justify your decision

In their rebuttal, the authors clearly explained statistical and practical significance of the tracking results achieved by their proposed approach when compared to Linajea, which partly justifies rather incremental methodological novelty of the proposed approach too. However, it is unclear why the proposed method cannot be compared with IGFL-FR and KIT-Sch-GE when the pretrained models for the Fluo-N3DH-CE dataset are publicly available, and thus could be applied on the mskcc-confocal and nih-ls recordings despite the lack of segmentation masks for these two datasets. The same applies for KTH-SE that does not require any training data and seems to achieve competitive tracking results for the embryonic datasets included in the Cell Tracking Challenge.

Review #2

Please describe the contribution of the paper

This manuscript presents a method for improved lineage tracing from whole-embryo C. elegans data. The authors build two additional modules, a cell-state detector and a weight estimator, on top of the earlier-developed algorithm. This results in improved performance, in particular in the detection of cell divisions. The proposed method is leading the scoreboard of the dedicated challenge in this particular data category.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

• Developed methods improve the performance, both in terms of higher validation scores and ease of use.
• Top-ranked performance on the challenge data.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

• Improvement in terms of two established validation measures, related to detection (DET) and tracking (TRA), respectively, looks marginal.
• The modified version of the algorithm does not result in consistent performance on all the validation measures: while it improves the scores in some categories, the results with respect to other validation measures actually deteriorate.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The methodology is valid and clearly presented. The most convincing results were achieved on the challenge data, with the submission being evaluated by the organizers.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
1. The main result of this work for me is that the proposed method achieved top-ranking performance on the Cell Tracking Challenge data. Please, by the way, specify this explicitly in the corresponding paragraph of the Results section. StarryNite, while being often used for benchmarking, is known to be rather sensitive to the input data, meaning that its performance on data that differ from their own data tends to be significantly worse.
2. It is interesting to notice that the modified method and the original one seem to prioritize different validation measures. The same holds for ablation study presented in Table 2. This is something that needs to be discussed by the authors.
3. Judging from the two established measures of lineage tracing (DET and TRA), the improvement provided by the modified method is very marginal. I do realize that it is impossible to assess its statistical significance due to low number of data samples. But this observation does somewhat undermine the value of this work. An alternative way to show the worth of the improved performance would be to demonstrate its positive impact on the downstream analysis; which is missing in this manuscript.
4. “In this regard, our improvement in TRA over Elephant should mean that our method entails a 16% reduction in manual curation effort as compared to Elephant.” I find it difficult to understand where the 16% are coming from; taking into account that the corresponding tracking scores are 0.979 and 0.975.
5. Please explain the meaning behind the underscored entry in Table 2.
6. “By adapting the cost function Δ one should be able to modulate this depending on respective application-specific needs.” I find this sentence somewhat speculative. Otherwise the authors have to explain how this can be achieved.
7. Please report the value of the λ parameter for completeness.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This work presents methodology for making the algorithm for lineage tracing in C. elegans data more accurate and easy to use. The methods are valid and clearly presented. However, the added value of the developed methodology is somewhat limited because 1) it was not possible to assess statistical significance; 2) it is inconsistent with respect to different validation measures; and 3) the improvement on the most commonly used validation measures (DET and TRA) is marginal. On the other hand, this was enough to climb to the leading position of the dedicated challenge.
Number of papers in your stack

4
What is the ranking of this paper in your review stack?

3
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

6
[Post rebuttal] Please justify your decision

I am satisfied with how the authors approached the revision and addressed my comments. My recommendation remains unchanged

Review #3

Please describe the contribution of the paper

The authors present an extension to the previously published tool linajea to perform lineage reconstruction in whole embryo recordings of C. elegans. The extensions include cell state/polar body classification that are incorporated to the learning-based approach. Moreover, an automatic hyperparameter identification module using a structured SVM approach is used. The extensions are validated on three different data sets and show (slightly) superior results to previous methods.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The achieved tracking quality of the proposed method is a strength of the paper (although the scores of the prior methods linajea are already impressive as well).
- Although not invented by the authors, an interesting way of automatically identifying hyperparameters via a structured SVM is applied to eradicate the need for manual hyperparameter tuning / grid searches.
- The additions to the ILP formulation are reasonable extensions and it’s generally a great approach to incorporate biological constraints directly to the optimization problem.
- Last but not least the paper is nicely written, comprehensible and I didn’t even spot a typo.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- I’m wondering why the additional metrics were not computed for the cell tracking challenge data set and why it is mentioned separately in the text, rather than combining it to Table 1.
- There was a recent submission to the cell tracking challenge that outperforms the proposed approach. While this is to be expected for an ongoing challenge, it would still be good to correct the statements in the results section accordingly.
- The improvements over the previous methods seem to be quite modest. It would be good if the authors could describe the implications of the improved scores. Can this be somehow quantified? For instance: How many manual corrections are saved by these minor improvements? How many more complete lineages are present?
- I’m aware that it would unveil the identity of the authors, but for the final manuscript it would be great to include a link to the annotated data set as the authors propose in the abstract. This would indeed be a valuable addition for the community.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Seems to be reproducible to me. However, as mentioned above, it would be good for the final publication to include links to data and repositories to be used for benchmarking and reproduction by the community.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

See list of weaknesses.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Overall a nicely written paper proposing reasonable extensions to a previously published method. Performance is assessed on three data sets and demonstrates the validity of the performed modifications.
Number of papers in your stack

4
What is the ranking of this paper in your review stack?

1
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

Not Answered
[Post rebuttal] Please justify your decision

Not Answered

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This work presents an extension of previous work on cell tracking and applies it to a publicly available cell tracking challenge dataset. It seems to be top ranked on the leaderboard at the time of submission.

Despite good average score, reviewers have identified some highly important weaknesses of the work. (1) most importantly, despite climbing the top of the leaderboard, the difference to the state of the art method (which is also the foundation method for this work) seems to be marginal and practically insignificant. Additionally, a rise in tracking performance seems to come at the cost of other metrics. (2) the methodological contribution of this work seems small and incremental, given the baseline method this work builds upon, (3) there is limited comparison with state of the art methods except for the tracking challenge results.

My overall assessment of this work is borderline, therefore all these important issues have to be discussed in the rebuttal, and other significant points raised by reviewers should be considered as well, up to the assessment of their significance by the authors to stay within rebuttal limits.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

7

Author Feedback

We thank the reviewers and AC for their constructive feedback. In the following we address three main points as summarized by the AC:

Practical impact and statistical significance of improvements over the baseline method linajea

To address this point, we assessed the statistical significance (using Wilcoxon’s signed-rank test) of our improvements over linajea on the mskcc_confocal and nih_ls datasets. The DET and TRA accuracy measures are designed to capture the manual effort necessary for correcting an automated tracking solution, thus directly reflecting practical impact. Both are significantly improved by our method (linajea+csc+ssvm) vs baseline linajea (p<0.01). Furthermore, for many biological applications, maintaining a cell’s identity over time is crucial, yet cell identity is lost with every false or missing cell division. Our method significantly outperforms linajea in terms of division errors (“div”, p<0.001). To further strengthen this point, we evaluated an additional quantitative error measure of practical relevance, namely the fraction of correctly reconstructed tracks over a range of track lengths (similar to Fig. 2 of [16]). Our method improves by 3% over linajea for tracks of length 100 frames and 6% for tracks of 200 frames for mskcc_confocal, and 5%/2% for nih_ls, all statistically significant (p<0.01).
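For readers unfamiliar with the test named above, the following is a minimal sketch of an exact two-sided Wilcoxon signed-rank test on paired scores, written in plain Python. It is not the authors' evaluation code: the function name `wilcoxon_signed_rank` and the per-sample scores are hypothetical and chosen only for illustration, and the handling of zeros (dropped) and ties (average ranks) is one common convention. In practice a library routine such as `scipy.stats.wilcoxon` would normally be used instead.

```python
from itertools import groupby


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (w_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences. Ranks are stored doubled so that
    # average ranks of tied groups remain integers.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * n
    pos = 0
    for _, group in groupby(order, key=lambda i: abs(diffs[i])):
        idx = list(group)
        avg2 = 2 * pos + len(idx) + 1  # doubled mean of ranks pos+1..pos+k
        for i in idx:
            ranks2[i] = avg2
        pos += len(idx)

    # Test statistic: (doubled) sum of ranks of the positive differences.
    w2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)

    # Null distribution: every difference is positive or negative with
    # probability 1/2, so count sign assignments per value of W+.
    total2 = sum(ranks2)
    counts = [0] * (total2 + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]

    lower = sum(counts[: w2 + 1])  # assignments with W+ <= observed
    upper = sum(counts[w2:])       # assignments with W+ >= observed
    p_value = min(1.0, 2 * min(lower, upper) / 2 ** n)
    return w2 / 2, p_value


# Hypothetical per-sample TRA scores (NOT taken from the paper).
baseline = [0.970, 0.965, 0.980, 0.972, 0.968, 0.975, 0.960, 0.978]
extended = [0.971, 0.967, 0.983, 0.976, 0.973, 0.981, 0.967, 0.986]
w_plus, p = wilcoxon_signed_rank(extended, baseline)
print(f"W+ = {w_plus}, p = {p:.4f}")
```

The sketch illustrates why consistently small improvements can still be statistically significant: the test looks at the sign and rank of each paired difference, not at its absolute size.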

It is to be expected that there is some trade-off among the elementary error types for different methods. However, the increase of some elementary error types we observe for our method vs linajea is vastly outweighed by decrease of others, as reflected by our significant improvements in terms of the practically relevant summary error metrics discussed above. E.g., the slight increase in FP and FN errors is not significant (both p>0.1).

To clarify our claim of 16% reduction of manual tracking effort on the CTC data: With the previous state of the art, an estimated 2.5% of the fully manual effort is still necessary (as reflected by TRA=0.975), while for our method it is 2.1% (TRA=0.979). This constitutes a reduction by 16% (=(2.5-2.1)/2.5) over the previous state of the art.
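The arithmetic behind the 16% figure can be written out as a short sketch. It follows the rebuttal's reading that 1 - TRA approximates the fraction of fully manual effort still required; the function name `relative_effort_reduction` is hypothetical.

```python
def relative_effort_reduction(tra_previous, tra_new):
    """Relative reduction in residual manual effort, reading 1 - TRA
    as the fraction of manual effort that remains (an interpretation
    taken from the rebuttal, not a formal definition of TRA)."""
    residual_previous = 1.0 - tra_previous
    residual_new = 1.0 - tra_new
    return (residual_previous - residual_new) / residual_previous


# TRA values quoted in the rebuttal: 0.975 (previous) and 0.979 (proposed).
reduction = relative_effort_reduction(0.975, 0.979)
print(f"{reduction:.0%}")
```

The same 0.004 gain in TRA would correspond to a much smaller relative reduction if the previous score were lower, which is why the reviewers found the number hard to relate to the raw scores.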

Lastly, we would like to clarify: Baseline linajea has not been submitted to the CTC for the C. elegans data. The JAN-US results on the leaderboard employ our linajea+csc method.

Methodological contribution over baseline linajea

Our methodological contribution is twofold. First, we incorporate a cell state classifier into an ILP for tracking. This involves phrasing novel feasibility constraints (linear inequalities) to ensure global consistency of the obtained solution, i.e., to ensure that each cell has exactly one state, while at the same time guaranteeing that the combination of states at the end points of each edge is valid. This comes on top of an extended objective function to include the predicted cell state scores. Altogether, this is not trivial. Second, we use a structured SVM in a tracking-by-assignment model. This is a contribution in the sense that sSVMs have not been applied to this kind of problem before, although we do not propose any modification as such. In Fig. A3 we show that the application of an sSVM in this context results in consistent hyperparameter values.
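The two kinds of consistency constraints described above can be illustrated with a toy brute-force search. This is a sketch under stated assumptions, not the authors' ILP: the state labels in `STATES`, the table `VALID_EDGE_STATES`, the function `best_consistent_labeling` and the scores are all hypothetical, and exhaustive enumeration stands in for an ILP solver purely to keep the example self-contained.

```python
from itertools import product

# Assumed state labels; the rebuttal does not name them.
STATES = ("parent", "daughter", "continuation")

# Assumed validity table for (state of earlier cell, state of later cell)
# on a selected edge: a dividing "parent" must be followed by "daughter"
# cells, every other cell by a "parent" or a "continuation".
VALID_EDGE_STATES = {
    ("parent", "daughter"),
    ("daughter", "parent"),
    ("daughter", "continuation"),
    ("continuation", "parent"),
    ("continuation", "continuation"),
}


def best_consistent_labeling(scores, edges):
    """Return the highest-scoring labeling that satisfies both constraints.

    scores: {node: {state: classifier score}}
    edges:  list of (earlier node, later node) pairs already selected
    """
    nodes = sorted(scores)
    best, best_total = None, float("-inf")
    for combo in product(STATES, repeat=len(nodes)):
        # Constraint 1: each node gets exactly one state (by construction).
        labeling = dict(zip(nodes, combo))
        # Constraint 2: the states at both ends of every edge are compatible.
        if any((labeling[u], labeling[v]) not in VALID_EDGE_STATES
               for u, v in edges):
            continue
        total = sum(scores[n][labeling[n]] for n in nodes)
        if total > best_total:
            best, best_total = labeling, total
    return best, best_total


# Toy division: cell "a" at frame t splits into "b" and "c" at frame t+1.
scores = {
    "a": {"parent": 6, "daughter": 1, "continuation": 3},
    "b": {"parent": 1, "daughter": 3, "continuation": 6},
    "c": {"parent": 1, "daughter": 8, "continuation": 1},
}
edges = [("a", "b"), ("a", "c")]
labeling, total = best_consistent_labeling(scores, edges)
print(labeling, total)
```

In this toy case, picking the top-scoring state per cell independently would label "b" as a continuation even though its predecessor is a parent; the edge constraint forces a globally consistent labeling instead. In an ILP the same conditions would be expressed as linear inequalities over binary indicator variables.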

Comparison with the state of the art methods

At the time of submission there were 13 submissions for the C. elegans dataset to the CTC, thus constituting a comprehensive comparison. Both the second place method (wrt. TRA), Elephant (IGFL-FR), and KIT-Sch-GE, the leader wrt. OP_CTB, require strongly supervised labels, namely segmentation masks, which prevents us from applying them to our new datasets. We would like to emphasize that we will make our new datasets public together with our manuscript, thereby extending the pool of high quality annotated tracking datasets for future method development. We would also like to note that the labels for the CTC test dataset are not public. Thus we are unable to report any error measures other than DET and TRA on this data.

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors have shown the statistical significance of their results and placed it well in terms of comparison with other methods and in terms of justifying pros and cons regarding evaluation metrics. Also the methodological contribution was clarified, leading to all reviewers now voting for acceptance of the paper. While minor concerns remain, overall this meta reviewer thinks that the proposed approach is a good step in the development of tracking approaches, as indicated by the state of the art performance of the approach. Therefore, this reviewer votes for acceptance, given that the revisions discussed in the rebuttal are included in the final manuscript.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

2

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors addressed all of the reviewers’ concerns.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

2

Tracking by weakly-supervised learning and graph optimization for whole-embryo C. elegans lineages