
Authors

Youbao Tang, Ning Zhang, Yirui Wang, Shenghua He, Mei Han, Jing Xiao, Ruei-Sung Lin

Abstract

Automatically measuring lesion/tumor size with RECIST (Response Evaluation Criteria In Solid Tumors) diameters and segmentation is important for computer-aided diagnosis. Although it has been studied in recent years, there is still room to improve its accuracy and robustness, for example by (1) enhancing features with rich contextual information while keeping a high spatial resolution and (2) involving new tasks and losses for joint optimization. To this end, this paper proposes a transformer-based network (MeaFormer, Measurement transFormer) for lesion RECIST diameter prediction and segmentation (LRDPS), formulated as three correlated and complementary tasks: lesion segmentation, heatmap prediction, and keypoint regression. To the best of our knowledge, this is the first use of keypoint regression for RECIST diameter prediction. MeaFormer enhances high-resolution features by employing transformers to capture their long-range dependencies. Two consistency losses are introduced to explicitly build relationships among these tasks for better optimization. Experiments show that MeaFormer achieves state-of-the-art performance for LRDPS on the large-scale DeepLesion dataset and produces promising results on two downstream clinically relevant tasks, i.e., 3D lesion segmentation and RECIST assessment in longitudinal studies.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_51

SharedIt: https://rdcu.be/cVRwC

Link to the code repository

N/A

Link to the dataset(s)

https://nihcc.app.box.com/v/DeepLesion

https://github.com/JimmyCai91/DLT


Reviews

Review #1

  • Please describe the contribution of the paper

    This work proposes an architecture and set of training objectives for RECIST lesion diameter prediction on DeepLesion data. As in some other works, the architecture involves a convolutional encoder followed by a transformer encoder (with positional encoding). As in prior work, the architecture includes a convolutional decoder path that outputs a weakly supervised segmentation and heatmaps which indicate RECIST diameter keypoint positions. This work proposes the addition of a second decoding path which uses a transformer decoder with a keypoint regression output. Furthermore, this work proposes two consistency losses between the two decoding paths. Ablative experiments show that the transformer encoder is particularly useful (this has been proposed before for other tasks but not validated on this task) and the transformer decoder path with consistency losses yields an additional (though small) gain in performance.
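    For readers unfamiliar with this dual-path layout, the following is a minimal schematic sketch in PyTorch. Every module and size here is an illustrative assumption, not the authors' actual model (which uses an HRNet-W48 backbone, positional encodings, and weak supervision):

```python
import torch
import torch.nn as nn

class DualPathSketch(nn.Module):
    """Hypothetical sketch of the dual-path design described above:
    CNN features -> transformer encoder over flattened tokens, then
    (a) a convolutional head for segmentation + keypoint heatmaps and
    (b) a transformer decoder with learned queries for keypoint regression.
    (Positional encodings are omitted for brevity.)"""
    def __init__(self, dim: int = 256, n_keypoints: int = 4):
        super().__init__()
        self.backbone = nn.Conv2d(1, dim, 3, padding=1)  # stand-in for the CNN encoder
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8), num_layers=6)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, nhead=8), num_layers=6)
        self.queries = nn.Parameter(torch.randn(n_keypoints, 1, dim))  # learned keypoint queries
        self.seg_head = nn.Conv2d(dim, 1, 1)           # lesion segmentation logits
        self.hm_head = nn.Conv2d(dim, n_keypoints, 1)  # keypoint heatmaps
        self.kp_head = nn.Linear(dim, 2)               # direct (x, y) keypoint regression

    def forward(self, x):                    # x: (B, 1, H, W) CT slice crop
        f = self.backbone(x)                 # (B, C, H, W) convolutional features
        B, C, H, W = f.shape
        tokens = f.flatten(2).permute(2, 0, 1)          # (H*W, B, C) tokens for the transformer
        mem = self.encoder(tokens)                      # long-range context over all positions
        f2 = mem.permute(1, 2, 0).reshape(B, C, H, W)   # back to a spatial feature map
        seg, heatmaps = self.seg_head(f2), self.hm_head(f2)
        q = self.decoder(self.queries.expand(-1, B, -1), mem)  # (n_keypoints, B, C)
        return seg, heatmaps, self.kp_head(q)           # third output: regressed keypoints
```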

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • New decoder path for RECIST diameter prediction on DeepLesion: keypoint regression via a transformer.
    • Novel consistency losses.
    • The ablation studies are thorough and informative.
    • SOTA performance.
    • Thorough comparisons to other methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Some technical details are unclear (e.g., the step 1 outputs: do they use a segmentation objective? Do they output only two heatmaps? Do they use the same consistency losses, and are these suboptimal for bounding-box prediction, as opposed to keypoint prediction?)
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    What were the Adam optimizer hyperparameters (other than the learning rate) and what was the batch size? While not mandatory, I strongly encourage the authors to release the code – this is the most productive and useful way to enable reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • Transformers on top of a CNN are not a new contribution – as noted in the paper, this is based on prior work. Instead, the authors can list SOTA performance as a contribution.
    • How did you select the loss weights (lambda)?
    • Do you only predict two heatmaps in step 1 (bounding box)? Please clarify in the text.
    • Do you apply the second consistency loss (distance to segmentation boundary) for the bounding box corner prediction in step 1? Please clarify in the text. If you do, this seems suboptimal as bounding box corners can be expected to typically be outside of the segmentation boundary. What is the impact of omitting this consistency loss in step 1?
    • The ablation on consistency losses and model components is useful. Please perform a statistical significance test, since the results are so similar (a minimal sketch of such a test follows this list).
    • Please better detail the weak supervision method. Is the first pseudo-mask an ellipse based on RECIST diameters? Are the second and third pseudo-masks the intersection of the prediction with the previous pseudo-mask? Is there an ignored region? Is step 1 (bounding box prediction) trained without the segmentation objective?
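    For concreteness, here is a minimal sketch of the kind of test being requested, using SciPy and hypothetical per-lesion Dice scores. A paired test is appropriate because both model variants are evaluated on the same test lesions:

```python
import numpy as np
from scipy import stats

# Hypothetical per-lesion Dice scores for two model variants on the same
# 1,000 test lesions (replace with the real per-lesion results).
rng = np.random.default_rng(0)
dice_with_loss = 0.92 + 0.03 * rng.standard_normal(1000)
dice_without_loss = 0.915 + 0.03 * rng.standard_normal(1000)

# Paired t-test: each lesion is scored by both variants, so the samples
# are matched and scipy.stats.ttest_rel is the appropriate test.
t_stat, p_value = stats.ttest_rel(dice_with_loss, dice_without_loss)
print(f"t = {t_stat:.3f}, p = {p_value:.4g}")
```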
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While modest, the new model contributions are clear and yield state of the art performance. Clarity could be improved. While some ablation studies could be added, the current set of ablation experiments is pretty good.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    The paper proposes a model for semi-automated RECIST diameter prediction and segmentation. The model is applied in two consecutive steps: the first step predicts a rough bounding box, while the second predicts a segmentation map, heatmaps for the long- and short-axis diameter keypoints, and an additional direct regression of these points. The paper evaluates the algorithm on a seemingly manually chosen and annotated subset of the DeepLesion dataset and achieves superior performance in comparison with a variety of other algorithms.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Although not the first to propose this, the submission addresses an interesting combination of CNNs and Transformer networks with a similarly interesting application in a clear radiological environment and with a straightforward practical use. The manuscript contains a variety of descriptive figures, which facilitates a better understanding of the authors’ arguments.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Overall, the impact of the method felt rather difficult to assess. – While the idea of single-click RECIST diameter estimation via a combination such as the one proposed by the authors is rather novel, single-click lesion segmentation is not, and is part of the regular clinical workflow, e.g., in clinical reading software (single-click segmentation is, for example, part of the MM Oncology Suite from Siemens, but similarly exists for other vendors). – While I do not believe that these vendors currently use state-of-the-art deep learning and transformer networks for this purpose, the question arises whether the conducted comparison is adequate, as it seemed to take into account only deep learning-based methods, with 5 out of 9 (namely [2] and [15-18]) having a very strong overlap of contributing authors, some optimized for the same dataset (e.g., [17,18]). As a result, the representativeness of the conducted comparisons felt somewhat vague to me.

    • Further, the choice of the data should have been somewhat better motivated. It did not become clear to me, how the 1,000 test samples have been chosen and how they were distributed across the various types of lesions in the DeepLesion dataset.

    • Finally, I was not completely convinced about both methodological aspects of the proposed solution (see below) as well as the significance of the achieved results, on which I would like to ask the authors for further explanation.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Overall, the reproducibility form in my perspective well matched the manuscript. Regarding a few points, I felt that the authors might adapt their answers, as pointed out in the following:

    A clear declaration of what software framework and version you used. [Yes] -> No

    • I felt that this was not described in detail. While it was mentioned that the authors used PyTorch, no version number was documented.

    For all reported experimental results, check if you include: the range of hyper-parameters considered, the method to select the best hyper-parameter configuration, and the specification of all hyper-parameters used to generate results. [Yes] -> No

    • This insofar as basically no hyper-parameter optimization seems to have been conducted.

    Details on how baseline methods were implemented and tuned. [Yes] -> No

    • While the authors stated that they trained the methods on the same data, to the best of my understanding they did not comment on whether they used the original implementations, nor on whether they conducted any kind of tuning for the data at hand.

    An analysis of situations in which the method failed. [Yes] -> No

    • While I might have overlooked this, I did not find an error-case assessment in the manuscript.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    (Note: In the following, original references are referred to as [X] while newly introduced are referred to as [rX])

    Results

    • As pointed out above, while I absolutely understand the clinical motivation of the method, I did not feel completely convinced of its practical impact. – First, click-based segmentation is part of typical clinical assessment software; would it be possible to point out the main differences to these solutions? – Secondly, while deep learning-based solutions often constitute the state of the art, well-functioning classical algorithms for this very purpose already exist in clinical deployment. – Would it thus be possible for the authors to point out why only deep learning-based solutions were taken into account for comparison? – Would it further be possible to briefly comment on the very large authorship overlap across the chosen comparison algorithms (5 out of 9, namely [2] and [15-18])? – Could the authors briefly comment on why they did not compare to other state-of-the-art semi-automatic segmentation approaches, such as [r1-2]?

    • Further, the claimed superiority seemed rather questionable to me: – While the authors provided standard deviations, they did not conduct any statistical testing. Many of the reported standard deviations, however, suggest that the results are not significant, in particular considering values such as the Dice coefficient of AHRNet in Tab. 1, the long- and short-axis deviations in comparison to PDNet, or the segmentation-based results in comparison to TransFuse. As a result, I was not able to rule out the possibility of a mere difference by chance. – If possible, I would like to ask the authors to add significance tests, such as t-tests, in order to rule out this possibility.

    • Regarding the nnUNet results, it unfortunately did not become clear to me why the authors did not provide segmentation values, since nnUNet by its very nature is a segmentation approach. – Further, on p6 the authors state that the listed results were copied from the related nnUNet paper. That paper, however, does not contain an evaluation on the DeepLesion dataset. Could the authors briefly comment on that?

    Method

    • To me, the value of L_cons1 and L_cons2 might have been better motivated and assessed in more detail. Due to their formulation, both L_cons1 and L_cons2 are likely to provide only sparse feedback. – While the authors assessed them in an ablation study, the results seemed mostly inconclusive to me: the heatmap accuracy did not change significantly (while already being better than the regression output), and neither did the Dice coefficient. Thus, the introduction of L_cons1 and L_cons2 does not promise a significant improvement, but does entail significant additional effort during training.

    • Drawing an axis the other way around, i.e., exchanging the start and end point of each axis, leads to a total of 4 permutations that all describe the same pair of axes. – How do the authors ensure a consistent keypoint ordering to combat this issue? (One possible remedy is sketched below.)
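    One common remedy, shown here as a hypothetical sketch rather than the authors' actual scheme, is to sort each axis's endpoints into a canonical order before computing the regression target:

```python
import torch

def canonical_axis(p0: torch.Tensor, p1: torch.Tensor):
    """Return the two endpoints of a diameter axis in a fixed order:
    smaller x first, ties broken by smaller y. Applying this to both
    the long and the short axis collapses the 4 equivalent endpoint
    permutations into a single canonical regression target.
    p0, p1: tensors of shape (2,) holding (x, y) coordinates."""
    swap = bool(p1[0] < p0[0] or (p1[0] == p0[0] and p1[1] < p0[1]))
    return (p1, p0) if swap else (p0, p1)
```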

    • Some of the hyperparameter choices seemed rather arbitrary and barely motivated. – The choice of N_en=6 layers seemed somewhat arbitrary; why did the authors choose exactly 6 layers? – Similarly, why was the weighting of lambda_1 to lambda_5 on p5 optimal, and how was this assessed?

    Discussion

    • In light of the above-mentioned points, I felt that some of the statements in the discussion were not sufficiently supported. – I would recommend mitigating some of the claims in the result discussion, namely the consistent improvement (p8) and the claim that long-distance information is encoded, which was not directly assessed (p8).

    • Further, deriving RECIST diameters from segmentations is, in my perspective, mostly state of the art: it is likely done by all large vendors as part of their clinical assessment software, as well as by a variety of institutes that have previously published on lesion segmentation; in a publication from 2008 [r4], this was already part of the software used. – I would therefore recommend not stating this as a major contribution of the manuscript at hand (cf. p7).

    • The results on the DLS dataset felt somewhat difficult to interpret without further context. While an accuracy of 91.7% at first sounds positive, it is a rather low value if, say, 95% of the lesions show a similar response pattern (in which case the method would be worse than informed guessing). – As the dataset is not used further in the manuscript, I would recommend leaving this information out, or adding a separate supplement with a more thorough evaluation, which may then be referred to in the manuscript.

    Dataset

    • The choice of the data should have been somewhat better motivated. It did not become clear to me, how the 1,000 test samples have been chosen and how they were distributed across the various types of lesions in the DeepLesion dataset. – Would it be possible to briefly add some explanation on this?

    Minor

    • The table styling might be refined. A good resource for this purpose is [r3].
    • On p6, there is a formatting issue with a larger reference.
    • The DLS dataset should be briefly described at its first mention; introducing the abbreviation would further help the reader understand the sentence without having to consult the reference.
    • Where does the stated accuracy of 99.1% come from (p5, bottom)? Why was this 2% better than the previous result? Could the authors add a note on where to find this data?
    • Fig. 2 seemed to be a bit small, which at first made it difficult for me to follow the discourse in Sec. 2.
    • The manuscript currently contains a few typos and language issues that the authors might want to fix prior to publication: “On the other hand, transformer is designed […] which is able to […]”, p2; “which attracts tremendous attentions increasingly”, p2; “[…] matrix Q is token as input”, p4; “The purposed MeaFormer”, p4;

    References

    [r1] Lin, Zheng, et al. “Interactive image segmentation with first click attention.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
    [r2] Mahadevan, Sabarinath, Paul Voigtlaender, and Bastian Leibe. “Iteratively trained interactive segmentation.” arXiv preprint arXiv:1805.04398 (2018).
    [r3] https://people.inf.ethz.ch/markusp/teaching/guides/guide-tables.pdf
    [r4] Jolly, Marie-Pierre, and Leo Grady. “3D general lesion segmentation in CT.” 2008 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro. IEEE, 2008.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, I am not fully convinced by the value of the method at hand:

    • Most importantly, many of the results seemed inconclusive, some remained unclear, and a few seemed inconsistent.
    • The idea in my perspective is current, but still has only limited novelty. The practical value of some of the method’s concepts (e.g. the consistency loss) remained somewhat unclear to me.
    • Regarding these issues, the manuscript unfortunately does not provide a statistical evaluation, which could easily eliminate some of my reservations.
    • The sum of these issues combined with a variety of minor issues and the expected quality of MICCAI as a flagship conference therefore leads to my final recommendation.
  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #4

  • Please describe the contribution of the paper

    The authors propose a transformer-based network (MeaFormer, Measurement transFormer) for lesion RECIST diameter prediction and segmentation (LRDPS), which involves three related and complementary tasks: lesion segmentation, heatmap prediction, and keypoint regression. Two consistency losses are introduced to explicitly establish the relationships between these tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A transformer-based network (MeaFormer, Measurement transFormer).

    Two consistency losses are introduced to explicitly establish the relationships between these tasks.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    What is the internal structure of the transformer you are using? It would be best to show it with a figure. For example, are the position encodings added to Q, K and V, or only to Q and K? What kind of structure is the encoder-decoder attention mentioned for the decoder?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Yes

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    I have read this paper. My recommendation is a minor revision. In this paper, the authors propose a transformer-based network (MeaFormer, Measurement transFormer) for lesion RECIST diameter prediction and segmentation (LRDPS), which involves three related and complementary tasks: lesion segmentation, heatmap prediction, and keypoint regression. Two consistency losses are introduced to explicitly establish the relationships between these tasks. Remaining issues:
    • In the abstract, the authors should first point out the open problems in current research and then present the methods of this paper.
    • What is the meaning of the last sentence “All figures are best viewed in color.” in the caption of Figure 1?
    • Is there a prediction head in the MeaFormer of step 1? The difference between the MeaFormer of step 1 and that of step 2 is not indicated.
    • In 2.1, the images I_c and I_d should be displayed graphically.
    • For the backbone in 2.1, why is HRNet-W48 used to extract features?
    • What is the internal structure of the transformer you are using? It would be best to show it with a figure. For example, are the position encodings added to Q, K and V, or only to Q and K? What kind of structure is the encoder-decoder attention mentioned for the decoder?
    • For the final objective function in 2.2, how are the weight factors of the different losses set? Is the weight-factor combination used in this paper the best one? (A sketch of such a weighted objective follows.)
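    For reference, the kind of weighted multi-task objective being asked about, as a hypothetical sketch (the symbols and weights are placeholders, not the paper's reported values):

```python
def total_objective(l_seg, l_hm, l_reg, l_cons1, l_cons2,
                    lambdas=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Hypothetical weighted sum of the five losses. The lambda values
    here are placeholders; the paper reports its own lambda_1..lambda_5
    on p5, and the rebuttal states they were set empirically to balance
    the magnitudes of the individual loss terms."""
    terms = (l_seg, l_hm, l_reg, l_cons1, l_cons2)
    return sum(w * t for w, t in zip(lambdas, terms))
```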

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A transformer-based network (MeaFormer, Measurement transFormer) for lesion RECIST diameter prediction and segmentation (LRDPS), which involves three related and complementary tasks: lesion segmentation, heatmap prediction, and keypoint regression.

  • Number of papers in your stack

    7

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The idea of the manuscript has been widely appreciated by the reviewers, and the experimental strategy appears interesting and well designed. There are, however, some doubts cast on the true relevance of the proposed method. The rebuttal should address the questions related to statistical testing and the justification of the points of comparison. If possible, some more rationale on the introduction of the two new losses should be included.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5




Author Feedback

[R1&R3] Statistical t-tests: We keep only one decimal place in the tables to save space, but the results with three decimal places are used for the t-tests. P-values for the ablation studies:
• with vs. w/o TranE: 0.0010 (Dice, D) / 0.0073 (heatmap long axis, HL)
• with vs. w/o TranD (all losses): 0.0029 (D) / 0.0076 (HL) / 0.0001 (regression long axis, RL)
• with vs. w/o Lconses: 0.1345 (D) / 0.0385 (HL) / 0.0001 (RL)
• L1 vs. L1+Lcons1: 0.6867 (D) / 0.4744 (HL) / 0.0001 (RL)
• L1+Lcons1 vs. L1+Lconses: 0.2812 (D) / 0.0388 (HL) / 0.1831 (RL)
P-values for the comparisons: AHRNet: 0.3187 (D); PDNet: 0.0454 (D) / 0.0038 (HL); TransFuse: 0.0001 (D) / 0.0002 (HL) / 0.0034 (seg. long) / 0.0023 (fusion long).

[R3] Justification of the comparisons: (1) We did not have a chance to test click-segmentation software in the clinical workflow, but after consulting several radiologists we can state the main differences: 1) our method works on almost all lesion types, while such software may work on only one or a few specific lesion types (e.g., lung nodules, liver lesions, lymph nodes); 2) besides lesion segmentation, our method also performs accurate RECIST diameter annotation obtained from the three tasks. (2) Compared with classical algorithms (e.g., [r4]), deep learning-based solutions achieve the best performance on many medical imaging tasks, including the ones considered here; thus, to verify the effectiveness of our method, we selected state-of-the-art approaches on these tasks for comparison. (3) This work was inspired by [17], and the authors of [17] have published a series of papers on these tasks, so we should consider them for comparison. (4) [r1-2] perform interactive segmentation with multiple clicks, whereas this work requires only a single click. Introducing additional clicks on incorrectly segmented regions could improve performance, which we consider future work; we will then compare against approaches including [r1-2]. (5) The nnUNet results in Table 1 are copied from [17], not from the original nnUNet paper.

[R3] More rationale on the two consistency losses: They are built on the keypoint regression results and therefore provide sparse feedback. For this feedback to be useful, the regression results must be reliable, which is the main purpose of Lcons1: when it is used, the regression performance improves significantly. Lcons2 is proposed to better optimize the model, especially the transformer encoder, further enhancing the feature representations and thereby improving lesion segmentation and heatmap prediction. Both losses are easy to implement, are applied only during training, and add little computational cost. (A hypothetical sketch of such consistency terms follows this feedback.)

[R1&R4] Differences between the two steps: A prediction head is also included in step 1; it performs the same tasks as step 2 (segmentation and heatmap prediction for RECIST). No consistency loss is used in step 1: box regression results are unsuitable for building constraints as strong as those from keypoint regression, so omitting them in step 1 lets the model optimize better and faster.

[R1&R3&R4] Loss weights: We set the weights empirically to balance the magnitudes of the different losses, so that the model works well on all tasks.

[R1&R3] Training settings: PyTorch 1.6 is used. We use the defaults for the Adam optimizer's other hyperparameters (e.g., betas=(0.9, 0.999)). The batch size is 16. We will include these details in the revision.

[R1] Yes to all questions about the pseudo-masks.
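A hypothetical sketch of the two consistency terms described above; the function and its inputs are assumptions for illustration, and the exact formulations are those given in the paper:

```python
import torch.nn.functional as F

def consistency_terms(kp_reg, kp_from_heatmap, boundary_distance):
    """Hypothetical sketch of the two consistency terms (not the paper's
    exact formulation).
    kp_reg:          (4, 2) keypoints from the regression decoder.
    kp_from_heatmap: (4, 2) keypoints decoded from the predicted heatmaps.
    boundary_distance: callable mapping (4, 2) points to their (4,)
        distances to the predicted mask boundary, e.g. sampled from a
        distance transform of the segmentation output."""
    l_cons1 = F.l1_loss(kp_reg, kp_from_heatmap)       # tie the two decoders together
    l_cons2 = boundary_distance(kp_reg).abs().mean()   # pull endpoints onto the boundary
    return l_cons1, l_cons2
```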
[R3] The 1,000 test lesions were randomly selected and can be classified into 8 types: lung (247), abdomen (220), mediastinum (166), liver (134), pelvis (88), soft tissue (68), kidney (52), and bone (25). Six encoder layers are used, following [19]. The keypoint order is defined as [left, right of long axis; top, bottom of short axis]. [R4] The position encodings are added to Q, K and V (a sketch follows); we will show the internal structure of the transformers in Fig. 1. HRNet-W48 is used because it extracts highly discriminative high-resolution features.
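To illustrate the answer to R4, a minimal sketch of attention where the positional encoding is added to queries, keys and values alike; layer sizes are illustrative assumptions, not the authors' exact module:

```python
import torch
import torch.nn as nn

class AttentionWithPosOnQKV(nn.Module):
    """Sketch of the rebuttal's statement that positional encodings are
    added to Q, K and V (many DETR-style models add them to Q and K only).
    batch_first requires PyTorch >= 1.9; the paper itself used 1.6."""
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor, pos: torch.Tensor):
        # x: (batch, tokens, dim) feature tokens; pos: matching positional encoding
        q = k = v = x + pos  # positional information enters Q, K and V alike
        out, _ = self.attn(q, k, v)
        return out
```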




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    With the answers from the rebuttal, the relevance of the paper is well demonstrated and the requested clarifications are well addressed. These details should, however, be included in the final revision for publication.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The key strength of this work is a transformer-based segmentation and RECIST estimation method that can be applied to tumor delineation and quantification. One of the reviewers indicated key weaknesses regarding the missing comparison with the clinical state of the art, which arguably is highly relevant in this computer-aided diagnosis task. After studying the rebuttal, this meta-reviewer thinks that these concerns were not adequately addressed. Deep learning methods nowadays seem to dominate research; however, a good argument has to be made as to why a novel methodology is superior to the clinically used state of the art. By neglecting interactive segmentation methods as used in commercial systems, confidence in the findings and drawn conclusions will be low. Justifying the proposed method by arguing that it can work on lesions from different anatomical structures is weak, because clinical impact can also be achieved by a method tailored to a specific task. Furthermore, the concerns regarding the related and compared work mostly coming from the same group of authors were also not sufficiently addressed, casting some doubt on how representative the performed experimental evaluations are. Overall, this meta-reviewer thinks that the paper would require a major revision to meet the standards of a MICCAI publication.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    12



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper proposes a model for semi-automated RECIST diameter prediction and segmentation; the novelty lies in the use of transformers. The rebuttal seems to clarify most problems, and the reviewers are generally in favour.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8


