
Authors

Xin He, Guohao Ying, Jiyong Zhang, Xiaowen Chu

Abstract

The COVID-19 pandemic has threatened global health. Many studies have applied deep convolutional neural networks (CNN) to recognize COVID-19 based on chest 3D computed tomography (CT). Recent works show that no model generalizes well across CT datasets from different countries, and manually designing models for specific datasets requires expertise; thus, neural architecture search (NAS) that aims to search models automatically has become an attractive solution. To reduce the search cost on large 3D CT datasets, most NAS-based works use the weight-sharing (WS) strategy to make all models share weights within a supernet; however, WS inevitably incurs search instability, leading to inaccurate model estimation. In this work, we propose an efficient Evolutionary Multi-objective ARchitecture Search (EMARS) framework. We propose a new objective, namely potential, which can help exploit promising models to indirectly reduce the number of models involved in weights training, thus alleviating search instability. We demonstrate that under objectives of accuracy and potential, EMARS can balance exploitation and exploration, i.e., reducing search time and finding better models. Our searched models are small and perform better than prior works on three public COVID-19 3D CT datasets.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16431-6_53

SharedIt: https://rdcu.be/cVD69

Link to the code repository

https://github.com/marsggbo/MICCAI2022-EMARS

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The author addresses the problem of neural architecture search. By proposing a new objective, called potential, the author balances exploitation and exploration in finding promising models and reducing the search space during weight training.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The author proposes a new objective, namely potential, which can be optimized during training.
    2. Through mutation and crossover operations, the author balances exploitation and exploration during the weight-training step.
    3. Numerical experiments show the effectiveness of the method.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Details in the preliminaries are questionable. The author mentions the advantages of mini-batches but chooses size 1 as the best size; these two positions appear contradictory.
    2. In Section 3.1, the author states that E is a vector recording the epochs, so it should be a list of epoch counts. How can the transpose of such a vector be multiplied with itself in Formula 3? What is the meaning of (E^T E)^(-1) and E^T F? If Formula 3 is correct, the number of entries influences the result.
    3. Experimental results should be presented more clearly. For example, it is very hard to draw a conclusion from Figure 2.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provide code and data, which is a plus but not a requirement for acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. The author should describe the potential objective more clearly, especially Formula 3.
    2. The author should rewrite some paragraphs and fix the typos, e.g., 8.57 hours instead of 8.57 ours.
    3. The author should explain the experimental results more clearly. In Fig. 2, the axis titles are missing, and the conclusion to be drawn from the three subfigures is missing from the caption.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    From the perspective of the model:

    1. The author proposes a new objective to optimize, namely potential, but should state its definition more clearly. In Formula 3, the calculation of potential is confusing: what is the meaning of multiplying a vector of epoch numbers by its inverse? Why is a model with higher potential more promising?
    2. The one-sample mini-batch is questionable; I do not think such a trick can be counted as a mini-batch.
    3. Where is the borderline between exploitation and exploration? Under what circumstances do we choose exploitation?

    From the perspective of the experiments:

    4. In the bottom-right picture of Fig. 1, the author points out three kinds of points, but what do the x-axis and y-axis mean? The axis titles are missing.
    5. Fig. 2 and Fig. 3 are hard to follow. It would be better to attach the conclusions to the captions.

    From the perspective of writing and organization: there are many typos that need to be fixed, and in Fig. 1, the one-hot vector for MBConv is wrong.
  • Number of papers in your stack

    2

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    The paper proposes a neural architecture search based on “potential”, a regression parameter fitted to the history of model accuracy. Higher values of “potential” indicate more promising architectures (subnets of a supernet), which are then more likely to be further exploited by an evolutionary algorithm. The method is evaluated on three separate COVID classification tasks (all 3D CT chest images) and shows improvements over previous methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The use of “potential” for evolution-based neural architecture search seems interesting and novel, at least in the context of medical image classification.

    The paper is well-written and interesting. The application is relevant to the medical imaging community.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The evaluation of this approach is quite narrowly focused on COVID chest CT classification. The architecture search algorithm itself seems more general and it would have been interesting to see the performance on several other classification tasks, including for example 2D chest x-ray classification.

    Furthermore, the baseline comparisons are based on methods proposed for the same sets of data (also both by the same group of authors). It could have been interesting to see how the proposed method compares other popular neural architecture search techniques, e.g. DARTS.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Very good. All results are based on public datasets, and the authors promise to make the code available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The method seems relatively general, and therefore I would have liked to see it evaluated on a more varied set of tasks, e.g., both 2D and 3D medical image classification.

    I realize that moving to segmentation tasks might require more changes to the pipeline and should be reserved for future work (as mentioned by the authors).

    Typo: Fig. 2a) “ours” -> “hours”

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While I find the evaluation a bit narrow, being based on only one task (3D chest CT classification), the paper overall is well written, and the results are convincingly presented and promising. Therefore, I tend to accept this work.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    After reading the other reviews and rebuttal, I stay with my score. Seems like some of the negative points raised by other reviewers (batch size) are based on misunderstandings. The authors have addressed those concerns well in the rebuttal.



Review #4

  • Please describe the contribution of the paper

    This paper proposes a more stable neural architecture search (NAS) approach for COVID-19 prediction from 3D CT data. Based on the observation that the weight-sharing (WS) strategy in NAS incurs search instability, the authors propose a new objective called potential, which is used to explore more ‘potential’, i.e., promising, subnets. Combined with accuracy, the proposed method finds more compact models with better performance on three public COVID-19 3D CT datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper tackles an important problem, as instability is a big issue for neural architecture search.
    2. The experiments show an improvement on three public COVID-19 3D CT datasets.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Searching 3D neural architectures is not new [1], and it does not seem hard to transfer such methods to 3D CT data. The authors could highlight the differences introduced by the domain knowledge. More importantly, regression methods [2], such as ordinary least squares (OLS), random forests (RF), and Bayesian linear regression (BLR), are commonly used in NAS for performance evaluation and acceleration. From this perspective, the novelty of this paper seems limited. The authors should compare the proposed potential objective to such methods to demonstrate its advantages.

    [1] Video Action Recognition via Neural Architecture Searching, 2019.
    [2] Accelerating Neural Architecture Search using Performance Prediction, 2018.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors claim they will release the code once the paper is accepted.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. As mentioned in the weaknesses, please highlight the differences between the proposed method and other NAS acceleration methods, such as [2].
    2. It seems the authors did not provide the searched architectures for the different datasets. Are the architectures very similar to or very different from each other? In addition, what insights can we get from the resulting neural architectures?
    3. In the abstract, the authors mention that recent works show no model generalizes well across CT datasets from different countries. As I understand it, the authors searched three architectures for three COVID-19 datasets from three different countries. It would be great if the authors could search for one unified architecture that is superior on all datasets.
    4. Minor issues: I suggest setting all ‘i.e.,’, ‘e.g.,’, and ‘et al.’ in italics. Method [16] in Table 1 should have a name.
    5. Typos: in Section 3.3, relu6 should be relu. Please also check other parts.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    As mentioned above, my main concern is the limited novelty of the method.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    4

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The efficiency of neural architecture search (NAS) is important for computationally expensive tasks such as 3D medical image analysis. Based on weight-sharing NAS, this paper proposes a method to reduce the instability of the search without removing the promising models. The results show that models with small sizes and high accuracies can be found.

    There are several comments from the reviewers that need to be addressed:

    1. In the introduction, more focus should be put on the existing frameworks of NAS on medical image computing (e.g., past MICCAI papers), and their differences from this submission.
    2. The main contribution of Eq. (3) needs to be properly explained. For example, clarifying the shapes of E and F (are they model/subnet-dependent column vectors?) would be helpful.
    3. The searched architectures should be provided. Can they be released with the code?
    4. There is a concern about the batch size. It is known that small batch sizes are usually used for memory-demanding 3D analysis, and a batch size of 1 is not uncommon. Maybe the authors can briefly discuss the batch size?
  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4




Author Feedback

We sincerely thank all reviewers and ACs for their valuable comments. We are highly encouraged that they find our work effective [R1, R4] and well-structured and novel [R3], and that they acknowledge the important issue of NAS instability [R4]. Besides, we want to point out that R1’s comments about the batch size and the potential calculation are not correct. Below we address the major concerns.

  1. Differences from previous NAS on medical data (AC, R4): We have carefully inspected previous MICCAI NAS methods and the works mentioned by R4 [1][2] and summarized several differences from our work. 1) We use the MBConv-based structure (Fig. 1), which achieves competitive results and requires fewer FLOPs than the cell-based structure in [1] under a similar parameter size. 2) Although [2] also uses time-series (TS) validation performance to predict model performance, it requires training all models independently for hundreds of epochs, consuming many resources. In contrast, we use the weight-sharing (WS) strategy to reduce the search cost, and we found that simply using TS performance (i.e., potential) limits the exploration of the evolution for WS NAS and leads to a local optimum (see Fig. 3b). Thus, we combine potential with the current validation accuracy and show that this balances exploration and exploitation to find better models. 3) Most previous evolution-based NAS methods (e.g., HS-NASNet (MICCAI 2019) and C2FNAS (CVPR 2020)) train all models independently. BiX-NAS (MICCAI 2021) speeds up the search via weight sharing but only considers validation performance and model size. We believe our potential metric would further benefit BiX-NAS.

  2. More explanation of potential in Eq. 3 (AC, R1): 1) “What is the meaning of (E^TE)^(-1) and E^TF? (R1)” The potential P is a scalar computed by ordinary least squares, i.e., P = (E^T E)^(-1) E^T F, where E and F are subnet-dependent column vectors of the same shape k×1, recording the epoch indices and the corresponding validation accuracies. (E^T E)^(-1) and E^T F form a single expression and cannot be interpreted separately. 2) “Why is the higher the potential, the more promising the model? (R1)” Potential estimates the relative stability of a model. Specifically, even if the absolute performance of a model is not high, if its performance increases steadily (i.e., higher potential), it can still get sampled. In other words, the potential metric gives models that converge slowly a fair chance to compete. Our results show that searching under the potential objective (Fig. 3b) gradually focuses on promising models and eliminates those that perform poorly or are unstable in the later search stages.
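     To make Eq. 3 concrete, the following is a minimal sketch of the potential computation as described above: the no-intercept least-squares coefficient of validation accuracy regressed on epoch index. The function and variable names are ours for illustration, not the authors’ released code.

```python
import numpy as np

def potential(epochs, val_accs):
    """No-intercept OLS coefficient P = (E^T E)^(-1) E^T F (Eq. 3),
    where E and F are k x 1 column vectors recording the epoch
    indices and validation accuracies of one subnet."""
    E = np.asarray(epochs, dtype=float).reshape(-1, 1)    # k x 1
    F = np.asarray(val_accs, dtype=float).reshape(-1, 1)  # k x 1
    # The whole expression collapses to a scalar regression coefficient.
    return (np.linalg.inv(E.T @ E) @ E.T @ F).item()

# A subnet whose accuracy rises steadily scores higher than an unstable one.
print(potential([1, 2, 3, 4], [0.30, 0.45, 0.58, 0.70]))  # ~0.191
print(potential([1, 2, 3, 4], [0.50, 0.40, 0.55, 0.35]))  # ~0.145
```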

  3. Concern about the batch size (AC, R1): AC and R1 may have misunderstood Eq. 2, where B is not the batch size but the number of subnets sampled for each batch. In the search stage, the batch size is 32. For example, if B=4, for each batch we randomly sample 4 subnets, perform the forward and backward passes on each subnet using the same batch of data, and finally accumulate the gradients of these 4 subnets to update the weights of the supernet. Our experiments find that B=1 works well.
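     The toy example below illustrates this update scheme (our illustration under the assumptions above, not the authors’ released code): B subnets drawn from a weight-sharing supernet see the same batch of data, and their gradients are accumulated into the shared weights before a single optimizer step.

```python
import random
import torch
import torch.nn as nn

# Toy supernet: each "layer" offers two candidate ops; a subnet picks one,
# and all subnets share the classifier head (weight sharing).
class ToySupernet(nn.Module):
    def __init__(self):
        super().__init__()
        self.ops = nn.ModuleList([nn.Linear(8, 8), nn.Linear(8, 8)])
        self.head = nn.Linear(8, 2)  # shared by every subnet

    def forward(self, x, choice):
        return self.head(torch.relu(self.ops[choice](x)))

supernet = ToySupernet()
optimizer = torch.optim.SGD(supernet.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
B = 4  # subnets sampled per batch -- NOT the data batch size (32 in the paper)

images, labels = torch.randn(32, 8), torch.randint(0, 2, (32,))
optimizer.zero_grad()
for _ in range(B):
    choice = random.randrange(2)  # sample a random subnet architecture
    loss = criterion(supernet(images, choice), labels)
    loss.backward()               # gradients accumulate in the shared weights
optimizer.step()                  # one update to the supernet weights
```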

  4. “That would be great if the authors could search for one unified architecture which provides superiority over all datasets” (R4): We would argue that existing studies show no single model fits all datasets, due to the domain gaps between them. The advantage of NAS is precisely to find superior models without requiring AI experts.

  5. The searched architectures and the source code (AC, R4): We will open-source them upon acceptance.

  6. Unclearness in Fig. 2 and Fig. 3 (R1): Fig. 2 gives the search results under the model-size and accuracy objectives and shows that small models can also achieve good performance. Fig. 3 compares the effects of the accuracy and potential objectives on the search results. We will attach the conclusions to the captions.

  7. Typo issues (R1, R3, R4): We have fixed all typos pointed out by the reviewers. Regarding R4’s comment that “relu6 should be relu”: ReLU6 is not a typo; ReLU6(x) = min(max(0, x), 6).




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The efficiency of neural architecture search (NAS) is important for computationally expensive tasks such as 3D medical image analysis. Based on weight-sharing NAS, this paper proposes a method to reduce the instability of the search without removing the promising models. The results show that models with small sizes and high accuracies can be found.

    The authors have properly addressed the reviewers’ concerns in the rebuttal. They have clarified the misunderstanding of the AC and R1 about the batch size. The clarity of Eq. (3) is improved with a proper explanation. R4’s concern about novelty is also addressed with references. Overall, this is an interesting paper on efficient NAS for 3D analysis.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal was able to address most of the reviewers’ concerns and clarify the misunderstandings. However, the paper does not necessarily offer new methods. The application is interesting and useful, but methodologically the paper lacks the novelty required for MICCAI.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper presents a NAS approach for COVID-19 3D CT classification. The rebuttal seems to address the reviewers’ concerns well, especially the misunderstanding about the batch size. I recommend acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7


