
Authors

Yuetan Chu, Longxi Zhou, Gongning Luo, Zhaowen Qiu, Xin Gao

Abstract

X-ray computed tomography (CT) is indispensable for modern medical diagnosis, but the degradation of spatial resolution and image quality can adversely affect analysis and diagnosis. Although super-resolution (SR) techniques can help restore lost spatial information and improve imaging resolution for low-resolution CT (LRCT), they are always criticized for topology distortions and secondary artifacts. To address this challenge, we propose a dual-stream diffusion model for super-resolution with topology preservation and structure fidelity. The diffusion model employs a dual-stream structure-preserving network and an imaging enhancement operator in the denoising process for image information and structural feature recovery. The imaging enhancement operator can achieve simultaneous enhancement of vascular and blob structures in CT scans, providing the structure priors in the super-resolution process. The final super-resolved CT is optimized in both the conventional imaging domain and the proposed vascular structure domain. Furthermore, for the first time, we constructed an ultra-high resolution CT scan dataset with a spatial resolution of 0.34×0.34 mm^2 and an image size of 1024×1024 as a super-resolution training set. Quantitative and qualitative evaluations show that our proposed model can achieve comparable information recovery and much better structure fidelity compared to the other state-of-the-art methods. The performance of high-level tasks, including vascular segmentation and lesion detection on super-resolved CT scans, is comparable to or even better than that of raw HRCT. The source code is publicly available at (***).

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43999-5_25

SharedIt: https://rdcu.be/dnwwE

Link to the code repository

https://github.com/Arturia-Pendragon-Iris/UHRCT_SR

Link to the dataset(s)

N/A


Reviews

Review #3

  • Please describe the contribution of the paper

    In the manuscript, the authors have proposed a novel dual-stream diffusion model framework for CT super-resolution (SR) that incorporates a dual-stream structure-preserving network in the denoising process to realize better physiological structure restoration. They designed a new image enhancement operator to model the vascular and blob structures in medical images. They additionally developed an ultra-high-resolution CT scan dataset for training and testing the SR task.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Overall, the manuscript is well written and well presented. The authors have addressed a challenging problem with real-life clinical significance. The performance of the proposed methodology has been demonstrated over one existing and two newly generated datasets, showing its improved performance over SOTA approaches for the SR, detection, and segmentation tasks.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The Methods section is not properly explained. In particular, there are several issues and inconsistencies with the equations.
    2. The statistical significance analysis for the detection and segmentation performances should be included.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have stated that they will make all the scripts (both training and testing) and the generated datasets public.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    In the manuscript, the authors have addressed a very challenging and clinically relevant problem. The manuscript is well written overall, and the experimental analysis is well presented. There are a few concerns, primarily regarding the mathematical equations in the Methods section and the statistical significance analysis of the experimental results, as described below:

    1. The parameter p in Eq. (1) is not defined.
    2. Eqs. (4) and (5) are not explained. Is the parameter \gamma in Eqs. (4) and (5) the same as the noise level indicator in Eq. (1)?
    3. \lambda_{L1} is written as \lambda_{1} in Eq. (6).
    4. \lambda_{2} in Eq. (9) is not defined. Is this the same as \lambda_2 in Eqs. (3)-(5)?
    5. A detailed statistical significance analysis for the results in Figure 4 should be provided. Explicitly describe the cases where the performance on the SR results is comparable to, better than, or worse than that on the ground-truth CT.
    6. Figure 1 is not explained and referenced within the manuscript text.
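
For a request like comment 5, a paired test on per-case scores is one common option. The sketch below is only an illustration, not the authors' or the reviewer's analysis: it implements a two-sided sign-flip permutation test with the Python standard library. The function name `paired_permutation_test`, its parameters, and any scores passed to it are assumptions introduced here.

```python
import random
from statistics import mean


def paired_permutation_test(a, b, n_resamples=10000, seed=0):
    """Two-sided sign-flip permutation test for paired samples.

    a, b: per-case scores from two conditions evaluated on the same cases
          (hypothetical example: Dice per scan on SR CT and on ground-truth CT).
    Returns (mean paired difference, estimated p-value).
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    observed = mean(diffs)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis each paired difference is equally
        # likely to be positive or negative, so flip signs at random.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (hits + 1) / (n_resamples + 1)
    return observed, p_value
```

A permutation test is used here because it makes no normality assumption about the score differences; a Wilcoxon signed-rank test or paired t-test would be alternative choices.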
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The manuscript addresses a clinically relevant problem with the potential to transform future diagnostic procedures. The authors have formulated the problem well and evaluated their solution on multiple applications over three datasets, two of which are newly generated. There are a few concerns that should be thoroughly addressed to improve clarity and readability.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    The paper proposes a novel dual-stream model framework for super-resolution that addresses the problems of topology distortion and artifact introduction. The main contributions are a new image enhancement operator to model the vascular and blob structures in medical images, and a novel enhancement module consisting of lightweight convolutions that eases back-propagation in the structural domain. The authors also constructed an ultra-high-resolution CT scan dataset as a super-resolution training set. The evaluation was carried out on three different datasets (two of them collected in-house) and shows that the method outperforms SOTA super-resolution methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is well written and structured. The contributions are clear and well described.
    • The proposed model for CT scan super-resolution to generate UHRCT results follows a novel approach. The proposed architecture is based on a dual-stream diffusion model framework that preserves structural information. In particular, the authors propose an imaging enhancement operator for vascular and blob structures and an optimised approach that replaces the filtering operation with lightweight convolutional layers for faster and easier back-propagation.
    • The authors constructed an in-house UHRCT scan dataset for training and testing the SR tasks.
    • Extensive experiments have been performed to assess the results: qualitative and quantitative comparison with state-of-the-art works have been presented. The results show that the proposed model can achieve better structure fidelity than SOTA.
    • Further experiments on vascular segmentation and lesion detection have been performed on the super-resolved CT scan to show that the final outcome is comparable with results obtained on the ground truth image.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Some steps in the methodology could be further explained and justified. For example, the authors state that the image enhancement operator involves complex calculations and makes back-propagation difficult, hence the need for some optimisation. However, no information is given about the computational complexity, memory allocation, or performance.

    Some figures could be improved or explained further. Figure 1 is not mentioned in the text, and Figure 3 uses some colours that are not explained.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors will make the code publicly available, so the work should be fully reproducible. The architecture design is also clear. The publicly available datasets are referenced and properly described. It is not clear from the paper whether the in-house dataset will be made available too.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The paper is well written and structured. Overall the contributions are clear and the evaluation has been exhaustive and rigorous. I have some comments/feedback regarding the methodology sections:

    • In Figure 2, the DSSP module contains two blocks, the Fusion block and the Structure block, that are not fully explained, or whose explanation is not clear. Also, there is a typo in the legend: both are labelled "Concolutional Layers" instead of "Convolutional Layers".
    • I would also add some information about the complexity of the image enhancement operator and the need for its simplification.
    • Figure 1 is not discussed in the text; I could not find any reference to it. I strongly recommend adding a few sentences about the figure in the introduction.
    • Figure 3: there are some red and green boxes on the CT slice, but the use and meaning of the two different colours are not explained.

    Some minor comments:

    • In the abstract, the acronym HRCT is used, but it is only defined in the introduction. I would spell out the full term in the abstract too.
    • In the Conclusion section (6th line) there is the word “ream” that I believe is a typo. Please check it.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper is very good work whose contributions are clear and well explained. The evaluation is rigorous, and different datasets have been used for qualitative and quantitative comparisons. I would only ask the authors to improve the description of some steps in the methodology, but overall the work is excellent.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    This paper presents a novel approach for CT super-resolution using a dual-stream diffusion model framework. Compared to previous work, the super-resolution outputs are optimized not only by image-space losses but also by structure-space losses to preserve topology and structure fidelity during the diffusion denoising process. Such refinement leads to better physiological structure restoration. Additionally, this paper establishes an ultra-high-resolution CT scan dataset with a spatial resolution of 0.34 × 0.34 mm² and an image size of 1024 × 1024 for training and testing the super-resolution task.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper establishes an ultra-high-resolution CT scan dataset with a spatial resolution of 0.34 × 0.34 mm² and an image size of 1024 × 1024 for training and testing the super-resolution task, which contains 87 scans. Ultra-high spatial resolution CT may lead to a reduction in artifacts such as blooming, as well as an increased ability to quantify features of anatomical and pathological structures. Training a model for such ultra-high super-resolution is meaningful.
    2. Deep learning-based super-resolution methods can generate promising results, but there can still be geometric distortions and artifacts along with structural edges in the super-resolved results. The method proposed by this paper can solve this problem to some extent. This paper uses a dual-stream diffusion model framework for CT super-resolution. By employing an imaging enhancement operator and refining the objective function, the method proposed by this paper can preserve topology and structure fidelity during the super-resolution process and realize better physiological structure restoration.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The inference speed is slow – this method requires 62.35±5.63 seconds to process one image. However, there are already a number of acceleration methods available for diffusion models[1][2], which are not adopted by this paper.
    2. Compared with SR3[3], the innovation of this paper is limited. This paper uses a dual-stream model to replace the original denoising model. This dual-stream model contains two Res U-Nets and therefore has almost twice as many parameters as the original denoising model. It can thus be argued that the performance gain comes from the larger parameter count rather than from the structural innovation.
    3. Some improvements are trivial and are not validated by ablation experiments. These include a) the combination of L1 loss and L2 loss; b) the combination of L_{SR}^{pixel} and L_{SR}^{struct}; c) the substitution of F_C() with O_{F_C}(). For combining the L1 and L2 losses, established solutions already exist, such as smooth L1[4] and the Charbonnier loss[5].

    [1] Song J, Meng C, Ermon S. Denoising diffusion implicit models[J]. arXiv preprint arXiv:2010.02502, 2020.
    [2] Lu C, Zhou Y, Bao F, et al. Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps[J]. arXiv preprint arXiv:2206.00927, 2022.
    [3] Saharia C, Ho J, Chan W, et al. Image super-resolution via iterative refinement[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
    [4] Girshick R. Fast r-cnn[C]//Proceedings of the IEEE international conference on computer vision. 2015: 1440-1448.
    [5] Lai W S, Huang J B, Ahuja N, et al. Fast and accurate image super-resolution with deep laplacian pyramid networks[J]. IEEE transactions on pattern analysis and machine intelligence, 2018, 41(11): 2599-2613.
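
The loss variants named in weakness 3 can be compared side by side. The following is a minimal sketch using the standard textbook definitions, not the paper's implementation: the function names, the weights `lambda_l1` and `lambda_l2`, and the defaults `beta` and `eps` are assumptions chosen for illustration, and the exact weighted form used in the paper is not reproduced here.

```python
import math


def l1_l2_loss(pred, target, lambda_l1=1.0, lambda_l2=1.0):
    """Weighted sum of mean absolute error and mean squared error.

    The weights are hypothetical placeholders, not values from the paper.
    """
    n = len(pred)
    l1 = sum(abs(p - t) for p, t in zip(pred, target)) / n
    l2 = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    return lambda_l1 * l1 + lambda_l2 * l2


def smooth_l1_loss(pred, target, beta=1.0):
    """Smooth L1: quadratic for residuals below beta, linear above it."""
    total = 0.0
    for p, t in zip(pred, target):
        r = abs(p - t)
        total += 0.5 * r * r / beta if r < beta else r - 0.5 * beta
    return total / len(pred)


def charbonnier_loss(pred, target, eps=1e-3):
    """Charbonnier loss sqrt(r^2 + eps^2), a differentiable L1 variant."""
    return sum(
        math.sqrt((p - t) ** 2 + eps ** 2) for p, t in zip(pred, target)
    ) / len(pred)
```

All three behave quadratically near zero residual and roughly linearly for large residuals, which is the property the reviewer's suggested alternatives share with an L1 plus L2 combination.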

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This paper does not describe the specific network architecture but provides implementation details in the appendix, which includes the train-val-test partition, input size, implementation framework, learning rate, optimizer, hardware and so on. However, other hyper-parameters are not given. The source code may be publicly available later. Additionally, this paper has two in-house datasets that are not publicly available but provides details in the appendix, including the number of scans, ages, sex, spatial resolution, image size, inter-slice thickness, tube voltage, CT scanner and so on.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    1. The description of the method is not very clear. In Section 2.1, the authors define the optimization as Equation 1, in which the network G_{DSSP} predicts the added Gaussian noise \epsilon, while in Section 2.3, the authors define the pixel-wise loss in the image domain as Equation 6, in which the network G_{DSSP}^{image} predicts the reference image y^t. So it is unclear whether the network predicts the noise or the reference image.
    2. In Section 2.3, the authors replace the original image enhancement operator F_C with the convolution-based operator O_{F_C}, but there is no description of how this is done and no reference is given.
    3. The symbols in Figure 2 and Equation 6 are confusing. In Figure 2, the network G_{DSSP}^{image} takes the source LRCT x and the noisy HRCT y_{t-1} as input and outputs y_t, while in Equation 6, the network G_{DSSP}^{image} takes the reference image y^t as input and predicts the reference image y^t. This is hard to understand.
    4. Ablation experiments are lacking: a) the effect of the combination of L1 loss and L2 loss; b) the effect of the combination of L_{SR}^{pixel} and L_{SR}^{struct}; c) the effect of substituting F_C with O_{F_C}; d) the effect of the hyper-parameters lambda_1 and lambda_2.
    5. The symbols in Section 2.2 and Section 2.3 are repeated: both use lambda_1 and lambda_2.
    6. Figure 1 is not referenced in the text.
    7. Why are the restored high-resolution images still mushy in Figure 3?
    8. In Figure 3, the effect of preserving topology and structure fidelity is not very obvious; it would be best to put red boxes on the significant regions.
    9. The results indicate that the performance of high-level tasks, such as vascular segmentation and lesion detection on super-resolved CT scans, is comparable to or even better than that on raw HRCT. An explanation of why this is the case may be required.
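
Regarding the first comment above (noise prediction versus image prediction): in the forward process commonly used by SR3-style diffusion models, y_t = sqrt(gamma) * y_0 + sqrt(1 - gamma) * eps, a noise estimate can be converted into an image estimate and vice versa. Whether the paper uses exactly this parameterization is an assumption, since its equations are not reproduced here; the function names below are likewise hypothetical. The sketch only shows why the two formulations can coexist once the mapping is stated.

```python
import math


def add_noise(y0, gamma, eps):
    """Forward process (assumed form): y_t = sqrt(gamma)*y0 + sqrt(1-gamma)*eps."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    return [
        math.sqrt(gamma) * y + math.sqrt(1.0 - gamma) * e
        for y, e in zip(y0, eps)
    ]


def image_from_noise(y_t, gamma, eps_hat):
    """Invert the forward process: recover the clean-image estimate
    implied by a noise estimate eps_hat at noise level gamma."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    return [
        (y - math.sqrt(1.0 - gamma) * e) / math.sqrt(gamma)
        for y, e in zip(y_t, eps_hat)
    ]
```

With a perfect noise estimate the recovered image equals the clean image, so a pixel-wise loss on the image can in principle be computed from a noise-predicting network, provided the manuscript states this conversion explicitly.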
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper introduces a novel approach for CT super-resolution using a dual-stream diffusion model framework. Compared with previous work, the super-resolution outputs are optimized not only by image-space losses but also by structure-space losses to preserve topology and structure fidelity. This innovation is interesting. However, the slow inference speed, the larger number of network parameters, the absence of ablation experiments, and various minor writing mistakes make this a weak accept.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #5

  • Please describe the contribution of the paper

    The paper presents a method for super resolution in CT imaging. The method is based on a diffusion model. The paper presents a comparison with other state of the art methods and is accompanied with useful data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed method appears to outperform the state of the art (Figure 3 and Table 1).
    2. The extension to higher level tasks (Figure 4) allows the reader to assess the possible clinical impact of the method.
    3. The authors indicate that the source code and data will be made publicly available. If this is the case then the paper will provide a useful resource for other researchers.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The comparison with other methods (Table 1) lacks any tests of statistical significance. These should be included.
    2. The authors claim in the text (p. 8) that “The performance of these high level tasks on the SR results is comparable to or even better than that on the ground-truth CT”. This claim is not well supported by Figure 4. For lung nodule detection in particular, the average accuracy looks slightly lower for the SRCT method. So you need to either revise your claim or develop your analysis of the results further.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have indicated that they will publish source code and data. In which case I would expect that the work will be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. In general the paper reads well and, I believe, presents a good description of the methods. The authors should attempt to address the main weaknesses of the paper (1 and 2) listed above.
    2. It would be good to see an expansion of the section on extension to higher-level tasks to better understand the potential clinical utility of the method. As a start you could add the LRCT to the comparison in Figure 4 to get a better understanding of the benefits (or otherwise) of SRCT.
    3. Methods that rely on inferring structure from a noisy image and learnt knowledge of similar structures are going to be vulnerable to structural outliers. So whilst they perform well on average (Fig. 4), are there any examples of failure cases in the data from which we could learn?
    4. Possibly related to point 3: from Fig. 4 it appears that the ground-truth CT results in larger spreads. Could you add some discussion of why this is? Is this because your SRCT method tends to produce images where unusual anatomy is removed? Could you have a look at the outliers from Fig. 4 and comment on this?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See points 1 and 2 from the main weaknesses section. The authors need to build on the analysis of their results.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper is generally well-written and structured, with clear descriptions of the contributions and a novel approach to CT scan super-resolution using a dual-stream diffusion model framework. The proposed model aims to preserve structural information and employs an imaging enhancement operator and optimized convolutional layers for faster back-propagation. The authors constructed an in-house ultra-high-resolution CT (UHRCT) dataset for training and testing, which is significant for training models that achieve ultra-high super-resolution. Extensive experiments were conducted, comparing the proposed model to state-of-the-art methods, and the results demonstrated better structure fidelity. Additional experiments on vascular segmentation and lesion detection showed comparable outcomes to the ground truth images. However, the reviewers raised several concerns, including the need for further explanation of the methodology, statistical significance analysis, and clarity in figures. They also suggested conducting ablation experiments to validate certain improvements and addressing inconsistencies in equations and symbols. The computational complexity and performance of the image enhancement operator should be discussed, and references to figures should be included in the text. Furthermore, the reasons for the “mushy” appearance in Figure 3 and the effects of preserving topology and structure fidelity should be clarified. The paper should provide more explanation regarding the performance of high-level tasks on super-resolved CT scans and address weaknesses such as the lack of statistical significance analysis and failure cases. The reviewers recommend expanding the section on extension to higher-level tasks and discussing the differences in spreads between ground truth CT and SRCT results in Figure 4, particularly related to the removal of unusual anatomy and outliers. 
    By addressing these concerns and incorporating additional explanations and statistical analysis, the paper can be further improved.

    The work might be accepted if the modifications suggested above are made.




Author Feedback

We express our gratitude for this valuable opportunity to refine our manuscript. The comprehensive and constructive feedback provided by the reviewers is greatly appreciated. Nevertheless, we have identified some misinterpretations or inaccuracies in the reviewers’ comments. We provide further clarification as follows.

  1. The high-resolution images in Figure 3 appear somewhat “mushy”. (Meta&R4) The “mushy” appearance of the vessels can be attributed to the window width and window position of the lung window. However, this effect is precisely what we aim to showcase in this paper: the image enhancement techniques we employ are capable of highlighting and modeling this structural information. Moreover, the topology preservation technique can effectively preserve and recover these features in the resulting super-resolution images.

  2. The paper should provide more explanation regarding the performance of high-level tasks on super-resolved CT scans: (Meta&R4&R5) We would like to emphasize that we conducted a very detailed analysis of the results, including a thorough comparison of the different algorithms and an investigation into the reasons for the outliers. However, due to space constraints, we could not include all of these aspects in the article. We note that the SR technique has the capability of denoising images. Specifically, for some methods such as the trivial UNet, denoising can reduce the uncertainty and spatial anisotropy of the image, thereby improving the performance of the task [1]. [1] Dong Z, He Y, Qi X, et al. MNet: Rethinking 2D/3D networks for anisotropic medical image segmentation[J]. IJCAI 2022

  3. Different colors are not explained in Figure 3. We have provided clear descriptions of the figure contents. Specifically, Figure 3 presents two examples of super-resolution results, and the two colors mark these two examples, respectively. The restored images are displayed in the first and third rows, while the structural features are presented in the second and fourth rows.


