
Authors

Rushdi Zahid Rusho, Qing Zou, Wahidul Alam, Subin Erattakulangara, Mathews Jacob, Sajan Goud Lingala

Abstract

Magnetic resonance imaging (MRI) of vocal tract shaping and surrounding articulators during speaking is a powerful tool in several application areas, such as understanding language disorders and informing treatment plans in oro-pharyngeal cancers. However, this is a challenging task due to fundamental tradeoffs between spatio-temporal resolution, organ coverage, and signal-to-noise ratio. Current volumetric vocal tract MR methods are either restricted to imaging during sustained sounds, or perform dynamic imaging at highly compromised spatio-temporal resolutions for slowly moving articulators. In this work, we propose a novel unsupervised deep variational manifold learning approach to recover a pseudo-3D dynamic speech dataset from sequential acquisition of multiple 2D slices during speaking. We demonstrate pseudo-3D (or time aligned multi-slice 2D) dynamic imaging at a high temporal resolution of 18 ms capable of resolving vocal tract motion for arbitrary speech tasks. This approach jointly learns low-dimensional latent vectors corresponding to the image time frames and parameters of a 3D convolutional neural network based generator that generates volumes of the deforming vocal tract by minimizing a cost function that enforces: a) temporal smoothness on the latent vectors; b) l_1 norm based regularization on generator weights; c) latent vectors of all the slices to have zero mean and unit variance Gaussian distribution; and d) data consistency with measured k-space versus time data. We evaluate our proposed method using in-vivo vocal tract airway datasets from two normal volunteers producing repeated speech tasks, and compare it against state-of-the-art 2D and 3D dynamic compressed sensing (CS) schemes in speech MRI. We finally demonstrate (for the first time) extraction of quantitative 3D vocal tract area functions from under-sampled 2D multi-slice datasets to characterize vocal tract shape changes in 3D during speech production. Code: https://github.com/rushdi-rusho/varMRI
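
The four terms of the cost function listed in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). The function names, the weights `lam_t`, `lam_w`, `lam_kl`, the choice to apply the smoothness penalty to the latent means, and the use of a KL divergence to express the zero-mean, unit-variance Gaussian constraint are all assumptions made here for illustration. `generator` and `forward_op` are hypothetical callables standing in for the CNN generator and the under-sampled k-space acquisition operator.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I), summed over entries."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

def temporal_smoothness(latents):
    """Sum of squared finite differences along the time axis (axis 0)."""
    return np.sum(np.diff(latents, axis=0) ** 2)

def l1_penalty(weights):
    """l_1 norm over a list of generator weight arrays."""
    return sum(np.sum(np.abs(w)) for w in weights)

def data_consistency(forward_op, volumes, kspace):
    """Squared error between predicted and measured k-space versus time data."""
    residual = forward_op(volumes) - kspace
    return np.sum(np.abs(residual) ** 2)

def total_cost(generator, forward_op, mu, log_var, eps, weights, kspace,
               lam_t=1.0, lam_w=1.0, lam_kl=1.0):
    # Reparameterized latent sample; eps is a standard-normal draw supplied by the caller.
    z = mu + np.exp(0.5 * log_var) * eps
    volumes = generator(z, weights)
    return (data_consistency(forward_op, volumes, kspace)   # term d)
            + lam_kl * kl_to_standard_normal(mu, log_var)   # term c), assumed KL form
            + lam_t * temporal_smoothness(mu)               # term a), assumed on the means
            + lam_w * l1_penalty(weights))                  # term b)
```

In an actual reconstruction the latent parameters and generator weights would be optimized jointly by gradient descent in an automatic-differentiation framework; the sketch only shows how the four penalties combine into one scalar.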

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16446-0_66

SharedIt: https://rdcu.be/cVRUa

Link to the code repository

https://github.com/rushdi-rusho/varMRI

Link to the dataset(s)

https://github.com/rushdi-rusho/varMRI/tree/main/SpeechDatasets


Reviews

Review #2

  • Please describe the contribution of the paper

    The authors pursue improving the spatio-temporal resolution (and image quality) of dynamic speech imaging using a manifold approach to collect 2D slices and time to arrive at a composite 3D volume over time.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The motivation to pursue unsupervised learning is well founded given that it is challenging to collect a “gold standard” dataset in such acquisitions.
    2. Preparing a “pseudo-3D” dataset seems practical and exploits k-t space rather than taking either a 2D-only or a 3D-only approach.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The challenge in assessing these reconstructions is a clear definition of the requirement of the image quality and target spatio-temporal resolution that is necessary (adequate). These are directly tied to the speech task at hand and defining these constraints might enable the algorithm to optimize better
    2. The comparison to earlier CS methods, while important to show improvement, does not represent the state-of-the-art methods for accelerated spatio-temporal acquisitions, such as dictionary (atom-based) approaches, simultaneous multi-slice methods, etc. Given the niche application, the exploration of these methods may be limited to describing them in the discussion section for the reader’s comprehension.
      1. In future experiments (or in the discussion), it would be interesting to visualize and interpret the latent vectors in line with the speech task to enable an understanding of the generator’s outputs.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility criteria have been met

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The authors have strived to improve image reconstruction quality compared to previously available compressed sensing reconstructions. A clear understanding of the target acceleration and image quality required would provide meaningful insights into algorithm development. Using a latent vector as the input to the generator, followed by the data consistency step, is intuitive. The strengths and weaknesses have been listed above.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The demonstration of the developed algorithm in vivo is appealing. The lack of a gold standard reference is a challenging yet important step to accomplish and compare.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    The authors propose an unsupervised deep variational manifold learning approach for reconstructing pseudo-3D dynamic speech MRI from sequentially acquired under-sampled (k-t) space measurements of multiple 2D slices. The proposed method shows better performance than conventional compressed sensing reconstruction methods in 2 initial subjects.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed approach does not need large-scale fully sampled pseudo-3D dynamic speech MRI for supervised training but reconstructs the image time series only from the measured under-sampled (k-t) data.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The proposed method requires subjective iterative optimization, and the number of iterations should be carefully selected to avoid overfitting. Effectiveness is only demonstrated in a limited number of cases.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have attempted to make the code publicly available, which should ensure good reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    1. According to the data acquisition parameters, is the acceleration factor of the multi-slice speech MRI 9-fold? Could the proposed method be applied at higher acceleration factors, or even to 3D accelerated speech MRI? Please comment.
    2. The reconstruction time of each method should be provided.
    3. The number of subjects recruited in the study should be clarified in Section 3.1.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    No fully sampled data is required for supervised training which is difficult to acquire for the speech MRI

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #4

  • Please describe the contribution of the paper

    In this paper, an unsupervised deep variational manifold learning was applied for temporal-spatial fast speech MRI. The experimental results show some subjective superiority of the proposed method over other TV-based methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The authors explored the speech MRI which to my knowledge is less intensively studied in fast MRI research, yet it is a meaningful task with challenges. (2) The proposed unsupervised learning spares the need for fully sampled training references. (3) The paper is well organized and presented with good image quality.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) From the perspective of method originality, this paper offers few novel ideas; it seems to apply generative manifold learning to this problem. The “Methods” section, which is the major component of this paper, is not explained and analyzed in detail. Novelty should be addressed here. (2) No objective evaluation is applied. If the ground truth is not available, some subjective rating scores could be used.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This paper is presented clearly and the model is easy to reimplement although the code is not publicized.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    (1) The model novelty should be further clarified and stressed. What is the key difference between this model and references [23, 24], except for the applications? In my opinion, the vocal tract shaping during speech has regularity, and different letters have different patterns. However, cardiac imaging shows distinct regularity. (2) In equation (1), the gradient operation is applied on both s and t, while in the text it seems only t is differentiated. (3) The authors claimed the network learns noise in the reconstruction when a certain number of epochs is reached. I wonder if it is due to the self-supervised nature of the proposed method (over-fitting). More analysis would be helpful.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The major factor is model novelty.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes an interesting work that employs unsupervised learning for spatio-temporal speech MRI reconstruction. The application is new and the unsupervised approaches have been less studied in the community. Major concerns of the work include the novelty of the methodology and the evaluation of the unsupervised approach. Please address these in the rebuttal based on reviewers’ comments.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7




Author Feedback

‘NOVELTY SHOULD BE CLARIFIED’. The main novelty in this paper is the systematic adoption of the unsupervised generative manifold scheme to realize prospective accelerated dynamic speech MRI and enable, for the first time, visualization of 3D vocal tract shaping at a high time resolution of 18 ms at 3 T. We clarify that speech MRI is substantially different from cardiac MRI, which makes it a niche application warranting dedicated investigation. First, speech MRI is significantly challenged by off-resonance artifacts due to substantial susceptibility differences at air-tissue boundaries. This is further exacerbated at the higher field of 3 T, which is why current speech MRI protocols are recommended at 1.5 T [Lingala et al., JMRI 2015]. Longer readout spirals (6-8 ms readout duration) such as those in current 1.5 T cardiac MRI are time efficient, but if used directly in speech MRI at 3 T would induce significant off-resonance induced blurring. To address this, we used extremely short readout spirals (<1.3 ms readout duration), which are less sensitive to off-resonance, but this comes at the expense of needing more spiral arms (or a larger temporal footprint) for full Nyquist sampling. To compensate, we synergistically combined acceleration capabilities via parallel imaging from a custom 16 channel airway coil, and the generative manifold scheme, which makes our acquisition scheme significantly different from previous manifold based cardiac MRI work. Second, unlike cardiac MRI, motion in speech MRI is arbitrary, and is dependent on the speech task. Different articulators will move at different rates based on the task. For instance, the tongue tip moves much more rapidly during alveolar trills compared to slower tongue body shaping during vowel to consonant transitions, and even slower velo-pharyngeal closure events.
To fully characterize the arbitrary dependence of motion on the speech task, our work used the flexibility in adjusting the number of latent vectors and empirically assessed image quality in two speech tasks from two subjects. We also note that in terms of novelty, this work is the first to demonstrate prospectively a ‘time aligned multi-slice 2D or pseudo 3D’ visualization of vocal tract shaping of arbitrary speech motion at 3 T, and demonstrates the feasibility of extracting quantitative vocal tract area functions.
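
The tradeoff described above (shorter spiral readouts need more arms for Nyquist sampling, which enlarges the temporal footprint unless the acquisition is accelerated) can be made concrete with a small arithmetic sketch. The numbers below are hypothetical: a repetition time of 6 ms, 3 arms per frame, and 27 arms for Nyquist sampling are illustrative values chosen only so that the result matches the 18 ms time resolution stated in the abstract and the 9-fold figure Review #3 asks about. They are not taken from the paper's protocol, and the function names are assumptions.

```python
def temporal_footprint_ms(arms_per_frame, tr_ms):
    """Time spanned by one reconstructed frame: arms acquired per frame times TR."""
    return arms_per_frame * tr_ms

def acceleration_factor(nyquist_arms, arms_per_frame):
    """Ratio of arms needed for full Nyquist sampling to arms actually used per frame."""
    return nyquist_arms / arms_per_frame

# Hypothetical, illustrative values (not the paper's acquisition parameters).
TR_MS = 6.0
ARMS_PER_FRAME = 3
NYQUIST_ARMS = 27

fully_sampled = temporal_footprint_ms(NYQUIST_ARMS, TR_MS)   # footprint without acceleration
accelerated = temporal_footprint_ms(ARMS_PER_FRAME, TR_MS)   # footprint with under-sampling
factor = acceleration_factor(NYQUIST_ARMS, ARMS_PER_FRAME)
print(f"{fully_sampled:.0f} ms fully sampled, {accelerated:.0f} ms accelerated, {factor:.0f}-fold")
```

Under these assumed values, a fully sampled frame would span well over 100 ms, which is too slow for fast articulator motion. This is why the under-sampled data must be recovered through parallel imaging combined with the generative manifold prior.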

‘THE LACK OF A GOLD STANDARD REFERENCE’. This is a limitation of this study, and to our knowledge, is a common limitation with all prospectively accelerated 3D+time MRI sequences, where ground truth does not exist. If we have an opportunity to revise the discussion, we will include strategies being employed in the field to address this. This will include a) constructing well-defined dynamic physical phantoms (e.g. representing vocal tract shaping), which may be imaged separately with a high resolution reference method such as dynamic CT, and b) seeking image quality ratings from expert end users (e.g. linguists, radiologists) to objectively assess spatial, temporal blurring, and alias artifacts.

‘THE CHALLENGE IN ASSESSING RECONSTRUCTIONS IS A CLEAR DEFINITION OF THE REQUIREMENT OF THE IMAGE QUALITY AND TARGET SPATIO-TEMPORAL RESOLUTION THAT IS NECESSARY’. This is a great point and is within the scope of a future study. A recent review [Lingala et al., JMRI 2015] laid out the spatial and temporal resolution requirements for various speech tasks based on the consensus of several speech scientists and linguists. We will include commentary in the discussion section.

‘COMPARISON TO EARLIER CS METHODS DO NOT REPRESENT STATE-OF-THE-ART’. To the best of our knowledge, this is the first work which uses an unsupervised learning method for speech MRI. We clarify that we compared against only classical CS methods because these are the state-of-the-art for prospectively accelerated speech MRI.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The main strength of the paper lies in the proposal of an unsupervised approach for the reconstruction of dynamic speech MRI, which has not been investigated earlier but remains an interesting and important problem. The rebuttal has addressed the concern about novelty well and highlighted the differences and challenges compared to other dynamic MR imaging. Though evaluation is currently limited, the authors have provided a few future directions for tackling this. Overall, the strengths of the work outweigh its limitations, and it is suggested for acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The research provides an intriguing effort that uses unsupervised learning for spatiotemporal speech MRI reconstruction. The application is new, and unsupervised techniques have received less attention in the community. The uniqueness of the methodology and the assessment of the unsupervised approach are major issues of the paper as mentioned by the reviewers. However, I agree with the authors that the method is novel, the study is prospective and future follow up clinical evaluations can be done. I think the paper should be accepted.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Overall, the authors have carefully addressed the main concerns of the reviewers, including clarifying the novelty of the proposed methodology, and the experimental evaluation of results without having a gold standard. With the authors’ commitment to incorporate reviewers’ feedback (especially the negative comments from R4) in their revised manuscript, I recommend accepting this paper without further questions.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5


