Merging acquisition and processing of cineMRI of the vocal tract

Scientific environment and context


Tracking the position of speech articulators over time is crucial to better understand speech production. For a long time, X-ray imaging was the only technology able to acquire images at a sufficiently high sampling frequency (around 50 images per second) to visualize articulatory gestures. However, this technique was abandoned at the end of the 1980s because of the health hazards of ionizing radiation. Furthermore, since the whole vocal tract is projected onto the image plane, the contours of organs (especially the mandible, teeth and tongue) overlap on the images, which makes image processing very difficult.

The main advantage of Magnetic Resonance Imaging (MRI) is that it provides excellent soft-tissue contrast for a slice placed in any orientation. Dynamic MRI is acknowledged as a powerful tool for imaging speech production [9]. It can also make soft-tissue structures such as the pharyngeal cavity and the internal musculature visible, as well as the articulatory dynamics, in the form of arbitrarily oriented tomographic slices. These capabilities have already been demonstrated in several studies of speech articulator movements [11, 2, 1], in studies of various singing styles [10, 7], and in methodological work on improving spatio-temporal resolution through sparsity (parsimony) constraints.

However, the current performance of cineMRI remains inadequate in terms of sampling rate and spatial resolution. The objective of this thesis is to develop more efficient acquisition protocols and algorithms.

Scientific environment

The IADI and LORIA laboratories have maintained a close and fruitful collaboration for years, which in particular resulted in the development of a "compressed sensing" acquisition algorithm and in a research contract on articulatory synthesis.

A working environment covering both articulatory modeling and MRI data acquisition is now available and offers very favorable conditions for this work. Having published 22 articles on MRI in international peer-reviewed journals and co-supervised 8 theses, Dr. Pierre-André Vuissoz will pass on his expertise in moving-organ imaging to the candidate. Expertise in vocal tract modeling will be provided by Dr. Yves Laprie, research director at CNRS and leader of the ArtSpeech national project dedicated to articulatory speech synthesis.

Socio-economic partnership

The acquisition of dynamic images of the vocal tract is an extremely important topic from the standpoints of MRI technology and medicine, and obviously from that of applications.

Beyond the study of speech, the advances enabled by this work concern dynamic physiological phenomena (respiration, heartbeat, swallowing, etc.) whose diagnosis requires dynamic acquisition. For language disorders, dynamic imaging makes it possible to evaluate the nature and extent of the articulatory deficiency, and helps in preparing surgical interventions so as to limit or control their impact on speech articulation. The development and evaluation of MRI acquisition protocols is a specific strength of the IADI laboratory. The work conducted in this thesis will probably lead to industrial applications and to cooperation with the manufacturer of the new MRI machines that the Nancy hospital is currently purchasing. It is too early to give more details, since the purchase of these machines is not finalized and their installation will start in September 2016.

Talking heads represent the second field of dissemination. They are increasingly used in pedagogical settings, including language acquisition. However, for the rehabilitation of articulatory gestures, it is important that the visualization closely matches the gestures of a human speaker. So far, most talking heads that incorporate a vocal tract use caricatured gestures, which are of little help to users. The construction of a realistic articulatory and animation model relies on determining geometric deformation modes from dynamic data covering the whole vocal tract. Dynamic MRI is the only technique that allows this kind of data to be collected. We already have contacts with a small company developing speech therapy software, and we intend to find industrial partners in the domain of language learning.



The objective is to develop protocols that exploit the latest advances in MRI, particularly parallel imaging [8] and reconstruction under sparsity (parsimony) constraints, known as "compressed sensing" [5]. The IADI laboratory has developed MRI reconstruction techniques with motion compensation [6] and multi-slice dynamic reconstruction enabling super-resolution. These techniques have already been applied to cardio-respiratory motion, using physiological signals (ECG and respiratory) as constraints in the reconstruction algorithms.

A first preliminary study applying these techniques to speech production has been carried out and was reported at a workshop dedicated to the use of MRI for studying motion [12]. A second study was dedicated to the development of an acquisition protocol based on "compressed sensing". The idea is to exploit the sparsity (parsimony) of the image in a suitable transform domain so as to acquire only a small number of its Fourier coefficients, and then to reconstruct the image in an optimal manner.
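To make the compressed-sensing idea concrete, here is a toy sketch in Python with NumPy. It is not the acquisition/reconstruction algorithm developed at IADI and LORIA. It assumes a 1D signal that is sparse in its own domain (real cineMRI would rely on a sparsifying transform such as wavelets or a temporal transform), random selection of 25% of the Fourier coefficients, and plain iterative soft-thresholding (ISTA) as the reconstruction method. The function names `soft_threshold` and `ista_cs` are illustrative only.

```python
import numpy as np

def soft_threshold(z, t):
    """Complex soft-thresholding: shrink magnitudes by t, keep phases."""
    mag = np.abs(z)
    return z * np.maximum(mag - t, 0.0) / np.maximum(mag, 1e-12)

def ista_cs(y, mask, lam=0.02, n_iter=2000):
    """Reconstruct x from undersampled Fourier data y = mask * FFT(x).

    Minimizes 0.5 * ||mask * F x - y||^2 + lam * ||x||_1 with ISTA.
    With an orthonormal FFT and a 0/1 mask, the operator norm is <= 1,
    so a unit gradient step is admissible.
    """
    x = np.zeros_like(y)
    for _ in range(n_iter):
        residual = mask * np.fft.fft(x, norm="ortho") - y   # data mismatch
        grad = np.fft.ifft(mask * residual, norm="ortho")   # adjoint operator
        x = soft_threshold(x - grad, lam)                   # sparsity prior
    return x

# Toy problem: 8 non-zero samples out of 256, only 64 Fourier samples kept.
rng = np.random.default_rng(0)
n, k = 256, 8
x_true = np.zeros(n, dtype=complex)
x_true[rng.choice(n, k, replace=False)] = 1.0 + rng.random(k)

mask = np.zeros(n)
mask[rng.choice(n, n // 4, replace=False)] = 1.0
y = mask * np.fft.fft(x_true, norm="ortho")

x_rec = ista_cs(y, mask)
x_zero_filled = np.fft.ifft(y, norm="ortho")   # naive inverse, for comparison

err_rec = np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true)
err_zf = np.linalg.norm(x_zero_filled - x_true) / np.linalg.norm(x_true)
print(f"relative error, ISTA: {err_rec:.3f}, zero-filled: {err_zf:.3f}")
```

The zero-filled inverse FFT spreads the missing coefficients as aliasing over the whole signal, whereas the sparsity-constrained reconstruction removes most of it from the same 25% of the data.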

However, it is possible to do better, since the speech signal is acquired simultaneously with an optical microphone and then denoised (using source separation techniques developed by the LORIA MultiSpeech team) before being segmented into speech sounds. Each line acquired in the image Fourier space can therefore be associated with the speech sound being produced when it was acquired, and this information can be exploited to improve the resolution of the reconstructed images. This is the first idea that will be explored, with the objective of producing a proof of concept of automatic acquisition and reconstruction of MRI images of the vocal tract during speech production.
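The association between Fourier-space lines and speech sounds can be sketched as a simple timestamp lookup. The sketch below is hypothetical: it assumes each acquired line carries an acquisition time and a phase-encoding index `ky`, that the audio segmentation yields sorted, non-overlapping intervals labelled with speech sounds, and that times are in milliseconds. The names `Segment`, `KLine`, `label_lines` and `coverage_by_sound`, the 5 ms repetition time and the line ordering are all illustrative.

```python
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    start: float  # ms, inclusive
    end: float    # ms, exclusive
    label: str    # speech sound from the audio segmentation

@dataclass(frozen=True)
class KLine:
    time: float   # acquisition time of the Fourier-space line (ms)
    ky: int       # phase-encoding index of the line

def label_lines(lines, segments):
    """Attach to each line the speech sound produced at its acquisition time.

    segments must be sorted by start and non-overlapping; lines that fall
    outside every segment get the label None.
    """
    starts = [s.start for s in segments]
    labelled = []
    for line in lines:
        i = bisect_right(starts, line.time) - 1
        inside = i >= 0 and line.time < segments[i].end
        labelled.append((line, segments[i].label if inside else None))
    return labelled

def coverage_by_sound(labelled, n_ky):
    """Fraction of distinct phase-encoding lines acquired for each sound."""
    bins = defaultdict(set)
    for line, label in labelled:
        if label is not None:
            bins[label].add(line.ky)
    return {label: len(kys) / n_ky for label, kys in bins.items()}

# Hypothetical acquisition: one line every 5 ms, 64 phase encodings,
# visited in a scrambled order (37 is coprime with 64).
TR, N_KY = 5.0, 64
lines = [KLine(time=(j + 0.5) * TR, ky=(37 * j) % N_KY) for j in range(200)]
segments = [Segment(0.0, 300.0, "a"), Segment(300.0, 500.0, "p"),
            Segment(500.0, 900.0, "i")]

labelled = label_lines(lines, segments)
coverage = coverage_by_sound(labelled, N_KY)
print(coverage)
```

Grouping lines by sound shows which parts of Fourier space are still missing for each sound; a reconstruction can then pool lines from repetitions of the same sound, and an adaptive protocol could prioritize the missing lines.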

The exploitation of cineMRI data is also crucial, since it is inconceivable to process such a large number of images by hand. Developing algorithms to track articulators in cineMRI is the second aspect of the work. As part of a national project, a 3D MRI database of articulatory configurations is being collected in order to build an articulatory model of the vocal tract. Unlike cineMRI, these images have very good resolution and cover a wide variability of static articulatory positions. A first idea would be to use the articulatory model as a shape constraint when processing dynamic images, with the additional benefit of improving their quality by relying on the quality of the static images. We would like to go further by using knowledge of the speech sound and the approximate vocal tract shape predicted by the articulatory model to steer the acquisition.
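One common way to use a model built from static images as a shape constraint is a linear point distribution model, as in Active Shape Models: principal component analysis of static contours gives a mean shape and deformation modes, and a noisy contour extracted from a dynamic image is projected onto the first modes, with coefficients clipped to a plausible range. The following is a sketch under that assumption only, on synthetic contours; it is not the project's articulatory model, and `build_shape_model` and `constrain_contour` are hypothetical names.

```python
import numpy as np

def build_shape_model(shapes, n_modes):
    """PCA shape model from static contours.

    shapes: array (n_shapes, 2 * n_points), each row a flattened contour
    (all x coordinates, then all y coordinates).
    Returns the mean shape, the first n_modes deformation modes (rows)
    and the variance of the training shapes along each mode.
    """
    mean = shapes.mean(axis=0)
    _, s, vt = np.linalg.svd(shapes - mean, full_matrices=False)
    variances = s**2 / (len(shapes) - 1)
    return mean, vt[:n_modes], variances[:n_modes]

def constrain_contour(contour, mean, modes, variances, n_std=3.0):
    """Project a contour onto the model, clipping each mode coefficient
    to +/- n_std standard deviations of the training shapes."""
    b = modes @ (contour - mean)
    limit = n_std * np.sqrt(variances)
    return mean + modes.T @ np.clip(b, -limit, limit)

# Synthetic "static" contours: a base arc deformed by two known modes.
rng = np.random.default_rng(1)
n_points = 50
theta = np.linspace(0.0, np.pi, n_points)
base = np.concatenate([np.cos(theta), np.sin(theta)])
mode_a = np.concatenate([np.zeros(n_points), np.sin(theta)])        # raise
mode_b = np.concatenate([np.sin(2 * theta), np.zeros(n_points)])    # shift
modes_true = np.stack([mode_a, mode_b])
static_shapes = base + 0.2 * rng.normal(size=(200, 2)) @ modes_true

mean, modes, variances = build_shape_model(static_shapes, n_modes=2)

# Noisy contour, standing in for one extracted from a low-quality cine frame.
true_contour = base + 0.2 * (0.5 * mode_a - 1.0 * mode_b)
observed = true_contour + rng.normal(scale=0.05, size=true_contour.shape)
constrained = constrain_contour(observed, mean, modes, variances)

print(np.linalg.norm(observed - true_contour),
      np.linalg.norm(constrained - true_contour))
```

Because the projection keeps only deformations seen in the static data, most of the noise in the dynamic contour is discarded; the clipping step prevents implausible shapes when the observation is badly corrupted.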

The success of this PhD work largely depends on the joint use of these two ideas. Obtaining cineMRI whose spatial and temporal resolution stands out from other approaches would be a remarkable achievement for research in speech production.

This is a highly multidisciplinary work requiring close collaboration between specialists in automatic speech processing, articulatory modeling and MRI reconstruction algorithms.

The candidate is expected to have very good knowledge of the design and implementation of numerical algorithms applicable to image reconstruction from MRI data and to speech processing, and of course a very good mastery of Matlab.


Yves Laprie (MultiSpeech team at LORIA – Laboratoire Lorrain de Recherche en Informatique et ses Applications, UMR 7503)
Pierre-André Vuissoz (IADI – Imagerie Adaptative Diagnostique et Interventionnelle, unité INSERM U947)

Both supervisors belong to the IAEM Doctoral School and hold the habilitation to supervise doctoral research.

How to apply

To prepare a PhD thesis within the Lorraine Université d'Excellence Program, interested candidates should consult the PhD topics offered under each of the social and economic challenges.
These PhD thesis topics are proposed by faculty members or researchers accredited to supervise research.

Candidate application period: according to the graduate school schedule (see each topic)
Each candidate may submit an application on up to three separate research topics.

Application review period (by each graduate school)
The graduate school reviews the applicants for a doctoral contract in the relevant disciplines. It checks the supervision load of each supervisor and the situation of the doctors they have already trained. Each candidate will meet the laboratory director, the supervisor and a representative of the graduate school. The purpose of this interview is to assess the candidate's motivation and suitability for the PhD project proposed by the supervisor. A recommendation, summarizing the strengths and/or weaknesses of the application, will then be made to the graduate school.

PhD grants will include a monthly salary for the PhD student (roughly 1700 € for research only; a supplement can be provided for teaching duties) and a research environment within the research unit.

Please be aware that, in order to offer a variety of subjects, more positions are posted here than can be funded. The LUE executive committee will make the final choice of funded positions (up to 12), based on the recommendations of the doctoral schools.