“Perceptuo-motor relationships in speech communication”
- Workshop
organized by J.L. Schwartz, A.L. Giraud, A. Rochet-Capellan & P. Perrier
University of Geneva, Biotech Campus
January 30th to February 1st, 2018
It is now widely accepted that perceptuo-motor relationships are functionally required for both speech perception and speech production. On the one hand, auditory – and more generally sensory – specification of the task is essential for defining targets and providing feedback in the course of speech motor control. On the other hand, motor knowledge seems to play an important role in the processing of speech inputs, particularly in adverse conditions. However, these two facets of speech communication remain largely separated across research teams, meetings, and conferences. The present workshop will put the perceptuo-motor relationship at the heart of the discussions, bringing together specialists in speech processing, speech motor control, and speech neurocognitive architectures. The aim will be to discuss common processes, representations, and circuits, associating experts in speech, brain, and computation.
Final Program
Tuesday, Speech
9h00-9h30    Introduction
9h30-10h30   Modelling the role of sensory feedback in speaking with state feedback control (John Houde)
10h30-11h00  Coffee break
11h00-12h00  Altering the motor-sensory loop to understand speech motor control (Pascal Perrier & Amélie Rochet-Capellan)
12h00-13h30  Lunch
13h30-14h30  Lateralized sensory processing during speech production (Christian A. Kell)
14h30-15h30  Reaching goals with limited means: Production-perception relationships in typically developing children and sensory-deprived children (Lucie Ménard)
15h30-16h45  Posters and coffee break
16h45-17h30  Discussion 1 (Production)

Wednesday, Speech
9h00-10h00   A core speech circuit between primary motor, somatosensory, and auditory cortex: Evidence from connectivity and genetic descriptions (Jeremy I. Skipper)
10h00-11h30  Posters and coffee break
11h30-12h30  COSMO, a Bayesian computational model of perceptuo-motor interactions in speech communication (Jean-Luc Schwartz & Julien Diard)
12h30-14h00  Lunch
14h00-15h00  Listening to speech induces coupling between auditory and motor cortices in an unexpectedly rate-restricted manner (David Poeppel)
15h00-16h00  Early language development: The emergence of the production-perception link? (Judit Gervain)
16h00-16h30  Coffee break
16h30-17h30  Discussion 2 (Perception)

Thursday, Time and neural processes
9h00-10h00   Common cortico-subcortico-cortical ground for action and perception - Considerations for speech processing (Sonja A. Kotz)
10h00-10h30  Coffee break
10h30-11h30  Motor origin of temporal predictions in auditory attention (Benjamin Morillon)
11h30-13h00  Lunch
13h00-14h00  Speech processing in auditory cortex with and without oscillations (Anne-Lise Giraud)
14h00-15h00  Discussion 3 (Time and neural processes)
15h00-15h30  Coffee break
15h30-16h00  Final discussion

[Possible post-sessions ...??? Tools, corpora, methods, ...???]
Posters
Patri, Schwartz, Perrier, Diard: What drives the perceptual change resulting from speech motor adaptation? Evaluation of hypotheses in a Bayesian modeling framework
Ito: Changes of somatosensory event-related potentials during speech production
Ito: Event-related potentials associated with somatosensory effect in audio-visual speech perception
Speakers
Judit Gervain
Judit Gervain works on early speech perception and language acquisition. She is particularly interested in how early perceptual abilities and experience with speech lay the foundations for the acquisition of grammar. She has been using behavioral as well as brain imaging techniques with newborns and young infants to address these questions. She currently works as a Senior Research Scientist at the Laboratoire Psychologie de la Perception (CNRS & Université Paris Descartes), Paris, France. For more information, visit: http://lpp.parisdescartes.cnrs.fr/people/judit-gervain/
Talk title
Early language development: The emergence of the production-perception link?
Talk summary
It is currently almost entirely unknown when and how the link between production and perception emerges during development. This talk will briefly review the few existing studies on how young infants might link speech production to speech perception during the first years of life. It will discuss why this issue is rarely considered when studying early language development (the developmental primacy of perception, methodological problems, etc.), and will suggest a few avenues for how this question could be addressed in young infants.
Anne-Lise Giraud
Although single-neuron spiking has important coding properties, collective neuronal behaviour, as reflected by oscillatory activity, signals phenomena reflecting the temporal/spatial integration of spiking activity. Anne-Lise Giraud explores how neural oscillations contribute to auditory processing, with an emphasis on speech. Owing to its quasi-rhythmicity, speech interacts with the oscillatory behaviour of cortical neuronal populations, which provides an interesting way to mobilize collective neural activity. Her group analyses the multiple roles of neural oscillations in speech processing, in particular in speech parsing, speech coding, code transformation and directional multiplexing.
Talk title
Speech processing in auditory cortex with and without oscillations
Talk summary
Perception of connected speech relies on accurate syllabic segmentation and phonemic encoding. These processes are essential because they determine the building blocks that we can manipulate mentally to understand and produce speech. Segmentation and encoding might be underpinned by specific interactions between the acoustic rhythms of speech and coupled neural oscillations in the theta and low-gamma band. To address how neural oscillations interact with speech, we used a neurocomputational model of speech processing generating biophysically plausible coupled theta and gamma oscillations. We show that speech could be well decoded from this purely bottom-up artificial network’s low-gamma activity, when the phase of theta activity was taken into account. Because speech is not only a bottom-up process, we set out to develop another type of neurocomputational model that takes into account the influence of linguistic predictions on acoustic processing. I will present preliminary results obtained with such a model and discuss the advantage of incorporating neural oscillations in models of speech processing.
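As a rough intuition for how coupled theta and gamma oscillations could parse speech, here is a minimal phase-oscillator sketch in Python. It is emphatically not the biophysical spiking model described in the summary: the envelope, frequencies, and coupling gain are all invented for illustration. A theta oscillator is entrained by a quasi-rhythmic "speech" envelope, and gamma amplitude is gated by theta phase, so that each theta cycle delimits one syllable-sized window of gamma-coded detail.

```python
# Toy theta-gamma sketch (NOT the spiking model from the talk).
# Assumptions: a 4 Hz sinusoidal "speech envelope", a theta phase
# oscillator entrained by that envelope, and gamma amplitude gated
# by theta phase (phase-amplitude coupling).
import numpy as np

fs = 1000.0                          # sampling rate (Hz)
t = np.arange(0.0, 2.0, 1.0 / fs)    # 2 s of simulated time

# Hypothetical quasi-rhythmic speech envelope (~4 Hz syllable rate).
env = 0.5 * (1.0 + np.sin(2.0 * np.pi * 4.0 * t - np.pi / 2.0))

theta_f, gamma_f = 4.5, 40.0         # intrinsic frequencies (Hz), assumed
k = 10.0                             # entrainment gain (rad/s), assumed

# Theta: phase advances at its intrinsic rate and is nudged by the
# envelope so that cycle boundaries align with syllable onsets.
phi = np.zeros_like(t)
for i in range(1, len(t)):
    dphi = 2.0 * np.pi * theta_f - k * env[i] * np.sin(phi[i - 1])
    phi[i] = phi[i - 1] + dphi / fs

# Gamma: fast oscillation whose amplitude is gated by theta phase;
# phonemic detail would be encoded here, read out per theta cycle.
gamma = 0.5 * (1.0 + np.cos(phi)) * np.sin(2.0 * np.pi * gamma_f * t)

n_cycles = int((phi[-1] - phi[0]) // (2.0 * np.pi))
print(f"theta cycles in 2 s: {n_cycles} (~{n_cycles / 2.0:.1f} per second)")
```

In this cartoon, decoding gamma activity relative to theta phase, as in the abstract, amounts to reading out `gamma` within the windows delimited by successive wraps of `phi`.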
John Houde
Dr. Houde studies the role of feedback in speech production and the neural substrate of sensorimotor integration in speech. He did his graduate studies in the Department of Brain and Cognitive Sciences at MIT. For his PhD, he developed a real-time audio feedback alteration apparatus and examined how speakers respond to real-time perturbations of the formants of their ongoing speech. Since then, Dr. Houde has been in the Department of Otolaryngology at the University of California, San Francisco (UCSF), where he has studied the neural substrate of speech production, especially the role of auditory feedback in the process. He is currently head of the UCSF Speech Neuroscience Lab.
Talk title
Modelling the role of sensory feedback in speaking with state feedback control
Talk summary
An important part of understanding the control of speaking is determining how sensory feedback is processed. The role of sensory feedback in speaking suggests a paradox: it need not be present for intelligible speech production, but if it is present, it needs to be correct, or speech output will be affected. Here we show that a model of the control of speaking based on state feedback control can account not only for what is known about the behavioral role of sensory feedback in speaking, but also for many of our recent findings about neural responses to auditory feedback.
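For readers unfamiliar with state feedback control, the Python sketch below gives a deliberately stripped-down illustration of the idea: the controller never acts on raw sensory feedback, but on an internal state estimate that is predicted from an efference copy of the motor command and corrected by the sensory prediction error. This is a minimal scalar sketch under invented dynamics and gains, not the actual model presented in the talk (which, among other things, must handle substantial feedback delays).

```python
# Minimal state feedback control (SFC) sketch for a scalar "formant"
# state. All dynamics, noise levels, and gains are assumptions for
# illustration; the model in the talk is richer (and handles sensory
# delays, omitted here for brevity).
import numpy as np

rng = np.random.default_rng(0)

a, b, c = 0.95, 0.1, 1.0   # assumed plant: x' = a*x + b*u, y = c*x + noise
q, r = 1e-4, 1e-2          # process / sensory noise variances (assumed)
target = 1.0               # auditory target (normalized formant value)

x = 0.0                    # true vocal-tract state
x_hat, p = 0.0, 1.0        # internal estimate of the state, and its variance

for step in range(100):
    # Controller acts on the ESTIMATED state, not on raw feedback:
    # pick the command that would place the estimate on target.
    u = (target - a * x_hat) / b

    # True plant evolves; sensory feedback is noisy.
    x = a * x + b * u + rng.normal(0.0, np.sqrt(q))
    y = c * x + rng.normal(0.0, np.sqrt(r))

    # Observer: predict the next state from an efference copy of u,
    # then correct the prediction with the sensory prediction error
    # (a scalar Kalman update).
    x_pred = a * x_hat + b * u
    p_pred = a * a * p + q
    K = p_pred * c / (c * c * p_pred + r)
    x_hat = x_pred + K * (y - c * x_pred)
    p = (1.0 - K * c) * p_pred

print(f"state {x:+.3f}, estimate {x_hat:+.3f}, target {target:+.3f}")
```

Note that if `y` were artificially shifted, the observer would fold the shift into its estimate and the controller would compensate in the opposite direction, which is one way this family of models accounts for responses to altered auditory feedback.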
Christian A. Kell
Christian Kell studied medicine at Goethe University Frankfurt, did a postdoc in cognitive neuroscience at the Ecole Normale Supérieure in Paris (with Anne-Lise Giraud) and was an independent group leader (Emmy Noether fellow) at the Department of Neurology at Goethe University Frankfurt. He now co-directs the Brain Imaging Center Frankfurt and the Epilepsy Center Rhine-Main, and is a consultant in neurology at Goethe University. His research focuses on the lateralization of brain function, structure-function relationships and auditory-motor interactions.
Talk title
Lateralized sensory processing during speech production
Talk summary
It is general knowledge that, typically, only the left half of the brain, and not the right, is capable of producing speech. This capacity has been linked to tight interactions between left frontal production and left temporal perceptual systems. When sensory feedback is externally perturbed during speech, the right hemisphere shows activity increases. This has been interpreted as empirical evidence for a left-hemispheric specialization in feedforward control of speech production and a right-hemispheric contribution to auditory and somatosensory feedback control. Such a neural organization would represent an unusual architecture for a closed-loop system if sensorimotor interactions are assumed to occur primarily within the same hemisphere. Using brain imaging studies during overt and covert speech, as well as behavioral studies in which auditory feedback is modulated dichotically, we show that the left hemisphere is sensitive to temporal changes in the auditory feedback, while the right hemisphere preferentially processes the spectral content of the auditory feedback signal. During speaking, somatosensory feedback processing is biased to the left hemisphere. While somatosensory feedback carries information on place of articulation, processing of auditory feedback in the left temporal lobe involves analyses of phonetic detail as finely grained as consonant voicing. Our results identify temporal and spectral sensory processing as determinants of functional lateralization, instead of the previously proposed dichotomy between feedforward and feedback control.
Sonja A. Kotz
Sonja A. Kotz is a cognitive, affective, and translational neuroscientist who investigates the role of prediction in multimodal domains (perception, action, communication, music) in healthy and clinical populations, using behavioural and modern neuroimaging techniques (E/MEG, s/fMRI). She holds a Chair in Translational Cognitive Neuroscience at Maastricht University in the Netherlands, is a Research Associate at the Max Planck Institute for Human Cognitive and Brain Sciences in Leipzig, Germany, and holds multiple honorary positions and professorships (Manchester & Glasgow Universities, UK; Leipzig University, Germany; Georgetown University, Washington D.C., USA; BRAMS, Montreal, Canada). She is currently the President of the European Society for Cognitive and Affective Neuroscience, works for multiple funding agencies in Europe, including the ERC, and serves as a senior editor for leading journals in the field of cognitive neuroscience.
Talk title
Common cortico-subcortico-cortical ground for action and perception - Considerations for speech processing
Talk summary
While the role of forward models constituting cross-communication between cortical and subcortical areas is well established in the motor domain, there is now also recent evidence that sensory encoding of time in basic and more complex auditory stimuli (e.g. sound and speech) similarly engages cortico-subcortico-cortical circuitry (Kotz & Schwartze, 2010; Schwartze & Kotz, 2013). For example, animal studies indicate that crossed cortico-cerebellar pathways, originating in superior posterior temporal regions, project towards paravermal cerebellar areas (e.g. Schmahmann et al., 1991). However, only a few studies have considered posterior temporo-cerebellar effective connectivity in humans (e.g. Pastor et al., 2006, 2008). Considering the functional relevance of a temporo-cerebellar-thalamo-cortical circuitry that aligns with well-known cerebellar-thalamo-cortical connectivity patterns in the motor domain (e.g. Ramnani, 2006), one may consider that cerebellar computations apply similarly to temporally coded basic and complex auditory information: (i) they simulate cortical information processing, (ii) cerebellar-thalamic output may provide a possible source for internally generated cortical activity that predicts the outcome of cortical information processing in cortical target areas, and (iii) they possibly provide a temporal frame for cortical temporal information processing (Knolle et al., 2013; Kotz et al., 2015). I will discuss our current conceptual thinking (Kotz & Schwartze, 2016; Schwartze & Kotz, 2016) as well as new empirical data in support of these considerations, and present an extended cortico-subcortical network involved in the temporal processing of basic and complex auditory information such as speech.
Lucie Ménard
Lucie Ménard is a full professor of phonetics at the Department of Linguistics of the Université du Québec à Montréal (Canada), associate professor at the School of Communication Sciences and Disorders of McGill University, and co-director of the Center for Research on Brain, Language, and Music in Montreal. She received her PhD in speech sciences from the Université Stendhal (Grenoble) and did her doctoral research at the Institut de la Communication Parlée (now GIPSA-lab). She also completed a postdoctoral fellowship in theoretical phonology at the Université du Québec à Montréal. Her research focuses on the development of speech production and perception in sensory-deprived populations.
Talk title
Reaching goals with limited means: Production-perception relationships in typically developing children and sensory-deprived children
Talk summary
When learning a language, children face major challenges. They must cope with anatomical changes in the vocal apparatus, refinements of the perceptual system, motor control development, social development, cognitive and neuronal maturation, etc. Although most of the vowels and consonants in a child’s language can be produced by age four and understood by peers, changes continue to occur in production and perception into early adolescence. In this presentation, we will discuss the emergence and refinement of production-perception relationships through a series of studies conducted with typically developing children and sensory-deprived children (deaf or blind children). Acoustic, kinematic, and perceptual data collected in contexts representing various degrees of saliency requirements will be presented. We will show how sensory templates built from impoverished input influence production strategies.
Benjamin Morillon
Benjamin Morillon is a cognitive neuroscientist interested in auditory neurophysiology and how information is sequentially encoded in the human brain. He investigates the role of slow cortical oscillations as instruments of sensory selection, and also studies the influence of the motor system on auditory perception and its close interdependency with temporal attention. His domain of expertise encompasses brain imaging, advanced signal processing and psychophysics.
Talk title
Motor origin of temporal predictions in auditory attention
Talk summary
Temporal predictions are fundamental instruments for facilitating sensory selection, allowing humans to exploit regularities in the world. It is proposed that the motor system instantiates predictive timing mechanisms, helping to synchronize temporal fluctuations of attention with the timing of events in a task-relevant stream. I will present a neurophysiological account of this theory in a paradigm where participants track a slow reference beat while extracting auditory target tones delivered on-beat and interleaved with distractors. At the behavioral level, I will show that overt rhythmic movements sharpen the temporal selection of auditory stimuli, thereby improving performance. Capitalizing on magnetoencephalography recordings, I will provide evidence that temporal predictions are reflected in beta-band (~20 Hz) energy fluctuations in sensorimotor cortex and modulate the encoding of auditory information in bilateral auditory and fronto-parietal regions. Together, these findings are compatible with Active Sensing theories, which emphasize the prominent role of motor activity in sensory processing.
Pascal Perrier & Amélie Rochet-Capellan
Pascal Perrier is a professor at Grenoble-INP, where he teaches signal processing and speech processing, and a researcher at GIPSA-lab. His research aims at a better understanding of the control of speech production in adults and children, in relation to the constraints imposed by the language and in interaction with speech perception processes. To this end, he combines experimental work, in particular perturbation paradigms, with modeling work, in particular the design and use of biomechanical models of the orofacial sphere, in order to better disentangle, in speech signals, what emerges from the physics of the production system and what is explicitly controlled by the central nervous system.
Amélie Rochet-Capellan is a full-time researcher at CNRS in the GIPSA Laboratory in Grenoble, France. She has a multidisciplinary training in cognitive, computer and language sciences. She worked as a post-doc on speech motor control at McGill University in Montreal and on breathing at ZAS in Berlin. Her research focuses on the link between movements and cognition. She connects linguistics and motor control to address the properties of speech, hand and breathing movements, and their relationships to language and communication. She is particularly interested in speech adaptation and learning as an empirical path towards language understanding, and as a way to improve speech rehabilitation in specific populations such as speakers with Down syndrome.
Talk title
Altering the motor-sensory loop to understand speech motor control.
Talk summary
Basic issues concerning speech motor control relate to the sensory and temporal specification of motor goals, and to the way humans learn to produce them and adapt them to variable production conditions. Observing speech development in children is certainly a way to tackle these issues, though with the limitation that experimental observations are also likely to reflect language development. An alternative approach is to induce adaptation and learning in adults who already master their language. This can be done using perturbation paradigms, in which the link between motor commands and their sensory consequences is altered. In this presentation we will review a number of experimental results using such perturbation paradigms, carried out in our group or taken from the literature, suggesting that (1) speech production is primarily guided by acoustic specifications; (2) somatosensory feedback introduces important constraints on speech motor control for consonants and for vowels, and can influence the achievement of goals in the auditory domain and even alter them; (3) speech motor learning involves the generation of local models of the relation between motor commands and multi-modal sensory inputs; (4) speech motor learning is influenced by language (phonological and lexical) units.
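As a schematic illustration of point (3), the Python sketch below simulates a bare-bones auditory perturbation experiment: a scalar "formant" is produced through a learned internal model of the motor-to-acoustic mapping, the experimenter shifts the heard output mid-experiment, and the internal model is updated locally from the auditory error. Every quantity and gain here is invented; this is a cartoon of the paradigm, not any specific published model.

```python
# Cartoon of an auditory perturbation-adaptation loop. Everything is
# schematic and assumed: a scalar "formant" produced from a motor
# command through an internal model that is updated from auditory error.
import numpy as np

true_map = lambda u: 0.8 * u          # real motor-to-acoustic mapping
target = 1.0                          # auditory target
w = 1.0                               # learned internal-model gain
lr = 0.3                              # learning rate (assumed)
shift = 0.0                           # experimenter's acoustic shift

for trial in range(60):
    if trial == 20:
        shift = -0.2                  # perturbation switched on
    u = target / w                    # command from inverse internal model
    heard = true_map(u) + shift       # perceived (possibly shifted) output
    err = heard - target              # auditory error on this trial
    # Local update of the internal model from the error it just made.
    w += lr * err * u / max(u * u, 1e-9)
    if trial in (0, 19, 20, 59):
        print(f"trial {trial:2d}: heard {heard:+.3f} (target {target:+.3f})")
```

The printed trials show the characteristic signature of such paradigms: the heard output converges to the target, drops when the shift is introduced, and recovers as the speaker adapts in the direction opposing the perturbation.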
David Poeppel
David Poeppel is the Director of the Department of Neuroscience at the Max Planck Institute for Empirical Aesthetics (MPIEA) in Frankfurt, Germany, and a Professor of Psychology and Neural Science at NYU. Trained at MIT in cognitive science, linguistics, and neuroscience, Poeppel did his post-doctoral training at the University of California San Francisco, where he focused on functional brain imaging. Until 2008, he was a professor at the University of Maryland, College Park, where he ran the Cognitive Neuroscience of Language laboratory. He has been a Fellow at the Wissenschaftskolleg (Institute for Advanced Study, Berlin) and the American Academy in Berlin, and a guest professor at many institutions. He is a Fellow of the American Association for the Advancement of Science.
Talk title
Listening to speech induces coupling between auditory and motor cortices in an unexpectedly rate-restricted manner
M. Florencia Assaneo, David Poeppel
Talk summary
The relation between perception and action remains a fundamental question for neuroscience. In the context of speech, existing data suggest an interaction between auditory and speech-motor cortices, but the underlying mechanisms remain incompletely characterized. We fill a basic gap in our understanding of the sensorimotor processing of speech by examining the synchronization between auditory and speech-motor regions over different speech rates, a fundamental parameter delimiting successful perception. First, using MEG we measure synchronization between auditory and speech-motor regions while participants listen to syllables at various rates. We show, surprisingly, that auditory-motor synchrony is significant only over a restricted range and is enhanced at ~4.5 Hz, a value compatible with the mean syllable rate across languages. Second, neural modeling reveals that this modulated coupling plausibly emerges as a consequence of the underlying neural architecture. The findings suggest that the auditory-motor interaction should be interpreted rather conservatively when considering phase space.
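A toy way to see how rate-restricted coupling can fall out of oscillator dynamics alone: in the Python sketch below, a single phase oscillator with an intrinsic frequency of ~4.5 Hz (standing in for speech-motor cortex) is weakly driven at different "syllable rates", and phase locking survives only in a narrow band around the intrinsic rate. This is a generic Adler-type phase model with invented parameters, not the neural model from the talk.

```python
# Toy Adler-type phase-locking sketch (not the neural model from the
# talk): an oscillator with intrinsic frequency f0 ~ 4.5 Hz, weakly
# driven at different syllable rates. High phase-locking values (PLV)
# occur only within a narrow band around f0: coupling is rate-restricted.
import numpy as np

fs, dur = 1000.0, 20.0
t = np.arange(0.0, dur, 1.0 / fs)
f0, k = 4.5, 3.0   # intrinsic frequency (Hz) and coupling (rad/s), assumed

for f_drive in [3.0, 4.0, 4.5, 5.0, 6.0]:
    psi = 2.0 * np.pi * f_drive * t      # stimulus (syllable) phase
    phi = np.zeros_like(t)
    for i in range(1, len(t)):
        dphi = 2.0 * np.pi * f0 + k * np.sin(psi[i - 1] - phi[i - 1])
        phi[i] = phi[i - 1] + dphi / fs
    # Phase-locking value between driver and oscillator (drop transient).
    n0 = int(5.0 * fs)
    plv = np.abs(np.mean(np.exp(1j * (psi[n0:] - phi[n0:]))))
    print(f"drive {f_drive:3.1f} Hz -> PLV {plv:.2f}")
```

In this model the locking band is set by the ratio of coupling strength to frequency mismatch, so synchrony peaks at the intrinsic rate and collapses away from it, loosely mirroring the ~4.5 Hz enhancement reported in the summary.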
Jean-Luc Schwartz & Julien Diard
Jean-Luc Schwartz studies perceptual processing, perceptuo-motor interactions, audiovisual speech perception, the phonetic bases of phonological systems and the emergence of language, with publications in cognitive psychology, neuroscience, signal processing, computational modelling and phonetics in relation to phonology. A Research Director at CNRS, he led ICP (Institut de la Communication Parlée, Grenoble, France) from 2003 to 2006 and participated in the creation of GIPSA-lab. He is the PI of an ERC Advanced Grant called “Speech Unit(e)s - The multisensory-motor unity of speech”.
Julien Diard is interested in Bayesian algorithmic modeling of cognitive functions such as reading, writing, speech perception and production, and attentional control. In this modeling framework, models are probability distributions, which can be structured in an arbitrarily complex manner. Therefore, contrary to the current trend of Bayesian modeling as “optimal modeling”, his focus is on Bayesian models at the algorithmic level of Marr’s hierarchy. He has been a CNRS researcher at the Laboratoire de Psychologie et NeuroCognition (Grenoble, France) since 2005. He has authored more than 50 peer-reviewed publications, defended his habilitation in 2015, and has been involved in several international projects (e.g. FP6, FP7 and ERC European projects) and national projects (e.g. ANR, PIA E-FRAN); his past and current supervised students include 9 PhD and 15 MSc students.
Talk title
COSMO, a Bayesian computational model of perceptuo-motor interactions in speech communication
Talk summary
We present COSMO (Communicating Objects using Sensory-Motor Operations), a computational model for analyzing the functional role of sensory-motor interactions in speech perception and speech production. We present three properties of COSMO in speech perception, respectively called redundancy, complementarity (with the “auditory-narrowband versus motor-wideband” framework) and specificity (according to which auditory cues would be more efficient for vowel decoding, and motor cues for plosive articulation decoding). We sketch a possible neuroanatomical architecture for COSMO, and we capitalize on properties of the auditory vs. motor decoders to address various neurocognitive studies in the literature. We conclude on the interest of combining a complementary exogenous decoding system, optimally fitted to environmental stimuli, with an endogenous decoding system equipped with generative properties.
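To make the "auditory-narrowband versus motor-wideband" intuition concrete, here is a small Bayesian toy in Python, loosely inspired by COSMO's structure but with entirely invented distributions: a sharp "auditory" likelihood and a broad "motor" likelihood over a discretized sensory axis are each inverted with Bayes' rule and then fused by a product of posteriors.

```python
# Toy Bayesian fusion sketch, loosely inspired by COSMO's structure.
# All distributions are invented; this is not the published model.
# An "auditory" decoder (narrowband: sharp likelihoods) and a "motor"
# decoder (wideband: broad likelihoods, as if mediated by an internal
# production model) are fused by a product of posteriors.
import numpy as np

bins = np.arange(11)                  # discretized 1-D sensory axis
objects = ["O1", "O2"]                # two phonetic categories
prior = np.array([0.5, 0.5])          # flat prior P(O), assumed

def likelihood(mu, sd):
    """Discretized Gaussian P(S | O) over the sensory bins."""
    p = np.exp(-0.5 * ((bins - mu) / sd) ** 2)
    return p / p.sum()

aud = np.stack([likelihood(3, 0.8), likelihood(7, 0.8)])  # narrow (assumed)
mot = np.stack([likelihood(3, 2.5), likelihood(7, 2.5)])  # broad (assumed)

def decode(s, like):
    """Posterior P(O | S=s) by Bayes' rule."""
    post = prior * like[:, s]
    return post / post.sum()

s_obs = 4                             # an observed sensory value
p_aud = decode(s_obs, aud)
p_mot = decode(s_obs, mot)
p_fused = p_aud * p_mot / prior       # product fusion under a flat prior
p_fused /= p_fused.sum()

for name, p in (("auditory", p_aud), ("motor", p_mot), ("fused", p_fused)):
    print(f"{name:8s} P(O|S={s_obs}) = " +
          ", ".join(f"{o}: {v:.3f}" for o, v in zip(objects, p)))
```

In this cartoon the narrow auditory branch is decisive near its learned prototypes while the broad motor branch still orders the alternatives far from them, which is one way the redundancy and complementarity properties named in the summary can coexist.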
Jeremy I. Skipper
Face-to-face communication is accompanied by an abundance of contextual information relevant to understanding, including both sensory information external to the listener (e.g., observed mouth movements and co-speech gestures) and knowledge or expectations internal to the listener (e.g., discourse context). Most behavioral and neurobiological research on language, however, discards context in favor of studying isolated speech sounds or words. In contrast, the long-term objective of Jeremy Skipper’s research is to understand the neural mechanisms of communication in the real-world social settings in which the brain evolved, develops, and normally functions. This research is guided by a theoretical model of communication in which the brain actively makes use of context to aid speech perception and language comprehension, using this information to generate predictions about forthcoming sensory patterns that constrain linguistic interpretation. He combines novel analysis techniques with behavioral and neuroimaging methods, including functional magnetic resonance imaging (fMRI) and source-localized magneto- (MEG) and electroencephalography (EEG), to test and continue to elaborate this model. By doing so, his research has resulted in theoretical advances in understanding how the brain makes use of naturally occurring context, and methodological advances that permit the analysis of multimodal data resulting from real-world stimuli.
Talk title
A core speech circuit between primary motor, somatosensory, and auditory cortex: Evidence from connectivity and genetic descriptions
Talk summary
What adaptations allow humans to produce and perceive speech so effortlessly? I will show that speech is supported by a largely undocumented core of structural and functional connectivity between the central sulcus (CS, or primary motor and somatosensory cortex) and the transverse temporal gyrus (TTG, or primary auditory cortex). Anatomically, I show that CS and TTG cortical thickness covary across individuals and that the two regions are connected by white matter tracts. Neuroimaging network analyses confirm the functional relevance and specificity of these structural relationships. Specifically, the CS and TTG are functionally connected at rest and during natural audiovisual speech perception, and are coactive over a large variety of linguistic stimuli and tasks. Importantly, across structural and functional analyses, the connectivity of regions immediately adjacent to the TTG is with premotor and prefrontal regions rather than with the CS. Finally, I will show that this structural/functional CS-TTG relationship is mediated by a constellation of genes associated with vocal learning and disorders of efference copy. I suggest that this core circuit constitutes an interface for rapidly exchanging articulatory and acoustic information, and I discuss implications for current models of speech.