Results
2021
Theses
- titre
- Complexité et contrôle du geste linguo-palatal sous l'éclairage de sa variabilité. Le cas de la palatalisation en russe. Aspects phonétiques et phonologiques.
- auteur
- Ekaterina Biteeva Lecocq
- article
- Sciences de l'Homme et Société. Université Grenoble Alpes, 2021. French.
- resume
- The characterization of the lingual gesture and the question of its control in the production of [+palatal] consonants are still debated in phonetics and phonology. We examined the linguo-palatal gesture here by describing and analyzing its variability in the realization of different consonant types, suggesting that, as a primary or secondary gesture, it can inform the concept of articulatory complexity and reflect aspects of the motor control of the tongue in speech. Beyond consonant type, the variation factors we considered were the speaker, stress position, and syllable structure. Russian was chosen because of the exceptionally high functional yield of the [±palatal] contrast in this language. Within a multi-technique experimental approach, three complementary studies were carried out, in the laboratory and in the field. The first acquired electromagnetic articulography data in order to measure the position and timing of four tongue regions. The second collected ultrasound imaging data from monolingual native speakers recorded where they live in Russia, in order to better characterize the variability of the linguo-palatal gesture along diatopic and individual dimensions. The third analyzed a corpus of MRI images of the productions of a speaker who took part in the other two studies. The experimental work is complemented by a study of Russian phonotactics based on first-hand data, which updates the distributional tendencies of [+palatal] consonants. It confirms that palatalized consonants are more marked than non-palatalized ones and suggests degrees of markedness depending on place of articulation and on the position of the consonant in the syllable. The experimental results allowed us to propose a definition of the linguo-palatal gesture involved in the realization of the palatalized vs. non-palatalized contrast, independently of the observed variability factors, and to discuss differences in realization between labials, coronals, and velars. They also show that palatalized coronals are produced through the superposition of two constrictions, primary and secondary, formed successively when the consonant is in the onset of a stressed or unstressed syllable. While no effect of stress or syllabic position on the shape of the tongue contour was found, a more global displacement of the tongue was observed in unstressed syllables and in coda position, whereas a sequential activation of the different tongue regions could be shown in onset position and in stressed syllables. This result suggests a different motor control depending on stress and on position in the syllable. Still regarding the variation related to the syllabic position of palatalized consonants, the lingual gesture appears to display language-specific characteristics. In contrast, possible effects on tongue articulation related to the diatopic dimension or to the speakers' language contact could not be identified, contrary to the results of previous studies. Since palatalization recruits the "vocalic muscles", our results show that it blocks coarticulation with adjacent vowels. Resistance to coarticulation may explain the apparent stability of the Russian consonant system despite its articulatory complexity and its markedness.
The definition of a single natural class for [+palatal] consonants derived from these results was then tested within the frameworks of (1) Articulatory Phonology, to model the observed gestural coordinations, and (2) Element Theory, to model the Russian phonological system by unifying palatalized segments, palato-alveolars, and the palatal glide through the primitive |I|, a palatal resonance that can be added to another resonance.
- typdoc
- Theses
- Accès au texte intégral et bibtex
-
2020
Journal articles
- titre
- Contribution of sensory memory to speech motor learning
- auteur
- Takayuki Ito, Jiachuan Bai, David J Ostry
- article
- Journal of Neurophysiology, 2020, 124 (4), pp.1103-1109. ⟨10.1152/jn.00457.2020⟩
- resume
- Speech learning requires precise motor control, but it likewise requires transient storage of information to enable the adjustment of upcoming movements based on the success or failure of previous attempts. The contribution of somatic sensory memory for limb position has been documented in work on arm movement. In speech, however, the sensory support for speech production comes from both somatosensory and auditory inputs, and accordingly sensory memory for either or both of sounds and somatic inputs might contribute to learning. In the present study, adaptation to altered auditory feedback was used as an experimental model of speech motor learning. Participants also underwent tests of both auditory and somatic sensory memory. We found that although auditory memory for speech sounds is better than somatic memory for speech-like facial skin deformations, somatic sensory memory predicts adaptation, whereas auditory sensory memory does not. Thus, even though speech relies substantially on auditory inputs, and in the present manipulation adaptation requires the minimization of auditory error, it is somatic inputs that provide the memory support for learning.
- typdoc
- Journal articles
- DOI
- DOI : 10.1152/jn.00457.2020
- Accès au bibtex
-
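The entry above relates individual sensory-memory scores to the magnitude of auditory-motor adaptation. The Python sketch below is purely illustrative (placeholder numbers, not the study's data or analysis code): it computes Pearson correlations between per-participant adaptation magnitudes and the two memory scores, the kind of comparison the abstract summarizes.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xz = (x - x.mean()) / x.std()
    yz = (y - y.mean()) / y.std()
    return float(np.mean(xz * yz))

# Hypothetical per-participant scores (placeholders, not the study's data):
# adaptation magnitude, somatosensory memory score, auditory memory score.
adaptation = np.array([0.32, 0.18, 0.41, 0.27, 0.35, 0.22])
somatic_memory = np.array([0.71, 0.55, 0.82, 0.60, 0.77, 0.58])
auditory_memory = np.array([0.90, 0.88, 0.85, 0.92, 0.87, 0.91])

print("somatic memory vs adaptation:", pearson_r(somatic_memory, adaptation))
print("auditory memory vs adaptation:", pearson_r(auditory_memory, adaptation))
```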
- titre
- An experimental device for multi-directional somatosensory perturbation and its evaluation in a pilot psychophysical experiment
- auteur
- Rintaro Ogane, Lynda Selila, Takayuki Ito
- article
- Journal of the Acoustical Society of America, 2020, 148 (3), pp.EL279-EL284. ⟨10.1121/10.0001942⟩
- resume
- Somatosensory stimulation associated with facial skin deformation has been developed and efficiently applied in the study of speech production and speech perception. However, the technique is limited to a simplified unidirectional pattern of stimulation, and cannot adapt to realistic stimulation patterns related to multidimensional orofacial gestures. To overcome this issue, a new multi-actuator system is developed enabling one to synchronously deform the facial skin in multiple directions. The first prototype involves stimulation in two directions and its efficiency is evaluated using a temporal order judgement test involving vertical and horizontal facial skin stretches at the sides of the mouth.
- typdoc
- Journal articles
- DOI
- DOI : 10.1121/10.0001942
- Accès au texte intégral et bibtex
-
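The device described above was evaluated with a temporal order judgement (TOJ) test. As a hedged illustration of how such data are commonly analysed (not the paper's own pipeline), the sketch below fits a cumulative-Gaussian psychometric function to the proportion of "vertical first" responses as a function of stimulus onset asynchrony, yielding a point of subjective simultaneity (PSS) and a just-noticeable difference (JND); the SOA values and response proportions are invented placeholders.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def cum_gauss(soa, pss, sigma):
    """Cumulative-Gaussian psychometric function."""
    return norm.cdf(soa, loc=pss, scale=sigma)

# Placeholder data: SOA in ms (vertical minus horizontal stretch onset)
# and proportion of "vertical first" responses at each SOA.
soa = np.array([-200, -100, -50, 0, 50, 100, 200], dtype=float)
p_vertical_first = np.array([0.05, 0.20, 0.35, 0.55, 0.70, 0.85, 0.95])

(pss, sigma), _ = curve_fit(cum_gauss, soa, p_vertical_first, p0=[0.0, 80.0])
jnd = sigma * norm.ppf(0.75)   # 75% threshold relative to the PSS

print(f"PSS = {pss:.1f} ms, JND = {jnd:.1f} ms")
```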
- titre
- Speaking to a common tune: Between-speaker convergence in voice fundamental frequency in a joint speech production task
- auteur
- Vincent Aubanel, Noël Nguyen
- article
- PLoS ONE, 2020, 15 (5), pp.e0232209. ⟨10.1371/journal.pone.0232209⟩
- resume
- Recent research on speech communication has revealed a tendency for speakers to imitate at least some of the characteristics of their interlocutor's speech sound shape. This phenomenon, referred to as phonetic convergence, entails a moment-to-moment adaptation of the speaker's speech targets to the perceived interlocutor's speech. It is thought to contribute to setting up a conversational common ground between speakers and to facilitate mutual understanding. However, it remains uncertain to what extent phonetic convergence occurs in voice fundamental frequency (F0), in spite of the major role played by pitch, F0's perceptual correlate, as a conveyor of both linguistic information and communicative cues associated with the speaker's social/individual identity and emotional state. In the present work, we investigated to what extent two speakers converge towards each other with respect to variations in F0 in a scripted dialogue. Pairs of speakers jointly performed a speech production task, in which they were asked to alternately read aloud a written story divided into a sequence of short reading turns. We devised an experimental set-up that allowed us to manipulate the speakers' F0 in real time across turns. We found that speakers tended to imitate each other's changes in F0 across turns that were both limited in amplitude and spread over large temporal intervals. This shows that, at the perceptual level, speakers monitor slow-varying movements in their partner's F0 with high accuracy and, at the production level, that speakers exert a very fine-tuned control on their laryngeal vibrator in order to imitate these F0 variations. Remarkably, F0 convergence across turns was found to occur in spite of the large melodic variations typically associated with reading turns. Our study sheds new light on speakers' perceptual tracking of F0 in speech processing, and the impact of this perceptual tracking on speech production.
- typdoc
- Journal articles
- DOI
- DOI : 10.1371/journal.pone.0232209
- Accès au texte intégral et bibtex
-
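As an illustrative, non-authoritative sketch of one way to quantify between-speaker F0 convergence across reading turns of the kind studied above, the code below computes each speaker's median F0 per turn and correlates the turn-to-turn F0 changes of the two speakers; the F0 tracks are placeholder arrays, and the paper's actual analysis may differ.

```python
import numpy as np

def per_turn_median_f0(f0_tracks):
    """Median F0 (Hz) of each turn, ignoring unvoiced frames coded as 0."""
    medians = []
    for track in f0_tracks:
        voiced = track[track > 0]
        medians.append(np.median(voiced))
    return np.array(medians)

# Placeholder F0 tracks (Hz per frame) for alternating turns of speakers A and B.
rng = np.random.default_rng(0)
turns_a = [rng.normal(120, 8, 200) for _ in range(10)]
turns_b = [rng.normal(210, 10, 200) for _ in range(10)]

med_a = per_turn_median_f0(turns_a)
med_b = per_turn_median_f0(turns_b)

# Convergence index: correlation of the two speakers' turn-to-turn F0 changes.
delta_a = np.diff(med_a)
delta_b = np.diff(med_b)
r = np.corrcoef(delta_a, delta_b)[0, 1]
print(f"turn-to-turn F0 change correlation: {r:.2f}")
```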
- titre
- Mapping vowel sounds onto phonemic categories in two regional varieties of French: An ERP study
- auteur
- Jonathan Bucci, Coriandre Emmanuel Vilain, Noël Nguyen, Jean-Luc Schwartz, Sophie Dufour
- article
- Journal of Neurolinguistics, 2020, 54, pp.100891. ⟨10.1016/j.jneuroling.2020.100891⟩
- resume
- This study examines ERP correlates of the different processes associating two phones to one vs. two phonemic categories in two regional varieties of French. Two groups of French listeners are compared, respectively exploiting two regional varieties, with a contrast between the mid-low /ɛ/ and the mid-high /e/ for Northern French (NF) but not for Southern French (SF). It is expected that the competition between the two close categories /e/ vs. /ɛ/ in NF could induce an ERP modulation in the processing of /ɛ/ compared to a phoneme /a/ with no close competitor, serving as control. In contrast, there should be no difference in ERP response in /ɛ/ vs. /a/ in SF, where there is no competition between the close neighbors /e/ and /ɛ/. The participants of the two groups listened to words containing either /ɛ/ or /a/ in a go/no-go semantic categorization task in which the critical /ɛ/ and /a/ words did not require an overt behavioral response. We found a significant difference in the N400 amplitude between the two conditions in the NF but not in the SF variety. The fact that the ERP modulations appear on the N400 component suggests that lexical access is influenced by the regional variety of the speakers.
- typdoc
- Journal articles
- DOI
- DOI : 10.1016/j.jneuroling.2020.100891
- Accès au texte intégral et bibtex
-
- titre
- Speakers are able to categorize vowels based on tongue somatosensation
- auteur
- Jean-François Patri, David J. Ostry, Julien Diard, Jean-Luc Schwartz, Pamela Trudeau-Fisette, Christophe Savariaux, Pascal Perrier
- article
- Proceedings of the National Academy of Sciences of the United States of America, 2020, 117 (1), pp.6255-6263. ⟨10.1073/pnas.1911142117⟩
- resume
- Auditory speech perception enables listeners to access phonological categories from speech sounds. During speech production and speech motor learning, speakers experience matched auditory and somatosensory input. Accordingly, access to phonetic units might also be provided by somatosensory information. The present study assessed whether humans can identify vowels using somatosensory feedback, without auditory feedback. A tongue-positioning task was used in which participants were required to achieve different tongue postures within the /e, ε, a/ articulatory range, in a procedure that was totally nonspeech-like, involving distorted visual feedback of tongue shape. Tongue postures were measured using electromagnetic articulography. At the end of each tongue-positioning trial, subjects were required to whisper the corresponding vocal tract configuration with masked auditory feedback and to identify the vowel associated with the reached tongue posture. Masked auditory feedback ensured that vowel categorization was based on somatosensory feedback rather than auditory feedback. A separate group of subjects was required to auditorily classify the whispered sounds. In addition, we modeled the link between vowel categories and tongue postures in normal speech production with a Bayesian classifier based on the tongue postures recorded from the same speakers for several repetitions of the /e, ε, a/ vowels during a separate speech production task. Overall, our results indicate that vowel categorization is possible with somatosensory feedback alone, with an accuracy that is similar to the accuracy of the auditory perception of whispered sounds, and in congruence with normal speech articulation, as accounted for by the Bayesian classifier.
- typdoc
- Journal articles
- DOI
- DOI : 10.1073/pnas.1911142117
- Accès au texte intégral et bibtex
-
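The study above models the posture-to-vowel mapping with a Bayesian classifier trained on tongue postures recorded during normal productions of /e, ɛ, a/. A minimal, hypothetical sketch of such a classifier (one multivariate Gaussian per vowel, equal priors; not the authors' implementation) is given below.

```python
import numpy as np

class GaussianVowelClassifier:
    """One multivariate Gaussian per vowel category over tongue-posture
    features (e.g., x/y coordinates of articulography coils);
    classification by maximum posterior under equal priors."""

    def fit(self, postures, labels):
        self.classes_ = sorted(set(labels))
        self.params_ = {}
        for c in self.classes_:
            x = postures[np.asarray(labels) == c]
            mu = x.mean(axis=0)
            cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(x.shape[1])
            self.params_[c] = (mu, cov)
        return self

    def log_likelihood(self, posture, c):
        mu, cov = self.params_[c]
        d = posture - mu
        inv = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (d @ inv @ d + logdet + len(d) * np.log(2 * np.pi))

    def predict(self, posture):
        return max(self.classes_, key=lambda c: self.log_likelihood(posture, c))

# Toy usage with placeholder 2-D postures (tongue-dorsum x, y in mm):
rng = np.random.default_rng(1)
train = np.vstack([rng.normal([0, 10], 1, (30, 2)),   # /e/
                   rng.normal([2, 6], 1, (30, 2)),    # /ɛ/
                   rng.normal([5, 0], 1, (30, 2))])   # /a/
labels = ["e"] * 30 + ["ɛ"] * 30 + ["a"] * 30
clf = GaussianVowelClassifier().fit(train, labels)
print(clf.predict(np.array([1.5, 7.0])))   # expected: "ɛ"
```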
- titre
- Orofacial somatosensory inputs modulate word segmentation in lexical decision
- auteur
- Rintaro Ogane, Jean-Luc Schwartz, Takayuki Ito
- article
- Cognition, 2020, 197, pp.104163. ⟨10.1016/j.cognition.2019.104163⟩
- resume
- There is accumulating evidence that articulatory/motor knowledge plays a role in phonetic processing, such as the recent finding that orofacial somatosensory inputs may influence phoneme categorization. We here show that somatosensory inputs also contribute at a higher level of the speech perception chain, that is, in the context of word segmentation and lexical decision. We carried out an auditory identification test using a set of French phrases consisting of a definite article “la” followed by a noun, which may be segmented differently according to the placement of accents within the phrase. Somatosensory stimulation was applied to the facial skin at various positions within the acoustic utterances corresponding to these phrases, which had been recorded with neutral accent, that is, with all syllables given similar emphasis. We found that lexical decisions reflecting word segmentation were significantly and systematically biased depending on the timing of somatosensory stimulation. This bias was not induced when somatosensory stimulation was applied to the skin other than on the face. These results provide evidence that the orofacial somatosensory system contributes to lexical perception in situations that would be disambiguated by different articulatory movements, and suggest that articulatory/motor knowledge might be involved in speech segmentation.
- typdoc
- Journal articles
- DOI
- DOI : 10.1016/j.cognition.2019.104163
- Accès au texte intégral et bibtex
-
Poster communications
- titre
- Orofacial somatosensory inputs enhance speech intelligibility in noisy environments
- auteur
- Rintaro Ogane, Jean-Luc Schwartz, Takayuki Ito
- article
- ISSP 2020 - 12th International Seminar on Speech Production, Dec 2020, Providence (virtual), United States
- typdoc
- Poster communications
- Accès au texte intégral et bibtex
-
2019
Journal articles
- titre
- Assessing the representation of phonological rules by a production study of non-words in Coratino
- auteur
- Jonathan Bucci, Paolo Lorusso, Silvain Gerber, Mirko Grimaldi, Jean-Luc Schwartz
- article
- Phonetica, 2019, ⟨10.1159/000504452⟩
- resume
- Phonological regularities in a given language can be described as a set of formal rules applied to logical expressions (e.g., the value of a distinctive feature) or alternatively as distributional properties emerging from the phonetic substance. An indirect way to assess how phonology is represented in a speaker’s mind consists in testing how phonological regularities are transferred to non-words. This is the objective of this study, focusing on Coratino, a dialect from southern Italy spoken in the Apulia region. In Coratino, a complex process of vowel reduction operates, transforming the /i e ɛ u o ɔ a/ system for stressed vowels into a system with a smaller number of vowels for unstressed configurations, characterized by four major properties: (1) all word-initial vowels are maintained, even unstressed; (2) /a/ is never reduced, even unstressed; (3) unstressed vowels /i e ɛ u o ɔ/ are protected against reduction when they are adjacent to a consonant that shares articulation (labiality and velarity for /u o ɔ/ and palatality for /i e ɛ/); (4) when they are reduced, high vowels are reduced to /ɨ/ and mid vowels to /ə/. A production experiment was carried out on 19 speakers of Coratino to test whether these properties were displayed with non-words. The production data display a complex pattern which seems to imply both explicit/formal rules and distributional properties transferred statistically to non-words. Furthermore, the speakers appear to vary considerably in how they perform this task. Altogether, this suggests that both formal rules and distributional principles contribute to the encoding of Coratino phonology in the speaker’s mind.
- typdoc
- Journal articles
- DOI
- DOI : 10.1159/000504452
- Accès au texte intégral et bibtex
-
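The four distributional properties listed in the abstract above can be written down as an explicit rule system. The function below is a schematic, assumption-laden encoding of those properties (the consonant feature sets are placeholders and this is not the authors' formalization): word-initial vowels and /a/ are kept, unstressed vowels adjacent to a consonant sharing their place feature are protected, and otherwise high vowels reduce to /ɨ/ and mid vowels to /ə/.

```python
HIGH = {"i", "u"}
MID = {"e", "ɛ", "o", "ɔ"}
PALATAL_C = {"j", "ʎ", "ɲ"}                                 # assumed palatal set
LABIAL_VELAR_C = {"p", "b", "m", "f", "v", "w", "k", "g"}   # assumed labial/velar set

def reduce_vowel(v, stressed, word_initial, prev_c=None, next_c=None):
    """Schematic Coratino vowel reduction following the four properties
    summarized in the abstract (illustrative only)."""
    if stressed or word_initial or v == "a":                 # properties (1) and (2)
        return v
    neighbours = {c for c in (prev_c, next_c) if c}
    if v in {"i", "e", "ɛ"} and neighbours & PALATAL_C:      # property (3)
        return v
    if v in {"u", "o", "ɔ"} and neighbours & LABIAL_VELAR_C: # property (3)
        return v
    if v in HIGH:                                            # property (4)
        return "ɨ"
    if v in MID:
        return "ə"
    return v

print(reduce_vowel("e", stressed=False, word_initial=False, prev_c="t"))  # 'ə'
print(reduce_vowel("u", stressed=False, word_initial=False, next_c="k"))  # 'u' (protected)
```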
- titre
- Which way to the dawn of speech? Reanalyzing half a century of debates and data in light of speech science
- auteur
- Louis-Jean Boë, Thomas R. Sawallis, Joël Fagot, Pierre Badin, Guillaume Barbier, Guillaume Captier, Lucie Ménard, Jean-Louis Heim, Jean-Luc Schwartz
- article
- Science Advances, 2019, 5 (12), pp.eaaw3916. ⟨10.1126/sciadv.aaw3916⟩
- resume
- Recent articles on primate articulatory abilities are revolutionary regarding speech emergence, a crucial aspect of language evolution, by revealing a human-like system of proto-vowels in nonhuman primates and implicitly throughout our hominid ancestry. This article presents both a schematic history and the state of the art in primate vocalization research and its importance for speech emergence. Recent speech research advances allow more incisive comparison of phylogeny and ontogeny and also an illuminating reinterpretation of vintage primate vocalization data. This review produces three major findings. First, even among primates, laryngeal descent is not uniquely human. Second, laryngeal descent is not required to produce contrasting formant patterns in vocalizations. Third, living nonhuman primates produce vocalizations with contrasting formant patterns. Thus, evidence now overwhelmingly refutes the longstanding laryngeal descent theory, which pushes back “the dawn of speech” beyond ~200 ka ago to over ~20 Ma ago, a difference of two orders of magnitude.
- typdoc
- Journal articles
- DOI
- DOI : 10.1126/sciadv.aaw3916
- Accès au texte intégral et bibtex
-
- titre
- Transfer of sensorimotor learning reveals phoneme representations in preliterate children
- auteur
- Tiphaine Caudrelier, Lucie Ménard, Pascal Perrier, Jean-Luc Schwartz, Silvain Gerber, Camille Vidou, Amélie Rochet-Capellan
- article
- Cognition, 2019, 192, pp.103973. ⟨10.1016/j.cognition.2019.05.010⟩
- resume
- Reading acquisition is strongly intertwined with phoneme awareness that relies on implicit phoneme representations. We asked whether phoneme representations emerge before literacy. We recruited two groups of children, 4 to 5-year-old preschoolers (N = 29) and 7 to 8-year-old schoolchildren (N = 24), whose phonological awareness was evaluated, and one adult control group (N = 17). We altered speakers' auditory feedback in real time to elicit persisting pronunciation changes, referred to as auditory-motor adaptation or learning. Assessing the transfer of learning at phoneme level enabled us to investigate the developmental time-course of phoneme representations. Significant transfer at phoneme level occurred in preschoolers, as well as schoolchildren and adults. In addition, we found a relationship between auditory-motor adaptation and phonological awareness in both groups of children. Overall, these results suggest that phoneme representations emerge before literacy acquisition, and that these sensorimotor representations may set the ground for phonological awareness.
- typdoc
- Journal articles
- DOI
- DOI : 10.1016/j.cognition.2019.05.010
- Accès au texte intégral et bibtex
-
- titre
- The role of production abilities in the perception of consonant category in infants
- auteur
- Anne Vilain, Marjorie M. Dole, Hélène Loevenbruck, Olivier Pascalis, Jean-Luc Schwartz
- article
- Developmental Science, 2019, 22 (6), pp.e12830. ⟨10.1111/desc.12830⟩
- resume
- The influence of motor knowledge on speech perception is well established, but the functional role of the motor system is still poorly understood. The present study explores the hypothesis that speech production abilities may help infants discover phonetic categories in the speech stream, in spite of coarticulation effects. To this aim, we examined the influence of babbling abilities on consonant categorization in 6- and 9-month-old infants. Using an intersensory matching procedure, we investigated the infants’ capacity to associate auditory information about a consonant in various vowel contexts with visual information about the same consonant, and to map auditory and visual information onto a common phoneme representation. Moreover, a parental questionnaire evaluated the infants’ consonantal repertoire. In a first experiment using /b/–/d/ consonants, we found that infants who displayed babbling abilities and produced the /b/ and/or the /d/ consonants in repetitive sequences were able to correctly perform intersensory matching, while non-babblers were not. In a second experiment using the /v/–/z/ pair, which is as visually contrasted as the /b/–/d/ pair but which is usually not produced at the tested ages, no significant matching was observed for any group of infants, babbling or not. These results demonstrate, for the first time, that the emergence of babbling could play a role in the extraction of vowel-independent representations for consonant place of articulation. They have important implications for speech perception theories, as they highlight the role of sensorimotor interactions in the development of phoneme representations during the first year of life.
- typdoc
- Journal articles
- DOI
- DOI : 10.1111/desc.12830
- Accès au texte intégral et bibtex
-
- titre
- Modeling sensory preference in speech motor planning: a Bayesian modeling framework
- auteur
- Jean-François Patri, Julien Diard, Pascal Perrier
- article
- Frontiers in Psychology, 2019, 10, Article 2339. ⟨10.3389/fpsyg.2019.02339⟩
- typdoc
- Journal articles
- DOI
- DOI : 10.3389/fpsyg.2019.02339
- Accès au texte intégral et bibtex
-
- titre
- Combining spectral and temporal modification techniques for speech intelligibility enhancement
- auteur
- Martin Cooke, Vincent Aubanel, María Luisa García Lecumberri
- article
- Computer Speech and Language, 2019, 55, pp.26-39. ⟨10.1016/j.csl.2018.10.003⟩
- typdoc
- Journal articles
- DOI
- DOI : 10.1016/j.csl.2018.10.003
- Accès au texte intégral et bibtex
-
- titre
- Cued Speech Enhances Speech-in-Noise Perception
- auteur
- Clémence Bayard, Laura Machart, Antje Strauss, Silvain Gerber, Vincent Aubanel, Jean-Luc Schwartz
- article
- Journal of Deaf Studies and Deaf Education, 2019, 24 (3), pp.223-233. ⟨10.1093/deafed/enz003⟩
- resume
- Speech perception in noise remains challenging for Deaf/Hard of Hearing people (D/HH), even when fitted with hearing aids or cochlear implants. The perception of sentences in noise by 20 implanted or aided D/HH subjects mastering Cued Speech (CS), a system of hand gestures complementing lip movements, was compared with the perception of 15 typically hearing (TH) controls in three conditions: audio only, audiovisual, and audiovisual + CS. Similar audiovisual scores were obtained for signal-to-noise ratios (SNRs) 11 dB higher in D/HH participants compared with TH ones. Adding CS information enabled D/HH participants to reach a mean score of 83% in the audiovisual + CS condition at a mean SNR of 0 dB, similar to the usual audio score for TH participants at this SNR. This confirms that the combination of lipreading and the Cued Speech system remains extremely important for persons with hearing loss, particularly in adverse hearing conditions.
- typdoc
- Journal articles
- DOI
- DOI : 10.1093/deafed/enz003
- Accès au texte intégral et bibtex
-
- titre
- Computer simulations of coupled idiosyncrasies in speech perception and speech production with COSMO, a perceptuo-motor Bayesian model of speech communication
- auteur
- Marie-Lou Barnaud, Jean-Luc Schwartz, Pierre Bessière, Julien Diard
- article
- PLoS ONE, 2019, 14 (1), pp.e0210302. ⟨10.1371/journal.pone.0210302⟩
- resume
- The existence of a functional relationship between speech perception and production systems is now widely accepted, but the exact nature and role of this relationship remains quite unclear. The existence of idiosyncrasies in production and in perception sheds interesting light on the nature of the link. Indeed, a number of studies explore inter-individual variability in auditory and motor prototypes within a given language, and provide evidence for a link between both sets. In this paper, we attempt to simulate one study on coupled idiosyncrasies in the perception and production of French oral vowels, within COSMO, a Bayesian computational model of speech communication. First, we show that if the learning process in COSMO includes a communicative mechanism between a Learning Agent and a Master Agent, vowel production does display idiosyncrasies. Second, we implement within COSMO three models for speech perception that are, respectively, auditory, motor and perceptuo-motor. We show that no idiosyncrasy in perception can be obtained in the auditory model, since it is optimally tuned to the learning environment, which does not include the motor variability of the Learning Agent. On the contrary, motor and perceptuo-motor models provide perception idiosyncrasies correlated with idiosyncrasies in production. We draw conclusions about the role and importance of motor processes in speech perception, and propose a perceptuo-motor model in which auditory processing would enable optimal processing of learned sounds and motor processing would be helpful in unlearned adverse conditions.
- typdoc
- Journal articles
- DOI
- DOI : 10.1371/journal.pone.0210302
- Accès au texte intégral et bibtex
-
Conference papers
- titre
- Posture stabilization of the tongue for speech: responses to mechanical perturbation
- auteur
- Takayuki Ito, Jean-Loup Caillet, Pascal Perrier
- article
- ICPhS 2019 - 19th International Congress of Phonetic Sciences, Aug 2019, Melbourne, Australia
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-
Poster communications
- titre
- Orofacial Somatosensory Effects for the Word Segmentation Judgement
- auteur
- Rintaro Ogane, Jean-Luc Schwartz, Takayuki Ito
- article
- ICPhS 2019 - 19th International Congress of Phonetic Sciences, Aug 2019, Melbourne, Australia
- resume
- Word segmentation is one of the initial processes for lexical perception. While visual inputs can help it in acoustically ambiguous situations, the effect of orofacial somatosensory inputs on this process is unknown. We here tested how orofacial somatosensory inputs affect word segmentation for lexical perception. We carried out identification tests using a French phrase consisting of a definite article and a noun, segmented differently according to the placement of the accents in the phrase. In the test applying somatosensory stimulation at various timings along the phrase with neutral accent, we found that lexical perception was significantly and systematically biased depending on the somatosensory stimulus timing. This bias effect was not seen when two somatosensory stimuli were applied to emphasize one accent position rather than the other by changing force amplitude between the two positions. The results show and quantify the role the orofacial somatosensory system plays in lexical perception.
- typdoc
- Poster communications
- Accès au texte intégral et bibtex
-
2018
Journal articles
- titre
- Vowel Reduction in Coratino (South Italy): Phonological and Phonetic Perspectives
- auteur
- Jonathan Bucci, Pascal Perrier, Silvain Gerber, Jean-Luc Schwartz
- article
- Phonetica, 2018, ⟨10.1159/000490947⟩
- resume
- Vowel reduction may involve phonetic reduction processes, with non-reached targets, and/or phonological processes in which a vowel target is changed for another target, possibly schwa. Coratino, a dialect of southern Italy, displays complex vowel reduction processes assumed to be phonological. We analyzed a corpus representative of vowel reduction in Coratino, based on a set of a hundred pairs of words contrasting a stressed and an unstressed version of a given vowel in a given consonant environment, produced by 10 speakers. We report vowel formants together with consonant-to-vowel formant trajectories and durations, and show that these data are rather in agreement with a change in vowel target from /i e ɛ o ɔ u/ to schwa when the vowel is in non-word-initial unstressed position, unless the vowel shares a place-of-articulation feature with the preceding or following consonant. Interestingly, it also appears that there are 2 targets for phonological reduction, differing in F1 values. A “higher schwa” – which could be considered as /ɨ/ – corresponds to reduction for high vowels /i u/ while a “lower schwa” – which could be considered as /ə/ – corresponds to reduction for mid-high and mid-low vowels /e ɛ o ɔ/. /a/ is probably not affected by phonological reduction, possibly due to the longer duration of the consonant-to-vowel trajectory for low vowels. Altogether, the Coratino vowel system appears to evolve from a 7-vowel system /i e ɛ o ɔ u a/ for stressed configurations to a 3-vowel system /ɨ ə a/ for the most reduced configurations.
- typdoc
- Journal articles
- DOI
- DOI : 10.1159/000490947
- Accès au bibtex
-
- titre
- Transfer of Learning: What Does It Tell Us About Speech Production Units?
- auteur
- Tiphaine Caudrelier, Jean-Luc Schwartz, Pascal Perrier, Silvain Gerber, Amélie Rochet-Capellan
- article
- Journal of Speech, Language, and Hearing Research, 2018, 61 (7), pp.1613-1625. ⟨10.1044/2018_JSLHR-S-17-0130⟩
- typdoc
- Journal articles
- DOI
- DOI : 10.1044/2018_JSLHR-S-17-0130
- Accès au texte intégral et bibtex
-
- titre
- Temporal factors in cochlea-scaled entropy and intensity-based intelligibility predictions
- auteur
- Vincent Aubanel, Martin Cooke, Chris Davis, Jeesun Kim
- article
- Journal of the Acoustical Society of America, 2018, 143 (6), pp.EL443-EL448. ⟨10.1121/1.5041468⟩
- resume
- Cochlea-scaled entropy (CSE) was proposed as a signal-based metric for automatic detection of speech regions most important for intelligibility, but its proposed superiority over traditional linguistic and psychoacoustical characterisations was not subsequently confirmed. This paper shows that the CSE concept is closely related to intensity and as such captures similar speech regions. However, a slight but significant advantage of a CSE over an intensity-based characterisation was observed, associated with a time difference between the two metrics, suggesting that the CSE index may capture dynamical properties of the speech signal crucial for intelligibility.
- typdoc
- Journal articles
- DOI
- DOI : 10.1121/1.5041468
- Accès au texte intégral et bibtex
-
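As an illustrative and deliberately simplified sketch of the two signal-based metrics compared in the entry above, the code below computes a frame-level intensity contour and a spectral-change index loosely inspired by cochlea-scaled entropy, then looks at the cross-correlation peak between the two contours, the kind of time offset the study discusses. The framing parameters and the spectral-change definition are placeholders rather than the paper's exact formulation.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames."""
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n)])

def intensity_contour(x, frame_len=256, hop=128):
    frames = frame_signal(x, frame_len, hop)
    return 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)

def spectral_change_contour(x, frame_len=256, hop=128):
    """Euclidean distance between successive log-magnitude spectra:
    a crude stand-in for cochlea-scaled entropy."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    spec = np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-12)
    return np.linalg.norm(np.diff(spec, axis=0), axis=1)

# Placeholder "speech": amplitude-modulated noise, 1 s at 16 kHz.
rng = np.random.default_rng(2)
t = np.arange(16000) / 16000
sig = rng.standard_normal(16000) * (1 + 0.8 * np.sin(2 * np.pi * 4 * t))

inten = intensity_contour(sig)[1:]            # align with the differenced contour
change = spectral_change_contour(sig)
lags = np.arange(-5, 6)
xcorr = [np.corrcoef(np.roll(change, k), inten)[0, 1] for k in lags]
print("best lag (frames):", lags[int(np.argmax(xcorr))])
```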
- titre
- Audiovisual Binding for Speech Perception in Noise and in Aging
- auteur
- Attigodu Chandrashekara Ganesh, Frédéric Berthommier, Jean-Luc Schwartz
- article
- Language Learning, 2018, 68 (S1), pp.193-220. ⟨10.1111/lang.12271⟩
- resume
- Speech perception involves the fusion of multiple sensory inputs, and this fusion is not automatic: it depends on numerous external/internal factors (e.g., attention, noise, or age). In this paper, we exploit a specific paradigm in which a short audiovisual context made of coherent or incoherent speech material is displayed before an incongruent audiovisual target likely to provide fusion (McGurk effect, McGurk & MacDonald, 1976). We confirm that incoherent context leads to unbinding, that is, a reduction in the amount of fusion. Importantly, adding acoustic noise in the context though not in the target increases fusion. This suggests that listeners systematically evaluate the reliability of their sensory channels and weight them accordingly in the fusion process. We also show that older subjects display more unbinding, and discuss the potential consequences concerning their ability to understand speech in adverse conditions. We relate all these data to a “Binding-and-Fusion” model of audiovisual speech perception.
- typdoc
- Journal articles
- DOI
- DOI : 10.1111/lang.12271
- Accès au texte intégral et bibtex
-
- titre
- Does the Visual Channel Improve the Perception of Consonants Produced by Speakers of French With Down Syndrome?
- auteur
- Alexandre Hennequin, Amélie Rochet-Capellan, Silvain Gerber, Marion Dohen
- article
- Journal of Speech, Language, and Hearing Research, 2018, 61 (4), pp.957-972. ⟨10.1044/2017_JSLHR-H-17-0112⟩
- resume
- Purpose: This work evaluates whether seeing the speaker's face could improve the speech intelligibility of adults with Down syndrome (DS). This is not straightforward because DS induces a number of anatomical and motor anomalies affecting the orofacial zone. Method: A speech-in-noise perception test was used to evaluate the intelligibility of 16 consonants (Cs) produced in a vowel–consonant–vowel context (Vo = /a/) by 4 speakers with DS and 4 control speakers. Forty-eight naïve participants were asked to identify the stimuli in 3 modalities: auditory (A), visual (V), and auditory–visual (AV). The probability of correct responses was analyzed, as well as AV gain, confusions, and transmitted information as a function of modality and phonetic features. Results: The probability of correct response follows the trend AV > A > V, with smaller values for the DS than the control speakers in A and AV but not in V. This trend depended on the C: the V information particularly improved the transmission of place of articulation and to a lesser extent of manner, whereas voicing remained specifically altered in DS. Conclusions: The results suggest that the V information is intact in the speech of people with DS and improves the perception of some phonetic features in Cs in a similar way as for control speakers. This result has implications for further studies, rehabilitation protocols, and specific training of caregivers.
- typdoc
- Journal articles
- DOI
- DOI : 10.1044/2017_JSLHR-H-17-0112
- Accès au texte intégral et bibtex
-
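The analysis above reports transmitted information per phonetic feature and modality. A minimal sketch of the standard Miller-and-Nicely-style computation of transmitted information from a stimulus-response confusion matrix is given below (toy matrix, not the study's data).

```python
import numpy as np

def transmitted_information(confusion):
    """Mutual information (bits) between stimulus and response,
    estimated from a count-based confusion matrix."""
    p = confusion / confusion.sum()
    px = p.sum(axis=1, keepdims=True)   # stimulus marginal
    py = p.sum(axis=0, keepdims=True)   # response marginal
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz])))

# Toy 3x3 confusion matrix (rows: presented consonants, columns: responses).
conf = np.array([[18, 1, 1],
                 [2, 15, 3],
                 [1, 4, 15]])
print(f"{transmitted_information(conf):.2f} bits")
```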
- titre
- Electrophysiological evidence for Audio-visuo-lingual speech integration
- auteur
- Avril Treille, Coriandre Emmanuel Vilain, Jean-Luc Schwartz, Thomas Hueber, Marc Sato
- article
- Neuropsychologia, 2018, 109, pp.126-133. ⟨10.1016/j.neuropsychologia.2017.12.024⟩
- resume
- Recent neurophysiological studies demonstrate that audio-visual speech integration partly operates through temporal expectations and speech-specific predictions. From these results, one common view is that the binding of auditory and visual, lipread, speech cues relies on their joint probability and prior associative audio-visual experience. The present EEG study examined whether visual tongue movements integrate with relevant speech sounds, despite little associative audio-visual experience between the two modalities. A second objective was to determine possible similarities and differences of audio-visual speech integration between unusual audio-visuo-lingual and classical audio-visuo-labial modalities. To this aim, participants were presented with auditory, visual, and audio-visual isolated syllables, with the visual presentation related to either a sagittal view of the tongue movements or a facial view of the lip movements of a speaker, with lingual and facial movements previously recorded by an ultrasound imaging system and a video camera. In line with previous EEG studies, our results revealed an amplitude decrease and a latency facilitation of P2 auditory evoked potentials in both audio-visual-lingual and audio-visuo-labial conditions compared to the sum of unimodal conditions. These results argue against the view that auditory and visual speech cues solely integrate based on prior associative audio-visual perceptual experience. Rather, they suggest that dynamic and phonetic informational cues are sharable across sensory modalities, possibly through a cross-modal transfer of implicit articulatory motor knowledge.
- typdoc
- Journal articles
- DOI
- DOI : 10.1016/j.neuropsychologia.2017.12.024
- Accès au bibtex
-
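The EEG analysis above compares bimodal (AV) evoked potentials with the sum of the unimodal responses (A + V) in the P2 window. The snippet below is a schematic illustration of that additive-model comparison on synthetic single-channel ERPs; the window limits, sampling rate, and waveforms are placeholders, not the study's parameters.

```python
import numpy as np

fs = 500                          # sampling rate (Hz), placeholder
t = np.arange(-0.1, 0.5, 1 / fs)  # epoch time axis (s)

def mean_amplitude(erp, t, t_min, t_max):
    """Mean amplitude of an ERP waveform within a latency window."""
    win = (t >= t_min) & (t <= t_max)
    return float(erp[win].mean())

# Synthetic grand-average ERPs (µV), for illustration only.
rng = np.random.default_rng(3)
p2 = np.exp(-((t - 0.2) ** 2) / (2 * 0.03 ** 2))   # P2-like component near 200 ms
erp_a = 3.0 * p2 + 0.1 * rng.standard_normal(t.size)
erp_v = 0.8 * p2 + 0.1 * rng.standard_normal(t.size)
erp_av = 3.2 * p2 + 0.1 * rng.standard_normal(t.size)   # sub-additive by design

amp_av = mean_amplitude(erp_av, t, 0.15, 0.25)
amp_sum = mean_amplitude(erp_a + erp_v, t, 0.15, 0.25)
print(f"P2 window: AV = {amp_av:.2f} µV, A+V = {amp_sum:.2f} µV, "
      f"difference = {amp_av - amp_sum:.2f} µV")
```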
- titre
- What drives the perceptual change resulting from speech motor adaptation? Evaluation of hypotheses in a Bayesian modeling framework
- auteur
- Jean-François Patri, Pascal Perrier, Jean-Luc Schwartz, Julien Diard
- article
- PLoS Computational Biology, 2018, 14 (1), pp.e1005942. ⟨10.1371/journal.pcbi.1005942⟩
- resume
- Shifts in perceptual boundaries resulting from speech motor learning induced by perturbations of the auditory feedback were taken as evidence for the involvement of motor functions in auditory speech perception. Beyond this general statement, the precise mechanisms underlying this involvement are not yet fully understood. In this paper we propose a quantitative evaluation of some hypotheses concerning the motor and auditory updates that could result from motor learning, in the context of various assumptions about the roles of the auditory and somatosensory pathways in speech perception. This analysis was made possible thanks to the use of a Bayesian model that implements these hypotheses by expressing the relationships between speech production and speech perception in a joint probability distribution. The evaluation focuses on how the hypotheses can (1) predict the location of perceptual boundary shifts once the perturbation has been removed, (2) account for the magnitude of the compensation in presence of the perturbation, and (3) describe the correlation between these two behavioral characteristics. Experimental findings about changes in speech perception following adaptation to auditory feedback perturbations serve as reference. Simulations suggest that they are compatible with a framework in which motor adaptation updates both the auditory-motor internal model and the auditory characterization of the perturbed phoneme, and where perception involves both auditory and somatosensory pathways. Author summary: Experimental evidence suggests that motor learning influences categories in speech perception. These observations are consistent with studies of arm motor control showing that motor learning alters the perception of the arm location in space, and that these perceptual changes are associated with increased connectivity between regions of the motor cortex. Still, the interpretation of experimental findings is severely handicapped by a lack of precise hypotheses about underlying mechanisms.
- typdoc
- Journal articles
- DOI
- DOI : 10.1371/journal.pcbi.1005942
- Accès au texte intégral et bibtex
-
- titre
- Auditory and Audiovisual Close-shadowing in Post-Lingually Deaf Cochlear-Implanted Patients and Normal-Hearing Elderly Adults
- auteur
- Lucie Scarbel, Denis Beautemps, Jean-Luc Schwartz, Marc Sato
- article
- Ear and Hearing, 2018, 39 (1), pp.139-149. ⟨10.1097/AUD.0000000000000474⟩
- resume
- Objectives: The goal of this study was to determine the impact of auditory deprivation and age-related speech decline on perceptuo-motor abilities during speech processing in post-lingually deaf cochlear-implanted participants and in normal-hearing elderly participants.Design: A close-shadowing experiment was carried out on ten cochlear-implanted patients and ten normal-hearing elderly participants, with two groups of normal-hearing young participants as controls. To this end, participants had to categorize auditory and audiovisual syllables as quickly as possible, either manually or orally. Reaction times and percentages of correct responses were compared depending on response modes, stimulus modalities and syllables. Results: Responses of cochlear-implanted subjects were globally slower and less accurate than those of both young and elderly normal-hearing people. Adding the visual modality was found to enhance performance for cochlear-implanted patients, whereas no significant effect was obtained for the normal-hearing elderly group. Critically, oral responses were faster than manual ones for all groups. In addition, for normal-hearing elderly participants, manual responses were more accurate than oral responses, as was the case for normal-hearing young participants when presented with noisy speech stimuli. Conclusions: Faster reaction times were observed for oral than for manual responses in all groups, suggesting that perceptuo-motor relationships were somewhat successfully functional after cochlear implantation, and remain efficient in the normal-hearing elderly group. These results are in agreement with recent perceptuo-motor theories of speech perception. They are also supported by the theoretical assumption that implicit motor knowledge and motor representations partly constrain auditory speech processing. In this framework, oral responses would have been generated at an earlier stage of a sensorimotor loop, whereas manual responses would appear late, leading to slower but more accurate responses. The difference between oral and manual responses suggests that the perceptuo-motor loop is still effective for normal-hearing elderly subjects, and also for cochlear-implanted participants despite degraded global performance.
- typdoc
- Journal articles
- DOI
- DOI : 10.1097/AUD.0000000000000474
- Accès au texte intégral et bibtex
-
Conference papers
- titre
- Stability in postural tongue control: response to transient mechanical perturbations
- auteur
- Takayuki Ito, Jean-Loup Caillet, Pascal Perrier
- article
- Neuroscience 2018 - Annual meeting of the Society for Neuroscience, Nov 2018, San Diego, California, United States
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-
- titre
- Investigating the Role of Familiar Face and Voice Cues in Speech Processing in Noise
- auteur
- Jeesun Kim, Sonya Karisma, Vincent Aubanel, Chris Davis
- article
- Interspeech 2018 - 19th Annual Conference of the International Speech Communication Association, Sep 2018, Hyderabad, India. pp.2276-2279, ⟨10.21437/Interspeech.2018-1812⟩
- typdoc
- Conference papers
- DOI
- DOI : 10.21437/Interspeech.2018-1812
- Accès au bibtex
-
- titre
- Picture Naming or Word Reading: Does the Modality Affect Speech Motor Adaptation and Its Transfer?
- auteur
- Tiphaine Caudrelier, Pascal Perrier, Jean-Luc Schwartz, Amélie Rochet-Capellan
- article
- Interspeech 2018 - 19th Annual Conference of the International Speech Communication Association, Sep 2018, Hyderabad, India. pp.956-960, ⟨10.21437/Interspeech.2018-1760⟩
- resume
- Auditory-motor adaptation and transfer paradigms are increasingly used to explore speech motor control as well as phonological representations underlying speech production. Auditory-motor adaptation is generally assumed to occur at the sensory-motor level. However, a few studies have suggested that linguistic or contextual factors, such as the modality of presentation of the stimuli, influence adaptation. The present study investigates the influence of the modality of stimulus presentation (a written word vs. a picture representing the same word) on auditory-motor adaptation and transfer. In this speech production experiment, speakers' auditory feedback was altered online, inducing adaptation. We contrasted the magnitude of adaptation in these two different modalities and we assessed transfer from /pe/ to the French word /epe/ in the same vs. different modality of presentation, using a mixed 2×2 design. The magnitude of adaptation was not different between modalities. This observation contrasts with recent findings showing an effect of the modality (a written word vs. a go signal) on adaptation. Moreover, transfer did occur from one modality to the other, and the transfer pattern depended on the modality of the transfer stimuli. Overall, the results suggest that picture naming and word reading rely on sensory-motor representations that may be linked to contextual (or surface) characteristics.
- typdoc
- Conference papers
- DOI
- DOI : 10.21437/Interspeech.2018-1760
- Accès au texte intégral et bibtex
-
- titre
- Orofacial somatosensory inputs improve speech sound detection in noisy environments
- auteur
- Rintaro Ogane, Jean-Luc Schwartz, Takayuki Ito
- article
- SNL 2018 - 10th Annual Society for the Neurobiology of Language Conference, Aug 2018, Québec City, Canada
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-
- titre
- On the phonological processing of two French varieties
- auteur
- Jonathan Bucci, Sophie Dufour, Jean-Luc Schwartz
- article
- LabPhon 2018 - 16th Conference on Laboratory Phonology (LabPhon16), Jun 2018, Lisbon, Portugal
- typdoc
- Conference papers
- Accès au bibtex
-
- titre
- Could elements in ET be articulatory-acoustic rather than solely acoustic? Arguments from phonetics and phonology
- auteur
- Jonathan Bucci, Jean-Luc Schwartz
- article
- Elements, Jun 2018, Nantes, France
- typdoc
- Conference papers
- Accès au bibtex
-
- titre
- Beneficial effects of 5-Hz tACS over auditory cortex for speech comprehension in noise
- auteur
- Antje Strauss, Vincent Aubanel, Jean-Luc Schwartz, Anne-Lise S Giraud
- article
- Perceptuo-motor relationships in speech communication, Jan 2018, Geneva, Switzerland
- typdoc
- Conference papers
- Accès au bibtex
-
- titre
- Beneficial effects of 5-Hz tACS over auditory cortex for speech comprehension in noise
- auteur
- Antje Strauss, Vincent Aubanel, Anne-Lise S Giraud, Jean-Luc Schwartz
- article
- Perturbing and Enhancing Perception and Action using Oscillatory Neural Stimulation, Jan 2018, Cambridge, United Kingdom
- typdoc
- Conference papers
- Accès au bibtex
-
2017
Journal articles
- titre
- Reanalyzing neurocognitive data on the role of the motor system in speech perception within COSMO, a Bayesian perceptuo-motor model of speech communication
- auteur
- Marie-Lou Barnaud, Pierre Bessière, Julien Diard, Jean-Luc Schwartz
- article
- Brain and Language, 2017, 187, pp.19-32. ⟨10.1016/j.bandl.2017.12.003⟩
- resume
- While neurocognitive data provide clear evidence for the involvement of the motor system in speech perception, its precise role and the way motor information is involved in perceptual decision remain unclear. In this paper, we discuss some recent experimental results in light of COSMO, a Bayesian perceptuo-motor model of speech communication. COSMO enables us to model both speech perception and speech production with probability distributions relating phonological units with sensory and motor variables. Speech perception is conceived as a sensory-motor architecture combining an auditory and a motor decoder thanks to a Bayesian fusion process. We propose the sketch of a neuroanatomical architecture for COSMO, and we capitalize on properties of the auditory vs. motor decoders to address three neurocognitive studies of the literature. Altogether, this computational study reinforces functional arguments supporting the role of a motor decoding branch in the speech perception process.
- typdoc
- Journal articles
- DOI
- DOI : 10.1016/j.bandl.2017.12.003
- Accès au texte intégral et bibtex
-
- titre
- Effects of linear and nonlinear speech rate changes on speech intelligibility in stationary and fluctuating maskers
- auteur
- Martin Cooke, Vincent Aubanel
- article
- Journal of the Acoustical Society of America, 2017, 141 (6), pp.4126-4135. ⟨10.1121/1.4983826⟩
- resume
- Algorithmic modifications to the durational structure of speech designed to avoid intervals of intense masking lead to increases in intelligibility, but the basis for such gains is not clear. The current study addressed the possibility that the reduced information load produced by speech rate slowing might explain some or all of the benefits of durational modifications. The study also investigated the influence of masker stationarity on the effectiveness of durational changes. Listeners identified keywords in sentences that had undergone linear and nonlinear speech rate changes resulting in overall temporal lengthening in the presence of stationary and fluctuating maskers. Relative to unmodified speech, a slower speech rate produced no intelligibility gains for the stationary masker, suggesting that a reduction in information rate does not underlie intelligibility benefits of durationally modified speech. However, both linear and nonlinear modifications led to substantial intelligibility increases in fluctuating noise. One possibility is that overall increases in speech duration provide no new phonetic information in stationary masking conditions, but that temporal fluctuations in the background increase the likelihood of glimpsing additional salient speech cues. Alternatively, listeners may have benefitted from an increase in the difference in speech rates between the target and background.
- typdoc
- Journal articles
- DOI
- DOI : 10.1121/1.4983826
- Accès au texte intégral et bibtex
-
- titre
- The complementary roles of auditory and motor information evaluated in a Bayesian perceptuo-motor model of speech perception
- auteur
- Raphaël Laurent, Marie-Lou Barnaud, Jean-Luc Schwartz, Pierre Bessière, Julien Diard
- article
- Psychological Review, 2017, 124 (5), pp.572-602. ⟨10.1037/rev0000069⟩
- resume
- There is a consensus concerning the view that both auditory and motor representations intervene in the perceptual processing of speech units. However, the question of the functional role of each of these systems remains seldom addressed and poorly understood. We capitalized on the formal framework of Bayesian Programming to develop COSMO (Communicating Objects using Sensory-Motor Operations), an integrative model that allows principled comparisons of purely motor or purely auditory implementations of a speech perception task and tests the gain of efficiency provided by their Bayesian fusion. Here, we show three main results. (i) In a set of precisely defined “perfect conditions”, auditory and motor theories of speech perception are indistinguishable. (ii) When a learning process that mimics speech development is introduced into COSMO, it departs from these perfect conditions. Then auditory recognition becomes more efficient than motor recognition in dealing with learned stimuli, while motor recognition is more efficient in adverse conditions. We interpret this result as a general “auditory-narrowband vs. motor-wideband” property. (iii) Simulations of plosive-vowel syllable recognition reveal possible cues from motor recognition for the invariant specification of the place of plosive articulation in context, that are lacking in the auditory pathway. This provides COSMO with a second property, where auditory cues would be more efficient for vowel decoding and motor cues for plosive articulation decoding. These simulations provide several predictions, which are in good agreement with experimental data and suggest that there is natural complementarity between auditory and motor processing within a perceptuo-motor theory of speech perception.
- typdoc
- Journal articles
- DOI
- DOI : 10.1037/rev0000069
- Accès au texte intégral et bibtex
-
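COSMO's perceptuo-motor decoder, as described in the entry above, combines an auditory and a motor branch through Bayesian fusion. Under the simplifying and purely illustrative assumption that the two branches are conditionally independent given the phonological unit, fusion reduces to multiplying the two likelihoods before normalizing, as sketched below with made-up distributions (this is not the COSMO codebase).

```python
import numpy as np

phonemes = ["b", "d", "g"]

def normalize(p):
    p = np.asarray(p, dtype=float)
    return p / p.sum()

def fuse(p_auditory, p_motor, prior=None):
    """Bayesian fusion of auditory and motor branches, assuming the two
    likelihoods are conditionally independent given the phoneme."""
    prior = np.ones(len(p_auditory)) if prior is None else np.asarray(prior)
    return normalize(prior * np.asarray(p_auditory) * np.asarray(p_motor))

# Made-up likelihoods for one noisy stimulus: the auditory branch hesitates
# between /b/ and /d/, while the motor branch disfavours /d/.
p_aud = [0.45, 0.45, 0.10]
p_mot = [0.55, 0.15, 0.30]

posterior = fuse(p_aud, p_mot)
print(dict(zip(phonemes, np.round(posterior, 3))))
```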
- titre
- The syllable in the light of motor skills and neural oscillations
- auteur
- Antje Strauss, Jean-Luc Schwartz
- article
- Language, Cognition and Neuroscience, 2017, 32 (5), pp.562-569. ⟨10.1080/23273798.2016.1253852⟩
- resume
- Recent advances in neuroscience have brought a great focus on how the auditory cortex tracks speech at certain time scales corresponding to pre-lexical speech units in order to achieve comprehension. In particular, it has been claimed that it is the syllabic rhythm to which slow neural oscillations in the auditory cortex entrain in order to chunk the speech stream into smaller informational units. However, the terms “syllable” and “rhythm” have been treated quite loosely in the current literature. We revisit classic approaches to show that both concepts do not necessarily have an acoustic or phonetic counterpart, which could be directly extracted by neural processes. We would like to suggest that the syllabic rhythm could emerge at the intersection of acoustic–phonetic and motor knowledge of speech. We furthermore propose that nesting of cortical oscillations might be the key mechanism to understand the timing constraints that lead to the emergence of the syllable.
- typdoc
- Journal articles
- DOI
- DOI : 10.1080/23273798.2016.1253852
- Accès au texte intégral et bibtex
-
- titre
- Electrophysiological evidence for a self-processing advantage during audiovisual speech integration
- auteur
- Avril Treille, Coriandre Emmanuel Vilain, Sonia Kandel, Marc Sato
- article
- Experimental Brain Research, 2017, 235 (9), pp.2867-2876. ⟨10.1007/s00221-017-5018-0⟩
- resume
- Previous electrophysiological studies have provided strong evidence for early multisensory integrative mechanisms during audiovisual speech perception. From these studies, one unanswered issue is whether hearing our own voice and seeing our own articulatory gestures facilitate speech perception, possibly through a better processing and integration of sensory inputs with our own sensory-motor knowledge. The present EEG study examined the impact of self-knowledge during the perception of auditory (A), visual (V) and audiovisual (AV) speech stimuli that were previously recorded from the participant or from a speaker he/she had never met. Audiovisual interactions were estimated by comparing N1 and P2 auditory evoked potentials during the bimodal condition (AV) with the sum of those observed in the unimodal conditions (A + V). In line with previous EEG studies, our results revealed an amplitude decrease of P2 auditory evoked potentials in AV compared to A + V conditions. Crucially, a temporal facilitation of N1 responses was observed during the visual perception of self speech movements compared to those of another speaker. This facilitation was negatively correlated with the saliency of visual stimuli. These results provide evidence for a temporal facilitation of the integration of auditory and visual speech signals when the visual situation involves our own speech gestures.
- typdoc
- Journal articles
- DOI
- DOI : 10.1007/s00221-017-5018-0
- Accès au texte intégral et bibtex
-
- titre
- Inside Speech: Multisensory and Modality-specific Processing of Tongue and Lip Speech Actions
- auteur
- Avril Treille, Coriandre Emmanuel Vilain, Thomas Hueber, Laurent Lamalle, Marc Sato
- article
- Journal of Cognitive Neuroscience, 2017, 29 (3), pp.448-466. ⟨10.1162/jocn_a_01057⟩
- resume
- Action recognition has been found to rely not only on sensory brain areas but also partly on the observer's motor system. However, whether distinct auditory and visual experiences of an action modulate sensorimotor activity remains largely unknown. In the present sparse sampling fMRI study, we determined to which extent sensory and motor representations interact during the perception of tongue and lip speech actions. Tongue and lip speech actions were selected because tongue movements of our interlocutor are accessible via their impact on speech acoustics but not visible because of their position inside the vocal tract, whereas lip movements are both “audible” and visible. Participants were presented with auditory, visual, and audiovisual speech actions, with the visual inputs related to either a sagittal view of the tongue movements or a facial view of the lip movements of a speaker, previously recorded by an ultrasound imaging system and a video camera. Although the neural networks involved in visuo-lingual and visuo-facial perception largely overlapped, stronger motor and somatosensory activations were observed during visuo-lingual perception. In contrast, stronger activity was found in auditory and visual cortices during visuo-facial perception. Complementing these findings, activity in the left premotor cortex and in visual brain areas was found to correlate with visual recognition scores observed for visuo-lingual and visuo-facial speech stimuli, respectively, whereas visual activity correlated with RTs for both stimuli. These results suggest that unimodal and multimodal processing of lip and tongue speech actions rely on common sensorimotor brain areas. They also suggest that visual processing of audible but not visible movements induces motor and visual mental simulation of the perceived actions to facilitate recognition and/or to learn the association between auditory and visual signals.
- typdoc
- Journal articles
- DOI
- DOI : 10.1162/jocn_a_01057
- Accès au texte intégral et bibtex
-
Conference papers
- titre
- Somatosensory information affects word segmentation and perception of lexical information
- auteur
- Rintaro Ogane, Jean-Luc Schwartz, Takayuki Ito
- article
- SNL 2017 - 9th Annual Society for the Neurobiology of Language Conference, Nov 2017, Baltimore, Maryland, United States
- resume
- In the framework of perceptuo-motor theories of speech perception, it has been argued that speech articulation could play a role in speech perception. Indeed, recent findings demonstrate that somatosensory inputs associated with speaking motion change the perception of speech sounds. However, it is still unknown whether somatosensory effects could go up to the level of lexical access in speech comprehension. Access to lexical information depends on a complex word segmentation process. This study aims to examine whether segmentation and lexical information processing could be changed by somatosensory inputs associated with facial skin deformation. We here focus on “elisions” between definite article and noun in French (e.g. “l’affiche” [the poster] vs. “la fiche” [the form]), which have the same pronunciation but can be differentiated by hyper-articulation of the first vowel in each word. If somatosensory information plays a role in lexical information processing, we reasoned that the perception of such sequences could be changed if somatosensory stimulation was applied at an appropriate time relative to the corresponding speech gesture. To test this, we applied a somatosensory perturbation associated with facial skin deformation at different timings along the presentation of the auditory stimulation. We tested native speakers of French who performed a two-alternative forced-choice task between the two percepts associated with a single speech sequence. The stimulation sound was presented through headphones within the carrier phrase “C’est ___” (“This is ___”). Participants identified which word was presented (i.e. “affiche” vs. “fiche”) by pressing the left or right arrow button on the keyboard as quickly as possible. We used 17 different ambiguous speech sentences recorded by a native French speaker. Utterances were pronounced neutrally, without hyper-articulation of any single vowel, thus removing as much acoustic information as possible for the decision. For each audio sentence, we applied a somatosensory stimulation consisting of a facial skin stretch perturbation generated by a robotic device. We varied the time of presentation of this facial skin stretch perturbation, with 8 different temporal positions within an acoustic sentence. These temporal positions were set relative to the timing of the first vowel peak amplitude (i.e. “a” in the previous example), separated by 100 ms intervals from -400 ms (around the vowel in “c’est”) to 300 ms (around the second vowel, i.e. “i” in the previous example). For each combination of audio and somatosensory stimulation, four responses per subject were recorded, with a total of 544 stimuli (17 sentences * 8 timing conditions * 4 repetitions) presented in random order. The judgement probability (i.e. percentage of “la fiche” responses) was calculated for each subject and each timing condition (see the illustrative sketch after this entry). The judgement probability was reduced when somatosensory stimulation was ahead of the first vowel (more “l’affiche” responses), and increased when somatosensory stimulation was delayed between the first and second vowel (more “la fiche” responses). This suggests that somatosensory information intervenes in the processing of lexical information, which corresponds to a relatively high level of processing in speech perception.
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
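As a toy illustration only (not the study's actual analysis code), the snippet below computes the judgement probability, i.e. the percentage of “la fiche” responses, per somatosensory timing condition from a hypothetical trial table.

```python
# Judgement probability per timing condition (toy data, illustrative only).
from collections import defaultdict

# Each trial: (timing in ms relative to the first vowel peak, response given)
trials = [(-400, "l'affiche"), (-400, "l'affiche"), (0, "l'affiche"),
          (100, "la fiche"), (300, "la fiche"), (300, "la fiche")]

counts = defaultdict(lambda: [0, 0])            # timing -> [n "la fiche", n total]
for timing, response in trials:
    counts[timing][1] += 1
    counts[timing][0] += int(response == "la fiche")

for timing in sorted(counts):
    n_fiche, n_total = counts[timing]
    print(f"{timing:>5} ms: {100 * n_fiche / n_total:.0f}% 'la fiche' responses")
```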
-
- titre
- Assessing phonological learning in COSMO, a Bayesian model of speech communication
- auteur
- Marie-Lou Barnaud, Jean-Luc Schwartz, Julien Diard, Pierre Bessìère
- article
- ICDL-EpiRob 2017 - Joint IEEE International Conference on Development and Learning and Epigenetic Robotics, Oct 2017, Lisbonne, Portugal
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-
- titre
- Modeling sensory preference in speech motor planning
- auteur
- Jean-François Patri, Pascal Perrier, Julien Diard
- article
- ISSP 2017 - 11th International Seminar on Speech Production, Oct 2017, Tianjin, China
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-
- titre
- Perceptuo-motor speech units in the brain with COSMO, a Bayesian model of communication
- auteur
- Marie-Lou Barnaud, Julien Diard, Pierre Bessière, Jean-Luc Schwartz
- article
- ISSP 2017 - 11th International Seminar on Speech Production, Oct 2017, Tianjin, China
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-
- titre
- A phonological and phonetic approach of vowel reduction in Coratino: where is schwa in the acoustic signal?
- auteur
- Jonathan Bucci, Jean-Luc Schwartz
- article
- 50th Societas Linguistica Europaea, Sep 2017, Zurich, Switzerland
- typdoc
- Conference papers
- Accès au bibtex
-
- titre
- Contribution of visual rhythmic information to speech perception in noise
- auteur
- Vincent Aubanel, Cassandra Masters, Jeesun Kim, Chris Davis
- article
- AVSP 2017 - 14th International Conference on Auditory-Visual Speech Processing, Aug 2017, Stockholm, Sweden
- resume
- Visual speech information helps listeners perceive speech in noise. The cues underpinning this visual advantage appear to be global and distributed, and previous research hasn't succeeded in pinning down simple dimensions to explain the effect. In this study we focus on the temporal aspects of visual speech cues. In comparison to a baseline of auditory only sentences mixed with noise, we tested the effect of making available a visual speech signal that carries the rhythm of the spoken sentence, through a temporal visual mask function linked to the times of the auditory p-centers, as quantified by stressed syllable onsets. We systematically varied the relative alignment of the peaks of the maximum exposure of visual speech cues with the presumed anchors of sentence rhythm and contrasted these speech cues against an abstract visual condition, whereby the visual signal consisted of a stylised moving curve with its dynamics determined by the mask function. We found that both visual signal types provided a significant benefit to speech recognition in noise, with the speech cues providing the largest benefit. The benefit was largely independent of the amount of delay in relation to the auditory p-centers. Taken together, the results call for further inquiry into temporal dynamics of visual and auditory speech.
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-
- titre
- Perceptual learning of speech produced by a speaker with Down Syndrome
- auteur
- Alexandre Hennequin, Amélie Rochet-Capellan, Jean-Luc Schwartz, Marion Dohen
- article
- SMC 2017 - 7th International Conference on Speech Motor Control (SMC 2017), Jul 2017, Groningen, Netherlands
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-
- titre
- La réduction vocalique en coratin ? Une approche phonologique et phonétique : Y-a-t-il des traces de la composante phonologique dans le signal acoustique des schwas ?
- auteur
- Jonathan Bucci, Jean-Luc Schwartz
- article
- RFP 2017 - 15èmes Rencontres du Réseau Français de Phonologie, Jul 2017, Grenoble, France
- typdoc
- Conference papers
- Accès au bibtex
-
- titre
- Perception audio-visuelle de séquences VCV produites par des personnes porteuses de Trisomie 21
- auteur
- Alexandre Hennequin, Amélie Rochet-Capellan, Marion Dohen
- article
- JPC 2017 - 7èmes Journées de Phonétique Clinique, Jun 2017, Paris, France
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-
- titre
- Modeling sensory preference in speech motor planning
- auteur
- Jean-François Patri, Julien Diard, Pascal Perrier
- article
- NCM 2017 - 27th Annual Meeting of the Society for Neural Control of Movement, May 2017, Dublin, Ireland
- resume
- Speech is a stream of specific sounds produced by gestures of the articulators of the vocal tract. The sensory correlates of speech production are therefore both auditory (concerning sounds) and somatosensory (concerning the position and configuration of the articulators of the vocal tract). Since sounds are a consequence of speech gestures, these two sensory correlates appear to be redundant in unperturbed conditions. This raises questions about their functional involvement in the monitoring of speech production: is only one useful, and if so, which one? Are they instead both useful, and if so, are they equivalent or complementary? Experimental studies of compensations for auditory and somatosensory perturbations indicate that both types of sensory information are taken into account during speech production. In addition, individual sensory preferences in speech production have been observed: subjects who compensate less for somatosensory perturbations compensate more for auditory perturbations, and vice versa. Our goal is to understand how sensory preferences can operate during speech production and influence it, by using our recently designed Bayesian model of speech motor planning. To our knowledge, models of speech motor control have generally not addressed this issue, since they did not systematically evaluate the consequences of variations in the weight of each modality in the specification of the motor goals. In this work, we present extensions of our original Bayesian model of speech motor planning in which speech units are characterized both in auditory and somatosensory terms. We show that sensory preferences can be modeled in two ways. In the first variant, sensory preferences are attributed to the relative precision of the sensory regions characterizing speech motor goals (see the illustrative sketch after this entry). This is inspired by classical models of multisensory fusion for perception. Under this approach, the precisions of sensory regions correspond to their tolerance to perturbations: the smaller the region, the higher the precision and the lower the tolerance to perturbations. In other words, subjects who compensate more for auditory than for somatosensory perturbations would have auditory target regions smaller than their somatosensory target regions. However, since the auditory and somatosensory consequences of speech gestures are highly correlated, why would these motor goal regions differ so considerably? In the second variant of our model, sensory preferences are the consequence of the precision with which the predicted sensory consequences of motor commands are compared to the sensory characterizations of motor goals. We demonstrate that under specific assumptions, our two implementations of sensory preferences are formally equivalent. This reconciles the two approaches and suggests an alternative and original interpretation of sensory preferences.
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
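The first variant described above can be pictured with a small, hedged sketch (not the paper's model): each sensory modality contributes a one-dimensional Gaussian goal region, and the planned target is their precision-weighted combination, so the relative precisions determine which modality dominates. All values are invented.

```python
# Precision-weighted fusion of auditory and somatosensory goal regions (toy 1-D case).
def fused_target(mu_aud, prec_aud, mu_som, prec_som):
    """Mean of the product of two Gaussian goal distributions."""
    return (prec_aud * mu_aud + prec_som * mu_som) / (prec_aud + prec_som)

# An auditory perturbation shifts the auditory goal by 1 (arbitrary units);
# the planned target follows the more precise modality.
mu_som, mu_aud_shifted = 0.0, 1.0
for prec_aud, prec_som in [(10.0, 1.0), (1.0, 10.0)]:
    target = fused_target(mu_aud_shifted, prec_aud, mu_som, prec_som)
    print(f"prec_aud={prec_aud:4.1f}, prec_som={prec_som:4.1f} -> planned shift {target:.2f}")
```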
-
2016
Journal articles
- titre
- Phonology in the mirror
- auteur
- Jean-Luc Schwartz, Marie-Lou Barnaud, Pierre Bessière, Julien Diard, Clément Moulin-Frier
- article
- Physics of Life Reviews, 2016, 16, pp.93-95. ⟨10.1016/j.plrev.2016.01.007⟩
- resume
- The contribution by M.A. Arbib over the years and as it appears summarized and conceptualized in this paper is admirable, extremely impressive, and very convincing in many aspects. A key value of this work is that it systematically attempts to introduce formal conceptualization and modeling in the reasoning about facts and interpretations.
- typdoc
- Journal articles
- DOI
- DOI : 10.1016/j.plrev.2016.01.007
- Accès au texte intégral et bibtex
-
Conference papers
- titre
- Sensorimotor learning in a Bayesian computational model of speech communication
- auteur
- Marie-Lou Barnaud, Jean-Luc Schwartz, Julien Diard, Pierre Bessière
- article
- ICDL-EpiRob 2016 - 6th Joint IEEE International Conference Developmental Learning and Epigenetic Robotics, Sep 2016, Cergy-Pontoise, France
- resume
- Although sensorimotor exploration is a basic process in child development, a clear view of the underlying computational processes remains challenging. We propose to compare eight algorithms for sensorimotor exploration, based on three components: “accommodation”, performing a compromise between goal babbling and social guidance by a master; “local extrapolation”, simulating local exploration of the sensorimotor space to achieve motor generalizations; and “idiosyncratic babbling”, which favors already explored motor commands when they are efficient. We show that a mix of these three components offers a good compromise, enabling efficient learning while reducing exploration as much as possible (a toy sketch of such a mix follows this entry).
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
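Purely as a schematic reading of the three components named above (and not the paper's algorithms), one exploration step could mix them as follows; the motor space, probabilities and the motor-to-sound mapping are all invented placeholders.

```python
# Toy exploration step mixing idiosyncratic babbling, local extrapolation
# and accommodation towards a master's target sound (1-D motor space).
import random

def explore_step(known_good, master_target, produce, rng=random):
    r = rng.random()
    if known_good and r < 0.4:                        # idiosyncratic babbling
        return rng.choice(known_good)
    if known_good and r < 0.8:                        # local extrapolation
        return rng.choice(known_good) + rng.gauss(0.0, 0.1)
    candidate = rng.uniform(0.0, 1.0)                 # accommodation: nudge towards the target
    return candidate + 0.5 * (master_target - produce(candidate))

produce = lambda m: m                                 # trivial motor-to-sound mapping
print(explore_step(known_good=[0.2, 0.8], master_target=0.6, produce=produce))
```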
-
- titre
- Audiovisual Speech Scene Analysis in the Context of Competing Sources
- auteur
- Attigodu C Ganesh, Frédéric Berthommier, Jean-Luc Schwartz
- article
- Interspeech 2016 - 17th Annual Conference of the International Speech Communication Association, Sep 2016, San Francisco, United States. pp.47 - 51, ⟨10.21437/Interspeech.2016-62⟩
- resume
- Audiovisual fusion in speech perception is generally conceived as a process independent from scene analysis, which is supposed to occur separately in the auditory and visual domains. On the contrary, we have proposed in recent years that scene analysis, such as what takes place in the cocktail party effect, is an audiovisual process. We review here a series of experiments illustrating how audiovisual speech scene analysis occurs in the context of competing sources. Indeed, we show that a short contextual audiovisual stimulus made of competing auditory and visual sources modifies the perception of a following McGurk target. We interpret this in terms of binding, unbinding and rebinding processes, and we show how these processes depend on audiovisual correlations in time, attentional processes and differences between junior and senior participants.
- typdoc
- Conference papers
- DOI
- DOI : 10.21437/Interspeech.2016-62
- Accès au texte intégral et bibtex
-
- titre
- Bayesian Modeling in Speech Motor Control: A Principled Structure for the Integration of Various Constraints
- auteur
- Jean-François Patri, Pascal Perrier, Julien Diard
- article
- Interspeech 2016 - 17th Annual Conference of the International Speech Communication Association, Sep 2016, San Francisco, United States. pp.3588-3592, ⟨10.21437/Interspeech.2016-441⟩
- resume
- Speaking involves sequences of linguistic units that can be produced under different sets of control strategies. For instance, a given phoneme can be achieved with different acoustic properties, and a sequence of phonemes can be performed at different speech rates and with different prosodies. How does the Central Nervous System select a specific control strategy among all the available ones? In a previously published article we proposed a Bayesian model that addressed this question with respect to the multiplicity of acoustic realizations of a sequence of phonemes. One of the strengths of Bayesian modeling is that it is well adapted to the combination of multiple constraints. In the present paper we illustrate this feature by defining an extension of our previous model that includes force constraints related to the level of effort for the production of phoneme sequences, as could be the case in clear versus casual speech. The integration of this additional constraint is used to model the control of articulation clarity. The pertinence of the results is illustrated by controlling a biomechanical model of the vocal tract for speech production.
- typdoc
- Conference papers
- DOI
- DOI : 10.21437/Interspeech.2016-441
- Accès au texte intégral et bibtex
-
- titre
- Auditory-Visual Perception of VCVs Produced by People with Down Syndrome: Preliminary Results
- auteur
- Alexandre Hennequin, Amélie Rochet-Capellan, Marion Dohen
- article
- Interspeech 2016 - 17th Annual Conference of the International Speech Communication Association, Sep 2016, San Francisco, United States. ⟨10.21437/Interspeech.2016-1198⟩
- resume
- Down Syndrome (DS) is a genetic disease involving a number of anatomical, physiological and cognitive impairments. More particularly, it affects speech production abilities. This results in reduced intelligibility, which has, however, only been evaluated auditorily. Yet many studies have demonstrated that adding vision to audition helps the perception of speech produced by people without impairments, especially when speech is degraded, as is the case in noise. The present study aims at examining whether visual information improves the intelligibility of people with DS. 24 participants without DS were presented with VCV sequences (vowel-consonant-vowel) produced by four adults (2 with DS and 2 without DS). These stimuli were presented in noise in three modalities: auditory, auditory-visual and visual. The results confirm a reduced auditory intelligibility of speakers with DS. They also show that, for the speakers involved in this study, visual intelligibility is equivalent to that of speakers without DS and compensates for the auditory intelligibility loss. An analysis of the perceptual errors shows that most of them involve confusions between consonants. These results put forward the crucial role of multimodality in improving the intelligibility of people with DS.
- typdoc
- Conference papers
- DOI
- DOI : 10.21437/Interspeech.2016-1198
- Accès au texte intégral et bibtex
-
- titre
- Does Auditory-Motor Learning of Speech Transfer from the CV Syllable to the CVCV Word?
- auteur
- Tiphaine Caudrelier, Pascal Perrier, Jean-Luc Schwartz, Amélie Rochet-Capellan
- article
- Interspeech 2016 - 17th Annual Conference of the International Speech Communication Association, Sep 2016, San Francisco, United States. pp.2095 - 2099, ⟨10.21437/Interspeech.2016-262⟩
- resume
- Speech is often described as a sequence of units associating linguistic, sensory and motor representations. Is the connection between these representations preferentially maintained at a specific level in terms of a linguistic unit? In the present study, we contrasted the possibility of a link at the level of the syllable (CV) and the word (CVCV). We modified the production of the syllable /be/ in French speakers using an auditory-motor adaptation paradigm that consists of altering the speakers' auditory feedback. After stopping the perturbation, we studied to what extent this modification would transfer to the production of the disyllabic word /bebe/ and compared it to the after-effect on /be/. The results show that changes in /be/ transfer partially to /bebe/. The partial influence of the somatosensory and motor representations associated with the syllable on the production of the disyllabic word suggests that both units may contribute to the specification of the motor goals in speech sequences. In addition, the transfer occurs to a larger extent in the first syllable of /bebe/ than in the second one. It raises new questions about a possible interaction between the transfer of auditory-motor learning and serial control processes.
- typdoc
- Conference papers
- DOI
- DOI : 10.21437/Interspeech.2016-262
- Accès au texte intégral et bibtex
-
- titre
- Assessing Idiosyncrasies in a Bayesian Model of Speech Communication
- auteur
- Marie-Lou Barnaud, Julien Diard, Pierre Bessière, Jean-Luc Schwartz
- article
- Interspeech 2016 - 17th Annual Conference of the International Speech Communication Association, Sep 2016, San Francisco, United States. ⟨10.21437/Interspeech.2016-396⟩
- resume
- Although speakers of one specific language share the same phoneme representations, their productions can differ. We propose to investigate the development of these differences in production, called idiosyncrasies, by using a Bayesian model of communication. Supposing that idiosyncrasies appear during the development of the motor system, we present two versions of the motor learning phase, both based on the guidance of a master agent: a “repetition model”, where agents try to imitate the sounds produced by the master, and a “communication model”, where agents try to replicate the phonemes produced by the master. Our experimental results show that only the “communication model” produces production idiosyncrasies, suggesting that idiosyncrasies are a natural output of a motor learning process based on a communicative goal.
- typdoc
- Conference papers
- DOI
- DOI : 10.21437/Interspeech.2016-396
- Accès au texte intégral et bibtex
-
- titre
- The role of the premotor cortex in multisensory speech perception throughout adulthood: a rTMS study
- auteur
- Avril Treille, Marc Sato, Jean-Luc Schwartz, Coriandre Emmanuel Vilain, Pascale Tremblay
- article
- SNL 2016 - 8th Annual Meeting of the Society for the Neurobiology of Language, Society for the Neurobiology of Language, Aug 2016, Londres, United Kingdom
- resume
- Although neurobiological models of language argue for a left lateralization of the audio-motor dorsal pathway during speech perception [1], the role of the right and left ventral premotor (PMv) areas in multisensory speech integration processes remains largely unknown. What is the contribution of hemispheric differentiation to integration processing, and what role do premotor areas play in these mechanisms? Further, given the known differences in speech perception accuracy between young and older adults and decreasing sensory acuity with age, it is possible that the lateralization of multimodal integration processes evolves over time. In the present study, we explored the impact of inhibitory transcranial magnetic stimulation (rTMS) over the right and left PMv during auditory (A), visual (V) and tactile (T) unimodal conditions as well as during audio-visual (AV) and audio-tactile (AT) bimodal conditions across age. The experiment consisted of 2 rTMS sessions (targeting the left and right PMv) conducted for each of the 24 healthy right-handed participants (16 females; mean 46±19 [19-78] years). Following completion of each session, participants performed a forced-choice identification task. They were told that they would be presented with /pa/, /ta/ or /ka/ syllables in 5 different sensory modalities (A, V, AV, T, AT) and had to identify, as quickly as possible, the perceived syllable by pressing one of three keys on a response pad. To increase task difficulty, half the trials were presented in quiet, while the other half were presented in noise. Mixed-model ANOVAs were conducted on the mean accuracy and median reaction time (RT) data with the target region (left PMv/right PMv), the acoustic environment (noise/no noise), and the modality (A/AV/AT/V/T for the accuracy analysis and A/AV/AT for the RT analysis) as within-subjects factors, the order of stimulation (left PMv first, right PMv first) as a between-subjects factor, and age as a continuous quantitative between-subjects covariate. Our results demonstrate that multimodal integration is relatively preserved in aging, becoming slower but not less accurate. Importantly, we found a significant negative linear relationship between hemispheric difference and age in the auditory modality, i.e., with increasing age, hemispheric differences declined. Interestingly, no such difference occurred in the bimodal modalities. This suggests a larger recruitment of the right PMv to support auditory speech processes in elderly adults, possibly as a consequence of reduced auditory acuity with age, or a de-differentiation of phonemic categories. In contrast, the absence of an age effect in the bimodal conditions suggests that multisensory processing remains stable throughout adulthood. Together, these results demonstrate that multisensory integration mechanisms are, at least in part, maintained with age despite a decline in auditory acuity, and demonstrate the feasibility of using rTMS in healthy elderly adults to study speech and language processes.
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-
- titre
- Perception audio-visuelle de séquences VCV produites par des personnes porteuses de Trisomie 21 : une étude préliminaire
- auteur
- Alexandre Hennequin, Amélie Rochet-Capellan, Marion Dohen
- article
- JEP-TALN-RECITAL 2016 - conférence conjointe 31e Journées d'Études sur la Parole, 23e Traitement Automatique des Langues Naturelles, 18e Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues, Jul 2016, Paris, France
- resume
- The speech of people with Down syndrome (DS) shows a systematic reduction in intelligibility that has only been quantified auditorily. Yet the visual modality could improve intelligibility, as it does for “ordinary” people. This study compares how 24 ordinary participants perceive VCV (vowel-consonant-vowel) sequences produced by four adults (2 with DS and 2 ordinary) and presented in noise in the auditory, visual and audiovisual modalities. The results confirm the loss of intelligibility in the auditory modality for speakers with DS. For the two speakers involved, visual intelligibility is nevertheless equivalent to that of the two ordinary speakers and compensates for the auditory intelligibility deficit. These results suggest that the visual modality contributes to a better intelligibility of people with DS.
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-
- titre
- Modélisation bayésienne de la planification motrice des gestes de parole : Évaluation du rôle des différentes modalités sensorielles
- auteur
- Jean-François Patri, Julien Diard, Pascal Perrier
- article
- JEP-TALN-RECITAL 2016 - conférence conjointe 31e Journées d'Études sur la Parole, 23e Traitement Automatique des Langues Naturelles, 18e Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues, Jul 2016, Paris, France. pp.419-427
- resume
- A growing number of experimental results highlight the role of auditory and proprioceptive information in the control of speech. However, production models most often impose one or the other modality, or do not offer a formal framework for evaluating their respective contributions. We propose to explore the role of these sensory modalities in the planning of speech gestures using a Bayesian model representing the structure of the knowledge involved in this task. The model allows three planning mechanisms to be considered, relying on the auditory modality, on the proprioceptive modality, or on both jointly. We compare simulations obtained with the first two planning mechanisms. The results indicate different articulatory realizations that nevertheless yield auditory realizations that are qualitatively similar in their variability.
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-
- titre
- De bé à bébé : le transfert d'apprentissage auditori-moteur pour interroger l'unité de production de la parole
- auteur
- Tiphaine Caudrelier, Pascal Perrier, Jean-Luc Schwartz, Christophe Savariaux, Amélie Rochet-Capellan
- article
- JEP-TALN-RECITAL 2016 - conférence conjointe 31e Journées d'Études sur la Parole, 23e Traitement Automatique des Langues Naturelles, 18e Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues, AFCP, Jul 2016, Paris, France
- resume
- Speech is often described as a sequencing of units associating linguistic, sensory and motor representations. Is the link between these representations preferentially established for a specific unit? For instance, is it the syllable or the word? In this study, we aimed to contrast these two hypotheses. To do so, we modified French speakers' production of the syllable “bé” using an auditory-motor adaptation paradigm that consists of perturbing auditory feedback. We then studied how this modification transfers to the production of the word “bébé”. The results suggest a link between linguistic and motor representations at several levels, at both the word and the syllable level. They also show an influence of the syllable's position within the word on the transfer, which raises new questions about the serial control of speech.
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-
- titre
- Phoneme categorization depends on production abilities during the first year of life
- auteur
- Marjorie Dole, Hélène Loevenbruck, O Pascalis, Jean-Luc Schwartz, Anne Vilain
- article
- ICIS 2016 - International Conference on Infant Studies (ICIS 2016), May 2016, La Nouvelle Orléans, LA, United States
- resume
- A long-standing debate, still not resolved in the field of speech communication, concerns the nature of the representations underlying speech perception. On the one hand, auditory theories (e.g., Diehl et al., 2004) claim that the basic units in speech perception are purely auditory, whereas the Motor Theory (Liberman et al., 1962; Galantucci et al., 2006) proposes that speech perception involves motor representations. Recently, Schwartz et al. (2012), in a perceptuo-motor theory, claimed that perceptual and motor representations both play a role in the processing of speech units. To better understand the development of perceptuo-motor interactions during the first year of life, we examined the influence of speech production abilities on phonemic categorization in infants. We used an intersensory matching procedure in order to evaluate infants' ability to bind auditory and visual information about a consonant category into a single representation. 6- to 12-month-old French infants were familiarized with auditory syllables in different vowel contexts (e.g., /be/-/bi/-/bu/). In the test phase, two side-by-side silent videos of faces repeatedly pronouncing consonants in a new vowel context (/ba/ on one side and /da/ on the other side) were presented and looking times (LTs) to each video were compared. In this protocol, infants who are able to extract the common (e.g., labial) gesture in the audio syllables should be able to relate it to the same gesture in the visual stimuli and should show different LTs for the two test stimuli (/ba/ vs. /da/). The speech production abilities of each of the 6- to 12-month-old infants were assessed using a parental questionnaire. Infants were assigned to one of three production groups: Non Babbling, i.e. infants who did not produce the /b/-/d/ consonants; Canonical Babbling, i.e. infants who produced the consonants with only one vowel (e.g. ‘bababa’ or ‘dadada’); or Variegated Babbling, i.e. infants who produced the consonants with different vowels (e.g. ‘babibu’). We expected better categorization and better auditory-visual association in infants with greater production experience (i.e., infants in the Babbling phase) than in infants with fewer productions (Non Babbling infants). Results showed no main effect of age; however, 9-month-old infants showed a significant categorization effect (one-sample t-test, p<0.05) whereas 6- and 12-month-olds did not. When taking production abilities into account, infants in the Variegated Babbling phase exhibited better categorization abilities than infants in the Canonical Babbling phase or Non Babbling infants. This suggests that greater production abilities are linked to better perception abilities; however, this result could be linked to general language abilities. To eliminate this possibility and validate our hypothesis, we plan to test 6- to 12-month-old infants using the same procedure with a /v/ vs. /z/ contrast, involving consonants that most French infants should not yet be able to produce. We expect an absence of audio-visual association with this contrast in all infants. The absence of audio-visual association with unproduced consonants, together with the occurrence of audio-visual association with frequently produced consonants, would be a strong argument in favor of the development of a perceptuo-motor link during the first year of life. Taken together, these studies should allow us to better assess the role of motor knowledge in the development of speech perception.
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-
- titre
- Tes mots me touchent : étude des apports de la modalité tactile dans la perception de la parole
- auteur
- Avril Treille, Coriandre Emmanuel Vilain, Marc Sato
- article
- INSHEA - Colloque intenational « Toucher pour apprendre, toucher pour communiquer », Mar 2016, Paris, France
- resume
- Speech results from the action of specific articulators that are more or less visible. It is through this complex process that the sounds forming the linguistic message emerge. Speech is often said to be audio-visual, but we forget that it is made of movements that can also be touched. It is notably thanks to this property that deaf-blind people are able to communicate. They use the Tadoma method, which consists of placing a hand on the speaker's face, with the thumb vertically over the lips and the other fingers along the jaw, in order to feel the movements of the sounds produced. While the mechanisms of fusion of the auditory and visual modalities have been widely studied in normally hearing and normally sighted subjects, no study has addressed the fusion of information from hearing and touch, one of the senses most used in everyday life but rarely employed to perceive speech. Are we able to decode a linguistic message from new, previously unknown tactile information? Are the mechanisms used when integrating these two modalities similar to those used in audio-visual speech perception? To answer these questions, we drew on the Tadoma method to carry out two electroencephalography experiments on the audio-tactile perception of the syllables /pa/ and /ta/ (experiment 1, Treille et al., 2014a) and /pa/, /ta/ and /ka/ (experiment 2, Treille et al., 2014b) in a population of healthy, normally hearing subjects. Our results first show that naïve subjects are able to identify tactilely the syllables pronounced by the experimenter, suggesting that they use their motor knowledge of speech production to facilitate the decoding of the tactile information about the perceived movements. We also showed the existence of integration mechanisms similar to those used to fuse auditory and visual information, as revealed by electrophysiological markers specific to integration processes, notably a temporal facilitation of auditory processing when the visual or tactile modalities are added, as well as a decrease in the neural response for bimodal stimuli compared with the auditory-only condition. Taken together, these results highlight the remarkable ability of our brain to call on our sensory and motor knowledge to best process unknown information, such as that coming from touch, in order to achieve a form of communication. A new study in preparation should make it possible to identify, using a virtual lesion of motor regions, whether our motor system is indeed involved in the audio-tactile integration mechanisms of speech.
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-
2015
Journal articles
- titre
- Somatosensory Event-related Potentials from Orofacial Skin Stretch Stimulation
- auteur
- Takayuki Ito, David J Ostry, Vincent Gracco
- article
- Journal of visualized experiments : JoVE, 2015, 106, pp.e53621. ⟨10.3791/53621⟩
- resume
- Cortical processing associated with orofacial somatosensory function in speech has received limited experimental attention due to the difficulty of providing precise and controlled stimulation. This article introduces a technique for recording somatosensory event-related potentials (ERPs) that uses a novel mechanical stimulation method involving skin deformation with a robotic device. Controlled deformation of the facial skin is used to modulate kinesthetic inputs through excitation of cutaneous mechanoreceptors. By combining somatosensory stimulation with electroencephalographic recording, somatosensory evoked responses can be successfully measured at the level of the cortex. Somatosensory stimulation can be combined with the stimulation of other sensory modalities to assess multisensory interactions. For speech, orofacial stimulation is combined with speech sound stimulation to assess the contribution of multisensory processing, including the effects of timing differences. The ability to precisely control orofacial somatosensory stimulation during speech perception and speech production with ERP recording is an important tool that provides new insight into the neural organization and neural representations for speech. Video link: the video component of this article can be found at http://www.jove.com/video/53621
- typdoc
- Journal articles
- DOI
- DOI : 10.3791/53621
- Accès au texte intégral et bibtex
-
- titre
- On the cognitive nature of speech sound systems
- auteur
- Jean-Luc Schwartz, Clément Moulin-Frier, Pierre-Yves Oudeyer
- article
- Journal of Phonetics, 2015, Special Issue : "On the cognitive nature of speech sound systems", 53, pp.1-175. ⟨10.1016/j.wocn.2015.09.008⟩
- resume
- During the last 50 years, the question of the cognitive nature of phonological units has followed the rhythm of the persistent debate between auditory and motor theories of speech communication. Though recent advances in cognitive neuroscience and cognitive psychology have largely renewed this debate, a consensus is still out of reach, and the true nature of speech units in the human brain remains elusive. A dimension of importance in this debate is a systemic one: speech units are not isolated, they are part of a phonological system, and they obey structural principles regarding well-investigated properties such as distinctiveness, compositionality, contextual dependencies or systemic regularities. The phonological system itself is also part of a complex network of interactions with low-level biomechanical and sensory-motor systems, with higher-level brain structures regulating cognition, emotion and motivation, and finally with the social structures in which all these systems are embedded. Connecting assumptions or theories about the nature of speech units with a structuralist view of the relationship between phonetic properties and phonological systems has given rise to a number of major breakthroughs in speech science, for instance Lindblom's bridges between the Variable Adaptive Theory (or its Hyper-Hypo variant) of speech communication (Lindblom, 1990) and the Dispersion Theory of vowel systems (Lindblom, 1986); Stevens' Quantal Theory (Stevens, 1972, 1989), addressing both the invariance issue and the search for the origins of distinctiveness and phonetic features; or the tandem between the Motor Theory of Speech Perception (Liberman & Mattingly, 1985) and Articulatory Phonology (Browman & Goldstein, 1992) at the Haskins Labs. This Special Issue is centered around a target paper by Moulin-Frier et al. that aims at relating the question of the auditory vs. motor vs. perceptuo-motor nature of speech units with simulations of vowel, plosive and syllable systems of human languages emerging from agent interactions, in a computational Bayesian framework. In this context, the papers in the special issue explore the systemic perspective further, studying how various dimensions of physical, cognitive, motivational and interactional systems can inform our understanding of the origins of speech forms.
- typdoc
- Journal articles
- DOI
- DOI : 10.1016/j.wocn.2015.09.008
- Accès au texte intégral et bibtex
-
- titre
- Audio-visual speech scene analysis: Characterization of the dynamics of unbinding and rebinding the McGurk effect
- auteur
- Olha Nahorna, Frédéric Berthommier, Jean-Luc Schwartz
- article
- Journal of the Acoustical Society of America, 2015, 137 (1), pp.362-377. ⟨10.1121/1.4904536⟩
- resume
- While audiovisual interactions in speech perception have long been considered as automatic, recent data suggest that this is not the case. In a previous study, Nahorna et al. [(2012). J. Acoust. Soc. Am. 132, 1061-1077] showed that the McGurk effect is reduced by a previous incoherent audiovisual context. This was interpreted as showing the existence of an audiovisual binding stage controlling the fusion process. Incoherence would produce unbinding and decrease the weight of the visual input in fusion. The present paper explores the audiovisual binding system to characterize its dynamics. A first experiment assesses the dynamics of unbinding, and shows that it is rapid: an incoherent context less than 0.5 s long (typically one syllable) suffices to produce a maximal reduction in the McGurk effect. A second experiment tests the rebinding process, by presenting a short period of either coherent material or silence after the incoherent unbinding context. Coherence provides rebinding, with a recovery of the McGurk effect, while silence provides no rebinding and hence freezes the unbinding process. These experiments are interpreted in the framework of an audiovisual speech scene analysis process assessing the perceptual organization of an audiovisual speech input before decision takes place at a higher processing stage.
- typdoc
- Journal articles
- DOI
- DOI : 10.1121/1.4904536
- Accès au texte intégral et bibtex
-
- titre
- COSMO (“Communicating about Objects using Sensory–Motor Operations”): A Bayesian modeling framework for studying speech communication and the emergence of phonological systems
- auteur
- Clément Moulin-Frier, Julien Diard, Jean-Luc Schwartz, Pierre Bessière
- article
- Journal of Phonetics, 2015, 53, pp.5-41. ⟨10.1016/j.wocn.2015.06.001⟩
- resume
- While the origin of language remains a somewhat mysterious process, understanding how human language takes specific forms appears to be accessible by the experimental method. Languages, despite their wide variety, display obvious regularities. In this paper, we attempt to derive some properties of phonological systems (the sound systems of human languages) from speech communication principles. We introduce a model of the cognitive architecture of a communicating agent, called COSMO (for “Communicating about Objects using Sensory–Motor Operations”), that allows a probabilistic expression of the main theoretical trends found in the speech production and perception literature. This enables a computational comparison of these theoretical trends, which helps us to identify the conditions that favor the emergence of linguistic codes. We present realistic simulations of phonological system emergence showing that COSMO is able to predict the main regularities in vowel, stop consonant and syllable systems in human languages. (A toy illustration of such a sensory-motor Bayesian chain follows this entry.)
- typdoc
- Journal articles
- DOI
- DOI : 10.1016/j.wocn.2015.06.001
- Accès au texte intégral et bibtex
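As a toy illustration of the kind of sensory-motor Bayesian chain that COSMO formalizes (not the published implementation), the sketch below links objects O, motor gestures M and sounds S through invented probability tables and derives an auditory decoder P(O|S) by marginalization and Bayes' rule.

```python
# Discrete toy agent: objects O, gestures M, sounds S (all tables are made up).
import numpy as np

p_o = np.array([0.5, 0.5])                  # P(O): prior over two objects
p_m_given_o = np.array([[0.9, 0.1],         # P(M|O): preferred gesture per object
                        [0.1, 0.9]])
p_s_given_m = np.array([[0.8, 0.2],         # P(S|M): noisy production model
                        [0.2, 0.8]])

p_s_given_o = p_m_given_o @ p_s_given_m     # P(S|O), marginalizing over M
joint = p_o[:, None] * p_s_given_o          # P(O, S)
p_o_given_s = joint / joint.sum(axis=0)     # Bayes: each column is P(O | S=s)

print("P(O|S):")
print(np.round(p_o_given_s, 3))
```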
-
- titre
- Multisensory and sensorimotor interactions in speech perception
- auteur
- Kaisa Tiippana, Riikka Möttönen, Jean-Luc Schwartz
- article
- Frontiers in Psychology, 2015, ⟨10.3389/fpsyg.2015.00458⟩
- resume
- This research topic presents speech as a natural, well-learned, multisensory communication signal, processed by multiple mechanisms. Reflecting the general status of the field, most articles focus on audiovisual speech perception and many utilize the McGurk effect, which arises when discrepant visual and auditory speech stimuli are presented (McGurk and MacDonald, 1976). Tiippana (2014) argues that the McGurk effect can be used as a proxy for multisensory integration provided it is not interpreted too narrowly. Several articles shed new light on audiovisual speech perception in special populations. It is known that individuals with autism spectrum disorder (ASD, e.g., Saalasti et al., 2012) or language impairment (e.g., Meronen et al., 2013) are generally less influenced by the talking face than peers with typical development. Here Stevenson et al. (2014) propose that a deficit in multisensory integration could be a marker of ASD, and a component of the associated deficit in communication. However, three studies suggest that integration is not deficient in some communication disorders. Irwin and Brancazio (2014) show that children with ASD looked less at the mouth region, resulting in poorer visual speech perception and consequently weaker visual influence. Leybaert et al. (2014) report that children with specific language impairment recognized visual and auditory speech less accurately than their controls, affecting audiovisual speech perception, while audiovisual integration per se seemed unimpaired. In a similar vein, adult patients with aphasia showed unisensory deficits but still integrated audiovisual speech information (Andersen and Starrfelt, 2015). Multisensory information can influence response accuracy and processing speed (e.g., Molholm et al., 2002; Klucharev et al., 2003). Scarbel et al. (2014) show that oral responses to speech in noise were faster but less accurate than manual responses, suggesting that oral responses are planned at an earlier stage than manual responses. Sekiyama et al. (2014) show that older adults were more influenced by visual speech than younger adults and correlated this fact to their slower reaction times to auditory stimuli. Altieri and Hudock (2014) report variation in reaction time and accuracy benefits for audiovisual speech in hearing-impaired observers, emphasizing the importance of individual differences in integration. Finally, Heald and Nusbaum (2014) show that when there were two possible talkers instead of just one, audiovisual information appeared to distract the observer from the task of word recognition and slowed down their performance. This finding demonstrates that multisensory stimulation does not always facilitate performance. While multisensory stimulation is thought to be beneficial for learning (Shams and Seitz, 2008), evidence for this is still scarce. In the current research topic, the overall utility of multisensory learning is brought under question. In a paradigm training to associate novel words and pictures, Bernstein et al. (2014) show no benefit of audiovisual presentation compared with auditory presentation for normal hearing individuals, and even a degradation for adults with hearing impairment. In a study of cued speech, i.e., specific hand-signs for different speech sounds, Bayard et al. (2014) demonstrate that individuals with hearing impairment used the visual cues differently from their controls, even though both groups were experts in cued speech. Kelly et al. (2014)
- typdoc
- Journal articles
- DOI
- DOI : 10.3389/fpsyg.2015.00458
- Accès au texte intégral et bibtex
-
- titre
- Optimal speech motor control and token-to-token variability: a Bayesian modeling approach
- auteur
- Jean-François Patri, Julien Diard, Pascal Perrier
- article
- Biological Cybernetics (Modeling), 2015, 109 (6), pp.611-626. ⟨10.1007/s00422-015-0664-4⟩
- resume
- The remarkable capacity of the speech motor system to adapt to various speech conditions is due to an excess of degrees of freedom, which enables producing similar acoustical properties with different sets of control strategies. To explain how the Central Nervous System selects one of the possible strategies, a common approach, in line with optimal motor control theories, is to model speech motor planning as the solution of an optimality problem based on cost functions. Despite the success of this approach, one of its drawbacks is the intrinsic contradiction between the concept of optimality and the observed experimental intra-speaker token-to-token variability. The present paper proposes an alternative approach by formulating feedforward optimal control in a probabilistic Bayesian modeling framework (see the illustrative sketch after this entry). This is illustrated by controlling a biomechanical model of the vocal tract for speech production and by comparing it with an existing optimal control model (GEPPETO). The essential elements of this optimal control model are presented first, and the Bayesian model is then constructed from them in a progressive way. The performance of the Bayesian model is evaluated in computer simulations and compared to the optimal control model. This approach is shown to be appropriate for solving the speech planning problem while accounting for variability in a principled way.
- typdoc
- Journal articles
- DOI
- DOI : 10.1007/s00422-015-0664-4
- Accès au texte intégral et bibtex
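A hedged, minimal sketch of the contrast discussed above (not the GEPPETO model or the paper's implementation): a deterministic optimal-control planner returns the single cost-minimizing command, whereas a Bayesian reading draws commands from a distribution proportional to exp(-beta * cost), which naturally produces token-to-token variability. The one-dimensional command space, the cost and beta are placeholders.

```python
# Argmin of a cost vs. sampling from exp(-beta * cost) over a 1-D motor command.
import numpy as np

rng = np.random.default_rng(1)
commands = np.linspace(-2.0, 2.0, 401)        # stand-in for a motor command variable
cost = (commands - 0.3) ** 2                  # hypothetical planning cost

optimal = commands[np.argmin(cost)]           # optimal control: always the same answer

beta = 20.0                                   # sharpness of the soft preference
p = np.exp(-beta * cost)
p /= p.sum()
tokens = rng.choice(commands, size=5, p=p)    # repeated "productions" vary

print(f"deterministic optimum: {optimal:.2f}")
print("sampled tokens:", np.round(tokens, 2))
```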
-
Conference papers
- titre
- From Sensorimotor Experience To Speech Unit -Adaptation to altered auditory feedback in speech to assess transfer of learning in complex serial movements
- auteur
- Tiphaine Caudrelier, Jean-Luc Schwartz, Pascal Perrier, Christophe Savariaux, Amélie Rochet-Capellan
- article
- Neuroscience 2015 - Annual meeting of the Society for Neuroscience, Oct 2015, Chicago, United States
- resume
- Using bird song as a model to understand generalization in motor learning, Hoffmann and Sober recently found that adaptation to a pitch shift of birds' vocal output transferred to the production of the same sounds embedded in a different serial context (J. Neurosci., 2014). In humans, speech learning has been found to transfer as a function of the acoustical similarity between the training and the testing utterances (Cai et al., 2010; Rochet-Capellan et al., 2012), but it is unclear whether transfer of learning is sensitive to serial order. We investigate the effects of serial order on transfer of speech motor learning using non-word sequences of CV syllables. Three groups of native speakers of French were trained to produce the syllable /be/ repetitively while their auditory feedback was altered in real time toward /ba/. They were then tested for transfer toward /be/ (control), /bepe/ or /pebe/ under normal feedback conditions. The training utterance was then produced again to test for after-effects. The auditory shift was achieved in real time using the Audapter software (Cai et al., 2008). Adaptation and transfer effects were quantified in terms of changes in the formant frequencies of the vowel /e/, as a function of its position and of the preceding consonant in the utterance (see the illustrative sketch after this entry). Changes in formant frequencies in a direction opposite to the shift were significant for ~80% of the participants. Adaptation was still significant for the three groups in the after-effect block. Transfer effects in the /bepe/ and /pebe/ groups were globally smaller than those of the control group, particularly when the vowel /e/ came after /p/ and/or was in second position in the utterance. Taken together, the results suggest that transfer of speech motor learning is not homogeneous and, as observed by Hoffmann and Sober, depends on the serial context of a sound within the utterance. References: Cai, S., Boucek, M., Ghosh, S. S., Guenther, F. H., & Perkell, J. S. (2008). A system for online dynamic perturbation of formant frequencies and results from perturbation of the Mandarin triphthong /iau/. In Proceedings of the 8th Intl. Seminar on Speech Production, Strasbourg, France, Dec. 8-12, 2008, pp. 65. Cai, S., Ghosh, S. S., Guenther, F. H., & Perkell, J. S. (2010). Adaptive auditory feedback control of the production of formant trajectories in the Mandarin triphthong /iau/ and its pattern of generalization. The Journal of the Acoustical Society of America, 128(4), 2033-2048. Hoffmann, L. A., & Sober, S. J. (2014). Vocal generalization depends on gesture identity and sequence. The Journal of Neuroscience, 34(16), 5564-5574. Rochet-Capellan, A., Richer, L., & Ostry, D. J. (2012). Nonhomogeneous transfer reveals specificity in speech motor learning. Journal of Neurophysiology, 107(6), 1711-1717.
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
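For illustration only (not the authors' analysis code), adaptation can be summarized as the change in a vowel formant between a baseline and a hold phase, signed relative to the direction of the applied feedback shift; the formant values below are invented.

```python
# Signed adaptation magnitude from baseline and hold-phase formant measurements.
import numpy as np

def adaptation_magnitude(baseline_hz, hold_hz, shift_direction):
    """Positive values mean production moved opposite to the feedback shift."""
    change = np.mean(hold_hz) - np.mean(baseline_hz)
    return -np.sign(shift_direction) * change

# F1 of /e/ (Hz): feedback shifted upward (towards /a/), so compensation
# should lower the produced F1.
baseline_f1 = [420, 430, 425, 418]
hold_f1 = [395, 400, 398, 402]
print(f"adaptation: {adaptation_magnitude(baseline_f1, hold_f1, +1):.1f} Hz")
```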
-
- titre
- Speech in the mirror? Neurobiological correlates of self speech perception
- auteur
- Avril Treille, Coriandre Emmanuel Vilain, Sonia Kandel, Jean-Luc Schwartz, Marc Sato
- article
- SNL 2015 - 7th Annual Society for the Neurobiology of Language Conference, Oct 2015, Chicago, United States
- resume
- Self-awareness and self-recognition during action observation may partly result from a functional matching between action and perception systems. This perception-action interaction enhances the integration between sensory inputs and our own sensory-motor knowledge. We present combined EEG and fMRI studies examining the impact of self-knowledge on multisensory integration mechanisms. More precisely, we investigated this impact during auditory, visual and audio-visual speech perception. Our hypothesis was that hearing and/or viewing oneself talk would facilitate the bimodal integration process and activate sensory-motor maps to a greater extent than observing others. In both studies, half of the stimuli presented the participants' own productions (self condition) and the other half presented an unknown speaker (other condition). For the “self” condition, we recorded videos of each participant producing /pa/, /ta/ and /ka/ syllables. For the “other” condition, we recorded videos of a speaker the participants had never met producing the same syllables. These recordings were then presented in different modalities: auditory only (A), visual only (V), audio-visual (AV) and incongruent audio-visual (AVi; incongruency referred to different speakers for the audio and video components). In the EEG experiment, 18 participants had to categorize the syllables. In the fMRI experiment, 12 participants passively listened to and/or viewed the syllables. In the EEG session, audiovisual interactions were estimated by comparing auditory N1/P2 ERPs during bimodal responses (AV) with the sum of the responses in the A-only and V-only conditions (A+V). The amplitude of P2 ERPs was lower for AV than for A+V. Importantly, latencies of N1 ERPs were shorter for the “Visual-self” condition than for the “Visual-other” condition, regardless of signal type. In the fMRI session, the presentation modality had an impact on brain activation: activation was stronger for audio or audiovisual stimuli in the superior temporal auditory regions (A = AV = AVi > V), and for video or audiovisual stimuli in MT/V5 and in the premotor cortices (V = AV = AVi > A). In addition, brain activity was stronger in the “self” than in the “other” condition both in the left posterior inferior frontal gyrus and the cerebellum (lobules I-IV). In line with previous studies on multimodal speech perception, our results point to the existence of integration mechanisms for auditory and visual speech signals. Critically, they further demonstrate a processing advantage when the perceptual situation involves our own speech production. In addition, hearing and/or viewing oneself talk increased activation in the left posterior IFG and cerebellum. These regions are generally responsible for predicting the sensory outcomes of action generation. Altogether, these results suggest that viewing our own utterances leads to a temporal facilitation of auditory and visual speech integration. Moreover, processing afferent and efferent signals in sensory-motor areas leads to self-awareness during speech perception.
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-
- titre
- Integration of auditory, labial and manual signals in cued speech perception by deaf adults : an adaptation of the McGurk paradigm
- auteur
- Clémence Bayard, Jacqueline Leybaert, Cécile Colin
- article
- FAAVSP 2015 - 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing, Sep 2015, Vienne, Austria
- resume
- Among deaf individuals fitted with a cochlear implant, some use Cued Speech (CS; a system in which each syllable is uttered with a complementary manual gesture) and therefore, have to combine auditory, labial and manual information to perceive speech. We examined how audio-visual (AV) speech integration is affected by the presence of manual cues and on which form of information (auditory, labial or manual) the CS receptors primarily rely depending on labial ambiguity. To address this issue, deaf CS users (N=36) and deaf CS naïve (N=35) participants were submitted to an identification task of two AV McGurk stimuli (either with a plosive or with a fricative consonant). Manual cues were congruent with either auditory information, lip information or the expected fusion. Results revealed that deaf individuals can merge audio and labial information into a single unified percept. Without manual cues, participants gave a high proportion of fusion response (particularly with ambiguous plosive McGurk stimuli). Results also suggested that manual cues can modify the AV integration and that their impact differs between plosive and fricative McGurk stimuli.
- typdoc
- Conference papers
- Accès au bibtex
-
- titre
- Auditory-visual Perception of VCVs Produced by People with Down Syndrome: a Preliminary Study
- auteur
- Alexandre Hennequin, Amélie Rochet-Capellan, Marion Dohen
- article
- FAAVSP 2015 - 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing, Sep 2015, Vienne, Austria
- resume
- Down syndrome (DS) is the most frequent genetic disorder in humans and is present throughout society. When questioned about their child’s speech, all parents of a child with DS report speech intelligibility issues [Kumin, 2006]. People with DS actually have better receptive than expressive speech abilities [Kumin, 2006]. Improving the speech production of people with DS is an important aspect of their quality of life, and understanding how the perception of speech produced by people with DS could be improved could also have positive effects on their social integration. Speech difficulties in people with DS originate from anatomical and physiological specificities as well as motor impairments, and appear in early childhood. For example, people with DS have a smaller vocal tract, and their tongue is larger relative to the size of their oral cavity. Other anatomical and perceptual specificities affect their ability to produce speech (see [Kent and Vorperian, 2013] for a review). All these specificities are likely to have not only acoustic but also visual consequences. To our knowledge, no study has explored the auditory-visual perception of speech produced by people with DS, even though it is well known that speech perception benefits from the addition of vision, especially in degraded conditions (for example in noise: [Sumby and Pollack, 1954]). This study aims at exploring whether and how vision can improve the perception, by “ordinary” people, of speech produced by people with DS.
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-
- titre
- Audiovisual binding in speech perception
- auteur
- Jean-Luc Schwartz
- article
- FAAVSP 2015 - 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing, Sep 2015, Vienne, Austria
- resume
- In recent years in Grenoble, we have been developing a series of experimental studies attempting to show that audiovisual speech perception comprises an “audiovisual binding” stage before fusion and decision. This stage would be in charge of extracting and associating the auditory and visual cues corresponding to a given speech source, before further categorisation processes take place at a higher stage. We developed paradigms to characterize audiovisual binding in terms of both “streaming” and “chunking” adequate pieces of information. This work provides elements of a possible computational model, in relation to a larger theoretical perceptuo-motor framework for speech perception, the “Perception-for-Action-Control” Theory.
- typdoc
- Conference papers
- Accès au bibtex
-
- titre
- Visual lip information supports auditory word segmentation
- auteur
- Antje Strauss, Christophe Savariaux, Sonia Kandel, Jean-Luc Schwartz
- article
- FAAVSP 2015 - 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing, Sep 2015, Vienne, Austria
- resume
- Word segmentation is one of the initial processes that needs to be solved when acquiring a first or learning a second language. Acoustic cues such as the fundamental frequency and segment durations have been shown to facilitate the detection of word boundaries. The role of visual speech, and in particular of lip movements, in word segmentation is still largely unknown. In French, liaisons, e.g. between a determiner and a noun, often create several segmentation possibilities: the sequence /lafiS/ with liaison ("l’affiche") means "the poster", whereas without liaison ("la fiche") it means "the file". Here, we used 17 ambiguous French sequences with and without liaison. They were presented in carrier sentences either with clear acoustic cues for the first or the second segmentation possibility, or with ambiguous acoustic cues. The three audio conditions were combined with lip movements hyper-articulating either the first or the second segmentation possibility in order to observe the influence of visual information on segmentation. The participants had to indicate as quickly as possible which of the two versions they understood (e.g., "l’affiche" or "la fiche"?). Results show that lip information indeed biases the word segmentation decision. These data have important implications for models of audiovisual integration.
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-
- titre
- A Bayesian framework for speech motor control
- auteur
- Jean-François Patri, Julien Diard, Pascal Perrier, Jean-Luc Schwartz
- article
- Workshop: Probabilistic Inference and the Brain, Stanislas Dehaene (CEA, Inserm, Collège de France - France), Alain Destexhe (EITN CNRS-UNIC - France), Wolfgang Maass (Graz University - Austria), Florent Meyniel (CEA - France), Sep 2015, Paris, France
- resume
- The remarkable capacity of the speech motor system to adapt to various speech conditions is due to an excess of degrees of freedom, which enables similar acoustic properties to be produced with different sets of control strategies. To explain how the Central Nervous System selects one of the possible strategies, a common approach, in line with optimal motor control theories, is to model speech motor planning as the solution of an optimality problem based on cost functions. Despite the success of this approach, one of its drawbacks is the intrinsic contradiction between the concept of optimality and the intra-speaker token-to-token variability observed experimentally. The present paper proposes an alternative approach by formulating feedforward optimal control in a probabilistic Bayesian modeling framework. This is illustrated by controlling a biomechanical model of the vocal tract for speech production and by comparing it with an existing optimal control model (GEPPETO). The essential elements of this optimal control model are presented first, and the Bayesian model is then constructed from them step by step. The performance of the Bayesian model is evaluated through computer simulations and compared to the optimal control model. This approach is shown to be appropriate for solving the speech planning problem while accounting for variability in a principled way. (A toy sketch of this idea, planning by sampling a posterior over motor commands, follows this entry.)
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-
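To make the contrast with cost-function optimization concrete, here is a deliberately simplified sketch of the Bayesian planning idea: motor commands are drawn from a posterior combining a prior on effort with a soft constraint that the predicted acoustics land in a target region, so token-to-token variability emerges from the posterior itself rather than being added on top of a single optimum. The one-dimensional forward model, the target interval and all parameter values are invented for illustration; this is not GEPPETO or the paper's actual implementation.

```python
# Toy sketch (assumed forward model, not the GEPPETO vocal-tract model) of casting
# feedforward speech motor planning as Bayesian inference: instead of picking a single
# cost-optimal motor command, sample commands from the posterior
# P(M | acoustic target region), so token-to-token variability falls out naturally.
import numpy as np

rng = np.random.default_rng(1)

def forward_model(m):
    """Hypothetical motor-to-acoustic mapping (e.g. one command -> one formant value)."""
    return 500.0 + 300.0 * np.tanh(m)

def log_posterior(m, target_lo, target_hi, sigma_a=20.0, sigma_m=1.5):
    """log P(M) + log P(target | M): prior on effort plus a soft constraint that the
    predicted acoustics land inside [target_lo, target_hi]."""
    a = forward_model(m)
    d = np.maximum(0.0, np.maximum(target_lo - a, a - target_hi))  # distance to target interval
    return -0.5 * (m / sigma_m) ** 2 - 0.5 * (d / sigma_a) ** 2

def sample_tokens(n, target_lo, target_hi):
    """Crude Metropolis sampler over motor commands; each retained sample is one 'token'."""
    m, out = 0.0, []
    for _ in range(n * 50):
        prop = m + rng.normal(0, 0.3)
        if np.log(rng.uniform()) < (log_posterior(prop, target_lo, target_hi)
                                    - log_posterior(m, target_lo, target_hi)):
            m = prop
        out.append(m)
    return np.array(out[::50])  # thin the chain

tokens = sample_tokens(200, 650.0, 700.0)
print("acoustic outputs:", np.round(forward_model(tokens[:5]), 1),
      "spread (Hz):", round(float(forward_model(tokens).std()), 1))
```

Each retained sample is one simulated token; the printed spread of acoustic outputs plays the role of intra-speaker variability in this toy setting.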
- titre
- Modeling the concurrent development of speech perception and production in a Bayesian framework
- auteur
- Marie-Lou Barnaud, Julien Diard, Pierre Bessière, Jean-Luc Schwartz
- article
- ICDL-EpiRob 2015 - 5th International Conference on Development and Learning and on Epigenetic Robotics, Aug 2015, Providence, United States
- resume
- It is now widely accepted that there is a functional relationship between the speech perception and production systems in the human brain. However, the precise mechanisms and role of this relationship remain debated. The question of invariance and robustness in categorization is at the center of the debate: how is stable information extracted from the variable sensory input in order to achieve speech comprehension? In this context, auditory (resp. motor, perceptuo-motor) theories propose that speech is categorized through auditory (resp. motor, perceptuo-motor) processes. However, experimental evidence is still scarce and does not allow us to clearly distinguish between the current theories or to determine whether invariance in speech perception is of an auditory or motor type. This is why we developed COSMO, a Bayesian model comparing sensory and motor processes in the form of probability distributions, which enables both theoretical developments and quantitative simulations. A first significant result in COSMO is an indistinguishability theorem: it is only through simulations of adverse conditions or partial learning that the specificity of sensory vs. motor processing can emerge and provide a basis for evaluating the specific role of each sub-system. We present the COSMO model and how its sensory and motor sub-systems are learned, then describe simulations exploring the way these sub-systems differ during speech categorization. We discuss the experimental results in the light of a “narrowband vs. wideband” interpretation: the sensory sub-system is more precisely tuned to the frequently learned sensory inputs and hence more efficient at recognizing them, providing a “narrowband” system. Conversely, the motor sub-system is less accurate at recognizing learned sensory inputs but has better generalization properties, making it more robust to unexpected variability, which provides it with “wideband” characteristics. (A schematic toy comparison of an auditory and a motor decoding branch follows this entry.)
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-
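The "narrowband vs. wideband" comparison can be sketched with small discrete distributions: an auditory branch that decodes objects directly from sounds, P(O|S) ∝ P(S|O) P(O), versus a motor branch that decodes them through motor commands, P(O|S) ∝ Σ_M P(S|M) P(M|O) P(O). All the conditional tables below, and the sharpening used to mimic a narrowly tuned auditory branch, are illustrative assumptions, not the published COSMO model.

```python
# Schematic toy comparison (my own distributions, not the published COSMO model):
# an auditory branch P(O|S) learned directly, versus a motor branch that decodes S
# through motor commands M: P(O|S) proportional to sum_M P(S|M) P(M|O) P(O).
import numpy as np

n_obj, n_motor, n_sens = 3, 4, 6          # phoneme-like objects, motor commands, sensory bins
rng = np.random.default_rng(2)

p_o = np.full(n_obj, 1.0 / n_obj)                                  # P(O), uniform prior
p_m_given_o = rng.dirichlet(np.ones(n_motor) * 0.5, size=n_obj)    # P(M|O)
p_s_given_m = rng.dirichlet(np.ones(n_sens) * 0.5, size=n_motor)   # P(S|M)

# The "true" generative distribution of sensory inputs per object.
p_s_given_o_true = p_m_given_o @ p_s_given_m

# Auditory branch: sharply tuned to the inputs seen during learning ("narrowband").
p_s_given_o_aud = p_s_given_o_true ** 3
p_s_given_o_aud /= p_s_given_o_aud.sum(axis=1, keepdims=True)

def decode_auditory(s):
    post = p_s_given_o_aud[:, s] * p_o
    return post / post.sum()

def decode_motor(s):
    # Marginalize over motor commands: sum_M P(s|M) P(M|O) ("wideband").
    post = (p_m_given_o * p_s_given_m[:, s]).sum(axis=1) * p_o
    return post / post.sum()

for s in range(n_sens):
    print(s, np.round(decode_auditory(s), 2), np.round(decode_motor(s), 2))
```

In this toy setting the sharpened auditory branch yields more peaked posteriors on typical inputs, while the motor branch, which marginalizes over M, spreads its probability mass more broadly, mirroring the narrowband/wideband contrast described above.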
- titre
- Le liage audiovisuel en perception de la parole, données et questions pour la modélisation neuronale
- auteur
- Jean-Luc Schwartz
- article
- 2èmes Journées Neurostic, Benoît Miramond et Martial Mermillod, Jul 2015, Paris, France
- resume
- Sensory systems were long considered autonomous cortical processing modules, with multisensory interaction and fusion processes taking place only in associative areas and beyond. We now know that multisensory interactions are early and massive, and that they involve neural binding mechanisms probably based on principles of synchronization and multiplexed coding. I will introduce the issues at stake in the domain of speech processing. I will describe the behavioural results we have obtained recently that reveal audiovisual binding processes occurring before fusion. I will then describe the two-stage "binding and fusion" architecture we have proposed, and relate it to recent findings on the neural processing of audiovisual speech signals.
- typdoc
- Conference papers
- Accès au bibtex
-
- titre
- Seeing our own voice: an electrophysiological study of audiovisual speech integration during self perception
- auteur
- Avril Treille, Coriandre Emmanuel Vilain, Sonia Kandel, Marc Sato
- article
- IMRF 2015 - 16th International Multisensory Research Forum, Jun 2015, Pise, Italy
- resume
- Recent studies suggest that better recognition of one's actions may result from the integration of sensory inputs with our own sensory-motor knowledge. However, whether hearing our voice and seeing our articulatory gestures facilitate audiovisual speech integration is still debated. The present EEG study examined the impact of self-knowledge during the perception of auditory, visual and audiovisual syllables that were previously recorded by a participant or a speaker he/she had never met. Audiovisual interactions were estimated on eighteen participants by comparing the EEG responses to the multisensory stimuli (AV) to the combination of responses to the stimuli presented in isolation (A+V). An amplitude decrease of early P2 auditory evoked potentials was observed during AV compared to A+V. Moreover, shorter latencies of N1 auditory evoked potentials were also observed for self-related visual stimuli compared to those of an unknown speaker. In line with previous EEG studies on multimodal speech perception, our results point to the existence of early integration mechanisms of auditory and visual speech information. Crucially, they also provide evidence for a processing advantage when the perceptual situation involves our own speech productions. Viewing our own utterances leads to a temporal facilitation of the integration of auditory and visual speech signals.
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-
- titre
- Neural correlates of auditory-somatosensory interaction in speech perception
- auteur
- Takayuki Ito, Vincent Gracco, David J Ostry
- article
- IMRF 2015 - 16th International Multisensory Research Forum, Jun 2015, Pise, Italy
- resume
- Speech perception is known to rely on both auditory and visual information. However, sound-specific somatosensory input has also been shown to influence speech perceptual processing (Ito et al., 2009). In the present study we further examined the relationship between somatosensory information and speech perceptual processing by testing the hypothesis that the temporal relationship between orofacial movement and sound processing contributes to somatosensory-auditory interaction in speech perception. We examined the changes in event-related potentials in response to multisensory synchronous (simultaneous) and asynchronous (90 ms lag and lead) somatosensory and auditory stimulation, compared to unisensory auditory and somatosensory stimulation alone. We used a robotic device to apply somatosensory deformations of the facial skin that were similar in timing and duration to those experienced in speech production. Following synchronous multisensory stimulation, the amplitude of the event-related potential was reliably different from the two unisensory potentials. More importantly, the magnitude of the event-related potential difference varied as a function of the relative timing of the somatosensory-auditory stimulation. The change in event-related activity due to stimulus timing was seen between 160 and 220 ms following somatosensory onset, mostly around the parietal area. The results demonstrate a dynamic modulation of somatosensory-auditory convergence and suggest that the contribution of somatosensory information to speech processing depends on the specific temporal order of sensory inputs in speech production.
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-
- titre
- Electrophysiological evidence for audio-visuo-lingual speech integration
- auteur
- Coriandre Emmanuel Vilain, Avril Treille, Marc Sato
- article
- IMRF 2015 - 16th International Multisensory Research Forum, Jun 2015, Pise, Italy
- resume
- Audio-visual speech perception is a special case of multisensory processing that interfaces with the linguistic system. One important issue is whether cross-modal interactions only depend on the well-known auditory and visuo-facial modalities or, rather, might also be triggered by other sensory sources that are less common in speech communication. The present EEG study aimed at investigating cross-modal interactions not only between auditory, visuo-facial and audio-visuo-facial syllables but also between auditory, visuo-lingual and audio-visuo-lingual syllables. Eighteen adults participated in the study, none of them being experienced with visuo-lingual stimuli. The stimuli were acquired by means of a camera and an ultrasound system, synchronized with the acoustic signal. At the behavioral level, visuo-lingual syllables were recognized far above chance, although to a lower degree than visuo-labial syllables. At the brain level, audiovisual interactions were estimated by comparing the EEG responses to the multisensory stimuli (AV) to the combination of responses to the stimuli presented in isolation (A+V). For both visuo-labial and visuo-lingual syllables, a reduced latency and a lower amplitude of P2 auditory evoked potentials were observed for AV compared to A+V. Apart from this sub-additive effect, a reduced amplitude of N1 and a higher amplitude of P2 were also observed for lingual compared to labial movements. Although participants were not experienced with visuo-lingual stimuli, our results demonstrate that they were able to recognize them and provide the first evidence for audio-visuo-lingual speech interactions. These results further emphasize the multimodal nature of speech perception and likely reflect the impact of the listener's knowledge of speech production.
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-
- titre
- Perceptual abilities in relation with motor development during the first year of life
- auteur
- Marjorie Dole, Hélène Loevenbruck, Olivier Pascalis, Jean-Luc Schwartz, Anne Vilain
- article
- WILD 2015 - 2nd Workshop on Infant Language Development, Jun 2015, Stockholm, Sweden
- resume
- To better understand the development of perceptuo-motor interactions during the first year of life, we designed two studies evaluating the influence of speech production abilities on phonemic categorization. In a first study we use a visual fixation paradigm to evaluate infants’ consonant categorization in different vowel contexts. Auditory stimuli are presented via a loudspeaker located behind a screen. A /d/-/g/ contrast is employed; infants are habituated with one member of the pair associated with different vowels (/do/-/di/-/du/). When they reach the criterion of 60% of the mean looking time (LT) for the first three trials, they are presented with consonants in a new context (/da/ and /ga/). We compare LTs between familiar and novel consonants. Infants who are able to extract the common consonant (here /d/) in the different vocalic contexts should show different LTs for the two test stimuli. In a second study, infants’ ability to link auditory and visual information on a consonant category into a single representation will be tested using an intersensory matching procedure. Infants will be familiarized with auditory syllables in different vowel contexts (/bo/-/bi/-/bu/). In the test phase, two side-by-side silent videos of faces repeatedly pronouncing consonants in a new vowel context (/ba/ on one side and /da/ on the other) will be presented and LTs to each video will be compared. Infants who are able to extract the common gesture in the audio syllables should be able to relate it to the same gesture in the visual stimuli and show different LTs for the two test stimuli (/ba/ vs /da/). For both studies, the speech production abilities of each of the 6- to 12-month-old infants are assessed using a parental questionnaire. We expect better categorization and better auditory-visual association in infants who can produce the target consonants than in those who cannot. These studies will allow us to assess the role of motor knowledge in the development of speech perception.
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-
- titre
- Modeling concurrent development of speech perception and production in a Bayesian framework
- auteur
- Marie-Lou Barnaud, Raphaël Laurent, Pierre Bessière, Julien Diard, Jean-Luc Schwartz
- article
- WILD 2015 - 2nd Workshop on Infant Language Development, Jun 2015, Stockholm, Sweden
- resume
- It is widely accepted that motor and auditory processes interact in speech perception, but little is known about the functional role motor processes play in the development of speech perception. To address this question, we consider a Bayesian model of speech perception development based on three sets of variables: motor representations M, sensory representations S and objects O (e.g. phonological units such as phonemes). The model comprises two internal branches. Firstly, an auditory identification sub-system connects S and O. Secondly, a motor sub-system connecting M and O and a sensori-motor model connecting M and S can be combined to provide “motor identification” of sounds S, from S to M and from M to O, in an analysis-by-synthesis process. Development is modeled as a learning process in which a master iteratively produces a sensory percept S associated with an object O. The learning agent updates its auditory sub-system by observing S and O. Updating the two other branches is more complex and based on an imitation phase: the learning agent estimates a likely motor action M from input S, produces this M and observes the resulting sound S’. M, S’ and O are used to update both the motor sub-system (M, O) and the sensori-motor model (S, M). We show that the auditory identification sub-system learns rapidly and becomes efficient for stimuli close to those provided by the master, although it generalizes poorly. By contrast, the two other sub-systems evolve more slowly, and in consequence the motor identification system performs less accurately. However, motor identification happens to have captured more variable situations during learning, and generalizes better (e.g. in noise). This is in line with a developmental schedule in which auditory processing is mature before motor knowledge (Kuhl et al., 2008) and is exploited by infants after 11 months of age for analysis-by-synthesis of unusual speech stimuli (Kuhl et al., 2014). (A toy simulation of this imitation-based learning loop follows this entry.)
- typdoc
- Conference papers
- Accès au bibtex
-
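A minimal simulation of the learning schedule described above can be written as a loop: the auditory branch is updated directly from the master's (S, O) pairs, while the motor and sensorimotor branches are only updated after an imitation step in which the learner picks a likely motor command for the heard sound, produces it, and observes its own output. The toy "vocal tract", the master's production distributions and the count-based updates below are all assumptions for illustration, not the published simulations.

```python
# Toy sketch (my own setup, not the published simulations) of imitation-based learning:
# the auditory branch learns from the master's (S, O) pairs; the motor branch P(M|O)
# and the sensorimotor model P(S|M) learn only through an imitation step.
import numpy as np

rng = np.random.default_rng(3)
n_obj, n_motor, n_sens = 3, 4, 6

# Hidden "vocal tract" of the learner: which sound each motor command actually makes.
true_s_given_m = rng.dirichlet(np.ones(n_sens) * 0.3, size=n_motor)
# The master's own production distributions P(S|O).
master_s_given_o = rng.dirichlet(np.ones(n_sens) * 0.3, size=n_obj)

# Learner's knowledge, stored as Laplace-smoothed co-occurrence counts.
aud_counts = np.ones((n_obj, n_sens))     # for P(S|O), auditory branch
mot_counts = np.ones((n_obj, n_motor))    # for P(M|O), motor branch
sm_counts = np.ones((n_motor, n_sens))    # for P(S|M), sensorimotor model

for step in range(5000):
    o = rng.integers(n_obj)
    s = rng.choice(n_sens, p=master_s_given_o[o])    # master produces (S, O)
    aud_counts[o, s] += 1                             # direct auditory update

    # Imitation: pick M in proportion to the current belief P(S=s|M), produce it,
    # listen to the outcome S', then update the motor and sensorimotor branches.
    p_m = sm_counts[:, s] / sm_counts.sum(axis=1)
    m = rng.choice(n_motor, p=p_m / p_m.sum())
    s_prime = rng.choice(n_sens, p=true_s_given_m[m])
    mot_counts[o, m] += 1
    sm_counts[m, s_prime] += 1

print("learned P(S|O):",
      np.round(aud_counts / aud_counts.sum(axis=1, keepdims=True), 2))
```

Because the motor-side branches only accumulate evidence through the imitation loop, they learn more slowly than the auditory branch, which is the asymmetry the abstract relates to the developmental schedule.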
Poster communications
- titre
- Speech in the mirror? Neurobiological correlates of self-speech perception
- auteur
- Avril Treille, Coriandre Emmanuel Vilain, Sonia Kandel, Jean-Luc Schwartz, Marc Sato
- article
- Seventh Annual Society for the Neurobiology of Language Conference, Oct 2015, Chicago, United States
- resume
- Self-awareness and self-recognition during action observation may partly result from a functional matching between action and perception systems. This perception-action interaction enhances the integration between sensory inputs and our own sensory-motor knowledge. We present combined EEG and fMRI studies examining the impact of self-knowledge on multisensory integration mechanisms. More precisely, we investigated this impact during auditory, visual and audio-visual speech perception. Our hypothesis was that hearing and/or viewing oneself talk would facilitate the bimodal integration process and activate sensory-motor maps to a greater extent than observing others. In both studies, half of the stimuli presented the participants’ own productions (self condition) and the other half presented an unknown speaker (other condition). For the “self” condition, we recorded videos of each participant producing /pa/, /ta/ and /ka/ syllables. In the “other” condition, we recorded videos of a speaker the participants had never met producing the same syllables. These recordings were then presented in different modalities: auditory only (A), visual only (V), audio-visual (AV) and incongruent audiovisual (AVi – incongruency referred to different speakers for the audio and video components). In the EEG experiment, 18 participants had to categorize the syllables. In the fMRI experiment, 12 participants passively listened to and/or viewed the syllables. In the EEG session, audiovisual interactions were estimated by comparing auditory N1/P2 ERPs during bimodal responses (AV) with the sum of the responses in the A-only and V-only conditions (A+V). The amplitude of P2 ERPs was lower for AV than A+V. Importantly, latencies for N1 ERPs were shorter for the “Visual-self” condition than the “Visual-other” condition, regardless of signal type. In the fMRI session, the presentation modality had an impact on brain activation: activation was stronger for audio or audiovisual stimuli in the superior temporal auditory regions (A=AV=AVi>V), and for video or audiovisual stimuli in MT/V5 and in the premotor cortices (V=AV=AVi>A). In addition, brain activity was stronger in the “self” than the “other” condition both in the left posterior inferior frontal gyrus and the cerebellum (lobules I-IV). In line with previous studies on multimodal speech perception, our results point to the existence of integration mechanisms for auditory and visual speech signals. Critically, they further demonstrate a processing advantage when the perceptual situation involves our own speech production. In addition, hearing and/or viewing oneself talk increased activation in the left posterior IFG and cerebellum, regions generally thought to be responsible for predicting the sensory outcomes of action generation. Altogether, these results suggest that viewing our own utterances leads to a temporal facilitation of auditory and visual speech integration, and that processing afferent and efferent signals in sensory-motor areas contributes to self-awareness during speech perception. Part of this research was supported by a grant from the European Research Council (FP7/2007-2013 Grant Agreement no. 339152, "Speech Unit(e)s").
- typdoc
- Poster communications
- Accès au texte intégral et bibtex
-
- titre
- Through the looking-glass: Neural basis of self representation during speech perception
- auteur
- Marc Sato, Avril Treille, Coriandre Emmanuel Vilain, Jean-Luc Schwartz
- article
- 21st Annual Meeting of the Organization for Human Brain Mapping (OHBM), Jun 2015, Honolulu, United States
- resume
- Introduction: To recognize one's own face and voice is key for our self-awareness and for our ability to communicate effectively with others. Interestingly, several theories and studies suggest that self-recognition during action observation may partly result from a functional coupling between action and perception systems and a better integration of sensory inputs with our own sensory-motor knowledge (Apps & Tsakiris, 2014). The present fMRI study aimed at further investigating the neural basis of self representation during auditory, visual and audio-visual speech perception. Our working hypothesis was that hearing and/or viewing oneself talk might activate sensory-motor plans to a greater degree than does observing others. Methods: • Participants were 12 healthy adults (25±6 years, 9 females). • A total of 1176 stimuli were created. During the scanning session, participants were asked to passively listen to and/or view auditory (A), visual (V), audio-visual (AV) and incongruent audio-visual (AVi) syllables. Half of the stimuli were related to themselves, the other half to an unknown speaker, and they were presented either with or without noise. In addition, a resting face of the participant or of the unknown speaker, presented with and without acoustic noise, served as baseline. • Functional MRI images were acquired with a sparse-sampling acquisition used to minimize scanner noise (53 axial slices, 3 mm³; TR = 8 s, delay in TR = 5 s). • BOLD responses were analyzed using a general linear model, including 16 regressors of interest (4 modalities x 2 speakers x 2 noise levels) and the 4 corresponding baselines (2 speakers x 2 noise levels). A second-level random-effect group analysis was carried out, with modality, speaker and noise level as within-subject factors and subjects treated as a random factor. All effects and interactions were calculated with a significance level set at p < .001 uncorrected. Results: • In line with previous brain-imaging studies on multimodal speech perception, the main effect of modality revealed stronger activity in the superior temporal auditory regions during A, AV and AVi compared to V, in the middle temporal visual motion area MT/V5 during V, AV and AVi compared to A, as well as in the premotor cortices during V, AV and AVi compared to A. The main effect of noise and the modality by noise interaction also showed stronger activity in the primary and secondary auditory cortices for the stimuli presented without noise during A, AV and AVi compared to V. • Crucially, the main effect of speaker showed stronger activity in the left posterior inferior frontal gyrus as well as in the left cerebellum during the observation of self-related stimuli compared to those related to an unknown speaker. In addition, the speaker by noise interaction revealed stronger activity in the ventral superior parietal lobules and the dorsal extrastriate cortices during the observation of other-related compared to self-related stimuli presented without noise, while the opposite pattern of activity was observed for noisy stimuli. Finally, the speaker by modality interaction showed stronger activity for self-related compared to other-related stimuli during A in the right auditory cortex, as well as stronger activity for other-related compared to self-related stimuli during V, AV and AVi in the left posterior temporal sulcus.
Conclusions: Listening to and/or viewing oneself talk was found to activate to a greater extent the left posterior inferior frontal gyrus and the cerebellum, two regions thought to be responsible for predicting the sensory outcomes of action generation and for constraining perceptual recognition. In addition, activity in associative auditory and visual brain areas was also modulated by speaker identity depending on the modality of presentation and the acoustic noise. Altogether, these results suggest that self-awareness during speech perception is partly driven by afferent and efferent signals in sensory-motor areas. (A short sketch of the factorial design and the self > other contrast follows this entry.)
- typdoc
- Poster communications
- Accès au texte intégral et bibtex
-
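The factorial design described above (4 modalities x 2 speakers x 2 noise levels = 16 regressors of interest) lends itself to a compact sketch of how conditions and a main-effect contrast can be laid out. The snippet below is purely illustrative of the design logic, with hypothetical condition labels; it is not the authors' analysis pipeline and omits baselines, convolution and estimation.

```python
# Illustrative layout (not the authors' pipeline) of the 4 x 2 x 2 factorial design and
# a contrast vector for the main effect of speaker (self > other), averaged over the
# modality and noise factors. Condition labels are hypothetical.
import itertools
import numpy as np

modalities = ["A", "V", "AV", "AVi"]
speakers = ["self", "other"]
noise = ["clear", "noisy"]

conditions = [f"{m}_{sp}_{nz}"
              for m, sp, nz in itertools.product(modalities, speakers, noise)]
assert len(conditions) == 16  # 16 regressors of interest

# +1 for every "self" regressor, -1 for every "other" regressor, scaled so that the
# weights on each side sum to one.
contrast = np.array([+1.0 if "_self_" in c else -1.0 for c in conditions])
contrast /= (len(conditions) / 2)

print(conditions[:4])
print("self > other contrast:", contrast)
```

The resulting vector weights every self regressor at +1/8 and every other regressor at -1/8, i.e. the main effect of speaker averaged over modality and noise, analogous to the self > other effect reported above.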
2014
Conference papers
- titre
- To integrate the unknown: touching your lips, hearing your tongue, seeing my voice
- auteur
- Avril Treille, Coriandre Emmanuel Vilain, Jean-Luc Schwartz, Marc Sato
- article
- ICAC 2014 - 5th International Conference on Auditory Cortex, Sep 2014, Magdeburg, Germany
- resume
- Seeing the articulatory gestures of the speaker significantly enhances auditory speech perception. A key issue is whether cross-modal speech interactions only depend on the well-known auditory and visual modalities or, rather, might also be triggered by other sensory sources that are less common in speech communication. The present electro-encephalographic (EEG) and functional magnetic resonance imaging (fMRI) studies aimed at investigating cross-modal interactions between auditory, haptic, visuo-facial and visuo-lingual speech signals during the perception of others’ and our own productions. In a first EEG study (n=16), auditory evoked potentials were compared during auditory, audio-visual and audio-haptic speech perception through natural dyadic interactions between a listener and a speaker. Shortened latencies and reduced amplitudes of early auditory evoked potentials were observed during both audio-visual and audio-haptic speech perception compared to auditory speech perception, providing evidence for early integrative mechanisms between auditory, visual and haptic information. In a second fMRI study (n=12), the neural substrates of cross-modal binding during auditory, visual and audio-visual speech perception in relation to either facial or tongue movements of a speaker (recorded by a camera and an ultrasound system, respectively) were determined. In line with a sensorimotor nature of speech perception, common overlapping activity was observed for both facial and tongue-related speech stimuli in the posterior part of the superior temporal gyrus/sulcus as well as in the premotor cortex and in the inferior frontal gyrus. In a third EEG study (n=17), auditory evoked potentials were compared during the perception of auditory, visual and audio-visual stimuli related to our own speech gestures or those of a stranger. Apart from a reduced amplitude of early auditory evoked potentials during audio-visual compared to auditory and visual speech perception, a self-advantage was also observed, with shortened latencies of early auditory evoked potentials for self-related speech stimuli. Altogether, our results provide evidence for bimodal interactions between auditory, haptic, visuo-facial and visuo-lingual speech signals. They further emphasize the multimodal nature of speech perception and demonstrate that multisensory speech perception is partly driven by sensory predictability and by the listener’s knowledge of speech production.
- typdoc
- Conference papers
- Accès au texte intégral et bibtex
-