5 results
Search Results
- The BioVisualSpeech corpus of words with sibilants for speech therapy games development. Publication. Cavaco, Sofia; Guimarães, Isabel; Ascensão, Mariana; Abad, Alberto; Anjos, Ivo; Oliveira, Francisco; Martins, Sofia; Marques, Nuno; Eskenazi, Maxine; Magalhães, João; Grilo, Ana Margarida. Abstract: Developing computer tools for speech therapy that reliably classify speech productions requires speech production corpora that characterize the target population in terms of age, gender, and native language. To characterize that population, the corpora should include not only correct speech productions but also samples from people with speech sound disorders. In addition, the annotation of the data should include information on the correctness of the speech productions. Following these criteria, we collected a corpus that can be used to develop computer tools for speech and language therapy of Portuguese children with sigmatism. The proposed corpus contains European Portuguese children’s productions of words that have sibilant consonants. The corpus has productions from 356 children from 5 to 9 years of age. Two characteristics of this corpus are especially relevant to speech and language therapy and computer science research: (1) the corpus includes data from children with speech sound disorders; and (2) the productions were annotated according to the criteria of speech and language pathologists, and include information about the speech production errors. These are relevant features for the development and assessment of speech processing tools for speech therapy of Portuguese children. In addition, as an illustration of how to use the corpus, we present three speech therapy games that use a convolutional neural network sibilant classifier trained with data from this corpus and a word recognition module trained on additional children’s data and calibrated and evaluated with the collected corpus.
- Detection of voicing and place of articulation of fricatives with deep learning in a virtual speech and language therapy tutor. Publication. Anjos, Ivo; Eskenazi, Maxine; Marques, Nuno; Grilo, Ana Margarida; Guimarães, Isabel; Magalhães, João; Cavaco, Sofia. Abstract: Children with fricative distortion errors have to learn how to correctly use the vocal folds, and which place of articulation to use, in order to correctly produce the different fricatives. Here we propose a virtual tutor for fricative distortion correction. This virtual tutor for speech and language therapy helps children understand their fricative production errors and how to correctly use their speech organs. The virtual tutor uses log Mel filter banks and deep learning techniques with spectral-temporal convolutions of the data to classify the fricatives in children’s speech by place of articulation and voicing. It achieves an accuracy of 90.40% for place of articulation and 90.93% for voicing with children’s speech. Furthermore, this paper discusses a multidimensional advanced data analysis of the first-layer convolutional kernel filters that validates the usefulness of performing the convolution on the log Mel filter bank.
- Sibilant consonants classification with deep neural networks. Publication. Anjos, Ivo; Marques, Nuno; Grilo, Ana Margarida; Guimarães, Isabel; Magalhães, João; Cavaco, Sofia. Abstract: Many children suffering from speech sound disorders cannot pronounce the sibilant consonants correctly. We have developed a serious game that is controlled by the children's voices in real time and that allows children to practice the European Portuguese sibilant consonants. For this, the game uses a sibilant consonant classifier. Since the game does not require any type of adult supervision, children can practice the production of these sounds more often, which may lead to faster improvements of their speech. Recently, the use of deep neural networks has given considerable improvements in classification for a variety of use cases, from image classification to speech and language processing. Here we propose to use deep convolutional neural networks to classify sibilant phonemes of European Portuguese in our serious game for speech and language therapy. We compared the performance of several different artificial neural networks that used Mel frequency cepstral coefficients or log Mel filterbanks. Our best deep learning model achieves classification scores of 95.48% using a 2D convolutional model with log Mel filterbanks as input features.
- The BioVisualSpeech European Portuguese sibilants corpus. Publication. Grilo, Ana Margarida; Guimarães, Isabel; Ascensão, Mariana; Abad, Alberto; Anjos, Ivo; Magalhães, João; Cavaco, Sofia. Abstract: The development of reliable speech therapy computer tools that automatically classify speech productions depends on the quality of the speech data set used to train the classification algorithms. The data set should characterize the population in terms of age, gender and native language, but it should also have other important properties that characterize the population that is going to use the tool. Thus, apart from including samples from correct speech productions, it should also have samples from people with speech disorders. Also, the annotation of the data should include information on whether the phonemes are correctly or wrongly pronounced. Here, we present a corpus of European Portuguese children's speech data that we are using in the development of speech classifiers for speech therapy tools for Portuguese children. The corpus includes data from children with speech disorders, and its labelling includes information about the speech production errors. This corpus, which has data from 356 children from 5 to 9 years of age, focuses on the European Portuguese sibilant consonants and can be used to train speech recognition models for tools to assist the detection and therapy of sigmatism.
- The child with chronic mouth breathing: nasal air emission, orofacial motor function, and impact on quality of life (original title: A criança com respiração oral crónica: emissão de ar nasal, motricidade orofacial e impacto na qualidade de vida). Publication. Bom, Rita; Nogueira Leitão Lima Grilo, Ana Margarida; Guimarães, Isabel. Introduction: Systematic obstruction of the upper airways is common in pediatric ages, has implications for nasal air emission, and presents with symptoms such as predominant mouth breathing and altered orofacial motor function, with an impact on the child's quality of life. Objectives: To measure nasal air emission in children with upper airway obstruction. Specifically, to examine the relationship between nasal air emission, age, sex, and orofacial motor function, and to determine the impact of nasal symptoms on quality of life. Material and Methods: Exploratory cross-sectional study. The study comprised a functional assessment of nasal air emission (using a metal plate), an oromotor assessment (using the Protocolo de Avaliação da motricidade OroFacial, version 2, PAOF-2), and an assessment of the perceived impact of nasal symptoms on quality of life (Portuguese version of the Nasal Obstruction Symptom Evaluation, NOSE). Results: 62 children aged between 4;00 and 9;11 years participated. The overall mean nasal air emission was 8.10 cm², with no significant differences by age, but it was significantly lower in males for the left nostril. A significant moderate correlation was found between nasal air emission and orofacial motor function at 4 years of age and in males. Children with more symptoms showed a significantly greater negative impact on quality of life than those with fewer symptoms. Conclusion: Upper airway obstruction in children was related to orofacial motor function, with an effect of age (4 years) and of male sex. The negative impact on quality of life was related to a greater number of symptoms.