Nogueira Leitão Lima Grilo, Ana Margarida

  • The BioVisualSpeech corpus of words with sibilants for speech therapy games development
    Publication . Cavaco, Sofia; Guimarães, Isabel; Ascensão, Mariana; Abad, Alberto; Anjos, Ivo; Oliveira, Francisco; Martins, Sofia; Marques, Nuno; Eskenazi, Maxine; Magalhães, João; Grilo, Ana Margarida
    Abstract: In order to develop computer tools for speech therapy that reliably classify speech productions, there is a need for speech production corpora that characterize the target population in terms of age, gender, and native language. Apart from including correct speech productions, in order to characterize the target population, the corpora should also include samples from people with speech sound disorders. In addition, the annotation of the data should include information on the correctness of the speech productions. Following these criteria, we collected a corpus that can be used to develop computer tools for speech and language therapy of Portuguese children with sigmatism. The proposed corpus contains European Portuguese children’s word productions in which the words have sibilant consonants. The corpus has productions from 356 children from 5 to 9 years of age. Some important characteristics of this corpus that are relevant to speech and language therapy and computer science research are that (1) the corpus includes data from children with speech sound disorders; and (2) the productions were annotated according to the criteria of speech and language pathologists, and have information about the speech production errors. These are relevant features for the development and assessment of speech processing tools for speech therapy of Portuguese children. In addition, as an illustration of how to use the corpus, we present three speech therapy games that use a convolutional neural network sibilant classifier trained with data from this corpus and a word recognition module trained on additional children’s data and calibrated and evaluated with the collected corpus.
  • Sibilant consonants classification with deep neural networks
    Publication . Anjos, Ivo; Marques, Nuno; Grilo, Ana Margarida; Guimarães, Isabel; Magalhães, João; Cavaco, Sofia
    Abstract. Many children suffering from speech sound disorders cannot pronounce the sibilant consonants correctly. We have developed a serious game that is controlled by the children's voices in real time and that allows children to practice the European Portuguese sibilant consonants. For this, the game uses a sibilant consonant classifier. Since the game does not require any type of adult supervision, children can practice the production of these sounds more often, which may lead to faster improvements of their speech. Recently, the use of deep neural networks has given considerable improvements in classification for a variety of use cases, from image classification to speech and language processing. Here we propose to use deep convolutional neural networks to classify sibilant phonemes of European Portuguese in our serious game for speech and language therapy. We compared the performance of several different artificial neural networks that used Mel frequency cepstral coefficients or log Mel filterbanks. Our best deep learning model achieves classification scores of 95.48% using a 2D convolutional model with log Mel filterbanks as input features.
  • Detection of voicing and place of articulation of fricatives with deep learning in a virtual speech and language therapy tutor
    Publication . Anjos, Ivo; Eskenazi, Maxine; Marques, Nuno; Grilo, Ana Margarida; Guimarães, Isabel; Magalhães, João; Cavaco, Sofia
    Children with fricative distortion errors have to learn how to correctly use the vocal folds, and which place of articulation to use in order to correctly produce the different fricatives. Here we propose a virtual tutor for fricative distortion correction. This is a virtual tutor for speech and language therapy that helps children understand their fricative production errors and how to correctly use their speech organs. The virtual tutor uses log Mel filter banks and deep learning techniques with spectral-temporal convolutions of the data to classify the fricatives in children’s speech by place of articulation and voicing. It achieves an accuracy of 90.40% for place of articulation and 90.93% for voicing with children’s speech. Furthermore, this paper discusses a multidimensional advanced data analysis of the first layer convolutional kernel filters that validates the usefulness of performing the convolution on the log Mel filter bank.
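  • A criança com respiração oral crónica: emissão de ar nasal, motricidade orofacial e impacto na qualidade de vida [The child with chronic mouth breathing: nasal air emission, orofacial motor function, and impact on quality of life]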
    Publication . Bom, Rita; Nogueira Leitão Lima Grilo, Ana Margarida; Guimarães, Isabel
    Introduction: Systematic obstruction of the upper airways (UA) is common in pediatric ages, has implications for nasal air emission, and presents symptoms such as a predominance of mouth breathing and altered orofacial motor function, with an impact on the child's quality of life. Objectives: To measure nasal air emission in children with UA obstruction. Specifically, to examine the relationship between nasal air emission, age, sex, and orofacial motor function, and to determine the impact of nasal symptoms on quality of life. Material and Methods: Exploratory cross-sectional study. The study comprised a functional assessment of nasal air emission (using a metal plate), an oromotor assessment (using the Protocolo de Avaliação da motricidade OroFacial, version 2, PAOF-2), and an assessment of the perceived impact of nasal symptoms on quality of life (Portuguese version of the Nasal Obstruction Symptom Evaluation, NOSE). Results: 62 children between 4;00 and 9;11 years of age participated. The total mean value of nasal air emission was 8.10 cm², with no significant differences by age, but significantly lower in males for the left nostril. A significant moderate correlation was found between nasal air emission and orofacial motor function at age 4 and in males. Children with more symptoms showed a significantly greater negative impact on quality of life than those with fewer symptoms. Conclusion: UA obstruction in children was related to orofacial motor function, with an effect at age 4 and in males. The negative impact on quality of life was related to a greater number of symptoms.
  • 3D facial video retrieval and management for decision support in speech and language therapy
    Publication . Carrapiço, Ricardo; Guimarães, Isabel; Grilo, Ana Margarida; Cavaco, Sofia; Magalhães, João
    3D video is introducing great changes in many health related areas. The realism of such information provides health professionals with strong evidence analysis tools to facilitate clinical decision processes. Speech and language therapy aims to help subjects in correcting several disorders. The assessment of the patient by the speech and language therapist (SLT) requires several visual and audio analysis procedures that can interfere with the patient's production of speech. In this context, the main contribution of this paper is a 3D video system to improve health information management processes in speech and language therapy. The 3D video retrieval and management system supports multimodal health records and provides the SLTs with tools to support their work in many ways: (i) it allows SLTs to easily maintain a database of patients' orofacial and speech exercises; (ii) it supports three-dimensional orofacial measurement and analysis in a non-intrusive way; and (iii) it supports searching patient speech exercises by similar facial characteristics, using facial image analysis techniques. The second contribution is a dataset with 3D videos of patients performing orofacial speech exercises. The whole system was evaluated successfully in a user study involving 22 SLTs. The user study illustrated the importance of the retrieval by similar orofacial speech exercise.
  • A serious mobile game with visual feedback for training sibilant consonants
    Publication . Anjos, Ivo; Grilo, Ana Margarida; Ascensão, Mariana; Guimarães, Isabel; Magalhães, João; Cavaco, Sofia
    Abstract. The distortion of sibilant sounds is a common type of speech sound disorder (SSD) in Portuguese speaking children. Speech and language pathologists (SLP) frequently use the isolated sibilants exercise to assess and treat this type of speech error. While technological solutions like serious games can help SLPs to motivate the children to do the exercises repeatedly, there is a lack of such games for this specific exercise. Another important aspect is that given the usual small number of therapy sessions per week, children are not improving at their maximum rate, which is only achieved by more intensive therapy. We propose a serious game for mobile platforms that allows children to practice their isolated sibilants exercises at home to correct sibilant distortions. This will allow children to practice their exercises more frequently, which can lead to faster improvements. The game, which uses an automatic speech recognition (ASR) system to classify the child's sibilant productions, is controlled by the child's voice in real time and gives immediate visual feedback to the child about her sibilant productions. In order to keep the computation on the mobile platform as simple as possible, the game has a client-server architecture, in which the external server runs the ASR system. We trained it using raw Mel frequency cepstral coefficients, and we achieved very good results with an accuracy test score of above 91% using support vector machines.
  • Fidedignidade inter e intra-juízes na medição da taxa diadococinética oral em crianças [Inter- and intra-rater reliability in the measurement of oral diadochokinetic rate in children]
    Publication . Macedo, Filipa; Grilo, Ana Margarida
    Objective: The aim of this study is to examine intra- and inter-rater reliability in the assessment of oral diadochokinetic rate at two assessment times (two weeks apart). Methods: Five speech and language therapists assessed audio recordings (using the Audacity™ software and SENNHEISER HD201 headphones) of five diadochokinetic tasks (three monosyllabic cycles, one disyllabic cycle, and one trisyllabic cycle) from thirty-two children, at a first and a second time point. Inter-rater reliability results were obtained using Cronbach's alpha, and the intraclass correlation coefficient was used for intra-rater reliability. Results: Although the results for the variable “duration” were not all optimal (α between 0.54 and 0.98), the variables “number of syllables” (α between 0.96 and 1) and “diadochokinetic rate” (α between 0.94 and 0.99) showed inter- and intra-rater agreement of excellent quality. The intraclass correlation coefficient yielded mostly excellent reliability results for all variables, along with some results of satisfactory reliability. Conclusion: The results are discrepant for inter-rater assessment of the variable “duration”, but the “diadochokinetic rate” and the “number of cycles”, following the rules standardized in the present study, showed excellent reliability.
  • A model for sibilant distortion detection in children
    Publication . Anjos, Ivo; Grilo, Ana Margarida; Ascensão, Mariana; Guimarães, Isabel; Magalhães, João; Cavaco, Sofia
    The distortion of sibilant sounds is a common type of speech sound disorder in European Portuguese speaking children. Speech and language pathologists (SLP) use different types of speech production tasks to assess these distortions. One of these tasks consists of the sustained production of isolated sibilants. Using these sound productions, SLPs usually rely on auditory perceptual evaluation to assess the sibilant distortions. Here we propose to use an isolated sibilant machine learning model to help SLPs assess these distortions. Our model uses Mel frequency cepstral coefficients of the isolated sibilant phones and it was trained with data from 145 children. The analysis of the false negatives detected by the model can give insight into whether the child has a sibilant production distortion. We were able to confirm that there exists some relation between the model classification results and the distortion assessment of professional SLPs. Approximately 66% of the distortion cases identified by the model are confirmed by an SLP as having some sort of distortion or are perceived as being the production of a different sound.
  • The BioVisualSpeech European Portuguese sibilants corpus
    Publication . Grilo, Ana Margarida; Guimarães, Isabel; Ascensão, Mariana; Abad, Alberto; Anjos, Ivo; Magalhães, João; Cavaco, Sofia
    Abstract. The development of reliable speech therapy computer tools that automatically classify speech productions depends on the quality of the speech data set used to train the classification algorithms. The data set should characterize the population in terms of age, gender and native language, but it should also have other important properties that characterize the population that is going to use the tool. Thus, apart from including samples from correct speech productions, it should also have samples from people with speech disorders. Also, the annotation of the data should include information on whether the phonemes are correctly or wrongly pronounced. Here, we present a corpus of European Portuguese children's speech data that we are using in the development of speech classifiers for speech therapy tools for Portuguese children. The corpus includes data from children with speech disorders, and its labelling includes information about the speech production errors. This corpus, which has data from 356 children from 5 to 9 years of age, focuses on the European Portuguese sibilant consonants and can be used to train speech recognition models for tools to assist the detection and therapy of sigmatism.