Nogueira Leitão Lima Grilo, Ana Margarida

Search Results

  • Sibilant consonants classification with deep neural networks
    Publication . Anjos, Ivo; Marques, Nuno; Grilo, Ana Margarida; Guimarães, Isabel; Magalhães, João; Cavaco, Sofia
    Many children suffering from speech sound disorders cannot pronounce the sibilant consonants correctly. We have developed a serious game that is controlled by the children's voices in real time and that allows children to practice the European Portuguese sibilant consonants. For this, the game uses a sibilant consonant classifier. Since the game does not require any type of adult supervision, children can practice the production of these sounds more often, which may lead to faster improvements of their speech. Recently, the use of deep neural networks has given considerable improvements in classification for a variety of use cases, from image classification to speech and language processing. Here we propose to use deep convolutional neural networks to classify sibilant phonemes of European Portuguese in our serious game for speech and language therapy. We compared the performance of several different artificial neural networks that used Mel frequency cepstral coefficients or log Mel filterbanks. Our best deep learning model achieves classification scores of 95.48% using a 2D convolutional model with log Mel filterbanks as input features.
  • Detection of voicing and place of articulation of fricatives with deep learning in a virtual speech and language therapy tutor
    Publication . Anjos, Ivo; Eskenazi, Maxine; Marques, Nuno; Grilo, Ana Margarida; Guimarães, Isabel; Magalhães, João; Cavaco, Sofia
    Children with fricative distortion errors have to learn how to correctly use the vocal folds, and which place of articulation to use in order to correctly produce the different fricatives. Here we propose a virtual tutor for fricative distortion correction. This is a virtual tutor for speech and language therapy that helps children understand their fricative production errors and how to correctly use their speech organs. The virtual tutor uses log Mel filter banks and deep learning techniques with spectral-temporal convolutions of the data to classify the fricatives in children’s speech by place of articulation and voicing. It achieves an accuracy of 90.40% for place of articulation and 90.93% for voicing with children’s speech. Furthermore, this paper discusses a multidimensional advanced data analysis of the first layer convolutional kernel filters that validates the usefulness of performing the convolution on the log Mel filter bank.
  • 3D facial video retrieval and management for decision support in speech and language therapy
    Publication . Carrapiço, Ricardo; Guimarães, Isabel; Grilo, Ana Margarida; Cavaco, Sofia; Magalhães, João
    3D video is introducing great changes in many health-related areas. The realism of such information provides health professionals with strong evidence analysis tools to facilitate clinical decision processes. Speech and language therapy aims to help subjects in correcting several disorders. The assessment of the patient by the speech and language therapist (SLT) requires several visual and audio analysis procedures that can interfere with the patient's production of speech. In this context, the main contribution of this paper is a 3D video system to improve health information management processes in speech and language therapy. The 3D video retrieval and management system supports multimodal health records and provides the SLTs with tools to support their work in many ways: (i) it allows SLTs to easily maintain a database of patients' orofacial and speech exercises; (ii) it supports three-dimensional orofacial measurement and analysis in a non-intrusive way; and (iii) it lets SLTs search patient speech exercises by similar facial characteristics, using facial image analysis techniques. The second contribution is a dataset with 3D videos of patients performing orofacial speech exercises. The whole system was evaluated successfully in a user study involving 22 SLTs. The user study illustrated the importance of retrieval by similar orofacial speech exercise.