Name: | Description: | Size: | Format: | |
---|---|---|---|---|
1.33 MB | Adobe PDF |
Advisor(s)
Abstract(s)
Abstract. The development of reliable speech therapy computer tools that automatically classify speech productions depends on the quality of the speech data set used to train the classi cation algorithms. The
data set should characterize the population in terms of age, gender and native language, but it should also have other important properties that characterize the population that is going to use the tool. Thus, apart
from including samples from correct speech productions, it should also have samples from people with speech disorders. Also, the annotation of the data should include information on whether the phonemes are
correctly or wrongly pronounced. Here, we present a corpus of European Portuguese children's speech data that we are using in the development of speech classi ers for speech therapy tools for Portuguese children. The
corpus includes data from children with speech disorders and in which the labelling includes information about the speech production errors. This corpus, which has data from 356 children from 5 to 9 years of age,
focuses on the European Portuguese sibilant consonants and can be used to train speech recognition models for tools to assist the detection and therapy of sigmatism.
Description
Keywords
Sibilants European Portuguese corpus Speech sound disorders
Citation
Publisher
Springer International Publishing