Title: The BioVisualSpeech European Portuguese sibilants corpus
Authors: Grilo, Ana Margarida; Guimarães, Isabel; Ascensão, Mariana; Abad, Alberto; Anjos, Ivo; Magalhães, João; Cavaco, Sofia
Accessioned: 2022-04-05
Available: 2022-04-05
Issued: 2020-03-04
URI: http://hdl.handle.net/10400.26/40074
DOI: https://doi.org/10.1007/978-3-030-41505-1_3
Type: conference object
Language: English
Keywords: Sibilants; European Portuguese corpus; Speech sound disorders

Abstract. The development of reliable speech therapy computer tools that automatically classify speech productions depends on the quality of the speech data set used to train the classification algorithms. The data set should represent the target population in terms of age, gender and native language, but it should also reflect other relevant properties of the population that will use the tool. Thus, in addition to samples of correct speech productions, it should include samples from people with speech disorders. The annotation of the data should also indicate whether each phoneme is pronounced correctly or incorrectly. Here, we present a corpus of European Portuguese children's speech that we are using to develop speech classifiers for speech therapy tools for Portuguese children. The corpus includes data from children with speech disorders, and its labelling includes information about speech production errors. This corpus, which contains data from 356 children aged 5 to 9, focuses on the European Portuguese sibilant consonants and can be used to train speech recognition models for tools that assist in the detection and therapy of sigmatism.