Search Results

Now showing 1 - 9 of 9
  • A Comparison Study of Deep Learning Methodologies for Music Emotion Recognition
    Publication . Louro, Pedro; Redinho, Hugo; Malheiro, Ricardo; Paiva, Rui Pedro; Panda, Renato
    Classical machine learning techniques have dominated Music Emotion Recognition (MER). However, improvements have slowed due to the complex and time-consuming task of handcrafting new emotionally relevant audio features. Deep learning methods have recently gained popularity in the field because of their ability to automatically learn relevant features from spectral representations of songs, eliminating that necessity. Nonetheless, there are limitations, such as the need for large amounts of quality labeled data, a common problem in MER research. To understand the effectiveness of these techniques, a comparison study using various classical machine learning and deep learning methods was conducted. The results showed that an ensemble of a Dense Neural Network and a Convolutional Neural Network achieved a state-of-the-art 80.20% F1 score, an improvement of around 5% over the best baseline results. This suggests that future research should take advantage of both paradigms, that is, combine handcrafted features with feature learning (a minimal sketch of such an ensemble appears after this listing).
  • Improving Deep Learning Methodologies for Music Emotion Recognition
    Publication . Louro, Pedro Lima; Redinho, Hugo; Malheiro, Ricardo; Paiva, Rui Pedro; Panda, Renato
    Music Emotion Recognition (MER) has traditionally relied on classical machine learning techniques. Progress on these techniques has plateaued due to the demanding process of crafting new, emotionally relevant audio features. Recently, deep learning (DL) methods have surged in popularity within MER due to their ability to automatically learn features from the input data. Nonetheless, these methods need large, high-quality labeled datasets, a well-known hurdle in MER studies. We present a comparative study of various classical and DL techniques carried out to evaluate these approaches. Unless stated otherwise, the presented methodologies were developed by our team. It was found that a combination of Dense Neural Networks (DNN) and Convolutional Neural Networks (CNN) achieved an 80.20% F1-score, marking an improvement of approximately 5% over the best previous results. This indicates that future research should blend manual feature engineering with automated feature learning to enhance results.
  • "Back in my day...": A Preliminary Study on the Differences in Generational Groups Perception of Musically-evoked Emotion
    Publication . Louro, Pedro; Panda, Renato
    The increasingly globalized world we live in today and the wide availability of music at our fingertips have led to more diverse musical tastes within younger generations than in older ones. However, these disparities, and the extent to which they affect listeners' preferences and perception of music, are still not well understood. Focusing on the latter, this study explores the differences in the emotional perception of music between the Millennial and Gen Z generations. Interviews were conducted with six participants, equally distributed between the two generations, recording their listening experience and perceived emotion for two previously compiled sets of songs, each representing one group. The analysis of these interviews revealed significant differences between generations and possible contributing factors. Findings point to differences in the perceived energy of songs with specific messages of suffering for love, as well as a tendency of the younger group to perceive a well-defined emotion in songs representing their generation, in contrast to neutral responses from the older group. These findings are preliminary, and further studies are needed to understand their extent. Nevertheless, valuable insights can be extracted to improve music recommendation systems.
  • MERGE App: A Prototype Software for Multi-User Emotion-Aware Music Management
    Publication . Louro, Pedro; Branco, Guilherme; Redinho, Hugo; Santos, Ricardo Correia Nascimento Dos; Malheiro, Ricardo; Panda, Renato; Paiva, Rui Pedro
    We present a prototype software for multi-user music library management using the perceived emotional content of songs. The tool offers music playback features, song filtering by metadata, and automatic emotion prediction based on arousal and valence, with the possibility of personalizing the predictions by allowing each user to edit these values based on their own emotion assessment. This is an important feature for handling both classification errors and subjectivity issues, which are inherent aspects of emotion perception. A path-based playlist generation function is also implemented (see the sketch after this listing). A multi-modal audio-lyrics regression methodology is proposed for emotion prediction, with accompanying validation experiments on the MERGE dataset. The results obtained are promising, showing higher overall performance on train-validate-test splits (73.20% F1-score with the best dataset/split combination).
  • Exploring Song Segmentation for Music Emotion Variation Detection
    Publication . Ferreira, Tomas; Redinho, Hugo; Louro, Pedro L.; Malheiro, Ricardo; Paiva, Rui Pedro; Panda, Renato
    This paper evaluates the impact of song segmentation on Music Emotion Variation Detection (MEVD). In particular, the All-In-One song-structure segmentation system was employed to this end and compared to a fixed 1.5-second window approach (sketched after this listing). Acoustic features were extracted for each obtained segment/window and classified with SVMs. The attained results (best F1-score of 55.9%) suggest that, despite its promise, the potential of this song segmentation approach was not fully exploited, possibly due to the small dataset employed. Nevertheless, the preliminary results are encouraging.
  • Exploring Deep Learning Methodologies for Music Emotion Recognition
    Publication . Louro, Pedro; Redinho, Hugo; Malheiro, Ricardo; Paiva, Rui Pedro; Panda, Renato
    Classical machine learning techniques have dominated Music Emotion Recognition (MER). However, improvements have slowed due to the complex and time-consuming task of handcrafting new emotionally relevant audio features. Deep learning methods have recently gained popularity in the field because of their ability to automatically learn relevant features from spectral representations of songs, eliminating that necessity. Nonetheless, there are limitations, such as the need for large amounts of quality labeled data, a common problem in MER research. To understand the effectiveness of these techniques, a comparison study using various classical machine learning and deep learning methods was conducted. The results showed that an ensemble of a Dense Neural Network and a Convolutional Neural Network achieved a state-of-the-art 80.20% F1-score, an improvement of around 5% over the best baseline results. This suggests that future research should take advantage of both paradigms, that is, combine handcrafted features with feature learning.
  • BEE-MER: Bimodal Embeddings Ensemble for Music Emotion Recognition
    Publication . Lima Louro, Pedro Miguel; Ribeiro, Tiago F. R.; Malheiro, Ricardo; Panda, Renato; Pinto de Carvalho e Paiva, Rui Pedro
    Static music emotion recognition systems typically focus on audio for classification, although some research has explored the potential of analyzing lyrics as well. Both approaches face challenges in accurately discerning emotions that have similar energy but differing valence, and vice versa, depending on the modality used. Previous studies have introduced bimodal audio-lyrics systems that outperform single-modality solutions by combining information from standalone systems and conducting joint classification. In this study, we propose and compare two bimodal approaches: one strictly based on embedding models (audio and word embeddings) and another following a standard spectrogram-based deep learning method for the audio part. Additionally, we explore various information fusion strategies to leverage both modalities effectively (a simple fusion sketch appears after this listing). The main conclusions of this work are the following: i) the two approaches show comparable overall classification performance; ii) the embedding-only approach leads to higher confusion between quadrants 3 and 4 of Russell's circumplex model; and iii) this approach requires significantly less computational cost for training. We discuss the insights gained from the approaches we experimented with and highlight promising avenues for future research.
  • Improving Music Emotion Recognition by Leveraging Handcrafted and Learned Features
    Publication . Lima Louro, Pedro Miguel; Redinho, Hugo; Malheiro, Ricardo; Panda, Renato; Pinto de Carvalho e Paiva, Rui Pedro
    Music Emotion Recognition was dominated by classical machine learning, which relies on traditional classifiers and feature engineering (FE). Recently, deep learning approaches have been explored, aiming to remove the need for handcrafted features by automatic feature learning (FL), albeit at the expense of requiring large volumes of data to fully exploit their capabilities. A hybrid approach fusing information from handcrafted and learned features was previously proposed, outperforming separate FE and FL approaches on the 4QAED dataset (900 audio clips). The results suggested that, on smaller datasets, FE and FL can complement each other rather than act as competitors. In the present study, these experiments are extended to the larger MERGE dataset (3554 audio clips) to analyze the impact of the significant increase in data. The best result obtained, a 77.62% F1-score, continues to surpass the standalone FE and FL paradigms, reinforcing the potential of hybrid approaches.
  • Percussion and Instrumentation in Music Emotion Recognition: a Feature Engineering Approach
    Publication . Redinho, Hugo; Lima Louro, Pedro Miguel; Santos, André C.; Malheiro, Ricardo; Pinto de Carvalho e Paiva, Rui Pedro; Panda, Renato
    We propose a new set of features for audio-based Music Emotion Recognition (MER) that are related to percussion and individual instrument information. One limitation of current feature engineering approaches in MER is that they primarily focus on melodic elements. However, the percussive elements and instrumentation are also essential for conveying and recognizing emotions in music. Our approach leverages the Demucs framework for music source separation (which enables drum channel separation) and the MT3 framework for automatic music transcription and instrument recognition. Building on the results of these frameworks, we created a new set of features that primarily capture information about musical texture, rhythm, dynamics, expressivity, tone color, and musical form. To validate our work, we utilized the MERGE dataset, which comprises over 3000 30-second audio clips annotated with Russell's emotion quadrants. To evaluate the impact of the new features, we compared classification results with those obtained using current state-of-the-art features, demonstrating statistically significant improvements in F1 score (from 71.1% to 74.2%). Moreover, the novel features helped to reduce the confusion between quadrants 3 and 4 (a common difficulty in MER models). The most significant finding of the present study is the impact of separately analyzing the drum channel, whose features proved particularly relevant.
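
The ensemble referenced in "A Comparison Study of Deep Learning Methodologies for Music Emotion Recognition" (and revisited in the hybrid-feature study above) pairs a dense network over handcrafted features with a CNN over spectral representations. The following is only a minimal PyTorch sketch of that idea: the layer sizes, the Mel-spectrogram input shape, and the probability-averaging fusion rule are illustrative assumptions, not the published architecture.

import torch
import torch.nn as nn

class HandcraftedDNN(nn.Module):
    """Dense branch over a vector of handcrafted audio features."""
    def __init__(self, n_features, n_classes=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x):  # x: (batch, n_features)
        return self.net(x)

class SpectrogramCNN(nn.Module):
    """Convolutional branch over a Mel-spectrogram treated as a 1-channel image."""
    def __init__(self, n_classes=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, x):  # x: (batch, 1, n_mels, n_frames)
        return self.head(self.conv(x).flatten(1))

def ensemble_predict(dnn, cnn, features, spectrogram):
    """Average the branches' class probabilities (one possible fusion rule)."""
    p_dnn = torch.softmax(dnn(features), dim=1)
    p_cnn = torch.softmax(cnn(spectrogram), dim=1)
    return (p_dnn + p_cnn) / 2  # (batch, n_classes), e.g. Russell's four quadrants

In this setup each branch is trained on its own input representation and combined only at prediction time, which is what allows handcrafted and learned features to complement rather than compete with each other.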
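
The MERGE App entry above mentions a path-based playlist generation function built on predicted arousal and valence. One plausible reading, sketched below, traces a straight line between two points in the arousal-valence plane and picks the nearest unused song at each waypoint; the song dictionary, the linear path, and the nearest-neighbour selection are assumptions for illustration, not the tool's actual algorithm.

import numpy as np

def path_playlist(songs, start_av, end_av, length=10):
    """songs: dict mapping title -> (arousal, valence), values assumed in [-1, 1].
    Assumes the library holds at least `length` songs."""
    titles = list(songs)
    points = np.array([songs[t] for t in titles], dtype=float)
    waypoints = np.linspace(start_av, end_av, num=length)  # straight line in AV space
    playlist, used = [], set()
    for w in waypoints:
        order = np.argsort(np.linalg.norm(points - w, axis=1))
        pick = next(i for i in order if titles[i] not in used)  # nearest unused song
        used.add(titles[pick])
        playlist.append(titles[pick])
    return playlist

For example, path_playlist(library, start_av=(0.8, 0.6), end_av=(-0.5, -0.4), length=8) would move a listener from a high-arousal, positive-valence region toward a calmer, negative-valence one.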
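
The fixed-window baseline from "Exploring Song Segmentation for Music Emotion Variation Detection" splits a song into 1.5-second windows, extracts acoustic features per window, and classifies each window with an SVM. A rough sketch under assumed choices (MFCC means as features, an RBF-kernel SVM) follows; the paper's exact feature set and classifier settings differ.

import numpy as np
import librosa
from sklearn.svm import SVC

def window_features(path, win_seconds=1.5, sr=22050, n_mfcc=20):
    """Split the audio into fixed-length windows and return one MFCC-mean vector per window."""
    y, sr = librosa.load(path, sr=sr)
    hop = int(win_seconds * sr)
    feats = []
    for start in range(0, len(y) - hop + 1, hop):
        mfcc = librosa.feature.mfcc(y=y[start:start + hop], sr=sr, n_mfcc=n_mfcc)
        feats.append(mfcc.mean(axis=1))  # one n_mfcc-dim vector per 1.5-s window
    return np.array(feats)

# Hypothetical usage: X_train / y_train hold per-window features and quadrant labels.
# clf = SVC(kernel="rbf").fit(X_train, y_train)
# per_window_emotions = clf.predict(window_features("song.mp3"))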
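
BEE-MER compares fusion strategies for combining audio and lyric embeddings. The simplest such strategy, early fusion by concatenation, can be sketched as below; the embedding sources, dimensions, and the logistic-regression classifier are placeholders rather than the study's actual pipeline.

import numpy as np
from sklearn.linear_model import LogisticRegression

def fuse_and_train(audio_emb, lyric_emb, labels):
    """audio_emb: (n_songs, d_audio); lyric_emb: (n_songs, d_lyrics); labels: one quadrant per song."""
    X = np.concatenate([audio_emb, lyric_emb], axis=1)  # early fusion by concatenation
    return LogisticRegression(max_iter=1000).fit(X, labels)

Late fusion, i.e. training one classifier per modality and merging their predicted probabilities as in the ensemble sketch further above, is the other common strategy the entry alludes to.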