Supervisions

2017

Organizing acoustic scene excerpts into a 2D map with t-SNE

Abstract

The aim of this project was to develop a Python program that uses t-Distributed Stochastic Neighbor Embedding (t-SNE) to visualize high-dimensional audio scene feature vector data as a 2D map. Such a map can be used to inspect how well the data is separable using the gathered features and the t-SNE method.
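
As an illustration of this kind of pipeline, here is a minimal sketch using scikit-learn and matplotlib; the feature and label files are hypothetical stand-ins for the gathered audio scene features:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    # Hypothetical input: one row per audio scene excerpt,
    # columns are the gathered acoustic features.
    X = np.load("features.npy")       # shape: (n_excerpts, n_features)
    labels = np.load("labels.npy")    # integer scene class per excerpt

    # Project the high-dimensional feature vectors onto a 2D map.
    embedding = TSNE(n_components=2, perplexity=30,
                     random_state=0).fit_transform(X)

    # Color points by scene class to inspect how well the classes separate.
    plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, s=10, cmap="tab10")
    plt.title("Acoustic scene excerpts on a 2D t-SNE map")
    plt.show()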

Clients

Tuomas Virtanen, Toni Heittola

2014

Real-Time Audio Analysis

Abstract

The application areas of audio analysis have gained popularity over the last decades because audio analysis supports numerous industrial products. Conventional audio analysis algorithms, based on the pattern recognition approach, often operate in non-real-time settings. Most non-real-time audio analysis systems are designed for rapid development, readability, and maintainability of code; a real-time system must additionally provide cross-platform functionality, efficient audio data analysis, and low latency. Meeting these requirements keeps the overall development cost at the same level while notably improving the performance of the system. Typically, however, such programs are written in poor programming styles or in programming languages not suited for real-time applications, such as Java. This makes modifying the existing source code to meet real-time requirements hard and tedious work, and forces researchers to deal with programming problems instead of speech and audio analysis innovations. A real-time audio analysis system also provides a platform for testing and researching audio analysis algorithms. The purpose of this study is to survey APIs that offer a low-latency, high-efficiency option for developing a real-time audio analysis system. The basic components of the pattern recognition front-end are frame blocking, windowing, and mel-frequency cepstral coefficients (MFCC). The presented program is implemented in real time using efficient APIs, namely PortAudio and LibXtract. The program computes MFCCs over small frames of the audio signal without loss of signal power, improving the performance of the audio analysis system. Such a system can be used in numerous products: it is useful not only for audio content analysis, audio classification, pattern recognition, and music information retrieval, but also, from a practical engineering viewpoint, for real-time input applications such as automatic sound event detection. The results also indicate that the system is portable across Linux (e.g. Ubuntu) and other major platforms for real-time audio input, which is often restricted in audio analysis systems based on conventional approaches.
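
The thesis implements this front-end in C on top of PortAudio and LibXtract; as a rough Python analogue of the same frame blocking, windowing, and MFCC steps (assuming the sounddevice and librosa packages), a minimal sketch:

    import numpy as np
    import sounddevice as sd
    import librosa

    SR = 16000       # sample rate
    FRAME = 1024     # block (frame) size in samples

    def callback(indata, frames, time, status):
        """Called by the audio API for each captured block."""
        if status:
            print(status)
        frame = indata[:, 0]                        # mono signal
        windowed = frame * np.hanning(len(frame))   # windowing
        # 13 MFCCs computed from the single windowed frame
        mfcc = librosa.feature.mfcc(y=windowed, sr=SR, n_mfcc=13,
                                    n_fft=FRAME, hop_length=FRAME)
        print(mfcc[:, 0])   # illustrative; a real system would queue these

    # Low-latency capture loop: block framing is handled by the stream.
    with sd.InputStream(channels=1, samplerate=SR, blocksize=FRAME,
                        callback=callback):
        sd.sleep(5000)   # analyze the live input for five seconds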

Music Video Analysis Using Signal Processing Tools

Abstract

Visual cut points in music videos are often aligned with the musical beat and, at a higher level, with musical structural change points (e.g. chorus-verse boundaries). The idea of this study is to investigate this relation more closely using automatic video cut point detection and automatic musical structure analysis.
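
A minimal sketch of one way to quantify such alignment, assuming librosa for beat tracking; the file name and cut timestamps are hypothetical placeholders for the output of a video cut detector:

    import numpy as np
    import librosa

    # Beat tracking on the music track; cut detection itself would come
    # from a video analysis tool, so made-up timestamps are used here.
    y, sr = librosa.load("music_video_audio.wav")
    _, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)

    cut_times = np.array([4.1, 8.0, 12.2, 16.1])  # hypothetical video cuts

    # For each cut, distance to the nearest musical beat: small distances
    # would support the hypothesis that cuts are aligned with the beat.
    deviation = np.min(np.abs(cut_times[:, None] - beat_times[None, :]), axis=1)
    print("mean cut-to-beat deviation: %.3f s" % deviation.mean())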

Clients

Tuomas Virtanen, Toni Heittola, Joni Kämäräinen, and Katariina Mahkonen

Real-time sound classification system using Python

Abstract

Python has gained wide popularity in the research community in recent years, and a wide range of pattern recognition toolboxes is already available for it. The aim of this project was to investigate the possibilities of using Python for acoustic pattern recognition and to develop a system capable of real-time sound classification.
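
A minimal sketch of such a pipeline, assuming librosa and scikit-learn; the file names and the two-class setup are hypothetical, and a real-time system would take its frames from an audio callback rather than a file:

    import numpy as np
    import librosa
    from sklearn.svm import SVC

    def frame_features(path):
        """MFCC feature vector for every short frame of a recording."""
        y, sr = librosa.load(path, sr=16000)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

    # Hypothetical training recordings, one file per sound class.
    speech = frame_features("speech_example.wav")
    music = frame_features("music_example.wav")
    X = np.vstack([speech, music])
    y = np.array([0] * len(speech) + [1] * len(music))

    clf = SVC().fit(X, y)

    # At run time, each incoming frame is classified individually.
    for frame in frame_features("incoming.wav"):
        print(clf.predict(frame.reshape(1, -1))[0])   # 0 = speech, 1 = music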

Clients

Tuomas Virtanen, Toni Heittola

Acoustic context recognition using i-vector

Abstract

The aim of this project was to study the i-vector approach for audio context recognition.

Clients

Tuomas Virtanen, Toni Heittola

2013

Semi-supervised musical instrument recognition

Abstract

The application areas of music information retrieval have been gaining popularity over the last decades. Musical instrument recognition is an example of a specific research topic in the field. In this thesis, semi-supervised learning techniques are explored in the context of musical instrument recognition. The conventional approaches employed for musical instrument recognition rely on annotated data, i.e. example recordings of the target instruments with associated information about the target labels, in order to perform training. This implies highly laborious and tedious work of manually annotating the collected training data. Semi-supervised methods enable incorporating additional unannotated data into training. Such data consists of merely the recordings of the instruments and is therefore significantly easier to acquire. Hence, these methods allow keeping the overall development cost at the same level while notably improving the performance of a system. The implemented musical instrument recognition system utilises the mixture model semi-supervised learning scheme in the form of two EM-based algorithms. Furthermore, upgraded versions, namely additional labelled data weighting and class-wise retraining, are proposed for improved performance, along with convergence criteria for the particular classification scenario. The evaluation is performed on sets consisting of four and ten instruments and yields overall average recognition accuracy rates of 95.3% and 68.4%, respectively. These correspond to absolute gains of 6.1% and 9.7% compared to the initial, purely supervised cases. Additional experiments are conducted on the effects of the proposed modifications, as well as the investigation of the optimal relative labelled dataset size. In general, the obtained performance improvement is quite noteworthy, and future research directions include investigating the behaviour of the implemented algorithms along with the proposed and further extended approaches.
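
To make the central idea concrete, here is a minimal sketch of mixture-model semi-supervised EM (not the thesis implementation; synthetic two-dimensional data stands in for instrument features). Class-conditional Gaussians are initialized from the labelled data alone, and EM iterations then fold in the unlabelled recordings through their class posteriors:

    import numpy as np
    from scipy.stats import multivariate_normal

    rng = np.random.default_rng(0)
    # Synthetic 2-class data standing in for instrument feature vectors.
    Xl = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
    yl = np.array([0] * 20 + [1] * 20)                 # labelled
    Xu = np.vstack([rng.normal(0, 1, (200, 2)),
                    rng.normal(3, 1, (200, 2))])       # unlabelled

    # Initialize class-conditional Gaussians from the labelled data only.
    mu = np.array([Xl[yl == k].mean(axis=0) for k in (0, 1)])
    cov = np.array([np.cov(Xl[yl == k].T) for k in (0, 1)])
    prior = np.array([0.5, 0.5])

    Xall = np.vstack([Xl, Xu])
    for _ in range(20):                                # EM iterations
        # E-step: class posteriors for the unlabelled data.
        lik = np.column_stack(
            [prior[k] * multivariate_normal.pdf(Xu, mu[k], cov[k])
             for k in (0, 1)])
        post = lik / lik.sum(axis=1, keepdims=True)
        # M-step: refit each class using labelled data (weight 1)
        # plus posterior-weighted unlabelled data.
        for k in (0, 1):
            w = np.concatenate([(yl == k).astype(float), post[:, k]])
            mu[k] = np.average(Xall, axis=0, weights=w)
            diff = Xall - mu[k]
            cov[k] = (w[:, None] * diff).T @ diff / w.sum()
        prior = np.array([(yl == 0).sum() + post[:, 0].sum(),
                          (yl == 1).sum() + post[:, 1].sum()])
        prior = prior / prior.sum()

    print(mu)   # class means refined by the unlabelled data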

PDF

Classification of the Sounds of Footsteps and Person Identification

Abstract

The sound of footsteps contains a wide range of information about the person producing them. Humans quite often use this information to identify persons in situations without visual contact. For example, they can tell how fast a person is walking, what kind of shoes a person is wearing, how tall a person is, or even the mood of a person. The combination of these features makes the sound of footsteps characteristic of a certain person. The aim of the project is to study the automatic classification of the sound of footsteps and to see how reliably persons can be automatically identified based on it.

Clients

Tuomas Virtanen and Toni Heittola

Organizing a Database of Sound Samples

Abstract

In modern sample-based music production, managing large sample libraries intuitively is a challenging problem. The aim of the project is to study various ways to organize a sample library according to the acoustic properties of the samples.
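
One plausible baseline for such organization, sketched below under assumed package choices (librosa, scikit-learn) and a hypothetical samples/ directory, is to summarize each sample with its mean MFCCs and cluster acoustically similar samples together:

    import glob
    import numpy as np
    import librosa
    from sklearn.cluster import KMeans

    # One summary feature vector per sample: mean MFCCs over the clip.
    paths = sorted(glob.glob("samples/*.wav"))   # hypothetical library
    feats = []
    for path in paths:
        y, sr = librosa.load(path, sr=22050)
        feats.append(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1))
    X = np.array(feats)

    # Group acoustically similar samples; each cluster becomes a folder/tag.
    clusters = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)
    for path, c in zip(paths, clusters):
        print(c, path)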

Clients

Tuomas Virtanen and Toni Heittola

2012

Automatic Guitar Chord Detection

Abstract

Automatic guitar chord detection is a process that attempts to detect a guitar chord from a piece of audio. Generally, automatic chord detection is considered a part of a larger problem termed automatic transcription. Although there has been a lot of research in the field of automatic transcription, a reliable transcription system is still a distant prospect. Chord detection is interesting because chords have a comparatively stable structure and completely describe the occurring harmonies in a piece of music. This thesis presents a novel approach for detecting the correctness of musical chords played on a guitar. The approach is based on a pattern matching technique applied to a database of chords and their typical mistakes. Mistakes are versions of a chord in which typical playing errors are made. The transient of a chord is skipped and its spectrum is whitened. A certain region of the whitened spectrum is chosen as the feature vector. Cosine distance is computed between the extracted features and the data present in a reference chord database. Finally, the system detects the correctness of a played chord using a k-Nearest Neighbor (k-NN) classifier. The developed system uses two types of spectral whitening techniques: one based on Linear Predictive Coding (LPC) and the other on Phase Transform-beta (PHAT-beta). The average accuracy of the LPC-based system is 72%, while that of the PHAT-beta-based system is 82.5%. The system was also evaluated under different noise conditions.
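
A minimal sketch of the PHAT-beta branch of this pipeline; the beta value, spectral region, and reference data below are hypothetical placeholders, with random arrays standing in for the chord database:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def phat_beta_whiten(frame, beta=0.7, n_fft=4096):
        """Whiten a chord frame's spectrum: divide each bin by |X|^beta."""
        X = np.fft.rfft(frame, n_fft)
        return np.abs(X) / (np.abs(X) ** beta + 1e-12)

    def feature(frame, lo=10, hi=600):
        """A region of the whitened spectrum as the feature vector."""
        return phat_beta_whiten(frame)[lo:hi]

    # Reference database: whitened-spectrum features of correctly played
    # chords and of their typical mistake versions.
    # X_ref, y_ref = load_chord_database()   # hypothetical helper
    X_ref = np.random.rand(40, 590)          # placeholder reference features
    y_ref = np.repeat([0, 1], 20)            # 0 = correct, 1 = mistake

    # k-NN with cosine distance over the whitened-spectrum features.
    knn = KNeighborsClassifier(n_neighbors=3,
                               metric="cosine").fit(X_ref, y_ref)
    played = np.random.randn(4096)   # placeholder frame (transient skipped)
    print("correct" if knn.predict([feature(played)])[0] == 0 else "mistake")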

PDF

Classification of Insects Based on Sound

Abstract

Insect-borne diseases kill a million people and destroy tens of billions of euros worth of crops annually. At the same time, beneficial insects pollinate the majority of crop species, and it has been estimated that approximately one third of all food consumed by humans is directly pollinated by bees alone. If we could inexpensively count and classify insects, we could plan interventions more accurately, thus saving lives in the case of insect-vectored disease and growing more food in the case of insect crop pests. The aim of the project is to classify insects based on the sound they produce while flying.

Clients

Tuomas Virtanen and Toni Heittola

2010

Parameter Adaptation in Nonlinear Loudspeaker Models

Abstract

A loudspeaker is a device that converts an electric input signal to acoustic output. The most common type of loudspeaker is the moving-coil transducer. The behaviour of a moving-coil transducer can be considered linear only when the displacement of the coil-diaphragm assembly is small. When the input signal level rises, nonlinearities start to cause audible distortion. In this thesis we examine a microspeaker, a small loudspeaker used in mobile phones. The electro-mechanical process which converts the electrical signal into sound waves is explained. Based on this, we present a continuous-time, linear model of a loudspeaker mounted in a closed box. The model describes the loudspeaker's small-signal behaviour using only a few parameters. We then consider the main sources of nonlinearities and how to model them. Two major sources of nonlinearities are added to the continuous-time model. Transformations from continuous-time models to discrete-time models are then considered, and the nonlinear model is converted to discrete time while taking into account the properties of the microspeaker. The main purpose of this thesis is to study the performance of an algorithm that finds the parameter values of the nonlinear loudspeaker model. The performance of the algorithm is compared to that of an earlier algorithm for the linear loudspeaker model. The parameter values are found, and changes in them are tracked, using an adaptive signal processing method called system identification. The parameter values are updated using the LMS algorithm. Since the discrete-time mechanical model of the microspeaker is based on a recursive filter, the LMS algorithm for recursive filters is presented. We also review previous research related to parameter identification in linear and nonlinear loudspeaker models. Based on the experimental results, the studied algorithm is deemed incomplete. The linear parameters generally adapt quickly, whereas the nonlinear parameters adapt too slowly and sometimes erroneously. The difference between the output predicted by the nonlinear loudspeaker model and the actual output of the loudspeaker (the prediction error) remains too high, meaning that the parameters do not adapt to their true values. The model is also prone to instability. The algorithm requires further development regarding adaptation speed and prevention of instability. Further development concerning initial parameter values and operation during silent passages should also be conducted in the future.
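
The thesis applies an LMS variant for recursive filters; as a simpler illustration of the same system identification idea, here is a sketch of a plain FIR LMS loop (synthetic data, with a short made-up impulse response standing in for the speaker model):

    import numpy as np

    rng = np.random.default_rng(0)
    true_h = np.array([0.8, -0.4, 0.2])   # "unknown" system to identify
    N, M, mu = 5000, 3, 0.05              # samples, filter length, step size

    x = rng.normal(size=N)                # excitation signal
    d = np.convolve(x, true_h)[:N]        # desired signal: system output
    w = np.zeros(M)                       # adaptive filter weights

    for n in range(M, N):
        xn = x[n - M + 1:n + 1][::-1]     # most recent M input samples
        e = d[n] - w @ xn                 # prediction error
        w = w + mu * e * xn               # LMS weight update

    print(w)                              # should approach true_h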