Method for Creating Location-Specific Audio Textures



Toni Heittola, Annamaria Mesaros, Dani Korpi, Antti Eronen and Tuomas Virtanen

EURASIP Journal on Audio, Speech and Music Processing, 2014

Abstract

An approach for creating location-specific audio textures by reusing audio recorded at the given location is proposed. The approach provides a cost-effective way to produce a versatile audio signal that represents the location for virtual location-exploration services. This avoids the need to collect a large amount of audio from each location, which would otherwise be required for the audio to be non-repetitive and to conserve the location-specific characteristics of each auditory scene. The method consists of two main steps: analysis and synthesis. In the analysis stage, the source audio recording is segmented into homogeneous segments. In the synthesis stage, the audio texture is created by randomly drawing segments from the source audio so that consecutive segments have timbral similarity near the segment boundaries. Results obtained in a listening experiment show that there is no statistically significant difference in the audio quality or location-specificity of audio when the created audio textures are compared to excerpts of the original recordings.

Audio Texture

By an audio texture we mean a new, unique audio signal created from a source audio signal acquired at a certain location. The texture should provide an audio representation of the location from which the source signal comes, by having the same generic acoustic properties and types of sound events that are characteristic of the location.

The proposed audio texture creation method consists of two stages: analysis of the source audio signal and synthesis of the audio texture. The analysis stage performs clustering and segmentation of the source audio in an unsupervised manner. The goal of the clustering analysis is to automatically find homogeneous audio segments in the source signal. Ideally, the segments would be representative of individual sound events from the scene. In the synthesis stage, an audio texture is constructed by shuffling and concatenating these segments. The shuffling is done in a way that takes into account timbral continuity at the segment boundaries.
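The two stages can be illustrated with a minimal sketch. It is not the implementation used in the paper: it assumes that per-frame timbral features (for example MFCCs) and per-frame cluster labels have already been computed, and the function names, the Euclidean distance, and the parameters `n_candidates` and `edge` are hypothetical choices made for illustration only.

```python
import numpy as np

def segment_by_label(labels):
    """Split a per-frame cluster label sequence into (start, end) runs of equal labels."""
    labels = np.asarray(labels)
    # A segment boundary is placed wherever the cluster label changes.
    change = np.flatnonzero(labels[1:] != labels[:-1]) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [len(labels)]))
    return list(zip(starts.tolist(), ends.tolist()))

def synthesize_order(features, segments, n_out, n_candidates=3, edge=2, seed=0):
    """Return a sequence of segment indices with timbral continuity at the joins.

    Assumption: continuity is measured as the Euclidean distance between the
    mean features of the last `edge` frames of the current segment and the
    first `edge` frames of each candidate segment.
    """
    if len(segments) < 2:
        raise ValueError("at least two segments are needed for shuffling")
    rng = np.random.default_rng(seed)
    heads = np.array([features[s:min(s + edge, e)].mean(axis=0) for s, e in segments])
    tails = np.array([features[max(e - edge, s):e].mean(axis=0) for s, e in segments])
    k = min(n_candidates, len(segments) - 1)

    current = int(rng.integers(len(segments)))
    order = [current]
    while len(order) < n_out:
        dist = np.linalg.norm(heads - tails[current], axis=1)
        dist[current] = np.inf  # never repeat the same segment back to back
        # Draw randomly among the k segments whose start best matches the current end.
        candidates = np.argsort(dist)[:k]
        current = int(rng.choice(candidates))
        order.append(current)
    return order

# Toy input: 8 feature frames with 4 dimensions and their cluster labels.
labels = [0, 0, 1, 1, 1, 0, 2, 2]
features = np.random.default_rng(1).normal(size=(8, 4))
segments = segment_by_label(labels)
order = synthesize_order(features, segments, n_out=6)
```

The resulting `order` lists the segments to concatenate; the audio texture would then be assembled by joining the corresponding sample ranges of the source recording. Drawing at random among several close candidates, rather than always taking the closest one, is what keeps the output from repeating a fixed loop.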

Demonstration

Examples of generated audio textures for various locations are presented. Open the demonstration by clicking the image on the left. The segments used for the synthesis are shown in the lower panel.

Listening Tests

Listening tests were conducted to evaluate the quality of the synthesized audio textures and the possibility of using synthesized audio textures for representing the auditory scene of various locations.

Examples of audio samples used in the listening tests are available below.

Pub

Sample 1

Real sample

Synthesized Audio Texture

Sample 2

Real sample

Synthesized Audio Texture

Restaurant

Sample 1

Real sample

Synthesized Audio Texture

Sample 2

Real sample

Synthesized Audio Texture

Street

Sample 1

Real sample

Synthesized Audio Texture

Sample 2

Real sample

Synthesized Audio Texture

Track & Field Stadium

Sample 1

Real sample

Synthesized Audio Texture

Sample 2

Real sample

Synthesized Audio Texture