Auditory Scene Synthesis


Virtual location-exploration services, such as Here Maps 3D and Google Street View, let users explore locations around the globe virtually. In these services, a user can see a 360-degree view at various locations. Currently, all available location-exploration services provide only a visual description of the location. However, studies have shown that presenting visual and audio information together enhances perception of the location. Adding an audio ambiance for a specific location, possibly with different ambiances depending on the time of day, would enhance the user experience and give the service a more 'real' feeling. In addition, an audio ambiance would provide a rich source of information characterizing a location, for example a busy street corner in New York or a quiet back alley in Tokyo.

One reason audio information is not used in location-exploration services is the cost of collecting a comprehensive audio database: to record audio for such a service, one would need to stay at each location for a certain period of time in order to collect enough audio data for a non-repeating audio ambiance. Current data collection methods involve driving by in a car while recording images and other types of information. Such a method is not suitable for collecting audio material, as the sound produced by the car would interfere with the audio content. Crowdsourced data is a feasible solution to data collection, which makes it attractive to be able to create location-specific audio ambiances cost-effectively from a small amount of audio. We have researched cost-effective means for obtaining location-specific audio ambiances.


Audio Textures

Demo samples created using the method proposed in [Heittola2014] are available on the sample page.

We have proposed a method [Heittola2014] to create a new, arbitrarily long, and representative audio ambiance for a location from a small amount of audio recorded at that exact location. The method aims to retain the location-specific characteristics of the auditory scene while providing a new and versatile audio signal to represent the location.

The method has three steps: segmenting the source audio signal into homogeneous segments, finding a new segment sequence with smooth timbral transitions, and creating a new signal by concatenating the segments in such a way that the segment boundaries do not create perceivable cuts. This newly generated signal is an audio texture. The audio texture is constructed from the segments of the source audio signal by repeating them from time to time but never in their original order, resulting in a continuous, infinitely varying stream of audio. Despite the repeated segments, the audio texture should feel non-repetitive to the listener. In order to create a location-specific audio texture, the source signal has to be recorded at the given location.
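The three steps above can be illustrated with a minimal sketch. It is not the implementation from [Heittola2014]: fixed-length cuts stand in for homogeneous segmentation, a coarse band-averaged log spectrum stands in for a timbre descriptor, and a linear crossfade hides the segment boundaries. All function names and parameters (`split_fixed`, `timbre_feature`, `next_segment`, `crossfade_concat`, `make_texture`, `n_candidates`, `fade_len`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def split_fixed(signal, seg_len):
    """Cut the signal into equal-length segments.
    Assumption: a simple stand-in for homogeneous segmentation."""
    n = len(signal) // seg_len
    return [signal[i * seg_len:(i + 1) * seg_len] for i in range(n)]


def timbre_feature(segment, n_bands=8):
    """Coarse log-magnitude spectrum averaged over n_bands bands.
    Assumption: a crude timbre descriptor for the whole segment."""
    spectrum = np.abs(np.fft.rfft(segment))
    bands = np.array_split(spectrum, n_bands)
    return np.log1p(np.array([band.mean() for band in bands]))


def next_segment(current, features, rng, n_candidates=3):
    """Pick at random among the timbrally closest segments, excluding the
    current segment and its successor in the original recording."""
    dist = np.linalg.norm(features - features[current], axis=1)
    dist[current] = np.inf
    if current + 1 < len(features):
        dist[current + 1] = np.inf  # never continue in the original order
    closest = np.argsort(dist)[:n_candidates]
    candidates = [int(i) for i in closest if np.isfinite(dist[i])]
    if not candidates:
        raise ValueError("need at least three segments")
    return int(rng.choice(candidates))


def crossfade_concat(segments, order, fade_len):
    """Join segments in the given order, overlapping each boundary
    with a linear crossfade of fade_len samples."""
    fade_in = np.linspace(0.0, 1.0, fade_len)
    out = segments[order[0]].astype(float).copy()
    for idx in order[1:]:
        seg = segments[idx].astype(float)
        out[-fade_len:] = out[-fade_len:] * (1.0 - fade_in) + seg[:fade_len] * fade_in
        out = np.concatenate([out, seg[fade_len:]])
    return out


def make_texture(signal, seg_len, n_out, fade_len, seed=0):
    """Build a texture of n_out segments and return it with the segment order."""
    rng = np.random.default_rng(seed)
    segments = split_fixed(signal, seg_len)
    features = np.stack([timbre_feature(s) for s in segments])
    order = [int(rng.integers(len(segments)))]
    while len(order) < n_out:
        order.append(next_segment(order[-1], features, rng))
    return crossfade_concat(segments, order, fade_len), order


# Usage with synthetic noise in place of a real location recording.
source = np.random.default_rng(1).standard_normal(16000)
texture, order = make_texture(source, seg_len=1000, n_out=40, fade_len=100)
```

Because the next segment is drawn at random from a few close candidates rather than always taking the nearest one, the output can be made arbitrarily long without settling into an obvious loop. A closer approximation of smooth timbral transitions would compare the end of the current segment with the start of each candidate instead of whole-segment features.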

Figure 1. Overview of the audio texture creation application.