Neural Network based Time-Frequency Masking Separation Algorithm with Spatial Features - Samples

Sound separation samples related to the article "Distant Speech Separation using Predicted Time-Frequency Masks from Spatial Features", Pertilä and Nikunen, Elsevier Speech Communication, 2015, [URL]. The samples were recorded with an 8-channel circular microphone array, with the speakers placed at two distances from the array center (Near = 110 cm, Far = 170 cm) in two different rooms. Room 1 is a small laboratory with low reverberation (T60 = 260 ms), while Room 2 is a large meeting room with moderate reverberation (T60 = 370 ms). The mixtures were generated by summing the RMS-normalized source signals.
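As a minimal sketch of how RMS-normalized mixing can be done (this is an illustration, not the authors' exact code; the `target_rms` value is an arbitrary assumption):

```python
import numpy as np

def rms_normalize(x, target_rms=0.1):
    """Scale a signal so its root-mean-square level equals target_rms."""
    rms = np.sqrt(np.mean(x ** 2))
    return x * (target_rms / rms)

def mix_sources(sources, target_rms=0.1):
    """RMS-normalize each source signal, then sum them into one mixture."""
    normalized = [rms_normalize(s, target_rms) for s in sources]
    return np.sum(normalized, axis=0)

# Example with two synthetic "sources" standing in for speech signals
rng = np.random.default_rng(0)
s1 = rng.standard_normal(16000)  # 1 s of noise at a 16 kHz sampling rate
s2 = rng.standard_normal(16000)
mixture = mix_sources([s1, s2])
```

Normalizing each source to the same RMS level before summing gives the sources equal average power in the mixture, i.e. a 0 dB signal-to-interference ratio between any pair of speakers.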

Two speakers | Mixture | Source 1 | Source 2
Room 1, Near (1.1 m)
Room 1, Far (1.7 m)
Room 2, Near (1.1 m)
Room 2, Far (1.7 m)

Three speakers | Mixture | Source 1 | Source 2 | Source 3
Room 1, Near (1.1 m)

Room 1, Far (1.7 m)

Room 2, Near (1.1 m)

Room 2, Far (1.7 m)

Audio Slides

Microphone array

An 8-channel circular microphone array with a 10 cm radius was used.
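A sketch of the array geometry described above, assuming the eight microphones are evenly spaced on the circle (the page does not state the exact placement, so the uniform spacing and the orientation of microphone 0 are assumptions):

```python
import numpy as np

N_MICS = 8
RADIUS = 0.10  # metres (10 cm radius circular array)

# Evenly spaced angles around the circle, microphone 0 on the positive x-axis.
angles = 2 * np.pi * np.arange(N_MICS) / N_MICS
mic_xy = np.column_stack([RADIUS * np.cos(angles),
                          RADIUS * np.sin(angles)])
# mic_xy[i] is the (x, y) position of microphone i, in metres,
# relative to the array center.
```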

Samples from the multitarget tracking based separation method (MTT-SEP) are available here.
© Pasi Pertilä, 2015.