Musical Instrument Recognition in Polyphonic Audio Using Source-Filter Model for Sound Separation


Audio examples

Publication info

Musical Instrument Recognition in Polyphonic Audio Using Source-Filter Model for Sound Separation

Toni Heittola, Anssi Klapuri and Tuomas Virtanen

In International Conference on Music Information Retrieval (ISMIR), 327–332. Kobe, Japan, 2009.

Abstract

The paper proposes a novel approach to musical instrument recognition in polyphonic audio signals, using a source-filter model and an augmented non-negative matrix factorization algorithm for sound separation. The mixture signal is decomposed into a sum of spectral bases, each modeled as the product of an excitation and a filter. The excitations are restricted to harmonic spectra, and their fundamental frequencies are estimated in advance using a multipitch estimator. The filters are restricted to smooth frequency responses by modeling them as a sum of elementary functions on the Mel-frequency scale. Pitch and timbre information is used to organize individual notes into sound sources. The method is evaluated on polyphonic signals randomly generated from 19 instrument classes.
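The source-filter decomposition described in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it is a minimal example under several stated assumptions: each note has one fixed F0 (given here rather than estimated by a multipitch estimator), harmonic excitations are Gaussian peaks at integer multiples of that F0, the elementary functions are triangles evenly spaced on the Mel scale, every excitation is paired with every source filter, and the fit uses standard multiplicative updates for the generalized Kullback-Leibler divergence. The model is X[k, t] ≈ Σ_n Σ_i E[k, n] · (Φ C)[k, i] · G[n, i, t]. The helper names (`harmonic_excitation`, `mel_triangles`, `source_filter_nmf`) and parameters such as `sigma` and the number of bands are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    """A common Mel-scale formula (assumption: the paper's exact variant may differ)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_triangles(freqs, n_bands):
    """K x M matrix of triangular elementary functions evenly spaced on the Mel scale."""
    mel = hz_to_mel(freqs)
    centers = np.linspace(0.0, mel[-1], n_bands)
    width = centers[1] - centers[0]
    return np.maximum(0.0, 1.0 - np.abs(mel[:, None] - centers[None, :]) / width)


def harmonic_excitation(freqs, f0, sigma=15.0):
    """Harmonic spectrum: Gaussian peaks (width sigma Hz) at integer multiples of f0."""
    harmonics = f0 * np.arange(1, int(freqs[-1] // f0) + 1)
    peaks = (freqs[:, None] - harmonics[None, :]) / sigma
    return np.exp(-0.5 * peaks ** 2).sum(axis=1)


def source_filter_nmf(X, E, Phi, n_sources, n_iter=200, seed=0, eps=1e-12):
    """Fit X[k, t] ~ sum_{n,i} E[k, n] * (Phi @ C)[k, i] * G[n, i, t].

    X: K x T magnitude spectrogram, E: K x N fixed excitations,
    Phi: K x M elementary functions. Returns filter coefficients C (M x I),
    gains G (N x I x T) and the model spectrogram V (K x T).
    """
    rng = np.random.default_rng(seed)
    C = rng.random((Phi.shape[1], n_sources)) + 0.1
    G = rng.random((E.shape[1], n_sources, X.shape[1])) + 0.1

    def model():
        B = E[:, :, None] * (Phi @ C)[:, None, :]  # K x N x I spectral bases
        return B, np.einsum("kni,nit->kt", B, G) + eps

    for _ in range(n_iter):
        # Multiplicative KL update of the gains, filters held fixed.
        B, V = model()
        G *= np.einsum("kni,kt->nit", B, X / V) / (B.sum(axis=0)[:, :, None] + eps)
        # Multiplicative KL update of the filter coefficients, gains held fixed.
        _, V = model()
        num = Phi.T @ np.einsum("kn,kt,nit->ki", E, X / V, G)
        den = Phi.T @ (E @ G.sum(axis=2))
        C *= num / (den + eps)
    return C, G, model()[1]


# Usage on a synthetic spectrogram generated from the same model.
freqs = np.linspace(0.0, 8000.0, 257)
E = np.stack([harmonic_excitation(freqs, f0) for f0 in (220.0, 330.0, 440.0)], axis=1)
Phi = mel_triangles(freqs, 20)
rng = np.random.default_rng(1)
C_true, G_true = rng.random((20, 2)), rng.random((3, 2, 40))
X = np.einsum("kn,ki,nit->kt", E, Phi @ C_true, G_true) + 1e-3
C, G, V = source_filter_nmf(X, E, Phi, n_sources=2, n_iter=100)
```

Because each filter is parameterized only by the coefficients of smooth Mel-spaced triangles, its frequency response stays smooth whatever values the updates reach, while the fixed harmonic excitations carry the pitch information.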

Acoustic Data

Polyphonic signals are generated by linearly mixing samples of isolated notes from the RWC musical instrument sound database. Nineteen instrument classes are selected for the evaluations (accordion, bassoon, clarinet, contrabass, electric bass, electric guitar, electric piano, flute, guitar, harmonica, horn, oboe, piano, piccolo, recorder, saxophone, trombone, trumpet, tuba).
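Linear mixing at equal mean-square levels, as used for the evaluation data, can be sketched as follows. This is a minimal example assuming each instrument's track is already rendered as an equal-length array; loading the RWC samples is not shown, and the function name `mix_equal_ms` and the `target_ms` parameter are hypothetical. The class list simply records the 19 classes named above.

```python
import numpy as np

# The 19 instrument classes listed above ("piano" and "piccolo" counted separately).
INSTRUMENT_CLASSES = [
    "accordion", "bassoon", "clarinet", "contrabass", "electric bass",
    "electric guitar", "electric piano", "flute", "guitar", "harmonica",
    "horn", "oboe", "piano", "piccolo", "recorder", "saxophone",
    "trombone", "trumpet", "tuba",
]


def mix_equal_ms(tracks, target_ms=1.0):
    """Scale each equal-length instrument track to the same mean-square level, then sum.

    Returns the mixture and the scaled tracks (shape: n_instruments x n_samples).
    """
    x = np.asarray(tracks, dtype=float)
    ms = np.mean(x ** 2, axis=1, keepdims=True)
    if np.any(ms == 0):
        raise ValueError("silent track: mean-square level is zero")
    scaled = x * np.sqrt(target_ms / ms)
    return scaled.sum(axis=0), scaled  # linear mixing is a plain sample-wise sum


# Usage with two synthetic four-second tracks of very different levels.
fs = 16000
t = np.arange(4 * fs) / fs
tracks = [0.2 * np.sin(2 * np.pi * 220 * t), 0.9 * np.sin(2 * np.pi * 330 * t)]
mixture, scaled = mix_equal_ms(tracks)
```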

Four-second polyphonic signals are generated by randomly selecting instrument instances and generating random note sequences for them. For each instrument, the first note in a sequence is drawn from a uniform distribution over the notes available in the RWC database for that instrument instance. Each subsequent note is drawn from a normal distribution whose mean is the previous note and whose standard deviation is 6 semitones. Unison notes are excluded from the note sequence. Each note is randomly truncated to a length between 100 ms and one second. The signals of the individual instruments are mixed at equal mean-square levels.
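The note-sequence procedure can be sketched as a small sampler. This is an illustration, not the authors' generator, and several details are assumptions: normal draws are rounded to the nearest semitone (MIDI number), "unison notes are excluded" is read as "a note may not repeat the previous note", draws that repeat the previous note or fall outside the available range are simply re-drawn, note lengths are uniform between 100 ms and one second, and the last note is cut at the four-second boundary. The function name `random_note_sequence` and its parameters are hypothetical.

```python
import numpy as np


def random_note_sequence(available, total_dur=4.0, sd=6.0, min_dur=0.1, max_dur=1.0, rng=None):
    """Return a list of (onset_s, midi_note, duration_s) tuples filling total_dur seconds.

    available: MIDI note numbers the instrument instance provides (at least two).
    Assumptions: proposals are rounded to the nearest semitone and re-drawn if they
    repeat the previous note or fall outside `available`; durations are uniform in
    [min_dur, max_dur); the last note is cut at total_dur.
    """
    rng = np.random.default_rng() if rng is None else rng
    pool = sorted(set(int(n) for n in available))
    if len(pool) < 2:
        raise ValueError("need at least two available notes")
    allowed = set(pool)

    note = int(rng.choice(pool))  # first note: uniform over the available notes
    onset, notes = 0.0, []
    while True:
        remaining = total_dur - onset
        dur = min(rng.uniform(min_dur, max_dur), remaining)
        notes.append((onset, note, dur))
        if dur >= remaining:
            return notes
        onset += dur
        while True:  # next note: Gaussian around the previous note, no unisons
            cand = int(round(rng.normal(note, sd)))
            if cand != note and cand in allowed:
                note = cand
                break


# Usage: a hypothetical instrument instance covering MIDI notes 40..76.
seq = random_note_sequence(range(40, 77), rng=np.random.default_rng(0))
```

Rejection sampling keeps the Gaussian step rule intact for instruments with a contiguous note range; for instruments with very sparse available notes, a different proposal scheme would be more efficient.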

Sound Examples

Polyphony 4

Example #1

Mixture

Tracks / No streaming

Tracks / Streaming (est. F0s)

Tracks / Streaming (timbre)

Example #2

Mixture

Tracks / No streaming

Tracks / Streaming (est. F0s)

Tracks / Streaming (timbre)

Example #3

Mixture

Tracks / No streaming

Tracks / Streaming (est. F0s)

Tracks / Streaming (timbre)

Example #4

Mixture

Tracks / No streaming

Tracks / Streaming (est. F0s)

Tracks / Streaming (timbre)

Example #5

Mixture

Tracks / No streaming

Tracks / Streaming (est. F0s)

Tracks / Streaming (timbre)

Example #6

Mixture

Tracks / No streaming

Tracks / Streaming (est. F0s)

Tracks / Streaming (timbre)

Example #7

Mixture

Tracks / No streaming

Tracks / Streaming (est. F0s)

Tracks / Streaming (timbre)

Example #8

Mixture

Tracks / No streaming

Tracks / Streaming (est. F0s)

Tracks / Streaming (timbre)

Polyphony 6

Example #1

Mixture

Tracks / Streaming (timbre)

Example #2

Mixture

Tracks / Streaming (timbre)

Example #3

Mixture

Tracks / Streaming (timbre)

Example #4

Mixture

Tracks / Streaming (timbre)