Combining Pitch-Based Inference and Non-Negative Spectrogram Factorization in Separating Vocals from Polyphonic Music

The following audio examples demonstrate the performance of the algorithm presented at the SAPA workshop. Vocal signals were separated blindly from each mixture using the proposed method and a reference method based on sinusoidal modeling. Separations obtained with binary masking are also included for comparison. The randomly selected excerpts below illustrate cases where pitch estimation was successful. The quality of both the proposed and the reference method degrades slightly when the pitch estimates contain errors, and some minor errors can be heard in the samples. The signals are excerpts from the RWC Music Database. The paper can be found here.

Mixture signal | Separated vocals (proposed method) | Separated vocals (binary masking) | Separated vocals (sinusoidal modeling)
musicorig1.wav | musicsep1.wav | binarysep1.wav | musicsep1sin.wav
musicorig2.wav | musicsep2.wav | binarysep2.wav | musicsep2sin.wav
musicorig3.wav | musicsep3.wav | binarysep3.wav | musicsep3sin.wav
musicorig4.wav | musicsep4.wav | binarysep4.wav | musicsep4sin.wav
musicorig5.wav | musicsep5.wav | binarysep5.wav | musicsep5sin.wav
musicorig6.wav | musicsep6.wav | binarysep6.wav | musicsep6sin.wav
musicorig7.wav | musicsep7.wav | binarysep7.wav | musicsep7sin.wav
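The binary-masking column compares against a hard time-frequency mask. The paper's own masking setup is not described on this page. The sketch below is therefore a generic, hypothetical illustration of pitch-based harmonic binary masking, not the method used to produce these files. It assumes a per-frame vocal pitch track `f0_hz` in Hz, with 0 marking frames without vocals, and a complex STFT whose frames are rows. It keeps only the bins lying within `tol_hz` of the first `n_harmonics` multiples of the pitch. The function names and default parameter values are illustrative assumptions.

```python
import numpy as np

def harmonic_binary_mask(f0_hz, sample_rate, n_fft, n_harmonics=20, tol_hz=30.0):
    """Hypothetical pitch-based binary mask of shape (n_frames, n_fft // 2 + 1).

    A bin is True in a frame when its centre frequency lies within tol_hz of
    one of the first n_harmonics multiples of that frame's f0. Frames with
    f0 <= 0 (assumed to mean "no vocals") get an all-False row.
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    # Centre frequency of each one-sided STFT bin, in Hz.
    bin_freqs = np.arange(n_fft // 2 + 1) * sample_rate / n_fft
    mask = np.zeros((f0_hz.size, bin_freqs.size), dtype=bool)
    for t, f0 in enumerate(f0_hz):
        if f0 <= 0:
            continue  # unvoiced frame: nothing is assigned to the vocals
        harmonics = f0 * np.arange(1, n_harmonics + 1)
        # Distance from every bin to its nearest harmonic of f0.
        dist = np.abs(bin_freqs[:, None] - harmonics[None, :]).min(axis=1)
        mask[t] = dist <= tol_hz
    return mask

def apply_mask(stft_frames, mask):
    """Keep the masked (vocal) bins of a complex STFT and zero the rest."""
    return np.where(mask, stft_frames, 0)

# Usage: two frames, the first voiced at 200 Hz, the second unvoiced.
mask = harmonic_binary_mask([200.0, 0.0], sample_rate=16000, n_fft=1024)
vocals = apply_mask(np.ones((2, 513), dtype=complex), mask)
```

A hard mask like this assigns every time-frequency bin entirely to either the vocals or the accompaniment. That is one reason masking-based separations can sound different from those of model-based approaches such as sinusoidal modeling.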

The proposed method was also briefly tested on separating speech from background music.
Mixture signal | Separated vocals (proposed method)
speechorig1.wav | speechsep1.wav
speechorig2.wav | speechsep2.wav
speechorig3.wav | speechsep3.wav


- Tuomas Virtanen