Automatic transcription of singing: examples

Author: Matti Ryynänen
Last modified: Wed May 5 13:57:00 EEST 2004

Transcription method

All the melodies are transcribed with the method described in [1]. The method takes an acoustic input file (in .wav format) and produces a MIDI file containing the notes with their velocity values, a tempo estimate, and a key signature estimate. The tempo is estimated with the method proposed in [2]; the key estimation method was originally proposed in [3] and simplified in [1]. The transcription method uses
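For readers unfamiliar with the output format, the following Python sketch shows how notes with velocities, a tempo estimate, and a key signature estimate are stored in a Standard MIDI File. It is not the author's code: the note-list input format and the function names `vlq` and `write_midi` are hypothetical, and only the standard set-tempo (FF 51), key-signature (FF 59), note-on/off, and end-of-track events are written.

```python
import struct

def vlq(n):
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(out))

def write_midi(notes, tempo_bpm, key_sharps=0, minor=False, ppq=480):
    """Return the bytes of a format-0 Standard MIDI File.

    notes: hypothetical input format, a list of
           (onset_beats, duration_beats, midi_pitch, velocity) tuples,
           monophonic and sorted by onset; velocity is an integer 1..127.
    key_sharps: sharps (>0) or flats (<0) of the estimated key, -7..7.
    """
    # The set-tempo meta event stores microseconds per quarter note.
    us_per_beat = round(60_000_000 / tempo_bpm)
    events = [
        (0, b"\xff\x51\x03" + us_per_beat.to_bytes(3, "big")),                      # set tempo
        (0, b"\xff\x59\x02" + struct.pack(">bB", key_sharps, 1 if minor else 0)),  # key signature
    ]
    for onset, dur, pitch, vel in notes:
        events.append((round(onset * ppq), bytes([0x90, pitch, vel])))         # note on, channel 1
        events.append((round((onset + dur) * ppq), bytes([0x80, pitch, 0])))  # note off
    events.sort(key=lambda e: e[0])  # stable sort: events at equal ticks keep insertion order

    track = bytearray()
    prev_tick = 0
    for tick, data in events:
        track += vlq(tick - prev_tick) + data  # delta time, then the event itself
        prev_tick = tick
    track += b"\x00\xff\x2f\x00"               # end-of-track meta event
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, ppq)
    return header + b"MTrk" + struct.pack(">I", len(track)) + bytes(track)
```

For example, `write_midi([(0, 1, 60, 100)], tempo_bpm=120)` returns a 48-byte file containing a single middle-C quarter note in C major at 120 BPM.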

In addition, vibrato in sung notes is detected and coded as pitch bend messages in the MIDI files. The note velocities are obtained by averaging the RMS energies (mapped to the range 0-1) of the voiced frames within each note, giving a value E, which is then compressed with the mu-law mapping log(1 + mu * E) / log(1 + mu), where mu = 10.
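The velocity formula and the pitch-bend coding can be sketched in Python as follows. This is an illustration, not the author's implementation: `note_velocity` assumes its input is a list of per-frame RMS values already normalised to 0-1, and `cents_to_pitch_bend` (a hypothetical helper) only shows how a vibrato deviation in cents could be encoded as a standard 14-bit MIDI pitch-bend value under an assumed bend range of +/-2 semitones. How vibrato is detected is not shown.

```python
import math

def note_velocity(frame_rms, mu=10.0):
    """Compressed velocity in [0, 1] for one note.

    frame_rms: RMS energies of the note's voiced frames, assumed to be
    already mapped to the range 0-1 (the normalisation itself is not shown).
    """
    if not frame_rms:
        raise ValueError("note has no voiced frames")
    e = sum(frame_rms) / len(frame_rms)                 # average energy E
    return math.log(1.0 + mu * e) / math.log(1.0 + mu)  # mu-law compression

def cents_to_pitch_bend(cents, bend_range_semitones=2.0):
    """14-bit MIDI pitch-bend value (0..16383, centre 8192) for a deviation in cents.

    Assumption: the synthesiser's bend range is +/- bend_range_semitones
    (2 semitones is a common default; the text does not state the range used).
    """
    value = 8192 + round(cents / (bend_range_semitones * 100.0) * 8192)
    return max(0, min(16383, value))  # clamp to the valid 14-bit range
```

With mu = 10, an average energy E = 0.1 maps to log(2) / log(11), about 0.29, so quiet notes are lifted relative to a linear mapping while E = 0 and E = 1 still map to 0 and 1.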

The following score figures were generated by importing the transcribed MIDI files into the Emagic Logic Audio software and using the graphic export function of its score editor. Note that the transcriptions were not processed manually in any way, except that the MIDI sequences were shifted in time (in eighth-note steps) to align the bar lines with the performance, e.g., when a melody starts with an upbeat. The score editor quantises note onsets and interprets note lengths to a given resolution; in most cases, a combined eighth-note and eighth-note-triplet quantisation was used. The tempo, key signature, and notes were not modified in the sequencer; that is, Logic Audio was used only to create the figures, not the synthesised MIDI files below.
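The combined eighth-note/triplet quantisation used for the score figures can be illustrated with a simple nearest-grid-point snap. This is a sketch under stated assumptions, not the Logic Audio quantiser: onsets are assumed to be given in seconds together with the estimated tempo, note-length interpretation is not modelled, and both function names are hypothetical.

```python
def quantise_onset(onset_s, tempo_bpm):
    """Snap an onset time in seconds to the nearest point on a combined
    eighth-note / eighth-note-triplet grid; returns quarter-note beats."""
    beats = onset_s * tempo_bpm / 60.0  # seconds -> quarter-note beats
    grid_steps = (1 / 2, 1 / 3)         # eighth note, eighth-note triplet (in beats)
    candidates = [round(beats / step) * step for step in grid_steps]
    return min(candidates, key=lambda b: abs(b - beats))

def shift_by_eighths(onsets_beats, n_eighths):
    """Shift all onsets by a whole number of eighth notes,
    e.g. to align an upbeat start with the bar lines."""
    return [b + n_eighths * 0.5 for b in onsets_beats]
```

At 120 BPM, for instance, an onset at 0.17 s (0.34 beats) snaps to the triplet position 1/3 beat, while one at 0.26 s (0.52 beats) snaps to the eighth-note position 0.5.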


[1] M. Ryynänen, "Probabilistic Modelling of Note Events in the Transcription of Monophonic Melodies," Master's thesis, Tampere University of Technology, March 2004.
[2] A. P. Klapuri, A. J. Eronen, and J. T. Astola, "Analysis of the meter of acoustic music signals," IEEE Transactions on Speech and Audio Processing, to appear.
[3] T. Viitaniemi, A. Klapuri, and A. Eronen, "A probabilistic model for the transcription of single-voice melodies," in Proceedings of the 2003 Finnish Signal Processing Symposium (FINSIG'03), May 2003.


In the following examples, acoustic inputs and the corresponding transcriptions (synthesised directly from the MIDI files) are provided in mp3 format.

Song: En etsi valtaa loistoa

Song: Lännen lokari

Song: Brother, can you spare me a dime

Song: Pieni tytön tylleröinen

Song: Over The Rainbow

Song: Oravan pesä

Song: Yesterday

Song: Puhelinlangat laulaa

Song: Hopeinen kuu (Guarda che luna)

Song: Sävel rakkauden

Song: Telefoni afrikassa

Song: Pieni ankanpoikanen

Song: Taivas on sininen ja valkoinen

Song: Brother, can you spare me a dime

Song: Ranskalaiset korot