In this work, we propose a deep-learning-based method for musical instrument synthesis: variational, convolutional recurrent autoencoders (VCRAE). During training, the method uses the higher-level time-frequency representations extracted by its convolutional and recurrent layers to learn a Gaussian latent distribution. At usage time, this distribution is used to generate unique samples by interpolating between multiple instruments. Reconstruction performance is evaluated by proxy with an instrument classifier, and VCRAE reconstructions yield significantly higher classification accuracy than those of two baseline autoencoder methods.
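As a rough illustration of the two ingredients named above (a Gaussian latent distribution learned during training, and interpolation between instruments at usage time), the NumPy sketch below shows the standard VAE reparameterization trick, the KL term toward a standard normal prior, and linear interpolation between two latent vectors. It assumes a plain diagonal-Gaussian latent and linear interpolation; the encoder outputs are hypothetical placeholders, and none of the function names come from the VCRAE implementation.

```python
import numpy as np


def sample_latent(mu: np.ndarray, log_var: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu: np.ndarray, log_var: np.ndarray) -> float:
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), the usual VAE regularizer."""
    return float(-0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var)))


def interpolate_latents(z_a: np.ndarray, z_b: np.ndarray, weight_a: float) -> np.ndarray:
    """Linear latent interpolation: weight_a=1 gives instrument A, weight_a=0 gives B."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    return weight_a * z_a + (1.0 - weight_a) * z_b


# Hypothetical encoder outputs for two instruments; latent size 8 is arbitrary.
rng = np.random.default_rng(0)
mu_a, log_var_a = np.ones(8), np.full(8, -4.0)
mu_b, log_var_b = -np.ones(8), np.full(8, -4.0)

z_a = sample_latent(mu_a, log_var_a, rng)
z_b = sample_latent(mu_b, log_var_b, rng)
z_mix = interpolate_latents(z_a, z_b, weight_a=0.7)  # 70% A, 30% B
print(z_mix.shape)  # (8,)
```

In a trained model, the interpolated vector would be passed to the decoder to obtain a spectrogram of the mixed instrument.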

This companion webpage has four sections: composition, instrument synthesis, instrument morphing, and reconstruction of the test samples. Each section includes audio samples created with the process described in that section.


Figure 1: VCRAE system overview.



Composition

This is a 30-second excerpt of the song "Speak softly, love", famously known as the love theme from the cult film "The Godfather". For this experiment, a separate VCRAE was trained for each individual note used in the composition, and the generated notes were concatenated according to a hand-made manual. For each note in the composition, the manual has an entry that specifies the pitch, the note length in seconds, and the latent-domain interpolation proportion of the two instruments used for that note.

Godfather theme song

Figure 2: The Godfather theme song re-composed with morphing instruments. The labels indicate which instruments are being morphed. The time tick labels in the upper part of the figure are in seconds.
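The composition manual described above can be sketched as a list of per-note entries that are rendered and concatenated in order. In this sketch, `NoteEntry`, `render_note`, and `render_score` are hypothetical names, the 16 kHz sample rate is an assumption, and `render_note` returns silence where a real system would decode the interpolated latent with the VCRAE trained for that note.

```python
from dataclasses import dataclass

import numpy as np

SAMPLE_RATE = 16_000  # assumption: the page does not state the sample rate


@dataclass
class NoteEntry:
    pitch: str         # note name, e.g. "C4"; selects which per-note VCRAE to use
    duration_s: float  # note length in seconds
    weight_a: float    # latent interpolation proportion of instrument A (B gets 1 - weight_a)

    def __post_init__(self) -> None:
        if self.duration_s <= 0:
            raise ValueError("duration_s must be positive")
        if not 0.0 <= self.weight_a <= 1.0:
            raise ValueError("weight_a must lie in [0, 1]")


def render_note(entry: NoteEntry) -> np.ndarray:
    """Hypothetical stand-in for decoding one note.

    A real system would interpolate the latents of the two instruments with
    entry.weight_a and decode them with the VCRAE trained for entry.pitch;
    here we only return silence of the correct length.
    """
    n_samples = int(round(entry.duration_s * SAMPLE_RATE))
    return np.zeros(n_samples, dtype=np.float32)


def render_score(manual: list[NoteEntry]) -> np.ndarray:
    """Concatenate the rendered notes in the order listed in the manual."""
    return np.concatenate([render_note(entry) for entry in manual])


# Illustrative entries only, not the actual Godfather score.
manual = [
    NoteEntry("G4", 0.5, 1.0),
    NoteEntry("C5", 0.5, 0.8),
    NoteEntry("E5", 1.0, 0.5),
]
audio = render_score(manual)
print(len(audio) / SAMPLE_RATE)  # 2.0
```

Keeping each note as a self-contained entry means the per-note models never need to share state; the manual alone determines timing and instrument mix.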



Instrument Synthesis

Below are the synthesized instruments. All notes are played at C4.

Accordion (Button)
Banjo
Bassoon (Fagotto)
Cello
Clarinet
Clavinet
Contrabass (Wood bass)
Cornet
Electric Piano (Soft Tone)
English Horn
Horn
Mandolin
Oboe
Pianoforte
Trombone
Trumpet
Viola
Violin


Instrument Morphing

This section presents instruments morphed with two methods: in-sample morphing and sample sequence morphing. Each in-sample morphing sample is 2.4 seconds long; it starts as the first instrument and morphs into the second instrument over the course of the sample. Sample sequence morphing consists of 11 concatenated 2.4-second samples, and the proportion of the two instruments changes in 10% steps from one sample to the next (e.g., 90%-10%, 80%-20%, and so on). The mapping between instrument symbols and names is given below.


Table 1: Instrument symbols.

AC-BJ in-sample morphing
AC-BJ sample sequence morphing
AC-CB in-sample morphing
AC-CB sample sequence morphing
AC-CL in-sample morphing
AC-CL sample sequence morphing
AC-CR in-sample morphing
AC-CR sample sequence morphing
AC-CV in-sample morphing
AC-CV sample sequence morphing
AC-EH in-sample morphing
AC-EH sample sequence morphing
AC-EP in-sample morphing
AC-EP sample sequence morphing
AC-FG in-sample morphing
AC-FG sample sequence morphing
AC-HR in-sample morphing
AC-HR sample sequence morphing
AC-MD in-sample morphing
AC-MD sample sequence morphing
AC-OB in-sample morphing
AC-OB sample sequence morphing
AC-PF in-sample morphing
AC-PF sample sequence morphing
AC-TB in-sample morphing
AC-TB sample sequence morphing
AC-TR in-sample morphing
AC-TR sample sequence morphing
AC-VC in-sample morphing
AC-VC sample sequence morphing
AC-VL in-sample morphing
AC-VL sample sequence morphing
AC-VN in-sample morphing
AC-VN sample sequence morphing
BJ-CB in-sample morphing
BJ-CB sample sequence morphing
BJ-CL in-sample morphing
BJ-CL sample sequence morphing
BJ-CR in-sample morphing
BJ-CR sample sequence morphing
BJ-CV in-sample morphing
BJ-CV sample sequence morphing
BJ-EH in-sample morphing
BJ-EH sample sequence morphing
BJ-EP in-sample morphing
BJ-EP sample sequence morphing
BJ-FG in-sample morphing
BJ-FG sample sequence morphing
BJ-HR in-sample morphing
BJ-HR sample sequence morphing
BJ-MD in-sample morphing
BJ-MD sample sequence morphing
BJ-OB in-sample morphing
BJ-OB sample sequence morphing
BJ-PF in-sample morphing
BJ-PF sample sequence morphing
BJ-TB in-sample morphing
BJ-TB sample sequence morphing
BJ-TR in-sample morphing
BJ-TR sample sequence morphing
BJ-VC in-sample morphing
BJ-VC sample sequence morphing
BJ-VL in-sample morphing
BJ-VL sample sequence morphing
BJ-VN in-sample morphing
BJ-VN sample sequence morphing
CB-CL in-sample morphing
CB-CL sample sequence morphing
CB-CR in-sample morphing
CB-CR sample sequence morphing
CB-CV in-sample morphing
CB-CV sample sequence morphing
CB-EH in-sample morphing
CB-EH sample sequence morphing
CB-EP in-sample morphing
CB-EP sample sequence morphing
CB-FG in-sample morphing
CB-FG sample sequence morphing
CB-HR in-sample morphing
CB-HR sample sequence morphing
CB-MD in-sample morphing
CB-MD sample sequence morphing
CB-OB in-sample morphing
CB-OB sample sequence morphing
CB-PF in-sample morphing
CB-PF sample sequence morphing
CB-TB in-sample morphing
CB-TB sample sequence morphing
CB-TR in-sample morphing
CB-TR sample sequence morphing
CB-VC in-sample morphing
CB-VC sample sequence morphing
CB-VL in-sample morphing
CB-VL sample sequence morphing
CB-VN in-sample morphing
CB-VN sample sequence morphing
CL-CR in-sample morphing
CL-CR sample sequence morphing
CL-CV in-sample morphing
CL-CV sample sequence morphing
CL-EH in-sample morphing
CL-EH sample sequence morphing
CL-EP in-sample morphing
CL-EP sample sequence morphing
CL-FG in-sample morphing
CL-FG sample sequence morphing
CL-HR in-sample morphing
CL-HR sample sequence morphing
CL-MD in-sample morphing
CL-MD sample sequence morphing
CL-OB in-sample morphing
CL-OB sample sequence morphing
CL-PF in-sample morphing
CL-PF sample sequence morphing
CL-TB in-sample morphing
CL-TB sample sequence morphing
CL-TR in-sample morphing
CL-TR sample sequence morphing
CL-VC in-sample morphing
CL-VC sample sequence morphing
CL-VL in-sample morphing
CL-VL sample sequence morphing
CL-VN in-sample morphing
CL-VN sample sequence morphing
CR-CV in-sample morphing
CR-CV sample sequence morphing
CR-EH in-sample morphing
CR-EH sample sequence morphing
CR-EP in-sample morphing
CR-EP sample sequence morphing
CR-FG in-sample morphing
CR-FG sample sequence morphing
CR-HR in-sample morphing
CR-HR sample sequence morphing
CR-MD in-sample morphing
CR-MD sample sequence morphing
CR-OB in-sample morphing
CR-OB sample sequence morphing
CR-PF in-sample morphing
CR-PF sample sequence morphing
CR-TB in-sample morphing
CR-TB sample sequence morphing
CR-TR in-sample morphing
CR-TR sample sequence morphing
CR-VC in-sample morphing
CR-VC sample sequence morphing
CR-VL in-sample morphing
CR-VL sample sequence morphing
CR-VN in-sample morphing
CR-VN sample sequence morphing
CV-EH in-sample morphing
CV-EH sample sequence morphing
CV-EP in-sample morphing
CV-EP sample sequence morphing
CV-FG in-sample morphing
CV-FG sample sequence morphing
CV-HR in-sample morphing
CV-HR sample sequence morphing
CV-MD in-sample morphing
CV-MD sample sequence morphing
CV-OB in-sample morphing
CV-OB sample sequence morphing
CV-PF in-sample morphing
CV-PF sample sequence morphing
CV-TB in-sample morphing
CV-TB sample sequence morphing
CV-TR in-sample morphing
CV-TR sample sequence morphing
CV-VC in-sample morphing
CV-VC sample sequence morphing
CV-VL in-sample morphing
CV-VL sample sequence morphing
CV-VN in-sample morphing
CV-VN sample sequence morphing
EH-EP in-sample morphing
EH-EP sample sequence morphing
EH-FG in-sample morphing
EH-FG sample sequence morphing
EH-HR in-sample morphing
EH-HR sample sequence morphing
EH-MD in-sample morphing
EH-MD sample sequence morphing
EH-OB in-sample morphing
EH-OB sample sequence morphing
EH-PF in-sample morphing
EH-PF sample sequence morphing
EH-TB in-sample morphing
EH-TB sample sequence morphing
EH-TR in-sample morphing
EH-TR sample sequence morphing
EH-VC in-sample morphing
EH-VC sample sequence morphing
EH-VL in-sample morphing
EH-VL sample sequence morphing
EH-VN in-sample morphing
EH-VN sample sequence morphing
EP-FG in-sample morphing
EP-FG sample sequence morphing
EP-HR in-sample morphing
EP-HR sample sequence morphing
EP-MD in-sample morphing
EP-MD sample sequence morphing
EP-OB in-sample morphing
EP-OB sample sequence morphing
EP-PF in-sample morphing
EP-PF sample sequence morphing
EP-TB in-sample morphing
EP-TB sample sequence morphing
EP-TR in-sample morphing
EP-TR sample sequence morphing
EP-VC in-sample morphing
EP-VC sample sequence morphing
EP-VL in-sample morphing
EP-VL sample sequence morphing
EP-VN in-sample morphing
EP-VN sample sequence morphing
FG-HR in-sample morphing
FG-HR sample sequence morphing
FG-MD in-sample morphing
FG-MD sample sequence morphing
FG-OB in-sample morphing
FG-OB sample sequence morphing
FG-PF in-sample morphing
FG-PF sample sequence morphing
FG-TB in-sample morphing
FG-TB sample sequence morphing
FG-TR in-sample morphing
FG-TR sample sequence morphing
FG-VC in-sample morphing
FG-VC sample sequence morphing
FG-VL in-sample morphing
FG-VL sample sequence morphing
FG-VN in-sample morphing
FG-VN sample sequence morphing
HR-MD in-sample morphing
HR-MD sample sequence morphing
HR-OB in-sample morphing
HR-OB sample sequence morphing
HR-PF in-sample morphing
HR-PF sample sequence morphing
HR-TB in-sample morphing
HR-TB sample sequence morphing
HR-TR in-sample morphing
HR-TR sample sequence morphing
HR-VC in-sample morphing
HR-VC sample sequence morphing
HR-VL in-sample morphing
HR-VL sample sequence morphing
HR-VN in-sample morphing
HR-VN sample sequence morphing
MD-OB in-sample morphing
MD-OB sample sequence morphing
MD-PF in-sample morphing
MD-PF sample sequence morphing
MD-TB in-sample morphing
MD-TB sample sequence morphing
MD-TR in-sample morphing
MD-TR sample sequence morphing
MD-VC in-sample morphing
MD-VC sample sequence morphing
MD-VL in-sample morphing
MD-VL sample sequence morphing
MD-VN in-sample morphing
MD-VN sample sequence morphing
OB-PF in-sample morphing
OB-PF sample sequence morphing
OB-TB in-sample morphing
OB-TB sample sequence morphing
OB-TR in-sample morphing
OB-TR sample sequence morphing
OB-VC in-sample morphing
OB-VC sample sequence morphing
OB-VL in-sample morphing
OB-VL sample sequence morphing
OB-VN in-sample morphing
OB-VN sample sequence morphing
PF-TB in-sample morphing
PF-TB sample sequence morphing
PF-TR in-sample morphing
PF-TR sample sequence morphing
PF-VC in-sample morphing
PF-VC sample sequence morphing
PF-VL in-sample morphing
PF-VL sample sequence morphing
PF-VN in-sample morphing
PF-VN sample sequence morphing
TB-TR in-sample morphing
TB-TR sample sequence morphing
TB-VC in-sample morphing
TB-VC sample sequence morphing
TB-VL in-sample morphing
TB-VL sample sequence morphing
TB-VN in-sample morphing
TB-VN sample sequence morphing
TR-VC in-sample morphing
TR-VC sample sequence morphing
TR-VL in-sample morphing
TR-VL sample sequence morphing
TR-VN in-sample morphing
TR-VN sample sequence morphing
VC-VL in-sample morphing
VC-VL sample sequence morphing
VC-VN in-sample morphing
VC-VN sample sequence morphing
VL-VN in-sample morphing
VL-VN sample sequence morphing
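The two morphing schedules described at the start of this section can be sketched as follows: sample sequence morphing steps the instrument shares by 10% across 11 samples (so the shares run from 100%-0% to 0%-100%), and in-sample morphing varies the share frame by frame within one 2.4-second sample. The linear in-sample ramp and the frame-wise latent layout are assumptions, since the page does not specify them; the function names are hypothetical.

```python
import numpy as np

SAMPLE_LEN_S = 2.4  # length of one generated sample, from the text
N_SEQUENCE = 11     # samples per sample sequence morph, from the text


def sequence_proportions(n_samples: int = N_SEQUENCE) -> list[tuple[float, float]]:
    """(share of A, share of B) for each sample of a sample sequence morph.

    With 11 samples in 10% steps this runs from (1.0, 0.0) to (0.0, 1.0).
    """
    weights_a = np.linspace(1.0, 0.0, n_samples)
    return [(round(float(w), 2), round(1.0 - float(w), 2)) for w in weights_a]


def in_sample_weights(n_frames: int) -> np.ndarray:
    """Per-frame share of instrument A for in-sample morphing.

    Assumption: a linear ramp from A to B; the page does not state the trajectory.
    """
    return np.linspace(1.0, 0.0, n_frames)


def morph_latents(z_a: np.ndarray, z_b: np.ndarray, weights_a: np.ndarray) -> np.ndarray:
    """Blend frame-wise latents of shape (n_frames, latent_dim).

    Assumption: the latent representation has one vector per time frame.
    """
    w = weights_a[:, None]  # broadcast the per-frame weight over latent dimensions
    return w * z_a + (1.0 - w) * z_b


props = sequence_proportions()
print(props[:3])                  # [(1.0, 0.0), (0.9, 0.1), (0.8, 0.2)]
print(len(props) * SAMPLE_LEN_S)  # total length of the concatenated sequence in seconds
```

Rounding the shares to two decimals only tidies the printed labels; the unrounded `linspace` values are what a decoder would actually receive.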


Reconstruction of the test samples

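This section presents test samples of each instrument in three versions: the original recording, its VCRAE reconstruction, and its VCRAE reconstruction with oracle phase.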

BJ original
BJ reconstructed
BJ reconstructed with oracle phase
TR original
TR reconstructed
TR reconstructed with oracle phase
OB original
OB reconstructed
OB reconstructed with oracle phase
HR original
HR reconstructed
HR reconstructed with oracle phase
AC original
AC reconstructed
AC reconstructed with oracle phase
PF original
PF reconstructed
PF reconstructed with oracle phase
FG original
FG reconstructed
FG reconstructed with oracle phase
EH original
EH reconstructed
EH reconstructed with oracle phase
PF original
PF reconstructed
PF reconstructed with oracle phase
CV original
CV reconstructed
CV reconstructed with oracle phase
VC original
VC reconstructed
VC reconstructed with oracle phase
EP original
EP reconstructed
EP reconstructed with oracle phase
VL original
VL reconstructed
VL reconstructed with oracle phase
HR original
HR reconstructed
HR reconstructed with oracle phase
VN original
VN reconstructed
VN reconstructed with oracle phase
MD original
MD reconstructed
MD reconstructed with oracle phase
CB original
CB reconstructed
CB reconstructed with oracle phase
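The page does not describe how the oracle-phase versions above were produced. A common approach, assumed here, is to combine the estimated magnitude spectrogram with the phase of the original recording and invert the STFT. The sketch uses SciPy's `scipy.signal.stft` and `istft`; the sample rate and frame length are assumptions, and `reconstruct_with_oracle_phase` is a hypothetical name. How phase is obtained for the plain reconstructions is not stated on this page (Griffin-Lim phase retrieval is one common option).

```python
import numpy as np
from scipy.signal import istft, stft

FS = 16_000     # assumed sample rate
NPERSEG = 1024  # assumed STFT frame length (Hann window, 50% overlap by default)


def reconstruct_with_oracle_phase(estimated_mag: np.ndarray, original: np.ndarray) -> np.ndarray:
    """Pair an estimated magnitude spectrogram with the phase of the original signal."""
    _, _, z_orig = stft(original, fs=FS, nperseg=NPERSEG)
    if estimated_mag.shape != z_orig.shape:
        raise ValueError("estimated magnitude must match the STFT shape of the original")
    z_hat = estimated_mag * np.exp(1j * np.angle(z_orig))  # oracle phase
    _, x_hat = istft(z_hat, fs=FS, nperseg=NPERSEG)
    return x_hat[: len(original)]


# Sanity check: a 2.4 s C4 tone; using its own magnitude should give the tone back.
t = np.arange(int(2.4 * FS)) / FS
original = 0.5 * np.sin(2 * np.pi * 261.63 * t)
_, _, z = stft(original, fs=FS, nperseg=NPERSEG)
x_hat = reconstruct_with_oracle_phase(np.abs(z), original)
print(np.max(np.abs(x_hat - original)) < 1e-6)  # True
```

Because the phase comes from the original recording, any audible difference from the original in the oracle-phase version reflects magnitude errors alone, which makes it a useful reference point next to the plain reconstruction.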