Automatic Transcription of Music
Anssi Klapuri

Table of contents (MSc thesis)

1 Introduction 1
2 Literature Review 3
2.1 Methods 3
2.2 Published transcription systems 4
2.3 Related work 5
2.4 Roadmap to the most important references 6
2.5 Commercial products 8
3 Decomposition of the Transcription Problem 9
3.1 Relevance of the note as a representation symbol 9
3.2 Components of a transcription system 10
3.3 Mid-level representation 10
3.4 Reasons for rejecting the correlogram and choosing sinusoid tracks 12
3.5 Extracting sinusoid tracks in the frequency domain 14
3.6 Selected approach and design philosophy 15
4 Sound Onset Detection and Rhythm Tracking 17
4.1 Psychoacoustic bounds 17
4.2 Onset time detection scheme 17
4.3 Procedure validation 20
4.4 Rhythmic structure 21
5 Tracking the Fundamental Frequency 23
5.1 Harmonic sounds 23
5.2 Time domain methods 24
5.3 Frequency domain methods 25
5.4 Utilizing prior knowledge of the sound 26
5.5 Detection of multiple pitches 26
5.6 Comment 29
6 Number Theoretical Means of Resolving a Mixture of Harmonic Sounds 30
6.1 Feature of a sound 30
6.2 The basic problem in resolving a mixture of harmonic sounds 31
6.3 Certain principles governing Western music 32
6.4 Prime number harmonics: an important result 34
6.5 Dealing with outlier values 35
6.6 Generalization of the result and overcoming its defects 37
6.7 WOS filter to implement the desired sample selection probabilities 44
6.8 Extracting features that cannot be associated with a single harmonic 45
6.9 Feature 'subtraction' principle 46
7 Applying the New Method to the Automatic Transcription of Polyphonic Music 48
7.1 Goal of the system 48
7.2 Approach and design philosophy 48
7.3 Overview of the system 48
7.4 Filter kernel creation process 49
7.5 Tone model process 50
7.6 Transcription process 54
7.7 Simulations 59
7.8 Comparison with a reference system utilizing straightforward pattern recognition 65
7.9 Conclusion 67
7.10 Future additions 67
8 Top-Down Processing 68
8.1 Shortcomings of pure bottom-up systems 69
8.2 Top-down phenomena in hearing 69
8.3 Utilization of context 70
8.4 Instrument models 71
8.5 Sequential dependencies in music 73
8.6 Simultaneity dependencies 75
8.7 How to use the dependencies in transcription 76
8.8 Implementation architectures 77
9 Conclusions 79
  References 80



Last modified: Tue Dec 11 13:58:48 EET 2001 - Anssi Klapuri, klap @ cs tut fi