SGN-4106 Speech Recognition, 5 cp

Spring 2012, period 4

Contents

This course teaches the basic principles behind current speech recognition systems, and covers the probabilistic hidden Markov model (HMM) paradigm in detail. HMMs are used to model the spectrum of speech sounds, and they have several features that suit speech: they can be concatenated, they are probabilistic, they have efficient algorithms for parameter estimation, they can be extended to model coarticulation, and they can be combined relatively easily with language models. However, they also make some assumptions about speech data that are not valid, notably that the observations are conditionally independent of each other given the state sequence. We will look in detail at the definition of the model, the calculation of all relevant probabilities, the search for the optimal path (the Viterbi algorithm), the training of the models and their use in a large-vocabulary speech recognition system. If we have time, we will also look at speaker adaptation methods.
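As a rough illustration of the search for the optimal path, the Viterbi recursion can be sketched for a small discrete HMM. This is a minimal sketch, not course material: the two states ("sil", "speech"), the two observation symbols ("low", "high") and all probabilities are made-up assumptions, and a real recognizer would use continuous spectral features and far larger models.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state path and its log-probability."""
    # delta[t][s]: best log-probability of any path that ends in state s at time t
    delta = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    # backptr[t][s]: predecessor of s on that best path
    backptr = [{}]
    for t in range(1, len(observations)):
        delta.append({})
        backptr.append({})
        for s in states:
            best_prev = max(states, key=lambda p: delta[t - 1][p] + log_trans[p][s])
            delta[t][s] = (delta[t - 1][best_prev] + log_trans[best_prev][s]
                           + log_emit[s][observations[t]])
            backptr[t][s] = best_prev
    # Pick the best final state and trace the back-pointers to the start
    last = max(states, key=lambda s: delta[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return path, delta[-1][last]

def logs(table):
    """Convert a dict of probabilities (possibly nested) to log-probabilities."""
    return {k: logs(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}

# Hypothetical toy model: silence vs. speech, observed as low or high energy
states = ["sil", "speech"]
log_start = logs({"sil": 0.8, "speech": 0.2})
log_trans = logs({"sil": {"sil": 0.7, "speech": 0.3},
                  "speech": {"sil": 0.2, "speech": 0.8}})
log_emit = logs({"sil": {"low": 0.9, "high": 0.1},
                 "speech": {"low": 0.2, "high": 0.8}})

path, log_prob = viterbi(["low", "high", "high", "low"],
                         states, log_start, log_trans, log_emit)
print(path, math.exp(log_prob))
```

Working with log-probabilities avoids numerical underflow on long observation sequences, since the products of many small probabilities become sums.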

Lectures

Time and place:

First lecture on Tuesday 13.3.2012.

Teacher: Tuomas Virtanen, firstname.lastname@tut.fi, room TF311.

Exercises

There will be 2 groups; you can attend whichever suits you. Time and place will be decided later.
Completion of 20% of the exercises is compulsory for passing the course.

Exams

The exams will cover all the topics discussed in the lectures or exercises. Most of the lecture material below was covered in the lectures, but some areas, such as phonetics and the human auditory system, were not discussed in detail. Within these areas it suffices to study the topics that are relevant to automatic speech recognition. You can use your own calculator in the exam (a standard scientific calculator, not a programmable one).

Course material

Resources, additional material