SGN-4106 Speech Recognition, 5 cp
Spring 2012, 4. period
Contents
This course teaches the basic principles behind current speech recognition systems and covers the
probabilistic hidden Markov model (HMM) paradigm in detail.
HMMs are used to model the spectrum of speech sounds, and they have several features that suit
speech: they can be concatenated, they are probabilistic, they have efficient algorithms for
parameter estimation, they can be extended to model coarticulation, and they can be combined
relatively easily with language models.
However, they also make some assumptions about speech data that do not hold, notably the
conditional independence of the observations given the hidden state.
We will look in detail at the definition of the model, the calculation of all relevant
probabilities, the search for the optimal path (the Viterbi algorithm),
training of the models and their use in a large-vocabulary speech recognition system.
If we have time, we will also look at speaker adaptation methods.
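As a rough illustration of the search for the optimal path mentioned above, here is a minimal sketch of the Viterbi algorithm in Python. It is not taken from the course material: the function names (`viterbi`, `safe_log`) and the two-state toy model with its numbers are assumptions made up for this example, and the emission probabilities are assumed to be already evaluated for each frame. Log probabilities are used to avoid numerical underflow on long sequences.

```python
import math


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(log_init, log_trans, log_emit):
    """Return (best_log_prob, best_path) for one observation sequence.

    log_init[i]     : log P(first state = i)
    log_trans[i][j] : log P(next state = j | current state = i)
    log_emit[t][j]  : log P(observation at frame t | state = j)
    """
    n_states = len(log_init)
    # delta[j]: log probability of the best partial path ending in state j.
    delta = [log_init[j] + log_emit[0][j] for j in range(n_states)]
    backptr = []  # backptr[t-1][j]: best predecessor of state j at frame t
    for t in range(1, len(log_emit)):
        new_delta, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j]
                             + log_emit[t][j])
        delta = new_delta
        backptr.append(ptr)
    # Backtrack from the best final state to recover the state sequence.
    last = max(range(n_states), key=lambda j: delta[j])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path


# Hypothetical two-state left-to-right model observed over three frames.
init = [1.0, 0.0]
trans = [[0.5, 0.5],
         [0.0, 1.0]]
emit = [[0.9, 0.1],   # frame 0: P(observation | state 0), P(observation | state 1)
        [0.6, 0.4],   # frame 1
        [0.1, 0.9]]   # frame 2

best_log_prob, best_path = viterbi(
    [safe_log(p) for p in init],
    [[safe_log(p) for p in row] for row in trans],
    [[safe_log(p) for p in row] for row in emit],
)
print(best_path, math.exp(best_log_prob))
```

For this toy model the best path stays in state 0 for the first frame and then moves to state 1. Replacing the `max` over predecessors with a sum of probabilities would give the forward probability of the whole sequence instead of the probability of the single best path.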
Lectures
Time and place:
- Tuesday 14-16, TB 223
- Wednesday 12-14, TB 223
First lecture on Tuesday 13.3.2012.
Teacher: Tuomas Virtanen,
firstname.lastname@tut.fi, room TF311.
Exercises
There will be 2 groups; you can attend whichever suits you.
Time and place will be decided later.
Completion of 20% of the exercises is compulsory for passing the course.
Exams
The exams will cover all the topics discussed in the lectures or
exercises. Most of the lecture material below was covered in the
lectures, but some areas, such as phonetics and the human auditory system, were
not discussed in detail. Within these areas it suffices to study
the topics that are relevant to automatic speech recognition.
You can use your own calculator in the exam (standard
scientific calculator, not a programmable one).
Course material
- Lecture 1 Introduction
- Topics: different types of speech recognition problems, history, performance.
- Lecture 2 Review of Phonetics and Probability
- Topics: review of phonetics and probability theory, statistical formulation
of the speech recognition problem.
- Correction: on page 54, P(x < X ≤ x+dx) on the right side of the
"interpretation" should be divided by dx; furthermore, the equation is not exact but
approximate (it becomes exact in the limit dx -> 0).
- Notes on using a change of variables in an integral, and on applying it to solve
one exercise problem.
- Lecture 3 Feature Extraction
- Topics: psychoacoustics, nonlinear frequency scales, cepstral coefficients.
- Lecture 4 Hidden Markov Models (HMM)
- Topics: introduction to HMMs, probability evaluation of a sequence.
- Important corrections for the slides
(in particular, the definition of the backward probability is wrong).
- Lecture 5 Language models, search
- Topics: Viterbi training, triphone models, statistical language models.
Resources, additional material