Speech Recognition Project
A Matlab implementation of a Finnish Digit Recognizer using Hidden Markov Models.
Introduction
This project was carried out for the course 80961 Signal Processing Project in Fall 1998.
The goal of the Speech Recognition Project was to implement a digit recognizer for Finnish in Matlab. All Matlab functions and scripts were thoroughly documented and parameterized so that they could be reused in future work. The project was carried out at Tampere University of Technology by the following ARG members:
Jukka Kivimäki
Tomi Mikkonen
Antti-Veikko Rosti
Anssi Rämö
Teemu Saarelainen
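The page does not describe the recognizer's internals. The usual HMM approach to isolated digit recognition trains one model per digit. Each utterance is then assigned to the digit whose model gives its observation sequence the highest likelihood, which is computed with the forward algorithm. The sketch below illustrates that decision rule. It is written in Python rather than Matlab and assumes discrete (vector-quantized) observation symbols with hand-picked toy parameters. The names `log_forward`, `recognize` and `HMM` are hypothetical. This is not the project's code, which may use continuous-density models and a different feature front end.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Sketch only: discrete-output HMM given as (pi, A, B); not the project's Matlab code.
HMM = Tuple[List[float], List[List[float]], List[List[float]]]


def safe_log(p: float) -> float:
    """log(p), with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_forward(obs: Sequence[int], pi: List[float],
                A: List[List[float]], B: List[List[float]]) -> float:
    """log P(obs | model) for a discrete-output HMM via the forward algorithm.

    pi[i]   : probability of starting in state i
    A[i][j] : probability of a transition from state i to state j
    B[i][k] : probability of emitting symbol k in state i
    """
    n = len(pi)
    # Initialisation: log alpha_1(i) = log pi_i + log b_i(o_1)
    alpha = [safe_log(pi[i]) + safe_log(B[i][obs[0]]) for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) a_ij) * b_j(o_{t+1}), in logs
    for o in obs[1:]:
        alpha = [logsumexp([alpha[i] + safe_log(A[i][j]) for i in range(n)])
                 + safe_log(B[j][o]) for j in range(n)]
    # Termination: P(obs | model) = sum_i alpha_T(i)
    return logsumexp(alpha)


def recognize(obs: Sequence[int], models: Dict[str, HMM]) -> str:
    """Return the digit label whose HMM assigns obs the highest log-likelihood."""
    return max(models, key=lambda label: log_forward(obs, *models[label]))


# Hypothetical toy models: 2-state left-to-right HMMs over a 2-symbol alphabet.
LEFT_TO_RIGHT = [[0.7, 0.3], [0.0, 1.0]]
models: Dict[str, HMM] = {
    "1": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.2, 0.8]]),
    "2": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.8, 0.2]]),
}
print(recognize([0, 0, 1, 1], models))  # prints: 1
```

Working in the log domain keeps long observation sequences from underflowing. The scaled forward recursion addresses the same problem.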
Deliverables
The project produced the following deliverables:
Speech Database for Digit Recognition in Finnish
Collection of Well-Documented Matlab Functions
Literature Review on Connected Word Models and Continuous Speech Recognition
Results
The results of the project were reasonably good, considering that the utterances used for training and testing were recorded over a fixed telephone line in the SpeechDat(II) project. The overall recognition rates were 92.8% for the training data and 92.2% for the test data. Each data set consisted of 1000 utterances of Finnish digits.
The confusion matrices for the training and test data are given in Tables 1 and 2, respectively. The column index of a confusion matrix gives the digit that was actually uttered, and the row index gives the recognized digit. The element a_ij is therefore the percentage of utterances of digit j that were recognized as digit i, so the entries in each column sum to 100%. A 3-D bar plot of the matrix in Table 2 is shown in Figure 1.
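The matrices follow this convention: rows are recognized digits, columns are uttered digits, and entries are percentages of each column. The minimal sketch below builds such a matrix from a list of (uttered, recognized) label pairs. It also computes the overall recognition rate as the fraction of utterances recognized correctly. The function names, the input format and the toy data are assumptions for illustration, not the project's Matlab interface or its results.

```python
from typing import List, Sequence, Tuple

DIGITS = "1234567890"  # row and column order used in Tables 1 and 2


def confusion_matrix_percent(pairs: Sequence[Tuple[str, str]],
                             labels: str = DIGITS) -> List[List[float]]:
    """M[i][j] = percentage of utterances of labels[j] recognized as labels[i].

    pairs holds (uttered, recognized) digit labels, one pair per utterance.
    """
    index = {label: k for k, label in enumerate(labels)}
    n = len(labels)
    counts = [[0] * n for _ in range(n)]
    for uttered, recognized in pairs:
        # Row = recognized digit, column = uttered (true) digit.
        counts[index[recognized]][index[uttered]] += 1
    percent = [[0.0] * n for _ in range(n)]
    for j in range(n):
        column_total = sum(counts[i][j] for i in range(n))
        if column_total:  # columns of digits never uttered stay at 0
            for i in range(n):
                percent[i][j] = 100.0 * counts[i][j] / column_total
    return percent


def recognition_rate(pairs: Sequence[Tuple[str, str]]) -> float:
    """Overall percentage of utterances recognized as the digit actually uttered."""
    correct = sum(1 for uttered, recognized in pairs if uttered == recognized)
    return 100.0 * correct / len(pairs)


# Hypothetical toy data: four utterances of "5" (one misrecognized as "1")
# and one utterance of "1".
pairs = [("5", "5"), ("5", "5"), ("5", "5"), ("5", "1"), ("1", "1")]
M = confusion_matrix_percent(pairs)
five, one = DIGITS.index("5"), DIGITS.index("1")
print(M[five][five], M[one][five])  # prints: 75.0 25.0
print(recognition_rate(pairs))      # prints: 80.0
```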
Table 1: The Confusion Matrix for the Training Data
| Recognized \ Uttered (%) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 100 | 0.9 | 0 | 3.6 | 12.2 | 0 | 0 | 1.1 | 4.2 | 0 |
| 2 | 0 | 99.1 | 0 | 0 | 0 | 0 | 0 | 2.1 | 0 | 0 |
| 3 | 0 | 0 | 85.4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 82.1 | 2.2 | 0 | 0 | 0 | 0 | 1.1 |
| 5 | 0 | 0 | 0 | 0 | 80.0 | 0 | 0 | 0 | 1.0 | 0 |
| 6 | 0 | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 1.1 |
| 7 | 0 | 0 | 2.9 | 11.6 | 0 | 0 | 99.0 | 0 | 0 | 0 |
| 8 | 0 | 0 | 11.7 | 0 | 0 | 0 | 0 | 96.8 | 3.1 | 4.3 |
| 9 | 0 | 0 | 0 | 2.7 | 5.6 | 0 | 1.0 | 0 | 91.7 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 93.5 |
Table 2: The Confusion Matrix for the Test Data
| Recognized \ Uttered (%) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 100 | 0 | 0 | 7.1 | 11.1 | 0 | 0 | 0 | 3.6 | 0 |
| 2 | 0 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 86.7 | 0 | 0 | 1.0 | 0 | 0 | 0 | 1.2 |
| 4 | 0 | 0 | 0 | 74.3 | 0 | 0 | 0 | 0 | 0 | 1.2 |
| 5 | 0 | 0 | 0 | 0.9 | 82.8 | 0 | 0 | 0 | 0 | 0 |
| 6 | 0 | 0 | 2.2 | 0 | 0 | 98.0 | 0 | 0 | 0 | 0 |
| 7 | 0 | 0 | 0 | 12.4 | 0 | 1.0 | 100 | 0 | 0 | 4.8 |
| 8 | 0 | 0 | 7.8 | 0 | 0 | 0 | 0 | 99.0 | 2.7 | 4.8 |
| 9 | 0 | 0 | 0 | 4.4 | 5.1 | 0 | 0 | 1.0 | 93.7 | 0 |
| 0 | 0 | 0 | 3.3 | 0.9 | 1.0 | 0 | 0 | 0 | 0 | 88.0 |
|
Figure 1: 3-D Bar Plot of the Matrix in Table 2
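The reported overall rates can be cross-checked against the tables. The overall rate weights every utterance equally, while the mean of the confusion-matrix diagonal weights every digit equally. The two coincide only if each digit occurs equally often in a data set. That is an assumption here, since the page states only that each set has 1000 utterances. The small Python check below uses the diagonal values copied from Tables 1 and 2. The name `mean_diagonal` is hypothetical.

```python
from typing import Sequence

# Diagonal entries (percent correctly recognized, digits 1..9 then 0),
# copied from Tables 1 and 2.
TRAIN_DIAGONAL = [100.0, 99.1, 85.4, 82.1, 80.0, 100.0, 99.0, 96.8, 91.7, 93.5]
TEST_DIAGONAL = [100.0, 100.0, 86.7, 74.3, 82.8, 98.0, 100.0, 99.0, 93.7, 88.0]


def mean_diagonal(diagonal: Sequence[float]) -> float:
    """Overall recognition rate in percent, assuming every digit is equally frequent."""
    return sum(diagonal) / len(diagonal)


print(f"training: {mean_diagonal(TRAIN_DIAGONAL):.2f}%")  # prints: training: 92.76%
print(f"test: {mean_diagonal(TEST_DIAGONAL):.2f}%")       # prints: test: 92.25%
```

The results, 92.76% and 92.25%, agree with the reported 92.8% and 92.2% to within the 0.1-point rounding of the table entries.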
Saarelainen Teemu
Last modified: Thu Feb 18 22:45:12 EET 1999