Speech Recognition Project

A Matlab implementation of a Finnish Digit Recognizer using Hidden Markov Models.


Introduction

This project was carried out for the course 80961 Signal Processing Project in Fall 1998.

The goal of the Speech Recognition Project was to implement a digit recognizer for Finnish in Matlab. All Matlab functions and scripts were documented and parametrized so that they can be reused in future work. The project was carried out at the Tampere University of Technology by the following ARG members:

  • Jukka Kivimäki
  • Tomi Mikkonen
  • Antti-Veikko Rosti
  • Anssi Rämö
  • Teemu Saarelainen

Deliverables

    The project produced the following deliverables:

  • Speech Database for Digit Recognition in Finnish
  • Collection of well-documented Matlab functions
  • Literature Review on Connected Word Models and Continuous Speech Recognition

Results

    The results of the project were reasonably good, given that the speech utterances used in training and testing were recorded over a fixed telephone line in the SpeechDat(II) project. The overall recognition rates were 92.8% for the training data and 92.2% for the test data. Both data sets consisted of 1000 utterances of Finnish digits.

    The confusion matrices for the training and test data are given in Tables 1 and 2, respectively. The column indices of the confusion matrices give the true digit that was uttered, and the row indices give the recognized digit; i.e. the element a_ij of the matrix is the percentage of utterances of digit j that were recognized as digit i. A 3-D bar plot of the matrix in Table 2 is given in Figure 1.
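
    The convention described above can be sketched in a few lines. The following is a minimal illustration in Python, not the project's Matlab code; the function names and the toy utterance lists are hypothetical and only serve to show how a column-normalized confusion matrix and an overall recognition rate might be computed from pairs of uttered and recognized digits.

```python
# Digit order follows the tables: 1..9, then 0.
DIGITS = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "0"]


def confusion_matrix_percent(true_digits, recognized_digits, labels=DIGITS):
    """Return a matrix a where a[i][j] is the percentage of utterances of
    digit labels[j] (column) that were recognized as labels[i] (row)."""
    index = {label: k for k, label in enumerate(labels)}
    n = len(labels)
    counts = [[0] * n for _ in range(n)]
    for true, rec in zip(true_digits, recognized_digits):
        counts[index[rec]][index[true]] += 1
    # Normalize each column by the number of utterances of that true digit.
    col_totals = [sum(counts[i][j] for i in range(n)) for j in range(n)]
    return [
        [100.0 * counts[i][j] / col_totals[j] if col_totals[j] else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def overall_rate_percent(true_digits, recognized_digits):
    """Percentage of utterances whose recognized digit equals the true digit."""
    correct = sum(t == r for t, r in zip(true_digits, recognized_digits))
    return 100.0 * correct / len(true_digits)


# Hypothetical toy data: one of four utterances of "4" is recognized as "7".
true_digits = ["1", "1", "4", "4", "4", "4"]
recognized = ["1", "1", "4", "4", "4", "7"]
a = confusion_matrix_percent(true_digits, recognized)
overall = overall_rate_percent(true_digits, recognized)
```

    In this toy example the column for the uttered digit 4 holds 75% in row 4 and 25% in row 7, and the overall rate is five correct utterances out of six.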

    Table 1: The Confusion Matrix for the Training Data

    %     1     2     3     4     5     6     7     8     9     0
    1   100   0.9     0   3.6  12.2     0     0   1.1   4.2     0
    2     0  99.1     0     0     0     0     0   2.1     0     0
    3     0     0  85.4     0     0     0     0     0     0     0
    4     0     0     0  82.1   2.2     0     0     0     0   1.1
    5     0     0     0     0  80.0     0     0     0   1.0     0
    6     0     0     0     0     0   100     0     0     0   1.1
    7     0     0   2.9  11.6     0     0  99.0     0     0     0
    8     0     0  11.7     0     0     0     0  96.8   3.1   4.3
    9     0     0     0   2.7   5.6     0   1.0     0  91.7     0
    0     0     0     0     0     0     0     0     0     0  93.5

    Table 2: The Confusion Matrix for the Test Data

    %     1     2     3     4     5     6     7     8     9     0
    1   100     0   7.1  11.1     0     0     0     0   3.6     0
    2     0   100     0     0     0     0     0     0     0     0
    3     0     0  86.7     0     0   1.0     0     0     0   1.2
    4     0     0     0  74.3     0     0     0     0     0   1.2
    5     0     0     0   0.9  82.8     0     0     0     0     0
    6     0     0   2.2     0     0  98.0     0     0     0     0
    7     0     0     0  12.4     0   1.0   100     0     0   4.8
    8     0     0   7.8     0     0     0     0  99.0   2.7   4.8
    9     0     0     0   4.4   5.1     0     0   1.0  93.7     0
    0     0     0   3.3   0.9   1.0     0     0     0     0  88.0
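
    The diagonal of Table 2 gives the per-digit recognition rates for the test data. As a rough check, the short Python sketch below (again an illustration, not part of the project's Matlab code) picks out the weakest digit and averages the diagonal. The unweighted mean matches the overall rate only under the assumption that every digit occurs equally often in the test set, which the text above does not state, so it is at best an approximation of the reported 92.2%.

```python
# Digit order follows the tables: 1..9, then 0.
DIGITS = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "0"]

# Diagonal entries of Table 2 (test data), in percent.
TEST_DIAGONAL = [100.0, 100.0, 86.7, 74.3, 82.8, 98.0, 100.0, 99.0, 93.7, 88.0]

per_digit = dict(zip(DIGITS, TEST_DIAGONAL))

# Digit with the lowest recognition rate on the test data.
worst_digit = min(per_digit, key=per_digit.get)

# Unweighted mean of the diagonal; equals the overall rate only if all
# digits are equally frequent in the test set (an assumption).
mean_rate = sum(TEST_DIAGONAL) / len(TEST_DIAGONAL)

print(f"worst digit: {worst_digit} ({per_digit[worst_digit]}%)")
print(f"mean of diagonal: {mean_rate:.2f}%")
```

    With the values of Table 2, the weakest digit is 4 at 74.3%, and the mean of the diagonal is about 92.25%.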

    Figure 1: 3-D Bar Plot of the Matrix in Table 2


    Saarelainen Teemu
    Last modified: Thu Feb 18 22:45:12 EET 1999