Using Robust Viterbi Algorithm and HMM-Modeling in Unit Selection TTS to Replace Units of Poor Quality

Examples of English Speech Synthesis

Silén, H., Helander, E., Nurminen, J., Koppinen, K., and Gabbouj, M.

Interspeech 2010, Makuhari, Japan


This page contains English samples of HMM-based unit selection using HMM models in cost computation and:

Speech parameterization

For both HMM-based unit selection and HMM-based synthesis, the following parameterization was used:

An analysis update interval of 5 ms was used for both approaches.
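As a rough illustration of what a 5 ms update interval means in practice, the sketch below converts the frame shift into samples and counts the analysis frames for an utterance. The 16 kHz sampling rate and the helper names are assumptions made for this example, not details stated on this page.

```python
def frame_shift_samples(shift_ms: float, sample_rate_hz: int) -> int:
    """Convert a frame shift in milliseconds to a whole number of samples."""
    return round(shift_ms * sample_rate_hz / 1000)


def num_frames(num_samples: int, shift_samples: int) -> int:
    """Count frames when a new frame starts every `shift_samples` samples."""
    if shift_samples <= 0:
        raise ValueError("shift_samples must be positive")
    # Ceiling division: a trailing partial shift still starts one more frame.
    return (num_samples + shift_samples - 1) // shift_samples


# Assumed sampling rate (hypothetical for this sketch): 16 kHz.
SAMPLE_RATE_HZ = 16000
shift = frame_shift_samples(5, SAMPLE_RATE_HZ)   # samples per 5 ms update
frames = num_frames(SAMPLE_RATE_HZ, shift)       # frames in one second of audio
```

Under these assumptions, each parameter vector describes one 5 ms step of the signal, so a one-second utterance yields a few hundred frames per parameter stream.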

Synthesis samples

Sample 1: Baseline Proposed
Sample 2: Baseline Proposed
Sample 3: Baseline Proposed

HMM training was done using the HMM-based speech synthesis system HTS [4] (version 2.1) and the English speech database CMU ARCTIC (speaker slt), available at [5]. Here, postfiltering was used instead of global variance.


[1] Kawahara, H., Masuda-Katsuse, I., and de Cheveigné, A., Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, Speech Communication 27, 1999, pp. 187-207.

[2] Fukada, T., Tokuda, K., Kobayashi, T., and Imai, S., An adaptive algorithm for mel-cepstral analysis of speech, in ICASSP, 1992, pp. 137-140.

[3] Yoshimura, T., Tokuda, K., Masuko, T., Kobayashi, T., and Kitamura, T., Mixed excitation for HMM-based speech synthesis, in EUROSPEECH, 2001, pp. 2263-2266.

[4] Zen, H., Nose, T., Yamagishi, J., Sako, S., Masuko, T., Black, A., and Tokuda, K., The HMM-based Speech Synthesis System (HTS) Version 2.0, in ISCA SSW6, 2006, pp. 294-299.

[5] CMU_ARCTIC speech synthesis databases, available in

[6] Toda, T. and Tokuda, K., Speech parameter generation algorithm considering global variance for HMM-based speech synthesis, in Interspeech, 2005, pp. 2801-2804.

last modified: 2010-11-02