HTK Study Group, 1/2006


This page posts configuration files and a training script that can be used with the HTK tools to build a Switchboard telephone-band conversational speech recognizer with at least 50% word recognition accuracy (WRA). Informal lectures describing the steps in the development of this speech recognizer are posted on the Wiki. In order to run these tools, you will need:

A few of the recognizer's features are listed below. The absolute gain in word recognition accuracy contributed by each feature is given in parentheses. These percentages are based on the development chain outlined in train.pl; they are not the result of a carefully controlled scientific experiment.
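The gains are quoted in absolute terms, that is, as differences in percentage points rather than as fractions of the baseline. The sketch below illustrates that arithmetic. It assumes the scoring definitions used by HTK's HResults tool (percent correct = (N - D - S)/N, accuracy = (N - D - S - I)/N, with N reference words, S substitutions, D deletions and I insertions). This page does not say which of the two figures "WRA" refers to, and the counts in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Word-level alignment counts (assumed to come from a scorer such as HResults)."""
    n: int  # number of words in the reference transcriptions
    s: int  # substitution errors
    d: int  # deletion errors
    i: int  # insertion errors

    def percent_correct(self) -> float:
        # HResults-style "%Corr": insertions are ignored
        return 100.0 * (self.n - self.d - self.s) / self.n

    def accuracy(self) -> float:
        # HResults-style "Acc": insertions are also penalized
        return 100.0 * (self.n - self.d - self.s - self.i) / self.n


def absolute_gain(before: float, after: float) -> float:
    """Gain in percentage points, the convention used for the feature list."""
    return after - before


def relative_gain(before: float, after: float) -> float:
    """The same change expressed as a percentage of the baseline, for comparison."""
    return 100.0 * (after - before) / before
```

For example, with hypothetical counts `Alignment(n=1000, s=300, d=100, i=100)`, percent correct is 60.0 and accuracy is 50.0. A feature that moves accuracy from 40.0 to 50.0 is worth 10 points absolute, even though the same change is a 25% relative improvement.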
Features
Clustered allophone acoustic models (worth about 10% absolute)
Lexical stress as a QS context variable (worth about 7% absolute)
Mixture splitting from 1 to 5 Gaussians per state (worth about 8% absolute)
Unsupervised MLLR adaptation to the target talker (worth about 1% absolute)
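Growing each state from 1 to 5 Gaussians is normally done in HTK with the HHEd `MU` command. The HTK Book describes this command as repeatedly splitting the heaviest mixture component: the component is copied, both copies get half its weight, and their means are perturbed by plus and minus 0.2 standard deviations. The sketch below is a simplified, hypothetical illustration of that procedure for one state with diagonal covariances. It is not HTK's own code, and it omits the re-estimation passes (e.g. with HERest) that would normally follow each split.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    weight: float        # mixture weight
    mean: list[float]    # mean vector
    var: list[float]     # diagonal covariance (per-dimension variances)


def split_heaviest(mix: list[Gaussian], offset: float = 0.2) -> None:
    """Split the highest-weight component of one state's mixture, in place."""
    k = max(range(len(mix)), key=lambda j: mix[j].weight)
    g = mix[k]
    std = [math.sqrt(v) for v in g.var]
    half = g.weight / 2.0
    # Both copies keep the original variances; only the means move.
    plus = Gaussian(half, [m + offset * s for m, s in zip(g.mean, std)], list(g.var))
    minus = Gaussian(half, [m - offset * s for m, s in zip(g.mean, std)], list(g.var))
    mix[k] = plus
    mix.append(minus)


def split_to(mix: list[Gaussian], target: int) -> list[Gaussian]:
    """Split repeatedly until the state has `target` components."""
    while len(mix) < target:
        split_heaviest(mix)
    return mix
```

Starting from a single Gaussian, `split_to(mix, 5)` yields five components whose weights still sum to 1. In a real training chain, the model would be re-estimated between splits so that the perturbed means can move toward the data.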
Page created by Mark Hasegawa-Johnson, jhasegaw at uiuc.