Speech Tools Minicourse, 2009
The third biennial Speech Tools Minicourse will be taught January
5-16, 2009. The course will meet daily from 10:00-1:00 in Siebel 1109.
Lectures
- Lecture 1: Transcription and Pre-Processing (Praat and Matlab).
PDF,
TeX,
Video.
- Lecture 2: Acoustic Features, Acoustic Modeling, and Language Modeling
(HTK, bash, sed, awk, perl, python, and ruby)
PDF,
TeX,
Video.
- Lecture 3: Acoustic Features, Acoustic Modeling, and
Language Modeling (HTK, bash, and ruby)
- Lecture 4: Acoustic Model Training (HTK, bash, and perl)
PDF,
TeX,
Video.
- Lecture 5: Acoustic Model Training (HTK, bash, and perl)
- Lecture 6: Knowledge Source Composition for Fast Decoding (OpenFST)
- Lecture 7: Knowledge Source Representation for Flexible Decoding and
Training (GMTK)
Sample Problems
- Lecture 1:
- Get a speech training database. If you are
working on a project for your own research, use that data; if not,
download the AVICAR phonetically balanced sentences, because those are
what I will use for most of the rest of the course.
- View the waveforms in Praat; convince yourself that you can view
the spectrogram and the pitch track, and that you can listen to the
waveform.
- If you're not confident in your Matlab skills, work through the
Matlab on
athena tutorial.
- Load one of the 55D (55 mph with the windows rolled down) waveforms
in Matlab, and perform spectral subtraction. Listen to the waveform
before and after spectral subtraction; how well did it work?
- Try VAD (voice activity detection) on a relatively quiet waveform
(IDL condition) and a relatively noisy waveform (55D); you should find
that it works perfectly in the IDL condition, but perhaps not in noise.
- Finally, try writing a Matlab function that accepts, as arguments,
the names of the input and output waveform files. The function should
read in the waveform file, perform spectral subtraction, perform VAD,
and chop the waveform into multiple sub-files (each containing no more
than 300 ms of initial silence and 300 ms of final silence, as
estimated by the VAD). It should then save each sub-file under a
filename constructed from the given output name (for example, if the
output name was foo.wav, and VAD found speech segments starting at
sample numbers 1109, 44038, and 140932, then you could save the output
to files named foo001109.wav, foo044038.wav, and foo140932.wav).
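The final Lecture 1 problem asks for a Matlab function. What follows is a minimal sketch of the same pipeline in Python using only the standard library. It assumes 16-bit mono PCM WAV input. The lecture's VAD is replaced by a simple frame-energy detector: a 10 ms frame counts as speech if its mean-square energy exceeds 1% of the loudest frame's energy. The `frame_ms` and `threshold_ratio` values are illustrative choices, not values from the course. Spectral subtraction is left as a marked hook and is not implemented. Output names follow the foo001109.wav pattern, which is the output stem plus the zero-padded sample number where speech starts.

```python
import os
import sys
import wave
from array import array


def read_wav(path):
    """Read a 16-bit mono PCM WAV file; return (samples, sample_rate)."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        rate = w.getframerate()
        samples = array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":  # WAV sample data is little-endian
        samples.byteswap()
    return samples, rate


def write_wav(path, samples, rate):
    """Write a sequence of 16-bit ints as a mono PCM WAV file."""
    data = array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(data.tobytes())


def energy_vad(samples, rate, frame_ms=10, threshold_ratio=0.01):
    """Return (start, end) sample ranges of runs of frames whose mean-square
    energy exceeds threshold_ratio times that of the loudest frame.
    A trailing partial frame is ignored."""
    n = int(rate * frame_ms / 1000)
    energies = [sum(x * x for x in samples[i:i + n]) / n
                for i in range(0, len(samples) - n + 1, n)]
    threshold = threshold_ratio * max(energies, default=0)
    segments, start = [], None
    for k, e in enumerate(energies):
        if e > threshold and start is None:
            start = k * n                      # speech begins at this frame
        elif e <= threshold and start is not None:
            segments.append((start, k * n))    # speech ended at previous frame
            start = None
    if start is not None:                      # speech runs to the last frame
        segments.append((start, len(energies) * n))
    return segments


def segment_file(in_path, out_path, pad_ms=300):
    """Run VAD, then write each speech segment with at most pad_ms of
    leading and trailing context to <stem><start sample, 6 digits><ext>."""
    samples, rate = read_wav(in_path)
    # Hook: spectral subtraction would be applied to `samples` here
    # (not implemented in this sketch).
    pad = int(rate * pad_ms / 1000)
    stem, ext = os.path.splitext(out_path)
    written = []
    for start, end in energy_vad(samples, rate):
        lo, hi = max(0, start - pad), min(len(samples), end + pad)
        name = f"{stem}{start:06d}{ext}"
        write_wav(name, samples[lo:hi], rate)
        written.append(name)
    return written
```

Calling `segment_file(input_name, "foo.wav")` returns the list of files it wrote. A threshold set relative to the loudest frame tends to work well on quiet (IDL) recordings but can break down in 55D noise. That is the contrast the VAD problem asks you to observe.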
- Lecture 2:
- Lecture 3: Using the material presented in Lectures 2 and 3, train
a monophone recognition system. Sarah recommends doing this as
follows:
- Download and install the HTK tools (if they are not already installed
on your system), and download some of the AVICAR data.
- Follow the tutorial overview presented in the HTK book.
- Contact Mark or Sarah if you have problems or questions.
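The HTK book tutorial followed in the Lecture 3 recipe begins from a prototype HMM definition. The sketch below writes one in the layout that tutorial uses. It assumes 39-dimensional MFCC_0_D_A features and a left-to-right topology with three emitting states. The feature kind, the vector size, and the 0.6/0.4 initial transition probabilities are assumptions; adjust them to match your HCopy configuration. HTK's HCompV (for example with `-f 0.01 -m`) can then replace the flat means and variances with global statistics computed from the training data.

```python
def htk_proto(num_states=5, vec_size=39, kind="MFCC_0_D_A", name="proto"):
    """Return an HTK prototype HMM definition with a left-to-right topology.

    HTK counts the non-emitting entry and exit states, so num_states=5
    gives three emitting states (numbered 2..4). Means start at 0.0 and
    variances at 1.0 as placeholders for HCompV to overwrite."""
    lines = [f"~o <VecSize> {vec_size} <{kind}>",
             f'~h "{name}"',
             "<BeginHMM>",
             f"<NumStates> {num_states}"]
    for s in range(2, num_states):            # emitting states only
        lines += [f"<State> {s}",
                  f"<Mean> {vec_size}", " ".join(["0.0"] * vec_size),
                  f"<Variance> {vec_size}", " ".join(["1.0"] * vec_size)]
    lines.append(f"<TransP> {num_states}")
    for i in range(num_states):
        row = [0.0] * num_states
        if i == 0:
            row[1] = 1.0                      # entry state -> first emitting state
        elif i < num_states - 1:
            row[i], row[i + 1] = 0.6, 0.4     # self-loop or advance one state
        # the exit state (last row) has no outgoing transitions
        lines.append(" ".join(f"{p:.1f}" for p in row))
    lines.append("<EndHMM>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with open("proto", "w") as f:
        f.write(htk_proto())
```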
- Lecture 5: Use train.pl to train an HMM speech recognizer on AVICAR, UASpeech, Buckeye, or the AMI Corpus. Some additional Perl or Ruby processing may be required if your original corpus transcriptions are in a format not supported by train.pl. Contact Sarah (sborys@uiuc.edu, 2013 Beckman) or Mark (jhasegaw@uiuc.edu, 2011 Beckman) if you get stuck or if you find bugs in train.pl.
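The transcription formats that train.pl accepts are not listed here, so the following only illustrates the kind of conversion the Lecture 5 problem mentions. It is written in Python rather than Perl or Ruby. It assumes a hypothetical input of one `utterance_id<TAB>transcript` line per utterance and converts it to an HTK word-level master label file (MLF). That format consists of a `#!MLF!#` header followed, for each utterance, by a quoted `"*/ID.lab"` pattern, one word per line, and a terminating `.` line. Check train.pl to see which format it actually expects. Words are copied unchanged, so their case must match your pronunciation dictionary.

```python
def read_tsv(lines):
    """Parse hypothetical 'utt_id<TAB>transcript' lines into a dict; skip blanks."""
    transcripts = {}
    for line in lines:
        line = line.strip()
        if line:
            utt, _, text = line.partition("\t")
            transcripts[utt] = text
    return transcripts


def to_mlf(transcripts):
    """Render {utt_id: 'word word ...'} as an HTK word-level MLF string."""
    out = ["#!MLF!#"]
    for utt, text in sorted(transcripts.items()):
        out.append(f'"*/{utt}.lab"')       # pattern matched against label file names
        out.extend(text.split())           # one word label per line
        out.append(".")                    # end of this utterance's labels
    return "\n".join(out) + "\n"
```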
- Lecture 6: Using the auxiliary files provided
in crypto.tgz, create an FSM decoder similar
to the one Mark trained in lecture. Next, create an FSM encoder.
Send Sarah (sborys@uiuc.edu) an encoded email, then use your decoder
to read the reply!
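The contents of crypto.tgz and the decoder from lecture are not reproduced here. This is therefore a sketch under one simplifying assumption: the cipher is a letter-substitution cipher, here a hypothetical Caesar shift of 3. A substitution cipher can be written as a one-state transducer with one arc per letter, and the decoder is simply its inverse. The code writes the encoder and decoder in OpenFst's text format. Each arc line gives the source state, destination state, input label, and output label, and a line holding only a state number marks that state as final. The code also writes a symbol table and includes a tiny pure-Python applier to check the round trip. With OpenFst installed, `fstcompile --isymbols=letters.syms --osymbols=letters.syms enc.txt enc.fst` would compile the encoder, and `fstinvert enc.fst dec.fst` would produce the decoder. All file names are placeholders.

```python
import string


def caesar_key(shift):
    """Hypothetical key: map each lowercase letter to the one `shift` places later."""
    letters = string.ascii_lowercase
    return {c: letters[(i + shift) % 26] for i, c in enumerate(letters)}


def symbol_table():
    """OpenFst symbol table text: <eps> is label 0, letters are 1..26.
    (Spaces and punctuation would need their own symbols in a real FST.)"""
    syms = ["<eps>"] + list(string.ascii_lowercase)
    return "".join(f"{s} {i}\n" for i, s in enumerate(syms))


def encoder_fst(key):
    """One-state transducer in OpenFst text format: an arc 0 -> 0 per
    letter (input = plaintext, output = ciphertext); state 0 is final."""
    arcs = [f"0 0 {p} {c}" for p, c in sorted(key.items())]
    return "\n".join(arcs + ["0"]) + "\n"


def decoder_fst(key):
    """The inverse relation (what fstinvert computes): swap input and output."""
    return encoder_fst({c: p for p, c in key.items()})


def apply_fst(fst_text, text):
    """Apply a one-state letter transducer to `text`. Characters without an
    arc are passed through unchanged (real FST composition would reject them)."""
    table = {}
    for line in fst_text.splitlines():
        fields = line.split()
        if len(fields) == 4:                  # arc line: src dst in out
            table[fields[2]] = fields[3]
    return "".join(table.get(ch, ch) for ch in text)
```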
- Lecture 7: Use GMTK, together with the parameter files and
training scripts in the demo archive, to train a set of monophone
acoustic models. Note that if you are not on IFP, you may need to
add the GMTK directory to your path.
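Before running the Lecture 7 demo scripts, you can check whether GMTK is already on your path with a short Python script built on `shutil.which`. The program names below are assumptions, because GMTK's executable names have varied between releases. Replace them with the names your demo scripts actually call. If any come back missing, add the directory containing the GMTK binaries to PATH. In bash, for example, use `export PATH=/path/to/gmtk/bin:$PATH`, where the directory is a placeholder.

```python
import shutil

# Assumed GMTK program names; substitute the ones your scripts invoke.
GMTK_PROGRAMS = ("gmtkTriangulate", "gmtkEMtrain", "gmtkViterbi")


def find_gmtk(programs=GMTK_PROGRAMS, path=None):
    """Map each program to its full path on PATH (or on `path`, if given),
    or to None if no executable by that name is found."""
    return {p: shutil.which(p, path=path) for p in programs}


def missing(found):
    """List the programs that were not found."""
    return [p for p, loc in found.items() if loc is None]


if __name__ == "__main__":
    absent = missing(find_gmtk())
    if absent:
        print("Not on PATH:", ", ".join(absent))
```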
Useful Links
- Tools Specifically Designed for Speech and Language
- Machine Learning Tools (neural nets, SVMs, etc.)
- Data
- Standard Unix or Unix-related tools
- Matlab or Matlab-related tools