Speech Tools Minicourse 2009

Lectures

Lecture 1

Transcription and Pre-Processing (Praat and Matlab) PDF, TeX

Lecture 2

Acoustic Features, Acoustic Modeling, and Language Modeling (HTK, bash, sed, awk, perl, python, and ruby) PDF, TeX

Lecture 3

Acoustic Features, Acoustic Modeling, and Language Modeling (HTK, bash, and ruby)

Lecture 4

Acoustic Model Training (HTK, bash, and perl) PDF, TeX

Lecture 5

Acoustic Model Training (HTK, bash, and perl) PDF, TeX, train.pl

Lecture 6

Knowledge Source Composition for Fast Decoding (OpenFST) PDF, TeX, crypto.tgz

Lecture 7

Knowledge Source Representation for Flexible Decoding and Training (GMTK) PDF, TeX, 2009MiniCourse.tgz (demo), fisher1.tgz (demo waveforms)

Sample Problems

Lecture 1

  1. Get a training database of speech data. If you are working on a project for your own research, use that data; if not, download the AVICAR phonetically balanced sentences, since that is what I will use for most of the rest of the course.
  2. View the waveforms in Praat; convince yourself that you can view the spectrogram and the pitch track, and that you can listen to the waveform.
  3. If you're not confident in your Matlab skills, work through the Matlab on Athena tutorial.
  4. Load one of the 55D (55 mph with the windows rolled down) waveforms into Matlab and perform spectral subtraction. Listen to the waveform before and after spectral subtraction; how well did it work?
  5. Try VAD on a relatively quiet waveform (IDL condition) and on a relatively noisy waveform (55D); you should find that it works perfectly in the IDL condition, but perhaps not perfectly in noise.
  6. Finally, try writing a Matlab function that accepts, as arguments, the names of an input and an output waveform file. The function should read in the input waveform, perform spectral subtraction, perform VAD, chop the waveform into multiple sub-files (each containing no more than 300 ms of initial silence and 300 ms of final silence, as estimated by the VAD), and save each sub-file under a filename constructed from the given output name. For example, if the output name were foo.wav, and VAD found speech segments starting at sample numbers 1109, 44038, and 140932, you could save the output to files named foo001109.wav, foo044038.wav, and foo140932.wav.
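
The spectral subtraction and energy-based VAD steps above can be sketched as follows. This is an illustrative Python/NumPy version, not the course's Matlab code; the frame length, noise-estimation window, and threshold are illustrative choices, and the noise estimate assumes the first few frames contain no speech.

```python
import numpy as np

def spectral_subtraction(x, frame_len=256, noise_frames=10):
    """Magnitude spectral subtraction with 50% overlap-add.

    The noise magnitude spectrum is estimated by averaging the first
    `noise_frames` frames, which are assumed to be speech-free.
    """
    hop = frame_len // 2
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    out = np.zeros(len(x))
    noise_mag = np.mean(
        [np.abs(np.fft.rfft(window * x[i * hop:i * hop + frame_len]))
         for i in range(noise_frames)], axis=0)
    for i in range(n_frames):
        frame = window * x[i * hop:i * hop + frame_len]
        spec = np.fft.rfft(frame)
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)  # floor at zero
        # Resynthesize with the noisy phase, overlap-add the result.
        clean = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), frame_len)
        out[i * hop:i * hop + frame_len] += clean
    return out

def energy_vad(x, frame_len=256, threshold_db=-30.0):
    """One boolean per frame: True where the frame energy exceeds a
    threshold in dB relative to the loudest frame."""
    hop = frame_len // 2
    n_frames = 1 + (len(x) - frame_len) // hop
    energy = np.array([np.sum(x[i * hop:i * hop + frame_len] ** 2)
                       for i in range(n_frames)])
    energy_db = 10 * np.log10(energy / (energy.max() + 1e-12) + 1e-12)
    return energy_db > threshold_db
```

The frame boundaries returned by `energy_vad` (frame index times the hop size) are what you would use to locate the speech segments when chopping the waveform into sub-files.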

Lecture 2

  1. Practice Problem Definition
  2. UASpeech transcriptions
  3. AVICAR transcriptions

Lecture 3

  1. Using the material presented in Lectures 2 and 3, train a monophone recognition system. Sarah recommends doing this as follows:
    • Download and install the HTK tools (if they are not already installed on your system), and download some of the AVICAR data.
    • Follow the tutorial overview presented in the HTK book.
    • Contact Mark or Sarah if you have problems or questions.
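
The HTK-book tutorial's flat start needs a prototype HMM definition as input to HCompV. As a sketch of what that file looks like, the following generates one in HTK format; the 39-dimensional MFCC_0_D_A parameter kind and 5-state left-to-right topology are the usual tutorial choices, not requirements.

```python
def make_proto(n_states=5, vec_size=39, parm_kind="MFCC_0_D_A"):
    """Return an HTK-format prototype HMM definition as a string."""
    lines = [f"~o <VecSize> {vec_size} <{parm_kind}>",
             '~h "proto"',
             "<BeginHMM>",
             f"<NumStates> {n_states}"]
    zeros = " ".join(["0.0"] * vec_size)
    ones = " ".join(["1.0"] * vec_size)
    # HTK states 1 and n_states are non-emitting; emit states 2..n_states-1.
    for s in range(2, n_states):
        lines += [f"<State> {s}", f"<Mean> {vec_size}", zeros,
                  f"<Variance> {vec_size}", ones]
    # Simple left-to-right transition matrix (last row all zero, per HTK).
    lines.append(f"<TransP> {n_states}")
    for i in range(n_states):
        row = [0.0] * n_states
        if i == 0:
            row[1] = 1.0
        elif i < n_states - 1:
            row[i] = 0.6
            row[i + 1] = 0.4
        lines.append(" ".join(f"{p:.1f}" for p in row))
    lines.append("<EndHMM>")
    return "\n".join(lines)

print(make_proto())
```

Written to a file named proto, this is what the HTK book's flat-start step passes to HCompV (together with a config file and a training script of feature files) to get global means and variances.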

Lecture 5

  1. Use train.pl to train an HMM speech recognizer on AVICAR, UASpeech, Buckeye, or the AMI Corpus. Some additional Perl or Ruby processing may be required if your original corpus transcriptions are in a format not supported by train.pl. Contact Sarah (sborys@illinois.edu, 2013 Beckman) or Mark (jhasegaw@illinois.edu, 2011 Beckman) if you get stuck or if you find bugs in train.pl.
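
If your corpus transcriptions need reformatting, the usual target is an HTK master label file (MLF). As a sketch, the converter below assumes a hypothetical one-line-per-utterance input format ("utterance-id word word ..."); real corpus formats, and what train.pl actually accepts, may differ.

```python
def to_mlf(lines):
    """Convert 'utt-id word word ...' lines into an HTK MLF string."""
    out = ["#!MLF!#"]                       # MLF header
    for line in lines:
        utt, *words = line.split()
        out.append(f'"*/{utt}.lab"')        # label-file pattern for this utterance
        out += [w.upper() for w in words]   # one word label per line
        out.append(".")                     # "." terminates each label list
    return "\n".join(out)

print(to_mlf(["utt001 turn left now", "utt002 stop"]))
```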

Lecture 6

  1. Using the auxiliary files provided in crypto.tgz, create an FSM decoder similar to the one Mark trained in lecture.
  2. Next, create an FSM encoder. Send Sarah (sborys@illinois.edu) an encoded email, then use your decoder to read the reply!
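
An encoder of this kind can be written as a single-state transducer in the AT&T text format that OpenFST's fstcompile reads. The sketch below uses a toy shift-by-3 substitution cipher as an illustrative stand-in for whatever crypto.tgz defines.

```python
import string

def cipher_fst_text(shift=3):
    """AT&T text-format arcs for a one-state substitution-cipher FST.

    Each line is 'src dst ilabel olabel'; a line with just a state
    number marks that state as final."""
    arcs = []
    for i, c in enumerate(string.ascii_lowercase):
        enc = string.ascii_lowercase[(i + shift) % 26]
        arcs.append(f"0 0 {c} {enc}")   # self-loop on state 0
    arcs.append("0")                    # state 0 is also final
    return "\n".join(arcs)

def symbols_text():
    """Symbol table for fstcompile: epsilon must have id 0."""
    lines = ["<eps> 0"]
    lines += [f"{c} {i + 1}" for i, c in enumerate(string.ascii_lowercase)]
    return "\n".join(lines)

print(cipher_fst_text())
```

Saved as cipher.txt and ascii.syms, this compiles with `fstcompile --isymbols=ascii.syms --osymbols=ascii.syms cipher.txt cipher.fst`; the matching decoder is just the inverted transducer (`fstinvert`).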

Lecture 7

  1. Use GMTK, together with the parameter files and training scripts in the demo archive, to train a set of monophone acoustic models. Note that if you are not on IFP, you may need to adjust your PATH environment variable so that the GMTK binaries can be found.
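
Adjusting the PATH amounts to something like the following; $HOME/gmtk is a hypothetical install prefix, so substitute wherever GMTK actually lives on your machine.

```shell
# Prepend a (hypothetical) GMTK install prefix to the PATH so the demo
# training scripts can find the GMTK binaries.
export PATH="$HOME/gmtk/bin:$PATH"
```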

Useful Links

Tools Specifically Designed for Speech and Language

Machine Learning Tools (neural nets, SVMs, etc.)

Data

Standard unix or unix-related tools

Matlab or matlab-related tools