
Lecture Notes, Landmark-Based Speech Recognition
Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology
These are lecture notes for a one-month graduate course taught in the
computer science department at Tsinghua University from October 11 to
November 5, 2004. Each lecture is designed to last about 1.5 hours,
with 30-60 slides per lecture.
Lectures
- Lecture 1: Introduction to Spectrogram Reading.
- Signal processing review. Manner features.
- Lecture 2: Spectral Statics.
- Multi-tube models. Perturbation theory. Vowels and glides.
- Lecture 3: Spectral Dynamics.
- Impedance-matching techniques for calculating the poles and zeros of
stops, fricatives, and nasals.
- Lecture 4: Classifiers.
- Error backpropagation training of arbitrary kernel-based classifiers
(RBF, linear, polynomial, or sigmoidal).
- Lecture 5: Support Vector Machines.
- Derivation of the SVM. How to set the hyperparameters.
- Lecture 6: Speech Recognition Features.
- FFT-based, time-domain, LPC-based, modulation-filtering, and
auditory-model-based features.
- Lecture 7: Tree-Structured Bayesian Networks.
- The sum-product and max-product algorithms. Special case: the
audiovisual HMM.
- Lecture 8: Non-Tree Bayesian Networks.
- Triangulation and junction trees. Factorial HMM (FHMM). Compiling an
FHMM into an HMM without sacrificing the triangulation. Zweig-triangle
LVCSR.
- Lecture 9: Learning in Bayesian Networks.
- Learning criteria: maximum likelihood (ML), maximum mutual information
(MMI), and minimum classification error (MCE). Expectation
maximization. Hybrid NN-DBN systems: the Bourlard-Morgan hybrid and
the BDFK (Bengio-De Mori-Flammia-Kompe) hybrid.
- Lecture 10: Phonology.
- Jakobson's universal notation. Miller and Nicely on speech
perception. Chomsky and Halle on speech production. Integration of
production and perception in Stevens' quantal theory.
- Lecture 11: Articulatory Phonology.
- Lecture 12: Conclusions and New Directions.
Labs
Last modified: Sun Oct 31 03:18:53 Central Standard Time 2004