Speech Enhancement and Robust Speech Recognition
In the past decades, significant advances have been achieved in the area of automatic speech
recognition (ASR). However, the performance of speech processing systems degrades rapidly
when they are used in adverse environments. The noisy speech can be modeled as follows.
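A common way to write such a model, assuming the noise is additive and uncorrelated with the speech, is given here as a sketch. The symbols y, s, and d are notation chosen for illustration, not taken from the project.

```latex
y(n) = s(n) + d(n)
```

Here y(n) is the observed noisy speech, s(n) the clean speech, and d(n) the ambient noise at sample n.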
The goal of this project is to design a signal processing algorithm that effectively
removes ambient noise, such as computer noise in an office or wind and engine
noise inside a car. The noise signal can be either stationary or quasi-stationary.
The vector space of the noisy signal is composed of a signal-plus-noise subspace
and a noise subspace.
Speech enhancement can be performed by removing the noise subspace and estimating
the clean speech from the remaining signal-plus-noise subspace.
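The subspace idea above can be sketched as follows. This is a minimal illustration, not the project's algorithm. It assumes white noise of known variance (colored noise would typically need prewhitening first), non-overlapping frames, and a Wiener-like gain. The names `subspace_enhance`, `noise_var`, and `frame_len` are hypothetical.

```python
import numpy as np

def subspace_enhance(noisy, noise_var, frame_len=32):
    """Signal-subspace enhancement sketch (assumes white noise of known variance)."""
    noisy = np.asarray(noisy, dtype=float)
    n_frames = len(noisy) // frame_len
    # Stack non-overlapping frames as rows; each row is one observation vector.
    frames = noisy[:n_frames * frame_len].reshape(n_frames, frame_len)
    # Sample covariance of the noisy vectors (zero mean assumed).
    cov = frames.T @ frames / n_frames
    # Eigen-decomposition of the symmetric covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Directions with energy above the noise floor form the signal-plus-noise subspace.
    keep = eigvals > noise_var
    # Wiener-like gain inside the kept subspace, zero in the noise subspace.
    gains = np.where(keep, (eigvals - noise_var) / np.maximum(eigvals, 1e-12), 0.0)
    # Project each frame onto the eigenvectors, scale, and map back.
    H = eigvecs @ np.diag(gains) @ eigvecs.T
    return (frames @ H.T).reshape(-1)

# Illustrative use on a synthetic sinusoid in white noise (not real speech).
rng = np.random.default_rng(0)
n = np.arange(8000)
clean = np.sin(2 * np.pi * 0.05 * n)
noisy = clean + rng.normal(0.0, 0.5, size=n.size)
enhanced = subspace_enhance(noisy, noise_var=0.25)
```

Components whose eigenvalue falls below the assumed noise variance are discarded entirely, and the remaining components are attenuated according to how much of their energy is attributed to noise.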
Energy provides important information for human speech perception. This information
can be used as a constraint for speech enhancement. The energy constraint is especially
effective for consonant phonemes, which are characterized by noise-like waveforms.
- The noise signal can be modelled by an autoregressive (AR) process.
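As an illustration of the AR noise model, the sketch below estimates AR coefficients from a noise segment with the Yule-Walker equations. The estimation method, the model order, and the name `fit_ar` are assumptions made for this example; the project does not state how its AR parameters are obtained.

```python
import numpy as np

def fit_ar(x, order):
    """Yule-Walker estimate for x[n] = sum_k a[k] * x[n-k] + e[n]."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased autocorrelation estimates r[0..order].
    r = np.array([x[:n - k] @ x[k:] / n for k in range(order + 1)])
    # Toeplitz autocorrelation matrix R[i, j] = r[|i - j|].
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    R = r[lags]
    # Solve R a = r[1..order] for the AR coefficients.
    a = np.linalg.solve(R, r[1:])
    # Variance of the white excitation implied by the fit.
    sigma2 = r[0] - a @ r[1:]
    return a, sigma2

# Illustrative check on a synthetic AR(2) process (coefficients chosen arbitrarily).
rng = np.random.default_rng(0)
true_a = np.array([0.75, -0.5])
e = rng.normal(size=20000)
x = np.zeros_like(e)
for t in range(2, len(e)):
    x[t] = true_a[0] * x[t - 1] + true_a[1] * x[t - 2] + e[t]
a_hat, sigma2_hat = fit_ar(x, order=2)
```

With enough samples, `a_hat` should be close to `true_a` and `sigma2_hat` close to the unit variance of the driving noise.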
Speech Recognition Systems
Gaussian mixture density hidden Markov model (HMM) for the acoustic model.
Use MFCC, delta MFCC, normalized energy and delta energy as the features.
Driving conditions: idle, 30 mph and 55 mph.
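The delta and normalized-energy features listed above can be sketched as follows. The regression window of two frames, the edge padding, and normalization by the utterance maximum are common conventions assumed here, not details confirmed by the project. The function names are hypothetical.

```python
import numpy as np

def delta(features, window=2):
    """Regression-based delta coefficients for a (frames x dims) feature matrix."""
    features = np.asarray(features, dtype=float)
    n_frames = features.shape[0]
    # Repeat the first and last frames so every frame has a full context window.
    padded = np.pad(features, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, window + 1))
    out = np.zeros_like(features)
    for k in range(1, window + 1):
        ahead = padded[window + k : window + k + n_frames]
        behind = padded[window - k : window - k + n_frames]
        out += k * (ahead - behind)
    return out / denom

def normalized_log_energy(frames):
    """Per-frame log energy, shifted so the utterance maximum is zero (assumed convention)."""
    energy = np.sum(np.asarray(frames, dtype=float) ** 2, axis=1)
    log_e = np.log(np.maximum(energy, 1e-10))
    return log_e - log_e.max()

# Illustrative use: a feature that rises by 3 per frame has a delta of 3 away from the edges.
ramp = 3.0 * np.arange(10, dtype=float).reshape(-1, 1)
ramp_delta = delta(ramp)
```

The same `delta` function applies to both the MFCC matrix and the energy track, so delta MFCC and delta energy can share one implementation.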
- Speaker-independent speech recognition in a car environment.
- An example of the noisy speech signal (SNR = 5 dB) and the enhanced speech signal (figure panels: Noisy Speech Signal, Enhanced Speech Signal).
Speech enhancement performance: improve the signal-to-noise ratio (SNR) from 2 dB to 8 dB
under various noise conditions.
Speech recognition performance: improve the word recognition accuracy (WRA) from 11% to 41%
under various noise conditions.
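For reference, SNR figures like those above are usually computed from a clean reference and a processed signal. The sketch below uses the standard full-signal definition. Whether the project used this or a segmental variant is not stated, and the name `snr_db` is hypothetical.

```python
import numpy as np

def snr_db(clean, processed):
    """SNR in dB, treating (processed - clean) as the residual noise."""
    clean = np.asarray(clean, dtype=float)
    residual = np.asarray(processed, dtype=float) - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(residual ** 2))

# Illustrative use with a synthetic signal; the enhancement gain is the difference
# between the SNR after processing and the SNR of the noisy input.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 0.01 * np.arange(4000))
noisy = clean + rng.normal(0.0, 0.5, size=clean.size)
input_snr = snr_db(clean, noisy)
```

Calling `snr_db(clean, enhanced) - snr_db(clean, noisy)` for an enhanced signal of the same length gives the improvement in dB.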
- Robust speech recognition.