AVICAR Project: Audio-Visual Speech Recognition in a Car
- Database information
- Sample data available without registration
- DTMF segmentation software
The AVICAR corpus was collected and transcribed
by University of Illinois researchers with
funding from Motorola during 2003-2004. For more information about
the database, you can watch a video in
QuickTime format, read
the Interspeech paper, or
read the database README
file. All data in the AVICAR12 release (March 2013) are
synchronized audiovisual speech data, orthographically transcribed in
ELAN format.
Subjects enrolled in this study consented to have their data distributed for free to any speech or language researcher via secure HTTP, but did not consent to have their videos posted on the web. If you are a speech or language researcher interested in downloading the data, please send a note to Prof. Mark Hasegawa-Johnson (jhasegaw at illinois.edu) specifying your name, the name of your institution, and (briefly) the reason for your interest in the data.
Many people have asked for a limited version of the dataset:
isolated digits or isolated letters, with recordings from only one
microphone. Since so many people are interested, those recordings are
now available here: avicar_somedigits.zip and avicar_someletters.zip.
The list below provides links to the complete dataset for talker
AM2. There are 5 video files, 35 audio files (seven per video), and
transcription files (one per audio file). USAGE: download the desired
WAV, EAF, and AVI files to any directory (using your browser or wget),
then open the EAF using ELAN.
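ELAN is the intended way to view the transcriptions, but EAF files are XML, so they can also be read from a script. The following is a minimal sketch using only the Python standard library. It relies on the general EAF layout (a TIME_ORDER block of TIME_SLOT elements, and TIER elements holding ALIGNABLE_ANNOTATION entries); the tier names used in the AVICAR files are not stated here, so the function simply returns every tier it finds. The function name read_eaf is hypothetical.

```python
import xml.etree.ElementTree as ET


def read_eaf(source):
    """Return {tier_id: [(start_ms, end_ms, text), ...]} for one EAF file.

    `source` is a file path or a binary file object. Only time-aligned
    annotations (ALIGNABLE_ANNOTATION) are read; reference annotations
    are ignored in this sketch.
    """
    root = ET.parse(source).getroot()

    # TIME_ORDER maps slot ids to millisecond offsets. A slot may lack
    # TIME_VALUE, in which case it is left out of the lookup table.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }

    tiers = {}
    for tier in root.iter("TIER"):
        rows = []
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))  # None if unaligned
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = (ann.findtext("ANNOTATION_VALUE") or "").strip()
            rows.append((start, end, text))
        tiers[tier.get("TIER_ID")] = rows
    return tiers
```

Calling read_eaf on a downloaded EAF file gives a dictionary keyed by tier name, with times in milliseconds that can be used to cut the matching WAV file.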
DTMF Segmentation Software
Software to segment audio files automatically at DTMF tones is available here.
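The method used by the distributed software is not described on this page. As a generic illustration of segmenting at DTMF tones, the sketch below checks fixed-length frames for the eight standard DTMF frequencies with the Goertzel recurrence and merges consecutive frames that carry the same key. The function names, the 40 ms frame length, and the 0.5 threshold are illustrative assumptions, not settings taken from the AVICAR tools.

```python
import math

# Standard DTMF row/column frequencies (Hz) and the keypad they index.
ROWS = (697, 770, 852, 941)
COLS = (1209, 1336, 1477, 1633)
KEYS = ("123A", "456B", "789C", "*0#D")


def goertzel_power(frame, sample_rate, freq):
    """Squared magnitude of `frame` at `freq`, via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_key(frame, sample_rate, min_ratio=0.5):
    """Return the DTMF key present in `frame`, or None.

    The strongest row and column bins are compared with the total frame
    energy; for a clean tone pair the normalized ratio is close to 1.
    `min_ratio` is an illustrative threshold (assumption).
    """
    energy = sum(x * x for x in frame)
    if energy == 0:
        return None
    row_p = [goertzel_power(frame, sample_rate, f) for f in ROWS]
    col_p = [goertzel_power(frame, sample_rate, f) for f in COLS]
    r, c = row_p.index(max(row_p)), col_p.index(max(col_p))
    ratio = 2.0 * (row_p[r] + col_p[c]) / (len(frame) * energy)
    return KEYS[r][c] if ratio >= min_ratio else None


def find_tones(samples, sample_rate, frame_ms=40):
    """Return [(start_sec, end_sec, key), ...] for runs of same-key frames."""
    n = int(sample_rate * frame_ms / 1000)
    events, current = [], None
    for i in range(0, len(samples) - n + 1, n):
        key = detect_key(samples[i:i + n], sample_rate)
        if current and current[2] == key:
            current[1] = (i + n) / sample_rate  # extend the running tone
        else:
            if current:
                events.append(tuple(current))
            current = [i / sample_rate, (i + n) / sample_rate, key] if key else None
    if current:
        events.append(tuple(current))
    return events
```

The returned tone intervals can then serve as cut points: the audio between the end of one tone and the start of the next is treated as one segment. Recordings made in a moving car are noisy, so a practical detector would likely need a tuned threshold and a minimum tone duration.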