AVICAR Project: Audio-Visual Speech Recognition in a Car


Database information

The AVICAR corpus was collected and transcribed by University of Illinois researchers during 2003-2004, with funding from Motorola. For more information about the database, you can watch a video in AVI or QuickTime format, read the Interspeech paper, or read the database README file. All data in the AVICAR12 release (March 2013) are synchronized audiovisual speech data, orthographically transcribed in ELAN format.


Subjects enrolled in this study consented to have their data distributed for free to any speech or language researcher via secure HTTP, but did not consent to have their videos posted on the web. If you are a speech or language researcher interested in downloading the data, please send a note to Prof. Mark Hasegawa-Johnson (jhasegaw at illinois.edu) specifying your name, the name of your institution, and (briefly) the reason for your interest in the data.

Many people have asked for a limited version of the dataset: isolated digits or isolated letters, with recordings from only one microphone. Because of this demand, those recordings are now available here: avicar_somedigits.zip and avicar_someletters.zip.

Sample data

The list below provides links to the complete dataset for talker AM2. There are 5 video files, 35 audio files (seven per video), and 35 ELAN transcription files (one per audio file). USAGE: download the desired WAV, EAF, and AVI files to any directory (using your browser or wget), then open the EAF using ELAN.
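
If you would rather read the transcriptions from a script than open them in ELAN, the sketch below may help. It assumes the EAF files follow the usual ELAN XML layout (TIME_SLOT elements carrying millisecond offsets, and ALIGNABLE_ANNOTATION elements inside each TIER); the function name and the inline sample document are made up for illustration and are not AVICAR data.

```python
import io
import xml.etree.ElementTree as ET


def read_eaf_annotations(source):
    """Return (tier_id, start_ms, end_ms, text) tuples from an ELAN EAF document.

    `source` may be a file path or a binary file object.
    """
    root = ET.parse(source).getroot()

    # TIME_ORDER maps each time-slot ID to an offset in milliseconds.
    slots = {}
    for slot in root.iter("TIME_SLOT"):
        value = slot.get("TIME_VALUE")
        if value is not None:  # unaligned slots carry no TIME_VALUE
            slots[slot.get("TIME_SLOT_ID")] = int(value)

    rows = []
    for tier in root.iter("TIER"):
        tier_id = tier.get("TIER_ID")
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = (ann.findtext("ANNOTATION_VALUE") or "").strip()
            rows.append((tier_id, start, end, text))
    return rows


# Hypothetical miniature EAF document, used only to exercise the function.
SAMPLE_EAF = b"""<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="850"/>
  </TIME_ORDER>
  <TIER TIER_ID="words">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
                            TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>seven</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

if __name__ == "__main__":
    for row in read_eaf_annotations(io.BytesIO(SAMPLE_EAF)):
        print(row)
```

To use it on a downloaded file, pass the path of the EAF file in place of the in-memory sample. The tier names in the real files may differ from the "words" tier used here.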

DTMF Segmentation Software

Software to segment audio files automatically at DTMF tones is available here.
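
For readers curious how DTMF tones can be located in audio, here is a minimal sketch using the Goertzel recurrence, a common technique for measuring power at a few known frequencies. It is not the segmentation software mentioned above and makes no claim about how that software works. The function names, the 8000 Hz default sample rate, and the detection threshold are assumptions chosen for illustration.

```python
import math

# Standard DTMF keypad: each key is one row tone plus one column tone (Hz).
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYS = ("123A", "456B", "789C", "*0#D")


def goertzel_power(frame, freq_hz, sample_rate):
    """Squared spectral magnitude of `frame` at freq_hz (Goertzel recurrence)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s1 = s2 = 0.0
    for x in frame:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_dtmf_key(frame, sample_rate=8000, min_ratio=0.1):
    """Return the DTMF key present in `frame`, or None if no clear tone pair.

    min_ratio is an illustrative threshold (not tuned on AVICAR audio): the
    strongest row and column powers, each normalized by frame length times
    frame energy, must both exceed it.
    """
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return None  # silent frame
    scale = len(frame) * energy
    row_p = [goertzel_power(frame, f, sample_rate) / scale for f in ROW_HZ]
    col_p = [goertzel_power(frame, f, sample_rate) / scale for f in COL_HZ]
    r = max(range(4), key=row_p.__getitem__)
    c = max(range(4), key=col_p.__getitem__)
    if row_p[r] < min_ratio or col_p[c] < min_ratio:
        return None
    return KEYS[r][c]


def synth_dual_tone(row_hz, col_hz, n=400, sample_rate=8000):
    """Synthesize n samples of a clean dual tone, for demonstration only."""
    return [0.5 * math.sin(2 * math.pi * row_hz * i / sample_rate)
            + 0.5 * math.sin(2 * math.pi * col_hz * i / sample_rate)
            for i in range(n)]


if __name__ == "__main__":
    print(detect_dtmf_key(synth_dual_tone(770, 1336)))
```

To segment a recording with this approach, one would slide a short frame (for example 50 ms) along a mono signal and mark the positions where the detected key changes from None to a key and back. Real in-car audio is noisy, so the threshold and frame length would need tuning.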