Audiovisual database of dysarthric speech for research promoting universal access to information technology.
Thomas Huang, and
Post-Doctoral Researcher: Heejin Kim
Graduate Research Assistants: Simone Frame, Harsh Vardhan Sharma, and Xi Zhou
Contributors: (4/1/2013) Ladan Baghai-Ravary at Oxford University normalized the audio data to a common level and filtered out narrowband noise.
Researchers employed at universities and government labs with an interest in universal access technology may download the data for free via FTP, wget, or secure HTTP from http://ifp-08.ifp.uiuc.edu/protected/UASpeech/. To request access, e-mail Mark Hasegawa-Johnson and specify your name, the name of your institution, and your institutional surface mail address.
Neuromotor disabilities such as cerebral palsy may hinder one's use of a keyboard. Automatic speech recognition (ASR) is now sufficiently accurate to provide an alternative user interface for some users with motor disability, but many users with cerebral palsy or closed head injury are unable to use current ASR programs because their disability includes a component of dysarthria: reduced speech intelligibility caused by neuromotor impairment. Standard ASR systems work poorly for talkers with dysarthria. The UA-Speech database is intended to promote the development of user interfaces for talkers with gross neuromotor disorders and spastic dysarthria.
Audiovisual isolated-word recordings of talkers with spastic dysarthria.
Subjects were recruited based primarily on personal contact facilitated by disability support organizations. Subjects were selected based on self-report of either speech pathology or cerebral palsy. Before data were included in the UA-Speech distribution, the diagnosis of spastic dysarthria (sometimes mixed with other forms of dysarthria) was informally confirmed by a certified speech-language pathologist listening to these recordings. Subjects were asked to explicitly grant permission for the dissemination of their data; subjects who refused permission are not represented in the distribution.
Subjects read isolated words from a computer monitor. Prompt words included:
- Digits (10 words X 3 reps): "one, two, three, ..."
- Letters (26 words X 3 reps): the 26 letters of the International Radio Alphabet, "alpha, bravo, charlie,..."
- Computer Commands (19 words X 3 reps): word processing commands, e.g., "command, line, paragraph, enter,..."
- Common Words (100 words X 3 reps): the most common 100 words in the Brown corpus, e.g., "the, of, and,..."
- Uncommon Words (300 words X 1 rep): 300 words selected from Project Gutenberg novels using an algorithm that sought to maximize biphone diversity, e.g., "naturalization, faithfulness, frugality,..."
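The list sizes above imply a per-speaker recording load that is easy to tally. The following is a minimal sketch, assuming every speaker read every prompt the stated number of times; the dictionary and function names are illustrative and are not part of the distribution.

```python
# Prompt categories as listed above: (number of words, repetitions per speaker).
# Assumption: every speaker completed every prompt; the names here are illustrative.
PROMPT_SETS = {
    "Digits": (10, 3),
    "Letters": (26, 3),
    "Computer Commands": (19, 3),
    "Common Words": (100, 3),
    "Uncommon Words": (300, 1),
}


def tally(prompt_sets):
    """Return (total list entries, total word tokens read by one speaker)."""
    entries = sum(words for words, _ in prompt_sets.values())
    tokens = sum(words * reps for words, reps in prompt_sets.values())
    return entries, tokens


list_entries, tokens_per_speaker = tally(PROMPT_SETS)
print(list_entries, tokens_per_speaker)  # 455 765
```

Under these assumptions, each speaker reads 455 list entries and produces 765 isolated-word tokens.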
Audio data were recorded using a 7-channel microphone array, fitted to the top of a computer monitor (see images above). The 8th audio channel is used for synchronization tones, generated by the PC at every slide advance. The same synchronization tones are recorded to the video files. Video data were recorded using a camcorder mounted on a desktop tripod.
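Because a synchronization tone marks every slide advance, the 8th channel can in principle be used to locate prompt boundaries. The sketch below illustrates the idea on synthetic data only. The channel order (sync last), the 16 kHz sample rate, the 1 kHz tone, the window length, and the threshold are all assumptions made for the example; none of them is specified in this description, and the function names are hypothetical.

```python
import math

SAMPLE_RATE = 16000  # assumed for this example; not stated in the description


def split_channels(frames, sync_index=7):
    """Separate 8-channel frames into 7 microphone channels and 1 sync channel.

    frames: list of 8-sample tuples. sync_index=7 assumes the sync channel is last.
    """
    n_channels = len(frames[0])
    mics = [[f[c] for f in frames] for c in range(n_channels) if c != sync_index]
    sync = [f[sync_index] for f in frames]
    return mics, sync


def tone_onsets(sync, window=160, threshold=0.1):
    """Return sample indices where the windowed RMS first rises above threshold."""
    onsets = []
    active = False
    for start in range(0, len(sync) - window + 1, window):
        chunk = sync[start:start + window]
        rms = math.sqrt(sum(s * s for s in chunk) / window)
        if rms >= threshold and not active:
            onsets.append(start)
        active = rms >= threshold
    return onsets


# Synthetic one-second recording: silent microphones, two 50 ms tones on the sync channel.
def synthetic_sync(n):
    in_tone = 4000 <= n < 4800 or 12000 <= n < 12800
    return 0.5 * math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) if in_tone else 0.0


frames = [(0.0,) * 7 + (synthetic_sync(n),) for n in range(SAMPLE_RATE)]
mics, sync = split_channels(frames)
onsets = tone_onsets(sync)
print(len(mics), onsets)
```

With real recordings, the threshold and window would need to be tuned to the actual tone level and duration.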
Dr. Heejin Kim was funded, during the recording of these data, by a grant from the National Institutes of Health (PHS R21-DC008090-A, "Audiovisual Description and Recognition of Dysarthric Speech").
Graduate research assistants Simone Frame, Harsh Vardhan Sharma, and Xi Zhou were funded, during the recording of these data, by a grant from the National Science Foundation (NSF 05-34106, "Audiovisual Phonologic-Feature-Based Recognition of Dysarthric Speech"). All opinions, results, and findings presented in this work are those of the authors, and are not necessarily endorsed by the NSF.
We gratefully acknowledge those who put us in contact with the people recorded in this database:
- The University of Illinois Division of Rehabilitation and Education Services,
- The University of Wisconsin Trace Center,
- United Cerebral Palsy, Land of Lincoln,
- The PACE Center of Urbana.
Finally, we most gratefully acknowledge the time, intelligence, and helpfulness of every person who contributed his or her voice to this database.