wTIMIT and wMRT Whispered Speech Corpora

Acquired by Boon Pang Lim with the help of the Illinois Phonetics and Phonology Lab; the Institute for Infocomm Research, Singapore; Prof. Richard Sproat; and the Agency for Science, Technology and Research, Singapore (A*STAR).

Database Descriptions

The whispered TIMIT (wTIMIT) corpus is designed for the study and construction of large-vocabulary speech recognizers. TIMIT is a well-known corpus often used as a benchmark for phoneme recognition [1]. wTIMIT uses the TIMIT prompts: each speaker both says and whispers all 450 phonetically balanced sentences of the TIMIT prompt set. The TEST and TRAIN divisions of the corpus are arbitrary; they should be used only to obtain results comparable to those in [3]. The corpus covers two accents (Singaporean English and North American English), with roughly 20 to 28 speakers in each accent group. It is somewhat gender balanced. For more information, see the corpus documentation or Dr. Lim's Ph.D. thesis.

The Modified Rhyme Test (MRT), derived from the Diagnostic Rhyme Test (DRT) [2], is designed to quantify the intelligibility of speech across different encoding schemes and channels. The test comprises 50 sets of six words each; the words in each set differ only in either the word-initial or the word-final consonant. The whispered Modified Rhyme Test (wMRT) corpus contains all of these words, both read and whispered in the carrier sentence "Can you say WORD now." The utterances were collected from 29 speakers; after bad utterances were removed, 15179 remain. See the corpus documentation or Dr. Lim's Ph.D. thesis for more information.
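
As a rough illustration of how the wMRT prompts are structured, the sketch below builds carrier-sentence prompts for one six-word set in both speaking modes. The function name make_prompts, the mode labels, and the example word set are assumptions made for illustration only; they are not taken from the corpus documentation, which should be consulted for the actual 50 word sets and file layout.

```python
# Minimal sketch of the wMRT prompt structure described above.
# Assumptions (not from the corpus documentation): the helper name,
# the mode labels, and the example word set.

# Carrier sentence quoted in the description; WORD is the slot being filled.
CARRIER = "Can you say {word} now."

# Figures stated in the description: 50 sets of six words each.
SETS_TOTAL = 50
WORDS_PER_SET = 6
WORDS_TOTAL = SETS_TOTAL * WORDS_PER_SET  # 300 test words in all

# One illustrative set whose words differ only in the initial consonant.
# Hypothetical example; not verified against the corpus word list.
EXAMPLE_SET = ["went", "sent", "bent", "dent", "tent", "rent"]


def make_prompts(word_set, modes=("read", "whispered")):
    """Return (mode, sentence) pairs: each word in each speaking mode."""
    return [
        (mode, CARRIER.format(word=word))
        for word in word_set
        for mode in modes
    ]


for mode, sentence in make_prompts(EXAMPLE_SET):
    print(f"{mode:10s} {sentence}")
```

Each word appears once per mode, so a single six-word set yields twelve prompts under these assumptions.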

Download Procedure

Data may be downloaded using a web browser or wget from the wMRT site or the wTIMIT site. You will need a login; login information can be obtained by writing to Boon-Pang.

Terms of Use

These databases are freely available for research and non-commercial use. To reference the corpus, please cite [3]. The authors welcome any feedback; please drop a note if you find this data useful.

References

  1. K. F. Lee, H. Hon, Speaker-independent phone recognition using Hidden Markov models, IEEE Transactions on Acoustics, Speech and Signal Processing (1989) 1641-1648.
  2. W. D. Voiers, Evaluating processed speech using the Diagnostic Rhyme Test, Speech Technology (1983) 30-39.
  3. B. P. Lim, Computational differences between whispered and non-whispered speech, PhD Thesis, UIUC (2010).