Qatari Arabic Corpus

Parts of the Qatari Arabic Corpus are now available for download at http://www.isle.illinois.edu/dialect/QAC/

Content

Speech was recorded from four Qatari television programs in 2009-2011:

- Al-Jazeera interviews: 207 minutes (multi-dialect recorded in Qatar; relatively formal)
- LAKOM: 240 minutes (Moroccan dialect; not yet transcribed)
- Sabah El-Doha: 110 minutes (multi-dialect recorded in Qatar; relatively informal)
- Tesaneef: 550 minutes (Qatari dialect; extremely informal)

Distributed Files

- Nineteen hours of monaural broadcast speech audio,
  - 16 bits/sample in WAV format,
  - recorded at 44.1 kHz sampling rate, but
  - downsampled to 16 kHz sampling rate for distribution.
- Fifteen hours of phonetic transcription
  - Arabic script,
  - fully vowelized,
  - extended with Persian and Urdu characters in order to distinguish phonemes that are not part of the core Arabic orthography.
- Fifteen hours of English gloss.
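
The audio format listed under Distributed Files (monaural, 16 bits/sample, 16 kHz WAV) can be checked on a downloaded file with Python's standard `wave` module. The following is a minimal sketch, not part of the corpus distribution: the function name `check_wav_format` and the constants are hypothetical, and the file path is whatever you saved locally.

```python
import wave

# Format stated on this page for the distributed audio.
EXPECTED_CHANNELS = 1       # monaural
EXPECTED_SAMPLE_WIDTH = 2   # 16 bits/sample = 2 bytes
EXPECTED_RATE_HZ = 16000    # downsampled to 16 kHz for distribution


def check_wav_format(path):
    """Return (matches_expected_format, duration_in_minutes) for one WAV file.

    Hypothetical helper for illustration; it only reads the WAV header.
    """
    with wave.open(path, "rb") as wav:
        matches = (
            wav.getnchannels() == EXPECTED_CHANNELS
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH
            and wav.getframerate() == EXPECTED_RATE_HZ
        )
        # Duration = number of frames / frames per second, converted to minutes.
        duration_minutes = wav.getnframes() / wav.getframerate() / 60.0
    return matches, duration_minutes
```

Summing the returned durations over all files from one program gives a figure that can be compared against the per-program minutes listed under Content.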