Alkhalil Speech Corpus

Alkhalil Speech Corpus is an Arabic single-speaker speech database recorded by a professional male speaker. It was designed mainly for unit-selection speech synthesis, but it may also be used for end-to-end speech synthesis and speech recognition. The source texts are paragraphs and articles that were carefully selected to cover different domains, including science, literature, academic books, and technology. The corpus includes the following files:

  1. 15 .wav files, each single-channel (mono), sampled at 24 kHz with 16-bit samples.
  2. 15 .TextGrid files containing phoneme, word, and lemma-level annotations aligned with their corresponding speech utterances. These files can be opened using Praat software.
  3. Orthographic-transcript.txt which contains a fully diacritized and hand-checked orthographic transcription covering more than 80,000 Arabic words.
  4. buckwalter_transcript.txt which is a representation of the orthographic transcript (file 3) in Buckwalter transliteration.
  5. Pronounciation_transcript.txt which is a phonetic representation of the audio files describing how the speaker actually uttered the words.
    This file is particularly useful for unit-selection-based synthesis.
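
As a quick sanity check after downloading, the audio format stated in item 1 (one channel, 24 kHz, 16-bit) can be verified with Python's standard `wave` module. The following is only a minimal sketch: the function names `describe_wav` and `matches_corpus_format` are illustrative and not part of the corpus, and the code assumes the .wav files are plain uncompressed PCM.

```python
import wave

# Expected format, taken from item 1 of the file list above.
EXPECTED_CHANNELS = 1
EXPECTED_RATE_HZ = 24000
EXPECTED_BITS = 16


def describe_wav(path):
    """Return (channels, sample_rate_hz, bits_per_sample, duration_seconds)."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8  # sample width is reported in bytes
        frames = wav.getnframes()
    return channels, rate, bits, frames / rate


def matches_corpus_format(path):
    """Check a file against the mono / 24 kHz / 16-bit description."""
    channels, rate, bits, _ = describe_wav(path)
    return (
        channels == EXPECTED_CHANNELS
        and rate == EXPECTED_RATE_HZ
        and bits == EXPECTED_BITS
    )
```

Calling `matches_corpus_format` on each of the 15 .wav files (the actual file names depend on the download) should return `True` if the files match the description; summing the durations returned by `describe_wav` gives the total length of the audio.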

A sample of 10.30 hours of speech is available for download.
