Al-Mus’haf Corpus

Download a tagged version with Arabic Standard tagset
Download a tagged version with Universal tagset


Few annotated Arabic corpora are widely available, which motivated us to contribute to the enrichment of Arabic corpus resources. We decided to start with correct and carefully selected texts, and the Quranic Arabic text is the best place to begin such an effort. Furthermore, annotated linguistic resources such as a Quranic corpus are important for researchers working in all fields of Arabic natural language processing. To the best of our knowledge, the only available Quranic Arabic corpora are from the University of Leeds, the University of Jordan, and the University of Haifa. Unfortunately, these corpora have several problems and do not contain enough grammatical and syntactic information. To build a new corpus of the Quran, we used a semi-automatic technique: the text is first processed with “AlKhalil Morpho Sys”, a morphosyntactic analyzer for standard Arabic words, and the output then undergoes manual treatment. The result is a new Quranic corpus rich in morphosyntactic information.

Last updated on April 27, 2017

ISLRN: 114-868-598-820-5

For further details, please see the following paper:

Imad Zeroual and Abdelhak Lakhouaja, “A new Quranic Corpus rich in morphosyntactical information”, International Journal of Speech Technology (IJST), 2016. DOI: 10.1007/s10772-016-9335-7.

Or contact mr.imadine@gmail.com