MoSAR corpus

The corpus will be available for download soon.
Abstract: Today there is a large amount of valuable research on corpora, and the availability of corpora has increased significantly in recent years. Unfortunately, this is not the case for all types of corpora. Research in the field of Arabic language processing suffers from a severe lack of annotated educational corpora. In this work, we have built a new educational corpus drawn from Moroccan primary school books. This corpus will help education researchers and computational linguists provide appropriate tools to support school students who are learning formal Arabic. We annotated the corpus with morphosyntactic information that can be used in several natural language processing applications. We also added a text difficulty measure, linked to the Moroccan primary school levels, so that the corpus can be used in the development of readability measurement applications. The result is a Modern Standard Arabic corpus dedicated to young learners of Arabic as a first language (L1). The corpus is manually labeled with seven levels: the primary levels of the Moroccan educational system from 1st to 6th grade, plus a more basic level that we call level 0.
DOI: 10.1145/3372938.3372961
For further details, please check the paper (see the DOI above) or contact us at: naoual.nassiri@gmail.com
