The first version of the Nemlar corpus was produced within the NEMLAR project. It is a set of annotated Arabic texts collected from 13 different domains, containing about 500,000 words.
The Arabic Language Processing team (ALP team) of Mohammed First University in Morocco enriched this corpus by adding a lemma label to every word and by correcting some annotation errors in the first version.
This new version is in XML format, and each word is accompanied by the following tags:
For further details, please see the following paper: