NEMLAR Corpus

About

The first version of the NEMLAR corpus was produced within the NEMLAR project. It is a set of annotated Arabic texts collected from 13 different domains, totaling about 500,000 words.

The Arabic Language Processing team (ALP team) of Mohammed First University in Morocco enriched this corpus by adding a lemma tag to every word, and also corrected some annotation errors found in the first version.

This new version is in XML format, and each word is annotated with the following tags:

  • Vowelized form
  • Lemma
  • POS tag
  • Clitics attached to the stem
  • Root
  • Pattern
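The page does not document the XML schema itself. As a rough sketch of how per-word annotations like these can be loaded, the example uses Python's standard `xml.etree.ElementTree` on a small hypothetical sample. The `<word>` element, the attribute names (`vow`, `lemma`, `pos`, `clitics`, `root`, `pattern`) and the sample values are all assumptions for illustration, so the actual corpus files may be structured differently.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from dataclasses import dataclass

# HYPOTHETICAL sample: the <word> element and its attribute names are
# assumptions for illustration, not the documented NEMLAR XML schema.
SAMPLE_XML = """<text>
  <word vow="الْكِتَابُ" lemma="كِتَاب" pos="noun" clitics="ال"
        root="كتب" pattern="فِعَال"/>
  <word vow="كَتَبَ" lemma="كَتَبَ" pos="verb" clitics=""
        root="كتب" pattern="فَعَلَ"/>
</text>"""


@dataclass
class Token:
    """One annotated word, mirroring the six tags listed on this page."""
    vowelized: str
    lemma: str
    pos: str
    clitics: str
    root: str
    pattern: str


def parse_tokens(xml_text: str) -> list[Token]:
    """Collect every <word> element (hypothetical tag name) into a Token."""
    tree = ET.fromstring(xml_text)
    tokens = []
    for el in tree.iter("word"):
        # Missing attributes default to an empty string.
        tokens.append(
            Token(
                vowelized=el.get("vow", ""),
                lemma=el.get("lemma", ""),
                pos=el.get("pos", ""),
                clitics=el.get("clitics", ""),
                root=el.get("root", ""),
                pattern=el.get("pattern", ""),
            )
        )
    return tokens


if __name__ == "__main__":
    tokens = parse_tokens(SAMPLE_XML)
    # Count how often each lemma and each root occurs in the sample.
    print(Counter(t.lemma for t in tokens))
    print(Counter(t.root for t in tokens))
```

Because the lemma and the root are separate tags, grouping by one or the other gives different views of the same text: in this sample, two distinct lemmas share a single root.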

For further details, see the following paper:

  • Boudchiche, M.; Mazroui, A. (2015). “Enrichment of the Nemlar corpus by the lemma tag”. Workshop Language Resources of Arabic NLP: Construction, Standardization, Management and Exploitation, Rabat, Morocco, November 26, 2015.

