The objective of this work is to develop a part-of-speech (POS) tagger for the Arabic language. The tagger uses a very rich tag set that gives syntactic information about the proclitics attached to words. It combines a probabilistic model with a morphological analyzer to identify the correct tag in context. Most published research on probabilistic tagging uses only a training corpus to find the probable tags for each word, which sometimes degrades performance. In this paper, we propose a method that also takes into account tags that are not included in the training data. These tags are proposed by the Alkhalil_Morpho_Sys analyzer (Bebah et al. 2011). We show that taking them into account significantly increases the accuracy of the morphosyntactic analysis. In addition, the adopted tag set contains compound tags that make it possible to analyze the proclitics attached to words.
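
The idea of combining corpus statistics with analyzer-proposed tags can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a first-order hidden Markov model decoded with Viterbi, add-k smoothing with an arbitrary constant, and a hypothetical dictionary `analyzer_tags` standing in for the output of a morphological analyzer such as Alkhalil_Morpho_Sys. The words, tags, and function names are all invented for illustration.

```python
import math
from collections import Counter, defaultdict

K = 0.1  # add-k smoothing constant (assumed value, not from the paper)

def train(tagged_sentences):
    """Count tag->word emissions and tag->tag transitions from a tagged corpus."""
    emit, trans = defaultdict(Counter), defaultdict(Counter)
    for sent in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            emit[tag][word] += 1
            trans[prev][tag] += 1
            prev = tag
    return emit, trans

def log_prob(counter, key, n_outcomes):
    """Add-k smoothed log-probability, so unseen events keep a non-zero mass."""
    return math.log((counter[key] + K) / (sum(counter.values()) + K * n_outcomes))

def tag_sentence(words, emit, trans, analyzer_tags):
    """Viterbi decoding over the union of corpus tags and analyzer-proposed tags."""
    corpus_tags = defaultdict(set)
    for tag, word_counts in emit.items():
        for w in word_counts:
            corpus_tags[w].add(tag)
    all_tags = set(emit) | {t for ts in analyzer_tags.values() for t in ts}
    vocab = set(corpus_tags) | set(analyzer_tags) | set(words)
    n_tags, n_words = len(all_tags), len(vocab)

    best = {"<s>": (0.0, [])}  # tag -> (log-score, best path ending in tag)
    for w in words:
        # Candidate tags: seen in training OR proposed by the analyzer;
        # fall back to every known tag if neither source knows the word.
        candidates = (corpus_tags.get(w, set()) | set(analyzer_tags.get(w, ()))) or all_tags
        new_best = {}
        for t in candidates:
            e = log_prob(emit[t], w, n_words)
            new_best[t] = max(
                (s + log_prob(trans[p], t, n_tags) + e, path + [t])
                for p, (s, path) in best.items()
            )
        best = new_best
    return max(best.values())[1]

# Toy data: transliterated words and compound tags are illustrative only.
train_data = [
    [("kataba", "VERB"), ("alwaladu", "DET+NOUN")],
    [("qara'a", "VERB"), ("alkitaba", "DET+NOUN")],
]
# Hypothetical analyzer output: "wakataba" never occurs in the training data.
analyzer_tags = {"wakataba": ["CONJ+VERB"], "alwaladu": ["DET+NOUN"]}

emit, trans = train(train_data)
print(tag_sentence(["wakataba", "alwaladu"], emit, trans, analyzer_tags))
```

In this toy run the compound tag `CONJ+VERB` never appears in the training data, yet it can still be selected because the analyzer proposes it and smoothing gives it a non-zero probability.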

For further details, please see the following papers:

Nabil Ababou and Azzeddine Mazroui, “A hybrid Arabic POS tagging for simple and compound morphosyntactic tags”, International Journal of Speech Technology (IJST), 2015, DOI: 10.1007/s10772-015-9302-8.

Azzeddine Mazroui and Nabil Ababou, “Alkhalil Morpho Sys and smoothing techniques to improve a statistical POS Tagger for Arabic”, Proceedings of the 15th International Arab Conference on Information Technology (ACIT’14), December 9-11, 2014, Nizwa, Sultanate of Oman.

