
Arabic PoS tagset


Part-of-speech (PoS) tagging for Arabic remains under-investigated. Determining the PoS tag of a word in a particular context is difficult, primarily because most contemporary texts are written without diacritics. Consequently, the same written form can be read in several different ways. Furthermore, distinguishing between Arabic derivatives is very challenging for the majority of PoS taggers. Assigning the correct PoS tags therefore requires advanced processing and considerable resources. This study aims to design detailed hierarchical levels of the Arabic tagset categories and their relationships. These hierarchical levels make the tagset easier to expand when required and yield more accurate and precise results. They are based on a comparative study and on important references in Arabic grammar, and they have been validated by experts in the field. In addition, the proposed tagset is implemented in a PoS tagger and tested in various experiments. We believe this study makes a significant contribution to the literature, as it is a step towards a standard, rich, and comprehensive tagset for Arabic.

Download the tagset in XML Format
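
The schema of the XML file is not described on this page. As a minimal sketch, assuming a hypothetical layout in which each category is a nested `<tag>` element with a `name` attribute, the hierarchy could be walked with Python's standard `xml.etree.ElementTree` module. The element name, the attribute name, and every sub-category below are illustrative placeholders, not taken from the published tagset; only the three top-level categories (noun, verb, particle) come from this page.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: the real file's element and attribute names may differ,
# and the sub-categories here are placeholders chosen for illustration only.
SAMPLE_XML = """
<tagset>
  <tag name="Noun">
    <tag name="ProperNoun"/>
    <tag name="Adjective"/>
  </tag>
  <tag name="Verb">
    <tag name="PerfectVerb"/>
    <tag name="ImperfectVerb"/>
    <tag name="ImperativeVerb"/>
  </tag>
  <tag name="Particle"/>
</tagset>
"""


def tag_paths(element, prefix=()):
    """Yield the full path (top-level category first) of every tag."""
    for child in element.findall("tag"):
        path = prefix + (child.get("name"),)
        yield path
        # Recurse into the sub-categories nested under this tag.
        yield from tag_paths(child, path)


root = ET.fromstring(SAMPLE_XML)
paths = list(tag_paths(root))

for path in paths:
    print(" > ".join(path))
```

Keeping the full path for each tag means a fine-grained tag can be collapsed to its top-level category by taking `path[0]`, which is one way a hierarchical tagset can support both coarse and detailed annotation.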

For further details, please see the following paper:

Imad Zeroual, Abdelhak Lakhouaja, and Rachid Belahbib, “Towards a standard part of speech tagset for the Arabic language”, Journal of King Saud University – Computer and Information Sciences, 2017. DOI: 10.1016/j.jksuci.2017.01.006.

Figures: Hierarchical levels of the noun, verb, and particle categories:



