This project was carried out by the Natural Language Processing team of Oujda (Oujda-NLP team), from Mohammed I University in Morocco, with the support of the Arab League Educational, Cultural and Scientific Organization (ALECSO).
Alkhalil POSTagger assigns a single POS tag to each word of an Arabic sentence, taking the word's context into account. The system comprises two modules. The first performs an out-of-context analysis, based on a database and on the morphosyntactic analyser Alkhalil Morpho Sys 2. The second uses the word's context to select the correct POS tag from the candidate POS tags produced by the first module. For this disambiguation step, the user can choose either a statistical technique based on hidden Markov models (HMM), where the observations are the words of the sentence and the roots represent the hidden states, or an approximation technique based on a linear or quadratic spline. Both approaches were validated on the labelled Nemlar corpus, which consists of about 500,000 words. Alkhalil POSTagger gives the correct POS tag for more than 93% of the words in the test set when the HMM model is used in the disambiguation phase, and for 92% when the spline function is used. The latter analyses 450 words per second, whereas the HMM model analyses only 129 words per second.
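The HMM disambiguation step described above can be illustrated with a small Viterbi decoder that only considers, at each position, the candidates proposed for that word. This is a minimal sketch and not the code of Alkhalil POSTagger: the class name `ViterbiSketch`, the method `decode`, the labels `PREP`, `NOUN` and `VERB`, the probability tables and the floor value are all hypothetical, and the hidden states are assumed here to be the candidate labels themselves.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

class ViterbiSketch {
    // Hypothetical floor probability for events missing from the toy tables.
    static final double FLOOR = 1e-6;

    static double logp(Map<String, Double> table, String key) {
        return Math.log(table.getOrDefault(key, FLOOR));
    }

    /**
     * Returns the most probable label sequence, choosing at each position only
     * among that word's candidates. Assumes every word has at least one candidate.
     * trans is keyed by "previous>next"; emit.get(i) holds P(word i | label).
     */
    static List<String> decode(List<List<String>> candidates,
                               Map<String, Double> start,
                               Map<String, Double> trans,
                               List<Map<String, Double>> emit) {
        int n = candidates.size();
        List<Map<String, Double>> score = new ArrayList<>(); // best log-score per label
        List<Map<String, String>> back = new ArrayList<>();  // best predecessor per label
        for (int i = 0; i < n; i++) {
            Map<String, Double> s = new HashMap<>();
            Map<String, String> b = new HashMap<>();
            for (String label : candidates.get(i)) {
                double e = logp(emit.get(i), label);
                if (i == 0) {
                    s.put(label, logp(start, label) + e);
                    continue;
                }
                double best = Double.NEGATIVE_INFINITY;
                String arg = null;
                for (String prev : candidates.get(i - 1)) {
                    double v = score.get(i - 1).get(prev) + logp(trans, prev + ">" + label);
                    if (v > best) { best = v; arg = prev; }
                }
                s.put(label, best + e);
                b.put(label, arg);
            }
            score.add(s);
            back.add(b);
        }
        LinkedList<String> path = new LinkedList<>();
        if (n == 0) return path;
        // Pick the best final label, then follow the back-pointers.
        String cur = null;
        double best = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> en : score.get(n - 1).entrySet()) {
            if (cur == null || en.getValue() > best) { best = en.getValue(); cur = en.getKey(); }
        }
        for (int i = n - 1; i >= 0; i--) {
            path.addFirst(cur);
            cur = back.get(i).get(cur);
        }
        return path;
    }

    public static void main(String[] args) {
        // Toy three-word sentence: the first word is unambiguous, the others are not.
        List<List<String>> candidates = List.of(
            List.of("PREP"), List.of("NOUN", "VERB"), List.of("NOUN", "VERB"));
        Map<String, Double> start = Map.of("PREP", 0.5, "NOUN", 0.3, "VERB", 0.2);
        Map<String, Double> trans = Map.of(
            "PREP>NOUN", 0.9, "PREP>VERB", 0.1,
            "NOUN>NOUN", 0.4, "NOUN>VERB", 0.6,
            "VERB>NOUN", 0.7, "VERB>VERB", 0.3);
        List<Map<String, Double>> emit = List.of(
            Map.of("PREP", 1.0),
            Map.of("NOUN", 0.5, "VERB", 0.5),
            Map.of("NOUN", 0.5, "VERB", 0.5));
        System.out.println(decode(candidates, start, trans, emit)); // prints [PREP, NOUN, VERB]
    }
}
```

Restricting the search to each word's candidates keeps the cost per word proportional to the product of the candidate counts of two adjacent words, instead of the square of the full label set.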
Structure of the program
The program is composed of several packages, as shown in Figure 1:
Using the API in another project
Add the jar file to the library class path and use the following code to analyse a raw file:
For further details, please see the following paper: