AlKhalil-PoS-Tagger

11 November 2022

This project was carried out by the Natural Language Processing team of Oujda (Oujda-NLP team), from Mohammed I University in Morocco, with the support of the Arab League Educational, Cultural and Scientific Organization (ALECSO).

Introduction

Alkhalil POSTagger assigns a single POS tag to each word of an Arabic sentence, taking the word's context into account. The system comprises two modules. The first performs an out-of-context analysis, based on a database and on the morphosyntactic analyser Alkhalil Morpho Sys 2, and returns the potential POS tags of each word. The second uses the word's context to select the correct POS tag from among these candidates. For this disambiguation step, the user can choose either a statistical technique based on hidden Markov models (HMM), in which the words of the sentence are the observations and the POS tags are the hidden states, or an approximation technique based on a linear or quadratic spline. Both approaches were validated on the labelled Nemlar corpus, which consists of about 500,000 words. Alkhalil POSTagger assigns the correct POS tag to more than 93% of the words in the test set when the HMM is used in the disambiguation phase, and to 92% when the spline function is used. The spline approach is faster: it analyses 450 words per second, whereas the HMM analyses only 129 words per second.
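The context-based selection step can be illustrated with a small Viterbi decoder. This is only a minimal sketch, not the tagger's actual code: it assumes that the hidden states are the candidate POS tags returned by the out-of-context analysis, and the class name ViterbiSketch, the key formats, the smoothing floor and all probabilities are hypothetical values chosen for illustration. Every word is assumed to have at least one candidate tag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

/** Illustrative Viterbi decoding restricted to per-word candidate tags. */
class ViterbiSketch {
    // Hypothetical floor used for transitions/emissions missing from the tables.
    static final double FLOOR = 1e-6;

    static double logP(Map<String, Double> table, String key) {
        return Math.log(table.getOrDefault(key, FLOOR));
    }

    /**
     * words: the sentence; candidates: for each word, its potential tags;
     * trans: P(tag | previous tag), keyed "prev>tag" with "<s>" as sentence start;
     * emit: P(word | tag), keyed "tag|word".
     */
    static List<String> decode(List<String> words, List<List<String>> candidates,
                               Map<String, Double> trans, Map<String, Double> emit) {
        List<Map<String, Double>> score = new ArrayList<>();
        List<Map<String, String>> back = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            Map<String, Double> s = new HashMap<>();
            Map<String, String> b = new HashMap<>();
            for (String tag : candidates.get(i)) {
                double e = logP(emit, tag + "|" + words.get(i));
                if (i == 0) {
                    s.put(tag, logP(trans, "<s>>" + tag) + e);
                    continue;
                }
                // Keep the best-scoring previous tag for this candidate.
                double best = Double.NEGATIVE_INFINITY;
                String bestPrev = null;
                for (Map.Entry<String, Double> prev : score.get(i - 1).entrySet()) {
                    double v = prev.getValue() + logP(trans, prev.getKey() + ">" + tag);
                    if (v > best) { best = v; bestPrev = prev.getKey(); }
                }
                s.put(tag, best + e);
                b.put(tag, bestPrev);
            }
            score.add(s);
            back.add(b);
        }
        LinkedList<String> path = new LinkedList<>();
        if (score.isEmpty()) return path;
        // Pick the best final tag, then follow the back-pointers.
        String cur = null;
        double best = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> last : score.get(score.size() - 1).entrySet()) {
            if (last.getValue() > best) { best = last.getValue(); cur = last.getKey(); }
        }
        for (int i = score.size() - 1; i >= 0; i--) {
            path.addFirst(cur);
            cur = back.get(i).get(cur);
        }
        return path;
    }

    /** Toy sentence "dhahaba al-walad": the first word is ambiguous (verb or noun). */
    static List<String> toyExample() {
        List<String> words = Arrays.asList("dhhb", "alwld"); // transliterated
        List<List<String>> candidates = Arrays.asList(
                Arrays.asList("NOUN", "VERB"), Arrays.asList("NOUN"));
        Map<String, Double> trans = new HashMap<>();   // made-up probabilities
        trans.put("<s>>VERB", 0.6);
        trans.put("<s>>NOUN", 0.4);
        trans.put("VERB>NOUN", 0.7);
        trans.put("NOUN>NOUN", 0.2);
        Map<String, Double> emit = new HashMap<>();    // made-up probabilities
        emit.put("VERB|dhhb", 0.5);
        emit.put("NOUN|dhhb", 0.5);
        emit.put("NOUN|alwld", 1.0);
        return decode(words, candidates, trans, emit);
    }

    public static void main(String[] args) {
        System.out.println(toyExample()); // prints [VERB, NOUN]
    }
}
```

Restricting the search to the candidate tags of each word, rather than to the full tag set, keeps the number of paths small and mirrors the two-module design described above.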
Alkhalil POSTagger is written in Java, so running it requires a Java Virtual Machine, version 1.8 or later. The program can therefore be used on several systems, such as Windows, Linux or macOS.
The program was developed with the NetBeans IDE, which can be downloaded from https://netbeans.apache.org/. To handle Arabic characters correctly, the user must select UTF-8 as the encoding in the project properties.
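The same concern applies to the text files handed to the program. The sketch below shows one way to write and read an Arabic input file with an explicit UTF-8 charset, so that the result does not depend on the platform default. The class name Utf8InputSketch and its helper methods are hypothetical and are not part of Alkhalil POSTagger; the Arabic sentence is written with Unicode escapes so that the source file itself stays ASCII.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

/** Illustrative helpers for UTF-8 text files (hypothetical, not part of the tagger). */
class Utf8InputSketch {

    /** Writes the lines using UTF-8, regardless of the platform default charset. */
    static void writeUtf8(Path file, List<String> lines) {
        try {
            Files.write(file, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads all lines, decoding them as UTF-8. */
    static List<String> readUtf8(Path file) {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes the lines to a temporary file and reads them back. */
    static List<String> roundTrip(List<String> lines) {
        try {
            Path tmp = Files.createTempFile("fileIn", ".txt");
            try {
                writeUtf8(tmp, lines);
                return readUtf8(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two Arabic words (transliterated "dhahaba al-walad"), as Unicode escapes.
        String sentence = "\u0630\u0647\u0628 \u0627\u0644\u0648\u0644\u062f";
        List<String> back = roundTrip(Collections.singletonList(sentence));
        System.out.println(sentence.equals(back.get(0)));
    }
}
```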

Structure of the program

The program is composed of several packages, as shown in Figure 1:

As shown in Figure 1, Alkhalil POSTagger depends on the following libraries: Alkhalil Morpho Sys (“AlKhalil-2.1.21.jar”), “com.fasterxml.jackson.annotations.jar”, “jackson-core-2.2.3.jar” and “jackson-databind-2.2.3.jar”.

Using the API in another project

Add the jar file to the library class path and use the following code to analyse a raw text file:
import parser.TestParser;

// Arguments: input file, output file
TestParser tp = new TestParser();
tp.testerFichier_HMM("fileIn.txt", "fileOut.txt");

where “fileIn.txt” is the name of the file to be processed and “fileOut.txt” is the name of the output file.
To process a file, Alkhalil Morpho Sys (AlKhalil-2.1.21.jar) must be included in the project and the “resources” directory must be placed in the “src” directory (Figure 2).
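Both requirements can be checked before the tagger is called. The following is a hedged sketch of such a pre-flight check: the class SetupCheckSketch is hypothetical, it only tests whether the class parser.TestParser used above can be found on the class path, and it assumes that “src/resources” is resolved relative to the project root passed as argument.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Illustrative pre-flight check (hypothetical, not part of the tagger). */
class SetupCheckSketch {

    /** Returns a list of detected setup problems; an empty list means none were found. */
    static List<String> check(Path projectRoot) {
        List<String> problems = new ArrayList<>();

        // 1. Is the tagger on the class path? The class is looked up without
        //    being initialised, so no tagger code runs here.
        try {
            Class.forName("parser.TestParser", false,
                    SetupCheckSketch.class.getClassLoader());
        } catch (ClassNotFoundException | LinkageError e) {
            problems.add("parser.TestParser not found: add the tagger jar to the class path");
        }

        // 2. Is the "resources" directory inside "src"? (assumed relative to projectRoot)
        Path resources = projectRoot.resolve("src").resolve("resources");
        if (!Files.isDirectory(resources)) {
            problems.add("missing directory: " + resources);
        }
        return problems;
    }

    public static void main(String[] args) {
        // Assumes the program is started from the project root.
        for (String problem : check(Paths.get("."))) {
            System.out.println(problem);
        }
    }
}
```

The check is deliberately limited: it does not verify the version of the jar or the content of the “resources” directory.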

For further details, please see the following paper:

Ababou, N.; Mazroui, A. “A hybrid Arabic POS tagging for simple and compound morphosyntactic tags”. International Journal of Speech Technology, 2016, vol. 19, no. 2, pp. 289-302.
