This project was carried out by the Oujda Natural Language Processing team (Oujda-NLP team) at Mohammed I University, Morocco, with the support of the Arab League Educational, Cultural and Scientific Organization (ALECSO).
Download Source | Download Jar
Abstract: Stemming is a key step in processing morphologically rich languages such as Arabic. It is widely used in fields such as Natural Language Processing, Information Retrieval (IR), and Text Mining. The goal of stemming is to reduce inflected or derived word forms to their base form (root or stem). Because Arabic relies mainly on roots and patterns to generate words, we developed a new, efficient heavy/light stemmer based on the interaction between roots and patterns; it also draws on rich linguistic resources. The stemmer provides three different outputs: a root, a stem, or a stem/root combination. In this paper, we highlight the performance of the stemmer through various experiments on both Modern Standard Arabic and Classical Arabic. The achieved accuracies are 96.93% on the Quranic corpus “Al-Mus’haf” and 96.56% on the NEMLAR corpus. For usability testing, the effect of the stemmer on IR and Part of Speech (PoS) tagging is studied. The results indicate an improvement of 10.98% in PoS tagging and of 14.12% in search efficiency.
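To make the root/pattern idea concrete, here is a minimal toy sketch in Python. It is not the Oujda-NLP stemmer's code or API: the function names (`light_stem`, `match_root`, `stem_word`) and the tiny affix, pattern, and root lists are assumptions made up for illustration, and the real stemmer relies on much richer linguistic resources. The sketch strips one prefix and one suffix (the light step), then aligns the stem against patterns written with the conventional placeholder letters ف, ع, ل and accepts the extracted letters only if they form a known root (the heavy step).

```python
# Toy illustration only: NOT the actual algorithm or data of the stemmer.
# All lists below are hypothetical and deliberately tiny.
PREFIXES = ["وال", "بال", "ال", "و"]   # longest first
SUFFIXES = ["ون", "ات", "ين", "ة"]
PATTERNS = ["فاعل", "مفعول", "فعيل", "فعل"]  # ف, ع, ل mark the root slots
ROOTS = {"كتب", "درس", "علم"}


def light_stem(word):
    """Strip at most one known prefix and one known suffix,
    keeping at least three letters (the light stemming step)."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[:-len(suffix)]
            break
    return word


def match_root(stem):
    """Return the root if the stem fits a pattern and the letters in the
    ف/ع/ل slots form a root from the lexicon; otherwise return None."""
    for pattern in PATTERNS:
        if len(pattern) != len(stem):
            continue
        slots = {}
        fits = True
        for pattern_char, stem_char in zip(pattern, stem):
            if pattern_char in "فعل":
                slots[pattern_char] = stem_char   # placeholder: capture letter
            elif pattern_char != stem_char:
                fits = False                      # fixed pattern letter differs
                break
        if fits:
            root = slots["ف"] + slots["ع"] + slots["ل"]
            if root in ROOTS:
                return root
    return None


def stem_word(word):
    """Return a (stem, root) pair; root is None when no pattern/root matches."""
    stem = light_stem(word)
    return stem, match_root(stem)
```

For example, under these toy lists `stem_word("الكاتب")` strips the article and aligns the remaining stem with the pattern فاعل, which yields the root كتب. Checking the candidate against a root lexicon is what keeps a stem that merely happens to fit a pattern from producing a spurious root.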
For further details, please see the following paper: