MSc Dissertation: "A Part of Speech Tagging Model for Albanian", by Besmir Hasanaj
ABSTRACT
With the enormous growth of digital information, it is necessary to find
advanced ways to process it. The goal is to enhance information retrieval,
information extraction and natural language processing. The most complicated
process is text mining, which deals with finding high-quality information in
text. This dissertation presents a statistical part-of-speech
tagger for Albanian. Training, testing and evaluation are done with the
Apache OpenNLP tool. Tagging is performed with both a basic and a large
tagset. The experiments are performed on a tagger model trained on a corpus
composed of standard Albanian text written by Albanian authors. The tagger
model is tested using cross-validation and a sample text. Results showed that
the accuracy of the
trained tagger model was about 70% in real testing environments and up to 98%
when the environment settings were optimized for the best accuracy. It was
also observed that the overall accuracy of this model depends on the number of
training tokens, the level of grammatical and morphological complexity of the
text, special cases in language expressions, and other factors.
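As an illustration of the evaluation described above, the following minimal sketch shows how token-level accuracy and k-fold cross-validation can be computed for a part-of-speech tagger. It does not use Apache OpenNLP and is not the dissertation's model: the most-frequent-tag baseline, the function names (`train_baseline`, `tag`, `token_accuracy`, `cross_validate`) and the way sentences are assigned to folds are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# A sentence is a list of (token, tag) pairs.
# The tagger below is a most-frequent-tag baseline used as a stand-in;
# it is an assumption for illustration, NOT the OpenNLP model.


def train_baseline(sentences):
    """Learn the most frequent tag per token, plus a fallback tag."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in sentences:
        for token, tag_label in sentence:
            counts[token.lower()][tag_label] += 1
            all_tags[tag_label] += 1
    default_tag = all_tags.most_common(1)[0][0]
    lexicon = {tok: c.most_common(1)[0][0] for tok, c in counts.items()}
    return lexicon, default_tag


def tag(model, tokens):
    """Tag known tokens from the lexicon, unknown ones with the fallback."""
    lexicon, default_tag = model
    return [lexicon.get(tok.lower(), default_tag) for tok in tokens]


def token_accuracy(model, sentences):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for sentence in sentences:
        tokens = [tok for tok, _ in sentence]
        gold = [tag_label for _, tag_label in sentence]
        predicted = tag(model, tokens)
        correct += sum(p == g for p, g in zip(predicted, gold))
        total += len(gold)
    return correct / total if total else 0.0


def cross_validate(sentences, k):
    """Return one accuracy score per fold (sentence i goes to fold i % k)."""
    scores = []
    for fold in range(k):
        test = sentences[fold::k]
        train = [s for i, s in enumerate(sentences) if i % k != fold]
        scores.append(token_accuracy(train_baseline(train), test))
    return scores
```

Averaging the scores returned by `cross_validate` gives a single cross-validated accuracy. Evaluating on text the model has never seen, such as a separate sample text, typically gives a lower figure because more tokens are unknown to the model.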
For the full version of the thesis, contact Besmir Hasanaj at Besmir.Hasanaj@albtelecom.al.