GATE: Russian POS tagger
As I wrote before, GATE is a "good to start with" open source framework for natural language processing. I'll try to share interesting observations and findings about the capabilities of this system. First of all, I would like to describe two approaches to processing the Russian language in it.
The Russian segment of the World Wide Web is a fast-growing source of data with the potential to become a huge electronic market, so the ability to process its content and draw reasonable conclusions from it could become quite important for international corporations very soon. At the same time, there are not many solutions and technologies that focus on Russian texts. Support for the Russian language has always been secondary for companies and technologies in the NLP field, and GATE is no exception. But here are a few ways to add Russian morphological analysis to a GATE-based application.
The first approach is based on MyStem, a commercial product (available for non-commercial use) from Yandex (www.yandex.ru). On the ITBrains website you can find detailed steps and download links that will help you add Russian POS tagger functionality to GATE. Unfortunately, that site is experiencing technical issues, so I have preserved the relevant materials here: Ru-morph-tagger.zip (the plugin) and Documentation (a brief guide).
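To give a feel for the kind of information a MyStem-based tagger works with, here is a small Python sketch. It assumes annotated tokens of the shape surface{lemma=POS,grammemes|other analyses}, which is how I remember MyStem's command-line output looking; please check it against your MyStem version. The function name first_analyses is my own and is not part of the plugin or of GATE. It only pulls the surface form, lemma and part-of-speech tag from the first analysis of each token and ignores the rest.

```python
import re

# Assumed token shape (not taken from the plugin itself):
#   surface{lemma=POS,grammemes...|alternative analyses...}
# An optional "?" after the lemma (a guessed lemma) is tolerated.
TOKEN = re.compile(r"(\w+)\{(\w+)\??=([A-Z]+)")


def first_analyses(line):
    """Return (surface, lemma, pos) for every annotated token in the line.

    Only the first analysis of each token is used; alternatives after
    "|" and the detailed grammemes are ignored.
    """
    return [match.groups() for match in TOKEN.finditer(line)]


if __name__ == "__main__":
    sample = "мама{мама=S,жен,од=им,ед}мыла{мыть=V,несов=прош,ед,изъяв,жен,пе|мыло=S,сред,неод=род,ед}"
    for surface, lemma, pos in first_analyses(sample):
        print(surface, lemma, pos)
```

In a GATE application the same three values would typically end up as features on a token annotation, but how the plugin does that is described in its own guide, not here.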
Here is another way to embed support for the Russian language: Russian POS tagger
Please let me know if these materials are useful to anyone and whether a translation is needed anywhere. I'd be glad to help.