Plush Thoughts

GATE: Russian POS tagger

Posted in gate, Information Extraction, language processing, natural language processing, NLP by plushloony on October 1, 2009

As I wrote before, GATE is a ‘good to start with’ open source framework for natural language processing. I’ll try to write up interesting observations and findings related to the capabilities of this system. First of all, I would like to share two approaches to processing the Russian language in it.

The Russian segment of the World Wide Web is a fast-growing source of data and a potentially huge electronic market, so the ability to process its content and draw reasonable conclusions from it could become quite important for international corporations very soon. At the same time, there are not many solutions and technologies that focus on Russian texts. Support for the Russian language has always been secondary for companies and technologies in the NLP field, and GATE is no exception. But here are a few ways to add Russian morphological analysis to a GATE-based application.

The first approach is based on MyStem, a commercial product from Yandex (www.yandex.ru) that is available for non-commercial use. On the ITBrains website you can find detailed steps and download links that will help you add Russian POS tagger functionality to GATE. Unfortunately, that site is experiencing technical issues, so I have preserved the corresponding materials here: Ru-morph-tagger.zip (plugin) and Documentation (brief guide).

Here is another way to embed support for the Russian language: Russian POS tagger

Please let me know if these materials are useful to anyone and if a translation is needed anywhere. I’d be glad to help.
