Plush Thoughts

NLP for free! NLP for fun!

Posted in gate, Information Extraction, NLP by plushloony on July 31, 2009

I can’t say that I know many services on the web that offer the results of sophisticated natural language processing (NLP). OK, there are the big search engines, news aggregators, plagiarism detectors, some academic research projects and… what else? Sure, at the corporate and government level there are systems that process data such as unstructured customer feedback, communications (for security purposes), news and so on, but NLP gives almost nothing to ordinary people. Nothing just for fun! I doubt it’s hard to come up with a cool problem to solve with NLP methods; I simply believe that not many people know how interesting these problems are to work on. Moreover, most of the software created for NLP and information extraction comes out of research and is absolutely free. Don’t assume it’s all ugly student-made programs. If you think so, take a look at GATE (General Architecture for Text Engineering) from the University of Sheffield.

It’s a really powerful piece of open source software that already has dozens of extensions and can be used for almost any text processing task. The system is well documented and comes with a preconfigured module called ANNIE that handles the standard tasks of annotating English text: tokenizing, stemming, adding morphological information and extracting named entities. It’s also simple to write your own grammar rules to extract whatever kind of information you need. Try playing with it and maybe you’ll get an idea of how to get some value out of it! I’ll share some experience of using it in future posts.
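To give a flavour of what "tokenize, then apply a hand-written rule" means, here is a tiny Python sketch. To be clear, this is not GATE and not how ANNIE is implemented: the `TITLES` list, the `tokenize` and `find_people` functions and the rule itself are all made up for illustration. The rule just says "a title from a lookup list, an optional dot, then one or more capitalised words is probably a person".

```python
import re

# Toy lookup list (hypothetical, not one of ANNIE's real gazetteer lists).
TITLES = {"Mr", "Mrs", "Ms", "Dr", "Prof"}


def tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def find_people(text):
    """Apply one rule: title, optional '.', then one or more capitalised words."""
    tokens = tokenize(text)
    people = []
    i = 0
    while i < len(tokens):
        if tokens[i] in TITLES:
            j = i + 1
            # Skip the dot in abbreviations such as "Dr."
            if j < len(tokens) and tokens[j] == ".":
                j += 1
            # Collect the run of capitalised alphabetic tokens that follows.
            name = []
            while j < len(tokens) and tokens[j].isalpha() and tokens[j][0].isupper():
                name.append(tokens[j])
                j += 1
            if name:
                people.append(" ".join(name))
                i = j
                continue
        i += 1
    return people


print(find_people("Yesterday Dr. Ada Lovelace met Mr Babbage in London."))
```

A real system layers many such rules over much richer annotations (parts of speech, lookup lists, sentence boundaries), which is exactly the plumbing a framework like GATE gives you for free.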
