A few months ago I was working on a project that had a word cloud-like feature. A word cloud is an interesting way to visually represent a popular theme or topic. I had a dataset of user reviews from another project that we wanted to parse and use. This was my first exposure to Natural Language Processing (NLP) and other advanced text analytics tools.


Eventually I came across a wiki article entitled “A quick guide to using OpenNLP from .NET” that introduced me to a remarkable project called IKVM.NET. After generating a shiny new .NET OpenNLP assembly with the steps provided, I was able to use the OpenNLP namespaces with ease in my project.

The first step in using the parsers in OpenNLP was to instantiate a model using Java streams. I created a base class for my NounPhraseParser with a utility method to help load these models.
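A base class along these lines can be sketched roughly as follows. This is a minimal sketch rather than the original code: it assumes the project references the IKVM runtime and an IKVM-compiled OpenNLP assembly (which expose the `java.io` and `opennlp.tools.parser` namespaces to .NET), and the names `OpenNlpParserBase` and `LoadModel` are hypothetical, chosen only for illustration.

```csharp
using System;
using System.IO;

// Assumption: the IKVM runtime and an IKVM-compiled OpenNLP assembly are referenced,
// so the Java namespaces java.io and opennlp.tools.parser are visible from C#.
// OpenNlpParserBase and LoadModel are hypothetical names used for illustration.
public abstract class OpenNlpParserBase
{
    // Opens a Java input stream over the model file, hands it to the supplied
    // factory (typically a model constructor), and always closes the stream.
    protected static T LoadModel<T>(string modelPath, Func<java.io.InputStream, T> createModel)
    {
        if (!File.Exists(modelPath))
            throw new FileNotFoundException("OpenNLP model file not found.", modelPath);

        java.io.InputStream stream = new java.io.FileInputStream(modelPath);
        try
        {
            return createModel(stream);
        }
        finally
        {
            stream.close();
        }
    }
}

public class NounPhraseParser : OpenNlpParserBase
{
    private readonly opennlp.tools.parser.Parser parser;

    // parserModelPath points at a trained OpenNLP parser model file on disk.
    public NounPhraseParser(string parserModelPath)
    {
        var model = LoadModel<opennlp.tools.parser.ParserModel>(
            parserModelPath,
            stream => new opennlp.tools.parser.ParserModel(stream));

        parser = opennlp.tools.parser.ParserFactory.create(model);
    }
}
```

Passing the model constructor in as a delegate keeps the stream handling in one place, so other OpenNLP models that are built from an input stream could reuse the same helper.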



I think this project worked out remarkably well. I don’t know if I’ll attempt to use something like this in a production environment, but if nothing else it was a very enlightening foray into the interesting world of Natural Language Processing. There are many other subjects in this area that I would like to explore, such as Sentiment Analysis and ways to identify subjects of significance in large bodies of text. As the IBM Watson project demonstrated to us not too long ago, this is a young field with staggering potential. The current trajectory of research, along with significant advances in computational capability, suggests it won’t be long before we can communicate with computers and information systems as easily as we talk to our best friends.



I can't believe it's been 6 years since I've blogged about OpenNLP (sigh, and I've still not worked on the project I had meant to when watching for it then... It's on the list still... but...). Anyway... If you've wanted to do natural language processing (NLP) and are looking for options, then check out Sean's post...

