Sunday, October 30, 2005

"Statistical parsing of English sentences"

Statistical parsing of English sentences - The Code Project - C# Programming

"...OpenNLP is both the name of a group of open source projects related to natural language processing (NLP), and the name of a library of NLP tools written in Java by Jason Baldridge, Tom Morton, and Gann Bierner. My C# port is based upon the latest version (1.2.0) of the Java OpenNLP tools, released in April 2005. Development of the Java library is ongoing, and I hope to update the C# port as new developments occur.

Tools included in the C# port are: a sentence splitter, a tokenizer, a part-of-speech tagger, a chunker (used to 'find non-recursive syntactic annotation such as noun phrase chunks'), a parser, and a name finder. ... All of these tools are driven by maximum entropy models processed by the SharpEntropy library.


Name finding

"Name finding" is the term used by the OpenNLP library to refer to the identification of classes of entity within the sentence - for example, people's names, locations, dates, and so on. The name finder can find up to seven different types of entity, represented by the seven maximum entropy model files in the NameFind subfolder - date, location, money, organization, percentage, person and time. It would, of course, be possible to train new models using the SharpEntropy library, to find other classes of entity. Since this algorithm is dependent on the use of training data, and there are many, many tokens that might come into a category such as "person" or "location", it is far from foolproof. ..."
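
To get a feel for what a maximum entropy model is doing when it labels a token, here is a minimal Python sketch. It is my own toy, not SharpEntropy or OpenNLP code: the feature names, the weights and the three outcomes are all made up for illustration, and it uses one small model where the library ships a separate model file per entity type. It only shows the general shape of the calculation: add up the weights of the features that fire for a token, exponentiate, and normalise across the outcomes.

```python
import math

# Hypothetical hand-picked weights, NOT taken from any OpenNLP model file.
# Keys are (feature, outcome) pairs; a real model learns these from training data.
WEIGHTS = {
    ("capitalized", "person"): 1.2,
    ("capitalized", "location"): 1.0,
    ("capitalized", "other"): -0.5,
    ("prev=Mr.", "person"): 2.5,
    ("prev=in", "location"): 1.8,
    ("lowercase", "other"): 2.0,
}
OUTCOMES = ["person", "location", "other"]


def features(tokens, i):
    """Extract a few toy context features for the token at position i."""
    feats = ["capitalized" if tokens[i][:1].isupper() else "lowercase"]
    if i > 0:
        feats.append("prev=" + tokens[i - 1])
    return feats


def classify(tokens, i):
    """Return a probability per outcome using the maximum entropy form:
    p(outcome | context) = exp(sum of active weights) / normaliser."""
    feats = features(tokens, i)
    scores = {
        o: math.exp(sum(WEIGHTS.get((f, o), 0.0) for f in feats))
        for o in OUTCOMES
    }
    total = sum(scores.values())
    return {o: s / total for o, s in scores.items()}


tokens = "Mr. Jordan lives in Jordan".split()
for i, tok in enumerate(tokens):
    probs = classify(tokens, i)
    best = max(probs, key=probs.get)
    print(f"{tok:8s} {best:9s} {probs[best]:.2f}")
```

With these invented weights the same token, "Jordan", comes out as a person after "Mr." and as a location after "in", while "Mr." itself gets mislabelled as a person simply because it is capitalized. That is the "far from foolproof" part in miniature: the answer depends entirely on which context features the training data happened to reward.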

I should be able to use this, or something like it.

Hum... Interesting...
