Sunday, October 30, 2005

"Statistical parsing of English sentences"

Statistical parsing of English sentences - The Code Project - C# Programming

"...OpenNLP is both the name of a group of open source projects related to natural language processing (NLP), and the name of a library of NLP tools written in Java by Jason Baldridge, Tom Morton, and Gann Bierner. My C# port is based upon the latest version (1.2.0) of the Java OpenNLP tools, released in April 2005. Development of the Java library is ongoing, and I hope to update the C# port as new developments occur.

Tools included in the C# port are: a sentence splitter, a tokenizer, a part-of-speech tagger, a chunker (used to 'find non-recursive syntactic annotation such as noun phrase chunks'), a parser, and a name finder. ... . All of these tools are driven by maximum entropy models processed by the SharpEntropy library.

...

Name finding

"Name finding" is the term used by the OpenNLP library to refer to the identification of classes of entity within the sentence - for example, people's name, locations, dates, and so on. The name finder can find up to seven different types of entity, represented by the seven maximum entropy model files in the NameFind subfolder - date, location, money, organization, percentage, person and time. It would, of course, be possible to train new models using the SharpEntropy library, to find other classes of entity. Since this algorithm is dependent on the use of training data, and there are many, many tokens that might come into a category such as "person" or "location", it is far from foolproof. ..."

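To make the "driven by maximum entropy models" part concrete, here is a minimal sketch in Python of how such a model could score one token against a few entity classes. It is not the OpenNLP or SharpEntropy API. The feature names, the weights, and the `extract_features` and `classify` helpers are all made up for illustration, and a real model would learn its weights from annotated training data.

```python
import math

# Hypothetical feature weights, invented purely for illustration.
# A trained maximum entropy model would learn these from annotated text.
WEIGHTS = {
    ("person", "is_capitalized"): 1.2,
    ("person", "prev_word=mr."): 2.5,
    ("location", "is_capitalized"): 1.0,
    ("location", "prev_word=in"): 2.0,
    ("other", "is_lowercase"): 1.5,
}
OUTCOMES = ("person", "location", "other")


def extract_features(tokens, i):
    """Return the (hypothetical) contextual features for the token at index i."""
    token = tokens[i]
    features = ["is_capitalized" if token[:1].isupper() else "is_lowercase"]
    if i > 0:
        features.append("prev_word=" + tokens[i - 1].lower())
    return features


def classify(tokens, i):
    """Return a probability for each outcome: exp(sum of weights) / normaliser."""
    features = extract_features(tokens, i)
    scores = {
        outcome: math.exp(sum(WEIGHTS.get((outcome, f), 0.0) for f in features))
        for outcome in OUTCOMES
    }
    total = sum(scores.values())
    return {outcome: score / total for outcome, score in scores.items()}


if __name__ == "__main__":
    sentence = ["Mr.", "Smith", "lives", "in", "Paris"]
    for index, word in enumerate(sentence):
        probabilities = classify(sentence, index)
        best = max(probabilities, key=probabilities.get)
        print(word, best, round(probabilities[best], 3))
```

The sketch also shows why the quoted article calls the approach "far from foolproof": a token is only labelled well when its features were seen, with useful weights, in the training data.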

I should be able to use this, or something like it.

Hum... Interesting...
