That’s a hOOt! A from scratch, C# based full text indexer and search engine
CodeProject - hOOt - full text search engine
hOOt is an extremely small and fast embedded full text search engine for .NET, built from scratch using an inverted WAH bitmap index. Most people are familiar with an Apache project by the name of Lucene.net, which is a port of the original Java version. Many people have complained in the past that the .NET version of Lucene is not maintained, and many unsupported ports of the original exist. To circumvent this I have created this project, which does the same job and is smaller, simpler and faster.
hOOt is part of my upcoming RaptorDB document store database, and was so successful that I decided to release it as a separate entity in the meantime.
hOOt uses the following articles:
- WAH compressed BitArray found here (WAHBitArray.aspx)
- mini Log4net replacement found here (http://www.codeproject.com/KB/miscctrl/minilog4net.aspx)
- MurMur2 hash index and storage file from RaptorDB found here (RaptorDB.aspx)
- fastJSON serializer found here (http://www.codeproject.com/KB/IP/fastJSON.aspx)
- IFilter without COM by Eyal Post found here (http://www.codeproject.com/KB/cs/IFilter.aspx) for the sample application
Based on the response and reaction of users to this project, I will upgrade and enhance hOOt to full feature compatibility with lucene.net, so show your love.…
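For readers wondering what the "WAH" in "inverted WAH bitmap index" refers to, here is a minimal sketch of the general word-aligned hybrid idea: a bitmap is cut into 31-bit groups, mixed groups are stored as literal 32-bit words, and runs of all-zero or all-one groups collapse into a single fill word with a counter. This is a simplified illustration in Python, not the WAHBitArray code that hOOt uses; the names `wah_encode` and `wah_decode` are hypothetical.

```python
GROUP_BITS = 31            # payload bits carried by each 32-bit word
MAX_FILL = 0x3FFFFFFF      # largest run length a fill word can hold


def wah_encode(bits):
    """Compress a list of 0/1 values into a list of 32-bit WAH-style words."""
    words = []
    # Pad with zeros so the bitmap splits into whole 31-bit groups.
    padded = bits + [0] * (-len(bits) % GROUP_BITS)
    for start in range(0, len(padded), GROUP_BITS):
        group = padded[start:start + GROUP_BITS]
        value = int("".join(map(str, group)), 2)
        if value == 0 or value == (1 << GROUP_BITS) - 1:
            fill_bit = 1 if value else 0
            last = words[-1] if words else 0
            same_fill = (last >> 31) == 1 and ((last >> 30) & 1) == fill_bit
            if same_fill and (last & MAX_FILL) < MAX_FILL:
                words[-1] += 1  # extend the current run by one group
            else:
                # Fill word: top bit 1, next bit is the fill value, rest is the count.
                words.append((1 << 31) | (fill_bit << 30) | 1)
        else:
            words.append(value)  # literal word: top bit 0, 31 raw bits
    return words


def wah_decode(words, length):
    """Expand WAH-style words back into the first `length` bits."""
    bits = []
    for word in words:
        if word >> 31:  # fill word
            fill_bit = (word >> 30) & 1
            bits.extend([fill_bit] * (GROUP_BITS * (word & MAX_FILL)))
        else:           # literal word
            bits.extend(int(c) for c in format(word, "031b"))
    return bits[:length]
```

A sparse bitmap, such as "which documents contain this rare word", is mostly zeros, so long stretches shrink to a single fill word, which is why this kind of encoding suits an inverted index.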
Why Another Full Text Indexer?
I was always fascinated by how Google searches in general, and by the Lucene indexing technique and its internal algorithms, but it was just too difficult to follow, and anyone who has worked with lucene.net will attest that it is a complicated and convoluted piece of code. While some people are trying to create a more .NET-optimized version, the fact of the matter is that it is not easy to do with that code base. What amazes me is that nobody has rewritten it from scratch.
hOOt is much simpler, smaller and faster than lucene.net.
One of the reasons for creating hOOt was for implementing full text search on string columns in RaptorDB - the document store version. Hopefully more people will be able to use and extend hOOt instead of lucene.net as it is much easier to understand and change.
hOOt has been built with the following features in mind:
- Blazing fast operating speed (see performance test section)
- Incredibly small code size.
- Uses WAH compressed BitArrays to store information.
- Multi-threaded implementation, meaning you can query while indexing.
- Tiny size: only a 38 KB DLL (lucene.net is ~300 KB).
- Highly optimized storage, typically ~30% smaller than lucene.net (the more in the index the greater the difference).
- Query strings are parsed on spaces with the AND operator (i.e. all words must exist).
The following limitations are in this release:
…
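To make the "parsed on spaces with the AND operator" behaviour concrete, here is a minimal sketch of an inverted bitmap index: every word maps to a bitmap with one bit per document, and a query ANDs the bitmaps of all its words. It is written in Python for brevity and is an illustration of the concept only, not hOOt's API; the `TinyIndex` class and its methods are hypothetical, and a plain integer stands in for the compressed bit array.

```python
class TinyIndex:
    """Toy inverted index: one integer bitmap per word, one bit per document."""

    def __init__(self):
        self.bitmaps = {}    # word -> int whose bit N is set if doc N has the word
        self.doc_count = 0

    def add(self, text):
        """Index a document and return its id."""
        doc_id = self.doc_count
        self.doc_count += 1
        for word in set(text.lower().split()):
            self.bitmaps[word] = self.bitmaps.get(word, 0) | (1 << doc_id)
        return doc_id

    def query(self, query_string):
        """Return ids of documents containing ALL space-separated words."""
        terms = query_string.lower().split()
        if not terms:
            return []
        result = (1 << self.doc_count) - 1   # start with every document
        for term in terms:
            result &= self.bitmaps.get(term, 0)   # unknown word -> no matches
        return [d for d in range(self.doc_count) if (result >> d) & 1]


index = TinyIndex()
index.add("the quick brown fox")
index.add("the lazy dog")
index.add("quick dog")
matches = index.query("the quick")
```

Because each query term costs only one bitwise AND over a bitmap, adding more words to a query narrows the result without rescanning any document text.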
What I really liked about this project was that he not only provided some cool functionality and code (and the implementation of the IFilter stuff), but also explained the concepts behind the code.
What I couldn't help thinking about was how cool this project would be if it were mashed up with the Interactive WinForm Tag Cloud Control (think “Cool, I can add a Word/Tag Cloud thing to my WinForm app!”)… hmm…