hOOt is an extremely small and fast embedded full-text search engine for .NET, built from scratch using an inverted WAH bitmap index. Most people are familiar with an Apache project by the name of Lucene.net, which is a port of the original Java version. Many people have complained in the past that the .NET version of Lucene is not maintained, and many unsupported ports of the original exist. To circumvent this I have created this project, which does the same job and is smaller, simpler and faster.
hOOt is part of my upcoming RaptorDB document store database, and was so successful that I decided to release it as a separate entity in the meantime.
hOOt uses the following articles:
- WAH compressed BitArray found here (WAHBitArray.aspx)
- mini Log4net replacement found here (http://www.codeproject.com/KB/miscctrl/minilog4net.aspx)
- MurMur2 hash index and storage file from RaptorDB found here (RaptorDB.aspx)
- fastJSON serializer found here (http://www.codeproject.com/KB/IP/fastJSON.aspx)
- IFilter without COM by Eyal Post found here (http://www.codeproject.com/KB/cs/IFilter.aspx) for the sample application
Based on the response and reaction of users to this project, I will upgrade and enhance hOOt to full feature compatibility with Lucene.net, so show your love.
I was always fascinated by how Google searches in general, and by Lucene's indexing technique and its internal algorithms, but it was just too difficult to follow, and anyone who has worked with Lucene.net will attest that it is a complicated and convoluted piece of code. While some people are trying to create a more .NET-optimized version, the fact of the matter is that it is not easy to do with that code base. What amazes me is that nobody has rewritten it from scratch.
hOOt is much simpler, smaller and faster than Lucene.net.
One of the reasons for creating hOOt was for implementing full-text search on string columns in RaptorDB - the document store version. Hopefully more people will be able to use and extend hOOt instead of Lucene.net, as it is much easier to understand and change.
hOOt has been built with the following features in mind:
- Blazing fast operating speed (see performance test section)
- Incredibly small code size.
- Uses WAH compressed BitArrays to store information.
- Multi-threaded implementation, meaning you can query while indexing.
- Tiny size: the DLL is only 38 KB (Lucene.net is ~300 KB).
- Highly optimized storage, typically ~30% smaller than lucene.net (the more in the index the greater the difference).
- Query strings are parsed on spaces with the AND operator (i.e. all words must exist).
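To make the bitmap index and AND query points above a little more concrete, here is a minimal sketch of the general idea. It is written in Python purely for brevity and is not hOOt's code or API: the `TinyIndex` class and its `add` and `query` methods are hypothetical names, and a plain integer stands in for the bitmap, so the WAH compression step is left out entirely.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted bitmap index. All names here are hypothetical, not hOOt's API."""

    def __init__(self):
        # word -> bitmap kept as an int; bit n set means document n contains the word.
        # Assumption: an uncompressed int replaces the WAH-compressed bit array.
        self.bitmaps = defaultdict(int)
        self.doc_count = 0

    def add(self, text):
        """Index one document and return its document number."""
        doc_id = self.doc_count
        self.doc_count += 1
        for word in set(text.lower().split()):
            self.bitmaps[word] |= 1 << doc_id
        return doc_id

    def query(self, query_string):
        """Return the numbers of the documents that contain every query word."""
        words = query_string.lower().split()  # parse the query on spaces
        if not words:
            return []
        result = (1 << self.doc_count) - 1  # start with every document selected
        for word in words:
            # AND the bitmaps together; an unknown word gives an empty bitmap
            result &= self.bitmaps.get(word, 0)
        return [n for n in range(self.doc_count) if (result >> n) & 1]


index = TinyIndex()
index.add("the quick brown fox")
index.add("the lazy brown dog")
index.add("a quick red fox")

print(index.query("quick fox"))  # [0, 2]
print(index.query("brown"))      # [0, 1]
print(index.query("quick dog"))  # []
```

Because each word's postings are a bitmap, an AND query comes down to a handful of bitwise operations no matter how many documents match, which is one reason this layout can be fast.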
The following limitations are in this release:
What I really liked about this project was that he not only provided some cool functionality and code (and the implementation of the IFilter stuff), but also explained the concepts behind the code.
What I couldn't help thinking about was how cool this project would be if it was mashed up with the Interactive WinForm Tag Cloud Control (think “Cool, I can add a Word/Tag Cloud thing to my WinForm app!”)… hmm…