Wednesday, September 16, 2009

Lucene.Net and Azure, yes you can…

Microsoft Research - Azure Library for Lucene.Net

“Lucene works on top of an abstract store object called Directory. There are several Directory objects, including FSDirectory, for file systems, and RAMDirectory, for in-memory store. Azure Library for Lucene.Net implements a smart blob-storage Directory object called AzureDirectory which enables the use of Lucene.NET on top of Azure Blob Storage. AzureDirectory automatically creates a local cache of blobs and intelligently auto-uploads them on the fly.
Download Details
File Name: AzureDirectory.zip
Version: 1.0
Date Published: 28 July 2009
Download Size: 0.29 MB
…” [GD: Description Leach Level:95%]
PLEASE NOTE the license:
“…You may use, copy, reproduce, and distribute this Software for any non-commercial purpose, subject to the restrictions in this MSR-LA. Some purposes which can be non-commercial are teaching, academic research, public demonstrations and personal experimentation. You may also distribute this Software with books or other teaching materials, or publish the Software on websites, that are intended to teach the use of the Software for academic or other non-commercial purposes…”
Here are the contents of the download:
[image: contents of the download]
From the Readme.html:
“…
Background
Lucene.NET
Lucene is a mature Java based open source full text indexing and search engine and property store.
Lucene.NET is a mature port of that library to C#.
Lucene/Lucene.Net provides:
* Super simple API for storing documents with arbitrary properties
* Complete control over what is indexed and what is stored for retrieval
* Robust control over where and how things are indexed, how much memory is used, etc.
* Superfast and super rich query capabilities
  • Sorted results
  • Rich constraint semantics AND/OR/NOT etc.
  • Rich text semantics (phrase match, wildcard match, near, fuzzy match etc)
  • Text query syntax (example: Title:(dog AND cat) OR Body:Lucen* )
  • Programmatic expressions
  • Ranked results with custom ranking algorithms
AzureDirectory
AzureDirectory smartly uses a local Directory to cache files as they are created and automatically pushes them to Azure blob storage as appropriate. Likewise, it smartly caches blob files on the client when they change. This provides a nice blend of just-in-time syncing of data local to indexers or searchers across multiple machines.
With the flexibility that Lucene provides over data in memory versus storage and the just-in-time blob transfer that AzureDirectory provides, you have great control over the composability of where data is indexed and how it is consumed.
To be more concrete: you can have 1..N worker roles adding documents to an index, and 1..N searcher webroles searching over the catalog in near real time.
[image]
…”
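To make the caching idea from the Readme a bit more concrete, here is a minimal sketch in Python. It is not the AzureDirectory code and does not use the Lucene.Net or Azure APIs; `FakeBlobStore`, `CachingBlobDirectory`, and the version counter are all hypothetical names I made up for illustration. It only shows the general pattern described above: writes go to a local cache and get pushed to the shared store, and reads re-download a file only when the shared copy has changed.

```python
class FakeBlobStore:
    """In-memory stand-in for blob storage (hypothetical, not the Azure API)."""

    def __init__(self):
        self._blobs = {}  # name -> (version, data)

    def put(self, name, data):
        # Bump a simple version counter on every upload.
        version = self._blobs.get(name, (0, b""))[0] + 1
        self._blobs[name] = (version, data)
        return version

    def version(self, name):
        return self._blobs[name][0]

    def get(self, name):
        return self._blobs[name]


class CachingBlobDirectory:
    """Hypothetical directory that keeps a local cache in front of the store."""

    def __init__(self, store):
        self.store = store
        self.cache = {}      # name -> (version, data)
        self.downloads = 0   # counts how often we had to fetch from the store

    def write_file(self, name, data):
        # Push the file to the shared store and remember it locally.
        version = self.store.put(name, data)
        self.cache[name] = (version, data)

    def read_file(self, name):
        # Re-download only if we have no copy or the shared copy changed.
        remote_version = self.store.version(name)
        cached = self.cache.get(name)
        if cached is None or cached[0] != remote_version:
            self.cache[name] = self.store.get(name)
            self.downloads += 1
        return self.cache[name][1]


# One "indexer" and one "searcher" sharing the same store.
store = FakeBlobStore()
indexer = CachingBlobDirectory(store)
searcher = CachingBlobDirectory(store)

indexer.write_file("segments_1", b"v1")
first = searcher.read_file("segments_1")    # fetched from the store
second = searcher.read_file("segments_1")   # served from the local cache
indexer.write_file("segments_1", b"v2")
third = searcher.read_file("segments_1")    # fetched again after the change
```

The version counter here is just the simplest change marker I could think of for the sketch; how the real library decides that a blob has changed is not covered by the Readme excerpt above.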
There’s been a good number of Lucene.Net posts in the last few weeks (that are in my “stuff I’d like to blog about, but there was something else shiny that distracted me” folder ;) and so this caught my eye.
Azure and Lucene.Net? That’s cool (aka “shiny” ;) I just wish the license was less restrictive…
Update 2/2/2011:
Based on the comment below, it seems this project is now licensed under the MS-PL and can be found at http://code.msdn.microsoft.com/AzureDirectory
Note: That URL returns a "page not found" right this second, but from searching the site, it appears the project is indeed there and the error is temporary...

1 comment:

Anonymous said...

Hey, this project now has an open MS-PL license and is available on http://code.msdn.microsoft.com/AzureDirectory
It would be great for you to update your post