Showing posts with label Lucene. Show all posts

Wednesday, March 19, 2014

Link to Elastic with ElasticLINQ

Brad Wilson - Getting Started with ElasticLINQ

Jim Newkirk and I have been doing xUnit.net for 7 years now (and for Jim, NUnit for many years before that). You could say that open source is part of our blood, and when we left Microsoft, we made sure that open source would continue to be part of our daily efforts at Tier 3.

Fast forward 15 months: Tier 3 has been acquired (and is now the CenturyLink Cloud Development Center), and our first major open source effort Iron Foundry has been accepted into the Cloud Foundry Incubator project. Lots of great developers are working to ensure that you can write .NET code against a Platform-as-a-Service stack that doesn't lock you into a specific vendor.

Today we are proud to announce our second major open source effort: ElasticLINQ.

What is ElasticLINQ?

One of the major challenges when writing distributed software is how to distribute the data. When I started here 15 months ago, we had 4 data centers, and plans to expand into several more over the coming year. The data was being stored primarily in Microsoft SQL Server. As our data center footprint grew, it was becoming clear that centralized data storage was not going to scale with us. Having islands of data means that your application (and your users) can end up spending a lot of time waiting for data requests to go halfway around the world; and if there are any network glitches along the way, you might even fail to get the data entirely.

Almost right away we started evaluating alternatives that would let us keep all the data locally. We decided to use Couchbase as our primary data store, based on its extremely strong Cross Data-Center Replication (XDCR) capabilities. Many object data storage systems end up paired with an index engine for comprehensive searching capabilities. Couchbase provides an indexing integration solution with Elasticsearch, a horizontally scalable wrapper around Lucene.

The Elasticsearch query syntax is based on JSON; Elasticsearch documents are also stored as JSON. Our developers, steeped in the worlds of .NET and SQL Server, were much more comfortable using the Language Integrated Query (LINQ) functions introduced in .NET 3.5.

ElasticLINQ bridges these two worlds by letting us query Elasticsearch using LINQ, and have the results projected into CLR types. We enlisted the expertise of Damien Guard (of Attack Pattern), who worked on both the LINQ to SQL and Entity Framework teams, to do the initial version of ElasticLINQ for us.
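
To make the "bridge" idea concrete, here is a minimal sketch in Python (not ElasticLINQ itself, and not from the original post) of how a couple of filter conditions could be turned into a JSON query body of the kind Elasticsearch accepts. The field names and the `to_elasticsearch_query` helper are hypothetical; the actual JSON that ElasticLINQ emits may well differ.

```python
import json


def to_elasticsearch_query(conditions):
    """Translate (field, operator, value) triples into an Elasticsearch-style query body."""
    range_keys = {">": "gt", ">=": "gte", "<": "lt", "<=": "lte"}
    clauses = []
    for field, op, value in conditions:
        if op == "==":
            # Exact match on a single term.
            clauses.append({"term": {field: value}})
        elif op in range_keys:
            # Comparison operators map onto a range clause.
            clauses.append({"range": {field: {range_keys[op]: value}}})
        else:
            raise ValueError(f"unsupported operator: {op}")
    # All conditions must hold, like && in a LINQ Where clause.
    return {"query": {"bool": {"must": clauses}}}


# Roughly what a LINQ expression such as
#   customers.Where(c => c.State == "wa" && c.Orders >= 10)
# might be translated into (hypothetical field names).
body = to_elasticsearch_query([("state", "==", "wa"), ("orders", ">=", 10)])
print(json.dumps(body, indent=2))
```

The point is only that the developer writes predicates while a translator produces the JSON; a real LINQ provider walks an expression tree rather than a list of tuples.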

How do I use ElasticLINQ?

Connection and Context ...

Querying with the Context ...

Full-text searching ...

Custom queries with ElasticMethods ...

Custom queries and projections with ElasticFields ...

What's next?

This is v1.0 software, so we have a lot left that we can do. We've just recently started using this in our production code, and we are constantly finding new things we want to support. We expect you will come up with things we never dreamed of, too.

We are excited for the community to start using and contributing to ElasticLINQ. The GitHub site is a work in progress. Soon we will get documentation posted to the Wiki pages on the site, and get a real home page set up. We are anxiously awaiting the first community contributed bugs, Wiki edits, and pull requests.

We hope you love using ElasticLINQ as much as we do!

...

Got to love LINQ Love! :)

Now if only I knew something, anything about Elastic... :/

Friday, October 11, 2013

Revisiting Sando - Full Text Index and Source your Source, while never leaving Visual Studio...

Visual Studio Gallery - Sando Code Search Tool

Search your C, C++, C#, and XAML code instantly. Form a better query with identifier-based and phrase-based auto-complete. Explore project terms with the word cloud.

Features

  • Searches source code (C#, C++, C, and xaml) using information retrieval technology
  • Pre-indexes source code to provide near-instant searches
  • Indexes source code once, refreshing only changed files, to avoid unnecessary CPU burden
  • Supports literal searches (e.g., "File f = new File();"), symbol searches (e.g., "_fileDialogTab"), and google-style searches (e.g., "open file")
  • Provides extensive preview of search results with highlighted search terms
  • Highlights search terms in code editor
  • Auto-completion suggests likely query additions (e.g., "open" -> "open file")
  • Auto-corrects spelling (e.g., "solutoin" -> "solution")
  • Auto-recommendation suggests similar words if search term doesn't exist in the source code base (e.g., "fire event" -> "raise event")
  • Provides word cloud of existing terms in source code to help users form a query

Supported Languages: C#, C++, C, xaml

[demo] [source]

Sando: A Fast Local Code Search Engine with Open APIs

Code search sucks. There's no auto-correct or suggestions and results are returned as an unranked, plain-text list. This VS Extension aims to make code search a modern tool by leveraging Lucene to index and search all languages and artifacts, returning results in a rich UI.

....

(via CodeProject - Research-Inspired Extensions Hit Visual Studio App Store)

Blogged about this last year, Code Searching with Sando, because "Code search sucks and Find & Replace is from the 80s...", but of course I forgot all about it right after (those darn curators!). Seeing it today, and seeing that the project is alive and well with very recent check-ins (yesterday), I thought it a good time to remind myself (and you) about this project...

 

Related Past Post XRef:
Code Searching with Sando, because "Code search sucks and Find & Replace is from the 80s..."

Friday, December 14, 2012

Adapting to Lucene.Net with this simplification approach, "Simplifying Lucene using Adapter Pattern, Generics, Reflection, and Custom Attributes"

CodeProject - Simplifying Lucene using Adapter Pattern, Generics, Reflection, and Custom Attributes

I am introducing a draft idea of how we can make Lucene.Net more RAD and also simplify how .NET developers interact with Lucene.Net.

The idea is very simple: we need to let the developer code normally with C# generics. That means the developer just needs to build a list of document objects, List<Student> for example, and pass it to Lucene.Net to store it using the default index and store configuration. To search Lucene.Net, either send a free-text search query, or build a search object that represents your search criteria.

This is done by building an Adapter component that handles the communication between Lucene.Net and C# objects through very simple calls. Custom attributes let us decorate the Document class properties to say explicitly what type of Index or Store each property should be configured with in Lucene.Net.
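
As a rough illustration of the adapter idea (a sketch in Python, not the article's C# code), dataclass field metadata can play the role of custom attributes: each property declares how it should be indexed, and a small adapter reflects over the object to build field descriptions. The `indexed` helper, the `Student` fields, and the `store`/`analyze` flags are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field, fields


def indexed(store=True, analyze=True):
    """Attach per-field index settings, playing the role of a custom attribute."""
    return field(metadata={"store": store, "analyze": analyze})


@dataclass
class Student:
    student_id: str = indexed(store=True, analyze=False)
    name: str = indexed(store=True, analyze=True)
    notes: str = indexed(store=False, analyze=True)


def to_index_fields(obj):
    """Reflect over a decorated object and describe how each property should be indexed."""
    return [
        {
            "name": f.name,
            "value": getattr(obj, f.name),
            "store": f.metadata.get("store", True),
            "analyze": f.metadata.get("analyze", True),
        }
        for f in fields(obj)
    ]


# The caller only builds a plain list of objects; the adapter does the rest.
students = [Student("s-1", "Ada Lovelace", "likes engines")]
documents = [to_index_fields(s) for s in students]
```

A real adapter would hand these descriptions to the search library's document and field types; here they stay as plain dictionaries so the sketch has no dependencies.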

Lucene Cloud Adapter Diagram

Simplifying Lucene using Adapter Pattern, Generics, Reflection, and Custom Attributes

By Haitham Khedre, 14 Dec 2012

I attached complete working examples that describe the idea; please feel free to try them and add your comments on how we can extend this.

..."

Thought this a pretty unique article and wanted to grab it for future reference and review. And I thought you guys might also find it interesting...

Tuesday, November 06, 2012

Lucene.Net "Full-Text Search for Your Intranet or Website using 37 Lines of Code" article updated for 3 & 4 (Lucene.Net 3 and .Net 4)

CodeProject - DotLucene: Full-Text Search for Your Intranet or Website using 37 Lines of Code [Updated]

Update

November 6, 2012: The project is now working with Lucene.Net 3.0 and .NET Framework 4.0. Includes Visual Studio 2010 solution. 

Lucene.Net: Excellent Full-Text Search Engine

Can there be a full-text search coded on 37 lines? Well, I am going to cheat a bit and use Lucene.Net for the dirty work. Lucene.Net is a .NET port of Jakarta Lucene search engine. Here is a quick list of its features:

  • It can be used in ASP.NET, Win Forms or console applications.
  • Very good performance.
  • Ranked search results.
  • Search query highlighting in results.
  • Searches structured and unstructured data.
  • Metadata searching (query by date, search custom fields...).
  • Index size approximately 30% of the indexed text.
  • Can also store full indexed documents.
  • Pure managed .NET.
  • Very friendly licensing (Apache Software License 2.0).
  • Localizable (support for Brazilian, Czech, Chinese, Dutch, English, French, Japanese, Korean and Russian included).
  • Extensible (source code included).

..."

I blogged about this 1.79 million years ago (ok, 7 years ago, but that's close to 1.79M in Net years...), DotLucene: Full-Text Search for Your Intranet or Website using 37 Lines of Code, so when I saw it get an update I thought a shout-out was warranted.

This also highlights the staying power of Lucene...

 

Related Past Post XRef:
DotLucene: Full-Text Search for Your Intranet or Website using 37 Lines of Code

Thursday, August 16, 2012

Lucene 101 at LuceneTutorial.com

LuceneTutorial.com

Discover the Lucene full-text search library

Lucene is an open-source Java full-text search library which makes it easy to add search functionality to an application or website.

The goal of Lucene Tutorial.com is to provide a gentle introduction into Lucene.

First-time Visitors

If this is your first-time here, you most probably want to go straight to the 5 minute introduction to Lucene.

I'd call this a 101 (i.e. an introduction, survey, overview, etc.) site. It doesn't go into the depths of Lucene (and is Java targeted), but there are still some good tidbits here, and its brevity is a plus in that it's easy to quickly scan through. If you've just learned how to spell Lucene and want to know a bit more, this site is for you. If you are a Lucene Master Ninja, then not so much...

(via Dominic Finn - Lucene Tutorial Site)

Tuesday, July 10, 2012

Code Searching with Sando, because "Code search sucks and Find & Replace is from the 80s..."

CodePlex - Sando: A Fast Local Code Search Engine with Open APIs

"Sando: A Fast Local Code Search Engine with Open APIs
Code search sucks. There's no auto-correct or suggestions and results are returned as an unranked, plain-text list. This VS Extension aims to make code search a modern tool by leveraging Lucene to index and search all languages and artifacts, returning results in a rich UI.

Sando in 8 slides...
Code Search Sucks


Top Ten Things that Suck about Code Search:

  1. Unranked results
  2. No suggestions
  3. No auto-correct
  4. No fuzzy matching
  5. No word stemming
  6. Results returned as plain-text
  7. Indexes keywords like "using"
  8. No preview in results
  9. Hard to refine results
  10. Regular expressions are fragile

..."

Visual Studio Gallery - Sando

Because Find & Replace is from the 80s

Code search sucks. It doesn't support google-style queries and results are returned as an unranked, plain-text list. This VS Extension aims to make code search a modern tool by leveraging Lucene to index and search all languages and artifacts, returning results in a rich UI.

Supported Languages: C#, C++, C

[demo] [source]

David C. Shepherd - Code Search in Visual Studio - Video Blog

Ever wonder what Sando's code search looked like in action? This brief video (less than three minutes) will give you a quick overview of Sando's abilities and its advantages over Visual Studio's built-in search.

I love the idea behind this, using a modern full text indexing and searching engine as a backdrop for code search... I also really love that David's released it as OSS.

Wednesday, June 20, 2012

"Getting Started with Lucene.net"

Alkampfer Place - Getting Started with Lucene.net

I started working with Lucene.Net and I should admit that it is a really powerful library, but it is really huge and needs a little bit of time to be mastered completely. Probably one of the best resources to keep in mind is the FAQ, because it contains most of the common questions you can have about Lucene and it is a good place to start. Another good place is the Wiki, which contains other useful information and many links to relevant resources.

Getting started with Lucene.Net is really simple: after you have grabbed the bits and placed a reference in your project, you are ready to search in your “documents”. Lucene has a set of basic concepts that you need to grasp before you start using it. Basically, it has Analyzers that process documents to create indexes, which are stored in a Directory and permit fast search; searches are done with an IndexSearcher, which is capable of searching data inside a directory previously populated by analyzers and indexes. Now let's see how you can index two long strings of text:

...

image..."
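
The analyzer, directory and searcher pipeline described in that excerpt can be pictured with a toy inverted index. This is a sketch in Python under my own assumptions, not Lucene.Net code and not the article's elided example; `analyze` and `InMemoryIndex` are hypothetical names.

```python
from collections import defaultdict


def analyze(text):
    """Toy analyzer: split on whitespace, strip surrounding punctuation, lowercase."""
    tokens = []
    for word in text.split():
        token = word.strip(".,;:!?\"'()").lower()
        if token:
            tokens.append(token)
    return tokens


class InMemoryIndex:
    """Stands in for the directory: maps each term to the ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.documents = {}

    def add_document(self, doc_id, text):
        self.documents[doc_id] = text
        for token in analyze(text):
            self.postings[token].add(doc_id)

    def search(self, query):
        """Return the ids of documents that contain every analyzed query term."""
        terms = analyze(query)
        if not terms:
            return []
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(matches)


index = InMemoryIndex()
index.add_document(1, "Lucene is a full-text search library.")
index.add_document(2, "An analyzer turns text into indexed terms.")
print(index.search("TEXT"))  # [2]
```

Because the same analyzer runs at index time and at query time, the upper-case query still finds the lower-cased term. A real engine adds ranking, persistence and much more.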

I've only been following Lucene.Net (in its many forms) for years, yet I still have not used it in a project. Oh, I have a number of ideas where I want to use it, but the DPO (Domestic Product Owner) hasn't gotten around to prioritizing those tasks (funny how mowing the lawn, fixing and painting are always a higher priority than my personal coding projects... lol)

Anyway, I AM going to use it "one of these days" so I like to grab these kinds of getting started guides. Thought you might find it useful too...

Thursday, January 19, 2012

A "Get right to work introduction to Lucene.Net"

codeguru - Introduction to Lucene.Net

"What is Lucene.Net?

Lucene.Net is an exact port of the original Lucene search engine library, written in C#. It provides a framework (APIs) for creating applications with full text search.

Lucene.Net can be downloaded from http://incubator.apache.org/lucene.net/download.html. Currently it is undergoing incubation at Apache Software Foundation (ASF).

Why Use Lucene.Net?

You can use Lucene.Net to add more power to an already existing search in your ASP.Net web application or website. It can also be used to index and search documents (word, pdf, etc.) within your application.

This article describes how we can use Lucene.Net to add full text search in our ASP.Net applications. Any search function consists of two basic steps, first to index the text and second to search the text. We will use Lucene.Net to do both of the steps.

In this example we will try to read the content of a text file and index it using Lucene.Net. First download the dll and add a reference to the project.

...

image..."

I liked this "get right to work" introduction to Lucene.Net: short and sweet, yet with enough code to get started. Nice.

Friday, December 02, 2011

Two cool NuGets of the day: FlickrNet (3.2.4310) and Lucene.net (2.9.4)

Code Climber - Lucene.net 2.9.4 is out, now with NuGet

There is no official news feed for the project, so I’m just copying here the announcement from their homepage:

We finally got it out the door, it took a lot longer than we expected. However, we have a ton of bug fixes rolled into this release as well as a number of new features.

  • Some of the bug fixes include: concurrency issues, mono compilation issues, and memory leaks.
  • A lot of work has been done to clean up the code base, refactoring the code and project files, and providing build scripts
  • A couple of new features: Search.Regex, Simple Faceted Search, and simple phrase analysis in the Fast Vector Highlighter
  • Download it now on our downloads page

Just around the corner is a 2.9.4g release (early January), that has been substantially refactored and uses generics across the board.

Lucene NuGet Packages:

FlickrNet API Library

The Flickr.Net API Library is a .Net Library for accessing the Flickr API. It is written entirely in C# and can be accessed from the following frameworks:

  • .Net Framework 2.0 and above.
  • .Net Compact Framework 2.0 SP1 and above.
  • Silverlight 3.0 and above.
  • Windows Phone 7
  • Mono
  • Monotouch for iPhone

FlickrNet NuGet Packages:

FlickrNet is one of my favorite managed Flickr libraries and Lucene.Net speaks for itself... :)

Monday, July 18, 2011

Lucene.net, alive and kicking...

"More than 6 months ago I blogged about Lucene.net starting its path toward extinction. Soon after that, due to the "stubbornness" of the main committer, a few forks appeared, the biggest of which was Lucere.net by Troy Howard.

At the end of the year, despite the main committer's promises to comply with the request of the Apache board by himself, nothing happened and Lucene.net came really close to being shut down. But luckily, the same Troy Howard who had forked Lucene.net a few months before decided, together with a bunch of other volunteers, to resubmit the documents required by the Apache Board for starting a new project in the Apache Incubator; by the beginning of February the new proposal was voted for by the Board and the project re-entered the incubator.

Since that day a lot happened to Lucene.net:

What is going to happen next?

At the moment, together with catching up with the long backlog, they are working on a SL/WP7 version of Lucene.net, evaluating a new and better automatic porting system from Java and already looking at starting with Lucene 3.

Shall I start using Lucene.net with confidence?

The answer is definitely yes.  ..."

This is like a granddaddy of Java to .Net ports. I've been following its evolution since 2004, the ups and downs, and it's been an interesting story. Good to see that it's not only not dead, but alive and kicking. The idea of a WP7 version of it already has me thinking projects... :)

Monday, November 01, 2010

Lucene.Net & Azure 101

Windows Azure Technical Forum Support Team Blog - How to use Lucene.Net in Windows Azure

What is Lucene

What is Lucene.Net

How to use Lucene.Net

What is Azure library for Lucene.Net

How to use Azure library for Lucene.Net

Concurrency issues

Conclusions

download…”

Been a while since I’ve blogged about Lucene and since Azure received so much attention last week at PDC2010…

What I liked was how the article walks through some of the code tweaks needed and also touches on handling concurrency.

Related Past Post XRef:
Lucene.Net and Azure, yes you can…

Friday, October 02, 2009

Getting started with Lucene.Net, the DimeCast version

DimeCasts.Net - # 145 - Getting started with Lucene.Net Search Library

“In this episode we are going to start to take a look at the Lucene.Net Search Engine Library. Lucene.Net is a source code, class-per-class, API-per-API and algorithmic port of the Java Lucene search engine to the C# and .NET platform utilizing Microsoft .NET Framework. In this episode we will learn the basics needed to simply get Lucene.net up and running.

…”

I have this off again, off again (no, that’s not a typo ;) thing with Lucene.Net. Been following it for a billion years, want to use it in a project, but have yet to take the jump. One of these days…!

And when I do, this quick 12 minute video looks like a good warm-up point.

Wednesday, September 16, 2009

Lucene.Net and Azure, yes you can…

Microsoft Research - Azure Library for Lucene.Net

“Lucene works on top of an abstract store object called Directory. There are several Directory objects, including FSDirectory, for file systems, and RAMDirectory, for in-memory store. Azure Library for Lucene.Net implements a smart blob-storage Directory object called AzureDirectory which enables the use of Lucene.NET on top of Azure Blob Storage. AzureDirectory automatically creates a local cache of blobs and intelligently auto-uploads them on the fly.
Download Details
File Name: AzureDirectory.zip
Version: 1.0
Date Published: 28 July 2009
Download Size: 0.29 MB
…” [GD: Description Leach Level:95%]
PLEASE NOTE the license;
“…You may use, copy, reproduce, and distribute this Software for any non-commercial purpose, subject to the restrictions in this MSR-LA. Some purposes which can be non-commercial are teaching, academic research, public demonstrations and personal experimentation. You may also distribute this Software with books or other teaching materials, or publish the Software on websites, that are intended to teach the use of the Software for academic or other non-commercial purposes…”
Here’s the contents of the download;
 image
From the Readme.html;
“…
Background
Lucene.NET
Lucene is a mature Java based open source full text indexing and search engine and property store.
Lucene.NET is a mature port of that library to C#.
Lucene/Lucene.Net provides:
  • Super simple API for storing documents with arbitrary properties
  • Complete control over what is indexed and what is stored for retrieval
  • Robust control over where and how things are indexed, how much memory is used, etc.
  • Superfast and super rich query capabilities
      • Sorted results
      • Rich constraint semantics AND/OR/NOT etc.
      • Rich text semantics (phrase match, wildcard match, near, fuzzy match etc)
      • Text query syntax (example: Title:(dog AND cat) OR Body:Lucen* )
      • Programmatic expressions
      • Ranked results with custom ranking algorithms
AzureDirectory
AzureDirectory smartly uses a local Directory to cache files as they are created and automatically pushes them to Azure blob storage as appropriate. Likewise, it smartly caches blob files on the client when they change. This provides a nice blend of just in time syncing of data local to indexers or searchers across multiple machines.
With the flexibility that Lucene provides over data in memory versus storage and the just in time blob transfer that AzureDirectory provides, you have great control over the composability of where data is indexed and how it is consumed.
To be more concrete: you can have 1..N worker roles adding documents to an index, and 1..N searcher webroles searching over the catalog in near real time.
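
The caching behavior described above can be sketched in a few lines of Python. This is an illustration under my own assumptions, not the AzureDirectory implementation: `BlobStore` is a hypothetical in-memory stand-in for blob storage, and version numbers stand in for whatever change detection the real library uses.

```python
class BlobStore:
    """Hypothetical remote store: maps file names to (version, bytes)."""

    def __init__(self):
        self.blobs = {}
        self.downloads = 0

    def upload(self, name, data):
        version = self.blobs.get(name, (0, b""))[0] + 1
        self.blobs[name] = (version, data)
        return version

    def version_of(self, name):
        return self.blobs[name][0]

    def download(self, name):
        self.downloads += 1
        return self.blobs[name]


class CachingDirectory:
    """Serves reads from a local cache and refetches only when the remote version changed."""

    def __init__(self, store):
        self.store = store
        self.cache = {}  # name -> (version, data)

    def write_file(self, name, data):
        # Writes go to the remote store and are cached locally at the same time.
        version = self.store.upload(name, data)
        self.cache[name] = (version, data)

    def read_file(self, name):
        cached = self.cache.get(name)
        if cached is None or cached[0] != self.store.version_of(name):
            cached = self.store.download(name)
            self.cache[name] = cached
        return cached[1]


store = BlobStore()
indexer = CachingDirectory(store)   # plays the worker role adding documents
searcher = CachingDirectory(store)  # plays the web role searching the index

indexer.write_file("segments_1", b"first")
first_read = searcher.read_file("segments_1")   # cache miss: downloaded
second_read = searcher.read_file("segments_1")  # served from the local cache
indexer.write_file("segments_1", b"second")
third_read = searcher.read_file("segments_1")   # remote changed: downloaded again
```

Only two downloads happen for three reads, which is the "just in time" syncing the Readme describes, in miniature.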
image
There’s been a good number of Lucene.Net posts in the last few weeks (that are in my “stuff I’d like to blog about, but there was something else shiny that distracted me” folder ;) and so this caught my eye.
Azure and Lucene.Net? That’s cool (aka “shiny” ;) I just wish the license was less restrictive…
Update 2/2/2011:
Based on the below comment, seems this project is now MS-PL and can be found at http://code.msdn.microsoft.com/AzureDirectory
Note: Seems that URL returns a "page not found" right this second, but in searching the site, it appears this project is indeed there and this error is temporary...

Monday, January 05, 2009

Two new Lucene.Net Articles, Text Analysis and Custom Synonym Analyzer

CodeProject - Lucene.Net - Text Analysis

“Lucene.Net is a high performance Information Retrieval (IR) library, also known as a search engine library. Lucene.Net contains powerful APIs for creating full text indexes and implementing advanced and precise search technologies into your programs. Some people may confuse Lucene.Net with a ready to use application like a web search/crawler, or a file search application, but Lucene.Net is not such an application, it's a framework library. Lucene.Net provides a framework for implementing these difficult technologies yourself. Lucene.Net makes no discriminations on what you can index and search, which gives you a lot more power compared to other full text indexing/searching implementations; you can index anything that can be represented as text. There are also ways to get Lucene.Net to index HTML, Office documents, PDF files, and much more.

What are Analyzers?

An Analyzer has a single job, and that is to be an advanced word breaker: an object that reads a stream of text and breaks the words apart into objects called Tokens. The Token class will generally hold the results of the analysis as individual words. This is a very brief summary of what an Analyzer can do and how it affects your full text index. A good Analyzer will not only break the words apart, but will also perform a transformation of the text to make it more suitable for indexing. One simple transformation an Analyzer can do is to lowercase everything it comes across, so that your index will be case insensitive.

In the Lucene framework there are two major spots where an Analyzer is used: when indexing and when searching. For the indexing portion, the direct output of the Analyzer is what gets indexed. So, taking the previous example of an Analyzer that converts everything to lowercase, if we come across the word "CAT", the analyzer will output "cat", and in the full text index, a Term of "cat" will be associated with the Document. For a bigger example, if we use an Analyzer that breaks the words apart at the spaces and then converts them all to lowercase, the results should look something like this.
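
The article's own example output is not reproduced in this excerpt. As a stand-in, here is a minimal Python sketch of an analyzer that splits on spaces and lowercases; it is not Lucene.Net code, and `whitespace_lowercase_analyzer` is a name made up for the illustration.

```python
def whitespace_lowercase_analyzer(text):
    """Break text apart on whitespace, then lowercase each piece."""
    return [word.lower() for word in text.split()]


tokens = whitespace_lowercase_analyzer("The CAT sat on the Mat")
print(tokens)  # ['the', 'cat', 'sat', 'on', 'the', 'mat']
```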

Attached to this article is the Analyzer Viewer application that I made. Attached are both the source and a ready to run binary of the application. The sample is more like a little utility to see how the basic Analyzers included with Lucene.Net will view text. The application will allow you to directly input some text, and it will show you all the results of the text analysis, how it split them up into tokens, and what transformations it applied.

Some interesting things to look at include typing in email addresses, numbers with letters, numbers alone, acronyms, alternating cases, and just anything else you want to play with to see how the indexing process goes.

Implementations of a Tokenizer.

As I mentioned earlier, the Tokenizer class is an abstract base class of a TokenStream. Lucene.Net provides a few implementations of a Tokenizer that it uses in some of the Analyzers. Here are a couple of them and a small description of each.

KeywordTokenizer - This Tokenizer will read the entire stream of text and return the whole thing as a single Token.

…”

CodeProject - Lucene.Net – Custom Synonym Analyzer

“…

How Do I Get Lucene.Net to Work with Synonyms?

The goal here is to be able to search for a word and retrieve results that contain words that have the same meaning as the words you are searching for. This allows you to search by meaning rather than by keywords.

We can easily get Lucene.Net to work with synonyms by creating a custom Analyzer class. The Analyzer will be able to inject the synonyms into the full text index. For some details on the internals of an Analyzer, please see my previous article Lucene.Net – Text Analysis.

Points of Interest

The SynonymAnalyzer is really great for indexing, but I think it might junk up a Query if you plan to use the SynonymAnalyzer for use with a QueryParser to construct a query. One way around this is to modify the SynonymFilter, and SynonymAnalyzer to have a bool switch to turn the synonym injection on and off. That way you could turn the synonym injection off while you are using it with a QueryParser.
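
The on/off switch suggested there might look something like the following Python sketch. It is not the article's SynonymAnalyzer: the synonym table and the `inject_synonyms` flag are hypothetical, and token positions (which a real synonym filter would track) are ignored.

```python
# Hypothetical synonym table used only for this illustration.
SYNONYMS = {"fast": ["quick", "rapid"], "car": ["automobile"]}


def synonym_analyzer(text, inject_synonyms=True):
    """Lowercase and split text; optionally emit synonyms alongside each token."""
    tokens = []
    for word in text.lower().split():
        tokens.append(word)
        if inject_synonyms:
            tokens.extend(SYNONYMS.get(word, []))
    return tokens


index_terms = synonym_analyzer("Fast car")                         # used when indexing
query_terms = synonym_analyzer("Fast car", inject_synonyms=False)  # used when parsing a query
```

With injection on, "Fast car" is indexed under five terms, so a search for "automobile" can find it; with injection off, the query keeps only the two words the user typed.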

The code attached includes the Analyzer Viewer application that I had in my last article, but it also includes an update to include our brand new synonym analyzer.

..”

 

Two new cool Lucene.net articles from Andrew Smith (blog). I swear that I’m going to use Lucene.Net one of these days… ;)

 

Related Past Post XRef:
Five pages to getting started with Lucene.Net - Introducing Lucene.Net
Lucene.Net & C# Indexing and Searching WinForm Example
Lucene.Net Resource List – Books, links and API’s, oh my…
LINQ to Lucene
Using Lucene.Net to Index And Search C# Source
Lucene.Net 2.0 Final Released
"DotLucene / Lucene.Net has moved to ASF"
Indexing Database Content with dotLucene
DotLucene: Full-Text Search for Your Intranet or Website using 37 Lines of Code

Tuesday, September 30, 2008

Five pages to getting started with Lucene.Net - Introducing Lucene.Net

CodeProject - Introducing Lucene.Net

“Lucene.Net is a high performance Information Retrieval (IR) library, also known as a search engine library. Lucene.Net contains powerful APIs for creating full text indexes and implementing advanced and precise search technologies into your programs. Some people may confuse Lucene.Net with a ready to use application like a web search/crawler, or a file search application, but Lucene.Net is not such an application, it's a framework library. Lucene.Net provides a framework for implementing these difficult technologies yourself. Lucene.Net makes no discriminations on what you can index and search, which gives you a lot more power compared to other full text indexing/searching implementations; you can index anything that can be represented as text. There are also ways to get Lucene.Net to index HTML, Office documents, PDF files, and much more.

Lucene.Net is an API-per-API port of the original Lucene project, which is written in Java; even the unit tests were ported to guarantee the quality. Also, the Lucene.Net index is fully compatible with the Lucene index, and both libraries can be used on the same index together with no problems. A number of products have used Lucene and Lucene.Net to build their searches; some well known websites include Wikipedia, CNET, Monster.com, Mayo Clinic, FedEx, and many more. But it’s not just web sites that have used Lucene; there is also a product that has used Lucene.Net, called Lookout, a search tool for Microsoft Outlook that made Outlook’s integrated search look painfully slow and inaccurate.

Lucene.Net is currently undergoing incubation at the Apache Software Foundation. Its source code is held in a subversion repository and can be found here. If you need help downloading the source, you can use the free TortoiseSVN, or RapidSVN. The Lucene.Net project always welcomes new contributors. And, remember, there are many ways to contribute to an open source project other than writing code.

…”

I know I’m a day late and a dollar short with this, as it’s already been linked to by most of the top .Net link bloggers. Still, I’ve been watching the Lucene.Net space for years (4+) and am happy to link to another article on how to get started with it (as I swear I’m going to use it one day… really, truly I am…!  ;)

The article is short and sweet and to the point. It provides a great starting point for getting going with Lucene.Net (funny that, given the article title), showing how to add a “document” to an index as well as how to search. I think this is one of the best getting started with Lucene.Net guides I’ve seen in a while…

 

Related Past Post XRef:
Lucene.Net & C# Indexing and Searching WinForm Example
Using Lucene.Net to Index And Search C# Source
Lucene.Net 2.0 Final Released
DotLucene 1.9 Final Released
"DotLucene / Lucene.Net has moved to ASF"
Indexing Database Content with dotLucene
DotLucene: Full-Text Search for Your Intranet or Website using 37 Lines of Code
Lucene.Net 1.4.0 Beta build-001
Open Lucene.NET - The Open Source Search Engine
SourceForge.net: Lucene.Net core moved from SF
SourceForge.net: Project Info - Lucene.Net search engine

Monday, December 24, 2007

Lookout and Outlook 2007

Mikes Lookout - How to Install Lookout on Outlook 2007

"I got another request today from an old friend for how to make Lookout run inside Outlook 2007.  I’ve probably received a thousand such requests over the last few years…  Since I recently installed Outlook 2007, I finally was able to test it out. 

This fix should make Lookout work.  However, if you have other .NET addins running in Outlook, there is a chance they will no longer work.  The fix is reversible though, so don’t be too scared.  But this fix is definitely for the tech savvy.  Gory details:

..."

If you can scrounge up Lookout (the very cool .Net and Lucene based full text indexing and search tool) and want to use it with Outlook 2007 then you'll need these tips.

Personally I'm happy enough with the search built into Outlook 2007 that I don't plan on going back to Lookout, but it's good to know this information is here in case I need it... (or in case there are others out there that don't share my feelings about Outlook 2007's Search... ;)

But it does kind of irk me that Microsoft bought the company/product and then quietly killed it. Didn't subsume it, re-label it, just seemed to kill it. It's not available from any MS site anymore so you're stuck trusting the downloads you find "out there".

At least they could release the source...? (say via CodePlex as a 'real world' example of extending Outlook? Sure the code is dated now, probably in .Net 1.1, but still it would be pretty cool to see it...)

(via Joel on Software - Getting Lookout to run on Outlook 2007 again)

Wednesday, November 14, 2007

LINQ to Lucene

CodePlex - LINQ to Lucene

"What is LINQ to Lucene?

Providing a custom LINQ solution for the Lucene Information Retrieval System, commonly referred to as a search-engine.

Current Release
The Query Release
Tuesday, 13 November 2007
This release provides a real focus on the querying abilities of the LINQ to Lucene project and is the first real 'working release', converting LINQ statements to Lucene queries with deferred query execution and object creation or projection. It culminates the majority of the required querying features for LINQ that Lucene provides natively.

..."

LINQ, LINQ, LINQ... I think it's time to work on LINQ to Greg's Brain... ;)

That aside, this looks like a cool project. Tracking...
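
The "deferred query execution" mentioned in the release notes is worth a quick illustration. What follows is only a rough sketch of the general idea, not LINQ to Lucene's actual API: `LazyQuery` and `fake_search` are hypothetical names made up for this example, and the search backend is a stub rather than a real Lucene index. The point is just that building the query does no work; the search runs when the results are enumerated.

```python
class LazyQuery:
    """Toy stand-in for a LINQ-style provider: build now, execute later."""

    def __init__(self, search, clauses=()):
        self._search = search            # callable that takes a query string
        self._clauses = tuple(clauses)   # accumulated (field, term) pairs

    def where(self, field, term):
        # Returns a new query object; nothing is executed here.
        return LazyQuery(self._search, self._clauses + ((field, term),))

    def to_query_string(self):
        # Lucene's classic query syntax uses field:term clauses joined by AND.
        return " AND ".join(f"{field}:{term}" for field, term in self._clauses)

    def __iter__(self):
        # Execution is deferred until the results are actually enumerated.
        return iter(self._search(self.to_query_string()))


calls = []

def fake_search(query_string):
    # Hypothetical backend: records the query instead of hitting an index.
    calls.append(query_string)
    return ["doc1", "doc2"]


query = LazyQuery(fake_search).where("title", "lucene").where("author", "greg")
executed_before = len(calls)   # the query has been built but not yet run
results = list(query)          # enumerating the query triggers the search
```

A real provider would translate expression trees rather than explicit `where` calls, but the build-then-enumerate split is the same.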

Thursday, September 13, 2007

Lucene.Net & C# Indexing and Searching WinForm Example

ASPCode.net - C# and Lucene to index and search

"This sample will show you how to use Lucene from your .NET application to index and search content. There are some articles and samples to be found on the web, but it seems that they are a bit outdated. Myself I used Lucene version 1.4 something some year(s) ago and thought now that I needed it again I could just download the new dll:s and copy my existing code. Turns out they have made quite a few API changes.

..."

Extra and updated examples never hurt...

If you have plain text (or can get it, say via IFilter) and need or want full text indexing/searching, then Lucene.net deserves a look.

I've seen it used in a number of open source/free/etc projects and have always been impressed with its indexing speed and query performance.

(via DotNetKicks.com - C# and Lucene to index and search)

 

Related Past Post XRef: (I think that's all of them. Been following Lucene for a bit... :)
Using Lucene.Net to Index And Search C# Source
Lucene.Net 2.0 Final Released
DotLucene 1.9 Final Released
"DotLucene / Lucene.Net has moved to ASF"
Indexing Database Content with dotLucene
DotLucene: Full-Text Search for Your Intranet or Website using 37 Lines of Code
Lucene.Net 1.4.0 Beta build-001
Open Lucene.NET - The Open Source Search Engine
SourceForge.net: Lucene.Net core moved from SF
SourceForge.net: Project Info - Lucene.Net search engine

Wednesday, July 04, 2007

Using Lucene.Net to Index And Search C# Source

SimoneB's Blog - Indexing and searching source code with Lucene.Net

"...

My idea was to create a homemade source code indexing and search service, so I started fiddling with Lucene.Net, CastleProject, C# Parser and a couple other open source projects to see what I could come up with. There are already a lot of services which allow you to search source code online, see Krugle, Google Code Search and Koders among others.

Well, of course I couldn't use one of them as my course project, so I started implementing my own. I called it CS2 - C Sharp Code Search, and its source code is available under the MIT license on its Google Project Hosting website. I think it's a good example of the usage of Lucene.Net and CastleProject's IoC container in a wanna be real life project.

At the moment only the indexing part is implemented and you can see it working by launching the console application project contained in the solution. ..."

Wow, do you see the brightly lit 30w CFL light-bulb floating over my head?

This post is the basis for a great idea. Internal/in-house/behind the firewall/IP protected source code full text indexing and searching.

Think how cool it would be to full text search ALL the source code hosted in a TFS server. Tie into the TFS check-in event and a service could index code as it's checked in. With a web front end (or a Part in the SharePoint team portal), all the developers in-house (or connected to your network via VPN, etc) could search your source code repository.

And not just boring/normal (Windows Search, Google Desktop, X1, etc) full text indexing, but fully parsed indexing. So searches could be limited to methods, properties, etc ("method:DoSomeWork CONTAINS blablabla" or "property:SillyFlag" or "comment:TODO" or ProjectAssembly:InHouseAssembly.DLL or... )
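
To make the "fully parsed" idea a little more concrete, here is a minimal sketch of a fielded index, assuming a parser has already split each source file into fields such as method, property and comment. `FieldedIndex` is a hypothetical toy class, not Lucene.Net; a real version would hand these fields to Lucene and let it do the heavy lifting. The field values below are supplied by hand purely for illustration.

```python
from collections import defaultdict


class FieldedIndex:
    """Toy inverted index keyed by (field, term) instead of by term alone."""

    def __init__(self):
        self._postings = defaultdict(set)   # (field, term) -> set of file names

    def add(self, doc_id, fields):
        # fields maps a field name to whitespace-separated text for that field.
        for field, text in fields.items():
            for term in text.lower().split():
                self._postings[(field.lower(), term)].add(doc_id)

    def search(self, query):
        # Query is space-separated field:term clauses; every clause must match.
        result = None
        for clause in query.lower().split():
            field, _, term = clause.partition(":")
            hits = self._postings.get((field, term), set())
            result = set(hits) if result is None else result & hits
        return sorted(result or [])


index = FieldedIndex()
# Hypothetical parser output for two source files.
index.add("Worker.cs", {"method": "DoSomeWork Cleanup", "comment": "TODO handle retries"})
index.add("Flags.cs", {"property": "SillyFlag", "comment": "TODO remove"})

by_method = index.search("method:DoSomeWork")
by_comment = index.search("comment:TODO")
combined = index.search("comment:TODO property:SillyFlag")
```

Keying the postings by field as well as term is what lets "comment:TODO" ignore a method that happens to be named Todo.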

Think about dependency searching (list all projects using a given assembly, etc) or re-use scenarios (find other projects/code snips where object/property/method XYZ is used) or refactoring (how many times has this same code snip been copied over and over) or code review or... or...
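
The dependency search case boils down to a reverse lookup. As a rough sketch, assuming the indexer has already collected each project's assembly references (the project and assembly names below are made up, and `build_reverse_references` is a hypothetical helper), the "who uses this assembly?" question becomes a dictionary lookup:

```python
from collections import defaultdict


def build_reverse_references(project_refs):
    """Invert project -> assemblies into assembly -> projects that use it."""
    used_by = defaultdict(set)
    for project, assemblies in project_refs.items():
        for assembly in assemblies:
            # Lower-case the key so the lookup is case-insensitive.
            used_by[assembly.lower()].add(project)
    return used_by


# Hypothetical data, as an indexer might collect it from project files.
project_refs = {
    "Billing": ["InHouseAssembly.dll", "System.Data.dll"],
    "Reports": ["InHouseAssembly.DLL"],
    "Website": ["System.Web.dll"],
}

used_by = build_reverse_references(project_refs)
users = sorted(used_by["inhouseassembly.dll"])
```

In a full text index the same thing falls out of a field such as the ProjectAssembly one suggested above, with one posting per referencing project.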

I mean, we don't have anything like this available to us now, do we (remember, I'm talking server based, full repository, in house parsed source code full text indexing)? Doesn't this seem like a no-brainer?

Hum....

(via DotNetKicks.com - Indexing and searching source code with Lucene.Net)