Back in 2007 I wrote a study-notes post saying I planned to write about DotLucene, and then forgot about it. The other day I found these notes in my Google Notebook. Rather than leave them lost and forgotten there, it’s better to share them here so someone can make good use of them. This is just a dump of links and notes that I collected from various sites. I hope I didn’t forget to add the link-back references for all the notes.

What is Lucene.Net?

[ # ] Lucene.Net is a source code, class-per-class, API-per-API and algorithmic port of the Java Lucene search engine to the C# and .NET platform utilizing Microsoft .NET Framework.

Lucene.Net sticks to the APIs and classes used in the original Java implementation of Lucene. API and class names are preserved, adapted only where needed to give Lucene.Net the look and feel of the C# language and the .NET Framework. For example, the method Hits.length() in the Java implementation reads Hits.Length() in the C# port.

In addition to the API and class port to C#, the algorithms of Java Lucene are ported to C# Lucene. This means an index created with Java Lucene is compatible in both directions with C# Lucene for reading, writing and updating. In fact, a Lucene index can be searched and updated concurrently by Java Lucene and C# Lucene processes.


What happens to the Lucene.NET library?

This has no effect on the Lucene.NET project incubated at Apache (http://incubator.apache.org/lucene.net/), as dotlucene.net was an independent site promoting DotLucene/Lucene.NET.

Where do I find more about Lucene.NET?

Is dotlucene.net content available?

Devoted dotlucene.net fans can download the www.dotlucene.net content as a zip archive: www.dotlucene.net.zip (476 kB). DotLucene API Search demo 1.1 (2.38 MB)

Lucene.Net Architecture [ # ]

The lowest layer is storage. The layer above it handles access to the index files (data access) and is used by both the indexing system and the searching system. On top of those sit a search layer and a search request parser layer, used by the searching part of Lucene.Net. Likewise, a parser layer and a document layer are used by the indexing part of Lucene.Net.

[ # ] At the heart of the engine is the index (similar to a database table), which contains documents (like database rows) made up of fields (like database columns). To search, one must write a query and give it to the engine to find matching documents. The query language for a database is SQL; for Lucene it’s a query object (you can construct complex queries by composing an object graph of query instances).
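To make the "object graph of query instances" idea concrete, here is a minimal sketch in Python. The class names (TermQuery, AndQuery, OrQuery) and the search helper are hypothetical and written only for this illustration; they are not the Lucene or Lucene.Net API. The sketch also scans every document, which a real engine avoids by using its index.

```python
from dataclasses import dataclass

# Hypothetical toy classes for illustration only; not the Lucene API.

@dataclass(frozen=True)
class TermQuery:
    field: str
    term: str

    def matches(self, doc):
        # A document is a dict of field name -> text.
        return self.term in doc.get(self.field, "").lower().split()

@dataclass(frozen=True)
class AndQuery:
    clauses: tuple

    def matches(self, doc):
        return all(clause.matches(doc) for clause in self.clauses)

@dataclass(frozen=True)
class OrQuery:
    clauses: tuple

    def matches(self, doc):
        return any(clause.matches(doc) for clause in self.clauses)

def search(documents, query):
    """Return the ids of documents the query matches (naive full scan)."""
    return [doc_id for doc_id, doc in documents.items() if query.matches(doc)]

documents = {
    1: {"title": "Lucene in Action", "body": "indexing and searching text"},
    2: {"title": "SQL Basics", "body": "querying database tables"},
    3: {"title": "Porting Lucene", "body": "searching from dotnet"},
}

# body contains "searching" AND (title contains "lucene" OR title contains "sql")
query = AndQuery((
    TermQuery("body", "searching"),
    OrQuery((TermQuery("title", "lucene"), TermQuery("title", "sql"))),
))

print(search(documents, query))  # [1, 3]
```

The point is only that a complex query is a tree of small query objects, each of which knows how to evaluate itself.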

Search engines are super fast at finding text because documents are stored in an inverted index (where the terms of each field are tokenized, hashed and sorted at index time).
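As a rough illustration of what an inverted index is, the sketch below maps each term to the list of documents containing it. It is a toy written in Python under simplifying assumptions (a naive regex tokenizer, a single text field, in-memory dictionaries); Lucene's actual index format is far more elaborate.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (naive analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each term to the sorted ids of the documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    # Terms are stored in sorted order, each with a sorted posting list.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}

docs = {
    1: "Lucene is a search engine library",
    2: "Lucene.Net is a port of Lucene",
    3: "A database stores rows",
}

index = build_inverted_index(docs)
print(index["lucene"])  # [1, 2]
print(index["rows"])    # [3]
```

Looking up a term is then a single dictionary access instead of a scan over every document, which is where the speed comes from.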

In contrast to database queries, search engines can calculate relevance scores when searching. This is because they use a querying model better suited to ranking, called the vector space model, instead of the classical boolean model. In the vector model, documents and queries are represented as vectors. The similarity between a query and any document can be calculated with simple vector operations. Documents with a higher similarity will appear higher in the results. Conversely, databases only know whether rows meet the where criteria or not and cannot compute a relevance score; this true/false classification is how the boolean model got its name.
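The difference between the two models can be sketched in a few lines of Python. This is a simplified illustration, assuming plain term-frequency vectors and cosine similarity; it is not Lucene's actual scoring formula, and the function names are made up for the example.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Represent text as a vector of term frequencies."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def boolean_match(query_text, doc_text):
    """Boolean model: True if every query term occurs in the document."""
    doc_terms = term_vector(doc_text)
    return all(t in doc_terms for t in term_vector(query_text))

def rank(query_text, docs):
    """Vector model: return (doc_id, score) pairs, best match first."""
    q = term_vector(query_text)
    scored = [(doc_id, cosine(q, term_vector(text))) for doc_id, text in docs.items()]
    hits = [(doc_id, score) for doc_id, score in scored if score > 0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

docs = {
    1: "lucene search",
    2: "lucene search engine library",
    3: "database rows and columns",
}

ranked = rank("lucene search", docs)
print([(doc_id, round(score, 3)) for doc_id, score in ranked])
# [(1, 1.0), (2, 0.707)]

print([doc_id for doc_id, text in docs.items() if boolean_match("lucene search", text)])
# [1, 2]
```

Both models agree that documents 1 and 2 match, but only the vector model can say that document 1 is the closer match.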

Resources/Reference links

Lucene.Net Cross-Platform Implementations: LuceneImplementations

Lucene implementations in languages other than Java:

  • CLucene – Lucene implementation in C++
  • Lucene.Net – Lucene implementation in .NET
  • Lucene4c – Lucene implementation in C
  • LuceneKit – Lucene implementation in Objective-C (Cocoa/GNUstep support)
  • Lupy – Lucene implementation in Python (RETIRED)
  • NLucene – another Lucene implementation in .NET (out of date)
  • Zend Search – Lucene implementation in the Zend Framework for PHP 5
  • Plucene – Lucene implementation in Perl
  • KinoSearch – a new Lucene implementation in Perl
  • PyLucene – GCJ-compiled version of Java Lucene integrated with Python
  • MUTIS – Lucene implementation in Delphi
  • Ferret – Lucene implementation in Ruby

Other Search/Indexing Services