Me, Myself and Mayvelous
14 Feb
Back in 2007 I wrote a study-notes post in which I planned to write about DotLucene, and then I forgot about it. The other day I found these notes in my Google Notebook. Rather than leave them lost and forgotten there, it’s better to share them here so someone can make good use of them. These are just a dump of links and notes that I collected from various sites. I hope I didn’t forget to add the link-back references for all the notes.
[ # ] Lucene.Net is a source code, class-per-class, API-per-API and algorithmic port of the Java Lucene search engine to the C# and .NET platform, utilizing the Microsoft .NET Framework.
Lucene.Net sticks to the APIs and classes used in the original Java implementation of Lucene. The API names and class names are preserved, with the casing adjusted to give Lucene.Net the look and feel of the C# language and the .NET Framework. For example, the method Hits.length() in the Java implementation now reads Hits.Length() in the C# port.
In addition to the APIs and classes, the algorithms of Java Lucene are ported to C# Lucene. This means an index created with Java Lucene is back-and-forth compatible with C# Lucene for reading, writing and updating. In fact, a Lucene index can be concurrently searched and updated by Java Lucene and C# Lucene processes.
What happens to the Lucene.NET library?
This has no effect on the Lucene.NET project incubated at Apache (http://incubator.apache.org/lucene.net/), as dotlucene.net was an independent site promoting DotLucene/Lucene.NET.
Devoted dotlucene.net fans can download the www.dotlucene.net content as a zip archive: www.dotlucene.net.zip (476 kB). Also available: DotLucene API Search demo 1.1 (2.38 MB).

The lowest layer is the storage layer. Above it is the data access layer, which handles access to the index files and is used by both the indexing system and the searching system. On top of those, the searching part of Lucene.Net has a search layer and a search request parser layer. Likewise, the indexing part of Lucene.Net has a parser layer and a document layer.
[ # ] At the heart of the engine is the index (similar to a database table), which contains documents (like database rows) made up of fields (like database columns). To search, one must write a query and give it to the engine to find matching documents. The query language for a database is SQL; for Lucene it’s a query object (you can construct complex queries by composing an object graph of query instances).
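To make the "object graph of query instances" idea concrete, here is a minimal Python sketch. It is only an illustration and not the Lucene.Net API: the classes TermQuery, AndQuery and OrQuery, the matches() method and the search() helper are names I made up, and the "index" is just a list of dicts that gets scanned linearly.

```python
from dataclasses import dataclass

# Toy stand-in, NOT the Lucene.Net API: a document is a dict of field name -> text.
Document = dict


@dataclass(frozen=True)
class TermQuery:
    """Hypothetical leaf query: matches when the field contains the term as a word."""
    field_name: str
    term: str

    def matches(self, doc: Document) -> bool:
        words = doc.get(self.field_name, "").lower().split()
        return self.term.lower() in words


@dataclass(frozen=True)
class AndQuery:
    """Hypothetical composite query: every child query must match."""
    children: tuple

    def matches(self, doc: Document) -> bool:
        return all(child.matches(doc) for child in self.children)


@dataclass(frozen=True)
class OrQuery:
    """Hypothetical composite query: at least one child query must match."""
    children: tuple

    def matches(self, doc: Document) -> bool:
        return any(child.matches(doc) for child in self.children)


def search(index, query):
    """Return the positions of matching documents (a plain linear scan)."""
    return [doc_id for doc_id, doc in enumerate(index) if query.matches(doc)]


# The "index": documents (rows) made of fields (columns).
index = [
    {"title": "Lucene in Action", "body": "search engine library for java"},
    {"title": "DotLucene notes", "body": "port of lucene to c# and .net"},
    {"title": "SQL basics", "body": "tables columns and rows"},
]

# body contains "lucene" AND (body contains "c#" OR body contains "java")
query = AndQuery((
    TermQuery("body", "lucene"),
    OrQuery((TermQuery("body", "c#"), TermQuery("body", "java"))),
))
hits = search(index, query)
```

The point is only that a complex query is built by nesting small query objects, rather than by writing a query string the way you would with SQL.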
Search engines are very fast at finding text because documents are stored in an inverted index (where the terms of each field are tokenized, hashed and sorted at index time).
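As a rough illustration of what an inverted index is, here is a small Python sketch. It is a toy under simple assumptions (whitespace tokenizing, one text field per document) and says nothing about Lucene's actual index format; the names tokenize, build_inverted_index and lookup are my own.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; real analyzers do far more."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            postings[term].add(doc_id)
    # The dict gives hashed term lookup; terms and postings are stored sorted.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def lookup(inverted, term):
    """Find matching documents by term, without scanning any document text."""
    return inverted.get(term.lower(), [])


docs = [
    "lucene is a search engine",
    "dotlucene is a port of lucene",
    "a database stores rows",
]
inverted = build_inverted_index(docs)
lucene_hits = lookup(inverted, "Lucene")
```

The work of splitting the text happens once at index time, so a search is just a dictionary lookup on the term instead of a scan over every document.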
In contrast to database queries, search engines can calculate relevance scores when searching. This is because they use a better querying model called the vector space model instead of the classical boolean model. In the vector model, documents and queries are represented as vectors. The similarity between a query and any document can be calculated with simple vector operations. Documents with a higher similarity will appear higher in the results. Conversely, databases only know whether rows meet the where criteria or not and cannot compute a relevance score. This true/false classification is how the boolean model got its name.
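Here is a minimal Python sketch of the vector space idea, assuming raw term counts as vector components and cosine similarity as the "simple vector operation". This is much simpler than whatever scoring formula Lucene really uses, and the function names (to_vector, cosine_similarity, rank) are made up for the example.

```python
import math
from collections import Counter


def to_vector(text):
    """Represent text as a sparse vector of term -> count."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (score, doc_id) pairs for matching documents, best score first."""
    query_vector = to_vector(query)
    scored = [
        (cosine_similarity(query_vector, to_vector(text)), doc_id)
        for doc_id, text in enumerate(docs)
    ]
    matching = [pair for pair in scored if pair[0] > 0]
    return sorted(matching, key=lambda pair: (-pair[0], pair[1]))


docs = [
    "lucene search engine",
    "lucene is a fast search engine library written in java",
    "database rows and columns",
]
ranked = rank("lucene search", docs)
ranked_ids = [doc_id for _, doc_id in ranked]
```

Both of the first two documents contain the query terms, so a boolean model would treat them the same. Here the shorter document scores higher because the query terms make up a larger share of its vector, and the third document drops out with a score of zero.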
Lucene implementations in languages other than Java:
2 Responses for "Dear DotLucene"
You’ve really got some great notes here.