Class Reference – Lucene.NET
March 14th, 2009 by Trilobyte
This article contains an overview of the most important classes in Lucene.
Documents (Lucene.Net.Documents.Document)
Documents are the central entity in Lucene: they are what gets stored in the index. A document can represent any kind of information you want, for example database records, emails, HTML pages, or Word documents. A document has no required attributes; it is basically just a list of [fields]. Optionally, you can set a boost for a document.
Fields (Lucene.Net.Documents.Field)
Fields are used to describe a [document]. A field is basically a key/value pair containing the name of the field and its value. Fields come in different behaviors (Keyword, UnIndexed, UnStored, and Text), which differ in whether the value is analyzed, indexed, and stored. Use the corresponding factory methods, listed in the table below, to instantiate a field.
Field types
| Field method/type | Analyzed | Indexed | Stored | Usage |
|---|---|---|---|---|
| Field.Keyword(string, string) | | x | x | URLs, nicknames, social security numbers |
| Field.Keyword(string, dateTime) | | x | x | |
| Field.UnIndexed(string, string) | | | x | Document type, when not used for search criteria |
| Field.UnStored(string, string) | x | x | | Document titles and content |
| Field.Text(string, string) | x | x | x | Document titles and content |
| Field.Text(string, TextReader) | x | x | | Document titles and content |
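To make the three flags concrete, here is a minimal sketch in Python that models the field types as a lookup table. It is not the Lucene.NET API: `FieldFlags`, `FIELD_TYPES`, `searchable` and `retrievable` are hypothetical names made up for illustration, and the flag values assume the behavior the method names suggest (UnIndexed is stored only, UnStored is analyzed and indexed only, Keyword is indexed and stored without analysis).

```python
from typing import NamedTuple


class FieldFlags(NamedTuple):
    analyzed: bool  # value is run through an analyzer before indexing
    indexed: bool   # value can be searched on
    stored: bool    # original value can be read back from a hit


# Hypothetical lookup table mirroring the field types; not a Lucene.NET type.
FIELD_TYPES = {
    "Keyword": FieldFlags(analyzed=False, indexed=True, stored=True),
    "UnIndexed": FieldFlags(analyzed=False, indexed=False, stored=True),
    "UnStored": FieldFlags(analyzed=True, indexed=True, stored=False),
    "Text(string)": FieldFlags(analyzed=True, indexed=True, stored=True),
    "Text(TextReader)": FieldFlags(analyzed=True, indexed=True, stored=False),
}


def searchable(kind: str) -> bool:
    """True when a field of this kind can be used as a search criterion."""
    return FIELD_TYPES[kind].indexed


def retrievable(kind: str) -> bool:
    """True when the original value can be read back from a search result."""
    return FIELD_TYPES[kind].stored


if __name__ == "__main__":
    for kind, flags in FIELD_TYPES.items():
        print(kind, flags)
```

In practice the choice usually comes down to two questions: do you need to search on the value, and do you need to show it in the results?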
IndexWriter (Lucene.Net.Index.IndexWriter)
The index writer is responsible for writing [documents] to an index. The index can be stored either in a directory on disk (Lucene.Net.Store.FSDirectory) or in memory (Lucene.Net.Store.RAMDirectory). It uses an [analyzer] to break up text before a [document] is stored in the index. You can either create a new index or change an existing one.
Analyzer (Lucene.Net.Analysis.Analyzer)
This class and its derivatives are responsible for breaking text down into single words or terms and processing them. For example, an analyzer can remove commonly used words such as ‘the’, ‘and’ and ‘a’, reduce words to a base form (walked -> walk), or even add synonyms.
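The following is a toy sketch in Python of what such an analysis chain does: lowercase and tokenize, drop stop words, then apply a very naive suffix rule. It is only an illustration of the idea, not how Lucene's analyzers are implemented. The `analyze` function, the `STOP_WORDS` set and the "strip a trailing -ed" rule are all assumptions made for this example, and a real stemmer is considerably more careful.

```python
import re

# Assumed stop word list, taken from the examples in the text.
STOP_WORDS = {"the", "and", "a"}


def analyze(text: str) -> list[str]:
    """Break text into lowercase terms, dropping stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    terms = []
    for token in tokens:
        if token in STOP_WORDS:
            continue  # commonly used words carry little search value
        # Naive stand-in for stemming: walked -> walk. Gets many words wrong.
        if token.endswith("ed") and len(token) > 4:
            token = token[:-2]
        terms.append(token)
    return terms


if __name__ == "__main__":
    print(analyze("The dog walked and barked"))  # ['dog', 'walk', 'bark']
```

The same analysis has to be applied to the search text as well, otherwise a search for "walked" would never match the indexed term "walk".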
Term (Lucene.Net.Index.Term)
A term is a key/value pair on which you want to search, where the key is the name of the [field] and value is the value on which to search.
IndexSearcher (Lucene.Net.Search.IndexSearcher)
The index searcher performs the actual search: it opens the index, searches through it using the search [query], and returns the [hits] matching the [query].
Query (Lucene.Net.Search.Query)
A query describes the results you want to get from a search.
Hits (Lucene.Net.Search.Hits)
A list of all the documents returned from a search operation performed by the [indexsearcher].
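To see how these pieces fit together, here is a small self-contained sketch in Python of an in-memory inverted index. It is a toy model and not the Lucene.NET API: `ToyIndex`, `add_document` and `search` are hypothetical names. `add_document` plays the role of the index writer, a `(field, value)` tuple stands in for a term, and `search` plays the role of the index searcher returning hits. The toy returns hits in insertion order, whereas Lucene ranks hits by relevance score.

```python
from collections import defaultdict


class ToyIndex:
    """Toy inverted index: maps (field, token) terms to document ids."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, token) -> set of doc ids
        self.documents = []               # stored documents, by doc id

    def add_document(self, fields: dict) -> int:
        """Store a document (a dict of field name -> text) and index its terms."""
        doc_id = len(self.documents)
        self.documents.append(fields)
        for name, value in fields.items():
            # Whitespace splitting stands in for a real analyzer.
            for token in value.lower().split():
                self.postings[(name, token)].add(doc_id)
        return doc_id

    def search(self, term: tuple) -> list:
        """Return the stored documents matching a (field, value) term."""
        field, value = term
        doc_ids = sorted(self.postings.get((field, value.lower()), set()))
        return [self.documents[i] for i in doc_ids]


if __name__ == "__main__":
    index = ToyIndex()
    index.add_document({"title": "Lucene in Action", "body": "search engine library"})
    index.add_document({"title": "Cooking basics", "body": "how to boil an egg"})
    for hit in index.search(("title", "lucene")):
        print(hit["title"])
```

Because the term includes the field name, searching for ("body", "lucene") in this example finds nothing even though the word appears in a title.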
That is it for this article. I hope you learned how Lucene works and that you understand the basic concepts. If you have additional information, comments or questions, please don’t hesitate to respond.
In my next article I will show you how to implement a basic index application, which can index a bunch of plain text files from disk. No rocket science, but that is not the point.
2 Replies to “Class Reference – Lucene.NET”
October 20th, 2009 at 17:26
I downloaded the source code for Lucene.NET from the following URL and I did not find the Field.Keyword method. Can you please help me get the latest code that has Field.Keyword?
https://svn.apache.org/repos/asf/incubator/lucene.net/trunk/C%23/
Thanks,
Hem
October 21st, 2009 at 19:32
The API has apparently changed. Please check the changelogs; I think the change is documented.