Alternatives, did you mean… – Lucene.NET
March 14th, 2009 by Trilobyte
In this article I will explain how to implement the auto-correct feature commonly known as ‘did you mean …’. Google offers it, so why not provide the same functionality for your users? With Lucene you can easily build it into your own applications; the next few sections show how.
I assume you have read my previous articles and have the index and search application up and running. If not, you can download the source code of our starting point here.
The Theory:
I will start by explaining the theory behind auto-correct in Lucene. It is basically very simple: at index time you build a table containing all the words in your documents and their frequency of occurrence. This word table is stored in a separate structure called the spell index.
Once you have the spell index, you can look up words that are similar to a given term by using a SpellChecker.
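To make the theory a bit more concrete, here is a minimal, self-contained sketch of the idea. It is not how the SpellChecker is actually implemented: it keeps the word table in an in-memory dictionary instead of a Lucene directory, and it ranks candidates by plain Levenshtein edit distance. The class SpellSketch and its methods BuildWordTable, EditDistance and Suggest are names I made up for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustration only: a hypothetical in-memory stand-in for the spell index.
public static class SpellSketch
{
    // Build the word table: each word mapped to how often it occurs.
    public static Dictionary<string, int> BuildWordTable(IEnumerable<string> descriptions)
    {
        var separators = new[] { ' ', ',', '.', ';', ':' };
        var table = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (string description in descriptions)
        {
            foreach (string word in description.Split(separators, StringSplitOptions.RemoveEmptyEntries))
            {
                int count;
                table.TryGetValue(word, out count); // count stays 0 for a new word
                table[word] = count + 1;
            }
        }
        return table;
    }

    // Levenshtein distance: the number of single-character insertions,
    // deletions and substitutions needed to turn a into b.
    public static int EditDistance(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + cost);
            }
            // the row just computed becomes the previous row
            var swap = previous; previous = current; current = swap;
        }
        return previous[b.Length];
    }

    // Return up to 'count' words, closest first; ties go to the more frequent word.
    public static string[] Suggest(Dictionary<string, int> table, string term, int count)
    {
        string needle = term.ToLowerInvariant();
        return table
            .Where(entry => !string.Equals(entry.Key, term, StringComparison.OrdinalIgnoreCase))
            .OrderBy(entry => EditDistance(needle, entry.Key.ToLowerInvariant()))
            .ThenByDescending(entry => entry.Value)
            .Select(entry => entry.Key)
            .Take(count)
            .ToArray();
    }
}
```

For example, with a table built from the descriptions "England and Wales", "A history of England" and "Iceland", calling SpellSketch.Suggest(table, "Ingland", 2) puts England first, because it is only one substitution away from the misspelled term.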
The code:
As before, I assume that you have the indexer and searcher up and running; otherwise, download the source code here. First, make sure the SpellChecker assembly is referenced.
Next, add an additional namespace:
using SpellChecker.Net.Search.Spell;
First we are going to index the words in the description field with the following method:
private static void IndexWords(string indexPath, string spellPath)
{
    // open the index reader
    IndexReader indexReader = IndexReader.Open(FSDirectory.GetDirectory(indexPath, false));
    try
    {
        // create the spell checker
        var spell = new SpellChecker.Net.Search.Spell.SpellChecker(FSDirectory.GetDirectory(spellPath, false));
        // add all the words in the field description to the spell checker
        spell.IndexDictionary(new LuceneDictionary(indexReader, "description"));
    }
    finally
    {
        // release the index reader, even if indexing the words fails
        indexReader.Close();
    }
}
This creates an index of all the words in the description field; we can query that index to find similar words.
The following code queries the spell index and suggests words that are like the specified term:
private static void SuggestSimilar(string spellPath, string term)
{
    // create the spell checker
    var spell = new SpellChecker.Net.Search.Spell.SpellChecker(FSDirectory.GetDirectory(spellPath, false));
    // get 2 similar words
    string[] similarWords = spell.SuggestSimilar(term, 2);
    // show the similar words
    foreach (string similarWord in similarWords)
    {
        Console.WriteLine(similarWord + " is similar to " + term);
    }
}
This will find up to 2 words that are similar to the specified term.
Putting it all together:
private static void Main(string[] args)
{
    // create the root directory for the sample
    string rootPath = @"c:\LuceneSampleCatalog";
    Directory.CreateDirectory(rootPath);
    // create a directory to store the index in
    string indexPath = rootPath + @"\Index";
    Directory.CreateDirectory(indexPath);
    // create a directory to store the spell index in
    string spellPath = rootPath + @"\Spell";
    Directory.CreateDirectory(spellPath);
    // index the books
    IndexBooks(indexPath);
    // index the words
    IndexWords(indexPath, spellPath);
    // search the created index
    Search(indexPath, "Sequel");
    // suggest similar words
    SuggestSimilar(spellPath, "Ingland");
}
That is it for this article; you can download the full source code here. In the next article I will show you how to implement faceted search in your application.
2 Replies to “Alternatives, did you mean… – Lucene.NET”
December 4th, 2009 at 03:18
Hi,
I would like to know if this spellchecker component is part of the lucene.net library, or do we need to download another library to support this feature?
December 7th, 2009 at 20:21
Hello Joff,
It depends on the version of Lucene you are using. Up to 2.0, IIRC, it was integrated in the core. After 2.0 it was continued as a separate project.
You can find the binary in the Lucene.NET download (http://incubator.apache.org/lucene.net/download/); it is located in the contrib folder.
You can download the source from: https://svn.apache.org/repos/asf/incubator/lucene.net/trunk/C%23/contrib/SpellChecker.Net/