In this article we will build faceted search and drill-down functionality into our application. Implementing faceted search is actually quiet simple, once you know the concepts. I will start with some theory behind the implementation and then I will show you the actual code. This article elaborates on my previous Lucene articles and I assume you have read indexing, searching and alternatives. If you didn’t read those you can download the starting point source code here.

The theory

The most important thing you should keep in mind is that the every document in the Lucene index is at a fixed position, called the index. When performing a search, you basically get a BitVector(Java) or BitArray(.NET) back from the IndexReader. Every bit that is turned on ( 1, true ) represents a hit, the location of the bit in the BitArray corresponds with the location of a document 3. For example: 00010100 means that the third and fifth documents are hits for the executed search query.

We can use these knowledge to compare two result sets to each other and get the common results from them using a bit wise AND operation. For example: 00010100 AND 00000100 means that only the third document is present in both result sets. When we calculate the number of bits turned on, we know how many documents are present in both result sets.

Example:

We have a facet called content type and it contains the values: ‘news’, ‘articles’ and ‘vacancies’. We initially perform a search operation on each of those values and store their BitArray for later use.

Next we perform a search on a user given term, for example ‘lucene’ and we retrieve the BitArray for this result set.

Now we can get the number of news items in the result set by applying a bit wise AND operation on the BitArrays of the lucene term result set and news facet value result set. Once we calculate the cardinality of the resulting BitArray we now how many documents in the lucene term result set are actually news documents.

This covers the theory behind the faceted search in Lucene, lets get down and dirty with some code.

The code

I assume that you went through the previous articles and have the indexer and search interface working. Optionaly, you can download the starting point source code here.

First lets add a little helper function which can calculate the cardinality of a BitArray. I am not going to explain this function in depth, see my blog post for more details.

Now lets implement the magic:

private static void FacetedSearch(string indexPath, string genre, string term)
{
// create searcher
var searcher = new IndexSearcher(indexPath);

// first get the BitArray result from the genre query
var genreQuery = new TermQuery(new Term("genre", genre));
var genreQueryFilter = new QueryFilter(genreQuery);
BitArray genreBitArray = genreQueryFilter.Bits(searcher.GetIndexReader());
Console.WriteLine("There are " + GetCardinality(genreBitArray) + " document with the genre " + genre);

// Next perform a regular search and get its BitArray result
Query searchQuery = MultiFieldQueryParser.Parse(term, new[] {"title", "description"}, new[] {BooleanClause.Occur.SHOULD, BooleanClause.Occur.SHOULD}, new StandardAnalyzer());
var searchQueryFilter = new QueryFilter(searchQuery);
BitArray searchBitArray = searchQueryFilter.Bits(searcher.GetIndexReader());
Console.WriteLine("There are " + GetCardinality(searchBitArray) + " document containing the term " + term);

// Now do the faceted search magic, combine the two bit arrays using a binary AND operation
BitArray combinedResults = searchBitArray.And(genreBitArray);
Console.WriteLine("There are " + GetCardinality(combinedResults) + " document containing the term " + term + " and which are in the genre " + genre);
}

First we open the index reader, we have done this plenty of times, no news here.

Next we get the bit array of a term query on the genre term. This bit array contains a list of bits each indicating whether a document is in selected by the query or not.

Then we perform a regular search, I covered this kind of code in my previous article. However, instead of iterating through the result we retrieve the corresponding bit array.

Now that we have both bit arrays we can determine which documents are in both results by combining them using a bitwise AND operation. We can use the cardinality of the resulting bit array to give use a possible drill down.

That is it for this article. You can download the full source code here.

Share and Enjoy:
  • Print
  • Twitter
  • LinkedIn
  • Digg
  • DZone
  • del.icio.us
  • Google Bookmarks
  • StumbleUpon
  • Technorati