Lucene 2.3 – working example of Indexer and Searcher

Hi All,

Lately I have been working with Lucene, a powerful Java search library that lets you easily add search to any application. One of the key
factors behind Lucene's popularity is its simplicity, but don't let that fool you: under the hood there are
sophisticated, state-of-the-art Information Retrieval techniques quietly at work.

The current version is 2.3.2.

The book I am using as a reference is Manning's "Lucene in Action". The problem is that it covers Lucene 1.4, whose API differs considerably from the latest release. Because of the many API changes, you will not be able to run the book's examples with version 2.3.

I have modified its basic Indexer and Searcher examples to run with the latest version, and I am posting them here for your reference.
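Most of the compile errors come from the way fields are created. The 1.4 examples use static factory methods such as Field.Text and Field.Keyword, which are gone in 2.3; fields are now built with the Field constructor plus the Field.Store and Field.Index options. Below is a minimal sketch of the usual mappings as I understand them. The FieldMigration class and its method names are hypothetical, made up only to label each case.

```java
import java.io.Reader;
import java.io.StringReader;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

/**
 * Illustrative helper (hypothetical name): Lucene 2.3 replacements for the
 * Lucene 1.4 Field factory methods used in "Lucene in Action".
 */
public class FieldMigration
{
    // 1.4: Field.Keyword(name, value) -> stored, indexed as one untokenized term
    public static Field keyword(String name, String value)
    {
        return new Field(name, value, Field.Store.YES, Field.Index.UN_TOKENIZED);
    }

    // 1.4: Field.Text(name, String) -> stored, indexed and tokenized
    public static Field text(String name, String value)
    {
        return new Field(name, value, Field.Store.YES, Field.Index.TOKENIZED);
    }

    // 1.4: Field.Text(name, Reader) -> indexed and tokenized, not stored
    public static Field text(String name, Reader reader)
    {
        return new Field(name, reader);
    }

    // 1.4: Field.UnStored(name, value) -> indexed and tokenized, not stored
    public static Field unStored(String name, String value)
    {
        return new Field(name, value, Field.Store.NO, Field.Index.TOKENIZED);
    }

    // 1.4: Field.UnIndexed(name, value) -> stored only, not searchable
    public static Field unIndexed(String name, String value)
    {
        return new Field(name, value, Field.Store.YES, Field.Index.NO);
    }

    public static void main(String[] args)
    {
        // Build a document the same way Indexer does below
        Document doc = new Document();
        doc.add(keyword("filename", "/tmp/example.txt"));
        doc.add(text("contents", new StringReader("some file contents")));
        System.out.println(doc);
    }
}
```

The Indexer below uses exactly two of these cases: the Reader form of Field.Text for "contents" and the Field.Keyword equivalent for "filename".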

Indexer.java

Indexer creates an index of every .txt file found (recursively) in the data directory supplied by the user.

 
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Date;

/**
 * This code was originally written for Erik's Lucene intro java.net article
 */
public class Indexer
{

    /**
     * Creates a new index in indexDir and adds every .txt file found
     * (recursively) under dataDir.
     *
     * @return the number of documents in the new index
     */
    public static int index(File indexDir, File dataDir) throws IOException
    {

        if (!dataDir.exists() || !dataDir.isDirectory())
        {
            throw new IOException(
                dataDir + " does not exist or is not a directory"
                );
        }

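        // create == true: build a new index, overwriting any existing one
        // in indexDir. Pass false instead to add to an existing index.
        IndexWriter writer =
            new IndexWriter(indexDir, new StandardAnalyzer(), true);
        int numIndexed;

        try
        {
            // Write one file per index component instead of a compound .cfs file
            writer.setUseCompoundFile(false);
            indexDirectory(writer, dataDir);

            numIndexed = writer.docCount();
            writer.optimize();
        }
        finally
        {
            // Always release the index write lock, even if indexing fails
            writer.close();
        }

        return numIndexed;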
    }

    /**
     * Command-line entry point.
     *
     * @param args args[0] is the index directory, args[1] is the data directory
     *
     * @throws Exception if the arguments are wrong or indexing fails
     */
    public static void main(String[] args) throws Exception
    {

        if (args.length != 2)
        {
            throw new Exception(
                "Usage: java " + Indexer.class.getName() +
                " <index dir> <data dir>"
                );
        }

        File indexDir = new File(args[0]);
        File dataDir = new File(args[1]);
        long start = new Date().getTime();
        int numIndexed = index(indexDir, dataDir);
        long end = new Date().getTime();
        System.out.println(
            "Indexing " + numIndexed + " files took " + (end - start) +
            " milliseconds"
            );
    }

    /**
     * Recursively walks dir, descending into subdirectories and indexing
     * every file whose name ends in ".txt".
     */
    private static void indexDirectory(IndexWriter writer, File dir)
        throws IOException
    {

        File[] files = dir.listFiles();

        // listFiles() returns null if dir cannot be read
        if (files == null)
        {
            return;
        }

        for (int i = 0; i < files.length; i++)
        {

            File f = files[i];

            if (f.isDirectory())
            {
                indexDirectory(writer, f);
            }
            else if (f.getName()
                    .endsWith(".txt"))
            {
                indexFile(writer, f);
            }
        }
    }

    /**
     * Adds a single file to the index as a Document with a "contents"
     * field and a "filename" field. Hidden or unreadable files are skipped.
     */
    private static void indexFile(IndexWriter writer, File f) throws IOException
    {

        if (f.isHidden() || !f.exists() || !f.canRead())
        {

            return;
        }

        System.out.println("Indexing " + f.getCanonicalPath());

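        FileReader reader = new FileReader(f);

        try
        {
            Document doc = new Document();

            // "contents": tokenized and indexed, but not stored; Lucene reads
            // the file through the Reader when addDocument() is called
            doc.add(new Field("contents", reader));

            // "filename": stored, and indexed as a single untokenized term
            doc.add(
                new Field(
                    "filename",
                    f.getCanonicalPath(),
                    Field.Store.YES,
                    Field.Index.UN_TOKENIZED
                    )
                );

            // Lucene 1.4 versions of the two fields above, as used in the book;
            // these factory methods no longer exist in Lucene 2.x:
            //doc.add(Field.Text("contents", new FileReader(f)));
            //doc.add(Field.Keyword("filename", f.getCanonicalPath()));
            writer.addDocument(doc);
        }
        finally
        {
            reader.close();
        }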
    }
}
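Because Indexer passes true as the create flag, every run rebuilds the index from scratch. If you only want to add (or re-index) one file in an index that already exists, a minimal sketch could open the writer with false instead. The IncrementalIndexer class is hypothetical and not part of the book's example; it reuses the same two fields as Indexer and relies on IndexWriter.updateDocument (available since Lucene 2.1) so that re-adding the same file replaces its old entry rather than duplicating it.

```java
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

/**
 * Hypothetical helper (not part of the original example): adds or
 * re-indexes a single text file in an index that already exists.
 */
public class IncrementalIndexer
{
    public static void addFile(File indexDir, File f) throws IOException
    {
        // create == false: open the existing index and append to it
        IndexWriter writer =
            new IndexWriter(indexDir, new StandardAnalyzer(), false);
        try
        {
            String path = f.getCanonicalPath();
            Reader reader = new FileReader(f);
            try
            {
                // Same fields as Indexer.indexFile()
                Document doc = new Document();
                doc.add(new Field("contents", reader));
                doc.add(new Field("filename", path,
                    Field.Store.YES, Field.Index.UN_TOKENIZED));

                // Delete any document with the same "filename" term, then add
                // this one, so re-adding a file does not create a duplicate
                writer.updateDocument(new Term("filename", path), doc);
            }
            finally
            {
                reader.close();
            }
        }
        finally
        {
            writer.close();
        }
    }

    public static void main(String[] args) throws IOException
    {
        if (args.length != 2)
        {
            System.err.println("Usage: java IncrementalIndexer <index dir> <file>");
            return;
        }
        addFile(new File(args[0]), new File(args[1]));
    }
}
```

Opening with false throws an exception if no index exists yet in that directory, so run Indexer once first.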

 

Searcher.java

Searcher runs a query against the index created by Indexer and prints the file name of each matching document.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import java.io.File;
import java.util.Date;

/**
 * This code was originally written for Erik's Lucene intro java.net article
 */
public class Searcher
{

    /**
     * Command-line entry point: args[0] is the index directory,
     * args[1] is the query string.
     */
    public static void main(String[] args) throws Exception
    {

        if (args.length != 2)
        {
            throw new Exception(
                "Usage: java " + Searcher.class.getName() +
                " <index dir> <query>"
                );
        }

        File indexDir = new File(args[0]);
        String q = args[1];

        if (!indexDir.exists() || !indexDir.isDirectory())
        {
            throw new Exception(
                indexDir + " does not exist or is not a directory."
                );
        }

        search(indexDir, q);
    }

    /**
     * Runs query q against the index in indexDir and prints the stored
     * file name of every match.
     */
    public static void search(File indexDir, String q) throws Exception
    {

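        // Open the existing index directory (the two-argument
        // getDirectory(File, boolean create) form is deprecated in 2.3)
        Directory fsDir = FSDirectory.getDirectory(indexDir);
        IndexSearcher is = new IndexSearcher(fsDir);

        try
        {
            // Unqualified terms in the query string search the "contents" field
            QueryParser qp = new QueryParser("contents", new StandardAnalyzer());
            Query query = qp.parse(q);

            long start = new Date().getTime();
            Hits hits = is.search(query);
            long end = new Date().getTime();
            System.err.println(
                "Found " + hits.length() + " document(s) (in " + (end - start) +
                " milliseconds) that matched query '" + q + "':"
                );

            // Print the stored "filename" field of every matching document
            for (int i = 0; i < hits.length(); i++)
            {

                Document doc = hits.doc(i);
                System.out.println(doc.get("filename"));
            }
        }
        finally
        {
            // Release the index files held open by the searcher and directory
            is.close();
            fsDir.close();
        }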
    }
}
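Searcher lets QueryParser turn the command-line string into a Query. The same kind of query can also be built directly in code. The sketch below assumes an index produced by Indexer (fields "contents" and "filename"); the TermSearcher class is hypothetical. It builds roughly what QueryParser would produce for a string like "+lucene -java", and uses the search(Query, Filter, int) variant, which returns a TopDocs holding only the n best-scoring hits. Note that TermQuery does not run the analyzer, so terms must already be in the lowercased form StandardAnalyzer produces.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

/**
 * Hypothetical helper (not part of the original example): returns the file
 * names of up to n documents whose "contents" contain mustTerm but not
 * mustNotTerm.
 */
public class TermSearcher
{
    public static List<String> search(File indexDir, String mustTerm,
        String mustNotTerm, int n) throws IOException
    {
        IndexSearcher searcher = new IndexSearcher(indexDir.getPath());
        try
        {
            // Programmatic equivalent of "+mustTerm -mustNotTerm"
            BooleanQuery query = new BooleanQuery();
            query.add(new TermQuery(new Term("contents", mustTerm)),
                BooleanClause.Occur.MUST);
            query.add(new TermQuery(new Term("contents", mustNotTerm)),
                BooleanClause.Occur.MUST_NOT);

            // Fetch only the n best-scoring hits, with no Filter
            TopDocs top = searcher.search(query, null, n);

            List<String> names = new ArrayList<String>();
            for (int i = 0; i < top.scoreDocs.length; i++)
            {
                // scoreDocs[i].doc is Lucene's internal document number
                names.add(searcher.doc(top.scoreDocs[i].doc).get("filename"));
            }
            return names;
        }
        finally
        {
            searcher.close();
        }
    }

    public static void main(String[] args) throws IOException
    {
        if (args.length != 3)
        {
            System.err.println("Usage: java TermSearcher <index dir> <term> <excluded term>");
            return;
        }
        List<String> names = search(new File(args[0]), args[1], args[2], 10);
        for (int i = 0; i < names.size(); i++)
        {
            System.out.println(names.get(i));
        }
    }
}
```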

I hope this information and these examples help you.

Do post your comments ;)

About nitingautam

I am a Tech Lead (Java/J2EE/ExtJs) at an MNC located in Gurgaon.
This entry was posted in Java.

4 Responses to Lucene 2.3 – working example of Indexer and Searcher

  1. sandrar says:

    Hi! I was surfing and found your blog post… nice! I love your blog. 🙂 Cheers! Sandra. R.

  2. ben says:

    hello,
    how can I execute this program?
    how can I pass in the index directory and the directory of files to be indexed?
    thank you

  3. admin says:

    See the main method: the input parameters come from the command-line arguments (java Indexer <index dir> <data dir>):
    File indexDir = new File(args[0]);
    File dataDir = new File(args[1]);

  4. Marcello says:

    Hi,

    Is it possible to add just one document to an existing index, incrementally?

    Regards,
    Marcello
