org.galagosearch.core.index
Class IndexReader

java.lang.Object
  extended by org.galagosearch.core.index.IndexReader

public class IndexReader
extends java.lang.Object

This class implements the core functionality for all inverted list readers. It can also be used as a read-only TreeMap for disk-based data structures. In Galago, this format is used to store both index data and documents.

An index is a mapping from String to byte[]. If compression is turned on, each value must be small enough to fit in memory. If compression is off, values are streamed directly from disk, so there is no size restriction. Indexes support iteration over all keys and direct lookup of a single key. The structure is optimized for fast random lookups on disk.
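The memory restriction on compressed values is easiest to see in code. The sketch below is self-contained and assumes a value is GZip-compressed as one whole buffer; IndexWriter's actual on-disk framing may differ. The class name GzipValueSketch is hypothetical. Decompression inflates the entire value into a byte[], whereas an uncompressed value could be read straight from the file in pieces.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper illustrating whole-value GZip compression.
final class GzipValueSketch {
    // Compresses a value before it would be written into a block.
    static byte[] compress(byte[] value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(value);
        }
        return buffer.toByteArray();
    }

    // Inflates the whole value into memory; this is why a compressed value
    // has to fit in memory, while an uncompressed one can be streamed.
    static byte[] decompress(byte[] stored) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(stored))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] text = "the quick brown fox jumps over the lazy dog".getBytes(StandardCharsets.UTF_8);
        byte[] restored = decompress(compress(text));
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```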

Data is stored in blocks, typically 32K each. Each block begins with a prefix-compressed set of keys, followed by the value data. IndexWriter and IndexReader can GZip-compress the value data or leave it uncompressed. For inverted list data it is best to apply your own compression, but for text data GZip compression is a good choice.
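Prefix compression works because keys within a block are sorted, so neighboring keys tend to share long prefixes. The following is a minimal sketch of the general technique, not IndexWriter's actual key encoding: each key is stored as the length of the prefix it shares with the previous key plus the remaining suffix. The class name PrefixKeySketch and the fixed-width int lengths are assumptions made for brevity; a real format would likely use variable-length integers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of front (prefix) coding for a block's keys.
final class PrefixKeySketch {
    // Number of leading chars two strings have in common.
    static int sharedPrefix(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        int i = 0;
        while (i < limit && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    // Writes the key count, then for each key: shared-prefix length and suffix.
    static byte[] encode(List<String> sortedKeys) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(sortedKeys.size());
        String previous = "";
        for (String key : sortedKeys) {
            int shared = sharedPrefix(previous, key);
            out.writeInt(shared);
            out.writeUTF(key.substring(shared));
            previous = key;
        }
        out.flush();
        return buffer.toByteArray();
    }

    // Rebuilds each key from the previous key's prefix plus the stored suffix.
    static List<String> decode(byte[] block) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(block));
        int count = in.readInt();
        List<String> keys = new ArrayList<>(count);
        String previous = "";
        for (int i = 0; i < count; i++) {
            int shared = in.readInt();
            String key = previous.substring(0, shared) + in.readUTF();
            keys.add(key);
            previous = key;
        }
        return keys;
    }

    public static void main(String[] args) throws IOException {
        List<String> keys = List.of("apple", "applet", "application", "apply", "banana");
        System.out.println(decode(encode(keys))); // [apple, applet, application, apply, banana]
    }
}
```

Decoding is sequential within a block, which fits a layout where a reader first locates the right block and then scans it.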

Typically this class is extended by composition instead of inheritance.

Author:
trevor

Nested Class Summary
 class IndexReader.Iterator
           
 
Constructor Summary
IndexReader(java.io.File pathname)
          Identical to the other constructor, except this one takes a File object instead of a string as the parameter.
IndexReader(java.lang.String pathname)
          Opens the index found at pathname.
 
Method Summary
 org.galagosearch.tupleflow.DataStream blockStream(IndexReader.Iterator iter)
          This convenience method returns a DataStream for the region of the inverted file pointed to by the iterator.
 org.galagosearch.tupleflow.DataStream blockStream(long len)
          Like the other blockStream variant, but this one uses the current file location as the starting offset.
 org.galagosearch.tupleflow.DataStream blockStream(long offset, long length)
          This convenience method returns a DataStream for a region of an inverted file.
 void close()
          Closes all files associated with the IndexReader.
 java.io.RandomAccessFile getInput()
          Returns the file object for the inverted file.
 IndexReader.Iterator getIterator()
          Returns an iterator pointing to the very first key in the index.
 IndexReader.Iterator getIterator(java.lang.String key)
          Returns an iterator pointing at a specific key.
 org.galagosearch.tupleflow.Parameters getManifest()
          Returns a Parameters object that contains metadata about the contents of the index.
 org.galagosearch.tupleflow.DataStream getValueStream(java.lang.String key)
          Gets the value associated with this key, as a DataStream.
 java.lang.String getValueString(java.lang.String key)
          Gets the value associated with this key, as a String.
 VocabularyReader getVocabulary()
          Returns the vocabulary structure for this IndexReader.
static boolean isIndexFile(java.lang.String pathname)
          Returns true if the file specified by this pathname was probably written by IndexWriter.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

IndexReader

public IndexReader(java.lang.String pathname)
            throws java.io.FileNotFoundException,
                   java.io.IOException
Opens the index found at pathname.

Parameters:
pathname - Filename of the index to open.
Throws:
java.io.FileNotFoundException
java.io.IOException

IndexReader

public IndexReader(java.io.File pathname)
            throws java.io.FileNotFoundException,
                   java.io.IOException
Identical to the other constructor, except this one takes a File object instead of a string as the parameter.

Parameters:
pathname - File object for the index to open.
Throws:
java.io.FileNotFoundException
java.io.IOException
Method Detail

isIndexFile

public static boolean isIndexFile(java.lang.String pathname)
                           throws java.io.FileNotFoundException,
                                  java.io.IOException
Returns true if the file specified by this pathname was probably written by IndexWriter. If this method returns false, the file is definitely not readable by IndexReader.

Parameters:
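pathname - Filename of the file to check.
Returns:
true if the file was probably written by IndexWriter; false if it is definitely not readable by IndexReader.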
Throws:
java.io.FileNotFoundException
java.io.IOException
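The "probably" versus "definitely not" wording is typical of a signature check: a mismatch rules a file out, but a match can occur by chance. The sketch below illustrates that one-sided test only. The MAGIC constant, its placement in the last 8 bytes of the file, and the class name are all hypothetical; this page does not describe how isIndexFile actually recognizes an index.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical signature check; not IndexWriter's real file format.
final class MagicCheckSketch {
    // Hypothetical trailer value, chosen only for illustration.
    static final long MAGIC = 0x1A2B3C4D5E6F7081L;

    // false: the trailer cannot match, so the file is definitely not an index.
    // true: the last 8 bytes equal MAGIC, so the file is probably an index.
    static boolean looksLikeIndex(File file) throws IOException {
        if (file.length() < Long.BYTES) {
            return false;
        }
        try (RandomAccessFile in = new RandomAccessFile(file, "r")) {
            in.seek(in.length() - Long.BYTES);
            return in.readLong() == MAGIC;
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("index", ".bin");
        tmp.deleteOnExit();
        try (RandomAccessFile out = new RandomAccessFile(tmp, "rw")) {
            out.writeInt(42);     // stand-in for block data
            out.writeLong(MAGIC); // hypothetical trailer
        }
        System.out.println(looksLikeIndex(tmp)); // true
    }
}
```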

getVocabulary

public VocabularyReader getVocabulary()
Returns the vocabulary structure for this IndexReader. Note that the vocabulary contains only the first key in each block.


getIterator

public IndexReader.Iterator getIterator()
                                 throws java.io.IOException
Returns an iterator pointing to the very first key in the index. This is typically used for iterating through the entire index, which might be useful for testing and debugging tools, but probably not for traditional document retrieval.

Throws:
java.io.IOException

getIterator

public IndexReader.Iterator getIterator(java.lang.String key)
                                 throws java.io.IOException
Returns an iterator pointing at a specific key. Returns null if the key is not found in the index.

Throws:
java.io.IOException
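Since the class description says IndexReader can serve as a read-only TreeMap, the two getIterator contracts can be mirrored with an in-memory stand-in: one iterator starts at the very first key, the other starts at a given key or is null when that key is absent. SortedIndexStandIn is a hypothetical class built on java.util.TreeMap; it is not IndexReader.Iterator and reads nothing from disk.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the documented iterator contracts.
final class SortedIndexStandIn {
    private final SortedMap<String, byte[]> entries;

    SortedIndexStandIn(Map<String, byte[]> data) {
        // Sorted, read-only view, like a read-only TreeMap.
        this.entries = Collections.unmodifiableSortedMap(new TreeMap<>(data));
    }

    // Mirrors getIterator(): positioned at the very first key.
    Iterator<String> iterator() {
        return entries.keySet().iterator();
    }

    // Mirrors getIterator(key): positioned at key, or null if key is absent.
    Iterator<String> iterator(String key) {
        if (!entries.containsKey(key)) {
            return null;
        }
        return entries.tailMap(key).keySet().iterator();
    }

    public static void main(String[] args) {
        SortedIndexStandIn index = new SortedIndexStandIn(
                Map.of("b", new byte[] {2}, "a", new byte[] {1}, "c", new byte[] {3}));
        index.iterator().forEachRemaining(System.out::println); // a, b, c (one per line)
        System.out.println(index.iterator("zzz") == null);       // true
    }
}
```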

getValueString

public java.lang.String getValueString(java.lang.String key)
                                throws java.io.IOException
Gets the value associated with this key, as a String.

Parameters:
key - The key to look up.
Returns:
The index value for this key, or null if there is no such value.
Throws:
java.io.IOException

getValueStream

public org.galagosearch.tupleflow.DataStream getValueStream(java.lang.String key)
                                                     throws java.io.IOException
Gets the value associated with this key, as a DataStream.

Parameters:
key - The key to look up.
Returns:
The index value for this key, or null if there is no such value.
Throws:
java.io.IOException

getManifest

public org.galagosearch.tupleflow.Parameters getManifest()
Returns a Parameters object that contains metadata about the contents of the index. This is the place to store important data about the index contents, like what stemmer was used or the total number of terms in the collection.


blockStream

public org.galagosearch.tupleflow.DataStream blockStream(long len)
                                                  throws java.io.IOException
Like the other blockStream variant, but this one uses the current file location as the starting offset.

Throws:
java.io.IOException

blockStream

public org.galagosearch.tupleflow.DataStream blockStream(long offset,
                                                         long length)
                                                  throws java.io.IOException
This convenience method returns a DataStream for a region of an inverted file.

Throws:
java.io.IOException
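The two blockStream variants differ only in where the region starts: an explicit offset, or the file's current position. The sketch below shows that distinction with java.io.RandomAccessFile, the type getInput() returns. It reads each region into a byte[] instead of wrapping it in a DataStream, which is a simplification; the class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helpers contrasting offset-based and position-based reads.
final class ByteRangeSketch {
    // Reads `length` bytes starting at `offset`, like blockStream(offset, length).
    static byte[] readRange(RandomAccessFile file, long offset, int length) throws IOException {
        file.seek(offset);
        return readFromCurrent(file, length);
    }

    // Reads `length` bytes from the current file pointer, like blockStream(len).
    static byte[] readFromCurrent(RandomAccessFile file, int length) throws IOException {
        byte[] region = new byte[length];
        file.readFully(region); // advances the file pointer past the region
        return region;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("blocks", ".bin");
        tmp.deleteOnExit();
        try (RandomAccessFile file = new RandomAccessFile(tmp, "rw")) {
            file.write(new byte[] {10, 11, 12, 13, 14, 15});
            byte[] first = readRange(file, 1, 2);   // bytes 11, 12
            byte[] next = readFromCurrent(file, 3); // bytes 13, 14, 15
            System.out.println(first[0] + " " + next[2]); // 11 15
        }
    }
}
```

Reading from the current position is convenient when consecutive regions are laid out back to back, since each read leaves the pointer at the start of the next one.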

blockStream

public org.galagosearch.tupleflow.DataStream blockStream(IndexReader.Iterator iter)
                                                  throws java.io.IOException
This convenience method returns a DataStream for the region of the inverted file pointed to by the iterator.

Throws:
java.io.IOException

getInput

public java.io.RandomAccessFile getInput()
Returns the file object for the inverted file. This is useful for actually reading the data from a byte range returned by the iterator.


close

public void close()
           throws java.io.IOException
Closes all files associated with the IndexReader.

Throws:
java.io.IOException


Copyright © 2009. All Rights Reserved.