org.galagosearch.core.parse
Class ArcParser

java.lang.Object
  extended by org.galagosearch.core.parse.ArcParser
All Implemented Interfaces:
DocumentStreamParser

public class ArcParser
extends java.lang.Object
implements DocumentStreamParser

Parses ARC files, such as those produced by the Heritrix web crawler.

Author:
trevor

Constructor Summary
ArcParser(java.io.BufferedInputStream stream)
           
 
Method Summary
 Document nextDocument()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ArcParser

public ArcParser(java.io.BufferedInputStream stream)
          throws java.io.FileNotFoundException,
                 java.io.IOException
Throws:
java.io.FileNotFoundException
java.io.IOException
Method Detail

nextDocument

public Document nextDocument()
                      throws java.io.IOException
Specified by:
nextDocument in interface DocumentStreamParser
Throws:
java.io.IOException
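
Example (a minimal sketch, not taken from the Galago sources): the loop below shows one way a caller might drain a DocumentStreamParser by calling nextDocument() repeatedly. It assumes that nextDocument() returns null once the stream is exhausted, which this page does not state. So that the sketch compiles on its own, Document and DocumentStreamParser are redeclared here as hypothetical stand-ins, and the identifier field, the drain helper, and the DrainParserSketch class are illustrative names only. With the real library, the parser would presumably be built as new ArcParser(new java.io.BufferedInputStream(new java.io.FileInputStream(path))), handling the FileNotFoundException and IOException declared by the constructor.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class DrainParserSketch {

    /** Hypothetical stand-in for org.galagosearch.core.parse.Document. */
    public static class Document {
        public final String identifier; // assumed field, for illustration only

        public Document(String identifier) {
            this.identifier = identifier;
        }
    }

    /** Stand-in mirroring the single method documented on this page. */
    public interface DocumentStreamParser {
        Document nextDocument() throws IOException;
    }

    /**
     * Collects documents until the parser signals end of stream.
     * Assumption: end of stream is signalled by a null return value.
     */
    public static List<Document> drain(DocumentStreamParser parser) throws IOException {
        List<Document> documents = new ArrayList<>();
        Document document;
        while ((document = parser.nextDocument()) != null) {
            documents.add(document);
        }
        return documents;
    }

    public static void main(String[] args) throws IOException {
        // A fake parser that yields two documents and then null.
        Iterator<String> ids = Arrays.asList("doc-1", "doc-2").iterator();
        DocumentStreamParser fake = () -> ids.hasNext() ? new Document(ids.next()) : null;

        for (Document d : drain(fake)) {
            System.out.println(d.identifier);
        }
    }
}
```

Because nextDocument() is declared to throw java.io.IOException, the caller either propagates the exception, as drain does here, or wraps the loop in a try block. Closing the underlying stream is left to the caller in this sketch, since the page lists no close method on ArcParser.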


Copyright © 2009. All Rights Reserved.