| Interface Summary | |
|---|---|
| DocumentStreamParser | |

| Class Summary | |
|---|---|
| AdditionalTextCombiner | Adds tuples of type AdditionalDocumentText to the end of the text field in a document. |
| AnchorTextCreator | |
| AnchorTextDocumentCreator | From an IdentifiedLink object, this class constructs a document containing only anchor text. |
| ArcParser | Parses ARC files, like those produced by the Heritrix web crawler. |
| CollectionLengthCounter | |
| DateExtractor | A very crude extractor of dates from text. |
| Document | |
| DocumentDataExtractor | Copies a few pieces of metadata about a document (identifier, url, length) from a document object and stores them in a DocumentData tuple. |
| DocumentDataNumberer | Sequentially numbers document data objects. |
| DocumentFilter | |
| DocumentIndexReader | |
| DocumentIndexWriter | Writes document text and metadata to an index file. |
| DocumentLinkData | |
| DocumentSource | From a set of inputs, splits the input into many DocumentSplit records. |
| DocumentToKeyValuePair | This is used in conjunction with KeyValuePairToDocument. |
| Extent | |
| ExtentExtractor | Converts all tags from a document object into DocumentExtent tuples. |
| ExtentsNumberer | |
| FieldConflater | |
| IndexReaderSplitParser | Reads Document data from an index file. |
| KeyValuePairToDocument | This is used in conjunction with DocumentToKeyValuePair. |
| LinkCombiner | |
| LinkExtractor | Extracts links from documents (anchor text, URLs). |
| Porter2Stemmer | |
| PositionPostingsNumberer | |
| PostingsPositionExtractor | |
| PriorParser | |
| StringPooler | Replaces strings in document objects with already-used copies. |
| Tag | This class represents a tag in an XML/HTML document. |
| TagTokenizer | This class processes document text into tokens that can be indexed. |
| TagTokenizer.Pair | |
| TrecTextParser | |
| TrecWebParser | |
| UniversalParser | |
| WordCounter | |
| WordCountReducer | |
| WordFilter | Filters out unnecessary words from documents. |
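
The StringPooler entry describes replacing strings with already-used copies. As a rough illustration of that general idea only, here is a minimal sketch of a string pool. The class `SimpleStringPool` and its methods are hypothetical names chosen for this example; they are not taken from this package and do not reflect how StringPooler is actually implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of string pooling; NOT the StringPooler class listed above.
final class SimpleStringPool {
    // Maps each distinct string value to the first instance seen with that value.
    private final Map<String, String> pool = new HashMap<>();

    /** Returns the first-seen instance equal to s, storing s itself if it is new. */
    String canonical(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    /** Number of distinct string values currently held in the pool. */
    int size() {
        return pool.size();
    }
}
```

In a sketch like this, a caller would pass each term through `canonical` before storing it, so that equal terms share a single object instead of each holding its own copy.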