org.galagosearch.core.parse
Class LinkExtractor

java.lang.Object
  extended by org.galagosearch.tupleflow.StandardStep<Document,org.galagosearch.core.types.ExtractedLink>
      extended by org.galagosearch.core.parse.LinkExtractor
All Implemented Interfaces:
org.galagosearch.tupleflow.Processor<Document>, org.galagosearch.tupleflow.Source<org.galagosearch.core.types.ExtractedLink>, org.galagosearch.tupleflow.Step

@InputClass(className="org.galagosearch.core.parse.Document")
@OutputClass(className="org.galagosearch.core.types.ExtractedLink")
public class LinkExtractor
extends org.galagosearch.tupleflow.StandardStep<Document,org.galagosearch.core.types.ExtractedLink>

Extracts links (anchor text and URLs) from documents and emits them as ExtractedLink tuples.
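
To illustrate what extracting anchor text and URLs from a document can look like, here is a minimal, self-contained sketch. It is not the LinkExtractor implementation: the class LinkSketch, the nested Link type, and the regular expression are all hypothetical stand-ins, and a regex is only an approximation of real HTML parsing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; none of these names come from Galago.
class LinkSketch {
    // Stand-in for an extracted link: destination URL plus anchor text.
    static final class Link {
        final String destUrl;
        final String anchorText;

        Link(String destUrl, String anchorText) {
            this.destUrl = destUrl;
            this.anchorText = anchorText;
        }
    }

    // Matches <a ... href="...">text</a>. Deliberately simple: it assumes
    // quoted href values and does not handle every HTML edge case.
    private static final Pattern ANCHOR = Pattern.compile(
        "<a\\b[^>]*?\\bhref\\s*=\\s*[\"']([^\"']*)[\"'][^>]*>(.*?)</a>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static List<Link> extract(String html) {
        List<Link> links = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            // Replace nested tags with spaces, then collapse whitespace.
            String text = m.group(2)
                .replaceAll("<[^>]*>", " ")
                .replaceAll("\\s+", " ")
                .trim();
            links.add(new Link(m.group(1), text));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"http://example.com/a\">First <b>page</b></a></p>";
        for (Link link : extract(html)) {
            System.out.println(link.destUrl + " -> " + link.anchorText);
        }
    }
}
```

In the real class, the input is a Document and each result is an ExtractedLink passed to the next step, rather than a list returned to the caller.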

Author:
trevor

Field Summary
 
Fields inherited from class org.galagosearch.tupleflow.StandardStep
processor
 
Constructor Summary
LinkExtractor(org.galagosearch.tupleflow.TupleFlowParameters parameters)
           
 
Method Summary
 java.lang.Class<Document> getInputClass()
           
 java.lang.Class<org.galagosearch.core.types.ExtractedLink> getOutputClass()
           
 void process(Document document)
           
 java.lang.String scrubUrl(java.lang.String url)
           
 
Methods inherited from class org.galagosearch.tupleflow.StandardStep
close, setProcessor
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LinkExtractor

public LinkExtractor(org.galagosearch.tupleflow.TupleFlowParameters parameters)

Method Detail

scrubUrl

public java.lang.String scrubUrl(java.lang.String url)
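
This page does not describe what scrubUrl does. The name suggests URL normalization, so the sketch below shows one plausible set of normalization rules purely as an illustration. Every rule here (dropping the fragment, lowercasing, removing the default port 80, stripping trailing slashes) is an assumption, and the class UrlScrubSketch is hypothetical; consult the LinkExtractor source for the actual behavior.

```java
import java.util.Locale;

// Hypothetical sketch of URL normalization; not the Galago implementation.
class UrlScrubSketch {
    static String scrub(String url) {
        String s = url.trim();

        // Assumption: the fragment ("#...") does not identify a distinct page.
        int hash = s.indexOf('#');
        if (hash >= 0) {
            s = s.substring(0, hash);
        }

        // Assumption: compare URLs case-insensitively. This also lowercases
        // the path, which is lossy on case-sensitive servers.
        s = s.toLowerCase(Locale.ROOT);

        // Assumption: an explicit default HTTP port is redundant.
        s = s.replace(":80/", "/");
        if (s.endsWith(":80")) {
            s = s.substring(0, s.length() - 3);
        }

        // Assumption: trailing slashes are not significant.
        while (s.endsWith("/")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(scrub("HTTP://Example.com:80/Path/#top"));
    }
}
```

Normalizing in this way lets two spellings of the same address be treated as one link target, at the cost of occasionally merging URLs that a server would treat as different.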

process

public void process(Document document)
             throws java.io.IOException
Specified by:
process in interface org.galagosearch.tupleflow.Processor<Document>
Specified by:
process in class org.galagosearch.tupleflow.StandardStep<Document,org.galagosearch.core.types.ExtractedLink>
Throws:
java.io.IOException
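
The signatures on this page (process, the inherited processor field, setProcessor, and close) suggest a pipeline step that consumes one object type and forwards another to a downstream processor. The following sketch shows that pattern with stand-in types. The interfaces and classes in StepSketch are hypothetical and only mimic the shape of the TupleFlow types; in particular, forwarding close() to the downstream processor is an assumption, and the URL-token logic is a toy replacement for real link extraction.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the step/processor pattern; not the TupleFlow API.
class StepSketch {
    interface Processor<T> {
        void process(T object) throws IOException;
        void close() throws IOException;
    }

    // A step consumes objects of type I and forwards objects of type O.
    static abstract class Step<I, O> implements Processor<I> {
        protected Processor<O> processor;

        void setProcessor(Processor<O> next) {
            this.processor = next;
        }

        // Assumption: closing a step closes the downstream processor.
        public void close() throws IOException {
            processor.close();
        }
    }

    // Toy step: emits each whitespace-separated token that looks like a URL.
    static final class UrlTokenStep extends Step<String, String> {
        public void process(String documentText) throws IOException {
            for (String token : documentText.split("\\s+")) {
                if (token.startsWith("http://") || token.startsWith("https://")) {
                    processor.process(token);
                }
            }
        }
    }

    // Terminal processor that records what it receives.
    static final class Collector implements Processor<String> {
        final List<String> seen = new ArrayList<>();
        boolean closed = false;

        public void process(String s) {
            seen.add(s);
        }

        public void close() {
            closed = true;
        }
    }

    public static void main(String[] args) throws IOException {
        UrlTokenStep step = new UrlTokenStep();
        Collector sink = new Collector();
        step.setProcessor(sink);
        step.process("see http://a.example/x and https://b.example/ now");
        step.close();
        System.out.println(sink.seen);
    }
}
```

A single call to process may emit zero, one, or many outputs, which fits a link extractor: one document can contain any number of links.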

getInputClass

public java.lang.Class<Document> getInputClass()

getOutputClass

public java.lang.Class<org.galagosearch.core.types.ExtractedLink> getOutputClass()


Copyright © 2009. All Rights Reserved.