org.galagosearch.core.parse
Class LinkExtractor
java.lang.Object
org.galagosearch.tupleflow.StandardStep<Document,org.galagosearch.core.types.ExtractedLink>
org.galagosearch.core.parse.LinkExtractor
- All Implemented Interfaces:
- org.galagosearch.tupleflow.Processor<Document>, org.galagosearch.tupleflow.Source<org.galagosearch.core.types.ExtractedLink>, org.galagosearch.tupleflow.Step
@InputClass(className="org.galagosearch.core.parse.Document")
@OutputClass(className="org.galagosearch.core.types.ExtractedLink")
public class LinkExtractor
- extends org.galagosearch.tupleflow.StandardStep<Document,org.galagosearch.core.types.ExtractedLink>
Extracts links (anchor text and URLs) from Document objects and emits them as ExtractedLink tuples.
- Author:
- trevor
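The class description says that links consist of anchor text and a URL. As a rough, self-contained illustration of that idea (not Galago's actual implementation, which this page does not show), the sketch below pulls (URL, anchor text) pairs out of an HTML string with a regular expression. The names AnchorSketch, Link, and extract are hypothetical and do not appear in the Galago API; the real ExtractedLink type may carry different or additional fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is NOT Galago's LinkExtractor.
final class AnchorSketch {

    // Assumed shape of an extracted link: destination URL plus anchor text.
    static final class Link {
        final String url;
        final String anchorText;

        Link(String url, String anchorText) {
            this.url = url;
            this.anchorText = anchorText;
        }
    }

    // Matches <a ... href="..."> ... </a>, case-insensitively and across newlines.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"'][^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns one Link per anchor element found in the given HTML text.
    static List<Link> extract(String html) {
        List<Link> links = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            // Replace nested tags with spaces, then collapse whitespace.
            String text = m.group(2)
                    .replaceAll("<[^>]*>", " ")
                    .replaceAll("\\s+", " ")
                    .trim();
            links.add(new Link(m.group(1), text));
        }
        return links;
    }
}
```

A regular expression is enough for a sketch, but it will miss unquoted href values and malformed markup; a tag-aware parser would be the more robust choice for real documents.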
Fields inherited from class org.galagosearch.tupleflow.StandardStep
processor
Constructor Summary
LinkExtractor(org.galagosearch.tupleflow.TupleFlowParameters parameters)
Methods inherited from class org.galagosearch.tupleflow.StandardStep
close, setProcessor
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LinkExtractor
public LinkExtractor(org.galagosearch.tupleflow.TupleFlowParameters parameters)
scrubUrl
public java.lang.String scrubUrl(java.lang.String url)
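This page gives no description of scrubUrl, so its exact behavior is not documented here. Purely as an assumption about what a URL-cleaning step of this kind might do, the sketch below trims whitespace, lowercases the scheme and host, resolves "." and ".." path segments, and drops the fragment. The class UrlScrubSketch and its scrub method are hypothetical and may differ from what LinkExtractor.scrubUrl actually does.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical illustration only; NOT the behavior of LinkExtractor.scrubUrl.
final class UrlScrubSketch {

    // Returns a cleaned-up form of an absolute URL, or the trimmed input
    // when the URL is relative, opaque, or cannot be parsed.
    static String scrub(String url) {
        String trimmed = url.trim();
        try {
            // normalize() removes "." and ".." segments from the path.
            URI uri = new URI(trimmed).normalize();
            String scheme = uri.getScheme();
            String host = uri.getHost();
            if (scheme == null || host == null) {
                return trimmed; // nothing safe to rewrite
            }
            // Rebuild from the raw (still percent-encoded) components,
            // lowercasing scheme and host and omitting the fragment.
            StringBuilder sb = new StringBuilder();
            sb.append(scheme.toLowerCase(Locale.ROOT)).append("://");
            if (uri.getRawUserInfo() != null) {
                sb.append(uri.getRawUserInfo()).append('@');
            }
            sb.append(host.toLowerCase(Locale.ROOT));
            if (uri.getPort() != -1) {
                sb.append(':').append(uri.getPort());
            }
            if (uri.getRawPath() != null) {
                sb.append(uri.getRawPath());
            }
            if (uri.getRawQuery() != null) {
                sb.append('?').append(uri.getRawQuery());
            }
            return sb.toString();
        } catch (URISyntaxException e) {
            return trimmed; // leave unparseable input untouched
        }
    }
}
```

Normalizing URLs before comparing them lets links that differ only in case, fragment, or redundant path segments be treated as pointing to the same page, which is a common reason for a step like this in a link-extraction pipeline.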
process
public void process(Document document)
throws java.io.IOException
- Specified by:
- process in interface org.galagosearch.tupleflow.Processor<Document>
- Specified by:
- process in class org.galagosearch.tupleflow.StandardStep<Document,org.galagosearch.core.types.ExtractedLink>
- Throws:
- java.io.IOException
getInputClass
public java.lang.Class<Document> getInputClass()
getOutputClass
public java.lang.Class<org.galagosearch.core.types.ExtractedLink> getOutputClass()
Copyright © 2009. All Rights Reserved.