public class AutoDetectProductCrawler extends ProductCrawler implements CoreMetKeys
A ProductCrawler that uses a suite of files to define its crawling
and ingestion policy:

- actions-map.xml - This file is an XML specification for the actions that the crawler should take in response to its 3 lifecycle phases: preIngest, postIngestSuccess, and postIngestFail.
- met-extr-preconditions.xml - This file defines preconditions that MetExtractors must pass before being called by the AutoDetectProductCrawler.
- mime-extractor-map.xml - This file maps MimeType names to the names of the MetExtractors to call for a particular Product File as it is encountered during a crawl (e.g., when Metadata needs to be generated, as opposed to being available a priori). See ./src/resources/examples/mime-extractor-map.xml for an example of the structure of this file.
- mimetypes.xml - An Apache Tika style mimetypes file, augmented with the ability to have arbitrary regular expressions that define a particular Product MimeType. This MimeType is then mapped to an extractor via the mime-extractor-map.xml file, described above.

| Modifier and Type | Field and Description |
|---|---|
| static String | MIME_TYPES_HIERARCHY |

Fields inherited from class ProductCrawler: actionRepo, DIR_FILTER, FILE_FILTER, ingester, ingestStatus, LOG

Fields inherited from interface CoreMetKeys: FILE_LOCATION, FILE_SIZE, FILENAME, MIME_TYPE, PRODUCT_ID, PRODUCT_NAME, PRODUCT_RECEVIED_TIME, PRODUCT_STRUCTURE, PRODUCT_TYPE

| Constructor and Description |
|---|
| AutoDetectProductCrawler() |

| Modifier and Type | Method and Description |
|---|---|
| protected Metadata | getMetadataForProduct(File product) |
| protected boolean | passesPreconditions(File product) |
| protected File | renameProduct(File product, Metadata productMetadata) |
| void | setMimeExtractorRepo(String mimeExtractorRepo) |

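
The three protected hooks in the method summary (passesPreconditions, getMetadataForProduct, renameProduct) and the three lifecycle phases named in the class description suggest a per-file flow along the following lines. This is only a minimal sketch under stated assumptions: every type here (`CrawlFlowSketch`, `SketchCrawler`, `demoCrawler`, `ingest`, the string results) is hypothetical and self-contained, metadata is modelled as a plain `Map`, and the ordering of the steps is an assumption rather than something taken from the OODT source.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types come from Apache OODT.
class CrawlFlowSketch {

    /** Stand-in for a crawler exposing the three per-product hooks. */
    abstract static class SketchCrawler {
        /** Records which lifecycle phases ran, in order. */
        final List<String> phases = new ArrayList<>();

        abstract boolean passesPreconditions(File product);
        abstract Map<String, String> getMetadataForProduct(File product);
        abstract File renameProduct(File product, Map<String, String> metadata);
        /** Stand-in for the real ingest call; returns true on success. */
        abstract boolean ingest(File product, Map<String, String> metadata);

        /** Assumed ordering of the hooks and phases for a single file. */
        String handle(File product) {
            if (!passesPreconditions(product)) {
                return "SKIPPED";
            }
            Map<String, String> metadata = getMetadataForProduct(product);
            File renamed = renameProduct(product, metadata);
            phases.add("preIngest");
            boolean ok = ingest(renamed, metadata);
            phases.add(ok ? "postIngestSuccess" : "postIngestFail");
            return ok ? "SUCCESS" : "FAILURE";
        }
    }

    /** Builds a toy crawler that accepts only ".dat" files. */
    static SketchCrawler demoCrawler(boolean ingestSucceeds) {
        return new SketchCrawler() {
            @Override boolean passesPreconditions(File product) {
                return product.getName().endsWith(".dat");
            }
            @Override Map<String, String> getMetadataForProduct(File product) {
                Map<String, String> metadata = new LinkedHashMap<>();
                metadata.put("Filename", product.getName());
                return metadata;
            }
            @Override File renameProduct(File product, Map<String, String> metadata) {
                return product; // no renaming in this sketch
            }
            @Override boolean ingest(File product, Map<String, String> metadata) {
                return ingestSucceeds;
            }
        };
    }

    public static void main(String[] args) {
        SketchCrawler crawler = demoCrawler(true);
        System.out.println(crawler.handle(new File("granule.dat")) + " " + crawler.phases);
        System.out.println(crawler.handle(new File("notes.txt")));
    }
}
```

Modelling the hooks as abstract methods mirrors the "in class ProductCrawler" notes in the method details, where each hook is declared by the parent class and supplied by the subclass.
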
Methods inherited from class ProductCrawler: crawl, crawl, getIngestStatus, handleFile

Other inherited methods: getActionIds, getApplicationContext, getClientTransferer, getDaemonPort, getDaemonWait, getFilemgrUrl, getGlobalMetadata, getId, getProductPath, getRequiredMetadata, isCrawlForDirs, isNoRecur, isSkipIngest, setActionIds, setApplicationContext, setClientTransferer, setCrawlForDirs, setDaemonPort, setDaemonWait, setFilemgrUrl, setGlobalMetadata, setId, setNoRecur, setProductPath, setRequiredMetadata, setSkipIngest

public static final String MIME_TYPES_HIERARCHY

protected Metadata getMetadataForProduct(File product) throws IOException, MetExtractionException
getMetadataForProduct in class ProductCrawler
Throws: IOException, MetExtractionException

protected boolean passesPreconditions(File product)
passesPreconditions in class ProductCrawler

protected File renameProduct(File product, Metadata productMetadata) throws NamingConventionException
renameProduct in class ProductCrawler
Throws: NamingConventionException

public void setMimeExtractorRepo(String mimeExtractorRepo) throws IllegalAccessException, CrawlerActionException, MetExtractionException, InstantiationException, FileNotFoundException, ClassNotFoundException
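
The class description says that mimetypes.xml can define a Product MimeType by regular expression and that mime-extractor-map.xml then maps that MimeType to extractor names. The two-step lookup can be pictured roughly as below. This is a hedged illustration only: `MimeExtractorMapSketch`, its methods, the MIME type `product/x-example-granule`, the filename pattern, and the extractor name are all made up for the example. It does not parse the XML files, and it is not how the repository configured through `setMimeExtractorRepo` is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: these types and names are illustrative, not OODT classes.
class MimeExtractorMapSketch {

    /** Filename regex -> product MIME type (the role played by mimetypes.xml). */
    private final Map<Pattern, String> mimeTypesByPattern = new LinkedHashMap<>();

    /** MIME type -> extractor names (the role played by mime-extractor-map.xml). */
    private final Map<String, List<String>> extractorsByMimeType = new LinkedHashMap<>();

    void addMimeType(String filenameRegex, String mimeType) {
        mimeTypesByPattern.put(Pattern.compile(filenameRegex), mimeType);
    }

    void addExtractors(String mimeType, String... extractorNames) {
        extractorsByMimeType.put(mimeType, new ArrayList<>(Arrays.asList(extractorNames)));
    }

    /** Returns the first MIME type whose regex matches the whole filename, or null. */
    String detectMimeType(String filename) {
        for (Map.Entry<Pattern, String> entry : mimeTypesByPattern.entrySet()) {
            if (entry.getKey().matcher(filename).matches()) {
                return entry.getValue();
            }
        }
        return null;
    }

    /** Returns the extractor names mapped to the file's MIME type, or an empty list. */
    List<String> extractorsFor(String filename) {
        String mimeType = detectMimeType(filename);
        if (mimeType == null) {
            return Collections.emptyList();
        }
        return extractorsByMimeType.getOrDefault(mimeType, Collections.emptyList());
    }

    public static void main(String[] args) {
        MimeExtractorMapSketch sketch = new MimeExtractorMapSketch();
        // Made-up MIME type, regex, and extractor name, for illustration only.
        sketch.addMimeType("granule_\\d{8}\\.dat", "product/x-example-granule");
        sketch.addExtractors("product/x-example-granule", "ExampleGranuleExtractor");
        System.out.println(sketch.extractorsFor("granule_20170101.dat"));
        System.out.println(sketch.extractorsFor("readme.txt"));
    }
}
```

Keeping detection and extractor lookup in separate maps reflects the split between the two configuration files: a new filename pattern can be added without touching the extractor mapping, and the reverse.
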
Copyright © 1999–2017 Apache OODT. All rights reserved.