public class AutoDetectProductCrawler extends ProductCrawler implements CoreMetKeys
A ProductCrawler that uses a suite of files to define its crawling and ingestion policy:

- actions-map.xml: an XML specification of the actions the crawler should take in response to its three lifecycle phases (preIngest, postIngestSuccess, and postIngestFail).
- met-extr-preconditions.xml: defines the preconditions that MetExtractors must pass before the AutoDetectProductCrawler calls them.
- mime-extractor-map.xml: maps MimeType names to the names of the MetExtractors to call for a particular Product File as it is encountered during a crawl (i.e., when Metadata must be generated rather than being available a priori). See ./src/resources/examples/mime-extractor-map.xml for an example of the structure of this file.
- mimetypes.xml: an Apache Tika-style mimetypes file, augmented so that arbitrary regular expressions can define a particular Product MimeType. That MimeType is then mapped to extractors via the mime-extractor-map.xml file described above.

Modifier and Type | Field and Description |
---|---|
static String | MIME_TYPES_HIERARCHY |
Fields inherited from class ProductCrawler: actionRepo, DIR_FILTER, FILE_FILTER, ingester, ingestStatus, LOG
Fields inherited from interface CoreMetKeys: FILE_LOCATION, FILE_SIZE, FILENAME, MIME_TYPE, PRODUCT_ID, PRODUCT_NAME, PRODUCT_RECEVIED_TIME, PRODUCT_STRUCTURE, PRODUCT_TYPE
Constructor and Description |
---|
AutoDetectProductCrawler() |
Modifier and Type | Method and Description |
---|---|
protected Metadata | getMetadataForProduct(File product) |
protected boolean | passesPreconditions(File product) |
protected File | renameProduct(File product, Metadata productMetadata) |
void | setMimeExtractorRepo(String mimeExtractorRepo) |
Methods inherited from class ProductCrawler: crawl (two overloads), getIngestStatus, handleFile
Methods inherited from higher in the class hierarchy: getActionIds, getApplicationContext, getClientTransferer, getDaemonPort, getDaemonWait, getFilemgrUrl, getGlobalMetadata, getId, getProductPath, getRequiredMetadata, isCrawlForDirs, isNoRecur, isSkipIngest, setActionIds, setApplicationContext, setClientTransferer, setCrawlForDirs, setDaemonPort, setDaemonWait, setFilemgrUrl, setGlobalMetadata, setId, setNoRecur, setProductPath, setRequiredMetadata, setSkipIngest
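The actions-map.xml entry in the class overview ties crawler actions to three lifecycle phases. The following is a minimal, stdlib-only sketch of that phase-to-actions dispatch, not OODT's action API: the `Phase` enum, the `SimpleAction` interface and the `ActionMap` class are hypothetical stand-ins, and the sketch assumes every action in a phase runs even when an earlier one fails.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the three lifecycle phases named in actions-map.xml.
enum Phase { PRE_INGEST, POST_INGEST_SUCCESS, POST_INGEST_FAIL }

// Hypothetical stand-in for a crawler action (not OODT's CrawlerAction class).
interface SimpleAction {
    boolean perform(File product);
}

// Hypothetical registry: which actions fire in which phase.
class ActionMap {
    private final Map<Phase, List<SimpleAction>> actions = new EnumMap<>(Phase.class);

    void register(Phase phase, SimpleAction action) {
        actions.computeIfAbsent(phase, p -> new ArrayList<>()).add(action);
    }

    // Runs every action registered for the phase, in registration order, and
    // reports whether all succeeded. Assumption: a failure does not stop later actions.
    boolean fire(Phase phase, File product) {
        boolean allOk = true;
        for (SimpleAction action : actions.getOrDefault(phase, Collections.emptyList())) {
            allOk &= action.perform(product); // non-short-circuit: always runs perform
        }
        return allOk;
    }
}
```

An `EnumMap` keeps the set of phases closed, so a typo in a phase name is a compile error rather than a silently ignored map key.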
public static final String MIME_TYPES_HIERARCHY
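The mimetypes.xml entry in the class overview says that arbitrary regular expressions can define a Product MimeType. Below is a minimal sketch of regex-based detection using only `java.util.regex`. The `RegexMimeDetector` class, matching against the bare file name, first-match-wins ordering, and the `application/octet-stream` fallback are all assumptions for illustration, not Tika's or OODT's actual detection logic.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical detector: assigns MIME types to products by file-name regex.
class RegexMimeDetector {
    // LinkedHashMap preserves insertion order, so earlier rules win (an assumption).
    private final Map<String, Pattern> rules = new LinkedHashMap<>();

    void addRule(String mimeType, String regex) {
        rules.put(mimeType, Pattern.compile(regex));
    }

    // Returns the first MIME type whose pattern matches the entire file name.
    String detect(File product) {
        String name = product.getName();
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            if (rule.getValue().matcher(name).matches()) {
                return rule.getKey();
            }
        }
        return "application/octet-stream"; // fallback when no rule matches
    }
}
```

For example, registering a hypothetical `product/hdf-l1` type with the pattern `L1_\d{8}\.hdf` ahead of a generic `.*\.hdf` rule lets specific products be recognized before the catch-all type.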
protected Metadata getMetadataForProduct(File product) throws IOException, MetExtractionException
Specified by: getMetadataForProduct in class ProductCrawler
Throws: IOException, MetExtractionException
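Per the class overview, metadata is generated by running the MetExtractors that mime-extractor-map.xml assigns to the product's MimeType. The sketch below models that lookup-and-merge step with plain `Map<String, String>` metadata and `java.util.function.Function` extractors. `ExtractorDispatcher` and its methods are hypothetical, and the rule that later extractors overwrite keys set by earlier ones is an assumption, not OODT's documented merge behavior.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical dispatcher: MIME type -> extractor names -> merged metadata.
class ExtractorDispatcher {
    // Plays the role of mime-extractor-map.xml: MIME type -> ordered extractor names.
    private final Map<String, List<String>> mimeToExtractors = new HashMap<>();
    // Extractor name -> implementation (stand-in for a MetExtractor instance).
    private final Map<String, Function<File, Map<String, String>>> extractors = new HashMap<>();

    void mapMimeType(String mimeType, String extractorName) {
        mimeToExtractors.computeIfAbsent(mimeType, k -> new ArrayList<>()).add(extractorName);
    }

    void registerExtractor(String name, Function<File, Map<String, String>> extractor) {
        extractors.put(name, extractor);
    }

    // Runs each extractor mapped to the MIME type in order. Assumption: later
    // extractors overwrite keys produced by earlier ones.
    Map<String, String> metadataFor(File product, String mimeType) {
        Map<String, String> merged = new HashMap<>();
        for (String name : mimeToExtractors.getOrDefault(mimeType, Collections.emptyList())) {
            Function<File, Map<String, String>> extractor = extractors.get(name);
            if (extractor == null) {
                throw new IllegalStateException("No extractor registered as " + name);
            }
            merged.putAll(extractor.apply(product));
        }
        return merged; // empty when the MIME type has no mapped extractors
    }
}
```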
protected boolean passesPreconditions(File product)
Specified by: passesPreconditions in class ProductCrawler
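The class overview says met-extr-preconditions.xml defines checks that a MetExtractor must pass before it is called. A minimal sketch of such a precondition registry, built on `java.util.function.Predicate`, follows. `PreconditionChecker` is hypothetical, and treating an extractor with no registered preconditions as passing is an assumption about the semantics, not something this page states.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical registry: extractor name -> checks a product must pass first.
class PreconditionChecker {
    private final Map<String, List<Predicate<File>>> preconditions = new HashMap<>();

    void require(String extractorName, Predicate<File> check) {
        preconditions.computeIfAbsent(extractorName, k -> new ArrayList<>()).add(check);
    }

    // True only if every precondition registered for the extractor holds.
    // Assumption: an extractor with no preconditions passes trivially.
    boolean passes(String extractorName, File product) {
        for (Predicate<File> check : preconditions.getOrDefault(extractorName, Collections.emptyList())) {
            if (!check.test(product)) {
                return false; // stop at the first failing check
            }
        }
        return true;
    }
}
```

Checks such as `File::exists` or `f -> f.length() > 0` fit the same `Predicate<File>` shape; the name-based checks used in the test avoid touching the file system.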
protected File renameProduct(File product, Metadata productMetadata) throws NamingConventionException
Specified by: renameProduct in class ProductCrawler
Throws: NamingConventionException
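renameProduct receives the product file together with its Metadata and may throw NamingConventionException. This page does not describe the naming rule itself, so the following is only one plausible shape for metadata-driven renaming. The `[Key]` placeholder syntax and `NamingSketch.targetFor` are hypothetical, `IllegalArgumentException` stands in for NamingConventionException, and the sketch computes the new `File` without renaming anything on disk.

```java
import java.io.File;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical metadata-driven renaming helper.
class NamingSketch {
    // Matches hypothetical placeholders such as [ProductType] in a naming template.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\[([^\\]]+)\\]");

    // Computes the renamed file in the same parent directory; does not touch the disk.
    static File targetFor(File product, Map<String, String> metadata, String template) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer name = new StringBuffer();
        while (m.find()) {
            String value = metadata.get(m.group(1));
            if (value == null) {
                // Stand-in for NamingConventionException in this sketch.
                throw new IllegalArgumentException("Missing metadata key: " + m.group(1));
            }
            m.appendReplacement(name, Matcher.quoteReplacement(value));
        }
        m.appendTail(name);
        return new File(product.getParentFile(), name.toString());
    }
}
```

With ProductType `L1B` and Filename `granule.hdf`, the template `[ProductType]_[Filename]` applied to `/staging/granule.hdf` yields `/staging/L1B_granule.hdf`.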
public void setMimeExtractorRepo(String mimeExtractorRepo) throws IllegalAccessException, CrawlerActionException, MetExtractionException, InstantiationException, FileNotFoundException, ClassNotFoundException
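setMimeExtractorRepo takes the location of the MIME-to-extractor mapping, and its throws clause (FileNotFoundException, ClassNotFoundException, InstantiationException, IllegalAccessException) suggests it reads that file and instantiates the named extractor classes reflectively. The sketch below covers only the parsing half, using the JDK's DOM parser. The `<mime type="..."><extractor class="..."/></mime>` layout is a hypothetical schema, not the real one (see ./src/resources/examples/mime-extractor-map.xml for that), and `MimeExtractorMapLoader` is likewise hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical loader for a simplified MIME-to-extractor mapping file.
class MimeExtractorMapLoader {
    // Parses the hypothetical <mime type="..."><extractor class="..."/></mime> layout
    // into MIME type -> ordered list of extractor class names.
    static Map<String, List<String>> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, List<String>> result = new LinkedHashMap<>();
            NodeList mimes = doc.getElementsByTagName("mime");
            for (int i = 0; i < mimes.getLength(); i++) {
                Element mime = (Element) mimes.item(i);
                List<String> classes = new ArrayList<>();
                NodeList exts = mime.getElementsByTagName("extractor");
                for (int j = 0; j < exts.getLength(); j++) {
                    classes.add(((Element) exts.item(j)).getAttribute("class"));
                }
                result.put(mime.getAttribute("type"), classes);
            }
            return result;
        } catch (Exception e) {
            // Parser and I/O failures are wrapped so callers need not handle checked exceptions.
            throw new IllegalArgumentException("Unreadable extractor map", e);
        }
    }
}
```

Resolving each class name with `Class.forName` and instantiating it would be the natural next step, and is where the reflective exceptions in the real signature would arise.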
Copyright © 1999–2017 Apache OODT. All rights reserved.