This is the developer guide for the Apache OODT Catalog and Archive Service (CAS) File Manager component, or File Manager for short. Primarily, this guide will explain the File Manager architecture and interfaces, including its tailorable extension points. For information on installation, configuration, and examples, please see our User Guides.
The remainder of this guide is separated into the following sections:
The File Manager component is responsible for tracking, ingesting and moving file data and metadata between a client system and a server system. The File Manager is an extensible software component that provides an XML-RPC external interface, and a fully tailorable Java-based API for file management.
In this section, we will describe the architecture of the File Manager, including its constituent components, object model, and key capabilities.
The major components of the File Manager are the Client and Server, the Repository Manager, the Catalog, the Validation Layer, the Versioner, and the Transferer. The relationships among these components are shown in the diagram below:
The File Manager Server contains both a Repository that manages products (and each product's location in the archive, as specified by the Versioner), and a Catalog that validates metadata via the Validation Layer. Transfer of data products from the Client to the Server is the domain of the Transferer and can be initiated at either the Client or the Server.
The critical objects managed by the File Manager include:
Each Product contains one or more References and one Metadata object. Each Product is a member of a single Product Type. The Metadata collected for each Product is defined by a mapping of Product Type->1...* Elements. Each Product Type has an associated Versioner. These relationships are shown in the figure below.
The File Manager has been designed with a number of key capabilities in mind. These capabilities include:
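To make these relationships concrete, the following is a minimal sketch of the object model in Java. All class and field names here are simplified stand-ins chosen for illustration, not the File Manager's actual classes, and the Versioner is represented only by a name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-ins only; these are not the File Manager's real classes.
final class ObjectModelSketch {

    /** A named metadata field that a Product Type expects. */
    static final class Element {
        final String name;
        Element(String name) { this.name = name; }
    }

    /** Policy shared by a family of Products: its Elements and its Versioner. */
    static final class ProductType {
        final String name;
        final List<Element> elements;  // Product Type -> 1..* Elements
        final String versionerName;    // placeholder for the associated Versioner

        ProductType(String name, List<Element> elements, String versionerName) {
            if (elements.isEmpty()) {
                throw new IllegalArgumentException("a Product Type needs at least one Element");
            }
            this.name = name;
            this.elements = Collections.unmodifiableList(new ArrayList<>(elements));
            this.versionerName = versionerName;
        }
    }

    /** Points at one file that belongs to a Product. */
    static final class Reference {
        final String uri;
        Reference(String uri) { this.uri = uri; }
    }

    /** Key/value metadata collected for a single Product. */
    static final class Metadata {
        private final Map<String, String> values = new LinkedHashMap<>();
        void put(String elementName, String value) { values.put(elementName, value); }
        String get(String elementName) { return values.get(elementName); }
    }

    /** One Product Type, one Metadata object, and one or more References. */
    static final class Product {
        final String name;
        final ProductType type;
        final Metadata metadata;
        final List<Reference> references;

        Product(String name, ProductType type, Metadata metadata, List<Reference> references) {
            if (references.isEmpty()) {
                throw new IllegalArgumentException("a Product needs at least one Reference");
            }
            this.name = name;
            this.type = type;
            this.metadata = metadata;
            this.references = Collections.unmodifiableList(new ArrayList<>(references));
        }
    }
}
```

The constructors enforce the two "one or more" constraints stated above; everything else is deliberately left open.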
Easy management of different types of Products. The Repository Manager extension point is responsible for managing Product Types, and their associated information. Management of Product Types includes adding new types, deleting and updating existing types, and retrieving Product Type Objects, by their ID or by their name.
Support for different kinds of back end catalogs. The Catalog extension point allows Product instance metadata and file location information to be stored in different types of back end data stores quite easily. Existing implementations of the Catalog interface include a JDBC based back end database, along with a flat-file index powered by Lucene.
Management of Product instance information. Management includes adding, deleting and updating product instance information, including file locations (References), along with Product Metadata. It also includes retrieving Metadata and References associated with existing Products as well as obtaining the Products themselves.
Element management for Metadata. The File Manager's Validation Layer extension point allows for the management of Element policy information in different types of back end stores. For instance, Element policy could be stored in XML files, a Database, or a Metadata Registry.
Data transfer mechanism interface. By having an extension point for Data Transfer, the File Manager can support different Data Transfer protocols, both local and remote.
Advanced support for File Repository layouts. The Versioner extension point allows for different File Repository layouts based on Product Types.
Support for multiple Product structures. The File Manager Client allows Products to be either Flat or Hierarchical. Flat Products are collections of individual files that are aggregated together to make a Product. Hierarchical Products are Products that contain directories, sub-directories, and files.
Design for scalability. The File Manager uses the popular client-server paradigm, allowing new File Manager servers to be instantiated, as needed, without affecting the File Manager clients, and vice-versa.
Standard communication protocols. The File Manager uses XML-RPC as its main external interface between the File Manager client and server. XML-RPC, the little brother of SOAP, is fast and extensible, and uses HTTP as its underlying transport.
RSS-based Product syndication. The File Manager web interface allows for the RSS-based syndication of Product feeds based on Product Type.
Data transfer status tracking. The File Manager tracks all current Product and File transfers and even publishes an RSS-feed of existing transfers.
This capability set is not exhaustive, and is meant to give the user a feel for what general features are provided by the File Manager. Most likely the user will find that the File Manager provides many other capabilities besides those described here.
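As an illustration of the first capability above (managing Product Types by adding, updating, deleting, and retrieving them by ID or name), here is a minimal in-memory sketch. The interface, its method names, and the `repositoryPath` field are assumptions made for this example; the real Repository Manager extension point may define different signatures.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical names throughout; the real extension point's signatures may differ.
final class RepositoryManagerSketch {

    /** Minimal Product Type policy record (repositoryPath is an assumed field). */
    static final class ProductType {
        final String id;
        final String name;
        final String repositoryPath;

        ProductType(String id, String name, String repositoryPath) {
            this.id = id;
            this.name = name;
            this.repositoryPath = repositoryPath;
        }
    }

    /** The Product Type operations named in the capability list. */
    interface RepositoryManager {
        void add(ProductType type);
        void update(ProductType type);
        void remove(String id);
        Optional<ProductType> findById(String id);
        Optional<ProductType> findByName(String name);
    }

    /** In-memory implementation; another one might read policy from files or a database. */
    static final class InMemoryRepositoryManager implements RepositoryManager {
        private final Map<String, ProductType> typesById = new LinkedHashMap<>();

        @Override
        public void add(ProductType type) {
            if (typesById.containsKey(type.id)) {
                throw new IllegalStateException("duplicate Product Type id: " + type.id);
            }
            typesById.put(type.id, type);
        }

        @Override
        public void update(ProductType type) {
            if (!typesById.containsKey(type.id)) {
                throw new IllegalStateException("unknown Product Type id: " + type.id);
            }
            typesById.put(type.id, type);
        }

        @Override
        public void remove(String id) {
            typesById.remove(id);
        }

        @Override
        public Optional<ProductType> findById(String id) {
            return Optional.ofNullable(typesById.get(id));
        }

        @Override
        public Optional<ProductType> findByName(String name) {
            return typesById.values().stream()
                    .filter(t -> t.name.equals(name))
                    .findFirst();
        }
    }
}
```

Because callers depend only on the `RepositoryManager` interface, the in-memory map could be swapped for another store without changing them.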
We have constructed the File Manager making use of the factory method pattern to provide multiple extension points for the File Manager. An extension point is an interface within the File Manager that can have many implementations. This is particularly useful when it comes to software component configuration because it allows different implementations of an existing interface to be selected at deployment time.
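The idea of selecting an implementation at deployment time can be sketched as follows. This is a simplified illustration of the factory method pattern, not the File Manager's actual code: the `Catalog` and `CatalogFactory` interfaces are reduced to one method each, the two factories are toy implementations, and the property key `sketch.catalog.factory` is a made-up name.

```java
import java.util.Properties;

// Simplified illustration; interface shapes and the property key are assumptions.
final class ExtensionPointSketch {

    /** Hypothetical configuration key naming the factory class to load. */
    static final String FACTORY_PROPERTY = "sketch.catalog.factory";

    /** The extension point: callers see only this interface. */
    public interface Catalog {
        String describe();
    }

    /** Factory method: each implementation knows how to build its own Catalog. */
    public interface CatalogFactory {
        Catalog createCatalog();
    }

    /** Toy factory standing in for a database-backed implementation. */
    public static final class DatabaseCatalogFactory implements CatalogFactory {
        @Override
        public Catalog createCatalog() {
            return () -> "database-backed catalog";
        }
    }

    /** Toy factory standing in for an index-backed implementation. */
    public static final class IndexCatalogFactory implements CatalogFactory {
        @Override
        public Catalog createCatalog() {
            return () -> "index-backed catalog";
        }
    }

    /** Reads the factory class name from configuration and asks it for a Catalog. */
    static Catalog loadCatalog(Properties config) {
        String factoryClass = config.getProperty(FACTORY_PROPERTY);
        if (factoryClass == null) {
            throw new IllegalStateException("missing property: " + FACTORY_PROPERTY);
        }
        try {
            CatalogFactory factory = (CatalogFactory) Class.forName(factoryClass)
                    .getDeclaredConstructor()
                    .newInstance();
            return factory.createCatalog();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load factory " + factoryClass, e);
        }
    }
}
```

In this sketch, switching back ends means changing one property value; the code that calls `loadCatalog` never names a concrete implementation.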
Using extension points, it is fairly simple to support many different types of what are typically referred to as "plug-in architectures." Each of the core extension points for the File Manager is described below:
| Extension Point | Description |
| --- | --- |
| Catalog | The Catalog extension point is responsible for storing all the instance data for Products, Metadata, and for file References. Additionally, the Catalog provides a query capability for Products. |
| Data Transfer | The Data Transfer extension point allows for the movement of a Product to and from the archive managed by the File Manager component. Different protocols for Data Transfer may include local (disk-based) copy, or remote XML-RPC based transfer across networked machines. |
| Repository Manager | The Repository Manager extension point provides a means for managing all of the policy information (i.e., the Product Types and their associated information) for Products managed by the File Manager. |
| Validation Layer | The Validation Layer extension point allows for the querying of element definitions associated with a particular Product Type. The extension point also maps Product Type to Elements. |
| Versioning | The Versioning extension point allows for the definition of different URI generation schemes that define the final resting location of files for a particular Product. |
| System | The extension point that provides the external interface to the File Manager services. This includes the File Manager server interface, as well as the associated File Manager client interface, that communicates with the server. |
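To illustrate what "different URI generation schemes" might mean for the Versioning extension point described above, the sketch below defines two toy schemes behind one interface. The interface, its method signature, and both class names are hypothetical and chosen for this example only.

```java
// Hypothetical sketch; the real Versioner interface may look different.
final class VersionerSketch {

    /** Decides where one of a Product's files finally lives in the archive. */
    interface Versioner {
        String finalLocation(String repositoryRoot, String productName, String fileName);
    }

    /** Scheme 1: every file sits directly under the repository root. */
    static final class FlatVersioner implements Versioner {
        @Override
        public String finalLocation(String repositoryRoot, String productName, String fileName) {
            return stripTrailingSlash(repositoryRoot) + "/" + fileName;
        }
    }

    /** Scheme 2: each Product gets its own directory under the repository root. */
    static final class PerProductVersioner implements Versioner {
        @Override
        public String finalLocation(String repositoryRoot, String productName, String fileName) {
            return stripTrailingSlash(repositoryRoot) + "/" + productName + "/" + fileName;
        }
    }

    /** Avoids a doubled slash when the root already ends with one. */
    static String stripTrailingSlash(String root) {
        return root.endsWith("/") ? root.substring(0, root.length() - 1) : root;
    }
}
```

For example, with a root of `file:///archive/`, a Product named `scan-001`, and a file named `scan-001.dat`, the flat scheme yields `file:///archive/scan-001.dat` while the per-product scheme yields `file:///archive/scan-001/scan-001.dat`. Associating a different scheme with each Product Type is what allows repository layouts to vary by type.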
There are at least two implementations of all of the aforementioned extension points for the File Manager. Each extension point implementation is detailed in this section.
The File Manager was built to support several of the capabilities outlined in Section 3. In particular, there were several use cases that we wanted to support, some of which are described below.
The red numbers in the above Figure correspond to the sequence of steps, and the interactions among the different File Manager extension points, that together perform the file ingestion activity. In Step 1, a File Manager client is invoked for the ingest operation, which sends Metadata and References for a particular Product to the File Manager server’s System Interface extension point. The System Interface uses the information about Product Type policy made available by the Repository Manager in order to understand whether or not the product should be transferred, where its root repository path should be, and so on. The System Interface then catalogs the file References and Metadata using the Catalog extension point. During this catalog process, the Catalog extension point uses the Validation Layer to determine which Elements should be extracted for the particular Product, based upon its Product Type. After that, Data Transfer is initiated at either the client or the server end, and the first step of Data Transfer is to use the Product’s associated Versioner to generate final file References. After final file References have been determined, the file data is transferred by the server or by the client, using the Data Transfer extension point.
The aim of this document is to provide information relevant to developers about the CAS File Manager. Specifically, this document has described the File Manager's architecture, including its constituent components, object model and key capabilities. Additionally, this document provides an overview of the current implementations of the File Manager's extension points.
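The ingestion sequence described above can be sketched in Java as follows. This is an illustration under simplifying assumptions, not the File Manager's implementation: every interface and method name is hypothetical, metadata is a flat string map, References are plain strings, and the sketch always transfers files rather than consulting policy about whether to do so.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins; not the File Manager's real interfaces.
final class IngestFlowSketch {

    interface RepositoryManager { String repositoryPathFor(String productType); }
    interface ValidationLayer { List<String> elementsFor(String productType); }
    interface Versioner { String finalReference(String repositoryPath, String productName, String origRef); }
    interface DataTransfer { void transfer(String from, String to); }

    /** Catalog that keeps only the Elements the Validation Layer lists for the type. */
    static final class Catalog {
        private final ValidationLayer validationLayer;
        final Map<String, Map<String, String>> metadataByProduct = new LinkedHashMap<>();
        final Map<String, List<String>> referencesByProduct = new LinkedHashMap<>();

        Catalog(ValidationLayer validationLayer) { this.validationLayer = validationLayer; }

        void add(String productType, String productName,
                 Map<String, String> metadata, List<String> references) {
            Map<String, String> kept = new LinkedHashMap<>();
            for (String element : validationLayer.elementsFor(productType)) {
                if (metadata.containsKey(element)) {
                    kept.put(element, metadata.get(element));
                }
            }
            metadataByProduct.put(productName, kept);
            referencesByProduct.put(productName, new ArrayList<>(references));
        }
    }

    /** Server-side entry point that coordinates the other extension points. */
    static final class SystemInterface {
        private final RepositoryManager repositoryManager;
        private final Catalog catalog;
        private final Versioner versioner;
        private final DataTransfer dataTransfer;

        SystemInterface(RepositoryManager repositoryManager, Catalog catalog,
                        Versioner versioner, DataTransfer dataTransfer) {
            this.repositoryManager = repositoryManager;
            this.catalog = catalog;
            this.versioner = versioner;
            this.dataTransfer = dataTransfer;
        }

        /** Receives Metadata and References from the client and returns the final References. */
        List<String> ingest(String productType, String productName,
                            Map<String, String> metadata, List<String> origRefs) {
            // Consult Product Type policy for the root repository path.
            String repositoryPath = repositoryManager.repositoryPathFor(productType);

            // Catalog References and Metadata; the Catalog applies the Validation Layer.
            catalog.add(productType, productName, metadata, origRefs);

            // First step of Data Transfer: the Versioner generates final References.
            List<String> finalRefs = new ArrayList<>();
            for (String ref : origRefs) {
                finalRefs.add(versioner.finalReference(repositoryPath, productName, ref));
            }

            // Move the file data using the Data Transfer extension point.
            for (int i = 0; i < origRefs.size(); i++) {
                dataTransfer.transfer(origRefs.get(i), finalRefs.get(i));
            }
            return finalRefs;
        }
    }
}
```

Each collaborator is passed in through the constructor, so any of the four can be replaced independently, which mirrors how the extension points are meant to be swapped.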
In the Basic User Guide and Advanced User Guide, we will cover topics like installation, configuration, and example uses as well as advanced topics like scaling and other tips and tricks.