This project has retired. For details please refer to its Attic page.
cas-pge – CAS PGE Basic Developer Guide

Introduction

This is the developer guide for the Apache OODT Catalog and Archive Service (CAS) Product Generation Executable (PGE) component, or CAS-PGE for short. This guide explains the CAS-PGE architecture as well as its tailorable extension points.

The remainder of this guide is separated into the following sections:

Project Description

To fully understand the CAS-PGE component, it helps to have a solid grasp of the CAS Workflow component. If you need background on CAS Workflow, please see our CAS Workflow Developer Guide. CAS Workflow is often used as part of a data processing system in which workflows control the run order of different Product Generation Executables (PGEs). In such a system, CAS-PGE can help wrap a PGE as part of a CAS Workflow. One can think of a PGE as a piece of code that, given a set of inputs, generates output files. CAS-PGE is therefore designed to handle the most common actions required to run a PGE: finding its input files, executing it, and saving its output files. CAS-PGE performs some of these actions by interacting with a second CAS component, the CAS File Manager. In this type of workflow-based data processing system, the CAS File Manager manages data files and supports metadata-filtering queries across those files for fast retrieval. In other words, CAS File Manager complements CAS-PGE by cataloging the files involved in PGE operations.
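
The three actions named above (find inputs, execute the PGE, save outputs) can be sketched in a few lines. This is a minimal, language-neutral illustration in Python, not CAS-PGE code: the dictionary standing in for the CAS File Manager catalog, the "ProductType" and "Inputs" keys, and the functions find_inputs, toy_pge, and run_wrapped are all hypothetical names chosen for this example.

```python
from pathlib import Path


def find_inputs(catalog, product_type):
    """Metadata-filtered query: return cataloged paths of the given type."""
    return sorted(
        path for path, met in catalog.items()
        if met.get("ProductType") == product_type
    )


def toy_pge(input_paths, output_dir):
    """Stand-in PGE: given input files, generate one output file.

    The output holds the total number of lines across all inputs.
    """
    total = sum(len(Path(p).read_text().splitlines()) for p in input_paths)
    out = Path(output_dir) / "line_count.txt"
    out.write_text(f"{total}\n")
    return [out]


def run_wrapped(catalog, product_type, output_dir):
    """Wrap the PGE: find its inputs, execute it, catalog its outputs."""
    inputs = find_inputs(catalog, product_type)   # 1. find input files
    outputs = toy_pge(inputs, output_dir)         # 2. execute the PGE
    for out in outputs:                           # 3. save the output files
        catalog[str(out)] = {"ProductType": "LineCount", "Inputs": inputs}
    return outputs
```

The point of the sketch is the division of labor: the wrapper knows nothing about what the PGE computes, and the PGE knows nothing about the catalog.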

In summary, CAS-PGE's role is to provide tools for encapsulating PGEs, and it leverages other CAS components to support that goal.

Architecture

[TBD]

Extension Points

A PGE usually needs to be told how to run, what to run with (i.e., its input files), where to place its output files, and what to name them. CAS-PGE handles this, and other tasks, through customizable extension points.

The following are descriptions of the most common extension points:

  • SciPgeConfigFileWriter - writes configuration files describing how a PGE will run, which input files it will run with, and where the output will be placed
  • PcsMetFileWriter - controls which metadata should be sent to the CAS File Manager (with each output file) for ingestion
  • PGETaskInstance - an extensible module which performs the most generic and common actions required by typical PGEs. This module makes getting started with a default PGE configuration simple.
  • PgeConfigBuilder - builds a PgeConfig object, which has the ability to control how a CAS-PGE will run
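
To make the first two roles in the list concrete, here is a minimal sketch of what a configuration-file writer and a metadata writer each take in and produce. It is written in Python for brevity and does not reproduce the OODT Java interfaces: the class names ConfigFileWriter, MetFileWriter, KeyValueConfigWriter, and FilenameMetWriter, their method signatures, and the "Filename" and "RunId" keys are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class ConfigFileWriter(ABC):
    """Role of SciPgeConfigFileWriter: write a file telling the PGE how to run."""

    @abstractmethod
    def write(self, config_path, metadata):
        ...


class MetFileWriter(ABC):
    """Role of PcsMetFileWriter: choose the metadata ingested with an output."""

    @abstractmethod
    def metadata_for(self, output_file, run_metadata):
        ...


class KeyValueConfigWriter(ConfigFileWriter):
    """Hypothetical writer: one 'key=value' line per metadata field."""

    def write(self, config_path, metadata):
        lines = [f"{key}={metadata[key]}" for key in sorted(metadata)]
        path = Path(config_path)
        path.write_text("\n".join(lines) + "\n")
        return path


class FilenameMetWriter(MetFileWriter):
    """Hypothetical writer: send only the file name and run id for ingestion."""

    def metadata_for(self, output_file, run_metadata):
        return {
            "Filename": Path(output_file).name,
            "RunId": run_metadata["RunId"],
        }
```

A different PGE would swap in its own subclasses (for example, a writer that emits XML instead of key=value lines) without changing the code that calls them, which is what makes these extension points.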

The relationship between these extension points and other CAS-PGE components is shown in the figure below.

[Figure: Extension Points]

Runtime Execution

At runtime, CAS-PGE draws on two sources to configure how a PGE will run: metadata and a PgeConfig object. From these two pieces of information, CAS-PGE determines how many configuration files to generate, which SciPgeConfigFileWriter(s) to use to create them, which PcsMetFileWriter generates the metadata for each output file for CAS File Manager ingestion, how to run the PGE, which CAS File Manager to talk to, and so on. For the first source (metadata), CAS-PGE expects a set of reserved metadata fields that affect the way it runs (e.g., which CAS File Manager to ingest to). For the second source (PgeConfig), the PgeConfigBuilder builds up a PgeConfig object, which can also control how CAS-PGE runs.
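
The interplay of the two sources can be sketched as follows. This is an illustrative Python model only, under stated assumptions: ToyPgeConfig, build_config, plan_run, the writer names, and the "Toy/..." metadata keys are hypothetical and are not the reserved field names or classes that CAS-PGE actually defines.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch

# Hypothetical reserved metadata fields (not the real CAS-PGE key names).
FILE_MANAGER_URL_KEY = "Toy/FileManagerUrl"
RUN_DIR_KEY = "Toy/RunDir"


@dataclass
class ToyPgeConfig:
    """Stand-in for a PgeConfig object."""
    command: str
    config_files: dict = field(default_factory=dict)  # config path -> writer name
    met_writers: dict = field(default_factory=dict)   # output glob -> met writer name


def build_config(metadata):
    """Role of PgeConfigBuilder: build the config object for one run."""
    run_dir = metadata[RUN_DIR_KEY]
    return ToyPgeConfig(
        command=f"{run_dir}/run_pge.sh {run_dir}/pge.cfg",
        config_files={f"{run_dir}/pge.cfg": "key_value_writer"},
        met_writers={"*.dat": "filename_met_writer"},
    )


def plan_run(metadata, config, output_files):
    """Combine both sources into the decisions listed in the text above."""
    return {
        # Decided by a reserved metadata field.
        "ingest_to": metadata[FILE_MANAGER_URL_KEY],
        # Decided by the config object.
        "write_configs": sorted(config.config_files.items()),
        "execute": config.command,
        "met_writer_for": {
            out: writer
            for out in output_files
            for pattern, writer in config.met_writers.items()
            if fnmatch(out, pattern)
        },
    }
```

In this sketch, an output file that matches no pattern in met_writers simply gets no metadata writer; how the real component treats such files is not covered by this guide.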