This is the developer guide for the OODT Catalog and Archive Service (CAS) Workflow Manager component, or Workflow Manager for short. Primarily, this guide will explain the Workflow Manager architecture and interfaces, including its tailorable extension points. For information on installation, configuration, and examples, please see our User Guides.
The remainder of this guide is separated into the following sections:
The Workflow Manager component is responsible for the description, execution, and monitoring of Workflows, using a client-server system. Workflows are typically sequences of tasks, joined together by control flow and data flow, that must execute in some ordered fashion. Workflows typically generate output data, perform routine management tasks (such as sending email), or describe a business's internal routine practices. The Workflow Manager is an extensible software component that provides an XML-RPC external interface and a fully tailorable Java-based API for workflow management.
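Because the external interface is XML-RPC, a client request is simply an XML `<methodCall>` document sent over HTTP. The sketch below builds such a body by hand only to show the wire format. The method name `workflowmgr.handleEvent` and its event-name and metadata parameters are illustrative assumptions, not a documented Workflow Manager signature, and a real client would use an XML-RPC library rather than string building.

```java
import java.util.Map;

// Hypothetical sketch of an XML-RPC <methodCall> body that a client might POST over HTTP.
// The method name and parameter layout used with it are assumptions for illustration.
final class XmlRpcRequest {
    // Escape the characters that would otherwise break the XML document.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds a call with two params: an event name (string) and metadata (struct of strings).
    static String methodCall(String method, String eventName, Map<String, String> metadata) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\"?>\n<methodCall>\n")
           .append("  <methodName>").append(escape(method)).append("</methodName>\n")
           .append("  <params>\n")
           .append("    <param><value><string>").append(escape(eventName))
           .append("</string></value></param>\n")
           .append("    <param><value><struct>\n");
        metadata.forEach((key, value) -> xml.append("      <member><name>").append(escape(key))
                .append("</name><value><string>").append(escape(value))
                .append("</string></value></member>\n"));
        xml.append("    </struct></value></param>\n  </params>\n</methodCall>\n");
        return xml.toString();
    }
}
```

Each metadata entry becomes an XML-RPC `struct` member, which is how a key/value context can travel alongside the event name in a single call.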
In this section, we will describe the architecture of the Workflow Manager, including its constituent components, object model, and key capabilities.
The major components of the Workflow Manager are the Client and Server, the Workflow Repository, the Workflow Engine, and the Workflow Instance Repository. The relationships between these components are shown in the diagram below:
The Workflow Manager Server contains both a Workflow Repository that manages workflow models and a Workflow Engine that processes workflow instances. The Workflow Engine also has a persistence layer, the Workflow Instance Repository, which is responsible for saving workflow instance metadata and state.
The critical objects managed by the Workflow Manager include:
Each Event kicks off one or more Workflow Instances and provides them with a Metadata context (submitted by an external user). Each Workflow Instance is a run-time execution model of a Workflow. Each Workflow contains one or more Workflow Tasks. Each Workflow Task contains a single Workflow Task Configuration and one or more Workflow Conditions. Each Workflow Task models a corresponding Workflow Task Instance, and likewise each Workflow Condition models a corresponding Workflow Condition Instance. These relationships are shown in the figure below.
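A minimal sketch of this object model as Java records, assuming plain `Map`s for Metadata and task configuration. These hypothetical types only mirror the relationships described above (Event to Instance, Workflow to Tasks, Task to Configuration and Conditions); they are not the actual OODT classes, and the `*InstanceClass` string fields are an assumed way of pointing a model at its executable counterpart.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified object model; names mirror the prose, not the OODT API.
// Metadata is modeled as a multi-valued key/value map.
record Event(String name, Map<String, List<String>> metadata) {}

record WorkflowTaskConfiguration(Map<String, String> properties) {}

// Names the (hypothetical) Workflow Condition Instance class that this condition models.
record WorkflowCondition(String id, String name, String conditionInstanceClass) {}

// A task has exactly one configuration, its conditions, and names the
// (hypothetical) Workflow Task Instance class that it models.
record WorkflowTask(String id, String name, WorkflowTaskConfiguration configuration,
                    List<WorkflowCondition> conditions, String taskInstanceClass) {}

// A workflow contains one or more tasks.
record Workflow(String id, String name, List<WorkflowTask> tasks) {
    Workflow {
        if (tasks.isEmpty()) {
            throw new IllegalArgumentException("a Workflow needs at least one task");
        }
        tasks = List.copyOf(tasks);
    }
}

// Run-time execution of a Workflow, seeded with the Metadata carried by the Event.
record WorkflowInstance(String id, Workflow workflow, Map<String, List<String>> sharedContext) {
    static WorkflowInstance fromEvent(String id, Workflow workflow, Event event) {
        return new WorkflowInstance(id, workflow, Map.copyOf(event.metadata()));
    }
}
```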
The Workflow Manager provides the key capabilities needed to manage processing pipelines, data flow, and control flow. Each high-level capability is detailed below:
Explicit Modeling. The Workflow Manager captures both identified workflow patterns (control flow) and the data flow between Workflow Task Instances. Workflows are directed graphs, allowing for true parallelism.
Persistence. Support for persisting Workflow Instances to several backend repositories, including relational databases and Apache Lucene flat-file indices.
Standard Representations. The Workflow Manager represents Workflow models as XML documents.
Scalability. The Workflow Manager uses the popular client-server paradigm, allowing new Workflow Manager servers to be instantiated, as needed, without affecting the Workflow Manager clients, and vice-versa.
Standard communication protocols. The Workflow Manager uses XML-RPC as its main external interface between the Workflow Manager client and server. XML-RPC, the little brother of SOAP, is fast, extensible, and uses the underlying HTTP protocol for data transfer.
Event-Driven Execution. Workflows are triggered by events that can include arbitrary Metadata parameters, provided as a shared context between stages of the executing Workflow.
This capability set is not exhaustive, and is meant to give the user a feel for what general features are provided by the Workflow Manager. Most likely the user will find that the Workflow Manager provides many other capabilities besides those described here.
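The Explicit Modeling capability treats a workflow as a directed graph, so tasks with no dependency path between them can run at the same time. As an illustration only, the sketch below groups the tasks of an acyclic graph into "waves" using Kahn's topological sort: every task in a wave has all its predecessors in earlier waves, so a wave's tasks could run in parallel. This is a generic technique swapped in for clarity, not the Workflow Manager's own scheduler, and `WorkflowGraph` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical control-flow graph; groups tasks into waves via Kahn's topological sort.
final class WorkflowGraph {
    private final Map<String, List<String>> successors = new LinkedHashMap<>();
    private final Map<String, Integer> inDegree = new LinkedHashMap<>();

    void addTask(String task) {
        successors.putIfAbsent(task, new ArrayList<>());
        inDegree.putIfAbsent(task, 0);
    }

    // Edge meaning: 'to' may start only after 'from' has finished.
    void addEdge(String from, String to) {
        addTask(from);
        addTask(to);
        successors.get(from).add(to);
        inDegree.merge(to, 1, Integer::sum);
    }

    // Returns waves of tasks; tasks inside one wave do not depend on each other.
    List<List<String>> waves() {
        Map<String, Integer> remaining = new LinkedHashMap<>(inDegree);
        List<String> current = new ArrayList<>();
        remaining.forEach((task, degree) -> { if (degree == 0) current.add(task); });
        List<List<String>> result = new ArrayList<>();
        int scheduled = 0;
        List<String> wave = current;
        while (!wave.isEmpty()) {
            result.add(wave);
            scheduled += wave.size();
            List<String> next = new ArrayList<>();
            for (String task : wave) {
                for (String succ : successors.get(task)) {
                    // A successor becomes ready once its last predecessor is scheduled.
                    if (remaining.merge(succ, -1, Integer::sum) == 0) next.add(succ);
                }
            }
            wave = next;
        }
        if (scheduled != remaining.size()) {
            throw new IllegalStateException("workflow graph contains a cycle");
        }
        return result;
    }
}
```

For a diamond-shaped workflow (extract, then transform and validate, then publish), `waves()` returns `[[extract], [transform, validate], [publish]]`; the middle wave is the parallelism that the graph makes explicit.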
We have built the Workflow Manager using the factory method pattern to provide multiple extension points. An extension point is an interface within the Workflow Manager that can have many implementations. This is particularly useful for software component configuration because it allows different implementations of an existing interface to be selected at deployment time.
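A minimal sketch of how a factory-method extension point can be selected at deployment time: the configuration names a factory class, which is loaded reflectively and asked to create the concrete implementation. The `InstanceRepository` interface, the two factories, and the property key `example.instanceRep.factory` are hypothetical stand-ins; for the Workflow Manager's actual configuration, see the User Guides.

```java
import java.util.Properties;

// Hypothetical extension point (not the real OODT interface).
interface InstanceRepository {
    String backend();
}

// Factory method: each factory decides which concrete repository to create.
interface InstanceRepositoryFactory {
    InstanceRepository createInstanceRepository();
}

final class LuceneRepositoryFactory implements InstanceRepositoryFactory {
    @Override
    public InstanceRepository createInstanceRepository() {
        return () -> "lucene"; // stand-in for a Lucene-backed repository
    }
}

final class DataSourceRepositoryFactory implements InstanceRepositoryFactory {
    @Override
    public InstanceRepository createInstanceRepository() {
        return () -> "jdbc"; // stand-in for a relational-database repository
    }
}

final class ExtensionPoints {
    // Hypothetical configuration key used only in this sketch.
    static final String INSTANCE_REP_FACTORY = "example.instanceRep.factory";

    // Loads the factory class named in the configuration (default: Lucene) and asks it
    // for a repository, so the implementation is chosen by configuration, not by code.
    static InstanceRepository loadInstanceRepository(Properties config) {
        String factoryClass = config.getProperty(INSTANCE_REP_FACTORY,
                LuceneRepositoryFactory.class.getName());
        try {
            InstanceRepositoryFactory factory = (InstanceRepositoryFactory)
                    Class.forName(factoryClass).getDeclaredConstructor().newInstance();
            return factory.createInstanceRepository();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("cannot load factory " + factoryClass, e);
        }
    }
}
```

Swapping backends then means editing one property, with no change to the code that consumes `InstanceRepository`.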
Using extension points, it is fairly simple to support many different types of what are typically referred to as "plug-in architectures." Each of the core extension points for the Workflow Manager is described below:
| Extension Point | Description |
| --- | --- |
| Workflow Instance Repository | The Workflow Instance Repository extension point is responsible for storing all the instance data for Workflow Instances, including the shared context metadata and runtime properties such as the instance start and end date/time and each task's start and end date/time. |
| Workflow Repository | The Workflow Repository extension point is responsible for managing Workflow models, storing their control flow and their Workflow Tasks, which model data flow. The Workflow Repository also stores Workflow Condition information and Workflow Task Configurations. In essence, the Workflow Repository is a repository of abstract Workflow models that the Workflow Engine extension point turns into Workflow Instances. |
| Workflow Engine | The Workflow Engine's responsibility is to turn abstract Workflow models into executing Workflow Instances. The Workflow Engine tracks and monitors the execution of Workflow Instances, and provides the ability to start, stop, and pause executing Workflow Instances. |
| System | The extension point that provides the external interface to the Workflow Manager services. This includes the Workflow Manager server interface, as well as the associated Workflow Manager client interface that communicates with the server. |
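To make the Workflow Instance Repository row concrete, here is a hedged sketch of such an extension point together with the simplest possible implementation, an in-memory map. The record fields follow the table (shared context metadata, instance start/end times, per-task start/end times); the names `InstanceRecord`, `TaskTimes`, `WorkflowInstanceRepository`, and `InMemoryInstanceRepository` are hypothetical, and a deployment would plug in a database- or Lucene-backed implementation instead.

```java
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-task timing; a null end means the task is still running.
record TaskTimes(Instant start, Instant end) {}

// Hypothetical snapshot of the instance data the repository persists.
record InstanceRecord(String instanceId, String workflowId, String status,
                      Map<String, List<String>> sharedContext,
                      Instant startTime, Instant endTime,
                      Map<String, TaskTimes> taskTimes) {}

// Hypothetical extension point interface for instance persistence.
interface WorkflowInstanceRepository {
    void save(InstanceRecord record);                 // insert or overwrite by instance id
    Optional<InstanceRecord> find(String instanceId);
    List<InstanceRecord> findByStatus(String status);
}

// Non-persistent implementation, useful for tests; all data is lost on restart.
final class InMemoryInstanceRepository implements WorkflowInstanceRepository {
    private final Map<String, InstanceRecord> records = new ConcurrentHashMap<>();

    @Override
    public void save(InstanceRecord record) {
        records.put(record.instanceId(), record);
    }

    @Override
    public Optional<InstanceRecord> find(String instanceId) {
        return Optional.ofNullable(records.get(instanceId));
    }

    @Override
    public List<InstanceRecord> findByStatus(String status) {
        return records.values().stream()
                .filter(r -> r.status().equals(status))
                .toList();
    }
}
```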
There are at least two implementations of each of the aforementioned extension points, with the exception of the Workflow Engine, whose ThreadPoolWorkflowEngine implementation is itself meant to serve as an extension point. Each extension point implementation is detailed below:
The Workflow Manager was built to support several of the above capabilities. In particular, there were several use cases that we wanted to support, some of which are described below.
The black numbers in the figure above correspond to the sequence of steps, and the interactions between the different Workflow Manager extension points, involved in performing the workflow execution activity. In Step 1, an event is provided to the Workflow Manager event listener (the System extension point), along with the required Metadata. In step 2, the Workflow Manager looks up whether any Workflow Repository models are associated with the provided Event. If so, in steps 3 and 4, the returned Workflow models are sent to the WorkflowEngine to be turned into executable Workflow Instances. In steps 5 and 6, each WorkflowInstance is handed off to a WorkflowProcessorThread taken from the ThreadPoolWorkflowEngine. In step 7, the WorkflowProcessorThread steps through each executable WorkflowTask, checking that all necessary Workflow Conditions (if any) are satisfied. If they are, the Workflow Task is executed either locally or, if a Resource Manager is defined, it is sent (in step 7) to the Resource Manager (labeled Process Manager in the figure). In steps 8-13, the WorkflowTask is executed on remote resources using the Resource Manager and eventually completes, with the final notification being sent back to the corresponding WorkflowProcessorThread, which steps through the Workflow Instance and controls its execution.
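The steps above can be condensed into a small, single-process sketch: an event name is looked up in a map standing in for the Workflow Repository (step 2), each matching workflow becomes an instance submitted to a fixed thread pool standing in for the ThreadPoolWorkflowEngine (steps 3 to 6), and each pool thread walks the tasks in order, checking their conditions against the shared metadata before running them (step 7). All names here are hypothetical. Tasks only run locally (the Resource Manager path of steps 8 to 13 is omitted), and an unsatisfied condition simply stops the instance, whereas a fuller engine might wait and re-check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical task: its conditions and body both see the shared metadata context.
record Task(String name, List<Predicate<Map<String, String>>> conditions,
            Consumer<Map<String, String>> body) {}

record Workflow(String name, List<Task> tasks) {}

final class MiniWorkflowManager {
    // Step 2: event name -> workflow models (stands in for the Workflow Repository).
    private final Map<String, List<Workflow>> workflowsByEvent = new ConcurrentHashMap<>();
    // Steps 5-6: a fixed-size pool stands in for the ThreadPoolWorkflowEngine.
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    void register(String event, Workflow workflow) {
        workflowsByEvent.computeIfAbsent(event, e -> new CopyOnWriteArrayList<>()).add(workflow);
    }

    // Step 1: an event arrives with metadata; one instance is started per matching workflow.
    List<Future<Map<String, String>>> handleEvent(String event, Map<String, String> metadata) {
        List<Future<Map<String, String>>> instances = new ArrayList<>();
        for (Workflow workflow : workflowsByEvent.getOrDefault(event, List.of())) {
            // Each instance gets its own copy of the event's metadata as shared context.
            Map<String, String> context = new ConcurrentHashMap<>(metadata);
            instances.add(pool.submit(() -> process(workflow, context)));
        }
        return instances;
    }

    // Step 7: step through the tasks in order, checking conditions before each one.
    private static Map<String, String> process(Workflow workflow, Map<String, String> context) {
        for (Task task : workflow.tasks()) {
            boolean satisfied = task.conditions().stream().allMatch(c -> c.test(context));
            if (!satisfied) {
                context.put("Status", "BLOCKED:" + task.name());
                return context;
            }
            task.body().accept(context); // local execution only
        }
        context.put("Status", "FINISHED");
        return context;
    }

    // Pool threads are non-daemon, so callers must shut the pool down when done.
    void shutdown() {
        pool.shutdown();
    }
}
```

Because every instance receives its own copy of the event's metadata, tasks within one instance share context while separate instances stay isolated.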
The aim of this document is to provide developers with information about the CAS Workflow Manager. Specifically, this document has described the Workflow Manager's architecture, including its constituent components, object model, and key capabilities. Additionally, this document provides an overview of the current implementations of the Workflow Manager's extension points.
In the Basic User Guide and Advanced User Guide, we cover topics like installation, configuration, and example uses, as well as advanced topics like scaling and other tips and tricks.