This repository has been archived by the owner on Nov 8, 2022. It is now read-only.

Adds docs related to distributed workflow
IRCody committed Jun 10, 2016
1 parent 32eac42 commit 9f64f29
Showing 2 changed files with 67 additions and 0 deletions.
35 changes: 35 additions & 0 deletions docs/DISTRIBUTED_WORKFLOW_ARCHITECTURE.md
@@ -0,0 +1,35 @@
# Distributed Workflow

A distributed workflow is a workflow where one or more steps have a remote target specified. An example of this is:

```yaml
---
collect:
  metrics:
    /intel/mock/foo: {}
    /intel/mock/bar: {}
    /intel/mock/*/baz: {}
  config:
    /intel/mock:
      user: "root"
      password: "secret"
  process:
    -
      plugin_name: "passthru"
      target: "127.0.0.1:8082"
      publish:
        -
          plugin_name: "file"
          target: "127.0.0.1:8082"
          config:
            file: "/tmp/published"
```

## Architecture

Distributed workflow is accomplished by allowing remote targets to be specified as part of a task workflow. This relies on a running gRPC server that can handle the actions the scheduler needs to run a task. These actions are defined by the [managesMetrics](https://github.com/intelsdi-x/snap/blob/distributed-workflow/scheduler/scheduler.go) interface in scheduler/scheduler.go, which is implemented by both pluginControl in control/control.go and ControlProxy in grpc/controlproxy/controlproxy.go. As a result, the scheduler does not need to know where a step in the workflow runs. On task creation, the workflow is walked and the appropriate implementation is selected or created for each step.
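
The selection step described above can be sketched in Go as follows. This is a minimal illustration, not snap's actual code: `metricManager`, `localControl`, `remoteProxy`, `step`, and `managerFor` are hypothetical names, the interface is reduced to a single method, and the remote implementation only tags its output instead of making a gRPC call.

```go
package main

// metricManager is a hypothetical, reduced stand-in for the scheduler's
// managesMetrics interface; the real interface has a different method set.
type metricManager interface {
	Process(pluginName string, data []byte) ([]byte, error)
}

// localControl stands in for pluginControl: the step runs in-process.
type localControl struct{}

func (localControl) Process(pluginName string, data []byte) ([]byte, error) {
	return append([]byte("local:"+pluginName+":"), data...), nil
}

// remoteProxy stands in for ControlProxy. A real proxy would forward the
// call over gRPC to addr; this one only records where it would go.
type remoteProxy struct{ addr string }

func (r remoteProxy) Process(pluginName string, data []byte) ([]byte, error) {
	return append([]byte("remote("+r.addr+"):"+pluginName+":"), data...), nil
}

// step is one process or publish node in the workflow (hypothetical type).
type step struct {
	PluginName string
	Target     string // empty means the step runs locally
}

// managerFor selects the local implementation when no target is set, and
// otherwise reuses or creates a proxy for the step's target.
func managerFor(s step, local metricManager, proxies map[string]metricManager) metricManager {
	if s.Target == "" {
		return local
	}
	if p, ok := proxies[s.Target]; ok {
		return p
	}
	p := remoteProxy{addr: s.Target}
	proxies[s.Target] = p
	return p
}
```

Because both types satisfy the same interface, the caller invokes `Process` the same way whether the step is local or remote. Keeping one proxy per target is a design choice of this sketch, not something the source specifies.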

## Performance considerations

The main performance penalty for using remote targets is that data is sent over the network instead of being passed locally. The penalty is limited because snap only makes remote calls for workflow steps that specify a remote target.
32 changes: 32 additions & 0 deletions docs/TASKS.md
@@ -62,6 +62,38 @@ For more on tasks, visit [`SNAPCTL.md`](SNAPCTL.md).

The workflow is a [DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph) which describes the how and what of a task. It is always rooted by a `collect`, and then contains any number of `process`es and `publish`es.

#### Remote Targets

Process and Publish nodes in the workflow can also target remote snap nodes via the `target` key. This allows resource-intensive workflow steps to be offloaded from the node where data collection is occurring. Modifying the example above we have:

```yaml
---
collect:
  metrics:
    /intel/mock/foo: {}
    /intel/mock/bar: {}
    /intel/mock/*/baz: {}
  config:
    /intel/mock:
      user: "root"
      password: "secret"
  process:
    -
      plugin_name: "passthru"
      target: "127.0.0.1:8082"
      publish:
        -
          plugin_name: "file"
          target: "127.0.0.1:8082"
          config:
            file: "/tmp/published"
```

If a target is specified for a step in the workflow, that step is executed on the remote instance identified by the ip:port target. Each node in the workflow is evaluated independently, so a workflow can have any, all, or none of its steps run remotely (if the `target` key is omitted, that step runs locally). The target's IP and port must be those of a running control-grpc server. They can be set on snapd via the `control-listen-addr` and `control-listen-port` flags. By default, the server uses the same IP as the snap REST API and port 8082.
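
As a rough illustration of the target rule above, the following Go sketch treats an empty target as local and otherwise checks that the value has an ip:port shape. `resolveTarget` is a hypothetical helper written for this example. It is not part of snap, and snap's own validation may differ.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// resolveTarget is a hypothetical helper. It reports whether a workflow step
// should run remotely and, if so, returns the normalized host:port address.
// An empty target means the step runs locally.
func resolveTarget(target string) (addr string, remote bool, err error) {
	if target == "" {
		return "", false, nil
	}
	host, portStr, err := net.SplitHostPort(target)
	if err != nil {
		return "", false, fmt.Errorf("invalid target %q: %v", target, err)
	}
	port, err := strconv.Atoi(portStr)
	if err != nil || port < 1 || port > 65535 {
		return "", false, fmt.Errorf("invalid port in target %q", target)
	}
	return net.JoinHostPort(host, strconv.Itoa(port)), true, nil
}
```

With this helper, a step with no `target` is reported as local, `"127.0.0.1:8082"` is reported as remote with the same address, and a value without a port returns an error.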

An example JSON task that uses remote targets can be found under [examples](https://github.com/intelsdi-x/snap/blob/distributed-workflow/examples/tasks/distributed-mock-file.json). More information about the architecture behind remote targets can be found [here](DISTRIBUTED_WORKFLOW_ARCHITECTURE.md).

#### collect

The collect section describes which metrics to collect. Metrics can be enumerated explicitly via: