Glossary

dag

A directed-acyclic-graph, or in other words a ‘pipeline’ of a number of task instances that are executed in a specific order.

task

A single step as part of a dag. The task contains the actual logic that is being executed throughout that step.

schema

A class that define the structure of your data.

run

A ‘run’ refers to a single execution of a dag. The run will be given its own run_id and run_folder to version its output from other runs.

field

An attribute on a schema; a given field usually maps directly to a single dataframe column.

project

A Python package – i.e. a directory of code – that contains all the settings for an instance of Gluepy. This would include data configuration, Gluepy-specific options and application-specific settings.

module

A Python package – i.e. a directory of code – that contains a set of DAGs, Commands, Tasks or other Gluepy components. Modules can be project specific, or generic and reusable across multiple projects.