Settings
Gluepy has two types of configuration. The Core Settings contain the “Project” configuration, such as connection details, logging configuration, and installed modules. The Context Configuration refers to the specific model parameters of a DAG execution, along with its run_id, run_folder, and so on.
Core Settings
The core settings are the project settings that are automatically loaded for any execution started through the manage.py Gluepy Commands.
These settings can be environment specific, and you select which settings module to load using the GLUEPY_SETTINGS_MODULE environment variable.
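As a minimal sketch, the variable can be set from Python before any Gluepy command runs. The module path `myproject.settings.production` is a hypothetical name used for illustration, not something Gluepy defines.

```python
import os

# Hypothetical dotted path to an environment-specific settings module.
SETTINGS_MODULE = "myproject.settings.production"

# setdefault() keeps a value that is already exported in the shell, so the
# choice can still be overridden without editing code.
os.environ.setdefault("GLUEPY_SETTINGS_MODULE", SETTINGS_MODULE)

selected = os.environ["GLUEPY_SETTINGS_MODULE"]
print(f"Using settings module: {selected}")
```

Exporting the variable in the shell or in a deployment manifest achieves the same result; the Python form is convenient inside an entry-point script.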
BASE_DIR
Default: os.path.dirname(os.path.dirname(__file__)) (path to root of project)
A string that represents the absolute path to the root of the project, used by other settings and configuration to construct paths.
CONFIG_PATH
Default: os.path.join(BASE_DIR, "configs") (path to configs folder of project)
A string that represents the absolute path to the configs folder of the project, where the YAML files that populate your default Context reside.
INSTALLED_MODULES
Default: [] (empty list of strings)
A list of strings, each the dotted import path of a Gluepy module that you want to enable as part of the project. Each path must be importable from your system path.
See more at Modules.
STORAGE_ROOT
Default: os.path.join(BASE_DIR, "data") (file path to data directory)
The path to the root where all data assets are located. This can be an absolute local path when using the LocalStorage STORAGE_BACKEND, or a relative path when using blob storage backends such as S3Storage.
STORAGE_BACKEND
Default: "gluepy.files.storages.local.LocalStorage" (dotted string to LocalStorage)
Dotted path to the Storage Backends class that is loaded and later used by the default_storage object throughout the application.
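Several settings take a dotted path to a class. The sketch below shows one common way such a string can be resolved with the standard library. It illustrates the general technique only and is not taken from Gluepy's source; the helper name `import_from_dotted_path` is hypothetical.

```python
from importlib import import_module


def import_from_dotted_path(dotted_path: str):
    """Return the attribute named by a 'package.module.Name' string."""
    module_path, _, attr_name = dotted_path.rpartition(".")
    if not module_path:
        raise ImportError(f"{dotted_path!r} is not a dotted path")
    module = import_module(module_path)
    try:
        return getattr(module, attr_name)
    except AttributeError as exc:
        raise ImportError(
            f"Module {module_path!r} has no attribute {attr_name!r}"
        ) from exc


# Demonstrated with a standard-library class so the sketch runs without Gluepy.
ordered_dict_cls = import_from_dotted_path("collections.OrderedDict")
print(ordered_dict_cls.__name__)
```

A practical consequence of this pattern is that a misspelled path typically surfaces as an import error when the backend is first loaded, so the module named in the setting must be importable from your system path.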
DATA_BACKEND
Default: "gluepy.files.data.PandasDataManager" (dotted string to PandasDataManager)
Dotted path to the Data Backends class that is loaded and later used by the data_manager object throughout the application.
CONTEXT_BACKEND
Default: "gluepy.conf.context.DefaultContextManager" (dotted string to DefaultContextManager)
Dotted path to the Context Configuration manager class that is loaded and later used to populate the default_context object throughout the application.
START_TASK
Default: "gluepy.exec.tasks.BootstrapTask" (dotted string to BootstrapTask)
Dotted path to a Task that is injected at the beginning of every DAG executed in the project. This is usually helpful for providing a standard set of diagnostic metadata about the execution.
LOGGING
Default: {} (empty dictionary)
A dictionary in logging.config.dictConfig format that is applied for any command executed through the Gluepy Commands.
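As an illustration, a LOGGING value could look like the sketch below. The formatter and handler names ("simple", "console") and the logger name "forecaster" are arbitrary labels chosen for this example. The dictionary is applied by hand here only to show that it is a valid dictConfig schema.

```python
import logging
import logging.config

# Example value for the LOGGING setting.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {"format": "%(asctime)s %(levelname)s %(name)s: %(message)s"},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "simple",
            "level": "INFO",
        },
    },
    # Send everything at INFO and above to the console handler.
    "root": {"handlers": ["console"], "level": "INFO"},
}

# Applied manually here to validate the dictionary.
logging.config.dictConfig(LOGGING)
logging.getLogger("forecaster").info("Logging configured")
```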
Context Configuration
As described in detail in our Context topic guide, the Context Configuration refers to the DAG/Model specific parameters that make up a specific execution, and that you may want to adjust frequently to tweak the behavior of your pipeline and project.
Unlike the Core Settings, which are standardized and predefined, the Context is more of a “user config” where you can add any parameter or variable that you want to use throughout your project.
For example, you may have a context.yaml file that looks like this:
# Gluepy protected parameters
meta:
  run_id:
  run_folder:
  created_at:
# Custom user added parameters
forecaster:
  start_date: 2024-01-01
You can then access these parameters in your Python code like this:
from gluepy.conf import default_context
from gluepy.exec import Task


class ForecasterTask(Task):
    def run(self):
        print(default_context.forecaster.start_date)
Context
Singleton class that holds all parameters and configuration related to the specific execution, such as run_id, run_folder, created_at and other project parameters.
The context is lazily evaluated using the Context Managers and is accessible through the gluepy.conf.default_context object.
Because it is a singleton, only a single instance of Context can exist at any point in time.
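To make "lazily evaluated" concrete, the sketch below shows the general lazy-proxy pattern: nothing is loaded until an attribute is first read, and the loaded object is then reused. This is an illustration of the idea only, not Gluepy's implementation; `LazyObject` and `load_context` are hypothetical names.

```python
from types import SimpleNamespace


class LazyObject:
    """Proxy that builds the wrapped object on first attribute access."""

    def __init__(self, factory):
        self._factory = factory
        self._wrapped = None

    def __getattr__(self, name):
        # Only called for names that are not found on the proxy itself.
        if self._wrapped is None:
            self._wrapped = self._factory()
        return getattr(self._wrapped, name)


calls = []  # records how many times the loader actually runs


def load_context():
    # Stand-in for an expensive load, such as reading YAML files.
    calls.append("loaded")
    return SimpleNamespace(forecaster=SimpleNamespace(start_date="2024-01-01"))


context = LazyObject(load_context)
loads_before_access = len(calls)         # still 0, nothing has been read yet
first = context.forecaster.start_date    # triggers the load
second = context.forecaster.start_date   # reuses the loaded object
print(first, second, len(calls))
```

The benefit of this pattern is that importing the object is cheap, and the cost of loading is paid only by code paths that actually read a parameter.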
Context Managers
The default_context object is automatically populated using the backend defined in CONTEXT_BACKEND. This allows you as a developer to extend Gluepy with your own class that loads parameters from a remote source, an API, an environment variable, or any other source.
Gluepy comes with a DefaultContextManager out of the box that loads the default_context from .yaml files located in the CONFIG_PATH directory.
If you ever need to access the instance of the CONTEXT_BACKEND context manager directly, you can do so using the lazily evaluated gluepy.conf.default_context_manager object.
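As a sketch of the kind of loading logic a custom backend might wrap, the function below builds nested parameters from environment variables. It is standalone and deliberately does not subclass any Gluepy class, because the context manager interface is not described in this section. The function name `load_params_from_env`, the `APP_CTX_` prefix, and the double-underscore nesting convention are all assumptions made for this example.

```python
import os


def load_params_from_env(prefix: str, environ=None) -> dict:
    """Build a nested dict from variables such as PREFIX_SECTION__KEY=value."""
    environ = os.environ if environ is None else environ
    params: dict = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        # Double underscores separate nesting levels; keys are lowercased.
        path = name[len(prefix):].lower().split("__")
        node = params
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return params


# A plain dict stands in for os.environ so the example is reproducible.
example_env = {
    "APP_CTX_FORECASTER__START_DATE": "2024-01-01",
    "UNRELATED": "ignored",
}
params = load_params_from_env("APP_CTX_", example_env)
print(params)
```

Values read this way are always strings, so a real backend would likely also need to convert them to dates or numbers before exposing them on the context.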
Protected Parameters
A few parameters are populated by Gluepy itself. These are defined under the meta tag and contain metadata about the ongoing execution.
meta:
  run_id:
  run_folder:
  created_at: