Writing your first Gluepy app, part 1

Let’s learn by writing a simple pipeline.

Throughout this tutorial, we’ll walk you through how to set up a new Gluepy project and how to use the various aspects of Gluepy by writing a simple forecasting training pipeline.

We’ll assume that you already have Gluepy installed. You can check whether Gluepy is installed by running the following in a Python shell:

>>> import gluepy
>>> print(gluepy.VERSION)
1.2

If Gluepy is installed, you should see the version of your installation. The version you have installed may be different from the version displayed above.
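
If you would rather run this check from a script than from an interactive shell, the following is a minimal sketch. It relies only on the VERSION attribute shown above; the helper name gluepy_version is made up for this example and is not part of Gluepy.

```python
import importlib
import importlib.util


def gluepy_version():
    """Return the installed Gluepy version as a string, or None if Gluepy cannot be found."""
    # find_spec returns None for a top-level package that is not installed.
    if importlib.util.find_spec("gluepy") is None:
        return None
    module = importlib.import_module("gluepy")
    # VERSION is the attribute printed in the session above; fall back to
    # "unknown" in case a given release does not define it.
    return str(getattr(module, "VERSION", "unknown"))


if __name__ == "__main__":
    version = gluepy_version()
    print(f"Gluepy {version}" if version else "Gluepy is not installed")
```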

Creating a Project

The first time you use Gluepy, you need to do some initial setup to get the structure and files of your project in place. This is what is referred to as a ‘Gluepy Project’ and it includes a CLI entrypoint, a set of Settings and a set of Gluepy modules.

To create your project, go to a new directory where you want to store your code and run the following command:

$ gluepy-cli startproject demo
Created project 'demo'

This will create a demo/ directory with the following structure:

  • configs/. The folder that contains all configurations in your application.

    • context.yaml. The default parameters that populate your Context.

    • settings.py. The application parameters that populate your Settings.

  • manage.py. The entrypoint to your project and the CLI from where you will execute your commands.
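
To confirm that the files listed above were generated, a small standard-library check along these lines can help. This is a sketch rather than a Gluepy feature: the function name missing_project_files is hypothetical, and the expected paths are simply the ones named in the list above.

```python
from pathlib import Path

# Paths from the structure described above, relative to the project root.
EXPECTED_PATHS = ("configs/context.yaml", "configs/settings.py", "manage.py")


def missing_project_files(project_root):
    """Return the expected paths that do not exist as files under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from the directory that contains demo/.
    missing = missing_project_files("demo")
    print("Project layout looks complete" if not missing else f"Missing: {missing}")
```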

Next up, we want to create our first module that holds the logic of our application.

Create a Module

Up until now, you have only created the minimal configuration required by a Gluepy project. None of the files you created in the previous steps actually holds any business logic for your data pipelines.

Let’s create our first Module.

$ python manage.py startmodule forecaster
Created module 'forecaster'

This will create a new forecaster/ directory within your project that holds the initial files you will need for your Gluepy module in the following structure:

  • dags.py. This is the module that holds all your DAG definitions.

  • tasks.py. This is the module that holds all your Task definitions.

  • commands.py. This is the module that holds all your custom Gluepy Commands.

These files can be replaced with directories named tasks/, commands/ and dags/ if your module grows to consist of many classes and functions that you want to separate into different files.

Install our Module

To install and enable our module in our project, go to the configs/settings.py file and add the module name to the INSTALLED_MODULES list.

This will automatically import all the DAGs, Tasks and Commands defined in your module and expose them through the manage.py Gluepy Commands.

# settings.py
INSTALLED_MODULES = ["forecaster", ]
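
To build some intuition for what "automatically import" can mean, here is a rough sketch of module discovery using only the standard library. It is not Gluepy's actual loading code, which this tutorial does not show. The names COMPONENTS and import_components are hypothetical, and the submodule names are taken from the files that startmodule created.

```python
import importlib

# Assumed submodule names, based on the files created by startmodule.
COMPONENTS = ("dags", "tasks", "commands")


def import_components(installed_modules):
    """Import <module>.<component> for every listed module.

    Returns a dict mapping dotted names to imported module objects.
    Modules or submodules that do not exist are skipped.
    """
    loaded = {}
    for module_name in installed_modules:
        for component in COMPONENTS:
            dotted = f"{module_name}.{component}"
            try:
                loaded[dotted] = importlib.import_module(dotted)
            except ModuleNotFoundError as exc:
                # Skip only when the module or submodule itself is absent;
                # re-raise if something it depends on is missing.
                if exc.name not in (dotted, module_name):
                    raise
    return loaded
```

Calling import_components(["forecaster"]) from the project root would then import forecaster.dags, forecaster.tasks and forecaster.commands, so any classes defined in those files are created as a side effect of the import.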

Run our first DAG

Now that we have created our project named demo, added our first module named forecaster and activated it in our project, let’s ensure things are working correctly by running the SampleDAG defined for us by default in our dags.py file using the dag command.

$ python manage.py dag sample
INFO 2024-06-25 12:28:47,057 dag - ---------- Started task 'BootstrapTask'
DEBUG 2024-06-25 12:28:47,057 tasks -
         Run ID: c24ef3e4-d869-427b-905e-8672caa4cd54
         Run Folder: runs/2024/6/25/c24ef3e4-d869-427b-905e-8672caa4cd54

DEBUG 2024-06-25 12:28:47,058 local - Writing file to path '/demo/data/runs/2024/6/25/c24ef3e4-d869-427b-905e-8672caa4cd54/context.yaml'.
INFO 2024-06-25 12:28:47,058 dag - ---------- Completed task 'BootstrapTask' in 0.001315 seconds
INFO 2024-06-25 12:28:47,058 dag - ---------- Started task 'SampleTask'
INFO 2024-06-25 12:28:47,058 dag - ---------- Completed task 'SampleTask' in 0.000001 seconds
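
The log shows two tasks running one after the other, each wrapped in a "Started" and "Completed" message with its duration. The toy model below reproduces that pattern in plain Python to make the flow easier to follow. It is an illustration only: the Task class and run_dag function here are stand-ins written for this example, not Gluepy's own classes, and the real BootstrapTask also creates the run folder and writes context.yaml.

```python
import logging
import time

logger = logging.getLogger("dag")


class Task:
    """Toy stand-in for a pipeline step; subclasses override run()."""

    def run(self):
        raise NotImplementedError


class BootstrapTask(Task):
    def run(self):
        # The real task prepares the run folder; omitted in this sketch.
        logger.debug("Preparing run (omitted in this sketch)")


class SampleTask(Task):
    def run(self):
        pass


def run_dag(task_classes):
    """Run each task in order and return the names of the completed tasks."""
    completed = []
    for task_cls in task_classes:
        name = task_cls.__name__
        logger.info("---------- Started task '%s'", name)
        started = time.perf_counter()
        task_cls().run()
        elapsed = time.perf_counter() - started
        logger.info("---------- Completed task '%s' in %f seconds", name, elapsed)
        completed.append(name)
    return completed


if __name__ == "__main__":
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(levelname)s %(asctime)s %(name)s - %(message)s",
    )
    run_dag([BootstrapTask, SampleTask])
```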