Create an initial version of a package template. (#124)
nclaeys authored Jul 5, 2024
1 parent fe2d6f7 commit 13ee7c1
Showing 27 changed files with 722 additions and 0 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -1,6 +1,9 @@
<a name="unreleased"></a>
## Unreleased

### features
- Add a template for creating Conveyor packages

## [1.6.2 - 2024-07-01]

### features
Empty file added package/alert/__init__.py
3 changes: 3 additions & 0 deletions package/alert/cookiecutter.json
@@ -0,0 +1,3 @@
{
"package_name": "common"
}
Empty file added package/alert/hooks/__init__.py
32 changes: 32 additions & 0 deletions package/alert/{{ cookiecutter.package_name }}/.dockerignore
@@ -0,0 +1,32 @@
__pycache__
*.pyc
*.pyo
*.pyd
.Python
pip-log.txt
pip-delete-this-directory.txt
.tox
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.log
.git
.datafy
dags
Dockerfile
target/
logs/
resources/
dags/

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
128 changes: 128 additions & 0 deletions package/alert/{{ cookiecutter.package_name }}/.gitignore
@@ -0,0 +1,128 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# dbt
.user.yml
30 changes: 30 additions & 0 deletions package/alert/{{ cookiecutter.package_name }}/.gitpod.dockerfile
@@ -0,0 +1,30 @@
FROM gitpod/workspace-python

RUN sudo apt-get update \
    && sudo apt-get dist-upgrade -y \
    && sudo apt-get install -y --no-install-recommends \
        git \
        ssh-client \
        software-properties-common \
        make \
        build-essential \
        ca-certificates \
        libpq-dev \
    && sudo apt-get clean \
    && sudo rm -rf \
        /var/lib/apt/lists/* \
        /tmp/* \
        /var/tmp/*

RUN wget "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -O "awscliv2.zip" && \
    unzip awscliv2.zip && \
    sudo ./aws/install --install-dir /opt/aws-cli --bin-dir /usr/local/bin/ && \
    sudo chmod a+x /opt/

# Env vars
ENV PYTHONIOENCODING=utf-8
ENV LANG=C.UTF-8

# Update pyenv and install the pinned Python version
COPY .python-version .python-version
RUN cd /home/gitpod/.pyenv && git fetch && git checkout v2.3.24  # update pyenv
RUN pyenv install
44 changes: 44 additions & 0 deletions package/alert/{{ cookiecutter.package_name }}/.gitpod.yml
@@ -0,0 +1,44 @@
# List the start up tasks. Learn more: https://www.gitpod.io/docs/config-start-tasks/
tasks:
  - name: Gitpod config (browser open, workspace bin path)
    init: |
      mkdir -p /workspace/bin
      cat > /workspace/bin/open.sh <<'EOF'
      #!/bin/bash
      exec gp preview --external "$@"
      EOF
      chmod +x /workspace/bin/open.sh
    command: |
      sudo update-alternatives --install /usr/bin/www-browser www-browser /workspace/bin/open.sh 100
      exit
  - name: Install conveyor
    init: curl -s https://static.conveyordata.com/cli-install/install.sh | bash
    command: |
      curl -s https://static.conveyordata.com/cli-install/update.sh | bash
      exit
image:
  file: .gitpod.dockerfile

# VS Code settings
# vscode:
#   extensions:
#     - dorzey.vscode-sqlfluff

# Prebuild settings
github:
  prebuilds:
    # enable for the default branch (defaults to true)
    main: true
    # enable for all branches in this repo (defaults to false)
    branches: true
    # enable for pull requests coming from this repo (defaults to true)
    pullRequests: true
    # enable for pull requests coming from forks (defaults to false)
    pullRequestsFromForks: true
    # add a check to pull requests (defaults to true)
    addCheck: true
    # add a "Review in Gitpod" button as a comment to pull requests (defaults to false)
    addComment: false
    # add a "Review in Gitpod" button to the pull request's description (defaults to false)
    addBadge: false
1 change: 1 addition & 0 deletions package/alert/{{ cookiecutter.package_name }}/.python-version
@@ -0,0 +1 @@
3.11.9
12 changes: 12 additions & 0 deletions package/alert/{{ cookiecutter.package_name }}/Dockerfile
@@ -0,0 +1,12 @@
FROM python:3.11.9-alpine

WORKDIR /app

COPY requirements.txt requirements.txt
COPY setup.py setup.py
# Install dependencies in their own layer to leverage caching: if you only change code, only the code layer needs to be rebuilt
RUN python3 -m pip install --no-cache-dir -r requirements.txt

COPY src ./src
RUN python3 -m pip install . --no-cache-dir
ENTRYPOINT ["python3", "-m", "alerting.app"]
22 changes: 22 additions & 0 deletions package/alert/{{ cookiecutter.package_name }}/Makefile
@@ -0,0 +1,22 @@
.PHONY: build

build:

requirements:
	pip-compile requirements.in --no-header

install:
	python -m venv venv
	( \
		. venv/bin/activate; \
		pip install -r requirements.txt; \
		pip install -r dev-requirements.txt; \
		pip install -e .; \
	)

lint:
	. venv/bin/activate; \
	python3 -m black src; \
	python3 -m flake8 src; \
	python3 -m isort src; \
	python3 -m mypy src
90 changes: 90 additions & 0 deletions package/alert/{{ cookiecutter.package_name }}/README.md
@@ -0,0 +1,90 @@
# {{ cookiecutter.package_name|capitalize }}

## Prerequisites

- [pyenv](https://github.com/pyenv/pyenv) (recommended)
- Setting up Slack alerting (optional)

### Configuring Slack alerting
Before you can test this package, you need access to a Slack workspace in which you can create a channel and an app with an incoming webhook.

- From your Slack workspace, create a channel that will receive the Conveyor notifications. The Slack documentation [here](https://slack.com/help/articles/201402297-Create-a-channel) walks you through it.
- From your Slack workspace, create a Slack app with an incoming webhook. The Slack documentation [here](https://api.slack.com/messaging/webhooks) walks through the necessary steps. Take note of the incoming webhook URL.

### Configuring a Slack connection in Airflow
First, we need to create an Airflow connection that provides your incoming Slack webhook URL to Airflow.
You can do this using the Airflow UI or the secrets backend configuration feature of Conveyor, which is described [here](https://docs.dev.conveyordata.com/how-to-guides/working-with-airflow/airflow-secrets-backend).

- Navigate to https://app.conveyordata.com/environments and select the samples environment
- Navigate to Admin > Connections in Airflow
- Add a connection of type HTTP
- Enter `slack_webhook` as the connection id
- Enter `https://hooks.slack.com/services/` as the Host
- Enter the remainder of your webhook URL as the Password (formatted as `T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX`)
- Save
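
With the connection in place, a DAG failure callback can post to it. Below is a minimal, hypothetical sketch (not part of this template): it assumes the `slack_webhook` connection id configured above, and the helper name `notify_slack_failure` is illustrative.

```python
# Hypothetical failure callback that posts to the `slack_webhook`
# connection configured above; a sketch, not part of the template.
import requests
from airflow.hooks.base import BaseHook


def notify_slack_failure(context):
    conn = BaseHook.get_connection("slack_webhook")
    # Host holds https://hooks.slack.com/services/, Password the rest of the URL
    webhook_url = f"{conn.host}{conn.password}"
    ti = context["task_instance"]
    text = f":red_circle: Task {ti.task_id} in DAG {ti.dag_id} failed."
    requests.post(webhook_url, json={"text": text}, timeout=10)
```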


## Package Structure

```bash
root/
|-- dags/
| |-- example_dag_alerts.py
|-- pkgs/
| |-- alert.py
| |-- complex_alert.py
|-- src/
| |-- alerting/
| | |-- app.py
|-- README.md
|-- Dockerfile
```

The package code is split across three directories, each with its own purpose:
- `pkgs`: this directory contains utility functions that you can call from your Airflow DAGs
- `src`: this directory contains more complex source code that you want to reuse across projects.
This code is packaged in a Docker container and can be triggered in Airflow by defining a custom task or by adding a failure callback to your DAG.
- `dags`: this directory is not used by Conveyor packages but by Conveyor projects. Keeping it in the same directory allows you to easily develop your package.
Working with both a project and a package in the same directory is described in more detail [here](https://docs.conveyordata.com/how-to-guides/conveyor-packages/best-practices).

## Getting started
Start using this template as follows:
- create a trial version of your package: `conveyor package trial --version 0.0.1`
- test your package code in a Conveyor project:
- create a sample project in your package directory: `conveyor project create --name testproject`. For more details, see [here](https://docs.conveyordata.com/how-to-guides/conveyor-packages/best-practices)
- add the package dependency in your `.conveyor/project.yaml` with content:
```yaml
dependencies:
packages:
- name: <packagename>
versions: [0.0.1]
```
- build and deploy your project to an environment: `conveyor project build && conveyor project deploy --environment <some-environment>`.
If you are developing Airflow tasks in your package, you can also use `conveyor run` to test them.
- trigger the `example-dag-alert-simple-callback` or the `example-dag-alert-complex-callback` DAG in Airflow
- make changes to your package and run `conveyor package trial --version 0.0.1` to update the version in Airflow

## Concepts
This template package shows how packages can provide common code that is shared
by many projects within your organisation. To illustrate this, we cover one of the most common use cases: adding alerts to DAGs.

### When to add functions to the pkgs directory
Add functions to the `pkgs` directory when they are simple and can execute within an Airflow worker.
Typical use cases are wrapper functions/operators that abstract away some custom logic within your organisation.
If you need additional Python packages to execute your logic, this approach will not work, as you cannot customize the Airflow Python environment.

#### Steps
- Create Python functions in your package that will be executed by Airflow
- Trial/release your package
- Refer to your package in the Airflow DAG code as follows: `common_alert = packages.load("common.alert", version="1.0.0", trial=True)` (a sketch follows below)
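
A minimal sketch of what loading the package in a DAG file can look like; `alert_callback` is an assumed function name and the `conveyor` import path is illustrative, so check the Conveyor documentation for the exact API:

```python
# Hypothetical DAG wiring a packaged function in as a failure callback;
# the import path and the `alert_callback` name are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from conveyor import packages  # assumed import path

common_alert = packages.load("common.alert", version="1.0.0", trial=True)

with DAG(
    "example-dag-alert-simple-callback",
    start_date=datetime(2024, 7, 1),
    schedule=None,
    on_failure_callback=common_alert.alert_callback,
) as dag:
    EmptyOperator(task_id="do-work")
```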

### Creating a Docker image with common functionality
For more advanced use cases, you may need to run your code in a custom container.
This gives you full flexibility over which Python packages you want to use.

#### Steps
- Write the necessary source code for your package
- Make sure you have a `Dockerfile` and package your `src` code into the Docker image
- Write a Python function in the `pkgs` directory that runs a container to execute your source code, using the image returned by `packages.image()` (see the sketch after these steps)
- Trial/release your package
- Refer to your package in your project's DAG code using `common_alert = packages.load("common.alert", trial=True)` and trigger one of your common functions
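
As an illustration, such a `pkgs` function could look like the sketch below. This is a hedged sketch: the operator name follows the Conveyor documentation, but its exact parameters, the `packages.image()` helper, and the names `run_alert_app` and `mx.micro` should be checked against your Conveyor version.

```python
# Hypothetical pkgs/ helper that runs the packaged Docker image as an
# Airflow task; operator parameters are assumptions, not template code.
from conveyor import packages  # assumed import path
from conveyor.operators import ConveyorContainerOperatorV2


def run_alert_app(dag, task_id="send-alert"):
    return ConveyorContainerOperatorV2(
        dag=dag,
        task_id=task_id,
        instance_type="mx.micro",
        image=packages.image(),
        # Matches the ENTRYPOINT of the template's Dockerfile
        cmds=["python3", "-m", "alerting.app"],
    )
```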