Merge pull request #13 from teamdatatonic/feature/add-poetry
Build: migrate from pipenv to poetry
felix-datatonic authored Jun 2, 2023
2 parents d2c7c28 + c465a0b commit 4d259e3
Showing 20 changed files with 5,171 additions and 4,405 deletions.
7 changes: 0 additions & 7 deletions .gitignore
@@ -95,13 +95,6 @@ target/
profile_default/
ipython_config.py

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

2 changes: 1 addition & 1 deletion .python-version
@@ -1 +1 @@
3.7.12
3.7.12
8 changes: 4 additions & 4 deletions CONTRIBUTING.md
@@ -22,7 +22,7 @@ This guide is chiefly for users wishing to contribute to the open-source version.
## Links to Important Resources
- [pytest](https://docs.pytest.org)
- [unittest.mock](https://docs.python.org/3/library/unittest.mock.html)
- [pipenv](https://pipenv-fork.readthedocs.io/en/latest/index.html)
- [poetry](https://python-poetry.org/docs/#installation)
- [Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/overview)
- [Vertex AI](https://cloud.google.com/vertex-ai/docs)
- [AI Platform SDK](https://googleapis.dev/python/aiplatform/latest/index.html)
@@ -211,12 +211,12 @@ def test_vertex_endpoint_uri(output_uri: str):
```

## Adding or changing python dependencies
We use [pipenv](https://pipenv-fork.readthedocs.io/en/latest/index.html) to handle our packages and their dependencies. Each group of pipeline components (e.g. [aiplatform](./pipeline_components/aiplatform/)) contains its own pipenv environment, and there is a [separate pipenv environment](./pipelines/) for the ML pipelines themselves and the pipeline trigger code.
We use [poetry](https://python-poetry.org/docs/#installation) to handle our packages and their dependencies. Each group of pipeline components (e.g. [vertex](./components/vertex-components/)) includes its own poetry environment, and there is a [separate poetry environment](./pipelines/) for the ML pipelines themselves and the pipeline trigger code.

### Adding python dependencies
You may need to add new packages for your own use cases. To do this, run the following from the relevant directory ([pipelines](./pipelines) for the main ML pipeline dependencies or the directory of the relevant component group e.g. [aiplatform](./pipeline_components/aiplatform/)):
```
pipenv install <package name>
poetry add <package name>
```
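
For example, to pin a runtime dependency or add a dev-only one (package names and versions here are purely illustrative; the `--group` flag assumes poetry >= 1.2):
```
poetry add "pandas==1.3.5"               # runtime dependency, pinned
poetry add --group dev "pytest==7.2.0"   # dev-only dependency
```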

## Committing Changes
@@ -260,7 +260,7 @@ make pre-commit
- **Checks fail and display an error message**. Some errors cannot be fixed automatically by pre-commit hooks; instead, the hook displays the error number and the file and line that failed. For more detail beyond the error message, you can look up the error number online. The most common errors are caused by lines which exceed the character limit. Once you identify the cause, fix it in your code, add the edited file to the staging area, and commit again.

### Commit changes to Python packages and dependencies
If you have changes to `Pipfile` and `Pipfile.lock`, please make sure you commit these files!
If you have changes to `pyproject.toml` and `poetry.lock`, please make sure you commit these files!
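
For example, after adding a dependency (hypothetical package name in the commit message):
```
git add pyproject.toml poetry.lock
git commit -m "Build: add pandas dependency"
```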

## Makefile
This project contains a [Makefile](Makefile) which defines "rules" describing the commands to be executed by the system. These allow you to quickly and easily run commands for specific purposes, for example running all of the unit tests or compiling a pipeline. You can find the full set of available `make` rules by running:
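```
make help
```

(The `help` rule at the top of the [Makefile](Makefile) prints each rule name alongside its `##` description.)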
27 changes: 14 additions & 13 deletions Makefile
@@ -19,25 +19,26 @@ help: ## Display this help screen
@grep -h -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-30s\033[0m %s\n", $$1, $$2}'

pre-commit: ## Runs the pre-commit checks over entire repo
@cd pipelines && \
pipenv run pre-commit run --all-files
cd pipelines && \
poetry run pre-commit run --all-files

setup: ## Set up local environment for Python development on pipelines
@pip install pipenv && \
@pip install pip --upgrade && \
pip install poetry --upgrade && \
cd pipelines && \
pipenv install --dev
poetry install --with dev

test-trigger: ## Runs unit tests for the pipeline trigger code
@cd pipelines && \
pipenv run python -m pytest tests/trigger
poetry run python -m pytest tests/trigger

compile-pipeline: ## Compile the pipeline to training.json or prediction.json. Must specify pipeline=<training|prediction>
@cd pipelines/src && \
pipenv run python -m pipelines.${PIPELINE_TEMPLATE}.${pipeline}.pipeline
poetry run python -m pipelines.${PIPELINE_TEMPLATE}.${pipeline}.pipeline

setup-components: ## Install dependencies for a component group
@cd "components/${GROUP}" && \
pipenv install --dev
poetry install --with dev

setup-all-components: ## Install dependencies for all pipeline components
@set -e && \
@@ -48,7 +49,7 @@ setup-all-components: ## Install dependencies for all pipeline components

test-components: ## Run unit tests for a component group
@cd "components/${GROUP}" && \
pipenv run pytest
poetry run pytest

test-all-components: ## Run unit tests for all pipeline components
@set -e && \
Expand All @@ -59,11 +60,11 @@ test-all-components: ## Run unit tests for all pipeline components

test-components-coverage: ## Run tests with coverage
@cd "components/${GROUP}" && \
pipenv run coverage run -m pytest && \
pipenv run coverage report -m
poetry run coverage run -m pytest && \
poetry run coverage report -m

test-all-components-coverage: ## Run tests with coverage
@set -e && \
@set -e && \
for component_group in components/*/ ; do \
echo "Test components under $$component_group" && \
$(MAKE) test-components-coverage GROUP=$$(basename $$component_group) ; \
@@ -81,7 +82,7 @@ run: ## Compile pipeline, copy assets to GCS, and run pipeline in sandbox environment
@ $(MAKE) compile-pipeline && \
$(MAKE) sync-assets && \
cd pipelines/src && \
pipenv run python -m pipelines.trigger --template_path=./$(pipeline).json --enable_caching=$(enable_pipeline_caching)
poetry run python -m pipelines.trigger --template_path=./$(pipeline).json --enable_caching=$(enable_pipeline_caching)

sync_assets ?= true
e2e-tests: ## (Optionally) copy assets to GCS, and perform end-to-end (E2E) pipeline tests. Must specify pipeline=<training|prediction>. Optionally specify enable_pipeline_caching=<true|false> (defaults to default Vertex caching behaviour). Optionally specify sync_assets=<true|false> (defaults to true)
@@ -91,7 +92,7 @@ e2e-tests: ## (Optionally) copy assets to GCS, and perform end-to-end (E2E) pipeline tests
echo "Skipping syncing assets to GCS"; \
fi && \
cd pipelines && \
pipenv run pytest --log-cli-level=INFO tests/${PIPELINE_TEMPLATE}/$(pipeline) --enable_caching=$(enable_pipeline_caching)
poetry run pytest --log-cli-level=INFO tests/${PIPELINE_TEMPLATE}/$(pipeline) --enable_caching=$(enable_pipeline_caching)

env ?= dev
deploy-infra: ## Deploy the Terraform infrastructure to your project. Requires VERTEX_PROJECT_ID and VERTEX_LOCATION env variables to be set in env.sh. Optionally specify env=<dev|test|prod> (default = dev)
12 changes: 10 additions & 2 deletions README.md
@@ -55,11 +55,19 @@ In a production MLOps solution, your ML pipelines need to be repeatable. So, we

1. Clone the repository locally
1. Install Python: `pyenv install`
1. Install pipenv and pipenv dependencies: `make setup`
1. Install pre-commit hooks: `cd pipelines && pipenv run pre-commit install`
1. Install poetry and poetry dependencies: `make setup`
1. Install pre-commit hooks: `cd pipelines && poetry run pre-commit install`
1. Copy `env.sh.example` to `env.sh`, and update the environment variables in `env.sh`
1. Load the environment variables in `env.sh` by running `source env.sh`
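
Taken together, a minimal first-time setup might look like this (the values you put in `env.sh` are project-specific):

```
pyenv install                                # Python version from .python-version
make setup                                   # installs poetry and the pipelines dependencies
(cd pipelines && poetry run pre-commit install)
cp env.sh.example env.sh                     # then edit the environment variables
source env.sh
```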

Note: `poetry install` and `poetry add` install packages within the project's virtual environment.
If you use `pip` directly, you might accidentally install packages globally or into the wrong environment, leading to conflicts or difficulties in managing dependencies.
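
As a rough illustration (hypothetical package name):

```
cd pipelines
poetry add requests    # resolved against poetry.lock and installed into the project venv
pip install requests   # risks installing into the global interpreter instead
```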

### Configuring poetry to detect the Python version using pyenv

1. Run `poetry config virtualenvs.prefer-active-python true`
1. Install project dependencies using `poetry install`
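
As a sketch, assuming poetry >= 1.2 (where the `virtualenvs.prefer-active-python` setting is available):

```
pyenv install                                        # 3.7.12, from .python-version
poetry config virtualenvs.prefer-active-python true  # reuse the pyenv-provided interpreter
cd pipelines && poetry install
```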

### Deploying Cloud Infrastructure

The cloud infrastructure is managed using Terraform and is defined in the [`terraform`](terraform) directory. There are three Terraform modules defined in [`terraform/modules`](terraform/modules):
4 changes: 2 additions & 2 deletions cloudbuild/pr-checks.yaml
@@ -14,7 +14,7 @@
---
steps:

# Install pipenv and deps, run pre-commit and unit tests
# Install poetry and deps, run pre-commit and unit tests
# Then compile pipelines (to make sure they can compile)
# need to run "git init" for pre-commit checks to work
- name: python:3.7
Expand All @@ -37,5 +37,5 @@ steps:
options:
logging: CLOUD_LOGGING_ONLY

# Increase timeout to allow pipenv to resolve dependencies
# Increase timeout to allow poetry to resolve dependencies
timeout: 3600s
2 changes: 1 addition & 1 deletion cloudbuild/release.yaml
@@ -14,7 +14,7 @@
---
steps:

# Install pipenv, install deps, compile pipelines
# Install poetry, install deps, compile pipelines
- name: python:3.7
entrypoint: /bin/sh
args:
17 changes: 8 additions & 9 deletions components/README.md
@@ -4,22 +4,21 @@ This directory contains multiple Python packages that are used to define pipeline components

## Creating a new pipeline components package

To create a new set of components (with different Python dependencies), copy one of the existing subdirectories and rename the different files and directories as appropriate (e.g. `bigquery-components` -> `my-new-components`). You will also need to update any references in the Python files themselves, as well as the `Pipfile` and `pyproject.toml`.
To create a new set of components (with different Python dependencies), copy one of the existing subdirectories and rename the different files and directories as appropriate (e.g. `vertex-components` -> `my-new-components`). You will also need to update any references in the Python files themselves, as well as `poetry.lock` and `pyproject.toml`.

Your Python dependencies should be defined in `Pipfile`, `pyproject.toml`, and in `packages_to_install` (in the `@component` decorator):
Your Python dependencies should be defined in `poetry.lock`, `pyproject.toml`, and in `packages_to_install` (in the `@component` decorator):

- In `Pipfile`, add `kfp` to the `[packages]` section (pinned to a specific version), and add any dependencies that your component uses under `[dev-packages]` (each pinned to a specific version)
- In `pyproject.toml`, add `kfp` to the `[dependencies]` section (pinned to a specific version), and add any dependencies that your component uses under `[project.optional-dependencies]` -> `tests` (each pinned to a specific version)
- In `pyproject.toml`, add `kfp` to the `[dependencies]` section (pinned to a specific version), and add any dependencies that your component uses under `[tool.poetry.dependencies]` (each pinned to a specific version)
- In `packages_to_install` (in the `@component` decorator used to define your component), add any dependencies that your component uses (each pinned to a specific version)
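
As a minimal sketch of the last point (component name, base image, and package versions are illustrative):

```python
from kfp.v2.dsl import component


@component(
    base_image="python:3.7",
    packages_to_install=["pandas==1.3.5"],  # pin versions, mirroring pyproject.toml
)
def count_rows(csv_path: str) -> int:
    """Toy component: count the rows in a CSV file."""
    import pandas as pd  # imports must live inside the component body

    return len(pd.read_csv(csv_path))
```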

Define your pipeline components using the `@component` decorator in Python files under `my-new-components/src/my-new-components`. You will need to update the `__init__.py` file to provide tests - see the [Kubeflow Pipelines documentation](https://www.kubeflow.org/docs/components/pipelines/v1/sdk-v2/python-function-components/#building-python-function-based-components) for more information about writing pipeline components.

Finally, you will need to install this new components package into the [`pipelines`](../pipelines) package. In [`pipelines/Pipfile`](../pipelines/Pipfile), add the following line to the `packages` section:
```ini
my-new-components = {editable = true, path = "./../components/my-new-components"}
```
Finally, you will need to install this new components package into the [`pipelines`](../pipelines) package. In [`pipelines/pyproject.toml`](../pipelines/pyproject.toml), add the following line to the `tool.poetry.dependencies` section:

Once you have added this line to [`pipelines/Pipfile`](../pipelines/Pipfile), run `make setup` from the root of the repository to install the new components package into the `pipelines` package.
```toml
my-new-components = { path = "../components/my-new-components", develop = true }
```
Once you have added this line to [`pipelines/pyproject.toml`](../pipelines/pyproject.toml), run `make setup` from the root of the repository to install the new components package into the `pipelines` package.

## Testing components

Expand Down
14 changes: 0 additions & 14 deletions components/bigquery-components/Pipfile

This file was deleted.
