Merge pull request #13 from teamdatatonic/feature/add-poetry
Build: migrate from pipenv to poetry
felix-datatonic authored Jun 2, 2023
2 parents d2c7c28 + c465a0b commit 4d259e3
Showing 20 changed files with 5,171 additions and 4,405 deletions.
7 changes: 0 additions & 7 deletions .gitignore
@@ -95,13 +95,6 @@ target/
profile_default/
ipython_config.py

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

2 changes: 1 addition & 1 deletion .python-version
@@ -1 +1 @@
3.7.12
3.7.12
8 changes: 4 additions & 4 deletions CONTRIBUTING.md
@@ -22,7 +22,7 @@ This guide is chiefly for users wishing to contribute to the open-source version.
## Links to Important Resources
- [pytest](https://docs.pytest.org)
- [unittest.mock](https://docs.python.org/3/library/unittest.mock.html)
- [pipenv](https://pipenv-fork.readthedocs.io/en/latest/index.html)
- [poetry](https://python-poetry.org/docs/#installation)
- [Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/overview)
- [Vertex AI](https://cloud.google.com/vertex-ai/docs)
- [AI Platform SDK](https://googleapis.dev/python/aiplatform/latest/index.html)
@@ -211,12 +211,12 @@ def test_vertex_endpoint_uri(output_uri: str):
```

## Adding or changing python dependencies
We use [pipenv](https://pipenv-fork.readthedocs.io/en/latest/index.html) to handle our packages and their dependencies. Each group of pipeline components (e.g. [aiplatform](./pipeline_components/aiplatform/)) contains its own pipenv environment, and there is a [separate pipenv environment](./pipelines/) for the ML pipelines themselves and the pipeline trigger code.
We use [poetry](https://python-poetry.org/docs/#installation) to handle our packages and their dependencies. Each group of pipeline components (e.g. [vertex](./components/vertex-components/)) includes its own poetry environment, and there is a [separate poetry environment](./pipelines/) for the ML pipelines themselves and the pipeline trigger code.

### Adding python dependencies
You may need to add new packages for your own use cases. To do this, run the following from the relevant directory ([pipelines](./pipelines) for the main ML pipeline dependencies or the directory of the relevant component group e.g. [aiplatform](./pipeline_components/aiplatform/)):
```
pipenv install <package name>
poetry add <package name>
```
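
For example, to pin a runtime dependency or add a dev-only one (package names and versions here are purely illustrative; the `--group` flag assumes poetry >= 1.2):
```
poetry add "pandas==1.3.5"               # runtime dependency, pinned
poetry add --group dev "pytest==7.2.0"   # dev-only dependency
```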

## Committing Changes
@@ -260,7 +260,7 @@ make pre-commit
- **Checks fail and display an error message**. Some errors cannot be fixed automatically by pre-commit hooks; instead, the hook displays the error number and the file and line that failed. For more detail beyond the error message, you can look up the error number online. The most common errors are caused by lines which exceed the character limit. Once you identify the cause, fix it in your code, add the edited file to the staging area, and commit again.

### Commit changes to Python packages and dependencies
If you have changes to `Pipfile` and `Pipfile.lock`, please make sure you commit these files!
If you have changes to `pyproject.toml` and `poetry.lock`, please make sure you commit these files!
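
For example, after adding a dependency (hypothetical package name in the commit message):
```
git add pyproject.toml poetry.lock
git commit -m "Build: add pandas dependency"
```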

## Makefile
This project contains a [Makefile](Makefile) which defines "rules" describing the commands to be executed by the system. These allow you to quickly and easily run commands for specific purposes, for example running all of the unit tests or compiling a pipeline. You can find the full set of available `make` rules by running:
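```
make help
```

(The `help` rule at the top of the [Makefile](Makefile) prints each rule name alongside its `##` description.)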
27 changes: 14 additions & 13 deletions Makefile
@@ -19,25 +19,26 @@ help: ## Display this help screen
@grep -h -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-30s\033[0m %s\n", $$1, $$2}'

pre-commit: ## Runs the pre-commit checks over entire repo
@cd pipelines && \
pipenv run pre-commit run --all-files
cd pipelines && \
poetry run pre-commit run --all-files

setup: ## Set up local environment for Python development on pipelines
@pip install pipenv && \
@pip install pip --upgrade && \
pip install poetry --upgrade && \
cd pipelines && \
pipenv install --dev
poetry install --with dev

test-trigger: ## Runs unit tests for the pipeline trigger code
@cd pipelines && \
pipenv run python -m pytest tests/trigger
poetry run python -m pytest tests/trigger

compile-pipeline: ## Compile the pipeline to training.json or prediction.json. Must specify pipeline=<training|prediction>
@cd pipelines/src && \
pipenv run python -m pipelines.${PIPELINE_TEMPLATE}.${pipeline}.pipeline
poetry run python -m pipelines.${PIPELINE_TEMPLATE}.${pipeline}.pipeline

setup-components: ## Install dependencies for a component group
@cd "components/${GROUP}" && \
pipenv install --dev
poetry install --with dev

setup-all-components: ## Install dependencies for all pipeline components
@set -e && \
@@ -48,7 +49,7 @@ setup-all-components: ## Install dependencies for all pipeline components

test-components: ## Run unit tests for a component group
@cd "components/${GROUP}" && \
pipenv run pytest
poetry run pytest

test-all-components: ## Run unit tests for all pipeline components
@set -e && \
Expand All @@ -59,11 +60,11 @@ test-all-components: ## Run unit tests for all pipeline components

test-components-coverage: ## Run tests with coverage
@cd "components/${GROUP}" && \
pipenv run coverage run -m pytest && \
pipenv run coverage report -m
poetry run coverage run -m pytest && \
poetry run coverage report -m

test-all-components-coverage: ## Run tests with coverage
@set -e && \
@set -e && \
for component_group in components/*/ ; do \
echo "Test components under $$component_group" && \
$(MAKE) test-components-coverage GROUP=$$(basename $$component_group) ; \
@@ -81,7 +82,7 @@ run: ## Compile pipeline, copy assets to GCS, and run pipeline in sandbox environment
@ $(MAKE) compile-pipeline && \
$(MAKE) sync-assets && \
cd pipelines/src && \
pipenv run python -m pipelines.trigger --template_path=./$(pipeline).json --enable_caching=$(enable_pipeline_caching)
poetry run python -m pipelines.trigger --template_path=./$(pipeline).json --enable_caching=$(enable_pipeline_caching)

sync_assets ?= true
e2e-tests: ## (Optionally) copy assets to GCS, and perform end-to-end (E2E) pipeline tests. Must specify pipeline=<training|prediction>. Optionally specify enable_pipeline_caching=<true|false> (defaults to default Vertex caching behaviour). Optionally specify sync_assets=<true|false> (defaults to true)
@@ -91,7 +92,7 @@ e2e-tests: ## (Optionally) copy assets to GCS, and perform end-to-end (E2E) pipeline tests
echo "Skipping syncing assets to GCS"; \
fi && \
cd pipelines && \
pipenv run pytest --log-cli-level=INFO tests/${PIPELINE_TEMPLATE}/$(pipeline) --enable_caching=$(enable_pipeline_caching)
poetry run pytest --log-cli-level=INFO tests/${PIPELINE_TEMPLATE}/$(pipeline) --enable_caching=$(enable_pipeline_caching)

env ?= dev
deploy-infra: ## Deploy the Terraform infrastructure to your project. Requires VERTEX_PROJECT_ID and VERTEX_LOCATION env variables to be set in env.sh. Optionally specify env=<dev|test|prod> (default = dev)
12 changes: 10 additions & 2 deletions README.md
@@ -55,11 +55,19 @@ In a production MLOps solution, your ML pipelines need to be repeatable. So, we

1. Clone the repository locally
1. Install Python: `pyenv install`
1. Install pipenv and pipenv dependencies: `make setup`
1. Install pre-commit hooks: `cd pipelines && pipenv run pre-commit install`
1. Install poetry and poetry dependencies: `make setup`
1. Install pre-commit hooks: `cd pipelines && poetry run pre-commit install`
1. Copy `env.sh.example` to `env.sh`, and update the environment variables in `env.sh`
1. Load the environment variables in `env.sh` by running `source env.sh`
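
Taken together, a minimal first-time setup might look like this (the values you put in `env.sh` are project-specific):

```
pyenv install                                # Python version from .python-version
make setup                                   # installs poetry and the pipelines dependencies
(cd pipelines && poetry run pre-commit install)
cp env.sh.example env.sh                     # then edit the environment variables
source env.sh
```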

Note: `poetry install` and `poetry add` install packages within the project's virtual environment.
If you use `pip` directly, you might accidentally install packages globally or into the wrong environment, leading to conflicts or difficulties in managing dependencies.
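
As a rough illustration (hypothetical package name):

```
cd pipelines
poetry add requests    # resolved against poetry.lock and installed into the project venv
pip install requests   # risks installing into the global interpreter instead
```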

### Configuring poetry to detect the Python version using pyenv

1. Run `poetry config virtualenvs.prefer-active-python true`
1. Install project dependencies using `poetry install`
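
As a sketch, assuming poetry >= 1.2 (where the `virtualenvs.prefer-active-python` setting is available):

```
pyenv install                                        # 3.7.12, from .python-version
poetry config virtualenvs.prefer-active-python true  # reuse the pyenv-provided interpreter
cd pipelines && poetry install
```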

### Deploying Cloud Infrastructure

The cloud infrastructure is managed using Terraform and is defined in the [`terraform`](terraform) directory. There are three Terraform modules defined in [`terraform/modules`](terraform/modules):
4 changes: 2 additions & 2 deletions cloudbuild/pr-checks.yaml
@@ -14,7 +14,7 @@
---
steps:

# Install pipenv and deps, run pre-commit and unit tests
# Install poetry and deps, run pre-commit and unit tests
# Then compile pipelines (to make sure they can compile)
# need to run "git init" for pre-commit checks to work
- name: python:3.7
Expand All @@ -37,5 +37,5 @@ steps:
options:
logging: CLOUD_LOGGING_ONLY

# Increase timeout to allow pipenv to resolve dependencies
# Increase timeout to allow poetry to resolve dependencies
timeout: 3600s
2 changes: 1 addition & 1 deletion cloudbuild/release.yaml
@@ -14,7 +14,7 @@
---
steps:

# Install pipenv, install deps, compile pipelines
# Install poetry, install deps, compile pipelines
- name: python:3.7
entrypoint: /bin/sh
args:
17 changes: 8 additions & 9 deletions components/README.md
@@ -4,22 +4,21 @@ This directory contains multiple Python packages that are used to define pipeline components

## Creating a new pipeline components package

To create a new set of components (with different Python dependencies), copy one of the existing subdirectories and rename the different files and directories as appropriate (e.g. `bigquery-components` -> `my-new-components`). You will also need to update any references in the Python files themselves, as well as the `Pipfile` and `pyproject.toml`.
To create a new set of components (with different Python dependencies), copy one of the existing subdirectories and rename the different files and directories as appropriate (e.g. `vertex-components` -> `my-new-components`). You will also need to update any references in the Python files themselves, as well as `poetry.lock` and `pyproject.toml`.

Your Python dependencies should be defined in `Pipfile`, `pyproject.toml`, and in `packages_to_install` (in the `@component` decorator):
Your Python dependencies should be defined in `poetry.lock`, `pyproject.toml`, and in `packages_to_install` (in the `@component` decorator):

- In `Pipfile`, add `kfp` to the `[packages]` section (pinned to a specific version), and add any dependencies that your component uses under `[dev-packages]` (each pinned to a specific version)
- In `pyproject.toml`, add `kfp` to the `[dependencies]` section (pinned to a specific version), and add any dependencies that your component uses under `[project.optional-dependencies]` -> `tests` (each pinned to a specific version)
- In `pyproject.toml`, add `kfp` to the `[dependencies]` section (pinned to a specific version), and add any dependencies that your component uses under `[tool.poetry.dependencies]` (each pinned to a specific version)
- In `packages_to_install` (in the `@component` decorator used to define your component), add any dependencies that your component uses (each pinned to a specific version)
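
As a minimal sketch of the last point (component name, base image, and package versions are illustrative):

```python
from kfp.v2.dsl import component


@component(
    base_image="python:3.7",
    packages_to_install=["pandas==1.3.5"],  # pin versions, mirroring pyproject.toml
)
def count_rows(csv_path: str) -> int:
    """Toy component: count the rows in a CSV file."""
    import pandas as pd  # imports must live inside the component body

    return len(pd.read_csv(csv_path))
```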

Define your pipeline components using the `@component` decorator in Python files under `my-new-components/src/my-new-components`. You will need to update the `__init__.py` file to provide tests - see the [Kubeflow Pipelines documentation](https://www.kubeflow.org/docs/components/pipelines/v1/sdk-v2/python-function-components/#building-python-function-based-components) for more information about writing pipeline components.

Finally, you will need to install this new components package into the [`pipelines`](../pipelines) package. In [`pipelines/Pipfile`](../pipelines/Pipfile), add the following line to the `packages` section:
```ini
my-new-components = {editable = true, path = "./../components/my-new-components"}
```
Finally, you will need to install this new components package into the [`pipelines`](../pipelines) package. In [`pipelines/pyproject.toml`](../pipelines/pyproject.toml), add the following line to the `tool.poetry.dependencies` section:

Once you have added this line to [`pipelines/Pipfile`](../pipelines/Pipfile), run `make setup` from the root of the repository to install the new components package into the `pipelines` package.
```toml
my-new-components = { path = "../components/my-new-components", develop = true }
```
Once you have added this line to [`pipelines/pyproject.toml`](../pipelines/pyproject.toml), run `make setup` from the root of the repository to install the new components package into the `pipelines` package.

## Testing components

Expand Down
14 changes: 0 additions & 14 deletions components/bigquery-components/Pipfile

This file was deleted.
