Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tutorial that shows how to set up Prefect Cloud as a platform for data teams #16123

Merged
merged 32 commits into from
Dec 3, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
60d1bdd
Add stub topics for the operator tutorials
djsauble Nov 21, 2024
8d181a0
Split the author and operator tutorials into separate sub-sections
djsauble Nov 21, 2024
32317b8
Improve the description of the platform tutorial
djsauble Nov 21, 2024
6c92765
Improve the description of the debug tutorial
djsauble Nov 21, 2024
0e95119
Update the names of the tutorial groups to externally-recognizable pe…
djsauble Nov 22, 2024
b786517
Complete the first draft of the 'Set up a data platform' tutorial
djsauble Nov 26, 2024
faf8916
Add missing image
djsauble Nov 26, 2024
5cae5d1
Rename the incidents tutorial to an alerts tutorial
djsauble Nov 26, 2024
1967096
Update the nav in mint.json
djsauble Nov 26, 2024
485d758
Adjust the title of the alert tutorial
djsauble Nov 26, 2024
2a1bdde
Clarify that you need Python 3.9 or newer
djsauble Nov 26, 2024
8a0656d
Hint that there are options other than Docker
djsauble Nov 26, 2024
eb44a2b
Remove mention of the incident tutorial which has been replaced with …
djsauble Nov 26, 2024
21efed5
Provide a caveat that you must provision infrastructure because we're…
djsauble Nov 26, 2024
22ffd56
Remove mention of 'Prefect Cloud' from the section headings since thi…
djsauble Nov 26, 2024
d30cbe1
Reinforce that the script is creating one deployment per team, becaus…
djsauble Nov 26, 2024
457fa65
Rename the platform tutorial to be a bit more accurate
djsauble Dec 2, 2024
4b86b33
Move asides to the 'Next steps' section so they don't distract from t…
djsauble Dec 2, 2024
26f51e1
Merge branch 'main' into operator_tutorials
djsauble Dec 2, 2024
29fde6b
Refocus this branch on platform use cases, including CI/CD
djsauble Dec 2, 2024
e5a5c63
Update the set up a platform tutorial to use workspaces instead of teams
djsauble Dec 2, 2024
fa39a2c
Unlink the CI/CD tutorial from the nav for now
djsauble Dec 2, 2024
46999fb
Merge branch 'main' into operator_tutorials
djsauble Dec 2, 2024
8a07aa3
Remove the CI/CD tutorial
djsauble Dec 2, 2024
63d930a
Put team content at the end in an optional section
djsauble Dec 2, 2024
691ff7f
Merge branch 'main' into operator_tutorials
daniel-prefect Dec 2, 2024
2a150dd
Merge branch 'main' into operator_tutorials
discdiver Dec 2, 2024
0c16de6
Apply suggestions from code review
daniel-prefect Dec 3, 2024
00b6b46
Remove virtual env instructions
djsauble Dec 3, 2024
f38f378
Provide a link with more details about the Teams feature
djsauble Dec 3, 2024
6c9fc2e
Add a link to CLI auth docs
djsauble Dec 3, 2024
ac8398e
Switch home page image to dark mode
djsauble Dec 3, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 14 additions & 3 deletions docs/mint.json
Original file line number Diff line number Diff line change
Expand Up @@ -62,9 +62,20 @@
{
"group": "Tutorials",
"pages": [
"v3/tutorials/schedule",
"v3/tutorials/pipelines",
"v3/tutorials/scraping"
{
"group": "For data engineers",
"pages": [
"v3/tutorials/schedule",
"v3/tutorials/pipelines",
"v3/tutorials/scraping"
]
},
{
"group": "For platform engineers",
"pages": [
"v3/tutorials/platform"
]
}
],
"version": "v3"
},
Expand Down
Binary file added docs/v3/img/ui/home-page.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
193 changes: 193 additions & 0 deletions docs/v3/tutorials/platform.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,193 @@
---
title: Set up a platform for data pipelines
description: Give your data team the ability to deploy and run workflows.
---

In this tutorial, you'll learn how to set up a platform for data pipelines with Prefect Cloud.
We'll show you how to create workspaces, work pools, workers, and then deploy flows to the infrastructure you've provisioned.

## Prerequisites

To complete this tutorial, you'll need:

- Git
- Docker
- Python 3.9 or newer
- A forever free [Prefect Cloud](https://app.prefect.cloud/) account

You'll also need to clone the Prefect demo repository and install the prefect library:

```bash
# Clone the demo repository
git clone https://github.com/PrefectHQ/demos.git
cd demos
```

```bash
# Install dependencies
pip install -r requirements.txt
```

```bash
# Show that the Prefect CLI is installed
prefect --version
```

## Create workspaces

We recommend one workspace per development environment.
In this tutorial, we'll create a production workspace and a staging workspace.
To create a workspace on Prefect Cloud, you'll need a forever free account.

1. Head to https://app.prefect.cloud/ and sign in or create an account.
1. If you haven't created a workspace yet, you'll be prompted to choose a name for your workspace.
Name your first workspace `production`.

Next, create a staging workspace:

1. Click the workspace switcher in the sidebar
1. Go to **Switch workspace** / **Create a new workspace**
1. Name this workspace `staging`.

You should now have two workspaces named `production` and `staging`.

## Create work pools

Work pools let you dynamically provision infrastructure for your flows.
First, create a work pool for your production workspace.

1. Switch to the `production` workspace using the workspace switcher in the sidebar.
1. Click **Work pools** in the sidebar.
1. Create a new **Docker** hybrid work pool, named `default-work-pool` and using the default configuration.

Next, repeat this steps and create a second work pool named `default-work-pool` in the `staging` workspace.
You should now have one work pool in each workspace.

## Start workers

Because you're using hybrid work pools, you need to start at least one worker for each work pool to process flow runs.

First, you need to authenticate to Prefect Cloud [using the CLI](/v3/manage/cloud/connect-to-cloud#log-into-prefect-cloud-from-a-terminal).

```bash
prefect cloud login
```
discdiver marked this conversation as resolved.
Show resolved Hide resolved

Once you've authenticated to Prefect Cloud, open two terminal windows and run two workers:

```bash Terminal 1
# Switch to the production workspace
prefect cloud workspace set --workspace "<account>/production"

# Start a worker for the production work pool
prefect worker start --pool default-work-pool
```

```bash Terminal 2
# Switch to the staging workspace
prefect cloud workspace set --workspace "<account>/staging"

# Start a worker for the staging work pool
prefect worker start --pool default-work-pool
```

<Note>
You can use `prefect cloud workspace ls` to see the fully qualified names of the workspaces in your account
</Note>

## Deploy and run flows

Run the following script from the demo repository to create flow runs in each workspace:

```bash
# Run flows in the production workspace
prefect cloud workspace --set "<account>/production"
python simulate_failures.py
```

```bash
# Run flows in the staging workspace
prefect cloud workspace --set "<account>/staging"
python simulate_failures.py --fail-at-run 3
```

Each script invocation will take approximately one minute to finish.
If you check the logs for the worker running in the staging workspace, you will see errors, this is expected.

## View flow run activity

Now that you've deployed and run your flows, you can view the run details in the UI.
If you go to the **Home** page for each workspace, you can see run activity.

![Home page](/v3/img/ui/home-page.png)

Here's what happened:

- The script created two deployments of the flow, one deployment per workspace.
- The script then generated multiple flow runs for each deployment.
- The activity graph on the **Home** page shows successful runs as green and failed runs as red.
- Notice that the flow runs in the `production` workspace succeeded, but some of the flow runs in the `staging` workspace failed.

This output is expected, you'll learn how to debug these failures in the next tutorial.

## Optional: Manage team access

If you're a Prefect Enterprise customer, you can use [Teams](/v3/manage/cloud/manage-users/manage-teams) to manage access to your workspaces and work pools.

<Expandable title="steps">
### Define your team structure

Most data platforms serve multiple teams.
In Prefect Enterprise, you can create teams to control access to your infrastructure.

#### Create teams

First, you need to create teams in your Prefect Cloud account.

1. Click the workspace switcher in the sidebar, and then click **Account settings**.
1. Click **Teams** in the sidebar.
1. Create two teams, named `everyone` and `developers`.
1. Optional: Choose the team you want to add members to, and then click **Add team members**.

#### Grant workspace access

You need to explicitly grant teams access to workspaces.

1. Select the `production` workspace using the workspace switcher.
1. From the workspace switcher, click **Invite and manage members**.
1. In the **Teams** section, add the two teams you created (`everyone` and `developers`) with the **Developer** role.

Repeat these steps for the `staging` workspace.
Your teams now have developer access to both workspaces.

### Grant team access to work pools

By default, your teams have access to all work pools in the workspace.
However, you can partition your infrastructure by granting teams access to only specific work pools.

1. Select the `staging` workspace using the workspace switcher.
1. Click **Work pools** in the sidebar.
1. Open the actions menu for the `default-work-pool` work pool, and then click **Manage Access**.
1. Click **Add Users, Service Accounts, & Teams**.
1. Click the **Actors** dropdown, and then select the `everyone` and `developers` teams.
1. Click the **Access Level** dropdown, and then select **Run**.
1. Click **Add**.

Repeat these steps to grant `developers` access to the `production` work pool.
You've now set up your infrastructure so that the `developers` team can run flows on the work pool in either `production` or `staging`, but the `everyone` team can run flows only on the work pool in `staging`.
</Expandable>

## Next steps

In this tutorial, you successfully set up a platform for data pipelines, and then deployed and ran flows on the infrastructure you provisioned.
If this doesn't perfectly match your use case, here are some variations you can explore:

- You can set up a self-managed instance with [Prefect server](/v3/manage/self-host).
- You can provision users with [SSO](/v3/manage/cloud/manage-users/configure-sso) or just use [service accounts](/v3/manage/cloud/manage-users/service-accounts).
- You can deploy flows with [Kubernetes](/v3/deploy/infrastructure-examples/kubernetes), [serverless compute](/v3/deploy/infrastructure-examples/serverless), or [Prefect Managed infrastructure](/v3/deploy/infrastructure-examples/managed).
- You can [write flows from scratch](/v3/develop/write-flows).
- You can [automate deployments with GitHub Actions](/v3/deploy/infrastructure-concepts/deploy-ci-cd).

<Tip>
Need help? [Book a meeting](https://calendly.com/prefect-experts/prefect-product-advocates?utm_campaign=prefect_docs_cloud&utm_content=prefect_docs&utm_medium=docs&utm_source=docs) with a Prefect Product Advocate to get your questions answered.
</Tip>
Loading