Added more information about partitions
maheshwarip committed Oct 17, 2024
1 parent 67c4f17 commit 22c29e2
Showing 1 changed file with 30 additions and 0 deletions.
30 changes: 30 additions & 0 deletions src/content/docs/pipelines/configuration/partition-filenames.mdx
@@ -0,0 +1,30 @@
---
pcx_content_type: concept
title: Partitions, Filenames and Filepaths
sidebar:
  order: 11

---

## Partitions
Partitioning organizes data into directories based on specific fields. Queries that filter on those fields can skip directories that cannot match, which reduces the amount of data scanned and speeds up reads. By default, Pipelines partitions data by event date, with a subdirectory for each hour. This will be customizable in the future.

For example, the output from a Pipeline in your R2 bucket might look like this:
```sh
- event_date=2024-09-06/hr=15/37db9289-15ba-4e8b-9231-538dc7c72c1e-15.json.gz
- event_date=2024-09-06/hr=15/37db9289-15ba-4e8b-9231-538dc7c72c1e-15.json.gz
```
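
If you need to select files by partition in your own tooling, the `name=value` directory segments can be read straight from the object key. The following is a minimal Python sketch, not part of Pipelines itself: the function names `parse_partition_key` and `keys_for_date` are hypothetical, and it assumes that keys always contain `event_date` and `hr` segments in the format shown above.

```python
from datetime import date


def parse_partition_key(key: str) -> dict:
    """Split an object key into its partition fields and filename.

    Assumes keys look like the example output above:
    event_date=YYYY-MM-DD/hr=HH/<filename>
    """
    *dirs, filename = key.split("/")
    fields = {}
    for part in dirs:
        # Only segments of the form name=value are partition fields;
        # any other directory segment is ignored.
        if "=" in part:
            name, value = part.split("=", 1)
            fields[name] = value
    return {
        "event_date": date.fromisoformat(fields["event_date"]),
        "hr": int(fields["hr"]),
        "filename": filename,
    }


def keys_for_date(keys: list[str], day: date) -> list[str]:
    """Keep only the keys whose event_date partition matches `day`."""
    return [k for k in keys if parse_partition_key(k)["event_date"] == day]
```

For example, calling `parse_partition_key` on the first key above yields the date 2024-09-06, the hour 15, and the `.json.gz` filename as separate values.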

## Filepath
Customizing the filepath lets you store data under a specific prefix inside your R2 bucket. Under that prefix, the data remains partitioned by event date.

To modify the prefix for a Pipeline using Wrangler:
```sh
wrangler pipelines update <pipeline-name> --filepath "test"
```

All output records generated by your Pipeline are then stored under the prefix `test`, and look like this:
```sh
- test/event_date=2024-09-06/hr=15/37db9289-15ba-4e8b-9231-538dc7c72c1e-15.json.gz
- test/event_date=2024-09-06/hr=15/37db9289-15ba-4e8b-9231-538dc7c72c1e-15.json.gz
```
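
When reading the data back, it can help to compute the directory for a given hour so that a listing only touches one partition. The following is a minimal Python sketch under stated assumptions: `partition_prefix` is a hypothetical helper, the hour is assumed to be zero-padded to two digits, and the timestamp is assumed to be in the same time zone that Pipelines uses when writing the partition.

```python
from datetime import datetime


def partition_prefix(event_time: datetime, filepath: str = "") -> str:
    """Build the directory prefix for one hourly partition.

    `filepath` is the optional prefix set with `--filepath`; leave it
    empty for a Pipeline that writes to the bucket root.
    """
    # Assumption: two-digit, zero-padded hour, matching "hr=15" above.
    partition = f"event_date={event_time:%Y-%m-%d}/hr={event_time:%H}/"
    base = filepath.strip("/")
    return f"{base}/{partition}" if base else partition
```

With the `test` prefix from the command above, `partition_prefix(datetime(2024, 9, 6, 15), "test")` returns `test/event_date=2024-09-06/hr=15/`.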
