diff --git a/src/content/docs/pipelines/configuration/partition-filenames.mdx b/src/content/docs/pipelines/configuration/partition-filenames.mdx
new file mode 100644
index 00000000000000..3fb187f20cbba6
--- /dev/null
+++ b/src/content/docs/pipelines/configuration/partition-filenames.mdx
@@ -0,0 +1,30 @@
+---
+pcx_content_type: concept
+title: Partitions, Filenames and Filepaths
+sidebar:
+  order: 11
+
+---
+
+## Partitions
+Partitioning organizes data into directories based on the values of specific fields. Queries that filter on those fields scan only the matching directories, which reduces the amount of data read and speeds up queries. By default, Pipelines partitions data by event date and hour. This will be customizable in the future.
+
+For example, the output from a Pipeline in your R2 bucket might look like this:
+```sh
+- event_date=2024-09-06/hr=15/37db9289-15ba-4e8b-9231-538dc7c72c1e-15.json.gz
+- event_date=2024-09-06/hr=15/37db9289-15ba-4e8b-9231-538dc7c72c1e-15.json.gz
+```
+
+## Filepath
+Customizing the filepath lets you store data under a specific prefix inside your R2 bucket. The data remains partitioned by date under that prefix.
+
+To modify the prefix for a Pipeline using Wrangler:
+```sh
+wrangler pipelines update --filepath "test"
+```
+
+All output records generated by your Pipeline are stored under the prefix `test` and look like this:
+```sh
+- test/event_date=2024-09-06/hr=15/37db9289-15ba-4e8b-9231-538dc7c72c1e-15.json.gz
+- test/event_date=2024-09-06/hr=15/37db9289-15ba-4e8b-9231-538dc7c72c1e-15.json.gz
+```
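
Note on the layout documented in this patch: the `event_date=.../hr=...` directories are `key=value` path segments, so a consumer can recover the partition values from an object key alone. The following is a minimal sketch in Python, assuming only the key shape shown in the examples above; `parse_partitions` is a hypothetical helper written for illustration and is not part of Pipelines, Wrangler, or R2.

```python
from typing import Dict


def parse_partitions(object_key: str) -> Dict[str, str]:
    """Return the key=value partition segments found in an object key.

    Hypothetical helper: assumes partitions appear as `name=value`
    directory segments, as in the example keys in this patch.
    """
    partitions: Dict[str, str] = {}
    # The final path segment is the filename, so only inspect the directories.
    for segment in object_key.split("/")[:-1]:
        name, sep, value = segment.partition("=")
        # Segments without "=" (such as a custom prefix) are skipped.
        if sep and name and value:
            partitions[name] = value
    return partitions


key = "test/event_date=2024-09-06/hr=15/37db9289-15ba-4e8b-9231-538dc7c72c1e-15.json.gz"
print(parse_partitions(key))
```

Because segments without `=` are skipped, the same helper works whether or not a custom filepath prefix such as `test` is configured.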