From dadf14cf66b02f7a2aaf6e23b92223de9cd0891f Mon Sep 17 00:00:00 2001 From: Patrik Nordwall Date: Fri, 9 Feb 2024 12:50:00 +0100 Subject: [PATCH] doc: Data partitions (#519) --- docs/src/main/paradox/data-partition.md | 99 +++++++++++ .../main/paradox/images/data-partition.drawio | 163 ++++++++++++++++++ .../main/paradox/images/data-partition.svg | 3 + docs/src/main/paradox/index.md | 1 + 4 files changed, 266 insertions(+) create mode 100644 docs/src/main/paradox/data-partition.md create mode 100644 docs/src/main/paradox/images/data-partition.drawio create mode 100644 docs/src/main/paradox/images/data-partition.svg diff --git a/docs/src/main/paradox/data-partition.md b/docs/src/main/paradox/data-partition.md new file mode 100644 index 00000000..abec2991 --- /dev/null +++ b/docs/src/main/paradox/data-partition.md @@ -0,0 +1,99 @@ +# Data partitioning + +Using a single non-distributed database can become a bottleneck for applications that have high throughput +requirements. To be able to spread the load over more than one database the event journal, snapshot store and +durable state can be split up over multiple tables and physical backend databases. + +The data is partitioned by the slices that are used for @ref[`eventsBySlices`](query.md#eventsbyslices) and +@extref:[Projections](akka-projection:r2dbc.html). You can configure how many data partitions that are needed. +A data partition corresponds to a separate database table. For example, 4 data partitions means that slice range +(0 to 255) maps to data partition 0, (256 to 511) to data partition 1, (512 to 767) to data partition 2, +and (768 to 1023) to data partition 3. + +Number of data partitions must be between 1 and 1024 and a whole number divisor of 1024 (number of slices), e.g. +2, 4, 8, 16. The tables will have the data partition as suffix, e.g. event_journal_0, event_journal_1. + +Those tables can be located in physically separate databases. Number of databases must be a whole number divisor +of number of partitions, and less than or equal to number of partitions. For example, 8 data partitions and 2 databases +means that there will be a total of 8 tables in 2 databases, i.e. 4 tables in each database. + +## Example + +If we configure 8 data partitions and 4 databases it will look like this: + +![Diagram of data partitions](images/data-partition.svg) + +Based on the persistence id an individual entity will map to a specific slice and the entity will read and write to +the table that covers corresponding slice range. + +If we have 16 projection instances each projection instance will consume events from 64 slices. The query to retrieve +events from these slices always map to one single table that covers the slice range. There can be more projection +instances than number of data partitions (tables), but not less. Less projection instances than number of data +partitions would result in queries that would span over more than one table, which would be inefficient and therefore +not allowed. + +Each database may host several of the data partition tables. Each database requires a separate connection factory +and connection pool. + +## Configuration + +The data partitions are configured with: + +@@snip [reference.conf](/core/src/main/resources/reference.conf) { #data-partition-settings } + +When using more than one database you must define corresponding number of connection factories. The connection-factory +setting will have the data partition range as suffix, e.g. with 8 data partitions and +4 databases the connection factory settings would be: + +```hcon +akka.persistence.r2dbc { + data-partition { + number-of-partitions = 8 + number-of-databases = 4 + } + + connection-factory = ${akka.persistence.r2dbc.postgres} + connection-factory-0-1 = ${akka.persistence.r2dbc.connection-factory} + connection-factory-0-1.host = ${?DB_HOST_0} + connection-factory-2-3 = ${akka.persistence.r2dbc.connection-factory} + connection-factory-2-3.host = ${?DB_HOST_1} + connection-factory-4-5 = ${akka.persistence.r2dbc.connection-factory} + connection-factory-4-5.host = ${?DB_HOST_2} + connection-factory-6-7 = ${akka.persistence.r2dbc.connection-factory} + connection-factory-6-7.host = ${?DB_HOST_3} +} +``` + +## Schema + +Each data partition corresponds to a table. You can copy the DDL statements for the tables and indexes from +@ref[Creating the schema](getting-started.md#creating-the-schema) but change the table and index names to include +data partition suffix. For example `event_journal_0`, `event_journal_0_slice_idx`, `event_journal_1`, `event_journal_1_slice_idx`. +Note that the index must also reference the parent table with same data partition suffix. + +## Changing data partitions + +The configuration of data partitions and databases **must not** be changed in a rolling update, since the data must +be moved between the tables and databases if the configuration is changed. The application must be stopped while moving +the data. + +The data can be copied between tables with SQL such as: +```sql +CREATE TABLE event_journal_0 AS (SELECT * FROM event_journal WHERE slice BETWEEN 0 AND 127); +CREATE TABLE event_journal_1 AS (SELECT * FROM event_journal WHERE slice BETWEEN 128 AND 255); +``` + +Remember to also @ref[create the slice index](#schema). + +Alternatively, @ref[create the tables](#schema) first and insert the data with SQL such as: +```sql +INSERT INTO event_journal_0 SELECT * FROM event_journal WHERE slice BETWEEN 0 AND 127; +INSERT INTO event_journal_1 SELECT * FROM event_journal WHERE slice BETWEEN 128 AND 255; +``` + +There are many other ways to move in an efficient way depending on what database you use, such as backups, sqldump and +other export/import tools. + +The number of tables and their names don't change by the number of configured databases. If you think you will +need more than one database in the future it can be good to start with for example 8 data partitions (tables) +in a single database. That will make it easier to move the full tables to the additional databases later. diff --git a/docs/src/main/paradox/images/data-partition.drawio b/docs/src/main/paradox/images/data-partition.drawio new file mode 100644 index 00000000..0aff133d --- /dev/null +++ b/docs/src/main/paradox/images/data-partition.drawio @@ -0,0 +1,163 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/src/main/paradox/images/data-partition.svg b/docs/src/main/paradox/images/data-partition.svg new file mode 100644 index 00000000..bf32bdf2 --- /dev/null +++ b/docs/src/main/paradox/images/data-partition.svg @@ -0,0 +1,3 @@ + + +
Entity A-1
slice 1
event_journal_0
slice range 0-127
event_journal_1
slice range 128-255
event_journal_2
slice range 256-383
event_journal_3
slice range 384-511
event_journal_4
slice range 512-639
event_journal_5
slice range 640-767
event_journal_6
slice range 768-895
event_journal_7
slice range 896-1023
Entity A-2
slice 7
Projection
slice range 0-63
Projection
slice range 64-127
Projection
slice range 128-191
...
...
...
Projection
slice range 832-895
Projection
slice range 896-959
Projection
slice range 960-1023
connection-factory-2-3
slice range 256-511
connection-factory-0-1
slice range 0-255
connection-factory-4-5
slice range 512-767
connection-factory-6-7
slice range 768-1023
Entity A-3
slice 132
Entity A-4
slice 500
Entity A-5
slice 710
\ No newline at end of file diff --git a/docs/src/main/paradox/index.md b/docs/src/main/paradox/index.md index d345bc02..76aa2128 100644 --- a/docs/src/main/paradox/index.md +++ b/docs/src/main/paradox/index.md @@ -15,6 +15,7 @@ The Akka Persistence R2DBC plugin allows for using SQL database with R2DBC as a * [PostgreSQL JSON](postgres_json.md) * [Projection](projection.md) * [Configuration](config.md) +* [Data partitioning](data-partition.md) * [Cleanup tool](cleanup.md) * [Migration tool](migration.md) * [Migration Guide](migration-guide.md)