-
Notifications
You must be signed in to change notification settings - Fork 26
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feature: add scripts to backup/restore a database (#2173)
* Add scripts to save/restore data from the emulator. Saving will copy the schema and data into a `backup` directory. Restore will load the schema and data from the `backup` directory respecting the table hierarchy. * test: add an integration test for save/restore * fix: according to review feedback * fix: break up large COPY operations * fix: use position in unique constraint --------- Co-authored-by: Knut Olav Løite <[email protected]>
- Loading branch information
Showing
5 changed files
with
535 additions
and
4 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
backup | ||
create_foreign_keys.sql | ||
drop_foreign_keys.sql |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
# Backup and restore a Spanner database | ||
|
||
This folder contains scripts to backup a Spanner database and restore it using | ||
local files. | ||
|
||
**NOTE:** While these scripts may work for production databases, it is | ||
currently intended for use with the Spanner emulator to achieve persistence, so | ||
*it has not been tested for production use*. | ||
|
||
## Try it out | ||
|
||
1. For usage with the emulator, please follow the | ||
[instructions](https://github.com/GoogleCloudPlatform/pgadapter/blob/-/docs/emulator.md#google-cloud-spanner-pgadapter---emulator) | ||
to setup PGAdapter and Spanner Emulator. | ||
|
||
2. Create a Spanner database, then create a schema and store data. | ||
|
||
3. Ensure the following environment variables are set: | ||
|
||
``` | ||
# For production usage, the full path of the Google Cloud credentials file that | ||
should be used for PGAdapter. | ||
GOOGLE_APPLICATION_CREDENTIALS=/local/path/to/credentials.json | ||
# Project, instance and database for Spanner. | ||
GOOGLE_CLOUD_PROJECT=my-project | ||
SPANNER_INSTANCE=my-instance | ||
SPANNER_DATABASE=my-database | ||
# Host and port the PGAdapter is deployed on. | ||
PGADAPTER_HOST=localhost | ||
PGADAPTER_PORT=5432 | ||
``` | ||
|
||
3. Save the schema and data of the Spanner database locally. | ||
|
||
```shell | ||
./backup-database.sh | ||
|
||
``` | ||
|
||
By default, the schema is saved in `./backup/schema.sql` and the data within | ||
individual files per table in the `./backup/data/` folder. | ||
|
||
4. If using the emulator, restart it then re-create the instance and database. | ||
|
||
5. Restore the schema and data back to Spanner from the locally stored sql files. | ||
|
||
```shell | ||
./restore-database.sh | ||
``` | ||
|
||
The restore database can be different from the source database. | ||
|
||
```shell | ||
spanner_restore_database=new-database ./restore-database.sh | ||
``` | ||
|
||
## Limitations | ||
|
||
- The scripts currently only support backing up and restoring data in the | ||
'public' schema. They will not backup and restore the schema or data under | ||
named schemas. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,80 @@ | ||
#!/bin/bash | ||
|
||
# This script exports the schema and data from a Spanner database to a local | ||
# file. It also exports the DDL statements that are needed to re-create the | ||
# Spanner database. The combination of the pg_dump files and the schema files | ||
# can be used to re-create the Spanner database. | ||
|
||
set -e | ||
|
||
if [ -z "$schema_file" ]; then schema_file="./backup/schema.sql"; fi; | ||
if [ -z "$data_dir" ]; then data_dir="./backup/data"; fi; | ||
|
||
echo | ||
echo "--- BACKING UP SPANNER DATABASE ---" | ||
echo "Schema file will be written to $schema_file." | ||
echo "Data will be written to $data_dir. Any existing data in that directory will be overwritten." | ||
|
||
echo | ||
read -p "Continue? [Yn]" -n 1 -r | ||
echo | ||
if [[ $REPLY =~ [Nn]$ ]] | ||
then | ||
[[ "$0" = "$BASH_SOURCE" ]] && exit 1 || return 1 | ||
fi | ||
|
||
# Create the output folder if it does not already exist. | ||
mkdir -p "$data_dir" | ||
|
||
# Export the current schema of the Spanner database. | ||
echo "Exporting schema of the Spanner database $SPANNER_DATABASE into $schema_file" | ||
psql -v ON_ERROR_STOP=1 \ | ||
--host "$PGADAPTER_HOST" \ | ||
--port "$PGADAPTER_PORT" \ | ||
--dbname "$SPANNER_DATABASE" \ | ||
-qAtX \ | ||
-c "show database ddl" > "$schema_file" | ||
|
||
echo "COPYING DATA FROM SPANNER TO LOCAL FILE" | ||
|
||
# Then copy all data from Spanner to the local file. | ||
# We will do that at a fixed timestamp to ensure that we copy a consistent set of data. | ||
READ_TIMESTAMP=$(psql -h "$PGADAPTER_HOST" \ | ||
-p "$PGADAPTER_PORT" \ | ||
-d "$SPANNER_DATABASE" \ | ||
-c "set time zone utc; select now()" -qAtX) | ||
READ_TIMESTAMP="${READ_TIMESTAMP/ /T}" | ||
READ_TIMESTAMP="${READ_TIMESTAMP/+00/Z}" | ||
echo "Reading data from Spanner using timestamp $READ_TIMESTAMP" | ||
echo "set spanner.read_only_staleness='read_timestamp $READ_TIMESTAMP'" | ||
echo "" | ||
echo " --- COPYING DATA WITH MAX_PARALLELISM $max_parallelism. --- " | ||
echo "Set the 'max_parallelism' variable to increase the number of COPY operations that will run in parallel." | ||
echo "" | ||
|
||
table_names=() | ||
|
||
psql -v ON_ERROR_STOP=1 -h "$PGADAPTER_HOST" -p "$PGADAPTER_PORT" -d "$SPANNER_DATABASE" -qAtX \ | ||
-c "select tablename from pg_catalog.pg_tables where schemaname='public';" \ | ||
| while read table_name ; do | ||
column_names=$(psql -h "$PGADAPTER_HOST" -p "$PGADAPTER_PORT" -d "$SPANNER_DATABASE" -qAtX \ | ||
-c "select array_to_string(array_agg(column_name), ',') as c from information_schema.columns where table_name='$table_name' and table_schema='public' and not is_generated='ALWAYS';") | ||
|
||
sql_file="$data_dir/$table_name.sql" | ||
|
||
echo "" | ||
echo "Copying data for $table_name" | ||
echo "SET SPANNER.AUTOCOMMIT_DML_MODE='PARTITIONED_NON_ATOMIC';" > $sql_file | ||
echo "COPY $table_name ($column_names) FROM stdin;" >> $sql_file | ||
# Set the read_only_staleness to use in PGOPTIONS. This will automatically be included by psql | ||
# when connecting to PGAdapter. | ||
PGOPTIONS="-c spanner.read_only_staleness='read_timestamp $READ_TIMESTAMP'" \ | ||
psql -v ON_ERROR_STOP=1 \ | ||
-h "$PGADAPTER_HOST" \ | ||
-p "$PGADAPTER_PORT" \ | ||
-d "$SPANNER_DATABASE" \ | ||
-c "copy $table_name ($column_names) to stdout" \ | ||
>> $sql_file | ||
done | ||
|
||
echo "Finished exporting database $SPANNER_DATABASE" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,187 @@ | ||
#!/bin/bash | ||
|
||
# This script imports the schema and data from a local file to a Spanner | ||
# database. | ||
|
||
set -e | ||
|
||
if [ -z "$spanner_restore_database" ]; then spanner_restore_database="$SPANNER_DATABASE"; fi; | ||
if [ -z "$schema_file" ]; then schema_file="./backup/schema.sql"; fi; | ||
if [ -z "$data_dir" ]; then data_dir="./backup/data"; fi; | ||
|
||
echo | ||
echo "--- RESTORING SPANNER DATABASE ---" | ||
echo "Schema file will be loaded from $schema_file." | ||
echo "Data will be loaded from $data_dir." | ||
|
||
echo | ||
read -p "Continue? [Yn]" -n 1 -r | ||
echo | ||
if [[ $REPLY =~ [Nn]$ ]] | ||
then | ||
[[ "$0" = "$BASH_SOURCE" ]] && exit 1 || return 1 | ||
fi | ||
|
||
# Check whether the Spanner database is empty. That is; no user-tables. | ||
table_count=$(psql -v ON_ERROR_STOP=1 -h "$PGADAPTER_HOST" -p "$PGADAPTER_PORT" \ | ||
-d "$spanner_restore_database" \ | ||
-qAtX \ | ||
-c "select count(1) from information_schema.tables where table_schema not in ('information_schema', 'spanner_sys', 'pg_catalog')") | ||
if [[ ! $table_count = "0" ]] | ||
then | ||
echo "Number of tables found in the restore database: $table_count" | ||
echo "The destination Spanner database is not empty." | ||
echo "This script only supports restoring into an empty database." | ||
[[ "$0" = "$BASH_SOURCE" ]] && exit 1 || return 1 | ||
fi | ||
|
||
# Create the schema on the Spanner database. We run the script between | ||
# START BATCH DDL and RUN BATCH to run the entire script as one batch. | ||
echo "Creating schema on Spanner database $spanner_restore_database" | ||
psql -v ON_ERROR_STOP=1 -a \ | ||
--host "$PGADAPTER_HOST" \ | ||
--port "$PGADAPTER_PORT" \ | ||
-d "$spanner_restore_database" \ | ||
-c "start batch ddl" \ | ||
-f "$schema_file" \ | ||
-c "run batch" | ||
|
||
# Generate a script to drop all foreign keys on Spanner. | ||
psql -v ON_ERROR_STOP=1 \ | ||
-h "$PGADAPTER_HOST" \ | ||
-p "$PGADAPTER_PORT" \ | ||
-d "$spanner_restore_database" \ | ||
-qAtX \ | ||
-c "select 'alter table \"' || tc.table_schema || '\".\"' || tc.table_name || '\"' | ||
|| ' drop constraint \"' || fk.constraint_name || '\";' | ||
from information_schema.referential_constraints fk | ||
inner join information_schema.table_constraints tc | ||
on fk.constraint_catalog=tc.constraint_catalog | ||
and fk.constraint_schema=tc.constraint_schema | ||
and fk.constraint_name=tc.constraint_name;" > drop_foreign_keys.sql | ||
|
||
# Generate a script that can be used to re-create all foreign key constraints | ||
# on the database. | ||
psql -v ON_ERROR_STOP=1 \ | ||
-h "$PGADAPTER_HOST" \ | ||
-p "$PGADAPTER_PORT" \ | ||
-d "$spanner_restore_database" \ | ||
-qAtX \ | ||
-c "with fk_constraint_columns as ( | ||
select constraint_catalog, constraint_schema, constraint_name, | ||
array_to_string(array_agg('\"' || kcu.column_name || '\"'), ',') as column_names | ||
from ( | ||
select constraint_catalog, constraint_schema, constraint_name, column_name | ||
from information_schema.key_column_usage | ||
order by ordinal_position | ||
) kcu | ||
group by constraint_catalog, constraint_schema, constraint_name | ||
), | ||
pk_constraint_columns as ( | ||
select constraint_catalog, constraint_schema, constraint_name, | ||
array_to_string(array_agg('\"' || kcu.column_name || '\"'), ',') as column_names | ||
from ( | ||
select constraint_catalog, constraint_schema, constraint_name, column_name | ||
from information_schema.key_column_usage | ||
order by position_in_unique_constraint | ||
) kcu | ||
group by constraint_catalog, constraint_schema, constraint_name | ||
) | ||
select 'alter table \"' || tc.table_schema || '\".\"' || tc.table_name || '\"' | ||
|| ' add constraint \"' || fk.constraint_name | ||
|| '\" foreign key (' || fk_cc.column_names || ') references ' | ||
|| '\"' || pk_tc.table_schema || '\".\"' || pk_tc.table_name || '\"' | ||
|| ' (' || pk_cc.column_names || ');' | ||
from information_schema.referential_constraints fk | ||
inner join information_schema.table_constraints tc | ||
on fk.constraint_catalog=tc.constraint_catalog | ||
and fk.constraint_schema=tc.constraint_schema | ||
and fk.constraint_name=tc.constraint_name | ||
inner join fk_constraint_columns fk_cc | ||
on fk.constraint_catalog=fk_cc.constraint_catalog | ||
and fk.constraint_schema=fk_cc.constraint_schema | ||
and fk.constraint_name=fk_cc.constraint_name | ||
inner join pk_constraint_columns pk_cc | ||
on fk.unique_constraint_catalog=pk_cc.constraint_catalog | ||
and fk.unique_constraint_schema=pk_cc.constraint_schema | ||
and fk.unique_constraint_name=pk_cc.constraint_name | ||
inner join information_schema.table_constraints pk_tc | ||
on fk.unique_constraint_catalog=pk_tc.constraint_catalog | ||
and fk.unique_constraint_schema=pk_tc.constraint_schema | ||
and fk.unique_constraint_name=pk_tc.constraint_name;" > create_foreign_keys.sql | ||
|
||
# Drop all foreign keys in the Spanner database. | ||
echo "Dropping all foreign keys in the Spanner database before importing data" | ||
psql -v ON_ERROR_STOP=1 \ | ||
--host "$PGADAPTER_HOST" \ | ||
--port "$PGADAPTER_PORT" \ | ||
-d "$spanner_restore_database" \ | ||
-c "start batch ddl" \ | ||
-f "drop_foreign_keys.sql" \ | ||
-c "run batch" | ||
|
||
# We need to copy the data in hierarchical order: | ||
# 1. First all tables without a parent. | ||
# 2. Then all tables at hierarchical level 1, etc. | ||
|
||
# Get all tables without a parent. | ||
table_names=$(psql -v ON_ERROR_STOP=1 \ | ||
--host "$PGADAPTER_HOST" \ | ||
--port "$PGADAPTER_PORT" \ | ||
-d "$spanner_restore_database" \ | ||
-qAtX \ | ||
-c "select table_name from information_schema.tables | ||
where table_schema = 'public' | ||
and parent_table_name is NULL | ||
and table_type='BASE TABLE'") | ||
|
||
while : ; do | ||
delim="" | ||
parent_table_str="" | ||
# Copy data for each table at this level. | ||
for table_name in $table_names ; do | ||
sql_file="$data_dir/$table_name.sql" | ||
|
||
echo "" | ||
echo "Copying data for $table_name" | ||
psql -v ON_ERROR_STOP=1 \ | ||
--host "$PGADAPTER_HOST" \ | ||
--port "$PGADAPTER_PORT" \ | ||
-d "$spanner_restore_database" \ | ||
< $sql_file | ||
|
||
parent_table_str="$parent_table_str$delim'$table_name'" | ||
delim=", " | ||
done | ||
|
||
table_names="" | ||
if [[ ! $parent_table_str = "" ]] ; then | ||
# Get the tables that have the current set of tables as a parent. | ||
table_names=$(psql -v ON_ERROR_STOP=1 \ | ||
--host "$PGADAPTER_HOST" \ | ||
--port "$PGADAPTER_PORT" \ | ||
-d "$spanner_restore_database" \ | ||
-qAtX \ | ||
-c "select table_name from information_schema.tables | ||
where table_schema = 'public' | ||
and parent_table_name in ($parent_table_str) | ||
and table_type='BASE TABLE'") | ||
fi | ||
|
||
# No further child tables left to copy. | ||
if [[ $table_names = "" ]] ; then | ||
break | ||
fi | ||
done | ||
|
||
# Re-create all foreign keys in the Spanner database. | ||
echo "Re-creating all foreign keys in the Spanner database" | ||
psql -v ON_ERROR_STOP=1 -a \ | ||
--host "$PGADAPTER_HOST" \ | ||
--port "$PGADAPTER_PORT" \ | ||
-d "$spanner_restore_database" \ | ||
-c "start batch ddl" \ | ||
-f "create_foreign_keys.sql" \ | ||
-c "run batch" | ||
|
||
echo "Finished restoring database $spanner_restore_database" |
Oops, something went wrong.