feature: add scripts to backup/restore a database (#2173)
* Add scripts to save/restore data from the emulator.

Saving will copy the schema and data into a `backup` directory. Restore
will load the schema and data from the `backup` directory, respecting
the table hierarchy.

* test: add an integration test for save/restore

* fix: according to review feedback

* fix: break up large COPY operations

* fix: use position in unique constraint

---------

Co-authored-by: Knut Olav Løite <[email protected]>
skuruppu and olavloite authored Aug 16, 2024
1 parent 384455b commit 2da2cc9
Showing 5 changed files with 535 additions and 4 deletions.
3 changes: 3 additions & 0 deletions local-backup-restore/.gitignore
@@ -0,0 +1,3 @@
backup
create_foreign_keys.sql
drop_foreign_keys.sql
63 changes: 63 additions & 0 deletions local-backup-restore/README.md
@@ -0,0 +1,63 @@
# Backup and restore a Spanner database

This folder contains scripts to back up a Spanner database and restore it using
local files.

**NOTE:** While these scripts may work for production databases, they are
currently intended for use with the Spanner emulator to achieve persistence, so
*they have not been tested for production use*.

## Try it out

1. For usage with the emulator, please follow the
[instructions](https://github.com/GoogleCloudPlatform/pgadapter/blob/-/docs/emulator.md#google-cloud-spanner-pgadapter---emulator)
to set up PGAdapter and the Spanner emulator, for example as sketched below.
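A minimal sketch, assuming the combined PGAdapter + emulator Docker image
described in that guide is still current:

```shell
# Runs PGAdapter with an embedded Spanner emulator; PGAdapter listens on port 5432.
docker run -d -p 5432:5432 gcr.io/cloud-spanner-pg-adapter/pgadapter-emulator
```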

2. Create a Spanner database, then create a schema and store data, for example:
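A purely illustrative example (the table and column names are made up), assuming
the connection variables from the next step are already exported:

```shell
# Create a small table and insert one row through PGAdapter.
psql -h "$PGADAPTER_HOST" -p "$PGADAPTER_PORT" -d "$SPANNER_DATABASE" \
  -c "create table singers (id bigint not null primary key, name varchar not null)" \
  -c "insert into singers (id, name) values (1, 'Alice')"
```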

3. Ensure the following environment variables are set:

```
# For production usage, the full path of the Google Cloud credentials file that
# should be used for PGAdapter.
GOOGLE_APPLICATION_CREDENTIALS=/local/path/to/credentials.json
# Project, instance and database for Spanner.
GOOGLE_CLOUD_PROJECT=my-project
SPANNER_INSTANCE=my-instance
SPANNER_DATABASE=my-database
# Host and port the PGAdapter is deployed on.
PGADAPTER_HOST=localhost
PGADAPTER_PORT=5432
```
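The scripts read these variables directly from the environment, so export them
in the shell that runs the scripts (or set them inline on the command line). For
a local emulator setup this could look like the following (values are placeholders):

```shell
export GOOGLE_CLOUD_PROJECT=my-project
export SPANNER_INSTANCE=my-instance
export SPANNER_DATABASE=my-database
export PGADAPTER_HOST=localhost
export PGADAPTER_PORT=5432
```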

4. Save the schema and data of the Spanner database locally.

```shell
./backup-database.sh
```

By default, the schema is saved to `./backup/schema.sql` and the data is saved
to one file per table in the `./backup/data/` folder.
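The backup script also accepts overrides for these locations through the
`schema_file` and `data_dir` variables; for example (the paths are placeholders):

```shell
schema_file=./my-backup/schema.sql data_dir=./my-backup/data ./backup-database.sh
```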

5. If using the emulator, restart it, then re-create the instance and database, for example with `gcloud` as sketched below.
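A sketch, assuming the emulator's gRPC endpoint is exposed on localhost:9010;
adjust the endpoint, instance configuration, and dialect to your own setup:

```shell
# Point gcloud at the local emulator instead of the real Spanner service.
gcloud config set auth/disable_credentials true
gcloud config set project "$GOOGLE_CLOUD_PROJECT"
gcloud config set api_endpoint_overrides/spanner http://localhost:9010/
# Re-create the instance and the PostgreSQL-dialect database.
gcloud spanner instances create "$SPANNER_INSTANCE" \
  --config=emulator-config --description="Emulator instance" --nodes=1
gcloud spanner databases create "$SPANNER_DATABASE" \
  --instance="$SPANNER_INSTANCE" --database-dialect=POSTGRESQL
```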

6. Restore the schema and data back to Spanner from the locally stored SQL files.

```shell
./restore-database.sh
```

The database that is restored into can be different from the source database.
Set the `spanner_restore_database` variable to restore into a different database:

```shell
spanner_restore_database=new-database ./restore-database.sh
```
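The restore script reads the same `schema_file` and `data_dir` variables as the
backup script, so a backup written to a non-default location can be restored
with, for example:

```shell
schema_file=./my-backup/schema.sql data_dir=./my-backup/data ./restore-database.sh
```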

## Limitations

- The scripts currently only support backing up and restoring data in the
`public` schema. They will not back up or restore the schema or data under
named schemas.
80 changes: 80 additions & 0 deletions local-backup-restore/backup-database.sh
@@ -0,0 +1,80 @@
#!/bin/bash

# This script exports the schema and data from a Spanner database to local
# files. It writes the DDL statements that are needed to re-create the schema
# of the Spanner database to a schema file, and the data of each table to a
# separate data file. The combination of the schema file and the data files
# can be used to re-create the Spanner database.

set -e

if [ -z "$schema_file" ]; then schema_file="./backup/schema.sql"; fi;
if [ -z "$data_dir" ]; then data_dir="./backup/data"; fi;

echo
echo "--- BACKING UP SPANNER DATABASE ---"
echo "Schema file will be written to $schema_file."
echo "Data will be written to $data_dir. Any existing data in that directory will be overwritten."

echo
read -p "Continue? [Yn]" -n 1 -r
echo
if [[ $REPLY =~ [Nn]$ ]]
then
[[ "$0" = "$BASH_SOURCE" ]] && exit 1 || return 1
fi

# Create the output folder if it does not already exist.
mkdir -p "$data_dir"

# Export the current schema of the Spanner database.
echo "Exporting schema of the Spanner database $SPANNER_DATABASE into $schema_file"
psql -v ON_ERROR_STOP=1 \
--host "$PGADAPTER_HOST" \
--port "$PGADAPTER_PORT" \
--dbname "$SPANNER_DATABASE" \
-qAtX \
-c "show database ddl" > "$schema_file"

echo "COPYING DATA FROM SPANNER TO LOCAL FILE"

# Then copy all data from Spanner to local files, one file per table.
# We will do that at a fixed timestamp to ensure that we copy a consistent set of data.
READ_TIMESTAMP=$(psql -h "$PGADAPTER_HOST" \
-p "$PGADAPTER_PORT" \
-d "$SPANNER_DATABASE" \
-c "set time zone utc; select now()" -qAtX)
READ_TIMESTAMP="${READ_TIMESTAMP/ /T}"
READ_TIMESTAMP="${READ_TIMESTAMP/+00/Z}"
echo "Reading data from Spanner using timestamp $READ_TIMESTAMP"
echo "set spanner.read_only_staleness='read_timestamp $READ_TIMESTAMP'"
echo ""
echo " --- COPYING DATA WITH MAX_PARALLELISM $max_parallelism. --- "
echo "Set the 'max_parallelism' variable to increase the number of COPY operations that will run in parallel."
echo ""

table_names=()

psql -v ON_ERROR_STOP=1 -h "$PGADAPTER_HOST" -p "$PGADAPTER_PORT" -d "$SPANNER_DATABASE" -qAtX \
-c "select tablename from pg_catalog.pg_tables where schemaname='public';" \
| while read table_name ; do
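# Get the list of columns to copy for this table. Generated columns are
# excluded, as their values cannot be written back directly during a restore.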
column_names=$(psql -h "$PGADAPTER_HOST" -p "$PGADAPTER_PORT" -d "$SPANNER_DATABASE" -qAtX \
-c "select array_to_string(array_agg(column_name), ',') as c from information_schema.columns where table_name='$table_name' and table_schema='public' and not is_generated='ALWAYS';")

sql_file="$data_dir/$table_name.sql"

echo ""
echo "Copying data for $table_name"
echo "SET SPANNER.AUTOCOMMIT_DML_MODE='PARTITIONED_NON_ATOMIC';" > $sql_file
echo "COPY $table_name ($column_names) FROM stdin;" >> $sql_file
# Set the read_only_staleness to use in PGOPTIONS. This will automatically be included by psql
# when connecting to PGAdapter.
PGOPTIONS="-c spanner.read_only_staleness='read_timestamp $READ_TIMESTAMP'" \
psql -v ON_ERROR_STOP=1 \
-h "$PGADAPTER_HOST" \
-p "$PGADAPTER_PORT" \
-d "$SPANNER_DATABASE" \
-c "copy $table_name ($column_names) to stdout" \
>> "$sql_file"
done

echo "Finished exporting database $SPANNER_DATABASE"
187 changes: 187 additions & 0 deletions local-backup-restore/restore-database.sh
@@ -0,0 +1,187 @@
#!/bin/bash

# This script imports the schema and data from local files into a Spanner
# database.

set -e

if [ -z "$spanner_restore_database" ]; then spanner_restore_database="$SPANNER_DATABASE"; fi;
if [ -z "$schema_file" ]; then schema_file="./backup/schema.sql"; fi;
if [ -z "$data_dir" ]; then data_dir="./backup/data"; fi;

echo
echo "--- RESTORING SPANNER DATABASE ---"
echo "Schema file will be loaded from $schema_file."
echo "Data will be loaded from $data_dir."

echo
read -p "Continue? [Yn]" -n 1 -r
echo
if [[ $REPLY =~ [Nn]$ ]]
then
[[ "$0" = "$BASH_SOURCE" ]] && exit 1 || return 1
fi

# Check whether the Spanner database is empty, that is, it contains no user tables.
table_count=$(psql -v ON_ERROR_STOP=1 -h "$PGADAPTER_HOST" -p "$PGADAPTER_PORT" \
-d "$spanner_restore_database" \
-qAtX \
-c "select count(1) from information_schema.tables where table_schema not in ('information_schema', 'spanner_sys', 'pg_catalog')")
if [[ ! $table_count = "0" ]]
then
echo "Number of tables found in the restore database: $table_count"
echo "The destination Spanner database is not empty."
echo "This script only supports restoring into an empty database."
[[ "$0" = "$BASH_SOURCE" ]] && exit 1 || return 1
fi

# Create the schema on the Spanner database. We run the script between
# START BATCH DDL and RUN BATCH to run the entire script as one batch.
echo "Creating schema on Spanner database $spanner_restore_database"
psql -v ON_ERROR_STOP=1 -a \
--host "$PGADAPTER_HOST" \
--port "$PGADAPTER_PORT" \
-d "$spanner_restore_database" \
-c "start batch ddl" \
-f "$schema_file" \
-c "run batch"

# Generate a script to drop all foreign keys on Spanner.
psql -v ON_ERROR_STOP=1 \
-h "$PGADAPTER_HOST" \
-p "$PGADAPTER_PORT" \
-d "$spanner_restore_database" \
-qAtX \
-c "select 'alter table \"' || tc.table_schema || '\".\"' || tc.table_name || '\"'
|| ' drop constraint \"' || fk.constraint_name || '\";'
from information_schema.referential_constraints fk
inner join information_schema.table_constraints tc
on fk.constraint_catalog=tc.constraint_catalog
and fk.constraint_schema=tc.constraint_schema
and fk.constraint_name=tc.constraint_name;" > drop_foreign_keys.sql

# Generate a script that can be used to re-create all foreign key constraints
# on the database.
psql -v ON_ERROR_STOP=1 \
-h "$PGADAPTER_HOST" \
-p "$PGADAPTER_PORT" \
-d "$spanner_restore_database" \
-qAtX \
-c "with fk_constraint_columns as (
select constraint_catalog, constraint_schema, constraint_name,
array_to_string(array_agg('\"' || kcu.column_name || '\"'), ',') as column_names
from (
select constraint_catalog, constraint_schema, constraint_name, column_name
from information_schema.key_column_usage
order by ordinal_position
) kcu
group by constraint_catalog, constraint_schema, constraint_name
),
pk_constraint_columns as (
select constraint_catalog, constraint_schema, constraint_name,
array_to_string(array_agg('\"' || kcu.column_name || '\"'), ',') as column_names
from (
select constraint_catalog, constraint_schema, constraint_name, column_name
from information_schema.key_column_usage
order by position_in_unique_constraint
) kcu
group by constraint_catalog, constraint_schema, constraint_name
)
select 'alter table \"' || tc.table_schema || '\".\"' || tc.table_name || '\"'
|| ' add constraint \"' || fk.constraint_name
|| '\" foreign key (' || fk_cc.column_names || ') references '
|| '\"' || pk_tc.table_schema || '\".\"' || pk_tc.table_name || '\"'
|| ' (' || pk_cc.column_names || ');'
from information_schema.referential_constraints fk
inner join information_schema.table_constraints tc
on fk.constraint_catalog=tc.constraint_catalog
and fk.constraint_schema=tc.constraint_schema
and fk.constraint_name=tc.constraint_name
inner join fk_constraint_columns fk_cc
on fk.constraint_catalog=fk_cc.constraint_catalog
and fk.constraint_schema=fk_cc.constraint_schema
and fk.constraint_name=fk_cc.constraint_name
inner join pk_constraint_columns pk_cc
on fk.unique_constraint_catalog=pk_cc.constraint_catalog
and fk.unique_constraint_schema=pk_cc.constraint_schema
and fk.unique_constraint_name=pk_cc.constraint_name
inner join information_schema.table_constraints pk_tc
on fk.unique_constraint_catalog=pk_tc.constraint_catalog
and fk.unique_constraint_schema=pk_tc.constraint_schema
and fk.unique_constraint_name=pk_tc.constraint_name;" > create_foreign_keys.sql

# Drop all foreign keys in the Spanner database.
echo "Dropping all foreign keys in the Spanner database before importing data"
psql -v ON_ERROR_STOP=1 \
--host "$PGADAPTER_HOST" \
--port "$PGADAPTER_PORT" \
-d "$spanner_restore_database" \
-c "start batch ddl" \
-f "drop_foreign_keys.sql" \
-c "run batch"

# We need to copy the data in hierarchical order:
# 1. First all tables without a parent.
# 2. Then all tables at hierarchical level 1, etc.
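# For example (hypothetical interleaved hierarchy): for singers -> albums -> songs,
# the 'singers' rows are copied first, then 'albums', then 'songs', so that parent
# rows always exist before their child rows are inserted.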

# Get all tables without a parent.
table_names=$(psql -v ON_ERROR_STOP=1 \
--host "$PGADAPTER_HOST" \
--port "$PGADAPTER_PORT" \
-d "$spanner_restore_database" \
-qAtX \
-c "select table_name from information_schema.tables
where table_schema = 'public'
and parent_table_name is NULL
and table_type='BASE TABLE'")

while : ; do
delim=""
parent_table_str=""
# Copy data for each table at this level.
for table_name in $table_names ; do
sql_file="$data_dir/$table_name.sql"

echo ""
echo "Copying data for $table_name"
psql -v ON_ERROR_STOP=1 \
--host "$PGADAPTER_HOST" \
--port "$PGADAPTER_PORT" \
-d "$spanner_restore_database" \
< "$sql_file"

parent_table_str="$parent_table_str$delim'$table_name'"
delim=", "
done

table_names=""
if [[ ! $parent_table_str = "" ]] ; then
# Get the tables that have the current set of tables as a parent.
table_names=$(psql -v ON_ERROR_STOP=1 \
--host "$PGADAPTER_HOST" \
--port "$PGADAPTER_PORT" \
-d "$spanner_restore_database" \
-qAtX \
-c "select table_name from information_schema.tables
where table_schema = 'public'
and parent_table_name in ($parent_table_str)
and table_type='BASE TABLE'")
fi

# No further child tables left to copy.
if [[ $table_names = "" ]] ; then
break
fi
done

# Re-create all foreign keys in the Spanner database.
echo "Re-creating all foreign keys in the Spanner database"
psql -v ON_ERROR_STOP=1 -a \
--host "$PGADAPTER_HOST" \
--port "$PGADAPTER_PORT" \
-d "$spanner_restore_database" \
-c "start batch ddl" \
-f "create_foreign_keys.sql" \
-c "run batch"

echo "Finished restoring database $spanner_restore_database"