- PostGIS
- PostgreSQL (Dockerized DB Instance)
- PostgreSQL (Simple ETL)
- MySQL
- SQLite
- SQL-Server (MSSQL)
- Oracle
# create directories to hold persistent DB data
mkdir -p $(pwd)/volumes/db/postgis_db_data
# create directories to hold downloaded data
mkdir -p $(pwd)/volumes/shapefiles
mkdir -p $(pwd)/volumes/zip
mkdir -p $(pwd)/volumes/sql
# create docker network
docker network create mynetwork
# run postgres docker container (with postgis enabled)
docker run -d \
-v $(pwd)/volumes/db/postgis_db_data:/var/lib/postgresql/data \
-e POSTGRES_PASSWORD=secret \
--network mynetwork \
--name gis_db \
-p 5432:5432 postgis/postgis
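Once the container is up, a quick sanity check confirms PostGIS is actually enabled. This is just a sketch; it assumes the postgis/postgis image has enabled the extension in the default postgres database on first init (which it does via its init scripts).
# sanity check: confirm the container is running and PostGIS is enabled
docker exec gis_db psql -U postgres -d postgres -c 'SELECT PostGIS_Full_Version();'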
The source for the GIS data to be loaded into PostGIS (for this guide at least) comes in the form of shapefiles.
You should have little trouble finding plenty of free shapefile data sets you can use as source data.
For this guide, you can pull down data for multiple states from this list of OpenStreetMap data for the U.S. There are multiple formats available, but the process described below uses the compressed .shp files.
# download a target data set
curl http://download.geofabrik.de/north-america/us/florida-latest-free.shp.zip \
--output $(pwd)/volumes/zip/florida-latest-free.shp.zip
# create a directory for the uncompressed loot
mkdir $(pwd)/volumes/shapefiles/florida
# uncompress downloaded files
unzip $(pwd)/volumes/zip/florida-latest-free.shp.zip \
-d $(pwd)/volumes/shapefiles/florida
# remove compressed originals
rm $(pwd)/volumes/zip/florida-latest-free.shp.zip
Once you have the .shp files available locally, you use them to derive .sql files suitable for importing (via CLI) into the Dockerized PostGIS instance.
There are multiple tools for this conversion -- but for this guide, we will use the shp2pgsql data loader. While this may eventually be put into a container, for now just install and run it from your local machine.
My system uses DNF as a package manager. Your system may use yum, apt, brew, or something else entirely. If you need help, do a search with the string below (replacing the bracketed text with your OS version).
Install shp2pgsql on [insert OS version here]
# install the package containing `shp2pgsql` <- example below is for .deb-based distributions
sudo apt install postgis-utils
# https://computingforgeeks.com/how-to-install-postgis-on-ubuntu-debian/
Once you have shp2pgsql available locally, you can use it to transform shapefiles (.shp) into SQL.
NOTE 1: I'm only processing a single shapefile (gis_osm_places_free_1), but you can use these steps to process all eighteen (18) present in the archive (a loop that handles them all is sketched after the next code block).
NOTE 2: The value passed with the -s flag is the SRID. Always make an attempt to determine the SRID used in the source data, else the tool will fall back to a default that may not produce the projections you expect.
# transform shapefiles via shp2pgsql
shp2pgsql -s 4326 \
$(pwd)/volumes/shapefiles/florida/gis_osm_places_free_1 > \
$(pwd)/volumes/sql/florida.gis_osm_places_free_1.sql
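As mentioned in NOTE 1 above, the archive holds eighteen shapefiles. If you want to convert them all in one go, a small loop along these lines should work (just a sketch; the output files and resulting tables will simply mirror the shapefile names):
# transform every shapefile in the archive (sketch)
for shp in $(pwd)/volumes/shapefiles/florida/*.shp; do
  name=$(basename "$shp" .shp)
  shp2pgsql -s 4326 "$shp" > $(pwd)/volumes/sql/florida."$name".sql
done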
Once you've got .sql files, you can execute them via the CLI.
# set postgres password as ENV Variable
# and
# execute SQL against a remote postgres instance
PGPASSWORD=secret psql \
-h localhost -d postgres \
-U postgres \
-f $(pwd)/volumes/sql/florida.gis_osm_places_free_1.sql
TL;DR: You need to know the SRID to query correctly.
Many "Spatial Type" (ST
) query functions that you will want to use to search through your new data will reference an SRID which is a numerical ID used to encode metadata about the geo data you loaded.
If you use the incorrect SRID in your queries, you'll get an error.
As such -- you can check the SRIDs for your imported records with this query and use that to ensure the SRID you're setting in the query terminal matches that data inside the db.
SELECT f_table_name, f_geometry_column, srid FROM geometry_columns;
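With the SRID confirmed (4326 here), you can start writing spatial queries against the imported data. The query below is only a sketch: the table and column names (gis_osm_places_free_1, geom, name, fclass) assume the shp2pgsql defaults for this OSM extract, and the coordinates are just an arbitrary point in Orlando.
# example: places within ~10km of downtown Orlando (sketch, assumes default table/column names)
PGPASSWORD=secret psql -h localhost -d postgres -U postgres -c \
  "SELECT name, fclass
     FROM gis_osm_places_free_1
    WHERE ST_DWithin(geom::geography,
                     ST_SetSRID(ST_MakePoint(-81.3792, 28.5383), 4326)::geography,
                     10000);"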
host: localhost
port: 5432
database: postgres
user: postgres
password: secret
A couple of years ago, I published a project that loads a PostgreSQL instance with sample data. If you're just looking to get a sample PostgreSQL instance running, that's probably a decent project to use.
# create directories to hold persistent DB data
mkdir -p $(pwd)/volumes/db/postgres_cms_data
# run postgres docker container (with data pre-loaded)
docker run -d \
-v $(pwd)/volumes/db/postgres_cms_data:/var/lib/postgresql/data \
-e POSTGRES_DB=govdata \
-e POSTGRES_USER=dbuser \
-e PGPASSWORD=dbpassword \
-e POSTGRES_PASSWORD=dbpassword \
--network mynetwork \
--name cms_db \
-p 15433:5432 sudowing/cms-utilization-db
NOTE: It will take several minutes (10+) for the DB to be available after the Docker container is launched. The data ships inside the container, and on startup an init process kicks off the load. You can keep an eye on the Docker logs to get an idea of where the process is.
docker logs cms_db
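If you are scripting against this instance, you can poll for readiness instead of eyeballing the logs. A rough sketch -- it assumes pg_isready is available inside the container (it is in images built on the official postgres image):
# poll until the final postgres server is accepting TCP connections (sketch)
until docker exec cms_db pg_isready -h localhost -U dbuser -d govdata > /dev/null 2>&1; do
  echo "waiting for cms_db to finish loading..."
  sleep 15
done
echo "cms_db is ready"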
host: localhost
port: 15433
database: govdata
user: dbuser
password: dbpassword
The demo data used below was published by the team at postgresqltutorial. I've just added the various CLI steps needed to pull down this data and load it into a Dockerized PostgreSQL instance.
The order of these steps differs from some of the other DB dialects because we want to mount the directory holding our SQL (volumes/sql) to the container running the DB. We are doing this so we can execute the import using the client already present in the container.
# create directories to hold persistent data & downloaded data
mkdir -p $(pwd)/volumes/db/psql_db_data
mkdir -p $(pwd)/volumes/zip
mkdir -p $(pwd)/volumes/sql
# download a target data set
curl https://sp.postgresqltutorial.com/wp-content/uploads/2019/05/dvdrental.zip \
--output $(pwd)/volumes/zip/dvdrental.zip
# uncompress downloaded files
unzip $(pwd)/volumes/zip/dvdrental.zip \
-d $(pwd)/volumes/sql/dvdrental
# remove compressed originals
rm $(pwd)/volumes/zip/dvdrental.zip
Here you will see we mount the directory holding our SQL (volumes/sql) to the container.
# create docker network
docker network create mynetwork
# run postgres docker container
docker run \
-d \
-v $(pwd)/volumes/db/psql_db_data:/var/lib/postgresql/data \
-v $(pwd)/volumes/sql/dvdrental:/data \
-e POSTGRES_PASSWORD=secret \
-e POSTGRES_USER=dvdrental \
--network mynetwork \
--name dvd_rental_db \
-p 15432:5432 postgres
Run a pg_restore process inside the container that references the mounted demo data (the dvdrental archive is a tar-format dump, so pg_restore is the right client for it).
# import demo data into db instance (tar-format dump, restored with pg_restore)
docker exec \
-i dvd_rental_db sh \
-c 'exec pg_restore -U dvdrental -d dvdrental /data/dvdrental.tar'
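A quick check that the restore landed (a sketch; the film table ships with the dvdrental sample data):
# verify the demo data loaded (sketch)
docker exec -i dvd_rental_db psql -U dvdrental -d dvdrental -c 'SELECT count(*) FROM film;'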
host: localhost
port: 15432
database: dvdrental
user: dvdrental
password: secret
# create directories to hold persistent DB data
mkdir -p $(pwd)/volumes/db/mysql_db_data
# create directories to hold downloaded data
mkdir -p $(pwd)/volumes/zip
# create docker network
docker network create mynetwork
# run mysql docker container
docker run -d \
-v $(pwd)/volumes/db/mysql_db_data:/var/lib/mysql \
-e MYSQL_DATABASE=demo \
-e MYSQL_USER=user \
-e MYSQL_PASSWORD=password \
-e MYSQL_ROOT_PASSWORD=secret \
--network mynetwork \
--name dev_mysql \
-p 3306:3306 mysql:latest
# download demo db
curl https://downloads.mysql.com/docs/world_x-db.zip \
--output $(pwd)/volumes/zip/world_x-db.zip
# uncompress downloaded files
unzip $(pwd)/volumes/zip/world_x-db.zip \
-d $(pwd)/volumes/zip/world_x-db
# remove compressed originals
rm $(pwd)/volumes/zip/world_x-db.zip
# import demo data into db instance
docker exec -i dev_mysql sh -c 'exec mysql -uroot -p"$MYSQL_ROOT_PASSWORD"' < $(pwd)/volumes/zip/world_x-db/world_x-db/world_x.sql
# download demo db
curl https://downloads.mysql.com/docs/sakila-db.zip \
--output $(pwd)/volumes/zip/sakila-db.zip
# uncompress downloaded files
unzip $(pwd)/volumes/zip/sakila-db.zip \
-d $(pwd)/volumes/zip/sakila-db
# remove compressed originals
rm $(pwd)/volumes/zip/sakila-db.zip
# import demo data into db instance (ddl)
docker exec -i dev_mysql sh -c 'exec mysql -uroot -p"$MYSQL_ROOT_PASSWORD"' < $(pwd)/volumes/zip/sakila-db/sakila-db/sakila-schema.sql
# import demo data into db instance (data)
docker exec -i dev_mysql sh -c 'exec mysql -uroot -p"$MYSQL_ROOT_PASSWORD"' < $(pwd)/volumes/zip/sakila-db/sakila-db/sakila-data.sql
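To confirm both sample databases loaded, you can run a quick check (a sketch; world_x ships a city table and sakila ships a film table):
# verify the demo data loaded (sketch)
docker exec -i dev_mysql sh -c 'exec mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SELECT COUNT(*) FROM world_x.city; SELECT COUNT(*) FROM sakila.film;"'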
host: localhost
port: 3306
database: demo
user: user
password: password
# create directories to hold persistent DB data
mkdir -p $(pwd)/volumes/db/sqlite_db_data
# create directories to hold downloaded data
mkdir -p $(pwd)/volumes/zip
# download demo db
curl https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite \
--output $(pwd)/volumes/db/sqlite_db_data/chinook.sqlite
db_location: ./volumes/db/sqlite_db_data/chinook.sqlite
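Since SQLite is just a file, you can poke at it directly with the sqlite3 CLI (a sketch; assumes sqlite3 is installed on your machine):
# list tables in the Chinook sample db (sketch)
sqlite3 ./volumes/db/sqlite_db_data/chinook.sqlite '.tables'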
Microsoft has published multiple sample databases for use in training and development. You can download a few using the links below.
# make the directory to hold the backups
mkdir -p ./volumes/mssql/backup
# download AdventureWorksDW2017
curl -L -o ./volumes/mssql/backup/AdventureWorksDW2017.bak https://github.com/Microsoft/sql-server-samples/releases/download/adventureworks/AdventureWorksDW2017.bak
# download WideWorldImportersDW
curl -L -o ./volumes/mssql/backup/WideWorldImportersDW-Full.bak 'https://github.com/Microsoft/sql-server-samples/releases/download/wide-world-importers-v1.0/WideWorldImportersDW-Full.bak'
docker volume create --driver local vlm_0001_mssql
docker volume create --driver local vlm_000_sqlserver
You will mount the two volumes created above to persist DB data, and also mount the directory holding the downloaded backups to a location inside the container so you can restore from them via the DB client.
docker run -d \
-e "ACCEPT_EULA=Y" \
-e "SA_PASSWORD=Alaska2017" \
-p 21143:1433 \
-v vlm_0001_mssql:/var/opt/mssql \
-v vlm_000_sqlserver:/var/opt/sqlserver \
-v $(pwd)/volumes/mssql/backup:/mssql_backups \
--name mssql19 \
mcr.microsoft.com/mssql/server:2019-latest
host: localhost
port: 21143
user: sa
password: Alaska2017
-- list all datafiles
RESTORE FILELISTONLY FROM DISK = '/mssql_backups/AdventureWorksDW2017.bak'
-- restore DB, explicitly moving datafiles
RESTORE DATABASE AdventureWorksDW2017 FROM DISK = '/mssql_backups/AdventureWorksDW2017.bak'
WITH
MOVE 'AdventureWorksDW2017' to '/var/opt/mssql/data/AdventureWorksDW2017.mdf'
,MOVE 'AdventureWorksDW2017_log' to '/var/opt/mssql/data/AdventureWorksDW2017_log.ldf'
-- list all datafiles
RESTORE FILELISTONLY FROM DISK = '/mssql_backups/WideWorldImportersDW-Full.bak'
-- restore DB, explicitly moving datafiles
RESTORE DATABASE WideWorldImportersDW FROM DISK = '/mssql_backups/WideWorldImportersDW-Full.bak'
WITH
MOVE 'WWI_Primary' to '/var/opt/mssql/data/WWI_Primary.mdf'
,MOVE 'WWI_UserData' to '/var/opt/mssql/data/WWI_UserData.ndf'
,MOVE 'WWI_Log' to '/var/opt/mssql/data/WWI_Log.ldf'
,MOVE 'WWIDW_InMemory_Data_1' to '/var/opt/mssql/data/WWIDW_InMemory_Data_1'
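One way to execute the restore statements above is with the sqlcmd client bundled inside the container. This is just a sketch -- the /opt/mssql-tools/bin path matches my understanding of the 2019 image layout, and only the FILELISTONLY check is shown (run the RESTORE DATABASE statements the same way):
# run the FILELISTONLY check via the sqlcmd bundled in the container (sketch)
docker exec mssql19 /opt/mssql-tools/bin/sqlcmd \
  -S localhost -U sa -P 'Alaska2017' \
  -Q "RESTORE FILELISTONLY FROM DISK = '/mssql_backups/AdventureWorksDW2017.bak'"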
Running an Oracle DB locally via a Docker container is more complicated than the other DB dialects because Oracle does not publish an image publicly.
Instead, they provide guidance that can be used to build an image locally -- after the user downloads the software directly from their (login-protected) software portal.
In order to build an image you must (1) download the software from their portal, (2) clone a git repo they have published, (3) move the downloaded artifact into a specific directory in the cloned repo, and (4) run a script held in the repo.
Each step is detailed below.
Visit Oracle Database Software Downloads and download 19.3 - Enterprise Edition (also includes Standard Edition 2) for Linux x86-64.
You must accept the Oracle License Agreement to download this software.
The downloaded filename should be:
LINUX.X64_193000_db_home.zip
git clone https://github.com/oracle/docker-images.git
# cd to directory holding build script
cd docker-images/OracleDatabase/SingleInstance/dockerfiles
# move downloaded artifact to proper location
cp ~/Downloads/LINUX.X64_193000_db_home.zip 19.3.0/LINUX.X64_193000_db_home.zip
# build docker image (-t flag is the docker tag)
./buildContainerImage.sh \
-v 19.3.0 \
-t oracle/database:19.3.0-ee \
-e
# wait...
# check to see new docker image
docker images | grep oracle/database
oracle/database 19.3.0-ee 8e1e74626c10 27 seconds ago 6.67GB
# create docker volume to persist db data
docker volume create oracle_db_data
# run newly created image mounting volume to persist data
docker run --rm -d \
-p 1521:1521 \
-p 5500:5500 \
-e ORACLE_PDB=alpha \
-e ORACLE_PWD=secret123 \
-e ORACLE_EDITION=enterprise \
-v oracle_db_data:/opt/oracle/oradata \
--name oracle_db \
oracle/database:19.3.0-ee
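NOTE: The first startup takes a while because the image builds the database on the first run. You can follow the logs and wait for the instance to report it is ready before connecting:
# follow the startup logs until the instance reports it is ready
docker logs -f oracle_db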
NOTE: There are several more ENV VARs supported beyond the minimal set implemented above. Details can be found in the repo you cloned earlier.
host: localhost
port: 1521
database: alpha
user: system
password: secret123
Oracle provides several sample schemas for development/training purposes. The steps below walk through how to load these schemas into an Oracle instance, but for more detail you can read their documentation.
Oracle provides many clients and tools, but we are only going to use the following for the rest of this guide:
- instantclient-basic
- instantclient-sqlplus
- instantclient-tools
# make directories to hold related files
mkdir -p $(pwd)/volumes/zip
mkdir -p $(pwd)/volumes/bin
# download basic instantclient-basic
curl https://download.oracle.com/otn_software/linux/instantclient/213000/instantclient-basic-linux.x64-21.3.0.0.0.zip \
--output $(pwd)/volumes/zip/instantclient-basic-linux.x64-21.3.0.0.0.zip
# uncompress downloaded files
unzip $(pwd)/volumes/zip/instantclient-basic-linux.x64-21.3.0.0.0.zip \
-d $(pwd)/volumes/bin
# download instantclient-sqlplus
curl https://download.oracle.com/otn_software/linux/instantclient/213000/instantclient-sqlplus-linux.x64-21.3.0.0.0.zip \
--output $(pwd)/volumes/zip/instantclient-sqlplus-linux.x64-21.3.0.0.0.zip
# uncompress downloaded files
unzip $(pwd)/volumes/zip/instantclient-sqlplus-linux.x64-21.3.0.0.0.zip \
-d $(pwd)/volumes/bin
# download instantclient-tools
curl https://download.oracle.com/otn_software/linux/instantclient/213000/instantclient-tools-linux.x64-21.3.0.0.0.zip \
--output $(pwd)/volumes/zip/instantclient-tools-linux.x64-21.3.0.0.0.zip
# uncompress downloaded files
unzip $(pwd)/volumes/zip/instantclient-tools-linux.x64-21.3.0.0.0.zip \
-d $(pwd)/volumes/bin
# create directory to hold the downloaded sample-schema SQL
mkdir -p $(pwd)/volumes/sql
# download a target data set
curl https://codeload.github.com/oracle/db-sample-schemas/zip/refs/tags/v19.2 \
--output $(pwd)/volumes/zip/db-sample-schemas-19.2.zip
# uncompress downloaded files
unzip $(pwd)/volumes/zip/db-sample-schemas-19.2.zip \
-d $(pwd)/volumes/sql
# (optional) remove compressed originals
rm $(pwd)/volumes/zip/db-sample-schemas-19.2.zip
# replace path in template sql
cd $(pwd)/volumes/sql/db-sample-schemas-19.2
perl -p -i.bak -e 's#__SUB__CWD__#'$(pwd)'#g' *.sql */*.sql */*.dat
# cd into the dir holding the instantclient bins
# (we are still inside the db-sample-schemas dir from the previous step)
cd ../../bin/instantclient_21_3
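# NOTE (assumption): the @mksample line below is a SQL*Plus command, not a shell command.
# Launch SQL*Plus from the Instant Client dir first -- roughly like this, with the
# Instant Client libs on the loader path:
export LD_LIBRARY_PATH=$(pwd):$LD_LIBRARY_PATH
./sqlplus system/secret123@localhost:1521/alpha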
# exec the mksample script (run at the SQL*Plus prompt) to load the sample schemas
@/home/sudowing/Documents/repos/guide-local-databases/volumes/sql/db-sample-schemas-19.2/mksample secret123 secret123 secret123 secret123 secret123 secret123 secret123 secret123 EXAMPLE TEMP /tmp/oracle_demo_log localhost:1521/alpha