This repository accompanies a blog post on utilising DynamoDB Export to S3 to read NoSQL data from Snowflake.
We help accelerate organisations in their quest to harness value from their data. Get in touch with Mechanical Rock to find out more.
- Create the `.env` file with your parameters, based on the `env.sample` file format
- Create the Snowflake resources using the scripts in the `sql/` folder
- Deploy the infrastructure stack by running the `deploy.sh` script
- Load the sample data by running the `datafileload/load-data.sh` script
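In terminal form, a rough sketch of the quick start looks like this - it assumes you run from the repository root with AWS credentials available, and that `env.sample` sits alongside the scripts:

```sh
# 1. Create your .env from the sample and edit the values (see the variables table below)
cp env.sample .env

# 2. Run the scripts in sql/ against your Snowflake account
#    (in a Snowflake worksheet or client - the storage integration scripts first)

# 3. Deploy the AWS infrastructure
./deploy.sh

# 4. Load the sample data into DynamoDB
./datafileload/load-data.sh
```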
Read on for a deeper understanding.
The following diagram summarises what the solution achieves:
- A CDK app to manage the deployment of AWS Infrastructure as code
- Data files and a `docker-compose` configuration to load data into DynamoDB
- SQL scripts to normalize the DynamoDB data on S3 into relational tables in Snowflake
The following table describes some of the key assets in the repository and their purpose.
folder | purpose
---|---
`bin/` `lib/` `test/` | CDK project files for deploying the AWS infrastructure
`datafileload/` | Files and scripts for loading sample CSV data to DynamoDB
`sql/` | SQL scripts to create storage integrations, stages, and SQL views in Snowflake
`src/lambda/` | Code for the Lambda function that performs the DynamoDB Export to S3
All of these actions need current AWS credentials, either in a profile or in shell environment variables.
In addition, this deployment relies on environment variables being set. The simplest way to manage this is to create a `.env` file in the root directory, based on the sample provided in the `env.sample` file. The variables are described below.
variable | source / usage
---|---
`AWS_DEFAULT_REGION` | The region you wish to deploy to, e.g. ap-southeast-2
`AWS_ACCOUNT_ID` | The account number you will deploy into. You can fetch this with the AWS CLI by calling `aws sts get-caller-identity`
`DYNAMODB_TABLE_NAME` | The name you want to give to your DynamoDB table
`SNOWFLAKE_ACCOUNT_ID` | The Snowflake account ID from the storage integration
`SNOWFLAKE_EXTERNAL_ID` | The Snowflake external ID from the storage integration
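For reference, a filled-in `.env` based on `env.sample` might look like the sketch below. Every value here is a placeholder; the two Snowflake entries come from the storage integration you create in Snowflake (covered next):

```sh
# .env - placeholder values only
AWS_DEFAULT_REGION=ap-southeast-2
AWS_ACCOUNT_ID=123456789012
DYNAMODB_TABLE_NAME=my-ddb2snowflake-table
# Both Snowflake values come from the storage integration you create (next section)
SNOWFLAKE_ACCOUNT_ID=<from your storage integration>
SNOWFLAKE_EXTERNAL_ID=<from your storage integration>
```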
In order to get the Snowflake variables above, you will first need to create a new storage integration in Snowflake - sample scripts are available in the `sql/storage-integration/` folder. When creating the storage integration you will need both the bucket name and the IAM role that this stack will create. These take the following formats:
- Bucket: `dynamodb-export-${AWS_ACCOUNT_ID}-${AWS_DEFAULT_REGION}`
- IAM role: `arn:aws:iam::${AWS_ACCOUNT_ID}:role/dynamo-snowflake-s3-access-role`
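If it helps to double-check the exact strings before pasting them into the storage integration script, something like the following works (assuming the AWS CLI is configured; substitute your own region):

```sh
# Fetch the account ID and print the names this stack will use
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
AWS_DEFAULT_REGION=ap-southeast-2   # use your own region

echo "Bucket:   dynamodb-export-${AWS_ACCOUNT_ID}-${AWS_DEFAULT_REGION}"
echo "IAM role: arn:aws:iam::${AWS_ACCOUNT_ID}:role/dynamo-snowflake-s3-access-role"
```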
There are three scripts that can help you manage this solution:

- `deploy.sh` - installs dependencies, runs tests, builds the software, and deploys to AWS
- `undeploy.sh` - removes the infrastructure and cleans up some of the dependencies and generated assets in the project folder
- `datafileload/load-data.sh` - uploads the data files in `datafileload/data` to your DynamoDB table. Note that this script relies on the `.env` file being configured with your table name and region.
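Like everything else in this README, the scripts need current AWS credentials in scope. If you want to set that up explicitly, and also pull the `.env` values into your shell, one approach is the following (the profile name is just an example):

```sh
# Use whichever profile holds your deployment credentials (example name)
export AWS_PROFILE=my-deploy-profile

# Optionally pull the .env variables into the current shell as well
set -a; source .env; set +a
```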
The Lambda function that starts the DynamoDB Export to S3 runs at 1am UTC each day; it is configured by a cron schedule in the file `lib/ddb2snow-export-dynamodb-stack.ts`. However, if you want to test it immediately, perform the following steps:
- Navigate to the AWS Lambda Console for your account and region specified above
- Select the function with a name beginning `ddb2snowflake-DumpStack-dumpfunction...`
- Create a test event with the following data and save it as a named event
{
"time": "2020-11-30T00:00:00.000Z"
}
- Execute the Lambda Function with your named event
After a few minutes, you should be able to see your exported data files on S3.
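If you would rather drive the same test from the CLI than the console, a sketch like the following should work - the exact function name carries a generated suffix, so look it up first, and note that the `--cli-binary-format` flag applies to AWS CLI v2:

```sh
# Find the full function name (it begins with ddb2snowflake-DumpStack-dumpfunction)
aws lambda list-functions \
  --query "Functions[?starts_with(FunctionName, 'ddb2snowflake-DumpStack-dumpfunction')].FunctionName" \
  --output text

# Invoke it with the same test event used in the console
aws lambda invoke \
  --function-name <full-function-name> \
  --cli-binary-format raw-in-base64-out \
  --payload '{"time": "2020-11-30T00:00:00.000Z"}' \
  response.json

# A few minutes later, list the exported files
aws s3 ls "s3://dynamodb-export-${AWS_ACCOUNT_ID}-${AWS_DEFAULT_REGION}/" --recursive
```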
To destroy the AWS infrastructure and clear out the local assets, run the `undeploy.sh` script.