outerbounds/terraform-aws-metaflow

Metaflow Terraform module

Terraform module that provisions AWS resources to run Metaflow in production.

This module consists of submodules that can also be used separately.

[Figure: modules diagram]

You can use either this high-level module or the individual submodules. See each submodule's README.md for more details.

Here's a minimal end-to-end example of using this module with a VPC:

# Random suffix for this deployment
resource "random_string" "suffix" {
  length  = 8
  special = false
  upper   = false
}

locals {
  resource_prefix = "metaflow"
  resource_suffix = random_string.suffix.result
}

data "aws_availability_zones" "available" {
}

# VPC infra using https://github.com/terraform-aws-modules/terraform-aws-vpc
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "3.13.0"

  name = "${local.resource_prefix}-${local.resource_suffix}"
  cidr = "10.10.0.0/16"

  azs             = data.aws_availability_zones.available.names
  private_subnets = ["10.10.8.0/21", "10.10.16.0/21", "10.10.24.0/21"]
  public_subnets  = ["10.10.128.0/21", "10.10.136.0/21", "10.10.144.0/21"]

  enable_nat_gateway   = true
  single_nat_gateway   = true
  enable_dns_hostnames = true
}


module "metaflow" {
  source  = "outerbounds/metaflow/aws"
  version = "0.3.0"

  resource_prefix = local.resource_prefix
  resource_suffix = local.resource_suffix

  enable_step_functions = false
  subnet1_id            = module.vpc.public_subnets[0]
  subnet2_id            = module.vpc.public_subnets[1]
  vpc_cidr_blocks       = [module.vpc.vpc_cidr_block]
  vpc_id                = module.vpc.vpc_id
  with_public_ip        = true

  tags = {
    "managedBy" = "terraform"
  }
}

# Export all outputs from the metaflow module
output "metaflow" {
  value = module.metaflow
}

# The module generates a Metaflow config in JSON format; write it to a file
resource "local_file" "metaflow_config" {
  content  = module.metaflow.metaflow_profile_json
  filename = "./metaflow_profile.json"
}
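
The example above uses `random_string` and `local_file`, so the root configuration needs the `random` and `local` providers in addition to `aws`. A minimal sketch of that setup is shown here. The region is an assumption chosen for illustration, and version constraints are left out because this README does not state them.

```hcl
# Providers used by the minimal example: aws (infrastructure),
# random (random_string.suffix) and local (local_file.metaflow_config).
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
    random = {
      source = "hashicorp/random"
    }
    local = {
      source = "hashicorp/local"
    }
  }
}

provider "aws" {
  # Assumption: replace with the region you deploy to.
  region = "us-west-2"
}
```

With this in place, `terraform init` followed by `terraform apply` should be enough to try the example in a scratch account.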

Note: You can find a more complete example in this repo. It uses this module and also sets up SageMaker notebooks and other parts of the infrastructure that are not specific to Metaflow.

Modules

| Name | Source | Version |
|------|--------|---------|
| metaflow-common | ./modules/common | n/a |
| metaflow-computation | ./modules/computation | n/a |
| metaflow-datastore | ./modules/datastore | n/a |
| metaflow-metadata-service | ./modules/metadata-service | n/a |
| metaflow-step-functions | ./modules/step-functions | n/a |
| metaflow-ui | ./modules/ui | n/a |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| access_list_cidr_blocks | List of CIDRs we want to grant access to our Metaflow Metadata Service. Usually this is our VPN's CIDR blocks. | `list(string)` | `[]` | no |
| batch_type | AWS Batch Compute Type ('ec2', 'fargate') | `string` | `"ec2"` | no |
| compute_environment_desired_vcpus | Desired starting vCPUs [0-16] for the EC2 Batch Compute Environment (ignored for Fargate) | `number` | `8` | no |
| compute_environment_egress_cidr_blocks | CIDR blocks to which egress is allowed from the Batch Compute environment's security group | `list(string)` | `["0.0.0.0/0"]` | no |
| compute_environment_instance_types | The instance types for the compute environment | `list(string)` | `["c4.large", "c4.xlarge", "c4.2xlarge", "c4.4xlarge", "c4.8xlarge"]` | no |
| compute_environment_max_vcpus | Maximum vCPUs [16-96] for the Batch Compute Environment | `number` | `64` | no |
| compute_environment_min_vcpus | Minimum vCPUs [0-16] for the EC2 Batch Compute Environment (ignored for Fargate) | `number` | `8` | no |
| db_engine_version | n/a | `string` | `"11"` | no |
| db_instance_type | RDS instance type to launch for the PostgreSQL database. | `string` | `"db.t2.small"` | no |
| db_migrate_lambda_zip_file | Output path for the zip file containing the DB migrate lambda | `string` | `null` | no |
| enable_custom_batch_container_registry | Provisions infrastructure for custom Amazon ECR container registry if enabled | `bool` | `false` | no |
| enable_key_rotation | Enable key rotation for KMS keys | `bool` | `false` | no |
| enable_step_functions | Provisions infrastructure for step functions if enabled | `bool` | n/a | yes |
| extra_ui_backend_env_vars | Additional environment variables for UI backend container | `map(string)` | `{}` | no |
| extra_ui_static_env_vars | Additional environment variables for UI static app | `map(string)` | `{}` | no |
| force_destroy_s3_bucket | Empty S3 bucket before destroying via terraform destroy | `bool` | `false` | no |
| iam_partition | IAM Partition (Select aws-us-gov for AWS GovCloud, otherwise leave as is) | `string` | `"aws"` | no |
| launch_template_http_endpoint | Whether the EC2 instance metadata service is available. Can be 'enabled' or 'disabled' | `string` | `"enabled"` | no |
| launch_template_http_put_response_hop_limit | The desired HTTP PUT response hop limit for instance metadata requests. Can be an integer from 1 to 64 | `number` | `2` | no |
| launch_template_http_tokens | Whether or not the EC2 instance metadata service requires session tokens, also referred to as Instance Metadata Service Version 2 (IMDSv2). Can be 'optional' or 'required' | `string` | `"optional"` | no |
| metadata_service_container_image | Container image for metadata service | `string` | `""` | no |
| metadata_service_enable_api_basic_auth | Enable basic auth for API Gateway? (requires key export) | `bool` | `true` | no |
| metadata_service_enable_api_gateway | Enable API Gateway for public metadata service endpoint | `bool` | `true` | no |
| resource_prefix | String prefix for all resources | `string` | `"metaflow"` | no |
| resource_suffix | String suffix for all resources | `string` | `""` | no |
| subnet1_id | First subnet used for availability zone redundancy | `string` | n/a | yes |
| subnet2_id | Second subnet used for availability zone redundancy | `string` | n/a | yes |
| tags | AWS tags | `map(string)` | n/a | yes |
| ui_alb_internal | Defines whether the ALB for the UI is internal | `bool` | `false` | no |
| ui_allow_list | List of CIDRs we want to grant access to our Metaflow UI Service. Usually this is our VPN's CIDR blocks. | `list(string)` | `[]` | no |
| ui_certificate_arn | SSL certificate for UI. If set to empty string, UI is disabled. | `string` | `""` | no |
| ui_static_container_image | Container image for the UI frontend app | `string` | `""` | no |
| vpc_cidr_blocks | The VPC CIDR blocks that we'll add to the access list on our Metadata Service API to allow all internal communications | `list(string)` | n/a | yes |
| vpc_id | The ID of the single VPC we stood up for all Metaflow resources to exist in. | `string` | n/a | yes |
| with_public_ip | Enable public IP assignment for the Metadata Service. If the subnets specified for subnet1_id and subnet2_id are public subnets, you will NEED to set this to true to allow pulling container images from public registries. Otherwise this should be set to false. | `bool` | n/a | yes |
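
As a hedged illustration of how several of these inputs combine, the sketch below is a variant of the minimal example that places the services in private subnets, which is the case where `with_public_ip` should be `false`. It assumes the `vpc` module and `locals` from the example at the top of this README, and it assumes outbound traffic from the private subnets goes through the NAT gateway that example enables. The CIDR in `access_list_cidr_blocks` is a placeholder documentation range, not a recommendation.

```hcl
module "metaflow" {
  source  = "outerbounds/metaflow/aws"
  version = "0.3.0"

  resource_prefix = local.resource_prefix
  resource_suffix = local.resource_suffix

  enable_step_functions = false

  # Private subnets, so no public IP is assigned to the Metadata Service.
  subnet1_id     = module.vpc.private_subnets[0]
  subnet2_id     = module.vpc.private_subnets[1]
  with_public_ip = false

  # The VPC module exposes its primary CIDR as a single string,
  # so it is wrapped in a list to match list(string).
  vpc_cidr_blocks = [module.vpc.vpc_cidr_block]
  vpc_id          = module.vpc.vpc_id

  # Placeholder: replace with your VPN's CIDR blocks.
  access_list_cidr_blocks = ["192.0.2.0/24"]

  # Require IMDSv2 session tokens on Batch instances and rotate KMS keys.
  launch_template_http_tokens = "required"
  enable_key_rotation         = true

  tags = {
    "managedBy" = "terraform"
  }
}
```

Every input not listed keeps the default shown in the table above.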

Outputs

| Name | Description |
|------|-------------|
| METAFLOW_BATCH_JOB_QUEUE | AWS Batch Job Queue ARN for Metaflow |
| METAFLOW_DATASTORE_SYSROOT_S3 | Amazon S3 URL for Metaflow DataStore |
| METAFLOW_DATATOOLS_S3ROOT | Amazon S3 URL for Metaflow DataTools |
| METAFLOW_ECS_S3_ACCESS_IAM_ROLE | Role for AWS Batch to Access Amazon S3 |
| METAFLOW_EVENTS_SFN_ACCESS_IAM_ROLE | IAM role for Amazon EventBridge to access AWS Step Functions. |
| METAFLOW_SERVICE_INTERNAL_URL | URL for Metadata Service (Accessible in VPC) |
| METAFLOW_SERVICE_URL | URL for Metadata Service (Accessible in VPC) |
| METAFLOW_SFN_DYNAMO_DB_TABLE | AWS DynamoDB table name for tracking AWS Step Functions execution metadata. |
| METAFLOW_SFN_IAM_ROLE | IAM role for AWS Step Functions to access AWS resources (AWS Batch, AWS DynamoDB). |
| api_gateway_rest_api_id_key_id | API Gateway Key ID for Metadata Service. Fetch Key from AWS Console [METAFLOW_SERVICE_AUTH_KEY] |
| batch_compute_environment_security_group_id | The ID of the security group attached to the Batch Compute environment. |
| datastore_s3_bucket_kms_key_arn | The ARN of the KMS key used to encrypt the Metaflow datastore S3 bucket |
| metadata_svc_ecs_task_role_arn | n/a |
| metaflow_api_gateway_rest_api_id | The ID of the API Gateway REST API we'll use to accept MetaData service requests to forward to the Fargate API instance |
| metaflow_batch_container_image | The ECR repo containing the metaflow batch image |
| metaflow_profile_json | Metaflow profile JSON object that can be used to communicate with this Metaflow Stack. Store this in ~/.metaflow/config_[stack-name] and select with $ export METAFLOW_PROFILE=[stack-name]. |
| metaflow_s3_bucket_arn | The ARN of the bucket we'll be using as blob storage |
| metaflow_s3_bucket_name | The name of the bucket we'll be using as blob storage |
| migration_function_arn | ARN of DB Migration Function |
| ui_alb_arn | UI ALB ARN |
| ui_alb_dns_name | UI ALB DNS name |
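
Instead of exporting the whole module as a single output, a root configuration can surface only the values it needs. The sketch below assumes the module is instantiated as `module.metaflow`, as in the example at the top of this README. The root-level output names are hypothetical; the referenced attributes come from the table above.

```hcl
# Re-export a few selected module outputs under root-level names.
output "metaflow_s3_bucket_name" {
  description = "Name of the bucket used as blob storage"
  value       = module.metaflow.metaflow_s3_bucket_name
}

output "metaflow_service_url" {
  description = "URL for the Metadata Service"
  value       = module.metaflow.METAFLOW_SERVICE_URL
}

output "metaflow_batch_job_queue" {
  description = "AWS Batch Job Queue ARN for Metaflow"
  value       = module.metaflow.METAFLOW_BATCH_JOB_QUEUE
}
```

After an apply, `terraform output metaflow_s3_bucket_name` prints the corresponding value.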