- Table of Contents
- SSH Authentication
- AWS Terms
- AWS Account Setup
- Setup NGINX Proxy
- Prepare app for deployment
- Setting up Terraform
- Configure GitLab CI/CD Flow
- Network Configuration
- Configure Database
- Update Bastion Configuration
- Setting up ECS
- Using Bastion
- Create Load Balancer
- Handling Media Uploads with S3
- Configure DNS and HTTPS
SSH keys are used to create secure connections between local machines and remote servers. The private key is stored on the local computer and is never shared; the public key is shared with the external servers. The public key is paired with the private key to establish the connection.
In terminal:
ssh-keygen -t rsa -b 4096 -C "[name of key]"
-t : key type (rsa)
-b : key size in bits (4096)
-C : "comment", or name of the key (ex: isaac@mbp)
This creates a new ssh key pair in the ~/.ssh directory. Retrieve the contents of the public key with:
cat ~/.ssh/[file].pub
- IAM user: a sub-user of the account; it should only have the permissions necessary for that user's role.
- Group: a group of users with a default set of permissions.
- AWS Vault: local software used to securely store AWS credentials and authenticate to the account. It should connect via an IAM user, keeps a session open for a set duration, and stores credentials in the macOS Keychain.
- ARN: Amazon Resource Name, a unique identifier string for an AWS resource; used in policies to grant services access to one another.
- Budget: a Billing setting that lets the user define spending thresholds. A threshold will not stop spending, but the user is notified when it is reached.
- ECR: Elastic Container Registry, the AWS service for storing docker images.
- S3: Simple Storage Service, a file/object storage service.
- DynamoDB: a NoSQL storage solution, used here simply for storing files or state.
- EC2 / Bastion: EC2 is a virtual machine service; an EC2 instance is used to create a bastion server, which is used to connect into the private network for admin purposes.
- RDS: Relational Database Service, the service used to create databases in AWS.
- ECS: Elastic Container Service, used to run the docker containers for the app.
- CloudWatch: used to monitor logs for the application.
It is important to always use an IAM user when managing settings, and to always have MFA set up on both the root and IAM accounts.
- Go to Billing, scroll down, and activate IAM access to billing info.
- To create a policy that requires MFA by default:
- Go to IAM > Policies > Create Policy
- Insert the following JSON:
{ "Version": "2012-10-17", "Statement": [ { "Sid": "AllowViewAccountInfo", "Effect": "Allow", "Action": [ "iam:GetAccountPasswordPolicy", "iam:GetAccountSummary", "iam:ListVirtualMFADevices", "iam:ListUsers" ], "Resource": "*" }, { "Sid": "AllowManageOwnPasswords", "Effect": "Allow", "Action": ["iam:ChangePassword", "iam:GetUser"], "Resource": "arn:aws:iam::*:user/${aws:username}" }, { "Sid": "AllowManageOwnAccessKeys", "Effect": "Allow", "Action": [ "iam:CreateAccessKey", "iam:DeleteAccessKey", "iam:ListAccessKeys", "iam:UpdateAccessKey" ], "Resource": "arn:aws:iam::*:user/${aws:username}" }, { "Sid": "AllowManageOwnSigningCertificates", "Effect": "Allow", "Action": [ "iam:DeleteSigningCertificate", "iam:ListSigningCertificates", "iam:UpdateSigningCertificate", "iam:UploadSigningCertificate" ], "Resource": "arn:aws:iam::*:user/${aws:username}" }, { "Sid": "AllowManageOwnSSHPublicKeys", "Effect": "Allow", "Action": [ "iam:DeleteSSHPublicKey", "iam:GetSSHPublicKey", "iam:ListSSHPublicKeys", "iam:UpdateSSHPublicKey", "iam:UploadSSHPublicKey" ], "Resource": "arn:aws:iam::*:user/${aws:username}" }, { "Sid": "AllowManageOwnGitCredentials", "Effect": "Allow", "Action": [ "iam:CreateServiceSpecificCredential", "iam:DeleteServiceSpecificCredential", "iam:ListServiceSpecificCredentials", "iam:ResetServiceSpecificCredential", "iam:UpdateServiceSpecificCredential" ], "Resource": "arn:aws:iam::*:user/${aws:username}" }, { "Sid": "AllowManageOwnVirtualMFADevice", "Effect": "Allow", "Action": ["iam:CreateVirtualMFADevice", "iam:DeleteVirtualMFADevice"], "Resource": "arn:aws:iam::*:mfa/${aws:username}" }, { "Sid": "AllowManageOwnUserMFA", "Effect": "Allow", "Action": [ "iam:DeactivateMFADevice", "iam:EnableMFADevice", "iam:ListMFADevices", "iam:ResyncMFADevice" ], "Resource": "arn:aws:iam::*:user/${aws:username}" }, { "Sid": "DenyAllExceptListedIfNoMFA", "Effect": "Deny", "NotAction": [ "iam:CreateVirtualMFADevice", "iam:EnableMFADevice", "iam:GetUser", "iam:ListMFADevices", "iam:ListVirtualMFADevices", "iam:ResyncMFADevice", "sts:GetSessionToken", "iam:ListUsers" ], "Resource": "*", "Condition": { "BoolIfExists": { "aws:MultiFactorAuthPresent": "false" } } } ] }
- Give it a name and a description, then create it.
- Create a group by going to Groups > Create Group.
  - For an admin user, select the AdministratorAccess policy and the newly created MFA policy.
- To create an IAM user, go to Users > Add User. Add a name, select console access, and add the user to the group.
- Save the account ID. Log in as the new IAM user (or whoever the new account is for) and enter the account ID, username, and password. Set up MFA. Log out, then log in again to get full permissions.
- In the IAM account, go to IAM > Users > Security Credentials and create a new access key. (Leave the success dialog/window open to copy the secret access key.)
- With aws-vault installed (via Homebrew), enter the following command:
aws-vault add [username of IAM user]
Enter the access key ID and secret access key when prompted. This will create a new vault on the Mac; enter a new password for that as well (if prompted).
- Set up MFA for the CLI by editing the config file. Access the file with the following command:
vim ~/.aws/config
Then add the following to the file:
region=us-east-1
mfa_serial=[ARN for MFA device]
Find the ARN for the device in IAM > Users > Security Credentials; make sure to select the correct device - it should be in a section labeled Assigned MFA Device (or something similar).
- Enable a secure session with the following command:
aws-vault exec [username] --duration=12h
The duration can be configured for any number of hours up to 12.
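Optionally, sanity-check that the session works by running a one-off AWS CLI call through aws-vault (the `--` separates the wrapped command from aws-vault's own flags):
# runs a single AWS CLI call inside a temporary credential session
aws-vault exec [username] -- aws sts get-caller-identity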
Creating a budget tells AWS to notify the user via email when a budget threshold is reached. It is possible to go past this threshold.
- Go to Billing > Budgets, create custom cost budget.
- Name it, set it to monthly, make it fixed, and set the budget amount.
- Create alerts to notify users when the account has used up a certain amount of the budget.
The proxy server exists to help django serve static files. It acts as a web server, making it more efficient to serve these files.
- create new GitLab project
- disable public pipelines; add protected branch `*-release` to ensure all branches flagged as `-release` can only be created/modified by certain users (maintainers). Add `*-release` to the list of protected tags.
- clone the project via ssh to local. The passphrase is needed if it's set.
- create a new ECR repo; this is where the dynamic containers will be created. Enable `scan on push`.
- Create a custom aws policy with the json below:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["ecr:*"],
"Resource": "arn:aws:ecr:us-east-1:*:repository/recipe-app-api-proxy"
},
{
"Effect": "Allow",
"Action": ["ecr:GetAuthorizationToken"],
"Resource": "*"
}
]
}
- create a new IAM user with the newly created policy; this will be the "user" that is used for dynamic ECR tasks. Create an access key.
- create the following protected/masked variables in gitlab:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- ECR_REPO (uri of the repo)
- create a new branch called `feature/nginx-proxy` in the local project
- create a `default.conf.tpl` file; define the proxy server config by setting the port and the location directories (a sketch of this template appears at the end of this section)
- create (copy) the `uwsgi_params` file and include it in the base location of the proxy server config. Additionally, set the `client_max_body_size` variable on the base location to define the max file upload size (10M)
- create an `entrypoint.sh` command file to substitute the environment variables into the proxy config and turn off nginx's daemon mode, forcing it to run in the foreground
- create the base image with a `Dockerfile`. Build the image.
- understand the CI/CD pipeline with GitLab
- explains branch flow: https://docs.gitlab.com/ee/topics/gitlab_flow.html
- predefined variables: https://docs.gitlab.com/ee/ci/variables/predefined_variables.html
- `.gitlab-ci.yml` explainer: https://docs.gitlab.com/ee/ci/yaml/#rules
- create the `.gitlab-ci.yml` file to define the services, stages, and jobs - specifically the Build and Push stages
- define the jobs for the Build stage, including scripts and artifacts (saved images)
- define jobs for Push Dev (Push) stage, including scripts and rules
- define jobs for Push Release (Push) stage, including scripts and rules
- push to origin by setting upstream. Ensure this CI/CD job passes.
- create new merge request. Merge to main. Ensure this CI/CD job passes. This should have created an ECR image (named dev).
- create new minor version branch with release flag (ex: 1.0-release)
- create new patch version branch with release flag (ex: 1.0.0-release). Ensure this CI/CD job passes. This should have created a new ECR image with version name and "latest" (ex: 1.0.0, latest)
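For reference, a minimal sketch of what the `default.conf.tpl` described above might contain; the `LISTEN_PORT`, `APP_HOST`, and `APP_PORT` variable names are assumptions (they get substituted into the template when the container starts):
server {
    listen ${LISTEN_PORT};

    location /static {
        # static files collected by Django are served directly by nginx
        alias /vol/static;
    }

    location / {
        # everything else is forwarded to the uWSGI socket of the app container
        uwsgi_pass           ${APP_HOST}:${APP_PORT};
        include              /etc/nginx/uwsgi_params;
        client_max_body_size 10M;
    }
}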
Terms:
- uWSGI: an application server implementing WSGI (the Web Server Gateway Interface); it allows the python app to run behind a web server in production.
Steps:
- create project on GitLab, or get project that is already created/forked, etc. Clone to local.
- change general settings to only allow members to configure pipelines. Uncheck public pipelines.
- on local, checkout to new branch (ex: feature/prod-setup)
- add uwsgi to requirements, create scripts directory
- in `scripts/`, create `entrypoint.sh` to collect the static files, migrate the db, and create the uWSGI socket that connects to the proxy; configure the workers to fit needs (a sketch of this script appears at the end of this section)
- in the `Dockerfile`, set the scripts directory, copy it into the image, change its mode to executable, and create the entrypoint command
- add `vol/` and `deploy/` to `.dockerignore`
- build the docker image
- prepend `/static/` to both STATIC_URL and MEDIA_URL in `settings.py`
- configure env variables in django: set SECRET_KEY and DEBUG, and extend ALLOWED_HOSTS in `settings.py`. Make sure DEBUG is set in `docker-compose.yml`
- test config by running:
docker-compose up
- create a new `docker-compose-proxy.yml` file; copy the contents of the original compose file, set the new volume, remove the commands, and create a proxy service that connects to the proxy image; create the volumes
- build the proxy image by navigating to the proxy project and running:
docker build -t proxy .
- test the main app by running:
docker-compose -f docker-compose-proxy.yml up
- ensure the service is accessible on the specified port (ex: 8000), and ensure no service is throwing errors
- push to GitLab, merge to main
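A minimal sketch of the `scripts/entrypoint.sh` described above; the socket port, worker count, and `app.wsgi` module name are assumptions, so adjust them to the project:
#!/bin/sh

set -e

python manage.py collectstatic --noinput
python manage.py wait_for_db
python manage.py migrate

# expose a uWSGI socket for the nginx proxy instead of binding an HTTP port
uwsgi --socket :9000 --workers 4 --master --enable-threads --module app.wsgi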
Terraform needs the following AWS services to function:
- an S3 bucket to store its state and act as a single source of truth
- a DynamoDB table to handle the TF state lock
- an ECR repo to push docker images to
Terraform is platform agnostic; make sure to pin versions where possible.
- create an S3 bucket, block all public access, enable versioning
- tf will store all of the infra data in this bucket
- create a DynamoDB table with a partition key named LockID
- this prevents tf from running more than once at the same time; tf uses it to create a lock
- create a new ECR repo with the same name as the git repo (per convention)
- make sure you're on the master branch locally and up to date with origin. add the tf state files to .gitignore
- create a `deploy/` directory and create `main.tf`. (terraform merges all files ending in `.tf`, so order and naming don't matter.)
- inside `main.tf`, create the `terraform {}` block to define the S3 backend, optionally adding the DynamoDB table if you only want one tf run at a time. Then create the `provider {}` block to pin the version of the AWS provider to use.
- create a new `docker-compose.yml` file inside `deploy/` to define the docker image for terraform. For the environment, make sure the aws credentials are pulled dynamically from the local machine (aws-vault).
- after opening a session in aws-vault, initialize terraform with (optionally via a make command):
docker-compose -f deploy/docker-compose.yml run --rm terraform init
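As a rough sketch, the `terraform {}` and `provider {}` blocks in `main.tf` end up looking something like the following; the state key and provider version are assumptions, while the bucket and DynamoDB table names match the ones created above:
terraform {
  backend "s3" {
    bucket         = "recipe-app-api-devops-tfstate"         # S3 bucket created earlier
    key            = "recipe-app.tfstate"                     # assumed state file name
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "recipe-app-api-devops-tf-state-lock"    # optional: enforces a single run at a time
  }
}

provider "aws" {
  region  = "us-east-1"
  version = "~> 2.54.0"   # pin the AWS provider version (syntax used by Terraform 0.12)
}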
- create a `bastion.tf` file. Create an `ami` data block to define the image. In EC2, click "Launch Instance" and select Amazon Linux, then copy the AMI ID. Get the name of the ec2 image from aws by taking the ami-id, going to Images > AMIs, inserting the id in the search field, and clicking on the result to show the AMI Name. Paste the name into the filter "values[]" list. Replace the numbers between `2.0` and `-x86` with a star to make a wildcard, forcing it to get the latest version. Create a resource block to define the instance to be created (see aws instance types).
- optionally format the tf files:
docker-compose -f deploy/docker-compose.yml run --rm terraform fmt
- optionally validate tf files:
docker-compose -f deploy/docker-compose.yml run --rm terraform validate
- apply terraform to the aws infrastructure with the following two commands:
# shows changes that will be made
docker-compose -f deploy/docker-compose.yml run --rm terraform plan
# apply changes
docker-compose -f deploy/docker-compose.yml run --rm terraform apply
- verify instance is created in EC2 dashboard. once finished optionally destroy ec2 instance:
docker-compose -f deploy/docker-compose.yml run --rm terraform destroy
- create new tf workspace.
- workspaces are ways to manage different env within the same aws account. (ex: dev, stage, prod)
# list available workspaces
docker-compose -f deploy/docker-compose.yml run --rm terraform workspace list
# create new workspace named dev
docker-compose -f deploy/docker-compose.yml run --rm terraform workspace new dev
- to create standard variables used throughout the tf files, create `variables.tf`. Create a "prefix" variable to prefix all workspaces and identify the project.
- create a locals block in `main.tf` to store dynamic variables. Create a prefix local var to identify the different workspaces.
- create a `common_tags` item in the locals block. Add tags to resource blocks using the `merge()` function. View the tags in aws by selecting the ec2 instance and selecting the "Tags" tab at the bottom of the window. Example tags are "prefix", "project", "contact".
- finish by running commit, tf plan, tf apply, git push, and creating a new merge request.
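A sketch of the prefix variable, locals block, and `merge()` tag usage described above; the variable defaults are placeholders:
# variables.tf
variable "prefix" {
  default = "raa"                      # short project prefix (placeholder)
}

variable "project" {
  default = "recipe-app-api-devops"
}

variable "contact" {
  default = "email@example.com"        # placeholder contact tag
}

# main.tf
locals {
  prefix = "${var.prefix}-${terraform.workspace}"   # e.g. raa-dev, raa-staging

  common_tags = {
    Environment = terraform.workspace
    Project     = var.project
    Owner       = var.contact
    ManagedBy   = "Terraform"
  }
}

# example of tagging a resource, adding a Name on top of the common tags:
# tags = merge(local.common_tags, { Name = "${local.prefix}-bastion" })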
References:
- https://docs.gitlab.com/ee/topics/gitlab_flow.html
- https://docs.gitlab.com/ee/ci/variables/predefined_variables.html
- https://docs.gitlab.com/ee/ci/yaml/
Difference between environment branches and release branches:
- environment branches include dev, stage, prod, and are better suited for applications that need "Rolling deployment of changes to the production env" like websites, services, etc.
- release branches are better suited for software so people can access different versions of the software.
- make sure main is up-to-date, checkout to new feature branch.
- create the `.gitlab-ci.yml` file and define the stages that are going to be required, including:
  - Test and Lint ("run unit tests")
  - Build and Push ("build in docker, push to ECR")
  - Staging Plan
  - Staging Apply ("push to EC2")
  - Production Plan
  - Production Apply ("push to EC2")
  - Destroy
- Create jobs with the same name for each of them. Each will need to specify the `stage`, `script`, and `rules`; test first with a filler echo script.
- Push to origin. Create a new merge request. This will have started a pipeline with the `Test and Lint` and `Validate Terraform` jobs.
- Submit the merge. This will have triggered the next pipeline with the remaining jobs.
- Create/checkout a new production branch (either in the GUI or CLI). This will have started the production pipeline with the jobs `Test and Lint`, `Build and Push`, `Staging Plan`, `Staging Apply`, `Production Plan`, `Production Apply`, and `Destroy`.
- In GitLab, make sure the production branch is protected by going to Settings > Repository > Protected Branches. Only allow Maintainers to perform actions on the branch.
- In local, switch to new feature branch.
- Test/Lint: define the docker & docker-in-docker (dind) image and service respectively. Add a script to install docker-compose. Add a script to run testing and linting (wait_for_db, test, flake8).
- Validate TF: since most jobs will need the tf image, define it in the global scope. Configure the entrypoint to be able to run scripts. Since each job builds a new filesystem, make a script that changes dir to `deploy/`, then runs the tf commands for init, validation, and formatting. Entrypoint code:
entrypoint: # overrides entrypoint to work with gitlab ci-cd
- "/usr/bin/env"
- "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
- Commit/push code to origin. Make merge request to main to test first two jobs. Accept merge request to finish pipeline. Make merge request to production to test first two jobs in production. Accept merge request to finish pipeline.
- In local, change branch to main, pull updates. Create new feature branch.
- Create new IAM user in AWS for GitLab pipeline. Use the following custom policy below. Make sure to change S3 bucket name.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "TerraformRequiredPermissions",
"Effect": "Allow",
"Action": ["ecr:GetAuthorizationToken", "ecr:BatchCheckLayerAvailability", "ec2:*"],
"Resource": "*"
},
{
"Sid": "AllowListS3StateBucket",
"Effect": "Allow",
"Action": "s3:ListBucket",
"Resource": "arn:aws:s3:::recipe-app-api-devops-tfstate"
},
{
"Sid": "AllowS3StateBucketAccess",
"Effect": "Allow",
"Action": ["s3:GetObject", "s3:PutObject"],
"Resource": "arn:aws:s3:::recipe-app-api-devops-tfstate/*"
},
{
"Sid": "LimitEC2Size",
"Effect": "Deny",
"Action": "ec2:RunInstances",
"Resource": "arn:aws:ec2:*:*:instance/*",
"Condition": {
"ForAnyValue:StringNotLike": {
"ec2:InstanceType": ["t2.micro"]
}
}
},
{
"Sid": "AllowECRAccess",
"Effect": "Allow",
"Action": ["ecr:*"],
"Resource": "arn:aws:ecr:us-east-1:*:repository/recipe-app-api-devops"
},
{
"Sid": "AllowStateLockingAccess",
"Effect": "Allow",
"Action": ["dynamodb:PutItem", "dynamodb:DeleteItem", "dynamodb:GetItem"],
"Resource": ["arn:aws:dynamodb:*:*:table/recipe-app-api-devops-tf-state-lock"]
}
]
}
- In GitLab, in Settings > CI/CD > Variables, define `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` with values from the new IAM user. Then get the URI from the devops ECR repo and create the variable `ECR_REPO`.
- Build/Push: in this job, define the same docker image & service as test/lint. Create scripts to install `awscli`, build the docker image, authenticate to aws, push the docker image, tag the image again with the :latest tag, and push that image to represent the latest image.
- Staging/Production Plan: for the planning jobs, define a script that changes to the `deploy/` dir, exports the tf env variable for the repo uri, then runs tf init, workspace select/new for staging/production, and plan. Staging and Production are the same except for the workspace name. Optionally create a shell script, but that is not shown in the course.
- Staging/Production Apply: for the apply jobs, make a script that goes to `deploy/`, exports the same variable as the plan jobs, then runs tf init, workspace select staging/production, and apply -auto-approve. These jobs apply to AWS, so optionally make them manual or automatic depending on the desired workflow. The only difference between staging and production is the workspace name.
- Staging/Production Destroy: for the destroy jobs, make a script that changes to the `deploy/` directory, then runs tf init, workspace select staging/production, and destroy -auto-approve. Auto-approve is used since these will be manual jobs.
- Commit changes, push to origin. Create a merge request. Test the full CI/CD pipeline:
  - Make changes on local in a separate branch, push to origin (creating a new branch if needed)
  - Make a new merge request to main. This will trigger `Testing/Linting` and `TF Validation`.
  - Manager will accept the merge request, triggering the jobs `Test/Linting`, `Build/Push`, `Staging Plan`, `Staging Apply`, and a manual `Destroy`. Once these pass, main will be updated, ECR will contain a new image, and EC2 will contain the bastion staging instance. Once the staging site is no longer needed, it can be destroyed.
  - Manager will merge main into the production branch (if in the CLI, push to origin). This will run the production pipeline: `Test/Lint`, `Build/Push`, `Staging Plan`, `Staging Apply`, `Production Plan`, `Production Apply`, and a manual (blocked) `Destroy`. Ensure an ECR image was created for the production build. Ensure 2 EC2 instances were created - production and staging. Optionally destroy.
- VPC: Virtual Private Cloud, isolates production, staging, and development environments from each other. Restricts access to all network resources, if one is compromised the rest are safe. Everything inside an environment shares the same VPC.
- Subnet: Subnetwork, contained inside vpc, used to run resources and determine access to internet
- Public Subnets: used to give resources access to the internet publicly.
- Private Subnet: runs resources that are used internally, and don't need public access. makes it more secure.
- Gateway: the internet gateway attached to the VPC that allows public subnets to send traffic to and receive traffic from the internet.
- NAT Gateway: Network Address Translation Gateway, allows private subnets to have outbound access to the internet, but blocks the internet from having inbound access to them.
- Availability Zones: separate data centers within a region. Spreading resources across multiple AZs creates application resiliency: if one zone goes down, another zone can take over and handle the traffic. Multiple AZs are required for load balancers.
- CIDR Block: indicates which IP addresses will be available in the network. View this cheatsheet for determining the short code.
- EIP: Elastic IP, a way of creating a static IP address in the AWS VPC.
- checkout to master, pull origin, checkout to new feature branch.
- create a new `network.tf` file inside `deploy/`.
- create the main VPC resource, including the cidr_block, and enable dns support and hostnames. Add tags.
- create the main internet gateway, connecting it to the main vpc.
- in `main.tf`, create a data block for `aws_region`; this will allow access to info on the current region later.
- create public subnet group 'a'. (a sketch of these resources appears after this list)
  - SUBNET: create the subnet resource with a /24 cidr block. Allow it to map a public ip. Connect it to the main VPC. Set the availability zone. Create tags. The cidr block used was 10.1.1.0/24.
  - ROUTE TABLE: create the public route table and connect it to the vpc.
  - ROUTE TABLE ASSOCIATION: create the association resource and connect it to the public route table and the subnet. This links the route table and the subnet.
  - ROUTE: this makes the subnet publicly accessible. Create the route resource, set the route table id to the public route table, set the destination cidr to 0.0.0.0/0 (signifying public access), and set the gateway id to the main gateway id.
  - EIP: create an eip resource and connect it to the vpc.
  - NAT GATEWAY: create the resource, set the allocation id to the eip id, and set the subnet id to the public 'a' subnet id.
- do the same thing for public subnet group 'b'. Set the cidr_block to a different range for the subnet (increment it by convention); the cidr block used was 10.1.2.0/24.
- create private subnet group 'a'.
  - SUBNET: create the resource with the same /24 cidr block size; the cidr block used was 10.1.10.0/24. Connect it to the vpc, set the availability zone, and create tags.
  - ROUTE TABLE: create the resource and connect it to the vpc.
  - ROUTE TABLE ASSOCIATION: create the resource, connect it to the subnet and the route table.
  - ROUTE: create the resource, connect it to the route table and the public nat gateway, and set the destination cidr to 0.0.0.0/0.
- do the same for private subnet group 'b'; the cidr block used was 10.1.11.0/24.
- commit. push to gitlab and create a merge request, triggering the first pipeline. accept the merge request to trigger the staging pipeline. this should have set up all of the resources; check aws to verify.
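For orientation, a rough sketch of the public subnet group 'a' resources described above; resource names are assumptions, and it presumes `aws_vpc.main` and `aws_internet_gateway.main` from the earlier bullets:
resource "aws_subnet" "public_a" {
  cidr_block              = "10.1.1.0/24"
  map_public_ip_on_launch = true
  vpc_id                  = aws_vpc.main.id
  availability_zone       = "${data.aws_region.current.name}a"
  tags                    = local.common_tags
}

resource "aws_route_table" "public_a" {
  vpc_id = aws_vpc.main.id
  tags   = local.common_tags
}

resource "aws_route_table_association" "public_a" {
  subnet_id      = aws_subnet.public_a.id
  route_table_id = aws_route_table.public_a.id
}

resource "aws_route" "public_internet_access_a" {
  route_table_id         = aws_route_table.public_a.id
  destination_cidr_block = "0.0.0.0/0"              # public access
  gateway_id             = aws_internet_gateway.main.id
}

resource "aws_eip" "public_a" {
  vpc = true    # newer AWS providers use `domain = "vpc"` instead
}

resource "aws_nat_gateway" "public_a" {
  allocation_id = aws_eip.public_a.id
  subnet_id     = aws_subnet.public_a.id
}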
- add permissions to Devops CI IAM user in AWS to perform RDS tasks. Add the following code to the "TerraformRequiredPermissions" statement:
"rds:DeleteDBSubnetGroup",
"rds:CreateDBInstance",
"rds:CreateDBSubnetGroup",
"rds:DeleteDBInstance",
"rds:DescribeDBSubnetGroups",
"rds:DescribeDBInstances",
"rds:ListTagsForResource",
"rds:ModifyDBInstance",
"iam:CreateServiceLinkedRole",
"rds:AddTagsToResource"
- with main branch up to date, checkout to new feature branch.
- in `variables.tf`, create new `db_username` and `db_password` variables and add descriptions to both. These will be used to securely pass the username and password values to tf.
- create a new `database.tf` file and create a new subnet group resource. In the new resource, set the subnet ids to both private subnets (a and b) along with the name and tags of the resource. This adds multiple subnets to the database.
- create a new security group resource. Connect it to the main vpc. Create an ingress block to define the inbound access rules.
- create the db instance. Set attributes including the identifier (name of the instance), name (name of the db), allocated storage in GB, storage type ('gp2' was used), engine and engine version, instance class, subnet group (the main subnet group name), username/password, backup retention period in days, whether there should be multiple availability zones, whether to skip the final snapshot, and the vpc security group ids. Set the tags. (A sketch of this resource appears at the end of this section.)
- read this to see all rds options
- create `outputs.tf`; set an output with the value of `aws_db_instance.main.address`
- create `sample.tfvars` as an example file to store the db username and password variables; this will be committed. Copy that file and create a new `terraform.tfvars`; this will be the main file and will not be committed. It is equivalent to a `.env` file. Test it by running the `terraform plan` command.
- add TF_VAR_db_username and TF_VAR_db_password variables to the GitLab CI/CD variables.
- commit the changes on local. push to origin. create and accept the merge request. the pipeline should have succeeded (if not, check the db engine version or instance class for aws). check aws to make sure all instances are running.
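A sketch of the subnet group and db instance described above; resource names, the engine version, and the security group reference are illustrative assumptions (pick an engine version that is currently supported, and note that newer AWS providers call the `name` argument `db_name`):
resource "aws_db_subnet_group" "main" {
  name       = "${local.prefix}-main"
  subnet_ids = [aws_subnet.private_a.id, aws_subnet.private_b.id]
  tags       = local.common_tags
}

resource "aws_db_instance" "main" {
  identifier              = "${local.prefix}-db"
  name                    = "recipe"               # database name ("db_name" on newer providers)
  allocated_storage       = 20
  storage_type            = "gp2"
  engine                  = "postgres"
  engine_version          = "11.4"
  instance_class          = "db.t2.micro"
  db_subnet_group_name    = aws_db_subnet_group.main.name
  username                = var.db_username
  password                = var.db_password
  backup_retention_period = 0
  multi_az                = false
  skip_final_snapshot     = true
  vpc_security_group_ids  = [aws_security_group.rds.id]

  tags = merge(local.common_tags, { Name = "${local.prefix}-main" })
}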
Steps:
- update policy for CI IAM user in AWS to have the following additions to actions:
"iam:CreateRole",
"iam:GetInstanceProfile",
"iam:DeletePolicy",
"iam:DetachRolePolicy",
"iam:GetRole",
"iam:AddRoleToInstanceProfile",
"iam:ListInstanceProfilesForRole",
"iam:ListAttachedRolePolicies",
"iam:DeleteRole",
"iam:TagRole",
"iam:PassRole",
"iam:GetPolicyVersion",
"iam:GetPolicy",
"iam:CreatePolicyVersion",
"iam:DeletePolicyVersion",
"iam:CreateInstanceProfile",
"iam:DeleteInstanceProfile",
"iam:ListPolicyVersions",
"iam:AttachRolePolicy",
"iam:CreatePolicy",
"iam:RemoveRoleFromInstanceProfile",
"iam:ListRolePolicies"
- Set up EC2 key pairs with local SSH key by going to EC2 > Network and Security > Key Pairs and selecting "Import Key Pair". Add public key contents.
- on the local devops project, checkout to master and pull code from remote. Create a new feature branch. In `deploy/`, add a new directory `templates/bastion/` and create `user-data.sh`. The templates directory is used to store "templates", or scripts, passed on to AWS.
- In the new `user-data.sh` bash file, write a script that installs docker and adds ec2-user to the docker user group so the user can manage docker.
- in `bastion.tf`, reference the new file as `user_data` in the aws_instance resource.
- Create an instance profile for the bastion. An instance profile is assigned to the bastion in order to give it IAM role info. Create the profile by creating a new file inside the `templates/bastion` dir called `instance-profile-policy.json` and pasting the following code:
{
"Version": "2012-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Effect": "Allow"
}
]
}
- in `bastion.tf`, create a new `aws_iam_role` resource and set the assume_role_policy attribute to the new json file. This allows the bastion server to assume the new aws role. Create a new `aws_iam_role_policy_attachment` resource and set the role and policy_arn in order to attach the aws policy to the new role.
- create a new `aws_iam_instance_profile` resource and give it the name and role, using the role resource defined before. Set the `iam_instance_profile` attribute in the aws_instance resource to the new instance profile resource.
- create a new variable named `bastion_key_name` in `variables.tf` (it needs to match the name of the aws ec2 key pair). Add the key_name attribute to the aws_instance resource in `bastion.tf`, then set the subnet_id attribute to one of the public subnets.
- create a security group to only allow inbound access to the bastion via port 22 (SSH) and outbound access via 443, 80, and 5432. Do this by creating a new `aws_security_group` resource in `bastion.tf` and connecting it to the vpc. Create ingress and egress blocks to set the inbound and outbound rules. In the aws_instance resource, connect the security group by setting the vpc_security_group_ids attribute to the new resource.
- in `database.tf`, in the aws_security_group resource, inside the ingress block, set the `security_groups` attribute to the newly created security group above.
- in `outputs.tf`, create a new output called `bastion_host` and set the value to the bastion dns name in order to see the host after it has been created by TF.
- commit the changes. push to GitLab. create and accept the merge request. after the pipeline succeeds, check EC2 in aws to make sure the bastion instance is running. Check the bastion by connecting to it from the local terminal by running:
ssh ec2-user@[host name]
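For reference, a rough sketch of how the `bastion.tf` ami data block and instance described above fit together (it omits the IAM role, instance profile, and security group resources; resource names are assumptions based on the course layout):
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-2.0.*-x86_64-gp2"]   # wildcard between 2.0 and -x86 for latest version
  }
}

resource "aws_instance" "bastion" {
  ami                    = data.aws_ami.amazon_linux.id
  instance_type          = "t2.micro"
  user_data              = file("./templates/bastion/user-data.sh")
  iam_instance_profile   = aws_iam_instance_profile.bastion.name
  key_name               = var.bastion_key_name
  subnet_id              = aws_subnet.public_a.id
  vpc_security_group_ids = [aws_security_group.bastion.id]

  tags = merge(local.common_tags, { Name = "${local.prefix}-bastion" })
}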
ECS (Elastic Container Service) is used to run and manage live docker containers. It can be used to create clusters of services for the project.
- Task execution role: a role that is used for starting a service (starting the service and giving it permissions)
- Log group: groups all the logs for particular task into one place.
- Container definition template: a JSON file which contains details for the container so AWS knows how to run it in production.
- ECS Service: actual service that runs the docker container
- add the following json to the CI IAM user policy actions:
"logs:CreateLogGroup",
"logs:DeleteLogGroup",
"logs:DescribeLogGroups",
"logs:ListTagsLogGroup",
"logs:TagLogGroup",
"ecs:DeleteCluster",
"ecs:CreateService",
"ecs:UpdateService",
"ecs:DeregisterTaskDefinition",
"ecs:DescribeClusters",
"ecs:RegisterTaskDefinition",
"ecs:DeleteService",
"ecs:DescribeTaskDefinition",
"ecs:DescribeServices",
"ecs:CreateCluster"
- In local, checkout to main, pull origin, checkout to new feature branch.
- Create a new file `ecs.tf` in the `deploy/` directory. Create the cluster by creating an `aws_ecs_cluster` resource and giving it a name.
- create a new file in `templates/` called `ecs/task-exec-role.json` and paste the following json. This will allow the ecs task to retrieve the image from ecr, put logs in the log stream, and create a new log stream. This creates the task execution role.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ecr:GetAuthorizationToken",
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage",
"logs:CreateLogStream",
"logs:PutLogEvents"
],
"Resource": "*"
}
]
}
- create a new file in `templates/ecs/` called `assume-role-policy.json` and paste the json below. This allows the ecs task to assume the defined role.
{
"Version": "2012-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Principal": {
"Service": "ecs-tasks.amazonaws.com"
},
"Effect": "Allow"
}
]
}
- In `ecs.tf`, define the `aws_iam_policy` for the task execution role policy and give it a name, path, description, and policy file path. This will create a new policy in aws for the task execution role given the json.
- Create a new `aws_iam_role` resource that points to the assume role policy (by giving it a name and `assume_role_policy`).
- create an `aws_iam_role_policy_attachment` resource to attach the policy to the role, giving it a `role` and `policy_arn`. This is for the task execution role.
- create a new `aws_iam_role` resource, giving it a name, assume_role_policy, and tags. This is for the app iam role.
- create the log group by creating an `aws_cloudwatch_log_group` resource and giving it a name and tags.
- Create the container definition template by making a `container-definitions.json.tpl` file inside `templates/ecs/`. Paste the template from terraform. Add additional attributes from the AWS docs. Optionally, paste the complete code used.
- Create a new variable called `ecr_image_api` to hold the url for the api ecr image. In the AWS ECR dashboard, copy the URI for the devops ecr image. Paste this into the default attribute with a `:latest` tag.
- Do the same for the proxy ECR image, creating an `ecr_image_proxy` variable.
- Create a variable for the django secret key, calling it `django_secret_key`. Save a default value to sample.tfvars and terraform.tfvars.
- in `ecs.tf`, create a new `template_file` data block for the container definitions template. Point it to the container definitions json file with a `template` attribute. Create a vars block with `app_image`, `proxy_image`, `django_secret_key`, `db_host`, `db_name`, `db_user`, `db_pass`, `log_group_name`, `log_group_region`, and `allowed_hosts`. The first three come from the variables file, the db vars from aws_db_instance, the log group name from aws_cloudwatch_log_group, and the region from data.aws_region. allowed_hosts is set to '*' temporarily. (A sketch of this block appears at the end of this section.)
- create the `aws_ecs_task_definition` resource. Review the docs for the required attributes.
- Rerun `terraform init` to download the new template provider.
- in GitLab, create a new variable `TF_VAR_django_secret_key`.
- in `ecs.tf`, create an `aws_security_group` resource for the ecs service. Connect it to the main vpc. Set it to allow outbound https requests by setting egress to 443 and database access by setting egress to 5432. Allow all internet access to the proxy by setting ingress to 8000 and the cidr_block to 0.0.0.0/0.
- create the `aws_ecs_service` service. Connect it to the cluster via the cluster attribute and set the task definition via the task_definition attribute. Set desired_count, and set launch_type to "FARGATE". Add a network_configuration block with `subnets`, `security_groups`, and `assign_public_ip`.
- Add `aws_security_group.ecs_service.id` to the database rds security group to allow access to the database.
- Push to gitlab, make the merge; all tasks should succeed.
- In aws, go to ecs to view cluster and logs.
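For orientation, a rough sketch of the container definitions data block and task definition described above; resource names, CPU/memory values, and file paths are assumptions based on the course layout, not an exact copy:
data "template_file" "api_container_definitions" {
  template = file("./templates/ecs/container-definitions.json.tpl")

  vars = {
    app_image         = var.ecr_image_api
    proxy_image       = var.ecr_image_proxy
    django_secret_key = var.django_secret_key
    db_host           = aws_db_instance.main.address
    db_name           = aws_db_instance.main.name
    db_user           = aws_db_instance.main.username
    db_pass           = aws_db_instance.main.password
    log_group_name    = aws_cloudwatch_log_group.ecs_task_logs.name
    log_group_region  = data.aws_region.current.name
    allowed_hosts     = "*"    # temporary, replaced later by the load balancer / domain name
  }
}

resource "aws_ecs_task_definition" "api" {
  family                   = "${local.prefix}-api"
  container_definitions    = data.template_file.api_container_definitions.rendered
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 256
  memory                   = 512
  execution_role_arn       = aws_iam_role.task_execution_role.arn
  task_role_arn            = aws_iam_role.app_iam_role.arn

  tags = local.common_tags
}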
In order to use the django cli, a superadmin must be created. The goal is to connect to bastion via ssh and execute commands through it to create a superuser and any other cli tasks.
- get bastion host from GitLab output
- connect to the host via the following shell command:
ssh ec2-user@[bastion_host]
- authenticate with docker with the following command:
$(aws ecr get-login --no-include-email --region us-east-1)
- run the following command to create a new superuser (input email and pass when prompted):
docker run -it \
-e DB_HOST=<DB_HOST> \
-e DB_NAME=recipe \
-e DB_USER=recipeapp \
-e DB_PASS=<DB_PASS> \
<ECR_REPO>:latest \
sh -c "python manage.py wait_for_db && python manage.py createsuperuser"
- test this by going to ECS instance in AWS, get public ip address, go to /admin, and try logging in.
- In aws, add the following permission to the CI policy:
"elasticloadbalancing:*"
- In local, checkout to master, pull origin, create new feature branch.
- Create a new `load_balancer.tf` file inside the `deploy/` directory.
- Create a new `aws_lb` resource with a type of `application`. This specifies that the lb will handle requests at the http level rather than at the network level (tcp, udp). Connect it to the public subnets, set the security groups, and set tags.
- Create a new `aws_lb_target_group` resource to define the group of servers the lb can forward requests to. Define the protocol, vpc, a target type of ip (targets will be assigned via ip address), the port (proxy port), and the path to the health check page.
- Create a listener resource with `aws_lb_listener`. Define `load_balancer_arn`, the port (80), the protocol, and the default action. The default action should forward requests to the target group.
- Create a security group resource for the lb. Connect it to the main vpc. Define ingress and egress blocks. Inbound access should be open to the internet. Outbound should only be allowed to port 8000.
- Allow the task to register with the load balancer. Do this by changing `allowed_hosts` in the container definitions data block (`ecs.tf`) from '*' to `aws_lb.api.dns_name`.
- Go to the ecs service security group and change the ingress block to only allow inbound access from the load balancer (load balancer security group).
- In the api ecs service (the `aws_ecs_service` named `api`), change the subnets to be private and remove `assign_public_ip = true`. Add a new `load_balancer` block in that service and define the target group, container name, and container port. This tells the ecs service to register new tasks with the target group.
- Add a new output block in `outputs.tf` to show the dns name of the load balancer (to access the api endpoint).
- In django `settings.py`, add a block to check if running in aws and, if so, add the hostname to allowed hosts.
- Commit changes, push to remote. Create the merge, ensure all jobs pass. Test the lb by accessing the dns output in the logs.
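A sketch of the load balancer resources described above; resource names, the health check path, and the subnet/security group references are assumptions:
resource "aws_lb" "api" {
  name               = "${local.prefix}-main"
  load_balancer_type = "application"
  subnets            = [aws_subnet.public_a.id, aws_subnet.public_b.id]
  security_groups    = [aws_security_group.lb.id]

  tags = local.common_tags
}

resource "aws_lb_target_group" "api" {
  name        = "${local.prefix}-api"
  protocol    = "HTTP"
  vpc_id      = aws_vpc.main.id
  target_type = "ip"      # targets are registered by ip address (Fargate tasks)
  port        = 8000      # proxy port

  health_check {
    path = "/admin/login/"
  }
}

resource "aws_lb_listener" "api" {
  load_balancer_arn = aws_lb.api.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.api.arn
  }
}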
Warning: I spent a few days debugging various bugs related to this. Ensure EVERYTHING is spelled correctly, make sure bucket names are unique, and remember that the old public-read ACL setup is deprecated, so the methods used in `s3.tf` are an updated version of the course materials to reflect that stupid change.
ECS has temporary storage only, so everything is deleted when a task restarts. If a user uploads an image, it needs to persist, so we use S3 for this.
- Add the following permission to the CI policy:
"s3:*"
- In local, checkout to main, pull, checkout to new feature branch.
- Create a new file `s3.tf`.
- Create a new `aws_s3_bucket` resource. The following code ended up working:
resource "aws_s3_bucket" "app_public_files" {
bucket_prefix = "${local.prefix}-files"
force_destroy = true # allows tf to destroy
}
resource "aws_s3_bucket_ownership_controls" "app_public_files" {
bucket = aws_s3_bucket.app_public_files.id
rule {
object_ownership = "BucketOwnerPreferred"
}
}
resource "aws_s3_bucket_public_access_block" "app_public_files" {
bucket = aws_s3_bucket.app_public_files.id
block_public_acls = false
block_public_policy = false
ignore_public_acls = false
restrict_public_buckets = false
}
- In `ecs/container-definitions.json.tpl`, add `S3_STORAGE_BUCKET_NAME` and `S3_STORAGE_BUCKET_REGION` to the environment variables.
- In `ecs.tf`, add the following lines to the container definitions vars block:
s3_storage_bucket_name = aws_s3_bucket.app_public_files.bucket
s3_storage_bucket_region = data.aws_region.current.name
- Create a new file `deploy/templates/ecs/s3-write-policy.json.tpl` and paste the following policy definition to give ecs access to the bucket:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObjectAcl",
"s3:GetObject",
"s3:ListBucket",
"s3:DeleteObject",
"s3:PutObjectAcl"
],
"Resource": ["${bucket_arn}/*", "${bucket_arn}"]
}
]
}
- In `ecs.tf`, create a new `template_file` data block and connect it to the new template. Set the template and vars attributes, with vars set to a block that includes the bucket arn.
- Create a new `aws_iam_policy` resource for this policy; connect it to the template file data block with the policy attribute.
- Create a new `aws_iam_role_policy_attachment` resource for ecs s3 access. Connect it to the role and policy arn. (A sketch of these three pieces appears at the end of this section.)
- To connect django to the s3 bucket, add the boto3 and django-storages dependencies to `requirements.txt`. In the course, boto3 v1.12.0 and django-storages v1.9.1 were used. Boto is used to interact with the aws s3 api and is needed by django-storages.
- Add the following settings to `settings.py`:
S3_STORAGE_BACKEND = bool(int(os.environ.get('S3_STORAGE_BACKEND', 1))) # toggle s3 off/on
if S3_STORAGE_BACKEND is True:
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage' # set default file storage to aws
AWS_DEFAULT_ACL = 'public-read'
AWS_STORAGE_BUCKET_NAME = os.environ.get('S3_STORAGE_BUCKET_NAME')
AWS_S3_REGION_NAME = os.environ.get('S3_STORAGE_BUCKET_REGION', 'us-east-1')
AWS_QUERYSTRING_AUTH = False # query string authentication false
- In `docker-compose.yml`, add `S3_STORAGE_BACKEND` to the app environment variables.
- Push the changes up to gitlab and ensure the merge pipelines work. Test that image upload works correctly using the api endpoint (and a ModHeader-style browser extension to store tokens).
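A sketch of the three Terraform pieces that wire the bucket policy to the ECS task role; the resource names are assumptions, and `aws_iam_role.app_iam_role` refers to the app role created in the ECS section:
data "template_file" "ecs_s3_write_policy" {
  template = file("./templates/ecs/s3-write-policy.json.tpl")

  vars = {
    bucket_arn = aws_s3_bucket.app_public_files.arn
  }
}

resource "aws_iam_policy" "ecs_s3_access" {
  name        = "${local.prefix}-AppS3AccessPolicy"
  path        = "/"
  description = "Allow access to the app public files S3 bucket"
  policy      = data.template_file.ecs_s3_write_policy.rendered
}

resource "aws_iam_role_policy_attachment" "ecs_s3_access" {
  role       = aws_iam_role.app_iam_role.name
  policy_arn = aws_iam_policy.ecs_s3_access.arn
}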
In the course, a custom domain name was registered in Route53. This domain was then hooked up to terraform. Steps start after domain is registered.
- Add the following permissions to the CI policy in aws:
"acm:DeleteCertificate",
"acm:DescribeCertificate",
"acm:ListTagsForCertificate",
"acm:RequestCertificate",
"acm:AddTagsToCertificate",
"route53:*"
- On local, checkout to main, pull origin, create new feature branch.
- In `variables.tf`, add a new `dns_zone_name` variable. Default it to the registered domain name.
- Create a new `subdomain` variable; set the type and default values. The default values include ordered pairs for production, staging, and dev.
- Create a new `dns.tf` file. Add an `aws_route53_zone` data block to get the zone from route53 based on the domain name.
- Create a new `aws_route53_record` resource to create the record for the load balancer. Set the zone_id, name, type, ttl, and records.
- In order to use https, create a new `aws_acm_certificate` resource. Set the domain_name, validation_method ("DNS"), and tags. Inside, set the lifecycle attribute to a block with `create_before_destroy = true` to keep tf running smoothly when destroying.
- Create a new `aws_route53_record` resource to set the validation cname on the domain in order to validate it. Set the name, type, zone_id, records, and ttl. The name, type, and records come from domain_validation.
- Create a new `aws_acm_certificate_validation` resource to trigger domain ssl validation. Set the certificate_arn and validation_record_fqdns attributes. (A sketch of these dns resources appears at the end of this section.)
- In `load_balancer.tf`, create a new `aws_lb_listener` resource for the https listener settings. Set the port to 443, the protocol to "HTTPS", and the certificate_arn to the validation certificate arn created in `dns.tf`. Set the default action to forward and the target_group_arn to the lb target group.
- Change the http lb listener resource to "redirect" instead of "forward". Remove the target group arn. Add a new redirect block inside the `default_action` block and set the port to 443, the protocol to "HTTPS", and the status_code to "HTTP_301".
- In the load balancer security group, create a new ingress block in addition to the http ingress block and set the from/to_port to 443 to accept https requests. Set the cidr blocks to the wildcard value.
- In `ecs.tf`, in the template file data block for api_container_definitions, change the allowed_hosts variable to get the domain name from route53 (`aws_route53_record.app.fqdn`).
- In the aws_ecs_service resource, add a depends_on attribute referencing the `aws_lb_listener.api_https` resource to make sure the https listener is created first.
- In the `outputs.tf` file, change the value of `api_endpoint` to reference the custom domain name in route53.
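Finally, a rough sketch of the `dns.tf` resources described above; resource names are assumptions, and the index-based `domain_validation_options[0]` access matches the older AWS provider used in the course (newer providers require a for_each over that set):
data "aws_route53_zone" "zone" {
  name = "${var.dns_zone_name}."
}

resource "aws_route53_record" "app" {
  zone_id = data.aws_route53_zone.zone.zone_id
  name    = "${lookup(var.subdomain, terraform.workspace)}.${data.aws_route53_zone.zone.name}"
  type    = "CNAME"
  ttl     = "300"
  records = [aws_lb.api.dns_name]
}

resource "aws_acm_certificate" "cert" {
  domain_name       = aws_route53_record.app.fqdn
  validation_method = "DNS"
  tags              = local.common_tags

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_route53_record" "cert_validation" {
  name    = aws_acm_certificate.cert.domain_validation_options[0].resource_record_name
  type    = aws_acm_certificate.cert.domain_validation_options[0].resource_record_type
  zone_id = data.aws_route53_zone.zone.zone_id
  records = [aws_acm_certificate.cert.domain_validation_options[0].resource_record_value]
  ttl     = "300"
}

resource "aws_acm_certificate_validation" "cert" {
  certificate_arn         = aws_acm_certificate.cert.arn
  validation_record_fqdns = [aws_route53_record.cert_validation.fqdn]
}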