- Table of Contents
- SSH Authentication
- AWS Terms
- AWS Account Setup
- Setup NGINX Proxy
- Prepare app for deployment
- Setting up Terraform
- Configure GitLab CI/CD Flow
- Network Configuration
- Configure Database
- Update Bastion Configuration
- Setting up ECS
- Using Bastion
- Create Load Balancer
- Handling Media Uploads with S3
- Configure DNS and HTTPS
SSH keys are used to create secure connections between local machines and remote servers. The private key is stored on the local computer and is never shared; the public key is shared with the external servers. The public key is paired with the private key to establish the connection.
In terminal:
ssh-keygen -t rsa -b 4096 -C "[name of key]"
-t : key type (rsa)
-b : key size in bits (4096)
-C : "comment", or name of the key (ex: isaac@mbp)
This creates a new ssh key pair in the ~/.ssh directory. Retrieve the contents of the public key with:
cat ~/.ssh/[file].pub
- IAM user: a sub-user of the account; it should only have the permissions necessary for that user's role.
- Group: a group of users with a default set of permissions.
- AWS Vault: local software used to securely store AWS credentials and authenticate to the account. It should connect via an IAM user, keeps a session open for a set duration, and stores credentials in the macOS Keychain.
- ARN: Amazon Resource Name, a unique identifier string for an AWS resource; used in policies to grant services access to one another.
- Budget: a Billing setting that lets the user define spending thresholds. A threshold will not stop spending, but the user is notified when it is reached.
- ECR: Elastic Container Registry, the AWS service for storing docker images.
- S3: Simple Storage Service, a file/object storage service.
- DynamoDB: a NoSQL storage solution, used here simply for storing files or state.
- EC2 / Bastion: EC2 is a virtual machine service; an EC2 instance is used to create a bastion server, which is used to connect into the private network for admin purposes.
- RDS: Relational Database Service, the service used to create databases in AWS.
- ECS: Elastic Container Service, used to run the docker containers for the app.
- CloudWatch: used to monitor logs for the application.
It is important to always use an IAM user when managing settings, and to always have MFA set up on both the root and IAM accounts.
- Go to Billing, scroll down, and activate IAM access to billing info.
- To create a policy that requires MFA by default:
- Go to IAM > Policies > Create Policy
- Insert the following JSON:
{ "Version": "2012-10-17", "Statement": [ { "Sid": "AllowViewAccountInfo", "Effect": "Allow", "Action": [ "iam:GetAccountPasswordPolicy", "iam:GetAccountSummary", "iam:ListVirtualMFADevices", "iam:ListUsers" ], "Resource": "*" }, { "Sid": "AllowManageOwnPasswords", "Effect": "Allow", "Action": ["iam:ChangePassword", "iam:GetUser"], "Resource": "arn:aws:iam::*:user/${aws:username}" }, { "Sid": "AllowManageOwnAccessKeys", "Effect": "Allow", "Action": [ "iam:CreateAccessKey", "iam:DeleteAccessKey", "iam:ListAccessKeys", "iam:UpdateAccessKey" ], "Resource": "arn:aws:iam::*:user/${aws:username}" }, { "Sid": "AllowManageOwnSigningCertificates", "Effect": "Allow", "Action": [ "iam:DeleteSigningCertificate", "iam:ListSigningCertificates", "iam:UpdateSigningCertificate", "iam:UploadSigningCertificate" ], "Resource": "arn:aws:iam::*:user/${aws:username}" }, { "Sid": "AllowManageOwnSSHPublicKeys", "Effect": "Allow", "Action": [ "iam:DeleteSSHPublicKey", "iam:GetSSHPublicKey", "iam:ListSSHPublicKeys", "iam:UpdateSSHPublicKey", "iam:UploadSSHPublicKey" ], "Resource": "arn:aws:iam::*:user/${aws:username}" }, { "Sid": "AllowManageOwnGitCredentials", "Effect": "Allow", "Action": [ "iam:CreateServiceSpecificCredential", "iam:DeleteServiceSpecificCredential", "iam:ListServiceSpecificCredentials", "iam:ResetServiceSpecificCredential", "iam:UpdateServiceSpecificCredential" ], "Resource": "arn:aws:iam::*:user/${aws:username}" }, { "Sid": "AllowManageOwnVirtualMFADevice", "Effect": "Allow", "Action": ["iam:CreateVirtualMFADevice", "iam:DeleteVirtualMFADevice"], "Resource": "arn:aws:iam::*:mfa/${aws:username}" }, { "Sid": "AllowManageOwnUserMFA", "Effect": "Allow", "Action": [ "iam:DeactivateMFADevice", "iam:EnableMFADevice", "iam:ListMFADevices", "iam:ResyncMFADevice" ], "Resource": "arn:aws:iam::*:user/${aws:username}" }, { "Sid": "DenyAllExceptListedIfNoMFA", "Effect": "Deny", "NotAction": [ "iam:CreateVirtualMFADevice", "iam:EnableMFADevice", "iam:GetUser", "iam:ListMFADevices", "iam:ListVirtualMFADevices", "iam:ResyncMFADevice", "sts:GetSessionToken", "iam:ListUsers" ], "Resource": "*", "Condition": { "BoolIfExists": { "aws:MultiFactorAuthPresent": "false" } } } ] }
- Give it a name and a description, then create it.
- Create a group by going to Groups > Create Group.
  - For an admin user, select the AdministratorAccess policy and the newly created MFA policy.
- To create an IAM user, go to Users > Add User. Add a name, select console access, and add the user to the group.
- Save the account ID. Log in as the new IAM user (or whoever the new account is for) and enter the account ID, username, and password. Set up MFA. Log out, then log in again to get full permissions.
- In the IAM account, go to IAM > Users > Security Credentials and create a new access key. (Leave the success dialog/window open to copy the secret access key.)
- With aws-vault installed (via Homebrew), enter the following command:
aws-vault add [username of IAM user]
Enter the access key ID and secret access key when prompted. This will create a new vault on the Mac; enter a new password for that as well (if prompted).
- Set up MFA for the CLI by editing the config file. Access the file with the following command:
vim ~/.aws/config
Then add the following to the file:
region=us-east-1
mfa_serial=[ARN for MFA device]
Find the ARN for the device in IAM > Users > Security Credentials; make sure to select the correct device - it should be in a section labeled Assigned MFA Device (or something similar).
- Enable a secure session with the following command:
aws-vault exec [username] --duration=12h
The duration can be configured for any number of hours up to 12.
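Optionally, sanity-check that the session works by running a one-off AWS CLI call through aws-vault (the `--` separates the wrapped command from aws-vault's own flags):
# runs a single AWS CLI call inside a temporary credential session
aws-vault exec [username] -- aws sts get-caller-identity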
Creating a budget tells AWS to notify the user via email when a budget threshold is reached. It is possible to go past this threshold.
- Go to Billing > Budgets, create custom cost budget.
- Name it, set it to monthly, make it fixed, and set the budget amount.
- Create alerts to notify users when the account has used up a certain amount of the budget.
The proxy server exists to help django serve static files. It acts as a web server, making it more efficient to serve these files.
- create new GitLab project
- disable public pipelines; add protected branch `*-release` to ensure all branches flagged as `-release` can only be created/modified by certain users (maintainers). Add `*-release` to the list of protected tags.
- clone the project via ssh to local. The passphrase is needed if it's set.
- create a new ECR repo; this is where the dynamic containers will be created. Enable `scan on push`.
- Create a custom aws policy with the json below:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["ecr:*"],
"Resource": "arn:aws:ecr:us-east-1:*:repository/recipe-app-api-proxy"
},
{
"Effect": "Allow",
"Action": ["ecr:GetAuthorizationToken"],
"Resource": "*"
}
]
}
- create a new IAM user with the newly created policy; this will be the "user" that is used for dynamic ECR tasks. Create an access key.
- create the following protected/masked variables in gitlab:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- ECR_REPO (uri of the repo)
- create a new branch called `feature/nginx-proxy` in the local project
- create a `default.conf.tpl` file; define the proxy server config by setting the port and the location directories (a sketch of this template appears at the end of this section)
- create (copy) the `uwsgi_params` file and include it in the base location of the proxy server config. Additionally, set the `client_max_body_size` variable on the base location to define the max file upload size (10M)
- create an `entrypoint.sh` command file to substitute the environment variables into the proxy config and turn off nginx's daemon mode, forcing it to run in the foreground
- create the base image with a `Dockerfile`. Build the image.
- understand the CI/CD pipeline with GitLab
- explains branch flow: https://docs.gitlab.com/ee/topics/gitlab_flow.html
- predefined variables: https://docs.gitlab.com/ee/ci/variables/predefined_variables.html
- `.gitlab-ci.yml` explainer: https://docs.gitlab.com/ee/ci/yaml/#rules
- create the `.gitlab-ci.yml` file to define the services, stages, and jobs - specifically the Build and Push stages
- define the jobs for the Build stage, including scripts and artifacts (saved images)
- define jobs for Push Dev (Push) stage, including scripts and rules
- define jobs for Push Release (Push) stage, including scripts and rules
- push to origin by setting upstream. Ensure this CI/CD job passes.
- create new merge request. Merge to main. Ensure this CI/CD job passes. This should have created an ECR image (named dev).
- create new minor version branch with release flag (ex: 1.0-release)
- create new patch version branch with release flag (ex: 1.0.0-release). Ensure this CI/CD job passes. This should have created a new ECR image with version name and "latest" (ex: 1.0.0, latest)
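For reference, a minimal sketch of what the `default.conf.tpl` described above might contain; the `LISTEN_PORT`, `APP_HOST`, and `APP_PORT` variable names are assumptions (they get substituted into the template when the container starts):
server {
    listen ${LISTEN_PORT};

    location /static {
        # static files collected by Django are served directly by nginx
        alias /vol/static;
    }

    location / {
        # everything else is forwarded to the uWSGI socket of the app container
        uwsgi_pass           ${APP_HOST}:${APP_PORT};
        include              /etc/nginx/uwsgi_params;
        client_max_body_size 10M;
    }
}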
Terms:
- uWSGI: an application server implementing WSGI (the Web Server Gateway Interface); it allows the python app to run behind a web server in production.
Steps:
- create project on GitLab, or get project that is already created/forked, etc. Clone to local.
- change general settings to only allow members to configure pipelines. Uncheck public pipelines.
- on local, checkout to new branch (ex: feature/prod-setup)
- add uwsgi to requirements, create scripts directory
- in `scripts/`, create `entrypoint.sh` to collect the static files, migrate the db, and create the uWSGI socket that connects to the proxy; configure the workers to fit needs (a sketch of this script appears at the end of this section)
- in the `Dockerfile`, set the scripts directory, copy it into the image, change its mode to executable, and create the entrypoint command
- add `vol/` and `deploy/` to `.dockerignore`
- build the docker image
- prepend `/static/` to both STATIC_URL and MEDIA_URL in `settings.py`
- configure env variables in django: set SECRET_KEY and DEBUG, and extend ALLOWED_HOSTS in `settings.py`. Make sure DEBUG is set in `docker-compose.yml`
- test config by running:
docker-compose up
- create a new `docker-compose-proxy.yml` file; copy the contents of the original compose file, set the new volume, remove the commands, and create a proxy service that connects to the proxy image; create the volumes
- build the proxy image by navigating to the proxy project and running:
docker build -t proxy .
- test the main app by running:
docker-compose -f docker-compose-proxy.yml up
- ensure the service is accessible on the specified port (ex: 8000), and ensure no service is throwing errors
- push to GitLab, merge to main
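A minimal sketch of the `scripts/entrypoint.sh` described above; the socket port, worker count, and `app.wsgi` module name are assumptions, so adjust them to the project:
#!/bin/sh

set -e

python manage.py collectstatic --noinput
python manage.py wait_for_db
python manage.py migrate

# expose a uWSGI socket for the nginx proxy instead of binding an HTTP port
uwsgi --socket :9000 --workers 4 --master --enable-threads --module app.wsgi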
Terraform needs the following AWS services to function:
- an S3 bucket to store its state and act as a single source of truth
- a DynamoDB table to handle the TF state lock
- an ECR repo to push docker images to
Terraform is platform agnostic; make sure to pin versions where possible.
- create an S3 bucket, block all public access, enable versioning
- tf will store all of the infra data in this bucket
- create a DynamoDB table with a partition key named LockID
- this prevents tf from running more than once at the same time; tf uses it to create a lock
- create a new ECR repo with the same name as the git repo (per convention)
- make sure you're on the master branch locally and up to date with origin. add the tf state files to .gitignore
- create a `deploy/` directory and create `main.tf`. (terraform merges all files ending in `.tf`, so order and naming don't matter.)
- inside `main.tf`, create the `terraform {}` block to define the S3 backend, optionally adding the DynamoDB table if you only want one tf run at a time. Then create the `provider {}` block to pin the version of the AWS provider to use.
- create a new `docker-compose.yml` file inside `deploy/` to define the docker image for terraform. For the environment, make sure the aws credentials are pulled dynamically from the local machine (aws-vault).
- after opening a session in aws-vault, initialize terraform with (optionally via a make command):
docker-compose -f deploy/docker-compose.yml run --rm terraform init
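As a rough sketch, the `terraform {}` and `provider {}` blocks in `main.tf` end up looking something like the following; the state key and provider version are assumptions, while the bucket and DynamoDB table names match the ones created above:
terraform {
  backend "s3" {
    bucket         = "recipe-app-api-devops-tfstate"         # S3 bucket created earlier
    key            = "recipe-app.tfstate"                     # assumed state file name
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "recipe-app-api-devops-tf-state-lock"    # optional: enforces a single run at a time
  }
}

provider "aws" {
  region  = "us-east-1"
  version = "~> 2.54.0"   # pin the AWS provider version (syntax used by Terraform 0.12)
}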
- create a `bastion.tf` file. Create an `ami` data block to define the image. In EC2, click "Launch Instance" and select Amazon Linux, then copy the AMI ID. Get the name of the ec2 image from aws by taking the ami-id, going to Images > AMIs, inserting the id in the search field, and clicking on the result to show the AMI Name. Paste the name into the filter "values[]" list. Replace the numbers between `2.0` and `-x86` with a star to make a wildcard, forcing it to get the latest version. Create a resource block to define the instance to be created (see aws instance types).
- optionally format the tf files:
docker-compose -f deploy/docker-compose.yml run --rm terraform fmt
- optionally validate tf files:
docker-compose -f deploy/docker-compose.yml run --rm terraform validate
- apply terraform to the aws infrastructure with the following two commands:
# shows changes that will be made
docker-compose -f deploy/docker-compose.yml run --rm terraform plan
# apply changes
docker-compose -f deploy/docker-compose.yml run --rm terraform apply
- verify instance is created in EC2 dashboard. once finished optionally destroy ec2 instance:
docker-compose -f deploy/docker-compose.yml run --rm terraform destroy
- create new tf workspace.
- workspaces are ways to manage different env within the same aws account. (ex: dev, stage, prod)
# list available workspaces
docker-compose -f deploy/docker-compose.yml run --rm terraform workspace list
# create new workspace named dev
docker-compose -f deploy/docker-compose.yml run --rm terraform workspace new dev
- to create standard variables used throughout the tf files, create `variables.tf`. Create a "prefix" variable to prefix all workspaces and identify the project.
- create a locals block in `main.tf` to store dynamic variables. Create a prefix local var to identify the different workspaces.
- create a `common_tags` item in the locals block. Add tags to resource blocks using the `merge()` function. View the tags in aws by selecting the ec2 instance and selecting the "Tags" tab at the bottom of the window. Example tags are "prefix", "project", "contact".
- finish by running commit, tf plan, tf apply, git push, and creating a new merge request.
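A sketch of the prefix variable, locals block, and `merge()` tag usage described above; the variable defaults are placeholders:
# variables.tf
variable "prefix" {
  default = "raa"                      # short project prefix (placeholder)
}

variable "project" {
  default = "recipe-app-api-devops"
}

variable "contact" {
  default = "email@example.com"        # placeholder contact tag
}

# main.tf
locals {
  prefix = "${var.prefix}-${terraform.workspace}"   # e.g. raa-dev, raa-staging

  common_tags = {
    Environment = terraform.workspace
    Project     = var.project
    Owner       = var.contact
    ManagedBy   = "Terraform"
  }
}

# example of tagging a resource, adding a Name on top of the common tags:
# tags = merge(local.common_tags, { Name = "${local.prefix}-bastion" })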
References:
- https://docs.gitlab.com/ee/topics/gitlab_flow.html
- https://docs.gitlab.com/ee/ci/variables/predefined_variables.html
- https://docs.gitlab.com/ee/ci/yaml/
Difference between environment branches and release branches:
- environment branches include dev, stage, prod, and are better suited for applications that need "Rolling deployment of changes to the production env" like websites, services, etc.
- release branches are better suited for software so people can access different versions of the software.
- make sure main is up-to-date, checkout to new feature branch.
- create the `.gitlab-ci.yml` file and define the stages that are going to be required, including:
  - Test and Lint ("run unit tests")
  - Build and Push ("build in docker, push to ECR")
  - Staging Plan
  - Staging Apply ("push to EC2")
  - Production Plan
  - Production Apply ("push to EC2")
  - Destroy
- Create jobs with the same name for each of them. Each will need to specify the `stage`, `script`, and `rules`; test first with a filler echo script.
- Push to origin. Create a new merge request. This will have started a pipeline with the `Test and Lint` and `Validate Terraform` jobs.
- Submit the merge. This will have triggered the next pipeline with the remaining jobs.
- Create/checkout a new production branch (either in the GUI or CLI). This will have started the production pipeline with the jobs `Test and Lint`, `Build and Push`, `Staging Plan`, `Staging Apply`, `Production Plan`, `Production Apply`, and `Destroy`.
- In GitLab, make sure the production branch is protected by going to Settings > Repository > Protected Branches. Only allow Maintainers to perform actions on the branch.
- In local, switch to new feature branch.
- Test/Lint: define the docker & docker-in-docker (dind) image and service respectively. Add a script to install docker-compose. Add a script to run testing and linting (wait_for_db, test, flake8).
- Validate TF: since most jobs will need the tf image, define it in the global scope. Configure the entrypoint to be able to run scripts. Since each job builds a new filesystem, make a script that changes dir to `deploy/`, then runs the tf commands for init, validation, and formatting. Entrypoint code:
entrypoint: # overrides entrypoint to work with gitlab ci-cd
- "/usr/bin/env"
- "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
- Commit/push code to origin. Make merge request to main to test first two jobs. Accept merge request to finish pipeline. Make merge request to production to test first two jobs in production. Accept merge request to finish pipeline.
- In local, change branch to main, pull updates. Create new feature branch.
- Create new IAM user in AWS for GitLab pipeline. Use the following custom policy below. Make sure to change S3 bucket name.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "TerraformRequiredPermissions",
"Effect": "Allow",
"Action": ["ecr:GetAuthorizationToken", "ecr:BatchCheckLayerAvailability", "ec2:*"],
"Resource": "*"
},
{
"Sid": "AllowListS3StateBucket",
"Effect": "Allow",
"Action": "s3:ListBucket",
"Resource": "arn:aws:s3:::recipe-app-api-devops-tfstate"
},
{
"Sid": "AllowS3StateBucketAccess",
"Effect": "Allow",
"Action": ["s3:GetObject", "s3:PutObject"],
"Resource": "arn:aws:s3:::recipe-app-api-devops-tfstate/*"
},
{
"Sid": "LimitEC2Size",
"Effect": "Deny",
"Action": "ec2:RunInstances",
"Resource": "arn:aws:ec2:*:*:instance/*",
"Condition": {
"ForAnyValue:StringNotLike": {
"ec2:InstanceType": ["t2.micro"]
}
}
},
{
"Sid": "AllowECRAccess",
"Effect": "Allow",
"Action": ["ecr:*"],
"Resource": "arn:aws:ecr:us-east-1:*:repository/recipe-app-api-devops"
},
{
"Sid": "AllowStateLockingAccess",
"Effect": "Allow",
"Action": ["dynamodb:PutItem", "dynamodb:DeleteItem", "dynamodb:GetItem"],
"Resource": ["arn:aws:dynamodb:*:*:table/recipe-app-api-devops-tf-state-lock"]
}
]
}
- In GitLab, in Settings > CI/CD > Variables, define `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` with values from the new IAM user. Then get the URI from the devops ECR repo and create the variable `ECR_REPO`.
- Build/Push: in this job, define the same docker image & service as test/lint. Create scripts to install `awscli`, build the docker image, authenticate to aws, push the docker image, tag the image again with the :latest tag, and push that image to represent the latest image.
- Staging/Production Plan: for the planning jobs, define a script that changes to the `deploy/` dir, exports the tf env variable for the repo uri, then runs tf init, workspace select/new for staging/production, and plan. Staging and Production are the same except for the workspace name. Optionally create a shell script, but that is not shown in the course.
- Staging/Production Apply: for the apply jobs, make a script that goes to `deploy/`, exports the same variable as the plan jobs, then runs tf init, workspace select staging/production, and apply -auto-approve. These jobs apply to AWS, so optionally make them manual or automatic depending on the desired workflow. The only difference between staging and production is the workspace name.
- Staging/Production Destroy: for the destroy jobs, make a script that changes to the `deploy/` directory, then runs tf init, workspace select staging/production, and destroy -auto-approve. Auto-approve is used since these will be manual jobs.
- Commit changes, push to origin. Create a merge request. Test the full CI/CD pipeline:
  - Make changes on local in a separate branch, push to origin (creating a new branch if needed)
  - Make a new merge request to main. This will trigger `Testing/Linting` and `TF Validation`.
  - Manager will accept the merge request, triggering the jobs `Test/Linting`, `Build/Push`, `Staging Plan`, `Staging Apply`, and a manual `Destroy`. Once these pass, main will be updated, ECR will contain a new image, and EC2 will contain the bastion staging instance. Once the staging site is no longer needed, it can be destroyed.
  - Manager will merge main into the production branch (if in the CLI, push to origin). This will run the production pipeline: `Test/Lint`, `Build/Push`, `Staging Plan`, `Staging Apply`, `Production Plan`, `Production Apply`, and a manual (blocked) `Destroy`. Ensure an ECR image was created for the production build. Ensure 2 EC2 instances were created - production and staging. Optionally destroy.
- VPC: Virtual Private Cloud, isolates production, staging, and development environments from each other. Restricts access to all network resources, if one is compromised the rest are safe. Everything inside an environment shares the same VPC.
- Subnet: Subnetwork, contained inside vpc, used to run resources and determine access to internet
- Public Subnets: used to give resources access to the internet publicly.
- Private Subnet: runs resources that are used internally, and don't need public access. makes it more secure.
- Gateway: the internet gateway attached to the VPC that allows public subnets to send traffic to and receive traffic from the internet.
- NAT Gateway: Network Address Translation Gateway, allows private subnets to have outbound access to the internet, but blocks the internet from having inbound access to them.
- Availability Zones: separate data centers within a region. Spreading resources across multiple AZs creates application resiliency: if one zone goes down, another zone can take over and handle the traffic. Multiple AZs are required for load balancers.
- CIDR Block: indicates which IP addresses will be available in the network. View this cheatsheet for determining the short code.
- EIP: Elastic IP, a way of creating a static IP address in the AWS VPC.
- checkout to master, pull origin, checkout to new feature branch.
- create a new `network.tf` file inside `deploy/`.
- create the main VPC resource, including the cidr_block, and enable dns support and hostnames. Add tags.
- create the main internet gateway, connecting it to the main vpc.
- in `main.tf`, create a data block for `aws_region`; this will allow access to info on the current region later.
- create public subnet group 'a'. (a sketch of these resources appears after this list)
  - SUBNET: create the subnet resource with a /24 cidr block. Allow it to map a public ip. Connect it to the main VPC. Set the availability zone. Create tags. The cidr block used was 10.1.1.0/24.
  - ROUTE TABLE: create the public route table and connect it to the vpc.
  - ROUTE TABLE ASSOCIATION: create the association resource and connect it to the public route table and the subnet. This links the route table and the subnet.
  - ROUTE: this makes the subnet publicly accessible. Create the route resource, set the route table id to the public route table, set the destination cidr to 0.0.0.0/0 (signifying public access), and set the gateway id to the main gateway id.
  - EIP: create an eip resource and connect it to the vpc.
  - NAT GATEWAY: create the resource, set the allocation id to the eip id, and set the subnet id to the public 'a' subnet id.
- do the same thing for public subnet group 'b'. Set the cidr_block to a different range for the subnet (increment it by convention); the cidr block used was 10.1.2.0/24.
- create private subnet group 'a'.
  - SUBNET: create the resource with the same /24 cidr block size; the cidr block used was 10.1.10.0/24. Connect it to the vpc, set the availability zone, and create tags.
  - ROUTE TABLE: create the resource and connect it to the vpc.
  - ROUTE TABLE ASSOCIATION: create the resource, connect it to the subnet and the route table.
  - ROUTE: create the resource, connect it to the route table and the public nat gateway, and set the destination cidr to 0.0.0.0/0.
- do the same for private subnet group 'b'; the cidr block used was 10.1.11.0/24.
- commit. push to gitlab and create a merge request, triggering the first pipeline. accept the merge request to trigger the staging pipeline. this should have set up all of the resources; check aws to verify.
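For orientation, a rough sketch of the public subnet group 'a' resources described above; resource names are assumptions, and it presumes `aws_vpc.main` and `aws_internet_gateway.main` from the earlier bullets:
resource "aws_subnet" "public_a" {
  cidr_block              = "10.1.1.0/24"
  map_public_ip_on_launch = true
  vpc_id                  = aws_vpc.main.id
  availability_zone       = "${data.aws_region.current.name}a"
  tags                    = local.common_tags
}

resource "aws_route_table" "public_a" {
  vpc_id = aws_vpc.main.id
  tags   = local.common_tags
}

resource "aws_route_table_association" "public_a" {
  subnet_id      = aws_subnet.public_a.id
  route_table_id = aws_route_table.public_a.id
}

resource "aws_route" "public_internet_access_a" {
  route_table_id         = aws_route_table.public_a.id
  destination_cidr_block = "0.0.0.0/0"              # public access
  gateway_id             = aws_internet_gateway.main.id
}

resource "aws_eip" "public_a" {
  vpc = true    # newer AWS providers use `domain = "vpc"` instead
}

resource "aws_nat_gateway" "public_a" {
  allocation_id = aws_eip.public_a.id
  subnet_id     = aws_subnet.public_a.id
}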
- add permissions to Devops CI IAM user in AWS to perform RDS tasks. Add the following code to the "TerraformRequiredPermissions" statement:
"rds:DeleteDBSubnetGroup",
"rds:CreateDBInstance",
"rds:CreateDBSubnetGroup",
"rds:DeleteDBInstance",
"rds:DescribeDBSubnetGroups",
"rds:DescribeDBInstances",
"rds:ListTagsForResource",
"rds:ModifyDBInstance",
"iam:CreateServiceLinkedRole",
"rds:AddTagsToResource"
- with main branch up to date, checkout to new feature branch.
- in `variables.tf`, create new `db_username` and `db_password` variables and add descriptions to both. These will be used to securely pass the username and password values to tf.
- create a new `database.tf` file and create a new subnet group resource. In the new resource, set the subnet ids to both private subnets (a and b) along with the name and tags of the resource. This adds multiple subnets to the database.
- create a new security group resource. Connect it to the main vpc. Create an ingress block to define the inbound access rules.
- create the db instance. Set attributes including the identifier (name of the instance), name (name of the db), allocated storage in GB, storage type ('gp2' was used), engine and engine version, instance class, subnet group (the main subnet group name), username/password, backup retention period in days, whether there should be multiple availability zones, whether to skip the final snapshot, and the vpc security group ids. Set the tags. (A sketch of this resource appears at the end of this section.)
- read this to see all rds options
- create `outputs.tf`; set an output with the value of `aws_db_instance.main.address`
- create `sample.tfvars` as an example file to store the db username and password variables; this will be committed. Copy that file and create a new `terraform.tfvars`; this will be the main file and will not be committed. It is equivalent to a `.env` file. Test it by running the `terraform plan` command.
- add TF_VAR_db_username and TF_VAR_db_password variables to the GitLab CI/CD variables.
- commit the changes on local. push to origin. create and accept the merge request. the pipeline should have succeeded (if not, check the db engine version or instance class for aws). check aws to make sure all instances are running.
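A sketch of the subnet group and db instance described above; resource names, the engine version, and the security group reference are illustrative assumptions (pick an engine version that is currently supported, and note that newer AWS providers call the `name` argument `db_name`):
resource "aws_db_subnet_group" "main" {
  name       = "${local.prefix}-main"
  subnet_ids = [aws_subnet.private_a.id, aws_subnet.private_b.id]
  tags       = local.common_tags
}

resource "aws_db_instance" "main" {
  identifier              = "${local.prefix}-db"
  name                    = "recipe"               # database name ("db_name" on newer providers)
  allocated_storage       = 20
  storage_type            = "gp2"
  engine                  = "postgres"
  engine_version          = "11.4"
  instance_class          = "db.t2.micro"
  db_subnet_group_name    = aws_db_subnet_group.main.name
  username                = var.db_username
  password                = var.db_password
  backup_retention_period = 0
  multi_az                = false
  skip_final_snapshot     = true
  vpc_security_group_ids  = [aws_security_group.rds.id]

  tags = merge(local.common_tags, { Name = "${local.prefix}-main" })
}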
Steps:
- update policy for CI IAM user in AWS to have the following additions to actions:
"iam:CreateRole",
"iam:GetInstanceProfile",
"iam:DeletePolicy",
"iam:DetachRolePolicy",
"iam:GetRole",
"iam:AddRoleToInstanceProfile",
"iam:ListInstanceProfilesForRole",
"iam:ListAttachedRolePolicies",
"iam:DeleteRole",
"iam:TagRole",
"iam:PassRole",
"iam:GetPolicyVersion",
"iam:GetPolicy",
"iam:CreatePolicyVersion",
"iam:DeletePolicyVersion",
"iam:CreateInstanceProfile",
"iam:DeleteInstanceProfile",
"iam:ListPolicyVersions",
"iam:AttachRolePolicy",
"iam:CreatePolicy",
"iam:RemoveRoleFromInstanceProfile",
"iam:ListRolePolicies"
- Set up EC2 key pairs with local SSH key by going to EC2 > Network and Security > Key Pairs and selecting "Import Key Pair". Add public key contents.
- on the local devops project, checkout to master and pull code from remote. Create a new feature branch. In `deploy/`, add a new directory `templates/bastion/` and create `user-data.sh`. The templates directory is used to store "templates", or scripts, passed on to AWS.
- In the new `user-data.sh` bash file, write a script that installs docker and adds ec2-user to the docker user group so the user can manage docker.
- in `bastion.tf`, reference the new file as `user_data` in the aws_instance resource.
- Create an instance profile for the bastion. An instance profile is assigned to the bastion in order to give it IAM role info. Create the profile by creating a new file inside the `templates/bastion` dir called `instance-profile-policy.json` and pasting the following code:
{
"Version": "2012-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Effect": "Allow"
}
]
}
- in `bastion.tf`, create a new `aws_iam_role` resource and set the assume_role_policy attribute to the new json file. This allows the bastion server to assume the new aws role. Create a new `aws_iam_role_policy_attachment` resource and set the role and policy_arn in order to attach the aws policy to the new role.
- create a new `aws_iam_instance_profile` resource and give it the name and role, using the role resource defined before. Set the `iam_instance_profile` attribute in the aws_instance resource to the new instance profile resource.
- create a new variable named `bastion_key_name` in `variables.tf` (it needs to match the name of the aws ec2 key pair). Add the key_name attribute to the aws_instance resource in `bastion.tf`, then set the subnet_id attribute to one of the public subnets.
- create a security group to only allow inbound access to the bastion via port 22 (SSH) and outbound access via 443, 80, and 5432. Do this by creating a new `aws_security_group` resource in `bastion.tf` and connecting it to the vpc. Create ingress and egress blocks to set the inbound and outbound rules. In the aws_instance resource, connect the security group by setting the vpc_security_group_ids attribute to the new resource.
- in `database.tf`, in the aws_security_group resource, inside the ingress block, set the `security_groups` attribute to the newly created security group above.
- in `outputs.tf`, create a new output called `bastion_host` and set the value to the bastion dns name in order to see the host after it has been created by TF.
- commit the changes. push to GitLab. create and accept the merge request. after the pipeline succeeds, check EC2 in aws to make sure the bastion instance is running. Check the bastion by connecting to it from the local terminal by running:
ssh ec2-user@[host name]
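For reference, a rough sketch of how the `bastion.tf` ami data block and instance described above fit together (it omits the IAM role, instance profile, and security group resources; resource names are assumptions based on the course layout):
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-2.0.*-x86_64-gp2"]   # wildcard between 2.0 and -x86 for latest version
  }
}

resource "aws_instance" "bastion" {
  ami                    = data.aws_ami.amazon_linux.id
  instance_type          = "t2.micro"
  user_data              = file("./templates/bastion/user-data.sh")
  iam_instance_profile   = aws_iam_instance_profile.bastion.name
  key_name               = var.bastion_key_name
  subnet_id              = aws_subnet.public_a.id
  vpc_security_group_ids = [aws_security_group.bastion.id]

  tags = merge(local.common_tags, { Name = "${local.prefix}-bastion" })
}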
ECS (Elastic Container Service) is used to run and manage live docker containers. It can be used to create clusters of services for the project.
- Task execution role: a role that is used for starting a service (starting the service and giving it permissions)
- Log group: groups all the logs for particular task into one place.
- Container definition template: a JSON file which contains details for the container so AWS knows how to run it in production.
- ECS Service: actual service that runs the docker container
- add the following json to the CI IAM user policy actions:
"logs:CreateLogGroup",
"logs:DeleteLogGroup",
"logs:DescribeLogGroups",
"logs:ListTagsLogGroup",
"logs:TagLogGroup",
"ecs:DeleteCluster",
"ecs:CreateService",
"ecs:UpdateService",
"ecs:DeregisterTaskDefinition",
"ecs:DescribeClusters",
"ecs:RegisterTaskDefinition",
"ecs:DeleteService",
"ecs:DescribeTaskDefinition",
"ecs:DescribeServices",
"ecs:CreateCluster"
- In local, checkout to main, pull origin, checkout to new feature branch.
- Create a new file `ecs.tf` in the `deploy/` directory. Create the cluster by creating an `aws_ecs_cluster` resource and giving it a name.
- create a new file in `templates/` called `ecs/task-exec-role.json` and paste the following json. This will allow the ecs task to retrieve the image from ecr, put logs in the log stream, and create a new log stream. This creates the task execution role.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ecr:GetAuthorizationToken",
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage",
"logs:CreateLogStream",
"logs:PutLogEvents"
],
"Resource": "*"
}
]
}
- create a new file in `templates/ecs/` called `assume-role-policy.json` and paste the json below. This allows the ecs task to assume the defined role.
{
"Version": "2012-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Principal": {
"Service": "ecs-tasks.amazonaws.com"
},
"Effect": "Allow"
}
]
}
- In `ecs.tf`, define the `aws_iam_policy` for the task execution role policy and give it a name, path, description, and policy file path. This will create a new policy in aws for the task execution role given the json.
- Create a new `aws_iam_role` resource that points to the assume role policy (by giving it a name and `assume_role_policy`).
- create an `aws_iam_role_policy_attachment` resource to attach the policy to the role, giving it a `role` and `policy_arn`. This is for the task execution role.
- create a new `aws_iam_role` resource, giving it a name, assume_role_policy, and tags. This is for the app iam role.
- create the log group by creating an `aws_cloudwatch_log_group` resource and giving it a name and tags.
- Create the container definition template by making a `container-definitions.json.tpl` file inside `templates/ecs/`. Paste the template from terraform. Add additional attributes from the AWS docs. Optionally, paste the complete code used.
- Create a new variable called `ecr_image_api` to hold the url for the api ecr image. In the AWS ECR dashboard, copy the URI for the devops ecr image. Paste this into the default attribute with a `:latest` tag.
- Do the same for the proxy ECR image, creating an `ecr_image_proxy` variable.
- Create a variable for the django secret key, calling it `django_secret_key`. Save a default value to sample.tfvars and terraform.tfvars.
- in `ecs.tf`, create a new `template_file` data block for the container definitions template. Point it to the container definitions json file with a `template` attribute. Create a vars block with `app_image`, `proxy_image`, `django_secret_key`, `db_host`, `db_name`, `db_user`, `db_pass`, `log_group_name`, `log_group_region`, and `allowed_hosts`. The first three come from the variables file, the db vars from aws_db_instance, the log group name from aws_cloudwatch_log_group, and the region from data.aws_region. allowed_hosts is set to '*' temporarily. (A sketch of this block appears at the end of this section.)
- create the `aws_ecs_task_definition` resource. Review the docs for the required attributes.
- Rerun `terraform init` to download the new template provider.
- in GitLab, create a new variable `TF_VAR_django_secret_key`.
- in `ecs.tf`, create an `aws_security_group` resource for the ecs service. Connect it to the main vpc. Set it to allow outbound https requests by setting egress to 443 and database access by setting egress to 5432. Allow all internet access to the proxy by setting ingress to 8000 and the cidr_block to 0.0.0.0/0.
- create the `aws_ecs_service` service. Connect it to the cluster via the cluster attribute and set the task definition via the task_definition attribute. Set desired_count, and set launch_type to "FARGATE". Add a network_configuration block with `subnets`, `security_groups`, and `assign_public_ip`.
- Add `aws_security_group.ecs_service.id` to the database rds security group to allow access to the database.
- Push to gitlab, make the merge; all tasks should succeed.
- In aws, go to ecs to view cluster and logs.
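For orientation, a rough sketch of the container definitions data block and task definition described above; resource names, CPU/memory values, and file paths are assumptions based on the course layout, not an exact copy:
data "template_file" "api_container_definitions" {
  template = file("./templates/ecs/container-definitions.json.tpl")

  vars = {
    app_image         = var.ecr_image_api
    proxy_image       = var.ecr_image_proxy
    django_secret_key = var.django_secret_key
    db_host           = aws_db_instance.main.address
    db_name           = aws_db_instance.main.name
    db_user           = aws_db_instance.main.username
    db_pass           = aws_db_instance.main.password
    log_group_name    = aws_cloudwatch_log_group.ecs_task_logs.name
    log_group_region  = data.aws_region.current.name
    allowed_hosts     = "*"    # temporary, replaced later by the load balancer / domain name
  }
}

resource "aws_ecs_task_definition" "api" {
  family                   = "${local.prefix}-api"
  container_definitions    = data.template_file.api_container_definitions.rendered
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 256
  memory                   = 512
  execution_role_arn       = aws_iam_role.task_execution_role.arn
  task_role_arn            = aws_iam_role.app_iam_role.arn

  tags = local.common_tags
}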
In order to use the django cli, a superadmin must be created. The goal is to connect to bastion via ssh and execute commands through it to create a superuser and any other cli tasks.
- get bastion host from GitLab output
- connect to the host via the following shell command:
ssh ec2-user@[bastion_host]
- authenticate with docker with the following command:
$(aws ecr get-login --no-include-email --region us-east-1)
- run the following command to create a new superuser (input email and pass when prompted):
docker run -it \
-e DB_HOST=<DB_HOST> \
-e DB_NAME=recipe \
-e DB_USER=recipeapp \
-e DB_PASS=<DB_PASS> \
<ECR_REPO>:latest \
sh -c "python manage.py wait_for_db && python manage.py createsuperuser"
- test this by going to ECS instance in AWS, get public ip address, go to /admin, and try logging in.
- In aws, add the following permission to the CI policy:
"elasticloadbalancing:*"
- In local, checkout to master, pull origin, create new feature branch.
- Create a new `load_balancer.tf` file inside the `deploy/` directory.
- Create a new `aws_lb` resource with a type of `application`. This specifies that the lb will handle requests at the http level rather than at the network level (tcp, udp). Connect it to the public subnets, set the security groups, and set tags.
- Create a new `aws_lb_target_group` resource to define the group of servers the lb can forward requests to. Define the protocol, vpc, a target type of ip (targets will be assigned via ip address), the port (proxy port), and the path to the health check page.
- Create a listener resource with `aws_lb_listener`. Define `load_balancer_arn`, the port (80), the protocol, and the default action. The default action should forward requests to the target group.
- Create a security group resource for the lb. Connect it to the main vpc. Define ingress and egress blocks. Inbound access should be open to the internet. Outbound should only be allowed to port 8000.
- Allow the task to register with the load balancer. Do this by changing `allowed_hosts` in the container definitions data block (`ecs.tf`) from '*' to `aws_lb.api.dns_name`.
- Go to the ecs service security group and change the ingress block to only allow inbound access from the load balancer (load balancer security group).
- In the api ecs service (the `aws_ecs_service` named `api`), change the subnets to be private and remove `assign_public_ip = true`. Add a new `load_balancer` block in that service and define the target group, container name, and container port. This tells the ecs service to register new tasks with the target group.
- Add a new output block in `outputs.tf` to show the dns name of the load balancer (to access the api endpoint).
- In django `settings.py`, add a block to check if running in aws and, if so, add the hostname to allowed hosts.
- Commit changes, push to remote. Create the merge, ensure all jobs pass. Test the lb by accessing the dns output in the logs.
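A sketch of the load balancer resources described above; resource names, the health check path, and the subnet/security group references are assumptions:
resource "aws_lb" "api" {
  name               = "${local.prefix}-main"
  load_balancer_type = "application"
  subnets            = [aws_subnet.public_a.id, aws_subnet.public_b.id]
  security_groups    = [aws_security_group.lb.id]

  tags = local.common_tags
}

resource "aws_lb_target_group" "api" {
  name        = "${local.prefix}-api"
  protocol    = "HTTP"
  vpc_id      = aws_vpc.main.id
  target_type = "ip"      # targets are registered by ip address (Fargate tasks)
  port        = 8000      # proxy port

  health_check {
    path = "/admin/login/"
  }
}

resource "aws_lb_listener" "api" {
  load_balancer_arn = aws_lb.api.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.api.arn
  }
}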
Warning: I spent a few days debugging various bugs related to this. Ensure EVERYTHING is spelled correctly, make sure bucket names are unique, and remember that the old public-read ACL setup is deprecated, so the methods used in `s3.tf` are an updated version of the course materials to reflect that stupid change.
ECS has temporary storage only, so everything is deleted when a task restarts. If a user uploads an image, it needs to persist, so we use S3 for this.
- Add the following permission to the CI policy:
"s3:*"
- In local, checkout to main, pull, checkout to new feature branch.
- Create a new file `s3.tf`.
- Create a new `aws_s3_bucket` resource. The following code ended up working:
resource "aws_s3_bucket" "app_public_files" {
bucket_prefix = "${local.prefix}-files"
force_destroy = true # allows tf to destroy
}
resource "aws_s3_bucket_ownership_controls" "app_public_files" {
bucket = aws_s3_bucket.app_public_files.id
rule {
object_ownership = "BucketOwnerPreferred"
}
}
resource "aws_s3_bucket_public_access_block" "app_public_files" {
bucket = aws_s3_bucket.app_public_files.id
block_public_acls = false
block_public_policy = false
ignore_public_acls = false
restrict_public_buckets = false
}
- In `ecs/container-definitions.json.tpl`, add `S3_STORAGE_BUCKET_NAME` and `S3_STORAGE_BUCKET_REGION` to the environment variables.
- In `ecs.tf`, add the following lines to the container definitions vars block:
s3_storage_bucket_name = aws_s3_bucket.app_public_files.bucket
s3_storage_bucket_region = data.aws_region.current.name
- Create a new file `deploy/templates/ecs/s3-write-policy.json.tpl` and paste the following policy definition to give ecs access to the bucket:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObjectAcl",
"s3:GetObject",
"s3:ListBucket",
"s3:DeleteObject",
"s3:PutObjectAcl"
],
"Resource": ["${bucket_arn}/*", "${bucket_arn}"]
}
]
}
- In `ecs.tf`, create a new `template_file` data block and connect it to the new template. Set the template and vars attributes, with vars set to a block that includes the bucket arn.
- Create a new `aws_iam_policy` resource for this policy; connect it to the template file data block with the policy attribute.
- Create a new `aws_iam_role_policy_attachment` resource for ecs s3 access. Connect it to the role and policy arn. (A sketch of these three pieces appears at the end of this section.)
- To connect django to the s3 bucket, add the boto3 and django-storages dependencies to `requirements.txt`. In the course, boto3 v1.12.0 and django-storages v1.9.1 were used. Boto is used to interact with the aws s3 api and is needed by django-storages.
- Add the following settings to `settings.py`:
S3_STORAGE_BACKEND = bool(int(os.environ.get('S3_STORAGE_BACKEND', 1))) # toggle s3 off/on
if S3_STORAGE_BACKEND is True:
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage' # set default file storage to aws
AWS_DEFAULT_ACL = 'public-read'
AWS_STORAGE_BUCKET_NAME = os.environ.get('S3_STORAGE_BUCKET_NAME')
AWS_S3_REGION_NAME = os.environ.get('S3_STORAGE_BUCKET_REGION', 'us-east-1')
AWS_QUERYSTRING_AUTH = False # query string authentication false
- In `docker-compose.yml`, add `S3_STORAGE_BACKEND` to the app environment variables.
- Push the changes up to gitlab and ensure the merge pipelines work. Test that image upload works correctly using the api endpoint (and a ModHeader-style browser extension to store tokens).
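A sketch of the three Terraform pieces that wire the bucket policy to the ECS task role; the resource names are assumptions, and `aws_iam_role.app_iam_role` refers to the app role created in the ECS section:
data "template_file" "ecs_s3_write_policy" {
  template = file("./templates/ecs/s3-write-policy.json.tpl")

  vars = {
    bucket_arn = aws_s3_bucket.app_public_files.arn
  }
}

resource "aws_iam_policy" "ecs_s3_access" {
  name        = "${local.prefix}-AppS3AccessPolicy"
  path        = "/"
  description = "Allow access to the app public files S3 bucket"
  policy      = data.template_file.ecs_s3_write_policy.rendered
}

resource "aws_iam_role_policy_attachment" "ecs_s3_access" {
  role       = aws_iam_role.app_iam_role.name
  policy_arn = aws_iam_policy.ecs_s3_access.arn
}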
In the course, a custom domain name was registered in Route53. This domain was then hooked up to terraform. Steps start after domain is registered.
- Add the following permissions to the CI policy in aws:
"acm:DeleteCertificate",
"acm:DescribeCertificate",
"acm:ListTagsForCertificate",
"acm:RequestCertificate",
"acm:AddTagsToCertificate",
"route53:*"
- On local, checkout to main, pull origin, create new feature branch.
- In `variables.tf`, add a new `dns_zone_name` variable. Default it to the registered domain name.
- Create a new `subdomain` variable; set the type and default values. The default values include ordered pairs for production, staging, and dev.
- Create a new `dns.tf` file. Add an `aws_route53_zone` data block to get the zone from route53 based on the domain name.
- Create a new `aws_route53_record` resource to create the record for the load balancer. Set the zone_id, name, type, ttl, and records.
- In order to use https, create a new `aws_acm_certificate` resource. Set the domain_name, validation_method ("DNS"), and tags. Inside, set the lifecycle attribute to a block with `create_before_destroy = true` to keep tf running smoothly when destroying.
- Create a new `aws_route53_record` resource to set the validation cname on the domain in order to validate it. Set the name, type, zone_id, records, and ttl. The name, type, and records come from domain_validation.
- Create a new `aws_acm_certificate_validation` resource to trigger domain ssl validation. Set the certificate_arn and validation_record_fqdns attributes. (A sketch of these dns resources appears at the end of this section.)
- In `load_balancer.tf`, create a new `aws_lb_listener` resource for the https listener settings. Set the port to 443, the protocol to "HTTPS", and the certificate_arn to the validation certificate arn created in `dns.tf`. Set the default action to forward and the target_group_arn to the lb target group.
- Change the http lb listener resource to "redirect" instead of "forward". Remove the target group arn. Add a new redirect block inside the `default_action` block and set the port to 443, the protocol to "HTTPS", and the status_code to "HTTP_301".
- In the load balancer security group, create a new ingress block in addition to the http ingress block and set the from/to_port to 443 to accept https requests. Set the cidr blocks to the wildcard value.
- In `ecs.tf`, in the template file data block for api_container_definitions, change the allowed_hosts variable to get the domain name from route53 (`aws_route53_record.app.fqdn`).
- In the aws_ecs_service resource, add a depends_on attribute referencing the `aws_lb_listener.api_https` resource to make sure the https listener is created first.
- In the `outputs.tf` file, change the value of `api_endpoint` to reference the custom domain name in route53.
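Finally, a rough sketch of the `dns.tf` resources described above; resource names are assumptions, and the index-based `domain_validation_options[0]` access matches the older AWS provider used in the course (newer providers require a for_each over that set):
data "aws_route53_zone" "zone" {
  name = "${var.dns_zone_name}."
}

resource "aws_route53_record" "app" {
  zone_id = data.aws_route53_zone.zone.zone_id
  name    = "${lookup(var.subdomain, terraform.workspace)}.${data.aws_route53_zone.zone.name}"
  type    = "CNAME"
  ttl     = "300"
  records = [aws_lb.api.dns_name]
}

resource "aws_acm_certificate" "cert" {
  domain_name       = aws_route53_record.app.fqdn
  validation_method = "DNS"
  tags              = local.common_tags

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_route53_record" "cert_validation" {
  name    = aws_acm_certificate.cert.domain_validation_options[0].resource_record_name
  type    = aws_acm_certificate.cert.domain_validation_options[0].resource_record_type
  zone_id = data.aws_route53_zone.zone.zone_id
  records = [aws_acm_certificate.cert.domain_validation_options[0].resource_record_value]
  ttl     = "300"
}

resource "aws_acm_certificate_validation" "cert" {
  certificate_arn         = aws_acm_certificate.cert.arn
  validation_record_fqdns = [aws_route53_record.cert_validation.fqdn]
}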