Project Setup in AWS

Follow the steps below to set up the project environment in AWS.

1. Create an AWS EC2 instance

Create a t2.2xlarge EC2 instance with the Amazon Linux 2 AMI (HVM), 96 GB of storage, and a security group that allows inbound traffic on ports 0-10000. While creating the instance, generate an RSA key pair named hive_docker_setup. After the instance is created, a PEM (Privacy-Enhanced Mail) file named hive_docker_setup.pem will be available for download; save it in a secure place on your local machine. This file allows you to SSH into your instance securely, so do not share it with anyone.
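OpenSSH refuses to use a private key whose permissions allow access by other users, so restrict the PEM file before you try to connect. A minimal sketch (the `touch` line just creates a stand-in file for illustration; use the path of your real downloaded key):

```shell
# Stand-in for the downloaded key file; in practice use your real PEM path.
touch hive_docker_setup.pem

# Restrict the key so only the owner can read it; ssh rejects keys with
# looser permissions ("UNPROTECTED PRIVATE KEY FILE" warning).
chmod 400 hive_docker_setup.pem

# Confirm the mode is 400 (owner read-only).
stat -c '%a' hive_docker_setup.pem
```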


2. Connecting to AWS EC2 instance

An AWS EC2 instance has properties that can be used to connect to it securely via SSH. Below are those properties:

  • a default user with username ec2-user
  • Public IPv4 DNS hostname (Instance IP) of the form: ec2-3-108-167-166.ap-south-1.compute.amazonaws.com. We will refer to this as my-aws-ip-dns-hostname from here on. It changes every time you stop and start an EC2 instance.

These properties can be found on the Instances page of the AWS Management Console. You will need three things to connect to the instance: the PEM key pair file (from Step 1), the instance username, and the instance IP. Once you have these, run the command below in a command prompt (Windows users: make sure you have enabled the OpenSSH Client optional feature in Settings):

$ ssh -i "hive_docker_setup.pem" ec2-user@my-aws-ip-dns-hostname

Make sure you have hive_docker_setup.pem file placed in the directory from where you are running the command.
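If you connect often, an SSH client config entry saves retyping the full command. The alias name aws-hive below is a placeholder of our choosing; in practice the entry goes in ~/.ssh/config (the sketch writes to a demo file instead so nothing real is touched):

```shell
# Write a demo SSH config entry; in practice append this to ~/.ssh/config.
cat > ./ssh_config_demo <<'EOF'
Host aws-hive
    HostName my-aws-ip-dns-hostname
    User ec2-user
    IdentityFile ~/hive_docker_setup.pem
EOF

# With this in ~/.ssh/config, the connect command shortens to: ssh aws-hive
grep 'User ec2-user' ./ssh_config_demo
```

Remember that HostName must be updated whenever the instance is stopped and started, since the public DNS hostname changes.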


3. Install Docker on EC2 instance

Once you have successfully connected to the instance, run the commands below to install Docker and set up docker-compose:

$ sudo yum update -y

$ sudo yum install -y docker

$ sudo curl -L "https://github.com/docker/compose/releases/download/1.29.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose

$ sudo chmod +x /usr/local/bin/docker-compose

$ sudo gpasswd -a $USER docker

$ newgrp docker
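The $(uname -s) and $(uname -m) substitutions in the curl command above select the docker-compose binary matching the instance's OS and CPU architecture. A quick illustration of what they expand to:

```shell
# uname -s reports the kernel name, uname -m the machine architecture.
# On Amazon Linux 2 on a t2.2xlarge this yields "docker-compose-Linux-x86_64",
# so curl downloads the matching release asset.
echo "docker-compose-$(uname -s)-$(uname -m)"
```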

4. Copy docker setup files from your local machine to AWS EC2 instance

Place the hive_docker_setup.pem file and the docker_exp folder in your command prompt directory and run:

$ scp -r -i "hive_docker_setup.pem" docker_exp ec2-user@my-aws-ip-dns-hostname:/home/ec2-user/docker_exp

/home/ec2-user/ is your EC2 instance home directory.
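If you want to confirm a transfer arrived intact, comparing checksums on both ends is a simple check. A local sketch (demo_file is a stand-in; in practice, compute the second checksum over SSH on the instance, as the comment notes):

```shell
# Create a stand-in for a transferred file.
printf 'demo payload\n' > demo_file

# Checksum on the sending side.
local_sum="$(sha256sum demo_file | awk '{print $1}')"

# In practice, compute this on the instance, e.g.:
#   ssh -i "hive_docker_setup.pem" ec2-user@my-aws-ip-dns-hostname \
#       "sha256sum docker_exp/some_file" | awk '{print $1}'
remote_sum="$(sha256sum demo_file | awk '{print $1}')"

[ "$local_sum" = "$remote_sum" ] && echo "checksums match"
```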


5. Start the docker containers on EC2 instance

# Start Docker
$ sudo systemctl start docker

# Run the below commands in the EC2 instance terminal
$ sudo chmod -R 755 docker_exp
$ cd docker_exp

# Start the docker containers (runs in the foreground; add -d to run detached)
$ docker-compose up

# Port forwarding to access the services locally. Run this from a new
# terminal on your LOCAL machine, not on the instance
$ ssh -i "hive_docker_setup.pem" ec2-user@my-aws-ip-dns-hostname -o "ServerAliveInterval 30" -L 2081:localhost:2041 -L 4888:localhost:4888 -L 4889:localhost:4889 -L 2080:localhost:2080 -L 8050:localhost:8050 -L 8051:localhost:8051 -L 4141:localhost:4141 -L 4090:localhost:4090 -L 3180:localhost:3180 -L 50075:localhost:50075 -L 50070:localhost:50070 -L 50010:localhost:50010 -L 3077:localhost:3077 -L 4080:localhost:4080 -L 9870:localhost:9870 -L 8188:localhost:8188 -L 9864:localhost:9864 -L 8042:localhost:8042 -L 8088:localhost:8088 -L 8080:localhost:8080 -L 8081:localhost:8081 -L 10000:localhost:10000 -L 6080:localhost:6080 -L 8998:localhost:8998 -L 3306:localhost:3306

# Check status of all running containers and get their ports and names
$ docker ps
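Each -L flag in the port-forwarding command has the form local_port:localhost:remote_port; most entries here map a remote service port to the same local port (the 2081:localhost:2041 entry is an exception). Rather than typing the flags by hand, they can be generated from a port list; a sketch with a subset of the ports:

```shell
# Build the -L forwarding flags from a whitespace-separated port list.
ports="2080 8080 8088 10000 50070"
flags=""
for p in $ports; do
  flags="$flags -L ${p}:localhost:${p}"
done

# The generated flags slot into the ssh command from the step above.
echo "ssh -i hive_docker_setup.pem ec2-user@my-aws-ip-dns-hostname${flags}"
```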

6. Getting into bash shell of different containers

Once you complete Step 5, a number of Docker containers (detailed in docker-compose.yml) will be running, and you can access the bash shell of each container as shown below:

# Sqoop container
$ docker exec -it ra_sqoop bash

# Spark container
$ docker exec -it hdp_spark-master bash

6.1 Accessing MySQL and Hive prompt

# MySQL
$ docker exec -it ra_mysql bash
$ mysql -u root -p    # when prompted, enter the password: example

# Hive
$ docker exec -it ra_hive-server bash
$ hive

7. Add Hive-Spark dependencies

Download the PostgreSQL JDBC driver JAR (postgresql-42.3.1.jar) from the internet. Then copy the JAR from your local machine to the EC2 instance by running the below command in your local terminal:

$ scp -r -i "hive_docker_setup.pem" postgresql-42.3.1.jar ec2-user@my-aws-ip-dns-hostname:/home/ec2-user/

Start the docker containers on AWS and run the below commands:

# 1. Copy the JAR from the home directory to the spark/jars directory in the Spark container
$ docker cp /home/ec2-user/postgresql-42.3.1.jar hdp_spark-master:/spark/jars

# 2. Copy hive-site.xml out of the Hive container and place it in the Spark conf directory (run these from the EC2 home directory)
$ docker cp ra_hive-server:/opt/hive/conf/hive-site.xml /home/ec2-user/
$ docker cp hive-site.xml hdp_spark-master:/spark/conf/hive-site.xml

This step is essential when we process XML data in the Spark container and load the processed data into a table in the Hive container during the data warehouse creation part.


8. Shut down and exit commands

Make sure to run the below commands and stop your EC2 instance once you are done with your job, because a running AWS EC2 instance costs you real money. From the /home/ec2-user/ directory on the EC2 instance:

$ cd docker_exp
$ docker-compose stop
$ cd ..
$ sudo systemctl stop docker

Exit from SSH and then stop your EC2 instance in the AWS Management Console.