High Availability Hadoop in Docker with JournalNode Quorum and Zookeeper, with Kubernetes in mind. Based on the official guide Hadoop HA with JournalNode Quorum.
- Java 9
- Hadoop 3.2.1
For the sake of simplicity we will use 1 Zookeeper, 1 JournalNode, 2 NameNodes and 2 DataNodes, so that everything can run on a single Docker host.
It goes without saying that you should adjust these numbers in production.
- Create a common docker network
docker network create hadoop
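If you script this setup, `docker network create` fails when the network already exists. A small guard avoids that; this is a sketch, with `DOCKER` parameterized only so the logic can be exercised without a daemon:

```shell
# Sketch: create the "hadoop" network only if it is missing.
# DOCKER defaults to the real docker binary.
DOCKER="${DOCKER:-docker}"

ensure_network() {
  # $1 = network name; `network ls -q -f name=...` prints nothing when absent
  if [ -z "$("$DOCKER" network ls -q -f "name=$1")" ]; then
    "$DOCKER" network create "$1"
  fi
}
```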
- Start Zookeeper
docker run --net=hadoop --name zk-1 --restart always -d zookeeper
- Start JournalNode
docker run -d --name=jn-1 -e "NNODE1_IP=nn1" -e "NNODE2_IP=nn2" -e "JN_IPS=jn-1:8485" -e "ZK_IPS=zk-1:2181" --net=hadoop -v /tmp/hadoop-jn:/mnt/hadoop tbont/hadoop-ha-docker /etc/bootstrap.sh -d journalnode
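A single JournalNode is enough for this demo, but HDFS only commits an edit once a majority of JournalNodes acknowledge it, so one node is a single point of failure. The majority rule (a sketch of the arithmetic, not part of the image) is simply:

```shell
# A quorum of N JournalNodes needs floor(N/2)+1 acknowledgements and
# therefore tolerates floor((N-1)/2) failures; hence the odd node counts.
quorum_majority() {
  echo $(( $1 / 2 + 1 ))
}
```

So 3 JournalNodes survive 1 failure and 5 survive 2, while 4 nodes tolerate no more failures than 3 do.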
- Start the NameNode host containers (separate terminals)
- first
docker run --hostname=nn1 --name=nn1 -it -e "NNODE1_IP=nn1" -e "NNODE2_IP=nn2" -e "JN_IPS=jn-1:8485" -e "ZK_IPS=zk-1:2181" --net=hadoop -v /tmp/hadoop-nn1:/mnt/hadoop tbont/hadoop-ha-docker bash
- second
docker run --hostname=nn2 --name=nn2 -it -e "NNODE1_IP=nn1" -e "NNODE2_IP=nn2" -e "JN_IPS=jn-1:8485" -e "ZK_IPS=zk-1:2181" --net=hadoop -v /tmp/hadoop-nn2:/mnt/hadoop -v /tmp/hadoop-nn1:/mnt/shared/nn1 tbont/hadoop-ha-docker bash
- Format the active NameNode and sync the initial state to the standby NameNode (separate terminals)
- first (in `root@nn1:/`)
/etc/bootstrap.sh -d format
- second (in `root@nn2:/`)
/etc/bootstrap.sh -d standby
Notice that the volume from nn1, which now holds the initial cluster state, is simply mounted into nn2's container at a fixed directory, from where all data is copied into nn2's own volume.
At this point both volumes hold the initial cluster state and can be used as mountpoints for the actual NameNode containers.
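Conceptually, the standby bootstrap step just copies the formatted name directory from the shared nn1 mount into nn2's own volume. A minimal sketch, with illustrative paths rather than the exact ones the bootstrap script uses:

```shell
# Sketch: copy the formatted NameNode metadata from the shared mount
# (nn1's volume, mounted inside nn2) into nn2's local volume.
sync_name_dir() {
  # $1 = source dir (e.g. /mnt/shared/nn1/dfs/name)
  # $2 = destination dir (e.g. /mnt/hadoop/dfs/name)
  mkdir -p "$2"
  cp -R "$1/." "$2/"
}
```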
- Start both NameNodes (separate terminals)
- first (in `root@nn1:/`)
/etc/bootstrap.sh -d namenode
- second (in `root@nn2:/`)
/etc/bootstrap.sh -d namenode
Now both NameNodes should be running. Check it with:
root@nn2:/usr/local/hadoop/bin# bash hdfs haadmin -getServiceState nn1
nn2 should be `standby` while nn1 is `active`.
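Since the NameNodes take a moment to settle after starting, a small retry loop around the state check can be handy in scripts. This is a sketch; `HDFS_BIN` is parameterized only so the loop can be exercised without a running cluster:

```shell
# Sketch: poll `hdfs haadmin -getServiceState` until the given NameNode
# reports the expected state, or give up after a number of attempts.
wait_for_state() {
  # $1 = namenode id (nn1/nn2), $2 = expected state, $3 = attempts (default 10)
  nn="$1"; expected="$2"; tries="${3:-10}"; i=0
  while [ "$i" -lt "$tries" ]; do
    state="$("${HDFS_BIN:-hdfs}" haadmin -getServiceState "$nn" 2>/dev/null)"
    [ "$state" = "$expected" ] && return 0
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```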
- Start DataNodes (separate terminals)
- first (in `root@nn1:/`)
/etc/bootstrap.sh -d datanode
- second (in `root@nn2:/`)
/etc/bootstrap.sh -d datanode
- Kill the active NameNode to trigger failover
Just press CTRL-C in the terminal attached to the active NameNode. The DataNodes stay connected. Wait a bit and check with:
root@nn2:/usr/local/hadoop/bin# bash hdfs haadmin -getServiceState nn1
nn2 should now be `active`, while the same query for nn1 fails with `"nn1":8020; java.net.UnknownHostException` because its container no longer resolves.
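To see at a glance which NameNode won the election, you can query both in turn. A sketch; `HDFS_BIN` is parameterized only so it can be tested without a cluster:

```shell
# Sketch: print which of the two NameNodes currently reports "active".
active_namenode() {
  for nn in nn1 nn2; do
    state="$("${HDFS_BIN:-hdfs}" haadmin -getServiceState "$nn" 2>/dev/null)"
    if [ "$state" = "active" ]; then
      echo "$nn"
      return 0
    fi
  done
  return 1  # no NameNode reported active (e.g. mid-failover)
}
```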
- `-d` : Runs the service continuously instead of quitting automatically
- `-b` : Opens a bash terminal after starting, to inspect its internal file system.
By default the cluster will start with the name "cluster". You can set this name with `$CLUSTER_NAME`.
For the containers to run you need to set four environment variables on the docker run command.
- `NNODE1_IP` : Address of NameNode 1, without port
- `NNODE2_IP` : Address of NameNode 2, without port
- `JN_IPS` : Comma-separated addresses of the JournalNodes, each with port 8485. (At least 3, and always an odd number.)
- `ZK_IPS` : Comma-separated addresses of the Zookeeper nodes, each with port (default is 2181). (At least 3, and always an odd number.)
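A container started with one of these unset will misconfigure itself silently, so a pre-flight check before `docker run` is worth having. A sketch, not part of the image:

```shell
# Sketch: verify the four required variables are set and that JN_IPS
# lists an odd number of JournalNodes, as the quorum requires.
check_env() {
  for v in NNODE1_IP NNODE2_IP JN_IPS ZK_IPS; do
    eval "val=\${$v:-}"
    if [ -z "$val" ]; then
      echo "missing $v" >&2
      return 1
    fi
  done
  n=$(echo "$JN_IPS" | tr ',' '\n' | wc -l)
  if [ $((n % 2)) -ne 1 ]; then
    echo "JN_IPS should list an odd number of nodes" >&2
    return 1
  fi
}
```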
- `/mnt/hadoop/dfs/name` : NameNode storage
- `/mnt/hadoop/dfs/data` : DataNode storage
- `/mnt/hadoop/journal/data` : Journal storage
- `/usr/local/hadoop/logs/` : Logs. You can also replace `/usr/local/hadoop/etc/log4j.properties` with an attached docker volume on that file to customize the logging settings.
It is important that the NameNode addresses are the actual IP addresses/DNS names of the containers, because Hadoop binds to those addresses.
In certain situations the NameNodes need to fence each other for a proper failover. Currently the fence script always returns true without doing anything. Replace `/etc/fence.sh` via a docker volume attach with your own fencing algorithm, probably something like a call to your docker scheduler to shut down the other NameNode.