HiteshRepo/Kubernetes-101

High Level Arch

  1. Master Node
    1. Api server - face of k8 master, every communication happens via the api server
    2. Schedulers - schedule workloads to worker nodes
    3. Controller manager - compares state mentioned in request [desired] and actual state, then acts accordingly
    4. Etcd - distributed key value store - only stateful component - source of truth
  2. Worker Nodes
    1. Kubelet - takes requests from master and fulfils them, reports back to the master node
    2. Docker runtime - to run containers - OCI compliant container engine, deals with container abstraction
    3. Kube proxy - manage n/w b/w worker nodes, Assigns IP to each pod with the help of CNI provider
    4. Pods
  3. Flow
    1. Client sends a request - To keep infra in a particular state
    2. Api server receives request and save it to Etcd
    3. Ctrl manager keeps looking at Etcd to notice any differences b/w current state and desired state
    4. Once decision has been made on what needs to be changed in pods, scheduler assign actual pod configuration to worker node
    5. Kubelet in worker node keeps listening to the api server in Master node
    6. Kubelet uses docker runtime to spin up new pods with mentioned configuration
    7. The new IPs of pods and routes definition are done by Kube proxy - IP table route
  4. Even Kubernetes components like api server, controller, scheduler, kubeproxy, etc run as pods

Kubectl

  1. CLI to communicate with k8 api server
  2. Restful communication
  3. kubectl [command] [type] [name] [flags]
  4. Commands - get, patch, delete
  5. Type - pods, services, jobs
  6. Flags - -o (wide)
  7. Connects to API server of K8 master node
  8. Use rest apis to do that
  9. Kubeconfig - info related to :
    1. Cluster info
    2. User info
    3. Namespace
  10. Default loc of kubeconfig - $HOME/.kube/config
  11. KUBECONFIG env var

k8 commands

  • kubectl version
  • kubectl version --short - client version and server version
  • kubectl get nodes
  • kubectl get nodes -o wide
  • kubectl config view - to get cluster info, user info and namespace
  • kubectl config get-contexts
  • kubectl get pods
  • kubectl get pods -A -o wide
  • kubectl apply -f file.yaml
  • kubectl delete <resource-type>/<resource-name>
  • kubectl describe <resource-type>/<resource-name>
  • kubectl get pods --show-labels
  • kubectl get svc
  • kubectl get endpoints
  • kubectl describe endpoints svc-name
  • kubectl rollout history deployment/<deployment-name>
  • kubectl rollout undo deployment/<deployment-name> --to-revision=1
  • kubectl cordon node-name -> no further pod will be scheduled here -> STATUS: SchedulingDisabled
  • kubectl replace -f <file.yaml> -> Replaces existing configuration with the one in the file; similar to apply, but the resource must already exist
  • kubectl scale --replicas=6 <resource-type>/<name>
  • minikube ip
  • minikube ssh - to connect to minikube
  • eval $(minikube docker-env) - to make docker point to minikube docker context

Formatting o/p

  1. -o json -> in json formatted API object
  2. -o name -> only name of the resource
  3. -o wide -> additional info in plain-text format
  4. -o yaml -> YAML formatted API object

Minikube Objects

  1. Persistent entities in the K8s system that represent the state of the system
  2. Includes:
    1. Spec - desired/requested state
    2. Status - current state
  3. Also called API resources
  4. Smallest deployable unit - pods
  5. Abstraction on top of pods - replica-set, stateful-set, daemon-set, job and cron-job, services and ingress
  6. Abstraction on top of Replicaset - Deployment
  7. Volumes, PVC,PV, Storage Class
  8. ConfigMap and Secrets
  9. Object descriptor YAML - to communicate our desired state
  10. Parts of object descriptor file:
    1. apiVersion,
    2. kind [of object],
    3. metadata [info about object, name - unique identifier, labels]
    4. Spec - actual specification of the object to be created
  11. Replication controller (same purpose as Replica Set) -
    1. Replica set is recommended,
    2. Replication controller is an older concept
    3. Replication controller does not have 'selector' under spec, but Replica Set has
    4. Selector helps Replica Set to attach any already running pods to itself or any other pods that can be started individually in future

Pods

  1. Smallest unit
  2. Run inside nodes
  3. Can run multiple pods in 1 node
  4. Pods are a wrapper over containers
  5. Multiple containers in a pod is possible and they share the same container env, but best practice is to run 1 container/pod unless other containers are monitoring/tracking apps
  6. Ring-fenced env
    1. Network stack
    2. Volume mounts
    3. Kernel namespace
  7. High level Pod lifecycle -
    1. Kubectl -> API server
    2. API server -> Etcd
    3. Scheduler reads from Etcd -> Node [kubelet/worker]
    4. Pod - pending
    5. Pod - Running / Failed
    6. Pod - Success
  8. Intra pod communication
    1. Containers within pod talk to each other via localhost
    2. Share same n/w namespace, hence same IP and Port
    3. Containers within a Pod must use different ports to avoid port-binding errors
  9. Inter pod communication
    1. Each pod gets its own private IP from the k8 cluster network
  10. Container spec fields (a minimal Pod sketch using a few of these follows this list)
    1. name
    2. image
    3. command
    4. args
    5. workingDir
    6. ports
    7. env
    8. resources
    9. volumeMounts
    10. livenessProbe
    11. readinessProbe
    12. lifecycle
    13. terminationMessagePath
    14. imagePullPolicy
    15. securityContext
    16. stdin
    17. stdinOnce
    18. tty
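
A minimal Pod sketch using a few of the fields listed above (image, port, env value and probe path are placeholder assumptions, not from the original notes):

apiVersion: v1
kind: Pod
metadata:
    name: demo-pod
    labels:
        app: demo
spec:
    containers:
    - name: demo
      image: nginx                  # placeholder image
      imagePullPolicy: IfNotPresent
      ports:
        - containerPort: 80
      env:
        - name: APP_MODE
          value: "dev"
      resources:
        requests:
            memory: "128Mi"
            cpu: "250m"
      readinessProbe:
        httpGet:
            path: /                 # assumed health-check path
            port: 80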

Replica sets

  1. Abstraction over pods, which ensures that a particular no. of pods is always running in the cluster
  2. Uses Reconciliation control loop -> Current state - Desired State - Observe-Diff-Act
  3. Ensures that a pod or homogeneous set of pods are always available
  4. Maintains desired no. of pods:
    1. Excess pods - killed
    2. Launch new pod - in case of fail/deleted/terminated
  5. Associated with pods via matching labels
  6. Labels: Key-Value pair attached to objects like pod - user defined
  7. Selectors: Help identify objects in cluster - equality based / set based
  8. apiVersion - apps/v1
  9. kind - ReplicaSet
  10. metadata - name, labels…
  11. spec -
    1. replicas
    2. selector - matchLabels - app
    3. template - pod specification - prevents specifying a separate pod yaml (a full ReplicaSet manifest sketch follows this list)
  12. Distributes pods evenly across nodes
  13. Deleting replica set -> deletes associated pods as well
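
Putting items 8-11 above together, a ReplicaSet manifest might look like this sketch (image, names and label values are placeholders):

apiVersion: apps/v1
kind: ReplicaSet
metadata:
    name: frontend-rs
    labels:
        app: frontend
spec:
    replicas: 3
    selector:
        matchLabels:
            app: frontend
    template:
        metadata:
            labels:
                app: frontend
        spec:
            containers:
            - name: frontend
              image: nginx          # placeholder image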

Health check probes for containers:

These diagnostics are performed periodically - in template section of replicaset/deployments - httpGet [path] /exec [command] - initialDelaySeconds and periodSeconds

  1. readinessProbe - indicates if container is ready to serve requests, halts sending new requests until probe succeed - in template section of replicaset/deployments - httpGet/exec - initialDelaySeconds and periodSeconds
  2. livenessProbe - indicates whether the container is running healthy, if fails, declares container unhealthy and restarts container
  3. startupProbe - protects slow-starting containers; the other probes are held off until the startup probe succeeds (a sketch follows the check types below)

Supported check types

  1. httpGet - /health endpoint
  2. exec - shell script or command to exit successfully with return code 0
  3. tcpSocket - open a socket to container on specified port successfully
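
A sketch of the startupProbe mentioned above, using the httpGet check type (path, port and timings are assumptions for a slow-starting app); liveness and readiness probes only begin once the startup probe has succeeded:

startupProbe:
    httpGet:
        path: /api/ready
        port: 8080
    failureThreshold: 30
    periodSeconds: 10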

Services

  1. Pods are ephemeral
  2. They are recreated and not resurrected
  3. Services are an abstraction that exposes an app running on a set of pods as a reliable network service.
  4. Exposes pod over a reliable IP, Port, DNS
  5. Associated with pods via matching labels
  6. Also used for inter pod communication
  7. Client -> service [DNS/IP] -> Endpoint object [list of all pod IP address associated with svc, keeps getting updated]
  8. Types:
    1. ClusterIP - default - cluster-internal IP, accessible only within the cluster network
    2. NodePort - exposes node on a static port - NodeIP:NodePort
    3. LoadBalancer - Exposes service publicly
  9. apiVersion - v1
  10. kind - Service
  11. metadata - name
  12. spec - type, selector - app [must match the pod labels, i.e. replicaset/spec/template/metadata/labels or pod/metadata/labels]
  13. ports - protocol, port, targetPort
  14. Deleting pod or replica sets does not affect svc but just removes them from endpoints. Upon new spin ups, services will update the endpoints based on label-selector
  15. Readiness and Liveness probes also affect the endpoints

Deployments

  1. How to deploy a new version of app?
  2. How to roll back?
  3. Is replica set good enough?
  4. Changing the pod spec inside an rs - no effect on already running pods
  5. Delete and re-deploy the rs - change takes effect
  6. Updates with zero downtime
  7. Rollbacks
  8. A higher level of abstraction over replica set, provides declarative way of upgrading and rollbacks to pods
  9. Flow:
    1. Current state - RS 1
    2. Client -> Revision 2 -> API server
    3. Scheduler + Control Manager -> spin up RS 2, pods created
    4. Terminate pods in RS1
    5. RS 1 still persists -> so that during rollback, it can be used
  10. The diff b/w replica-set and deployment is the kind
  11. Default strategy - RollingUpdate - maxSurge, maxUnavailable (a manifest sketch follows this list)
  12. Recreate strategy -> downtime
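
A Deployment sketch with the default RollingUpdate strategy written out explicitly (image, replica count and surge values are placeholder assumptions):

apiVersion: apps/v1
kind: Deployment
metadata:
    name: myapp-deployment
    labels:
        app: myapp
spec:
    replicas: 3
    strategy:
        type: RollingUpdate
        rollingUpdate:
            maxSurge: 1
            maxUnavailable: 1
    selector:
        matchLabels:
            app: myapp
    template:
        metadata:
            labels:
                app: myapp
        spec:
            containers:
            - name: myapp
              image: nginx:1.7.0    # placeholder image/version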

Volumes

  1. Containers are ephemeral
  2. We require persistent storage
  3. Types:
    1. emptyDir -
      1. No data at start,
      2. created when pods get created,
      3. mounted and accessible across all containers in the pod
      4. Help sharing data across containers
      5. spec -> volumes/name : html, volumes/emptyDir: {}
      6. spec/containers -> volumeMounts/name : html, volumeMounts/mountPath:
      7. Good option to share data b/w containers, but data is lost once the pod goes down (a sketch follows this list)
    2. hostPath -
      1. Storage from backing Node [Host] is mounted inside container [Pod]
      2. Data retained on Node even after Pod goes down
      3. Data not available if Pod is scheduled on another Node
      4. Can't protect data against a Node outage
      5. spec -> volumes/name : html, volumes/hostPath/path: , volumes/hostPath/type: Directory
      6. spec/containers -> volumeMounts/name : html, volumeMounts/mountPath:
      7. Good option to share data across pods on the same Node
    3. Cloud volume type -
      1. awsEBS
      2. gcePersistentDisk
      3. azureDisk
    4. Nfs
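
A sketch of the emptyDir layout described above, shared between two containers in one pod (names, images and mount paths are placeholder assumptions):

apiVersion: v1
kind: Pod
metadata:
    name: shared-data-pod
spec:
    volumes:
    - name: html
      emptyDir: {}
    containers:
    - name: web
      image: nginx                  # placeholder image
      volumeMounts:
      - name: html
        mountPath: /usr/share/nginx/html
    - name: content-generator
      image: busybox                # placeholder image
      command: ["sh", "-c", "echo hello > /data/index.html && sleep 3600"]
      volumeMounts:
      - name: html
        mountPath: /data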

PV and PVC

  1. Abstracts how storage is provided and how storage is consumed
  2. PV
    1. Represent actual volume
    2. Provisioned by Admin or dynamically provisioned using StorageClass
    3. Lifecycle is independent of the Pod
  3. PVC
    1. Represent request for volume by user
    2. Abstract the storage resource without exposing details how those volumes are implemented
    3. Claims are fulfilled by a PV, hence a PVC is bound to a PV (a matching pair is sketched after this list)
  4. Retain - Actual volume is retained even after PV and PVC is deleted
  5. Delete - Actual physical storage is deleted (default reclaim policy for dynamically provisioned PVs)
  6. Recycle - Deprecated
  7. Access modes
    1. ReadWriteOnce - RWO - volume can be mounted read-write by a single node
    2. ReadOnlyMany - ROX - read-only by many nodes
    3. ReadWriteMany - RWX - read-write by many nodes
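
A matching PV/PVC pair as a sketch (capacity, access mode and hostPath are assumptions; in a real cluster the PV would usually point at cloud or NFS storage):

apiVersion: v1
kind: PersistentVolume
metadata:
    name: pv-demo
spec:
    capacity:
        storage: 1Gi
    accessModes:
        - ReadWriteOnce
    persistentVolumeReclaimPolicy: Retain
    hostPath:
        path: /mnt/data             # placeholder path on the node
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
    name: pvc-demo
spec:
    accessModes:
        - ReadWriteOnce
    resources:
        requests:
            storage: 1Gi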

Storage Class

  1. Provisioning
    1. Static:
      1. Admin creates a number of PVs
      2. Cluster matches one of the PVs for a PVC
      3. Only one PVC can be bound to a PV
    2. Dynamic:
      1. Allows storage volumes to be created on-demand as per the request
      2. Claims are fulfilled by PV, hence PVC are linked to PV
  2. Helps create dynamic on-demand PVs
  3. PVC refers storage class, Storage class provisions PVC on demand, Deployment/ReplicaSet/Pod mount the PV via PVC
  4. Basically, storage classes are templates for PVs
  5. Provisioners - cloud service providers
  6. Parameters - specific to provisioners
  7. If the PVC is deleted, the PV is also gone, if the reclaim policy is not set to 'Retain' (a StorageClass sketch follows this list)
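
A StorageClass sketch plus a PVC that refers to it (the provisioner and parameters shown are assumptions; use whatever your cloud/cluster offers):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
    name: fast-storage
provisioner: kubernetes.io/aws-ebs  # assumed provisioner
parameters:
    type: gp2
reclaimPolicy: Retain
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
    name: pvc-fast
spec:
    storageClassName: fast-storage
    accessModes:
        - ReadWriteOnce
    resources:
        requests:
            storage: 1Gi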

Other sources

  1. Link to K8 commands compilation: https://www.evernote.com/shard/s645/sh/18a2e56b-3451-90a2-75b5-2f91ec5ac6ef/3e5b88d59f5bb686d5fb7350cf823e63

Namespaces

  1. resource address format: <service-name>.<namespace>.svc.cluster.local
  2. kubectl create -f <definition.yaml> --namespace=<namespace-name>
  3. Also, namespace can be mentioned in metadata of the resource
  4. kubectl create namespace <namespace-name>
  5. kubectl config set-context $(kubectl config current-context) --namespace=<namespace-name>

Resource Quota

apiVersion: v1
kind: ResourceQuota
metadata:
    name: compute-quota
    namespace: dev
spec:
    hard:
        pods: "10"
        requests.cpu: "4"
        requests.memory: 5Gi
        limits.cpu: "10"
        limits.memory: 10Gi

Imperative commands

  1. --dry-run=client -> resource won't be created, instead will tell if resource would be created or not
  2. -o yaml -> resource definition in YAML format
  3. kubectl run nginx --image=nginx --dry-run=client -o yaml : will not create the resource 'pod' but will give pod declarative definition
  4. kubectl create deployment --image=nginx nginx --dry-run=client -o yaml : will not create the resource 'deployment' but will give deployment declarative definition
  5. kubectl create deployment nginx --image=nginx --dry-run=client -o yaml > nginx-deployment.yaml : saves definition to a file
  6. kubectl expose pod redis --port=6379 --name redis-service --dry-run=client -o yaml : will not create the resource 'service' but will give service declarative definition

Commands in Docker/Kubernetes

  1. CMD vs ENTRYPOINT - command-line args replace CMD, while they get appended to ENTRYPOINT
  2. Default can be specified by having both CMD and EntryPoint - CMD instructions are appended to EntryPoint
  3. ENTRYPOINT (docker) -> command (k8)
  4. CMD (docker) -> args (k8) (a mapping sketch follows below)
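
A sketch of this mapping (image name and values are placeholder assumptions): the Dockerfile sets the defaults, and the Pod's command/args override ENTRYPOINT/CMD respectively.

# Dockerfile (defaults)
FROM ubuntu
ENTRYPOINT ["sleep"]
CMD ["5"]

# Pod spec (overrides)
apiVersion: v1
kind: Pod
metadata:
    name: ubuntu-sleeper
spec:
    containers:
    - name: ubuntu-sleeper
      image: ubuntu-sleeper         # placeholder image built from the Dockerfile above
      command: ["sleep"]            # overrides ENTRYPOINT
      args: ["10"]                  # overrides CMD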

Editing properties of a running Pod

  1. Specifications of an existing POD, CANNOT be edited other than the below:
    1. spec.containers[*].image
    2. spec.initContainers[*].image
    3. spec.activeDeadlineSeconds
    4. spec.tolerations
  2. The environment variables, service accounts, resource limits of a running pod cannot be edited
  3. There are 2 options to achieve though:
    1. Approach 1:
      1. kubectl edit pod <pod-name> -> This will open up the pod specification in a vi editor
      2. Change the specifications and try to save -> it will throw an error but will save the changed specification in a temp file
      3. delete the existing pod: kubectl delete pod <pod-name>
      4. create the changed pod: kubectl create -f <tmp file path>
    2. Approach 2:
      1. Extract the pod definition in YAML format to a file using the command: kubectl get pod <pod-name> -o yaml > my-new-pod.yaml
      2. vi my-new-pod.yaml: changes specifications and save
      3. kubectl delete pod <pod-name>
      4. kubectl create -f my-new-pod.yaml
  4. For deployments: kubectl edit deployment my-deployment, the new changes will be applied to the pods (running pods will be terminated and new pods with latest specifications will be created)

Environment variables

  1. In pod specifications, under 'env' attribute. This is an array of (Key value pair) name & value.
  2. Other ways of specifying env vars are: ConfigMap and Secrets
  3. Example of direct key-value pair under 'env'
env:
    - name: APP_COLOR
      value: pink
  1. Example of config-map under 'env'
env:
    - name: APP_COLOR
      valueFrom:
        configMapKeyRef:
            name: <config-map-name>
            key: <key>
  1. Example of secret under 'env'
env:
    - name: APP_COLOR
      valueFrom:
        secretKeyRef:
            name: <secret-name>
            key: <key>

ConfigMaps

  1. Centralized way of managing configuration data in the form of key-value pairs.
  2. When pods are created, this configuration data is injected into the containers inside the pod for the apps to use
  3. Phases: Create config map, inject them into pod
  4. Imperative ways of creating a config map
kubectl create configmap <config-map-name> --from-literal=key1=value1 --from-literal=key2=value2
kubectl create configmap <config-map-name> --from-file=<path-to-file> 
  1. Declarative way of creating a config map: apiVersion, kind, metadata, data (key-value pairs)
kubectl apply -f <config-map-definition-file-path>
  1. kubectl get configmaps
  2. kubectl describe configmaps
  3. Map config map to pod definition/template
envFrom:
    - configMapRef:
        name: <config-map-name>
volumes:
- name: <volume-name>
  configMap:
    name: <config-map-name>

Secrets

  1. Imperative way to create a secret:
kubectl create secret generic <secret-name> --from-literal=<key>=<value>
kubectl create secret generic <secret-name> --from-file=<path-to-file>
  1. Declarative way to create a secret
kubectl create -f <secret-file-name>
  1. Data values in a secret definition are only base64-encoded. Encoding is not encryption, so it is better to use proper encryption at rest (e.g. a KMS provider)
  2. kubectl get secrets
  3. kubectl describe secrets
  4. kubectl get secret <secret-name> -o yaml : to view the base64-encoded secret values
  5. Map secret to pod definition/template
envFrom:
    - secretRef:
        name: <secret-name>
volumes:
- name: <volume-name>
  secret:
    secretName: <secret-name>

If a secret is used as a volume mount, each key in the secret creates its own file, with the value as its contents (a declarative Secret sketch follows below)
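
A declarative Secret definition of the kind referenced by kubectl create -f above might look like this sketch (keys and base64-encoded values are placeholders):

apiVersion: v1
kind: Secret
metadata:
    name: app-secret
data:
    DB_USER: cm9vdA==               # base64 of "root"
    DB_PASSWORD: cGFzc3dvcmQ=       # base64 of "password"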

Docker Security

  1. Host itself runs a set of processes, docker daemon, ssh-server, etc.
  2. Docker containers, unlike VMs, share the same Linux kernel as the host, but they are separated by namespaces
  3. Container has its own namespace and host has its own
  4. All processes run on container in fact run on host itself but in a different namespace (namespace of container)
  5. A Docker container can only see its own processes
  6. Listing processes in a container (ps aux) will only show processes within container
  7. Listing processes in the host (ps aux) will show all processes within and out of container(s)
  8. A Docker container has a root user and a set of non-root users
  9. By default, docker runs processes within the container as the root user
  10. The user can be changed at run time using the --user flag: docker run --user=1000 ubuntu sleep 1000
  11. Another way to set the user is creating a custom image from an existing image and setting the user in the Dockerfile itself. Example Dockerfile:
FROM ubuntu
USER 1000

Build the above custom image:

docker build -t my-ubuntu-image .

Run the image without specifying the user; it now runs as user 1000.

  12. If we run the container as a root user, is it not dangerous?
    1. Docker implements a set of security features that limit the capabilities of the root user within the container
    2. The root user within the container is not really the same as the root user on the host
    3. Docker uses linux capabilities to achieve this
    4. The root user is the most powerful user in a system and can perform ops such as: CHOWN, DAC, KILL, SETGID, SETUID, NET_ADMIN, etc.
    5. A process running as the root user also has unrestricted access to the system
    6. Docker's root user by default has limited capabilities; it does not have all the privileges
    7. We can add capabilities to the container's user while running it: docker run --cap-add KILL ubuntu
    8. We can drop capabilities of the container's user while running it: docker run --cap-drop MAC_ADMIN ubuntu
    9. We can run the container with all privileges as well: docker run --privileged ubuntu

Security contexts

  1. Configuring the user id of a container and adding/removing privileges of a user is also possible in k8
  2. Security settings can be configured at container/pod level
  3. If we set at pod level the settings will be applied to all containers within pod
  4. If we set at both pod and container level, then settings of container level will take precedence over pod settings
  5. Configuration
apiVersion: v1
kind: Pod
metadata:
    name: web-app
spec:
    securityContext:
        runAsUser: 1000 # all containers within this pod will run with user id 1000
    containers:
        - name: ubuntu
          image: ubuntu
          command: ["sleep", "1000"]
          securityContext:
            runAsUser: 2000 # the user id for this container would be 2000, overriding 1000
            capabilities: 
                add: ["MAC_ADMIN", "KILL"]

Service Accounts

  1. Two types of account in K8: User a/c and Service a/c.
  2. User account: used by humans, Service account: for automated tasks(by machines)
  3. User account types (not limited to): Admin (to perform admin tasks), Developer(to access the cluster and deploy apps)
  4. Service accounts are used by an app to interact with the k8 cluster, examples:
    1. A monitoring app like Prometheus uses service a/c to poll k8 metrics/logs to come up with performance metrics
    2. An automated build tool like Jenkins uses service a/c to deploy app on the cluster
  5. To create a service a/c: kubectl create serviceaccount <account-name>
  6. To view all service accounts: kubectl get serviceaccount
  7. On creation of a service a/c, a token is created automatically: kubectl describe serviceaccount <account-name> - see Tokens
  8. The above token can be used by the external apps for authentication of kube-api as a bearer token.
  9. Token is stored as a secret object.
  10. To view the secret object: kubectl describe secret <secret-name>
  11. Steps:
    1. create a service a/c
    2. assign role based permissions/access control mechanisms
    3. export the token
    4. use it in external app while making kube api requests
  12. If the external app itself is hosted in K8 cluster, the exporting can be made simpler by mounting the secret as a volume to the application.
  13. To view the secret files in the pod (which has secret mounted as volume):
    1. exec into the pod: kubectl exec -it <pod-name> -- sh
    2. ls /var/run/secrets/kubernetes.io/serviceaccount -> ca.crt, namespace, token
    3. cat /var/run/secrets/kubernetes.io/serviceaccount/token
  14. A default service account is mounted automatically to every pod; it has limited permissions.
  15. To assign a service account: spec/serviceAccountName: <service a/c name>
  16. To prevent k8 from automatically mounting the default service a/c : spec/automountServiceAccountToken: false (a sketch follows this list)
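
A sketch of items 15-16: attaching a named service account to a pod, or switching off the default token mount (names and image are placeholder assumptions):

apiVersion: v1
kind: Pod
metadata:
    name: dashboard-app
spec:
    serviceAccountName: monitoring-sa       # placeholder service account
    # automountServiceAccountToken: false   # uncomment to skip mounting the default token
    containers:
    - name: dashboard
      image: my-dashboard-image             # placeholder image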

Resource Requirements

  1. Scheduler decides which node the pod goes to.
    1. Scheduler takes into consideration the amount of resources requested by a pod and their availability on each node.
  2. If sufficient resources are not available on any of the nodes, K8 keeps the pod in pending state with the event reason as insufficient CPU/memory/disk
  3. Default CPU: 0.5, MEM: 256 Mi, Disk: (Resource Request)
  4. spec/containers:
    resources:
     requests:
         memory: "1Gi"
         cpu: 1
    
  5. cpu 0.1 means 100m (m -> milli)
  6. cpu can be requested as low as 1m
  7. 1 cpu equivalent to
    1. 1 AWS vCPU
    2. 1 GCP core
    3. 1 Azure core
    4. 1 Hyperthread
  8. 1Gi memory means 1 Gibibyte while 1G means 1 Gigabyte
  9. Set limits under spec/containers/resources, to prevent a pod from consuming too many resources and starving other pods (a complete sketch follows this list)
    limits:
     memory: "2Gi"
     cpu: 2
    
  10. when pod tries to go beyond the limit cpu, k8 tries to throttle the cpu so that pod will not be able to consume more cpu
  11. when pod tries to go beyond the limit mem, k8 terminates the pod
  12. The status OOMKilled indicates that it is failing because the pod ran out of memory. Identify the memory limit set on the POD
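
Combining items 4 and 9 above, a container with both requests and limits might look like this sketch (image and values are placeholder assumptions):

apiVersion: v1
kind: Pod
metadata:
    name: resource-demo
spec:
    containers:
    - name: app
      image: nginx                  # placeholder image
      resources:
        requests:
            memory: "1Gi"
            cpu: "500m"
        limits:
            memory: "2Gi"
            cpu: "1"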

Taints and Tolerations

  1. Taints and tolerations are used to set restrictions on what pods can be scheduled on which node.
  2. They have nothing to do with security.
  3. Lets' take a use case:
    1. We have 4 pods: A, B, C, D
    2. We have 3 nodes: Node1, Node2, Node3
    3. Now if there are no taints and tolerations configured, then A, B, C, D will be placed on nodes via load balancing/resource management
    4. But suppose we want pods like D (running the same kind of workload as D) to be scheduled only on Node1
    5. Then we apply a taint on Node1; since none of the pods have any toleration configured yet, none of the pods will be scheduled on Node1
    6. Now we can enable pod D to be placed on Node1, by adding a toleration on pod D.
  4. Taints are placed on nodes and Tolerations are placed on pods.
  5. Apply Taints to nodes: kubectl taint node <node-name> <key>=<value>:<taint-effect>
  6. Taint-Effect determines what happens to pods that DO NOT TOLERATE this taint; there are 3 taint-effects
    1. NoSchedule: Pods will not be scheduled
    2. PreferNoSchedule: K8 will try not to schedule pods but with no guarantee
    3. NoExecute: New pods will not be scheduled, and pods already running on the node that do not tolerate the taint will be evicted.
  7. Apply Tolerations to pods (under the pod spec, alongside containers) (a concrete pairing is sketched after this list):
    tolerations:
        - key: "app"
          operator: "Equal"
          value: "blue"
          effect: "NoSchedule"
    
  8. Taints and tolerations do not guarantee that certain pods will be scheduled on certain nodes only. They enable nodes to accept certain pods, but those pods can very well be placed on other nodes as well.
  9. Scheduler does not place any pod on master node: because when K8 cluster is first set up a taint is applied on the master node automatically that prevents placing of other pods on master node.
  10. To see the above taint in master node: kubectl describe node kubemaster | grep Taint
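
A concrete pairing of the taint command in item 5 with the toleration in item 7 (node name and key/value are placeholder assumptions); the trailing minus removes the taint again:

kubectl taint node node01 app=blue:NoSchedule     # only pods tolerating app=blue:NoSchedule can land on node01
kubectl taint node node01 app=blue:NoSchedule-    # remove the taint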

Node Selectors

  1. There might be use cases where we require placing certain pods only on certain nodes.
  2. For example,
    1. There are 3 nodes (2 nodes with low resources and 1 node with high resources).
    2. We would like to place pods running high processing apps in node with higher resources.
  3. The default setup places pods in nodes based on load balancing and resource availability strategy.
  4. Also, with taints and tolerations, we can guarantee nodes to accept certain pods but not guarantee placing pods on certain nodes.
  5. A simple way to achieve this is using Node Selectors.
  6. An example of Pod configuration using node selector
apiVersion: v1
kind: Pod
metadata:
    name: myapp-pod
spec:
    containers:
        - name: data-processor
          image: data-processor
    nodeSelector:
        size: Large
  1. The key-value pair (size: Large) is in fact a label assigned to nodes. The scheduler uses these to assign pods to specific Nodes.
  2. To label a node: kubectl label nodes <node-name> <key>=<value>
  3. Limitations:
    1. Cannot serve complex requirements: e.g. if we want to place a pod on large or medium nodes but not on small ones.
  4. Node affinity is the solution here.

Node Affinity

  1. Complex requirements can be executed in Node Affinity.
  2. The example used in Node Selectors can be re-defined as this:
apiVersion: v1
kind: Pod
metadata:
    name: myapp-pod
spec:
    containers:
        - name: data-processor
          image: data-processor
    affinity:
        nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: size
                    operator: In #NotIn, Exists,...
                    values:
                    - Large
  1. What happens if node affinity does not match any node depends on the node affinity type.
  2. Node affinity types:
    1. requiredDuringSchedulingIgnoredDuringExecution: Pod will not be scheduled if rules do not match (Pods remain in pending state), but pods already running are ignored (irrespective of the rules).
    2. preferredDuringSchedulingIgnoredDuringExecution: Pod will be scheduled in available node if rules do not match, and pods already running are ignored (irrespective of the rules).
    3. requiredDuringSchedulingRequiredDuringExecution: Pod will not be scheduled if rules do not match (Pods remain in pending state), and pods already running are evicted if rules do not match.

Node Affinity vs Taints and Toleration

  1. Lets' take a use case
    1. There are 3 nodes: Red, blue and green. There are other nodes as well.
    2. There are 3 pods: Red, blue and green. There are other pods as well.
    3. Our aim is to put red pod in red node, green pod in green node and blue pod in blue node.
    4. We also do not want any other pods to be placed in our (red, green and blue) nodes.
    5. We also do not want our pods to be placed on other nodes.
  2. How to achieve this:
    1. Lets' try with Taints and Toleration first
      1. We apply taints red, blue and green to nodes.
      2. Then we apply tolerations red, blue and green to pods.
      3. This helps pods with the appropriate toleration end up on the corresponding tainted node, but it does not guarantee that our pods will not end up on nodes that have no taints.
    2. Lets' try with Node Affinity
      1. We apply key-value pair labels on nodes.
      2. We then configure nodes with appropriate affinity.
      3. This will help us in placing pods in appropriate nodes but other pods also might end up in our nodes.
  3. So a combination of both Taints and Toleration and node affinity is used.

Multi Container Pods

  1. Microservices enable us to develop small, independent, reusable code.
  2. Also, it helps us in scaling them.
  3. However, at times two services are required to work together such as a web server and a log agent.
  4. We want a web server and a log agent paired together, we do not want to merge them and bloat the code though.
  5. So we need multi-container pods that share same lifecycle, network space and storage volumes.
  6. An example of multi-container setup looks something like below:
apiVersion: v1
kind: Pod
metadata:
    name: simple-webapp
    labels:
        name: simple-webapp
spec:
    containers:
    - name: simple-webapp
      image: simple-webapp
      ports:
        - containerPort: 8080
    - name: log-agent
      image: log-agent
  1. Common design patterns:
    1. SIDECAR: we can run a logging agent along with the main app that will push logs on to a centralized logs-storage
    2. ADAPTER: sometimes each application produces different format of logs and hence we need to format them before pushing them to centralized system
    3. AMBASSADOR: very often, it is required to connect to different databases based on env. So based on the env we connect to that DB instance. This logic can be extracted out to an ambassador container which can act as a proxy.

Readiness and Liveness probe

  1. A pod has a pod status.
  2. The pod status states where the pod is in its lifecycle.
  3. If pod is first created, it is in pending state. This is when the scheduler tries to figure out where to place the pod.
  4. If scheduler cannot find a node to place the pod, then it remains in pending state.
  5. Once the pod is scheduled, it goes into ContainerCreating status; this is when the image is pulled and containers are created.
  6. Once all the containers in the pod starts, pod status changes to running state.
  7. The pod status remains in running state, unless program in the container is completed or the pod is terminated.
  8. So complete and terminating are the other pod statuses.
  9. Pod conditions
    1. PodScheduled
    2. Initialized
    3. ContainersReady
    4. Ready - indicate app inside the pod is running and ready to accept requests
  10. Container could be running various apps within them
    1. A Simple script performing a job, a db service, or a large web server serving end users.
    2. The script may take few milliseconds to get ready
    3. The db service may take a few milliseconds to connect to the db and run migration scripts
    4. The webserver might require some seconds to power up before serving requests
    5. So the apps are not yet ready for those milliseconds to serve any requests
  11. W/o readiness probe, the pod continues to indicate being ready even though the underlying containers are powering up
  12. So readiness probes are important to let k8s know of the actual state of the containers
  13. If Pod is not ready k8s service will not divert request on to it because k8s service relies on pod's ready state to route traffic
  14. As developers, we know that when exactly the app is ready to serve requests
  15. So we need a way to tie up the actual app's ready state with k8s status indicating ready or not
  16. There are a few ways to do so:
    1. HTTP test: /api/ready is responding with correct status code or not
    2. TCP test: TCP socket is up or not
    3. exec command: if command gets executed successfully or not
  17. Example of HTTP test readiness probe:
apiVersion: v1
kind: Pod
metadata:
    name: simple-webapp
    labels:
        name: simple-webapp
spec:
    containers:
    - name: simple-webapp
      image: simple-webapp
      ports:
        - containerPort: 8080
      readinessProbe:
        httpGet: 
            path: /api/ready
            port: 8080
  1. Example of TCP test:
readinessProbe:
    tcpSocket:
        port: 3306
  1. Example of Exec Command test:
readinessProbe:
    exec:
        command:
            - cat
            - /app/is_ready
  1. We can add an additional delay to the probe, considering that the app might take some more time to start and hence the readiness probe should only run after that time. This can be achieved by 'initialDelaySeconds':
readinessProbe:
    httpGet: 
        path: /api/ready
        port: 8080
    initialDelaySeconds: 10
  1. If we wish to run the probe periodically and change the state of the container based on it, we can achieve it with 'periodSeconds':
readinessProbe:
    httpGet: 
        path: /api/ready
        port: 8080
    initialDelaySeconds: 10
    periodSeconds: 5
  1. By default, if the app is not ready after 3 attempts, the probe will stop and the pod will not be sent requests. But we can configure the number of failed attempts with 'failureThreshold':
readinessProbe:
    httpGet: 
        path: /api/ready
        port: 8080
    initialDelaySeconds: 10
    periodSeconds: 5
    failureThreshold: 8
  1. The liveness probe is very similar to the readiness probe, but in this case the pod is killed upon failing and a new instance of the pod is respawned.
  2. The configurations stay similar to readiness probe
  3. HTTP test
apiVersion: v1
kind: Pod
metadata:
    name: simple-webapp
    labels:
        name: simple-webapp
spec:
    containers:
    - name: simple-webapp
      image: simple-webapp
      ports:
        - containerPort: 8080
      livenessProbe:
        httpGet: 
            path: /api/healthy
            port: 8080
        initialDelaySeconds: 10
        periodSeconds: 5
        failureThreshold: 8
  1. TCP test
livenessProbe:
    tcpSocket:
        port: 3306
    initialDelaySeconds: 10
    periodSeconds: 5
    failureThreshold: 8
  1. Exec command test
livenessProbe:
    exec:
        command:
            - cat
            - /app/is_ready
    initialDelaySeconds: 10
    periodSeconds: 5
    failureThreshold: 8

Container logging

  1. to view logs of a container: kubectl logs -f <pod-name> (f option is to stream logs live).
  2. if multiple containers are running in a single pod, the container name must be provided, else the command fails: kubectl logs -f <pod-name> <container-name>.

Monitoring

  1. What to Monitor:
    1. Count of nodes in cluster
    2. Healthy nodes count
    3. Performance metrics: CPU usage, memory, n/w and disk utilization
    4. Pod level metrics: number of them and performance metrics of each pod
  2. Tools to integrate with k8s:
    1. Metrics server
    2. Prometheus
    3. Elastic stack
    4. Datadog
    5. Dynatrace
  3. Heapster - Original project to enable monitoring and analytics on k8s objects - deprecated
  4. Metrics server
    1. A trimmed-down version of Heapster
    2. 1 Metrics server per cluster
    3. Gets metrics from each node, pods, aggregates them and stores them
    4. In-memory monitoring solution - no historical data
  5. Kubelet runs on each node
    1. it has a sub-component called cAdvisor
    2. cAdvisor is responsible for retrieving performance metrics and exposing them through the kubelet API
  6. minikube addons enable metrics-server
  7. git clone https://github.com/kubernetes-incubator/metrics-server.git - download the deployment binaries
    1. kubectl create -f deploy/1.8+/ - creates set of pods, services and roles to enable metric server to poll for performance metrics of cluster
  8. kubectl top node - to view the metrics of nodes
  9. kubectl top pod - to view the metrics of pods

Labels, Selectors and Annotations

  1. Ability to group kubernetes objects together and filter them based on needs is achieved using labels and selectors.
  2. Labels are basically properties attached to each item.
  3. Selectors help us filter kubernetes objects based on the attached properties (labels).
  4. An example of labels and selectors would be:
    1. When we create pods, we attach some labels.
    2. And then when we create service to redirect requests to the pods, we create selectors and matchLabels to link service and pods
  5. An example of a pod with labels is as below (here app: mock-app and function: backend are the labels):
apiVersion: v1
kind: Pod
metadata:
    name: simple-webapp
    labels:
        app: mock-app
        function: backend
spec:
    containers:
    - name: simple-webapp
      image: simple-webapp
      ports:
        - containerPort: 8080
  1. After creating a pod with certain labels, we can filter it by: kubectl get pods --selector app=mock-app
  2. An example of a service using selector to attach itself to pods (here app: mock-app and function: backend under spec/selector are the selectors)
apiVersion: v1
kind: Service
metadata:
    name: my-service
spec:
    selector:
        app: mock-app
        function: backend
    ports:
    - protocol: TCP
      port: 80
      targetPort: 9376
  1. Having one selector is enough unless further nested filtering is required.
  2. Annotations are used to record other details for informational purposes, such as build information, name or contact details.

Rolling updates & rollbacks in Deployments

  1. When we first create a deployment, it creates a rollout.
  2. A new rollout creates a new revision.
  3. In the future, when a new deployment (of same name) is triggered, a new rollout is created with increased version.
  4. This helps us keep track of the changes made and enables us to roll back to a previous version of the deployment.
  5. To check status of rollout: kubectl rollout status deployment myapp-deployment
  6. To check the history, revision and change-cause of rollout: kubectl rollout history deployment myapp-deployment
  7. Deployment strategies:
    1. Recreate:
      1. Suppose there are 5 instances of your app running
      2. When deploying a new version, we can destroy the 5 instances of older version and then deploy 5 instances of newer version
      3. The issue is there will be a downtime
      4. This is majorly done during major changes, breaking changes or when backward compatibility is not possible
      5. This is not default strategy
    2. Rolling update
      1. In this strategy, we do not drop all the already running instances
      2. We drop instances by a certain percentage at a time and simultaneously spawn equal percentage of newer version pods.
      3. This upgrade is default strategy
      4. This has no downtime
  8. For example
    1. suppose there is an already existing deployment running 3 replicas of a pod with image nginx:1.7.0
    2. now you wish to change the version of the image
    3. this can be done by changing the version of the image in the deployment file and running the command: kubectl apply -f <deployment file path>
    4. this can also be done by: kubectl set image deployment myapp-deployment nginx=nginx:1.7.1
    5. but if we do step #4, then there will be inconsistency in the actual file and the deployment definition in the cluster
  9. run command: kubectl describe deployment <deployment name> to see the details of deployment, and notice the difference in both strategies
  10. How upgrades work under the hood:
    1. When a deployment is applied, it creates a replica-set and spins up pods with number of instances as mentioned in the deployment configuration
    2. Then, when the deployment is re-applied with changes, it creates another replica-set and spins up pods with number of instances as mentioned in the deployment configuration and drops pod simultaneously from older replica-set.
    3. But the thing to note is, the older replica-set still exists, which will be used for rollback if required
  11. To rollback a deployment: kubectl rollout undo deployment myapp-deployment - this will also run in a similar sequence as it happened during the upgrade
  12. After rollback the new replicaset still persists.
  13. Remember in order to see change cause of historical revisions, we need to add --record flag while editing/applying deployments (needs to be set once per deployment)
  14. When we do a rollback, the revision to which the rollback happens is removed from history and a new entry is made in the history instead.
  15. If any error occurs during upgrade, kubernetes will proactively stop the upgrade and stop dropping previously running instances

Jobs

  1. There are broadly 2 types of workloads:
    1. Long-running workloads: DBs, services, web-servers, etc. Manually stopped if required.
    2. Short runtime workloads: Batch processing, analytics, reporting, etc. Stops after finishing the task.
  2. Let us create a pod definition file (simple-sum.yaml) to do some computational work
apiVersion: v1
kind: Pod
metadata:
   name: math-pod
spec:
   containers:
   - name: math-add
     image: ubuntu
     command: ['expr', '3', '+', '2']
  1. now run command: kubectl apply -f simple-sum.yaml
  2. status of pod (kubectl get pods) changes from creating -> running -> completed
  3. But the problem is, as soon as the pod goes to completed state (since it is done with the operation), kubernetes restarts it and the cycle continues
  4. Because kubernetes wants to keep pods running forever by default. There is a property called restartPolicy which is set to Always by default
  5. We can override this property to either 'Never' or 'OnFailure'.
  6. We want to make sure that all pods doing some computational work get created and do a certain job successfully and then are dropped. For this we require a manager which is also known as a Job.
  7. ReplicaSet ensure running pods forever while Job ensures creating pods and doing assigned tasks successfully
  8. An example of Job
apiVersion: batch/v1
kind: Job
metadata:
   name: math-add-job
spec:
   completions: 3
   parallelism: 3
   template: 
      spec:
         containers:
         - name: math-add
           image: ubuntu
           command: ['expr', '3', '+', '2']
         restartPolicy: Never
  1. 'completions' is analogous to 'replicas'.
  2. If one of the pods fails, the job keeps spinning up new pods until the required number of completions is met
  3. 'parallelism' tells kubernetes how many pods of the job to create at the same time

CronJobs

  1. A Job that can be scheduled is called CronJob
  2. Template of CronJob is as follows:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
   name: reporting-cron-job
spec:
   schedule: "*/1 * * * *"
   jobTemplate: 
      spec:
         completions: 3
         parallelism: 3
         template:
            spec:
               containers:
               - name: math-add
                 image: ubuntu
                 command: ['expr', '3', '+', '2']
               restartPolicy: Never
  1. schedule “30 21 * * *” implies that this job will run at 21:30 hrs every day.

  2. One thing to notice is that it has 3 ’spec’s. 1st spec is for CronJob itself. 2nd spec is for Job (because CronJob is an abstraction over Job). 3rd spec is for the underlying container.

Services

  1. Services enable communication between components within and outside the applications.
  2. Services enable applications to connect with other resources like: db pods, other services (frontend/backend)
  3. They kind of enable loose coupling b/w microservices in our application setup.
  4. Lets understand a default setup w/o services:
    1. Our pod (lets say it is a FE app that says Hello World!) is within a K8s Node.
    2. Node IP is 192.168.1.2, Node uses the same n/w as our system.
    3. So our system IP will also fall in the same IP range: 192.168.1.10
    4. But the Pod has different n/w (say 10.244.0.0).
    5. So the Pod IP can be 10.244.0.2.
    6. In order to access the application which runs in the Pod, we have to ssh into the Node and then do a curl http://10.244.0.2
    7. But this is inside the K8s cluster, we need to be able to access it from our system by doing curl http://192.168.1.2.
    8. So we need something in the middle of Node and Pod to redirect the request.
  5. This is where K8s services come into play.
  6. The K8s services are like any other K8s objects, one of the use case of services is to listen to the Node port and forward the request to a target pod port.
  7. This type of service is called a NodePort service, as the service listens on a port of the Node and forwards requests to a pod port.
  8. ClusterIP: This type of service creates a virtual IP inside the cluster to enable communication b/w sets of services within the cluster itself.
  9. LoadBalancer: This type of service distributes the load across the web servers that it caters to.
  10. A template of service looks like this:
apiVersion: v1
kind: Service
metadata:
    name: my-service
spec:
    type: NodePort
    ports:
        - targetPort: 80
          port: 80
          nodePort: 30008
    selector:
        app: myapp
        type: frontend
  1. If 'port' under spec/ports is not defined, then it is defaulted to 'targetPort' under spec/ports.
  2. If 'nodePort' under spec/ports is not defined, then it is defaulted to anything in the range: 30000 to 32767
  3. The selector is used to link services to the pods.
  4. The key-value pairs under selectors should match the labels of the pod.
  5. To view services: kubectl get services.
  6. Now we can use the port '30008' on the Node IP to access the app: curl http://192.168.1.2:30008.
  7. By default, the algorithm used is Algorithm:Random and SessionAffinity: Yes.
  8. If pods are distributed across Nodes, K8s automatically creates a service that spans all nodes and maps the target port to the same nodePort on every node.
  9. Another use case for internal communication:
    1. Suppose we have multiple frontend pods
    2. We also have multiple backend pods
    3. We have multiple db pods too
    4. frontend pods needs to interact with backend pods and in turn backend pods need to interact with db pods.
    5. Now, each frontend pod does not know exactly which backend pod to connect to; a similar issue also exists b/w backend and db pods.
    6. Again, even if we somehow map IPs, the pods are ephemeral and the IPs of pods keep changing.
    7. Hence, ClusterIP services provide us a single interface that groups pods of a similar type together for access.
  10. A template of clusterIP service looks like this:
apiVersion: v1
kind: Service
metadata:
    name: backend-service
spec:
    type: ClusterIP
    ports:
        - targetPort: 80
          port: 80
    selector:
        app: myapp
        type: backend

Ingress

  1. Take a look at why an ingress is required, then come back here to see some configuration details.
  2. Ingress Controller:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
        name: nginx-ingress-controller
    spec:
        replicas: 1
        selector:
            matchLabels:
                name: nginx-ingress
        template:
            metadata:
                labels:
                    name: nginx-ingress
            spec:
                containers:
                    - name: nginx-ingress-controller
                      image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.21.0
                      args:
                        - /nginx-ingress-controller
                        - --configmap=$(POD_NAMESPACE)/nginx-configuration
                      env:
                        - name: POD_NAME
                          valueFrom:
                            fieldRef:
                                fieldPath: metadata.name
                        - name: POD_NAMESPACE
                          valueFrom:
                            fieldRef:
                                fieldPath: metadata.namespace
                      ports:
                        - name: http
                          containerPort: 80
                        - name: https
                          containerPort: 443
---
apiVersion: v1
kind: ConfigMap
metadata:
    name: nginx-configuration
---
apiVersion: v1
kind: Service
metadata:
    name: nginx-ingress
spec:
    type: NodePort
    ports:
    - port: 80
      targetPort: 80
      protocol: TCP
      name: http
    - port: 443
      targetPort: 443
      protocol: TCP
      name: https
    selector:
      name: nginx-ingress
---
apiVersion: v1
kind: ServiceAccount
metadata: 
    name: nginx-ingress-serviceaccount
  1. There are four K8s objects involved in setting an ingress controller:
    1. Deployment.
    2. Service: To expose the Deployment.
    3. Config Map: to feed nginx configuration data like ssl-protocol, log-path, etc.
    4. ServiceAccount: To apply Ingress resource configurations. The service account must have the right set of roles, clusterroles and rolebindings configured.
  2. Ingress Resource:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
    name: ingress-service
spec:
    rules:
    - host: my-apparelstore.com
      http:
        paths:
        - path: /app1
          backend:
            serviceName: app1
            servicePort: 8080
    - host: my-apparelstore.com
      http:
        paths: 
        - path: /app2
          backend:
            serviceName: app2
            servicePort: 8080
