gitops-k8s

This document aims to provide an opinionated working solution that leverages Kubernetes and proven GitOps techniques to build a resilient, composable and scalable Kubernetes platform.

Nothing outlined below is new or innovative, but it should at least be a good starting point for getting a cluster up and running quickly, leaving you free to stay focused and try out new ideas.

Feedback and help are always welcome!

Introduction
Argo CD
Applications

Introduction

TL;DR

Kubernetes is a declarative system
Git can be used to describe infrastructure and applications
The Git repository is the source of truth and represents a cluster
GitOps is a way to do Continuous Delivery and operate Kubernetes via Git pull requests
GitOps empowers developers to do operations
CI pipelines should only run builds, tests and publish images
In a pull-based approach, an operator deploys new images from inside of the cluster
You can only observe the actual state of the cluster and react when it diverges from the desired state

Imperative vs Declarative

In an imperative system, the user knows the desired state, determines the sequence of commands to transition the system to the desired state and supplies a representation of the commands to the system.

By contrast, in a declarative system, the user knows the desired state, supplies a representation of the desired state to the system, then the system reads the current state and determines the sequence of commands to transition the system to the desired state.

Declarative systems have the distinct advantage of being able to react to unintended state changes without further supervision. In the event of an unintended state change leading to a state drift, the system may autonomously determine and apply the set of mitigating actions leading to a state match. This process is called a control loop, a popular choice for the implementation of controllers.
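As a minimal sketch of one iteration of such a control loop, the shell function below compares a desired state with an observed state and prints the actions needed to make them match. The `name=version` line format and the function name `plan_actions` are assumptions made for illustration; real controllers work on Kubernetes API objects rather than text files.

```shell
# Print the actions that move the actual state towards the desired state.
# $1: file with the desired state, $2: file with the actual state.
# Both files contain one "name=version" entry per line.
plan_actions() {
  awk -F '=' '
    NF < 2 { next }                                   # skip blank or malformed lines
    FILENAME == ARGV[1] { desired[$1] = $2; next }    # first file: desired state
    { actual[$1] = $2 }                               # second file: actual state
    END {
      for (name in desired)
        # appending "" forces a string comparison, so 1.10 and 1.1 differ
        if (!(name in actual) || (actual[name] "") != (desired[name] ""))
          print "apply " name "=" desired[name]
      for (name in actual)
        if (!(name in desired))
          print "delete " name
    }
  ' "$1" "$2" | sort
}
```

A controller would run this kind of comparison repeatedly, executing the printed actions and observing again until nothing is printed.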

What is GitOps?

GitOps is the art and science of using Git pull requests to manage infrastructure provisioning and software deployment.

The concept of GitOps originated at Weaveworks, whose developers described how they use Git to create a single source of truth. Kubernetes is a declarative system and by using declarative tools, the entire set of configuration files can be version controlled in Git.

More generally, GitOps is a way to do Continuous Delivery and operate Kubernetes via Git.

Push vs Pull

In a push-based pipeline, the CI system runs builds and tests, followed by a deployment directly to Kubernetes. This is an anti-pattern: a CI server is not an orchestration tool. A CI job fails as soon as it encounters a difference, which can leave the cluster in a partial and unknown state. You need something that continually attempts to make progress until there are no more diffs.

In a pull-based pipeline, a Kubernetes operator deploys new images from inside of the cluster. The operator notices when a new image has been pushed to the registry. This triggers convergence of the cluster state: the new image is pulled from the registry, the manifest is automatically updated and the new image is deployed to the cluster.

A CI pipeline should be used to merge and integrate updates with master, while with GitOps you should rely on Kubernetes or the cluster to internally manage deployments based on those master updates.
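A pull-based agent can be sketched as a function that runs inside the cluster, checks whether the tracked branch has moved and only then applies the manifests. This is an illustration of the idea, not how Argo CD is implemented; the function name `sync_once`, the `manifests` directory and the assumption that the local clone tracks an upstream branch are all hypothetical.

```shell
# One iteration of a pull-based sync: fetch, compare revisions, apply if changed.
# $1: local clone of the GitOps repository with an upstream branch configured.
sync_once() {
  repo_dir="$1"
  git -C "$repo_dir" fetch --quiet
  local_rev=$(git -C "$repo_dir" rev-parse HEAD)
  remote_rev=$(git -C "$repo_dir" rev-parse '@{u}')
  if [ "$local_rev" = "$remote_rev" ]; then
    echo "up to date at $local_rev"
    return 0
  fi
  # Move the clone to the new revision, then let kubectl converge the cluster.
  git -C "$repo_dir" merge --quiet --ff-only '@{u}'
  kubectl apply --recursive -f "$repo_dir/manifests"
  echo "synced $local_rev -> $remote_rev"
}
```

In practice this would run in a loop, for example `while true; do sync_once /repo; sleep 60; done`, so that the cluster keeps converging without any credentials leaving it.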

You could potentially have multiple clusters pointing to the same GitOps repository, but you won't have a centralized view of them: all the clusters will be independent.

Observability

Git provides a source of truth for the desired state of the system and observability provides a source of truth for the actual state of the running system.

You cannot declare what the actual state of the cluster is; you can only observe it. This is why diffs are so important.

A system is observable if developers can understand its current state from the outside. Observability is a property of systems like Availability and Scalability. Monitoring, Tracing and Logging are techniques for baseline observations.

Observability is a source of truth for the actual running state of the system right now. You observe the running system in order to understand and control it. The observed state must be compared with the desired state in Git, and usually you want to monitor and alert when the system diverges from the desired state.
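One way to compare the desired state in Git with the live cluster is `kubectl diff`, which exits with 0 when there are no differences, 1 when differences are found and a higher code on error. The wrapper below is only a sketch; the function name `check_drift` and the manifests directory argument are assumptions.

```shell
# Report whether the live cluster matches the manifests in directory $1.
check_drift() {
  rc=0
  # Capture the exit code without aborting when the shell runs with "set -e".
  kubectl diff --recursive -f "$1" >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0) echo "in-sync" ;;
    1) echo "drift" ;;
    *) echo "error" ;;
  esac
}
```

The result could feed an alert, for example by failing a scheduled job whenever the output is `drift`.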

Resources

Imperative vs Declarative
GitOps - Operations by Pull Request (Part 1)
The GitOps Pipeline (Part 2)
GitOps - Observability (Part 3)
GitOps - Application Delivery Compliance and Secure CICD (Part 4)
Making the Leap from Continuous Integration to Continuous Delivery (Whitepaper)
What is GitOps really?
Why is a PULL vs a PUSH pipeline important?
Kubernetes anti-patterns: Let's do GitOps, not CIOps!
GitOps: High velocity CICD for Kubernetes
GitOps - What you need to know
GitOps for Kubernetes - A DevOps Iteration Focused on Declarative Infrastructure
Automating continuous delivery with Kubernetes, Google Cloud and Git
Continuous Delivery the Hard Way

Argo CD

Argo CD is a declarative, GitOps continuous delivery tool for Kubernetes. It automates the deployment of the desired application states in the specified target environments. In this project Kubernetes manifests are specified as helm charts.
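For illustration, an Argo CD application is itself a Kubernetes resource of kind `Application`. The helper below prints a minimal manifest for a chart stored in Git; the function name `render_application`, the repository URL, the chart path and the names passed to it are placeholders, not the values used in this project.

```shell
# Print a minimal Argo CD Application manifest.
# $1: application name, $2: Git repository URL (placeholder),
# $3: path of the chart inside the repository, $4: target namespace.
render_application() {
  cat <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: $1
  namespace: argocd
spec:
  project: default
  source:
    repoURL: $2
    path: $3
    targetRevision: HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: $4
EOF
}
```

The output can be piped to the cluster, for example `render_application guestbook https://example.com/org/repo.git charts/guestbook guestbook | kubectl apply -f -`.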

This guide explains how to set up the whole infrastructure in a few steps via GitOps with Argo CD. It is not tightly coupled to any specific vendor, so you should be able to run it easily on DigitalOcean, EKS or GKE, for example.

Most of the steps have been kept manual on purpose, but they should be automated in a production environment.

Prerequisites

Set up the required tools
Create a Kubernetes cluster locally or with your favourite provider

Download the cluster configs and test connection

```
export KUBECONFIG=~/.kube/<CLUSTER_NAME>-kubeconfig.yaml
kubectl get nodes
```

Bootstrap

TODO Set up secrets (optional)
Set up Argo CD and all the applications
```
make bootstrap
```

Access Argo CD

```
# username: admin
# password: (autogenerated) the pod name of the Argo CD API server
kubectl get pods -n argocd -l app.kubernetes.io/name=argocd-server -o name | cut -d'/' -f 2

# port forward the service
kubectl port-forward service/argocd-server -n argocd 8080:443

# from the UI (use open on macOS, xdg-open on Linux)
xdg-open https://localhost:8080
# from the CLI
argocd login localhost:8080 --username admin
```

On Chrome, you might need to enable the flag chrome://flags/#allow-insecure-localhost (Allow invalid certificates for resources loaded from localhost) to access it

The first time only, sync all the OutOfSync applications

manually
TODO with a cronjob (optional)
verify guestbook example

```
# port forward the service
kubectl port-forward service/guestbook-ui -n guestbook 8081:80
# open browser (use open on macOS, xdg-open on Linux)
xdg-open http://localhost:8081
```
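The manual first-time sync can also be done from the CLI once `argocd login` has succeeded. The sketch below lists every application and syncs each in turn; the function name `sync_all_apps` is an assumption, and syncing an application that is already in sync is expected to be harmless.

```shell
# Sync every application known to Argo CD, one at a time.
sync_all_apps() {
  argocd app list -o name | while read -r app; do
    [ -n "$app" ] || continue
    echo "syncing $app"
    # Redirect stdin so the sync command cannot consume the list being read.
    argocd app sync "$app" </dev/null
  done
}
```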

This is how it should look in the UI

Resources

Introducing Argo CD
Argo CD - Declarative Continuous Delivery for Kubernetes

Applications

Applications in this repository are defined in the parent applications chart and are logically split into folders which represent Kubernetes namespaces.

The ambassador namespace is dedicated to Ambassador, a lightweight Kubernetes-native microservices API gateway built on the Envoy Proxy. It is mainly used for routing and supports canary deployments, traffic shadowing, rate limiting, authentication and more

```
# retrieve EXTERNAL-IP
kubectl get service ambassador -n ambassador
# open in a browser (use open on macOS, xdg-open on Linux)
xdg-open http://<EXTERNAL-IP>/ambassador
xdg-open http://<EXTERNAL-IP>/httpbin/
xdg-open http://<EXTERNAL-IP>/guestbook

# debug ambassador
kubectl port-forward service/ambassador-admins 8877 -n ambassador
xdg-open http://localhost:8877/ambassador/v0/diag
```

Ambassador is disabled by default because the recommended way is to use host-based routing which requires a domain

For a working example on DigitalOcean using external-dns you can have a look at niqdev/do-k8s

TODO Service mesh

Istio
A Crash Course For Running Istio

The observe namespace is dedicated to observability, specifically monitoring, alerting and logging

prometheus-operator provides monitoring and alerting by managing Prometheus, Alertmanager and Grafana

```
# prometheus
kubectl port-forward service/prometheus-operator-prometheus 8001:9090 -n observe

# alertmanager
kubectl port-forward service/prometheus-operator-alertmanager 8002:9093 -n observe

# grafana
# username: admin
# password: prom-operator
kubectl port-forward service/prometheus-operator-grafana 8003:80 -n observe
```

kube-ops-view provides a read-only system dashboard for multiple k8s clusters
```
kubectl port-forward service/kube-ops-view -n observe 8004:80
```

EFK stack for logging

elasticsearch is a distributed, RESTful search and analytics engine, used here for log storage
```
kubectl port-forward service/elasticsearch-master 9200:9200 -n observe
```

cerebro is an Elasticsearch web admin tool

```
kubectl port-forward service/cerebro 9000:80 -n observe
```

kibana visualizes and queries the log data stored in an Elasticsearch index
```
kubectl port-forward service/kibana-kibana 9001:5601 -n observe
```
fluentbit is a fast and lightweight Log Processor and Forwarder
elasticsearch-curator or curator helps to curate, or manage, Elasticsearch indices and snapshots

Resources

Prometheus
Prometheus Operator - Getting Started Guide
Grafana - Dashboards
Fluent Bit
Logging Best Practices for Kubernetes using Elasticsearch, Fluent Bit and Kibana
Exporting Kubernetes Logs to Elasticsearch Using Fluent Bit
Fluentd vs. Fluent Bit: Side by Side Comparison
Logging & Monitoring of Kubernetes Applications: Requirements & Recommended Toolset
Loki

The kube-system namespace is reserved for Kubernetes system applications

kubernetes-dashboard is a general purpose, web-based UI for Kubernetes clusters. It allows users to manage applications running in the cluster and troubleshoot them, as well as manage the cluster itself
```
kubectl port-forward service/kubernetes-dashboard -n kube-system 8000:443
```
metrics-server is an add-on which extends the metrics api group and enables the Kubernetes resource HorizontalPodAutoscaler
```
kubectl top node
kubectl top pod --all-namespaces
```
spotify-docker-gc performs garbage collection in the Kubernetes cluster and the default configurations have the gc running once a day which:
- removes containers that exited more than an hour ago
- removes images that don't belong to any container
- removes volumes that are not associated with any remaining container

TODO (not in order)

bump argocd to latest version
argocd: example secrets for private charts
argocd: override default admin.password
argocd-bootstrap: open source and explain solution of how to sync automatically first time with cronjob
expose argocd over http i.e. --insecure flag
configure TLS/cert and authentication on ambassador for all services
centralize auth on ambassador/istio
Jaeger tracing
kube-monkey or chaoskube
explain how to switch cluster via DNS
Kafka from public chart + JMX fix
stateless vs stateful: disaster recovery strategy e.g. S3 backup/restore
example with multiple providers: DigitalOcean, EKS, GKE
add prometheus adapter for custom metrics that can be used by the HorizontalPodAutoscaler
explain how to test a branch i.e. change target revision from the UI
TODO fix alertmanager: error: unrecognized log format "<nil>", try --help
add screenshots to readme for each app
explain how to add grafana dashboards with ConfigMap
add alerting example on Slack/PagerDuty
add example of prometheus ServiceMonitor + dashboard
explain how to init es index on kibana for logging + screenshot
add kubefwd to docs
argocd issue: Add support for secrets in Application parameters
argocd issue: Helm repository as first class Argo CD Application source
