# FastAPI Cloud Infra

This is the infra for FastAPI Cloud. This document contains a short tour.

It will evolve over time. There's a chance that when you read this some things have changed and we forgot to update the README. Sorry. 🙈

## Deploy GitHub Action

`.github/workflows/deploy.yml` is the GitHub Action that starts the deployment.

It will run on **staging** when merging to `master` and on **production** when making a release. There are **two AWS accounts**, one for staging and one for production.
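
That trigger logic can be sketched as a tiny helper (hypothetical, not part of the actual workflow file):

```python
def target_environment(event_name: str, ref: str) -> str:
    """Pick the deploy target the way the workflow does:
    staging for pushes to master, production for releases."""
    if event_name == "release":
        return "production"
    if event_name == "push" and ref == "refs/heads/master":
        return "staging"
    raise ValueError(f"no deployment for {event_name} on {ref}")
```

Each environment maps to its own AWS account, so the selected target also decides which set of credentials is used.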

It uses GitHub Action Secrets to configure credentials for Pulumi and AWS (staging or production).

Then it runs the Pulumi deployment (covered below).

It will also run on "workflow dispatch", which means it can be run manually from the GitHub Actions UI. When run that way, there's an option to enable debug. With debug enabled, at the end it starts a Tmate session that can be used to manually deploy the rest of the Kubernetes components (covered below).

## Pulumi

`infra/__main__.py` has the main Pulumi code.

### AWS Resources

It creates a VPC with the network tags needed by the load balancers used with Kubernetes.

Then it creates an EKS cluster in that VPC.

The infra will need one or more AWS Load Balancers later. Those will be created by another component **from inside of Kubernetes** called the "AWS Load Balancer Controller", which we'll install later. That component lives on the Kubernetes side, not the AWS side, but controls things on the AWS side (load balancers). The AWS Load Balancer Controller needs a Kubernetes "Service Account" connected to an AWS IAM Role with an AWS IAM Policy that allows it to manage the Load Balancers.

The Pulumi code creates that IAM Policy and the IAM Role, connects them, and makes them accessible via a Kubernetes Service Account name (a Service Account that we haven't created yet, but will, using that same name).
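
For orientation, the mechanism connecting the two sides is IAM Roles for Service Accounts (IRSA): the role's trust policy trusts the cluster's OIDC provider, restricted to one specific Service Account name. A minimal sketch of that trust policy document (the OIDC provider URL, namespace, and account name below are placeholders, not the real values):

```python
import json

# Hypothetical values, for illustration only.
OIDC_PROVIDER = "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE"
NAMESPACE = "kube-system"
SERVICE_ACCOUNT = "aws-load-balancer-controller"

def trust_policy(account_id: str) -> str:
    """Build an IAM trust policy that lets only the named
    Kubernetes Service Account assume the role (IRSA)."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{OIDC_PROVIDER}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    f"{OIDC_PROVIDER}:sub": f"system:serviceaccount:{NAMESPACE}:{SERVICE_ACCOUNT}"
                }
            },
        }],
    })
```

The `Condition` on the `sub` claim is what ties the role to that one Service Account name.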

Up to this point, Pulumi has been creating and managing AWS resources only.

### Kubernetes Resources

Then, Pulumi uses the EKS cluster it just created as a Pulumi "provider" for the next parts involving Kubernetes.

It creates a Kubernetes Service Account with the same name we defined above, and it attaches the AWS IAM Role we created above.

This part of the Kubernetes setup is done in Pulumi because it's strongly connected to the AWS resources just created, so it's easier to do it on the Pulumi side.

Next we will install some Helm Charts. Although Pulumi has (in theory) some support for Helm Charts with two different resources, I couldn't make it work yet. So, for now, we won't use Pulumi for the rest of the Kubernetes setup.

### Pulumi Export

At the end, Pulumi exports several values from the outcome. Some can be useful during debugging; one in particular is needed for the rest of the setup: the Kubernetes "Kubeconfig".
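
For reference, an EKS Kubeconfig like the one Pulumi exports boils down to the cluster endpoint plus an `exec` credential plugin that calls `aws eks get-token`. A minimal sketch of that structure (the names and endpoint are placeholders):

```python
def minimal_kubeconfig(cluster_name: str, endpoint: str, ca_data: str) -> dict:
    """Sketch of an EKS kubeconfig: kubectl obtains short-lived
    tokens by exec-ing the AWS CLI with the cluster name."""
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [{
            "name": cluster_name,
            "cluster": {"server": endpoint, "certificate-authority-data": ca_data},
        }],
        "users": [{
            "name": "aws",
            "user": {
                "exec": {
                    "apiVersion": "client.authentication.k8s.io/v1beta1",
                    "command": "aws",
                    "args": ["eks", "get-token", "--cluster-name", cluster_name],
                }
            },
        }],
        "contexts": [{"name": "aws", "context": {"cluster": cluster_name, "user": "aws"}}],
        "current-context": "aws",
    }
```

This is why the AWS credentials environment variables are enough for `kubectl` to authenticate later.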

## Tmate

When the GitHub Action is run with debug enabled, after the Pulumi code is run, it starts a Tmate session.

It is shown in the logs with a line that says something like:

```
SSH: ssh [email protected]
or: ssh -i <path-to-private-SSH-key> [email protected]
```

If you have SSH keys configured in your GitHub account, Tmate will use them so that you can SSH into the GitHub Action runner machine by executing that command:

```bash
ssh [email protected]
```

It will only allow you to SSH into the machine using your own SSH key, just by typing that same command.

Once you are in the machine, you can continue the deployment manually.

You will have some environment variables available, defined in the GitHub Action. For example, the Kubeconfig needed to interact with the Kubernetes cluster.

You will be shown some instructions; after reading them, you can type `q` to get the remote SSH session. Once you are done, create a file called `continue` in the main directory, e.g.:

```bash
touch continue
```

Tmate will detect it, terminate the SSH session, and continue the GitHub Action execution (just finishing it).
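
Conceptually, the action is simply blocking until that sentinel file appears before moving on — something like this sketch (not the actual tmate-action code):

```python
import os
import time

def wait_for_continue(path: str = "continue", poll_seconds: float = 1.0) -> None:
    """Block until someone creates the sentinel file, then return
    so the workflow can proceed (sketch of the debug-session wait)."""
    while not os.path.exists(path):
        time.sleep(poll_seconds)
```

So `touch continue` is the whole "resume" protocol: no flags, no signals, just a file.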

In the Tmate session, you can continue the deployment of Kubernetes resources manually.

This step was done once by Sebastián; it (normally) doesn't need to be done again, unless we are updating versions or similar.

## Kubernetes Deployment

Once you are in the Tmate SSH session, you can deploy the rest of the Kubernetes components.

### Helm Charts

Start with `infra/deploy-helm.sh`:

```bash
bash ./infra/deploy-helm.sh
```

This script will:

* Set up the `kubeconfig.json` so that `kubectl` works out of the box. This uses the Pulumi Kubeconfig output. The authentication and login work with the AWS credentials environment variables set up in the GitHub Action.
* Use Helm to install the following Helm Charts:
    * AWS Load Balancer Controller: this creates the AWS Load Balancers that allow the external world to communicate with our Kubernetes cluster. It uses the Kubernetes Service Account we created with Pulumi.
    * Cert Manager: this will manage the TLS (HTTPS) certificates obtained from Let's Encrypt. We'll configure more things for it later.
    * Ingress Nginx Controller: this manages ingress resources we deploy manually with Kubernetes Deployments (currently none), in contrast to doing that with Knative. It uses customization values from `infra/helm-values/ingress-nginx-external-values.yaml` to enable it to use the AWS Load Balancer Controller.
* At the end, it obtains and shows the DNS name of the load balancer.

We need to update the DNS records in Cloudflare to use it (with a `CNAME`). This is normally done once (by Sebastián) and left configured. Later we might want to set up ExternalDNS to do it automatically.

* `fastapicloud.com` points to production at the Ingress Nginx. This is used by the main frontend and our admin.
* `fastapicloud.work` points to staging at the Ingress Nginx.

* `fastapicloud.dev` is configured later, for production, for Knative.
* `fastapicloud.club` is configured later, for staging, for Knative.
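
The four zones can be summarized in a small lookup (a sketch to keep the mapping in one place; the ingress labels are informal names, not resource names):

```python
# Which Cloudflare zone points at which ingress, per environment.
DNS_TARGETS = {
    "fastapicloud.com":  ("production", "ingress-nginx"),
    "fastapicloud.work": ("staging",    "ingress-nginx"),
    "fastapicloud.dev":  ("production", "knative-kourier"),
    "fastapicloud.club": ("staging",    "knative-kourier"),
}

def domain_for(environment: str, ingress: str) -> str:
    """Find the zone serving a given environment/ingress pair."""
    for domain, target in DNS_TARGETS.items():
        if target == (environment, ingress):
            return domain
    raise KeyError((environment, ingress))
```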

### Kubernetes Manifests

In the same Tmate SSH session, after having the Helm Charts installed, we will install other, non-Helm Kubernetes resources.

Continue with `infra/deploy-kubectl.sh`:

```bash
bash ./infra/deploy-kubectl.sh
```

This script will:

* Set up the Kubeconfig again (in case it was removed or not set yet).
* Add the Cloudflare token, so that Cert Manager can create the DNS records needed for TLS (HTTPS) certificates from Let's Encrypt.
* Add the Cert Manager resources: the staging and production issuers and the wildcard certificates for the Ingress Nginx controller and Knative.
* Install the Knative CRDs (Custom Resource Definitions for Kubernetes).
* Install Knative Serving and Kourier as the Knative network layer. This uses Kustomize with the files and directories in `infra/k8s/knative` (described below).
* Show the DNS of the Knative AWS Load Balancer, to update the DNS records in Cloudflare (with a `CNAME`).
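
For orientation, a Cert Manager wildcard Certificate resource has roughly this shape, written here as a plain dict mirroring the YAML (the names and issuer below are placeholders, not the actual manifests in the repo):

```python
def wildcard_certificate(domain: str, issuer: str = "letsencrypt-production") -> dict:
    """Sketch of a Cert Manager wildcard Certificate manifest."""
    slug = domain.replace(".", "-")
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Certificate",
        "metadata": {"name": f"wildcard-{slug}"},
        "spec": {
            "secretName": f"wildcard-{slug}-tls",
            "issuerRef": {"name": issuer, "kind": "ClusterIssuer"},
            # A wildcard requires the DNS-01 challenge, which is why
            # the script installs the Cloudflare token first.
            "dnsNames": [domain, f"*.{domain}"],
        },
    }
```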

We need to update the DNS records in Cloudflare manually with this DNS name.

* `fastapicloud.dev` points to production at the Knative ingress with Kourier; people's apps get a subdomain here.
* `fastapicloud.club` points to staging at the Knative ingress with Kourier.

`fastapicloud.com` and `fastapicloud.work` were configured previously.

### Knative Kustomize

Knative uses Kourier for the network. Kourier is in charge of handling the network traffic from the outside world into Knative, through Kubernetes.

Kourier uses the AWS Load Balancer Controller to create an AWS Load Balancer.

To explore the Knative Kustomize installation, start with `infra/k8s/knative/base/kustomization.yaml`.

It uses the Knative and Kourier releases from GitHub (as described in the Knative and Kourier installation instructions).

Then it uses Kustomize to patch (update) several configs:

* It patches Knative to make it use Kourier.
* It patches Knative to update the default domain name for apps, making them top-level sub-domains of `fastapicloud.dev` or `fastapicloud.club`.
    * That way the final customer app is something like `my-awesome-app-12acs.fastapicloud.dev`; otherwise, it would include the Kubernetes namespace by default, like `my-awesome-app-12acs.team-avengers.fastapicloud.dev`.
    * But second-level sub-domains are not allowed by Let's Encrypt for wildcard certificates; a wildcard cert covers only one level. Having a wildcard certificate we can use allows us to start serving an app right after it's deployed, without waiting for the potentially slow dance to acquire certs from Let's Encrypt.
* It patches Kourier to make it use the AWS Load Balancer Controller so that it creates an AWS Load Balancer to communicate with the external world.
* It patches Kourier to use the custom certificate for the Knative domain created before.
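
The domain-level reasoning can be made concrete: a `*.fastapicloud.dev` wildcard matches exactly one label, so app hostnames must sit directly under the zone. A small sketch (hypothetical helper names, not the actual domain template config):

```python
def app_hostname(app: str, namespace: str, zone: str, flatten: bool = True) -> str:
    """Knative's default template includes the namespace; the
    Kustomize patch flattens it to a single label under the zone."""
    if flatten:
        return f"{app}.{zone}"
    return f"{app}.{namespace}.{zone}"

def wildcard_covers(hostname: str, zone: str) -> bool:
    """A `*.zone` wildcard certificate matches exactly one extra label."""
    if not hostname.endswith("." + zone):
        return False
    prefix = hostname[: -(len(zone) + 1)]
    return "." not in prefix  # two or more labels => not covered
```

The flattened hostname is covered by the wildcard, while the default namespaced one would not be, which is the whole reason for the patch.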

### Knative Kustomize Overlay

The previous file is not really used directly.

Instead, one of two Kustomize "overlays" is used, one for production and one for staging.

These overlays extend the `base` Kustomize configuration and add the Knative domain used for staging (`fastapicloud.club`) or production (`fastapicloud.dev`).