Merge #51775 #52837

51775: cloud: add statefulset config for EKS multi-region r=taroface a=taroface

This StatefulSet config is meant for use with the EKS multi-region docs (WIP).

- Added an `initContainer` that determines the AZ of each pod.
- Added `namespace` field to fill in with a region-appropriate namespace.
- Modified `--join` and `--locality` in the start command for the user to customize with namespace/region names. I am unclear on whether the values here need to match the `namespace` defined for the cluster above.
- Added the memory/CPU requests & limits block that is in the single-region configs (but not the current multi-region config).
- Also added `dns-lb-eks.yaml` for creating an EKS network load balancer.
- Also added `configmap.yaml` for modifying the Corefile.

52837: kvclient: don't spin in the DistSender trying the same replica over and over r=andreimatei a=andreimatei

This patch addresses a scenario where a lease indicates a replica that, when contacted, claims to not have the lease and instead returns an older lease. In this scenario, the DistSender detects the fact that the node returned an old lease (which means that it's not aware of the new lease that it has acquired - for example because it hasn't applied it yet whereas other replicas have) and retries the same replica (with a backoff).

Before this patch, the DistSender would retry the replica ad infinitum, hoping that it'll eventually become aware of its new lease. However, it's possible that the replica never finds out about this new lease (or, at least, not until the lease expires and a new leaseholder steps up). This could happen if a replica acquires a lease but gets partitioned from all the other replicas before applying it.

This patch puts a bound on the number of times the DistSender will retry the same replica in a row before moving on to others.

Release note: None

Co-authored-by: taroface <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
cockroachdb · Aug 20, 2020 · 2a1e9ce
3 parents 904b7cb + 4474d83 + 8d54bce
commit 2a1e9ce
Showing 7 changed files with 555 additions and 4 deletions.
diff --git a/cloud/kubernetes/multiregion/README.md b/cloud/kubernetes/multiregion/README.md
@@ -1,8 +1,8 @@
-# Running CockroachDB across multiple Kubernetes clusters
+# Running CockroachDB across multiple Kubernetes clusters (GKE)
 
 The script and configuration files in this directory enable deploying
 CockroachDB across multiple Kubernetes clusters that are spread across different
-geographic regions. It deploys a CockroachDB
+geographic regions and hosted on [GKE](https://cloud.google.com/kubernetes-engine). It deploys a CockroachDB
 [StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/)
 into each separate cluster, and links them together using DNS.
 

diff --git a/cloud/kubernetes/multiregion/eks/README.md b/cloud/kubernetes/multiregion/eks/README.md
@@ -0,0 +1,89 @@
+# Running CockroachDB across multiple Kubernetes clusters (EKS)
+
+The configuration files in this directory enable a multi-region CockroachDB deployment on [Amazon EKS](https://aws.amazon.com/eks/), using multiple Kubernetes clusters in different geographic regions. They are primarily intended for use with our [Orchestrate CockroachDB Across Multiple Kubernetes Clusters](https://www.cockroachlabs.com/docs/stable/orchestrate-cockroachdb-with-kubernetes-multi-cluster.html#eks) tutorial, but can be modified for use with any multi-region CockroachDB deployment hosted on EKS.
+
+Note that a successful multi-region deployment also requires configuring your EC2 network for inter-region traffic, which is covered fully in our tutorial.
+
+## Usage
+
+The steps below assume you have created a Kubernetes cluster in each region in which you want to deploy CockroachDB.
+
+Each of the 3 configuration files must be applied separately to each Kubernetes cluster.
+
+### Create StatefulSets
+
+[`cockroachdb-statefulset-secure-eks.yaml`](https://github.com/cockroachdb/cockroach/blob/master/cloud/kubernetes/multiregion/eks/cockroachdb-statefulset-secure-eks.yaml) creates a StatefulSet that runs 3 CockroachDB pods in a single region.
+
+Because the multi-region deployment requires deploying CockroachDB to a separate Kubernetes cluster in each region, you need to customize and apply a separate version of this file to each region.
+
+Use the `namespace` field to specify a namespace other than `default` in which to run the CockroachDB pods. This should correspond to the region in which the Kubernetes cluster is deployed (e.g., `us-east-1`). 
+
+```
+namespace: <cluster-namespace>
+```
+
+Also create the namespace in the appropriate region by running `kubectl create namespace <cluster-namespace> --context=<cluster-context>`.
+
+Change the resource `requests` and `limits` to appropriate values for the hardware that you're running. You can see the allocatable resources on each of your Kubernetes nodes by running `kubectl describe nodes`.
+
+```
+resources:
+  requests:
+    cpu: "16"
+    memory: "8Gi"
+  limits:
+    memory: "8Gi"
+```
+
+Replace the placeholder values in the `--join` and `--locality` flags with the namespace of the CockroachDB cluster in each region (e.g., `us-east-1`). `--join` specifies the host addresses that connect nodes to the cluster and distribute the rest of the node addresses. `--locality` describes the location of each CockroachDB node.
+
+```
+--join cockroachdb-0.cockroachdb.<cluster-namespace-1>,cockroachdb-1.cockroachdb.<cluster-namespace-1>,cockroachdb-2.cockroachdb.<cluster-namespace-1>,cockroachdb-0.cockroachdb.<cluster-namespace-2>,cockroachdb-1.cockroachdb.<cluster-namespace-2>,cockroachdb-2.cockroachdb.<cluster-namespace-2>,cockroachdb-0.cockroachdb.<cluster-namespace-3>,cockroachdb-1.cockroachdb.<cluster-namespace-3>,cockroachdb-2.cockroachdb.<cluster-namespace-3>
+--locality=region=<cluster-namespace-1>,az=$(cat /etc/cockroach-env/zone),dns=$(hostname -f)
+```
+
+You can then deploy the StatefulSet in each region, specifying the appropriate cluster context and namespace (which you defined above):
+
+```
+kubectl create -f <statefulset> --context=<cluster-context> --namespace=<cluster-namespace>
+```
+
+Before initializing the cluster, however, you must enable CockroachDB pods to communicate across regions. This includes peering the VPCs in all 3 regions with each other, setting up a [Network Load Balancer](#set-up-load-balancing) in each region, and [configuring a CoreDNS service](#configure-coredns) to route DNS traffic to the appropriate pods. For information on configuring the EC2 network, see our [documentation](https://www.cockroachlabs.com/docs/stable/orchestrate-cockroachdb-with-kubernetes-multi-cluster.html#eks).
+
+### Set up load balancing
+
+[`dns-lb-eks.yaml`](https://github.com/cockroachdb/cockroach/blob/master/cloud/kubernetes/multiregion/eks/dns-lb-eks.yaml) creates a [Network Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html) pointed at the CoreDNS service that routes DNS traffic to the appropriate pods.
+
+Apply the load balancer manifest to the cluster in each region:
+
+```
+kubectl create -f https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/multiregion/eks/dns-lb-eks.yaml --context=<cluster-context>
+```
+
+### Configure CoreDNS
+
+[`configmap.yaml`](https://github.com/cockroachdb/cockroach/blob/master/cloud/kubernetes/multiregion/eks/configmap.yaml) is a template for [modifying the ConfigMap](https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/#coredns-configmap-options) for the CoreDNS Corefile in each region.
+
+You must define a separate ConfigMap for each region. Each unique ConfigMap lists the forwarding addresses for the pods in the 2 other regions. 
+
+For each region, replace:
+
+- `region2` and `region3` with the namespaces in which the CockroachDB pods will run in the other 2 regions.
+
+- `ip1`, `ip2`, and `ip3` with the IP addresses of the Network Load Balancers in the region.
+
+First back up the existing ConfigMap in each region:
+
+```
+kubectl -n kube-system get configmap coredns -o yaml > <configmap-backup-name>
+```
+
+Then apply the new ConfigMap:
+
+```
+kubectl apply -f <configmap-name> --context=<cluster-context>
+```
+
+## More information
+
+For more information on running CockroachDB in Kubernetes, please see the [README in the parent directory](../../README.md).
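
The `--join` value in the README above follows a regular pattern: the three StatefulSet pods (`cockroachdb-0` through `cockroachdb-2`) in each namespace, addressed through the headless `cockroachdb` service. Below is a minimal shell sketch for generating that list, assuming three replicas per region as in the manifest. The `build_join` helper and the example namespace names other than `us-east-1` are hypothetical and not part of this commit.

```shell
#!/bin/sh
# Hypothetical helper: print the --join address list for the given namespaces.
# Assumes each region runs pods cockroachdb-0..2 behind the headless
# "cockroachdb" service, as in cockroachdb-statefulset-secure-eks.yaml.
build_join() {
  join=""
  for ns in "$@"; do
    for i in 0 1 2; do
      # Prepend a comma separator to every entry except the first.
      join="${join:+$join,}cockroachdb-$i.cockroachdb.$ns"
    done
  done
  printf '%s\n' "$join"
}

# Example namespaces; substitute the namespace you created in each region.
build_join us-east-1 us-west-2 eu-central-1
```

The generated list names every region, so it does not depend on which cluster the file is applied to. The `region=` value in `--locality` is the part that still has to be edited per region.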
diff --git a/cloud/kubernetes/multiregion/eks/cockroachdb-statefulset-secure-eks.yaml b/cloud/kubernetes/multiregion/eks/cockroachdb-statefulset-secure-eks.yaml
@@ -0,0 +1,281 @@
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: cockroachdb
+  labels:
+    app: cockroachdb
+---
+apiVersion: rbac.authorization.k8s.io/v1beta1
+kind: Role
+metadata:
+  name: cockroachdb
+  labels:
+    app: cockroachdb
+rules:
+- apiGroups:
+  - ""
+  resources:
+  - secrets
+  verbs:
+  - create
+  - get
+---
+apiVersion: rbac.authorization.k8s.io/v1beta1
+kind: ClusterRole
+metadata:
+  name: cockroachdb
+  labels:
+    app: cockroachdb
+rules:
+- apiGroups:
+  - certificates.k8s.io
+  resources:
+  - certificatesigningrequests
+  verbs:
+  - create
+  - get
+  - watch
+---
+apiVersion: rbac.authorization.k8s.io/v1beta1
+kind: RoleBinding
+metadata:
+  name: cockroachdb
+  labels:
+    app: cockroachdb
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: Role
+  name: cockroachdb
+subjects:
+- kind: ServiceAccount
+  name: cockroachdb
+  namespace: default
+---
+apiVersion: rbac.authorization.k8s.io/v1beta1
+kind: ClusterRoleBinding
+metadata:
+  name: cockroachdb
+  labels:
+    app: cockroachdb
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: ClusterRole
+  name: cockroachdb
+subjects:
+- kind: ServiceAccount
+  name: cockroachdb
+  namespace: default
+---
+apiVersion: v1
+kind: Service
+metadata:
+  # This service is meant to be used by clients of the database. It exposes a ClusterIP that will
+  # automatically load balance connections to the different database pods.
+  name: cockroachdb-public
+  labels:
+    app: cockroachdb
+spec:
+  ports:
+  # The main port, served by gRPC, serves Postgres-flavor SQL, internode
+  # traffic and the cli.
+  - port: 26257
+    targetPort: 26257
+    name: grpc
+  # The secondary port serves the UI as well as health and debug endpoints.
+  - port: 8080
+    targetPort: 8080
+    name: http
+  selector:
+    app: cockroachdb
+---
+apiVersion: v1
+kind: Service
+metadata:
+  # This service only exists to create DNS entries for each pod in the stateful
+  # set such that they can resolve each other's IP addresses. It does not
+  # create a load-balanced ClusterIP and should not be used directly by clients
+  # in most circumstances.
+  name: cockroachdb
+  labels:
+    app: cockroachdb
+  annotations:
+    # Use this annotation in addition to the actual publishNotReadyAddresses
+    # field below because the annotation will stop being respected soon but the
+    # field is broken in some versions of Kubernetes:
+    # https://github.com/kubernetes/kubernetes/issues/58662
+    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
+    # Enable automatic monitoring of all instances when Prometheus is running in the cluster.
+    prometheus.io/scrape: "true"
+    prometheus.io/path: "_status/vars"
+    prometheus.io/port: "8080"
+spec:
+  ports:
+  - port: 26257
+    targetPort: 26257
+    name: grpc
+  - port: 8080
+    targetPort: 8080
+    name: http
+  # We want all pods in the StatefulSet to have their addresses published for
+  # the sake of the other CockroachDB pods even before they're ready, since they
+  # have to be able to talk to each other in order to become ready.
+  publishNotReadyAddresses: true
+  clusterIP: None
+  selector:
+    app: cockroachdb
+---
+apiVersion: policy/v1beta1
+kind: PodDisruptionBudget
+metadata:
+  name: cockroachdb-budget
+  labels:
+    app: cockroachdb
+spec:
+  selector:
+    matchLabels:
+      app: cockroachdb
+  maxUnavailable: 1
+---
+apiVersion: apps/v1
+kind: StatefulSet
+metadata:
+  name: cockroachdb
+  # TODO: Use this field to specify a namespace other than "default" in which to deploy CockroachDB (e.g., us-east-1).
+  # namespace: <cluster-namespace>
+spec:
+  serviceName: "cockroachdb"
+  replicas: 3
+  selector:
+    matchLabels:
+      app: cockroachdb
+  template:
+    metadata:
+      labels:
+        app: cockroachdb
+    spec:
+      serviceAccountName: cockroachdb
+      affinity:
+        podAntiAffinity:
+          preferredDuringSchedulingIgnoredDuringExecution:
+          - weight: 100
+            podAffinityTerm:
+              labelSelector:
+                matchExpressions:
+                - key: app
+                  operator: In
+                  values:
+                  - cockroachdb
+              topologyKey: kubernetes.io/hostname
+      # This init container is used to determine the availability zones of the Cockroach pods. The AZs are used to define --locality when starting Cockroach nodes.
+      initContainers:
+      - command:
+        - sh
+        - -ecx
+        - echo "aws-$(curl http://169.254.169.254/latest/meta-data/placement/availability-zone/)"
+          > /etc/cockroach-env/zone
+        image: byrnedo/alpine-curl:0.1
+        imagePullPolicy: IfNotPresent
+        name: locality-container
+        resources: {}
+        terminationMessagePath: /dev/termination-log
+        terminationMessagePolicy: File
+        volumeMounts:
+        - mountPath: /etc/cockroach-env
+          name: cockroach-env
+      containers:
+      - name: cockroachdb
+        image: cockroachdb/cockroach:v20.1.3
+        imagePullPolicy: IfNotPresent
+        # TODO: Change these to appropriate values for the hardware that you're running. You can see the amount of allocatable resources on each of your Kubernetes nodes by running: kubectl describe nodes
+        # resources:
+        #   requests:
+        #     cpu: "16"
+        #     memory: "8Gi"
+        #     NOTE: Unless you have enabled the non-default Static CPU Management Policy and are using an integer number of CPUs, we don't recommend setting a CPU limit. See:
+        #         https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#static-policy
+        #         https://github.com/kubernetes/kubernetes/issues/51135
+        #   limits:
+        #     cpu: "16"
+        #     memory: "8Gi"
+        ports:
+        - containerPort: 26257
+          name: grpc
+        - containerPort: 8080
+          name: http
+        livenessProbe:
+          httpGet:
+            path: "/health"
+            port: http
+            scheme: HTTPS
+          initialDelaySeconds: 30
+          periodSeconds: 5
+        readinessProbe:
+          httpGet:
+            path: "/health?ready=1"
+            port: http
+            scheme: HTTPS
+          initialDelaySeconds: 10
+          periodSeconds: 5
+          failureThreshold: 2
+        volumeMounts:
+        - name: datadir
+          mountPath: /cockroach/cockroach-data
+        - name: certs
+          mountPath: /cockroach/cockroach-certs
+        - name: cockroach-env
+          mountPath: /etc/cockroach-env
+        env:
+        - name: COCKROACH_CHANNEL
+          value: kubernetes-multiregion
+        - name: GOMAXPROCS
+          valueFrom:
+            resourceFieldRef:
+              resource: limits.cpu
+              divisor: "1"
+        - name: MEMORY_LIMIT_MIB
+          valueFrom:
+            resourceFieldRef:
+              resource: limits.memory
+              divisor: "1Mi"
+        command:
+          - "/bin/bash"
+          - "-ecx"
+          # The use of qualified `hostname -f` is crucial:
+          # Other nodes aren't able to look up the unqualified hostname.
+          - exec
+            /cockroach/cockroach
+            start
+            --logtostderr
+            --certs-dir /cockroach/cockroach-certs
+            --advertise-host $(hostname -f)
+            --http-addr 0.0.0.0
+            # TODO: Replace the placeholder values in --join and --locality with the namespace of the CockroachDB cluster in each region (e.g., us-east-1).
+            # --join cockroachdb-0.cockroachdb.<cluster-namespace-1>,cockroachdb-1.cockroachdb.<cluster-namespace-1>,cockroachdb-2.cockroachdb.<cluster-namespace-1>,cockroachdb-0.cockroachdb.<cluster-namespace-2>,cockroachdb-1.cockroachdb.<cluster-namespace-2>,cockroachdb-2.cockroachdb.<cluster-namespace-2>,cockroachdb-0.cockroachdb.<cluster-namespace-3>,cockroachdb-1.cockroachdb.<cluster-namespace-3>,cockroachdb-2.cockroachdb.<cluster-namespace-3>
+            # --locality=region=<cluster-namespace-1>,az=$(cat /etc/cockroach-env/zone),dns=$(hostname -f)
+            --cache $(expr $MEMORY_LIMIT_MIB / 4)MiB
+            --max-sql-memory $(expr $MEMORY_LIMIT_MIB / 4)MiB
+      # No pre-stop hook is required, a SIGTERM plus some time is all that's
+      # needed for graceful shutdown of a node.
+      terminationGracePeriodSeconds: 60
+      volumes:
+      - name: datadir
+        persistentVolumeClaim:
+          claimName: datadir
+      - name: certs
+        secret:
+          secretName: cockroachdb.node
+          defaultMode: 256
+      - name: cockroach-env
+        emptyDir: {}
+  podManagementPolicy: Parallel
+  updateStrategy:
+    type: RollingUpdate
+  volumeClaimTemplates:
+  - metadata:
+      name: datadir
+    spec:
+      accessModes:
+        - "ReadWriteOnce"
+      resources:
+        requests:
+          storage: 100Gi
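
The `--cache` and `--max-sql-memory` flags in the start command above are each computed as one quarter of the container's memory limit, which the `MEMORY_LIMIT_MIB` variable exposes in MiB (`divisor: "1Mi"`). Below is a short sketch of that arithmetic, assuming the example `memory: "8Gi"` limit from the commented-out `resources` block is in effect (8Gi is 8192 MiB). The variable names `cache_flag` and `sql_flag` are illustrative only.

```shell
#!/bin/sh
# Assumed value: what MEMORY_LIMIT_MIB would hold for a memory limit of "8Gi".
MEMORY_LIMIT_MIB=8192

# Same expressions as in the StatefulSet start command.
cache_flag="--cache $(expr $MEMORY_LIMIT_MIB / 4)MiB"
sql_flag="--max-sql-memory $(expr $MEMORY_LIMIT_MIB / 4)MiB"

echo "$cache_flag"    # --cache 2048MiB
echo "$sql_flag"      # --max-sql-memory 2048MiB
```

Under this assumption the cache and the SQL memory pool together claim half of the memory limit.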