Merge #51775 #52837
51775: cloud: add statefulset config for EKS multi-region r=taroface a=taroface

This StatefulSet config is meant for use with the EKS multi-region docs (WIP).

- Added an `initContainer` that determines the AZ of each pod.
- Added `namespace` field to fill in with a region-appropriate namespace.
- Modified `--join` and `--locality` in the start command for the user to customize with namespace/region names. I am unclear on whether the values here need to match the `namespace` defined for the cluster above.
- Added the memory/CPU requests & limits block that is in the single-region configs (but not the current multi-region config).
- Also added `dns-lb-eks.yaml` for creating an EKS network load balancer.
- Also added `configmap.yaml` for modifying the Corefile.

52837: kvclient: don't spin in the DistSender trying the same replica over and over r=andreimatei a=andreimatei

This patch addresses a scenario where a lease indicates a replica that,
when contacted, claims to not have the lease and instead returns an
older lease. In this scenario, the DistSender detects the fact that the
node returned an old lease (which means that it's not aware of the new
lease that it has acquired - for example because it hasn't applied it
yet whereas other replicas have) and retries the same replica (with a
backoff). Before this patch, the DistSender would retry the replica ad
infinitum, hoping that it would eventually become aware of its new lease.
However, it's possible that the replica never finds out about this new
lease (or, at least, not until the lease expires and a new leaseholder
steps up). This could happen if a replica acquires a lease but gets
partitioned from all the other replicas before applying it.
This patch puts a bound on the number of times the DistSender will retry
the same replica in a row before moving on to others.
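
In rough pseudocode terms, the change amounts to something like the following sketch (the names and the bound are hypothetical; this is not the actual DistSender code):

```go
package main

// maxSameReplicaRetries is a hypothetical bound on consecutive retries
// against a single replica before moving on to the others.
const maxSameReplicaRetries = 10

type replica struct{ id int }

// trySend stands in for an RPC to one replica; false means the replica
// still claims not to hold the lease.
func trySend(r replica) bool { return false }

// sendToReplicas retries each replica a bounded number of times instead of
// spinning on the first one forever.
func sendToReplicas(replicas []replica) bool {
	for _, r := range replicas {
		for attempt := 0; attempt < maxSameReplicaRetries; attempt++ {
			if trySend(r) {
				return true
			}
			// Back off here, then retry the same replica in case it becomes
			// aware of the lease it acquired but has not yet applied.
		}
		// Bound reached: give up on this replica and try the next one.
	}
	return false
}

func main() {
	_ = sendToReplicas([]replica{{1}, {2}, {3}})
}
```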

Release note: None

Co-authored-by: taroface <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
3 people committed Aug 20, 2020
3 parents 904b7cb + 4474d83 + 8d54bce commit 2a1e9ce
Showing 7 changed files with 555 additions and 4 deletions.
4 changes: 2 additions & 2 deletions cloud/kubernetes/multiregion/README.md
@@ -1,8 +1,8 @@
-# Running CockroachDB across multiple Kubernetes clusters
+# Running CockroachDB across multiple Kubernetes clusters (GKE)
 
 The script and configuration files in this directory enable deploying
 CockroachDB across multiple Kubernetes clusters that are spread across different
-geographic regions. It deploys a CockroachDB
+geographic regions and hosted on [GKE](https://cloud.google.com/kubernetes-engine). It deploys a CockroachDB
 [StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/)
 into each separate cluster, and links them together using DNS.
 
89 changes: 89 additions & 0 deletions cloud/kubernetes/multiregion/eks/README.md
@@ -0,0 +1,89 @@
# Running CockroachDB across multiple Kubernetes clusters (EKS)

The configuration files in this directory enable a multi-region CockroachDB deployment on [Amazon EKS](https://aws.amazon.com/eks/), using multiple Kubernetes clusters in different geographic regions. They are primarily intended for use with our [Orchestrate CockroachDB Across Multiple Kubernetes Clusters](https://www.cockroachlabs.com/docs/stable/orchestrate-cockroachdb-with-kubernetes-multi-cluster.html#eks) tutorial, but can be modified for use with any multi-region CockroachDB deployment hosted on EKS.

Note that a successful multi-region deployment also requires configuring your EC2 network for inter-region traffic, which is covered fully in our tutorial.

## Usage

The steps below assume that you have already created a Kubernetes cluster in each region in which you want to deploy CockroachDB.

Each of the 3 configuration files must be applied separately to each Kubernetes cluster.

### Create StatefulSets

[`cockroachdb-statefulset-secure-eks.yaml`](https://github.com/cockroachdb/cockroach/cloud/kubernetes/multiregion/eks/cockroachdb-statefulset-secure-eks.yaml) creates a StatefulSet that runs 3 CockroachDB pods in a single region.

Because the multi-region deployment requires deploying CockroachDB to a separate Kubernetes cluster in each region, you need to customize and apply a separate version of this file to each region.

Use the `namespace` field to specify a namespace other than `default` in which to run the CockroachDB pods. This should correspond to the region in which the Kubernetes cluster is deployed (e.g., `us-east-1`).

```
namespace: <cluster-namespace>
```

Also create the namespace in the appropriate region by running `kubectl create namespace <cluster-namespace> --context=<cluster-context>`.
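
For example, assuming (hypothetically) that both the namespaces and the cluster contexts are named after their regions:

```
kubectl create namespace us-east-1 --context=us-east-1
kubectl create namespace us-west-2 --context=us-west-2
kubectl create namespace eu-central-1 --context=eu-central-1
```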

Change the resource `requests` and `limits` to appropriate values for the hardware that you're running. You can see the allocatable resources on each of your Kubernetes nodes by running `kubectl describe nodes`.

```
resources:
  requests:
    cpu: "16"
    memory: "8Gi"
  limits:
    memory: "8Gi"
```

Replace the placeholder values in the `--join` and `--locality` flags with the namespace of the CockroachDB cluster in each region (e.g., `us-east-1`). `--join` specifies the host addresses that new nodes use to join the cluster and learn the addresses of the other nodes. `--locality` describes the location of each CockroachDB node.

```
--join cockroachdb-0.cockroachdb.<cluster-namespace-1>,cockroachdb-1.cockroachdb.<cluster-namespace-1>,cockroachdb-2.cockroachdb.<cluster-namespace-1>,cockroachdb-0.cockroachdb.<cluster-namespace-2>,cockroachdb-1.cockroachdb.<cluster-namespace-2>,cockroachdb-2.cockroachdb.<cluster-namespace-2>,cockroachdb-0.cockroachdb.<cluster-namespace-3>,cockroachdb-1.cockroachdb.<cluster-namespace-3>,cockroachdb-2.cockroachdb.<cluster-namespace-3>
--locality=region=<cluster-namespace-1>,az=$(cat /etc/cockroach-env/zone),dns=$(hostname -f)
```
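
For illustration, with the hypothetical namespaces `us-east-1`, `us-west-2`, and `eu-central-1`, the flags for a node in `us-east-1` would look like:

```
--join cockroachdb-0.cockroachdb.us-east-1,cockroachdb-1.cockroachdb.us-east-1,cockroachdb-2.cockroachdb.us-east-1,cockroachdb-0.cockroachdb.us-west-2,cockroachdb-1.cockroachdb.us-west-2,cockroachdb-2.cockroachdb.us-west-2,cockroachdb-0.cockroachdb.eu-central-1,cockroachdb-1.cockroachdb.eu-central-1,cockroachdb-2.cockroachdb.eu-central-1
--locality=region=us-east-1,az=$(cat /etc/cockroach-env/zone),dns=$(hostname -f)
```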

You can then deploy the StatefulSet in each region, specifying the appropriate cluster context and namespace (which you defined above):

```
kubectl create -f <statefulset> --context=<cluster-context> --namespace=<cluster-namespace>
```
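
For example, using the hypothetical namespace and context names from above, the deployment in the first region would be:

```
kubectl create -f cockroachdb-statefulset-secure-eks.yaml --context=us-east-1 --namespace=us-east-1
```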

Before initializing the cluster, however, you must enable CockroachDB pods to communicate across regions. This includes peering the VPCs in all 3 regions with each other, setting up a [Network Load Balancer](#set-up-load-balancing) in each region, and [configuring a CoreDNS service](#configure-coredns) to route DNS traffic to the appropriate pods. For information on configuring the EC2 network, see our [documentation](https://www.cockroachlabs.com/docs/stable/orchestrate-cockroachdb-with-kubernetes-multi-cluster.html#eks).

### Set up load balancing

[`dns-lb-eks.yaml`](https://github.com/cockroachdb/cockroach/cloud/kubernetes/multiregion/eks/dns-lb-eks.yaml) creates a [Network Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html) pointed at the CoreDNS service that routes DNS traffic to the appropriate pods.
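
In essence, such a manifest is a Kubernetes `Service` of type `LoadBalancer` that targets the cluster's CoreDNS pods and requests an NLB from AWS. The following is only an illustrative sketch of that shape; the name, annotation, and selector values are assumptions, not necessarily the exact contents of `dns-lb-eks.yaml`:

```
apiVersion: v1
kind: Service
metadata:
  name: kube-dns-lb
  namespace: kube-system
  annotations:
    # Request an AWS Network Load Balancer instead of a Classic ELB.
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    k8s-app: kube-dns
  ports:
  - name: dns
    port: 53
    protocol: UDP
    targetPort: 53
```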

Upload the load balancer manifest to each region:

```
kubectl create -f https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/multiregion/eks/dns-lb-eks.yaml --context=<cluster-context>
```

### Configure CoreDNS

[`configmap.yaml`](https://github.com/cockroachdb/cockroach/cloud/kubernetes/multiregion/eks/configmap.yaml) is a template for [modifying the ConfigMap](https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/#coredns-configmap-options) for the CoreDNS Corefile in each region.

You must define a separate ConfigMap for each region. Each unique ConfigMap lists the forwarding addresses for the pods in the 2 other regions.

For each region, replace:

- `region2` and `region3` with the namespaces in which the CockroachDB pods will run in the other 2 regions.

- `ip1`, `ip2`, and `ip3` with the IP addresses of the Network Load Balancer in the corresponding region (see the sketch below).
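
For illustration only (a simplified sketch of the general CoreDNS pattern, not the verbatim contents of `configmap.yaml`): in each region's Corefile, each of the other 2 regions gets a server block that forwards that region's DNS zone to its load balancer addresses, along the lines of:

```
<region2>.svc.cluster.local:53 {    # namespace of one of the other regions
    errors
    cache 30
    forward . <ip1> <ip2> <ip3>     # that region's Network Load Balancer IPs
}
<region3>.svc.cluster.local:53 {
    errors
    cache 30
    forward . <ip1> <ip2> <ip3>
}
```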

First back up the existing ConfigMap in each region:

```
kubectl -n kube-system get configmap coredns -o yaml > <configmap-backup-name>
```

Then apply the new ConfigMap:

```
kubectl apply -f <configmap-name> --context=<cluster-context>
```

## More information

For more information on running CockroachDB in Kubernetes, please see the [README in the parent directory](../../README.md).
281 changes: 281 additions & 0 deletions cloud/kubernetes/multiregion/eks/cockroachdb-statefulset-secure-eks.yaml
@@ -0,0 +1,281 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cockroachdb
  labels:
    app: cockroachdb
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: cockroachdb
  labels:
    app: cockroachdb
rules:
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - create
  - get
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: cockroachdb
  labels:
    app: cockroachdb
rules:
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - create
  - get
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: cockroachdb
  labels:
    app: cockroachdb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cockroachdb
subjects:
- kind: ServiceAccount
  name: cockroachdb
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: cockroachdb
  labels:
    app: cockroachdb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cockroachdb
subjects:
- kind: ServiceAccount
  name: cockroachdb
  namespace: default
---
apiVersion: v1
kind: Service
metadata:
  # This service is meant to be used by clients of the database. It exposes a ClusterIP that will
  # automatically load balance connections to the different database pods.
  name: cockroachdb-public
  labels:
    app: cockroachdb
spec:
  ports:
  # The main port, served by gRPC, serves Postgres-flavor SQL, internode
  # traffic and the cli.
  - port: 26257
    targetPort: 26257
    name: grpc
  # The secondary port serves the UI as well as health and debug endpoints.
  - port: 8080
    targetPort: 8080
    name: http
  selector:
    app: cockroachdb
---
apiVersion: v1
kind: Service
metadata:
  # This service only exists to create DNS entries for each pod in the stateful
  # set such that they can resolve each other's IP addresses. It does not
  # create a load-balanced ClusterIP and should not be used directly by clients
  # in most circumstances.
  name: cockroachdb
  labels:
    app: cockroachdb
  annotations:
    # Use this annotation in addition to the actual publishNotReadyAddresses
    # field below because the annotation will stop being respected soon but the
    # field is broken in some versions of Kubernetes:
    # https://github.com/kubernetes/kubernetes/issues/58662
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
    # Enable automatic monitoring of all instances when Prometheus is running in the cluster.
    prometheus.io/scrape: "true"
    prometheus.io/path: "_status/vars"
    prometheus.io/port: "8080"
spec:
  ports:
  - port: 26257
    targetPort: 26257
    name: grpc
  - port: 8080
    targetPort: 8080
    name: http
  # We want all pods in the StatefulSet to have their addresses published for
  # the sake of the other CockroachDB pods even before they're ready, since they
  # have to be able to talk to each other in order to become ready.
  publishNotReadyAddresses: true
  clusterIP: None
  selector:
    app: cockroachdb
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: cockroachdb-budget
  labels:
    app: cockroachdb
spec:
  selector:
    matchLabels:
      app: cockroachdb
  maxUnavailable: 1
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cockroachdb
  # TODO: Use this field to specify a namespace other than "default" in which to deploy CockroachDB (e.g., us-east-1).
  # namespace: <cluster-namespace>
spec:
  serviceName: "cockroachdb"
  replicas: 3
  selector:
    matchLabels:
      app: cockroachdb
  template:
    metadata:
      labels:
        app: cockroachdb
    spec:
      serviceAccountName: cockroachdb
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - cockroachdb
              topologyKey: kubernetes.io/hostname
      # This init container is used to determine the availability zones of the Cockroach pods. The AZs are used to define --locality when starting Cockroach nodes.
      initContainers:
      - command:
        - sh
        - -ecx
        - echo "aws-$(curl http://169.254.169.254/latest/meta-data/placement/availability-zone/)"
          > /etc/cockroach-env/zone
        image: byrnedo/alpine-curl:0.1
        imagePullPolicy: IfNotPresent
        name: locality-container
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/cockroach-env
          name: cockroach-env
      containers:
      - name: cockroachdb
        image: cockroachdb/cockroach:v20.1.3
        imagePullPolicy: IfNotPresent
        # TODO: Change these to appropriate values for the hardware that you're running. You can see the amount of allocatable resources on each of your Kubernetes nodes by running: kubectl describe nodes
        # resources:
        #   requests:
        #     cpu: "16"
        #     memory: "8Gi"
        #   NOTE: Unless you have enabled the non-default Static CPU Management Policy and are using an integer number of CPUs, we don't recommend setting a CPU limit. See:
        #   https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#static-policy
        #   https://github.com/kubernetes/kubernetes/issues/51135
        #   limits:
        #     cpu: "16"
        #     memory: "8Gi"
        ports:
        - containerPort: 26257
          name: grpc
        - containerPort: 8080
          name: http
        livenessProbe:
          httpGet:
            path: "/health"
            port: http
            scheme: HTTPS
          initialDelaySeconds: 30
          periodSeconds: 5
        readinessProbe:
          httpGet:
            path: "/health?ready=1"
            port: http
            scheme: HTTPS
          initialDelaySeconds: 10
          periodSeconds: 5
          failureThreshold: 2
        volumeMounts:
        - name: datadir
          mountPath: /cockroach/cockroach-data
        - name: certs
          mountPath: /cockroach/cockroach-certs
        - name: cockroach-env
          mountPath: /etc/cockroach-env
        env:
        - name: COCKROACH_CHANNEL
          value: kubernetes-multiregion
        - name: GOMAXPROCS
          valueFrom:
            resourceFieldRef:
              resource: limits.cpu
              divisor: "1"
        - name: MEMORY_LIMIT_MIB
          valueFrom:
            resourceFieldRef:
              resource: limits.memory
              divisor: "1Mi"
        command:
          - "/bin/bash"
          - "-ecx"
          # The use of qualified `hostname -f` is crucial:
          # Other nodes aren't able to look up the unqualified hostname.
          - exec
            /cockroach/cockroach
            start
            --logtostderr
            --certs-dir /cockroach/cockroach-certs
            --advertise-host $(hostname -f)
            --http-addr 0.0.0.0
            # TODO: Replace the placeholder values in --join and --locality with the namespace of the CockroachDB cluster in each region (e.g., us-east-1).
            # --join cockroachdb-0.cockroachdb.<cluster-namespace-1>,cockroachdb-1.cockroachdb.<cluster-namespace-1>,cockroachdb-2.cockroachdb.<cluster-namespace-1>,cockroachdb-0.cockroachdb.<cluster-namespace-2>,cockroachdb-1.cockroachdb.<cluster-namespace-2>,cockroachdb-2.cockroachdb.<cluster-namespace-2>,cockroachdb-0.cockroachdb.<cluster-namespace-3>,cockroachdb-1.cockroachdb.<cluster-namespace-3>,cockroachdb-2.cockroachdb.<cluster-namespace-3>
            # --locality=region=<cluster-namespace-1>,az=$(cat /etc/cockroach-env/zone),dns=$(hostname -f)
            --cache $(expr $MEMORY_LIMIT_MIB / 4)MiB
            --max-sql-memory $(expr $MEMORY_LIMIT_MIB / 4)MiB
      # No pre-stop hook is required, a SIGTERM plus some time is all that's
      # needed for graceful shutdown of a node.
      terminationGracePeriodSeconds: 60
      volumes:
      - name: datadir
        persistentVolumeClaim:
          claimName: datadir
      - name: certs
        secret:
          secretName: cockroachdb.node
          defaultMode: 256
      - name: cockroach-env
        emptyDir: {}
  podManagementPolicy: Parallel
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: datadir
    spec:
      accessModes:
        - "ReadWriteOnce"
      resources:
        requests:
          storage: 100Gi