edits to operator section
gibbscullen committed May 8, 2020
1 parent 0167857 commit 6ed118b
Showing 16 changed files with 567 additions and 96 deletions.
2 changes: 1 addition & 1 deletion docs-beta/content/how_to_guides/monitoring_m3/_index.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
---
-title: "Monitoring_m3"
+title: "Monitoring M3"
date: 2020-04-21T20:56:58-04:00
draft: true
---
6 changes: 0 additions & 6 deletions docs-beta/content/how_to_guides/other/sql.md

This file was deleted.

3 changes: 1 addition & 2 deletions docs-beta/content/reference_docs/architecture/coordinator.md
@@ -2,5 +2,4 @@
title: "M3 Coordinator"
date: 2020-04-21T21:01:05-04:00
draft: true
---

--
@@ -4,3 +4,5 @@ date: 2020-04-21T21:01:32-04:00
draft: true
---

Link to Yaml: https://github.com/chronosphereio/collector/blob/master/config/chronocollector/config.yml

@@ -1,6 +1,6 @@
---
-title: "APIs"
-date: 2020-04-21T21:02:36-04:00
+title: "Apis"
+date: 2020-05-08T12:41:49-04:00
draft: true
---

@@ -0,0 +1,6 @@
---
title: "Ingest APIs"
date: 2020-05-08T12:42:14-04:00
draft: true
---

175 changes: 175 additions & 0 deletions docs-beta/content/reference_docs/configurations/apis/operator.md
@@ -0,0 +1,175 @@
---
title: "Operator API"
date: 2020-05-08T12:42:20-04:00
draft: true
---

API Docs
This document enumerates the Custom Resource Definitions used by the M3DB Operator. It is auto-generated from code comments.

Table of Contents
ClusterCondition
ClusterSpec
IsolationGroup
M3DBCluster
M3DBClusterList
M3DBStatus
NodeAffinityTerm
IndexOptions
Namespace
NamespaceOptions
RetentionOptions
PodIdentity
PodIdentityConfig
ClusterCondition
ClusterCondition represents various conditions the cluster can be in.

| Field | Description | Scheme | Required |
| ----- | ----------- | ------ | -------- |
| type | Type of cluster condition. | ClusterConditionType | false |
| status | Status of the condition (True, False, Unknown). | corev1.ConditionStatus | false |
| lastUpdateTime | Last time this condition was updated. | string | false |
| lastTransitionTime | Last time this condition transitioned from one status to another. | string | false |
| reason | Reason this condition last changed. | string | false |
| message | Human-friendly message about this condition. | string | false |
Back to TOC

ClusterSpec
ClusterSpec defines the desired state for an M3 cluster to converge to.

| Field | Description | Scheme | Required |
| ----- | ----------- | ------ | -------- |
| image | Image specifies which docker image to use with the cluster. | string | false |
| replicationFactor | ReplicationFactor defines how many replicas. | int32 | false |
| numberOfShards | NumberOfShards defines how many shards in total. | int32 | false |
| isolationGroups | IsolationGroups specifies a map of key-value pairs. Defines which isolation groups to deploy persistent volumes for data nodes. | []IsolationGroup | false |
| namespaces | Namespaces specifies the namespaces this cluster will hold. | []Namespace | false |
| etcdEndpoints | EtcdEndpoints defines the etcd endpoints to use for service discovery. Must be set if no custom configmap is defined. If set, etcd endpoints will be templated into the default configmap template. | []string | false |
| keepEtcdDataOnDelete | KeepEtcdDataOnDelete determines whether the operator will remove cluster metadata (placement + namespaces) in etcd when the cluster is deleted. Unless true, etcd data will be cleared when the cluster is deleted. | bool | false |
| enableCarbonIngester | EnableCarbonIngester enables the listener port for the carbon ingester. | bool | false |
| configMapName | ConfigMapName specifies the ConfigMap to use for this cluster. If unset, a default configmap with template variables for etcd endpoints will be used. See "Configuring M3DB" in the docs for more. | *string | false |
| podIdentityConfig | PodIdentityConfig sets the configuration for pod identity. If unset, only pod name and UID will be used. | *PodIdentityConfig | false |
| containerResources | Resources defines memory / cpu constraints for each container in the cluster. | corev1.ResourceRequirements | false |
| dataDirVolumeClaimTemplate | DataDirVolumeClaimTemplate is the volume claim template for an M3DB instance's data. It claims PersistentVolumes for cluster storage; volumes are dynamically provisioned when the StorageClass is defined. | *corev1.PersistentVolumeClaim | false |
| podSecurityContext | PodSecurityContext allows the user to specify an optional security context for pods. | *corev1.PodSecurityContext | false |
| securityContext | SecurityContext allows the user to specify a container-level security context. | *corev1.SecurityContext | false |
| imagePullSecrets | ImagePullSecrets will be added to every pod. | []corev1.LocalObjectReference | false |
| envVars | EnvVars defines custom environment variables to be passed to M3DB containers. | []corev1.EnvVar | false |
| labels | Labels sets the base labels that will be applied to resources created by the cluster. | map[string]string | false |
| annotations | Annotations sets the base annotations that will be applied to resources created by the cluster. | map[string]string | false |
| tolerations | Tolerations sets the tolerations that will be applied to all M3DB pods. | []corev1.Toleration | false |
| priorityClassName | PriorityClassName sets the priority class for all M3DB pods. | string | false |
| nodeEndpointFormat | NodeEndpointFormat allows overriding of the endpoint used for a node in the M3DB placement. Defaults to "{{ .PodName }}.{{ .M3DBService }}:{{ .Port }}". Useful if access to the cluster from other namespaces is desired. See "Node Endpoint" docs for full variables available. | string | false |
| hostNetwork | HostNetwork indicates whether M3DB pods should run in the same network namespace as the node it's on. This option should be used sparingly due to security concerns outlined in the linked documentation. https://kubernetes.io/docs/concepts/policy/pod-security-policy/#host-namespaces | bool | false |
| dnsPolicy | DNSPolicy allows the user to set the pod's DNSPolicy. This is often used in conjunction with HostNetwork. | *corev1.DNSPolicy | false |
| externalCoordinatorSelector | Specify a "controlling" coordinator for the cluster. It is expected that there is a separate standalone coordinator cluster that is externally managed (not managed by this operator) and has a service endpoint. Set up this DB cluster without assuming a co-located coordinator; instead, provide a selector here that points to the separate coordinator service. Specify the labels required for the selector here. | map[string]string | false |
| initContainers | Custom setup for DB nodes can be done via initContainers. Provide the complete spec for the initContainer here. If any storage volumes are needed in the initContainer, see InitVolumes below. | []corev1.Container | false |
| initVolumes | If the InitContainers require any storage volumes, provide the complete specification for the required Volumes here. | []corev1.Volume | false |
| podMetadata | PodMetadata is for any Metadata that is unique to the pods, and does not belong on any other objects, such as Prometheus scrape tags. | metav1.ObjectMeta | false |
| parallelPodManagement | ParallelPodManagement sets StatefulSets created by the operator to have Parallel pod management instead of OrderedReady. This is an EXPERIMENTAL flag and subject to deprecation in a future release. This has not been tested in production and users should not depend on it without validating it for their own use case. | bool | true |
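
As a rough illustration of how these fields fit together, the manifest below sketches a small three-replica cluster. The apiVersion and kind are taken from the example in the configuration docs; the cluster name, image tag, etcd endpoint, group names, namespace preset, and storage size are placeholders chosen for the example, not recommended values.

```yaml
apiVersion: operator.m3db.io/v1alpha1
kind: M3DBCluster
metadata:
  name: example-cluster            # placeholder cluster name
spec:
  image: quay.io/m3db/m3dbnode:latest   # placeholder image reference
  replicationFactor: 3
  numberOfShards: 256
  # One isolation group per replica; names are placeholders.
  isolationGroups:
    - name: group1
      numInstances: 1
    - name: group2
      numInstances: 1
    - name: group3
      numInstances: 1
  # Required here because no custom configMapName is set.
  etcdEndpoints:
    - http://etcd-0.etcd:2379      # placeholder endpoint
  namespaces:
    - name: metrics-10s:2d         # assumed preset name, verify against your operator version
      preset: 10s:2d
  dataDirVolumeClaimTemplate:
    metadata:
      name: m3db-data
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi           # placeholder size
  # Listed as required in the table above; false keeps OrderedReady behavior.
  parallelPodManagement: false
```

Every field other than parallelPodManagement is marked optional in the table, so a real manifest may be considerably shorter or longer than this sketch.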
Back to TOC

IsolationGroup
IsolationGroup defines the name of the zone as well as attributes for the zone configuration.

| Field | Description | Scheme | Required |
| ----- | ----------- | ------ | -------- |
| name | Name is the value that will be used in StatefulSet labels, pod labels, and M3DB placement "isolationGroup" fields. | string | true |
| nodeAffinityTerms | NodeAffinityTerms is an array of NodeAffinityTerm requirements, which are ANDed together to indicate what nodes an isolation group can be assigned to. | []NodeAffinityTerm | false |
| numInstances | NumInstances defines the number of instances. | int32 | true |
| storageClassName | StorageClassName is the name of the StorageClass to use for this isolation group. This allows ensuring that PVs will be created in the same zone as the pinned statefulset on Kubernetes < 1.12 (when topology aware volume scheduling was introduced). Only has effect if the cluster's dataDirVolumeClaimTemplate is non-nil. If set, the volume claim template will have its storageClassName field overridden per isolation group. If unset, the storageClassName of the volumeClaimTemplate will be used. | string | false |
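
To make the relationship between an isolation group and its node affinity terms concrete, here is a hedged fragment of a cluster spec that pins each group to one zone. The label key `topology.kubernetes.io/zone` is a well-known Kubernetes node label; the zone values and StorageClass names are hypothetical and depend on your environment.

```yaml
# Fragment of spec: for an M3DBCluster; zone and StorageClass names are hypothetical.
isolationGroups:
  - name: group1
    numInstances: 1
    storageClassName: ssd-zone-a         # overrides the claim template's class for this group
    nodeAffinityTerms:
      - key: topology.kubernetes.io/zone # node label to match
        values:                          # any one of these values satisfies the term
          - zone-a
  - name: group2
    numInstances: 1
    storageClassName: ssd-zone-b
    nodeAffinityTerms:
      - key: topology.kubernetes.io/zone
        values:
          - zone-b
```

Within one group, multiple terms are ANDed together, while the values inside a single term are alternatives.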
Back to TOC

M3DBCluster
M3DBCluster defines the cluster

| Field | Description | Scheme | Required |
| ----- | ----------- | ------ | -------- |
| metadata |  | metav1.ObjectMeta | false |
| type |  | string | true |
| spec |  | ClusterSpec | true |
| status |  | M3DBStatus | false |
Back to TOC

M3DBClusterList
M3DBClusterList represents a list of M3DB Clusters

| Field | Description | Scheme | Required |
| ----- | ----------- | ------ | -------- |
| metadata |  | metav1.ListMeta | false |
| items |  | []M3DBCluster | true |
Back to TOC

M3DBStatus
M3DBStatus contains the current state of the M3DB cluster along with a human-readable message.

| Field | Description | Scheme | Required |
| ----- | ----------- | ------ | -------- |
| state | State is an enum of green, yellow, and red denoting the health of the cluster. | M3DBState | false |
| conditions | Various conditions about the cluster. | []ClusterCondition | false |
| message | Message is a human-readable message indicating why the cluster is in its current state. | string | false |
| observedGeneration | ObservedGeneration is the last generation of the cluster the controller observed. Kubernetes will automatically increment metadata.Generation every time the cluster spec is changed. | int64 | false |
Back to TOC

NodeAffinityTerm
NodeAffinityTerm represents a node label and a set of label values, any of which can be matched to assign a pod to a node.

| Field | Description | Scheme | Required |
| ----- | ----------- | ------ | -------- |
| key | Key is the label of the node. | string | true |
| values | Values is an array of values, any of which a node can have for a pod to be assigned to it. | []string | true |
Back to TOC

IndexOptions
IndexOptions defines parameters for indexing.

| Field | Description | Scheme | Required |
| ----- | ----------- | ------ | -------- |
| enabled | Enabled controls whether metric indexing is enabled. | bool | false |
| blockSize | BlockSize controls the index block size. | string | false |
Back to TOC

Namespace
Namespace defines an M3DB namespace or points to a preset M3DB namespace.

| Field | Description | Scheme | Required |
| ----- | ----------- | ------ | -------- |
| name | Name is the namespace name. | string | false |
| preset | Preset indicates preset namespace options. | string | false |
| options | Options points to optional custom namespace configuration. | *NamespaceOptions | false |
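
A namespace entry can either reference a preset or carry its own options. The fragment below sketches both forms side by side. The preset value `10s:2d` is assumed to be one of the presets shipped with the operator, and the namespace names are placeholders; check both against the operator version you run.

```yaml
# Fragment of spec: for an M3DBCluster; names and the preset value are assumptions.
namespaces:
  # Form 1: rely on a preset for all namespace options.
  - name: metrics-10s:2d
    preset: 10s:2d
  # Form 2: supply custom options instead of a preset.
  - name: custom-namespace
    options:
      bootstrapEnabled: true
      flushEnabled: true
```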
Back to TOC

NamespaceOptions
NamespaceOptions defines parameters for an M3DB namespace. See https://m3db.github.io/m3/operational_guide/namespace_configuration/ for more details.

| Field | Description | Scheme | Required |
| ----- | ----------- | ------ | -------- |
| bootstrapEnabled | BootstrapEnabled controls whether bootstrapping is enabled. | bool | false |
| flushEnabled | FlushEnabled controls whether flushing is enabled. | bool | false |
| writesToCommitLog | WritesToCommitLog controls whether commit log writes are enabled. | bool | false |
| cleanupEnabled | CleanupEnabled controls whether cleanups are enabled. | bool | false |
| repairEnabled | RepairEnabled controls whether repairs are enabled. | bool | false |
| snapshotEnabled | SnapshotEnabled controls whether snapshotting is enabled. | bool | false |
| retentionOptions | RetentionOptions sets the retention parameters. | RetentionOptions | false |
| indexOptions | IndexOptions sets the indexing parameters. | IndexOptions | false |
Back to TOC

RetentionOptions
RetentionOptions defines parameters for data retention.

| Field | Description | Scheme | Required |
| ----- | ----------- | ------ | -------- |
| retentionPeriod | RetentionPeriod controls how long data for the namespace is retained. | string | false |
| blockSize | BlockSize controls the block size for the namespace. | string | false |
| bufferFuture | BufferFuture controls how far in the future metrics can be written. | string | false |
| bufferPast | BufferPast controls how far in the past metrics can be written. | string | false |
| blockDataExpiry | BlockDataExpiry controls the block expiry. | bool | false |
| blockDataExpiryAfterNotAccessPeriod | BlockDataExpiryAfterNotAccessPeriod controls the not-after-access period for expiration. | string | false |
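
The following fragment sketches a namespace with fully custom options, combining NamespaceOptions, RetentionOptions, and IndexOptions. The field names come from the tables above; the duration strings assume Go-style durations such as `12h` or `20m`, and every value is illustrative rather than a tuning recommendation.

```yaml
# Fragment of spec: for an M3DBCluster; all values are illustrative assumptions.
namespaces:
  - name: custom-7d                 # placeholder namespace name
    options:
      bootstrapEnabled: true
      flushEnabled: true
      writesToCommitLog: true
      cleanupEnabled: true
      repairEnabled: false
      snapshotEnabled: true
      retentionOptions:
        retentionPeriod: 168h       # keep data for 7 days
        blockSize: 12h
        bufferFuture: 20m           # accept writes up to 20 minutes ahead
        bufferPast: 20m             # accept writes up to 20 minutes late
        blockDataExpiry: true
        blockDataExpiryAfterNotAccessPeriod: 5m
      indexOptions:
        enabled: true
        blockSize: 12h
```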
Back to TOC

PodIdentity
PodIdentity contains all the fields that may be used to identify a pod in the M3DB placement. Any non-empty fields will be used to identify the uniqueness of a pod for the purpose of M3DB replace operations.

| Field | Description | Scheme | Required |
| ----- | ----------- | ------ | -------- |
| name |  | string | false |
| uid |  | string | false |
| nodeName |  | string | false |
| nodeExternalID |  | string | false |
| nodeProviderID |  | string | false |
Back to TOC

PodIdentityConfig
PodIdentityConfig contains cluster-level configuration for deriving pod identity.

| Field | Description | Scheme | Required |
| ----- | ----------- | ------ | -------- |
| sources | Sources enumerates the sources from which to derive pod identity. Note that a pod's name will always be used. If empty, defaults to pod name and UID. | []PodIdentitySource | true |
Back to TOC
6 changes: 6 additions & 0 deletions docs-beta/content/reference_docs/configurations/apis/query.md
@@ -0,0 +1,6 @@
---
title: "Query APIs"
date: 2020-05-08T12:42:09-04:00
draft: true
---

24 changes: 24 additions & 0 deletions docs-beta/content/reference_docs/configurations/operator/_index.md
@@ -0,0 +1,24 @@
---
title: "Operator"
date: 2020-05-08T12:43:53-04:00
draft: true
---

Introduction
Welcome to the documentation for the M3DB operator, a Kubernetes operator for running the open-source timeseries database M3DB on Kubernetes.

Please note that this is alpha software, and as such its APIs and behavior are subject to breaking changes. While we aim to produce thoroughly tested, reliable software, there may be undiscovered bugs.

For more background on the M3DB operator, see our KubeCon keynote on its origins and usage at Uber.

Philosophy
The M3DB operator aims to automate everyday tasks around managing M3DB. Specifically, it aims to automate:

- Creating M3DB clusters
- Destroying M3DB clusters
- Expanding clusters (adding instances)
- Shrinking clusters (removing instances)
- Replacing failed instances

It explicitly does not try to automate every single edge case a user may ever run into. For example, it does not aim to automate disaster recovery if an entire cluster is taken down. Such use cases may still require human intervention, but the operator aims not to conflict with any operations a human may have to perform on a cluster.

Generally speaking, the operator's philosophy is if it would be unclear to a human what action to take, we will not try to guess.
@@ -0,0 +1,21 @@
---
title: "Configuration"
date: 2020-05-08T12:49:38-04:00
draft: true
---

Configuring M3DB
By default the operator will apply a configmap with basic M3DB options and settings for the coordinator to direct Prometheus reads/writes to the cluster. This template can be found here.

To apply a custom configuration for the M3DB cluster, one can set the configMapName parameter of the cluster spec to an existing configmap.
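
As a minimal sketch of that setting, the manifest below points a cluster at an existing ConfigMap. The ConfigMap name `m3db-custom-config` is hypothetical and must refer to a ConfigMap that already exists in the cluster's Kubernetes namespace; the rest of the spec is omitted.

```yaml
apiVersion: operator.m3db.io/v1alpha1
kind: M3DBCluster
metadata:
  name: cluster-a
  namespace: production
spec:
  # Hypothetical name of an existing ConfigMap holding the M3DB configuration.
  # When this is set, the operator does not apply its default configmap template.
  configMapName: m3db-custom-config
```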

Environment Warning
If providing a custom config map, the env you specify in your config must be $NAMESPACE/$NAME, where $NAMESPACE is the Kubernetes namespace your cluster is in and $NAME is the name of the cluster. For example, with the following cluster:

    apiVersion: operator.m3db.io/v1alpha1
    kind: M3DBCluster
    metadata:
      name: cluster-a
      namespace: production
    ...

The value of env in your config MUST be production/cluster-a. This restriction allows multiple M3DB clusters to safely share the same etcd cluster.
@@ -0,0 +1,42 @@
---
title: "Managing nodes"
date: 2020-05-08T12:47:10-04:00
draft: true
---

Pod Identity
Motivation
M3DB assumes that if a process starts and owns sealed shards marked as Available, its data for those shards is valid and does not have to be fetched from peers. Consequently, it will begin serving reads for that data. For more background on M3DB topology, see the M3DB topology docs.

In most environments in which M3DB has been deployed in production, it has run on a set of hosts predetermined by whoever is managing the cluster. This means that an M3DB instance is identified in a topology by its hostname, and that when an M3DB process comes up and finds its hostname in the cluster with Available shards, it can serve reads for those shards.

This does not work on Kubernetes, particularly when working with StatefulSets, as a pod may be rescheduled on a new node or with new storage attached but its name may stay the same. If we were to naively use an instance's hostname (pod name), and it were to get rescheduled on a new node with no data, it could assume that absence of data is valid and begin returning empty results for read requests.

To account for this, the M3DB Operator determines an M3DB instance's identity in the topology based on a configurable set of metadata about the pod.

Configuration
The M3DB operator uses a configurable set of metadata about a pod to determine its identity in the M3DB placement. This is encapsulated in the PodIdentityConfig field of a cluster's spec. In addition to the configured sources, a pod's name will always be included.

Every pod in an M3DB cluster is annotated with its identity, and that identity is passed to the M3DB instance via a downward API volume.

Sources
This section will be filled out as a number of pending PRs land.

Recommendations
No Persistent Storage
If not using PVs, you should set sources to PodUID:

    podIdentityConfig:
      sources:
        - PodUID

This way, whenever a container is rescheduled, the operator will initiate a replace and the new instance will stream data from its peers before serving reads. Note that running M3DB without persistent storage is not recommended.

Remote Persistent Storage
If using remote storage, you do not need to set sources, as it will default to just the pod's name. The data for an M3DB instance will move around with its container.

Local Persistent Storage
If using persistent local volumes, you should set sources to NodeName. In this configuration M3DB will consider a pod to be the same so long as it's on the same node. Replaces will only be triggered if a pod with the same name is moved to a new host.
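
By analogy with the PodUID example above, a local-storage configuration might look like the fragment below. The source value `NodeName` is taken from the text; confirm the exact accepted source names against the PodIdentitySource type of your operator version.

```yaml
# Fragment of spec: for an M3DBCluster using persistent local volumes.
podIdentityConfig:
  sources:
    - NodeName   # identity changes only when the pod lands on a different node
```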

Note that if using local SSDs on GKE, node names may stay the same even though a VM has been recreated. We also support ProviderID, which will use the underlying VM's unique ID number in GCE to identify host uniqueness.

