doc: Add a design doc for user customization & persistence
Currently, user-defined configurations, such as changing the number of
replicas for a given Deployment in a MetalK8s cluster, are lost during
an upgrade/downgrade scenario.

This document explains some of the design choices considered while
designing a simple tool for MetalK8s that guarantees that user-defined
configurations are persisted across such operations.

Closes: #2233
Ebaneck committed Feb 17, 2020
1 parent 17cbd14 commit e55466e
Showing 3 changed files with 231 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/developer/architecture/index.rst
@@ -7,3 +7,4 @@ Architecture Documents
deployment
monitoring
requirements
user_customization_and_persistence
218 changes: 218 additions & 0 deletions docs/developer/architecture/user_customization_and_persistence.rst
@@ -0,0 +1,218 @@
User Customization & Persistence
================================

Context
-------

.. todo::

   This section will be handled by the requirements PR.


Design Choices
--------------

A :term:`ConfigMap` store is chosen as the unified data access and
storage medium for user-editable configurations in a MetalK8s cluster, based
on the above requirements, for the following reasons:

* Update operations on ConfigMaps are easy to support from both the CLI and
  the UI using our already existing Python Kubernetes module (see the sketch
  after this list).
* The design and implementation remain adaptable and easy to change in cases
  where customer needs evolve rapidly.
* ConfigMap data is stored in the :term:`etcd` database, which is generally
  backed up. This ensures that user settings cannot easily be lost.
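
As an illustration, here is a minimal sketch of such an update operation
using the Python Kubernetes client. The ConfigMap name, namespace and payload
below are assumptions made for the example, not the actual MetalK8s
implementation:

.. code-block:: python

   # Minimal sketch: patch the 'config.yaml' key of a user configuration
   # ConfigMap. Names and values are illustrative only.
   import yaml

   from kubernetes import client, config


   def update_replicas(namespace, name, replicas):
       """Update the replica count stored in a user configuration ConfigMap."""
       config.load_kube_config()
       api = client.CoreV1Api()
       configmap = api.read_namespaced_config_map(name, namespace)
       user_config = yaml.safe_load(configmap.data['config.yaml'])
       user_config['spec']['replicas'] = replicas
       body = {'data': {'config.yaml': yaml.safe_dump(user_config)}}
       api.patch_namespaced_config_map(name, namespace, body)


   if __name__ == '__main__':
       update_replicas('metalk8s-monitoring', 'metalk8s-grafana-userconfig', 2)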

.. note::

   Newly added Grafana dashboards or datasources, especially modifications
   made through the Grafana UI, cannot be stored in ConfigMaps.
   To handle this particular use case, persistent storage volumes must be
   provisioned so that these settings survive Pod restarts.

Rejected design choices
~~~~~~~~~~~~~~~~~~~~~~~

Consul KV
^^^^^^^^^

This approach offers a full-fledged KV store with a ``/kv`` endpoint which
allows CRUD operations on all the KV data stored in it.
Consul KV also gives access to past versions of objects and supports
optimistic concurrency when manipulating multiple objects.

The Consul KV store was rejected because operations such as full backups and
system restores for a full-fledged KV system require more time and effort
than the ConfigMap store, which is simple and matches the stated
requirements.


Implementation Details
----------------------

Storage format
~~~~~~~~~~~~~~

A ConfigMap store is defined with the following fields, shown here in an
example:

.. code-block:: yaml

   apiVersion: v1
   kind: ConfigMap
   metadata:
     namespace: <namespace>
     name: <config-name>
   data:
     config.yaml: |-
       apiVersion: <object-version>
       kind: <kind>
       spec:
         <key>: <values>

**Use case 1:**

Configure and store the number of replicas for service-specific Deployments
found in the ``metalk8s-monitoring`` namespace using the ConfigMap store
format.

.. code-block:: yaml

   apiVersion: v1
   kind: ConfigMap
   metadata:
     namespace: metalk8s-monitoring
     name: metalk8s-grafana-userconfig
   data:
     config.yaml: |-
       apiVersion: metalk8s.scality.com/v1alpha1
       kind: GrafanaUserConfig
       spec:
         replicas: 2

How it works
~~~~~~~~~~~~

Service Pods and Deployments will be configured to consume configuration data
directly from their respective minion external pillars, as sketched below.

During bootstrap, these external pillar values will be pre-filled with default
values, and the service consumers will be configured to use them.
**Using Salt states**

Once a ConfigMap is updated by the user (say, the number of replicas for the
Prometheus Deployment is changed to a new value), the following actions are
performed:

- Apply a Salt state that reads the ConfigMap object, validates its schema
  against the MetalK8s-defined standards and checks the new values passed
  (a validation sketch is shown after this list).
- If the ConfigMap object is valid, the new values passed by the user are
  re-rendered to the pillars.
- Finally, make sure that the updated values are picked up by their
  respective consumers (this might require Pod restarts for the changes to
  take effect).
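
A possible shape for the validation step, assuming a hypothetical helper
named ``validate_user_config`` and illustrative schema constraints (the
accepted kinds and fields are examples, not the final MetalK8s schema):

.. code-block:: python

   # Hedged sketch of the schema validation performed before the new values
   # are re-rendered to the pillars.
   import yaml

   EXPECTED_API_VERSION = 'metalk8s.scality.com/v1alpha1'
   EXPECTED_KINDS = {'GrafanaUserConfig'}


   def validate_user_config(raw_config):
       """Parse and validate the 'config.yaml' payload of a user ConfigMap.

       Returns the parsed configuration, or raises ValueError if the payload
       does not match the expected schema.
       """
       parsed = yaml.safe_load(raw_config)
       if parsed.get('apiVersion') != EXPECTED_API_VERSION:
           raise ValueError('Unsupported apiVersion: %r' % parsed.get('apiVersion'))
       if parsed.get('kind') not in EXPECTED_KINDS:
           raise ValueError('Unsupported kind: %r' % parsed.get('kind'))
       replicas = parsed.get('spec', {}).get('replicas')
       if not isinstance(replicas, int) or replicas < 1:
           raise ValueError('spec.replicas must be a positive integer')
       return parsed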

Salt states are used to sync data and update consumers with new configuration
changes mainly because of the minimal effort it takes to set up this flow
(i.e. a configuration update by the user and its propagation to the
consumers), but the K8s Operator pattern could be used instead to synchronize
user-defined configurations and their consumers.

The Operator approach is more complex and requires more effort to implement,
and there is no real need for it at this point because configuration changes
are infrequent (for a typical MetalK8s admin, changing the number of replicas
for a given Deployment might happen once every three months). As such, having
an Operator watch for object changes brings little benefit for now.

**Using the Operator architecture (Custom Controllers)**

When using an Operator (a Custom Controller that works with CRDs), we create a
Custom Resource Definition (CRD) that references a ConfigMap. Once a ConfigMap
is updated by the user, the Operator is designed to perform the following
actions (a rough sketch follows the list below):

- The Operator is connected to the API server to watch for changes in the
  ConfigMap.
- If the Operator detects a modified ConfigMap, it determines which actions
  to take:

  - Extract the ConfigMap name and object fields
  - Extract the Pods associated with this ConfigMap based on its labels
  - Read and validate the ConfigMap data and schema
  - If the ConfigMap is valid, update the pillars and restart the
    respective Pods so that they pick up the new configuration from the
    pillars
  - If the ConfigMap is invalid, log the error and perform no further
    action, because applying a bad ConfigMap could lead to cluster
    outages
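
A rough sketch of such a controller loop, using the watch API of the Python
Kubernetes client (the namespace, label selector and ``reconcile`` handler
are placeholders invented for this example):

.. code-block:: python

   # Illustrative (and currently rejected) Operator approach: watch user
   # configuration ConfigMaps and react to modifications.
   from kubernetes import client, config, watch


   def reconcile(configmap):
       """Placeholder for: validate data, update pillars, restart consumers."""
       print('Reconciling ConfigMap %s' % configmap.metadata.name)


   def watch_user_configs(namespace='metalk8s-monitoring',
                          label_selector='app.kubernetes.io/part-of=metalk8s'):
       config.load_incluster_config()
       api = client.CoreV1Api()
       watcher = watch.Watch()
       for event in watcher.stream(api.list_namespaced_config_map,
                                   namespace=namespace,
                                   label_selector=label_selector):
           if event['type'] != 'MODIFIED':
               continue
           try:
               reconcile(event['object'])
           except ValueError as exc:
               # Invalid ConfigMaps are only logged: blindly applying them
               # could lead to cluster outages.
               print('Ignoring invalid ConfigMap: %s' % exc)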

Iteration 1
~~~~~~~~~~~

- Define and deploy new ConfigMap stores that will hold user configurations
  as listed in the requirements
- Template and render Deployment and Pod manifests that will make use of
  these user-customizable configurations using pillar values
- Document how to change user configurations using kubectl
- Create and deploy persistent storage volumes for Grafana dashboards and
  datasources
- Document how to create these persistent volumes for Grafana dashboards and
  datasources

Iteration 2
~~~~~~~~~~~

- Provide a CLI tool for changing any of the user configurations:

  - Count of replicas for chosen Deployments (Prometheus)
  - Updating a Dex authentication connector (OpenLDAP, AD and the
    static user store)
  - Updating the Alertmanager notification configuration

- Provide a UI interface that allows update operations on all user-customizable
  settings based on the above requirements
- Provide a UI interface for adding, updating and deleting service-specific
  configurations, for example the Dex-LDAP connector integration
- Provide a UI interface for listing the MetalK8s available/supported Dex
  authentication connectors
- Provide a UI interface for enabling or disabling Dex authentication
  connectors (LDAP, Active Directory, static user store)
- Provide a UI interface for changing the number of replicas for a chosen set
  of MetalK8s Deployments (Prometheus, ...)
- Add a UI interface for listing the Alertmanager notification systems MetalK8s
  will support (Slack, email, HipChat)
- Provide a UI interface for adding, modifying and deleting Alertmanager
  configurations from the listing above

Documentation
-------------

In the Operational Guide:

* Document how to customize or change the settings of any given service using
  the CLI tool
* Document how to customize or change the settings of any given service using
  the UI

Test Plan
---------

- Dex static user authentication is currently covered in our test suite, and
  it will make sense to cover at least one other authentication connector, the
  easiest being LDAP, since we readily have access to OpenLDAP Docker images
  and automating this process is possible on Scality Cloud

- Add tests that ensure that update operations on user configurations are
  propagated down to the various services

- Other corner cases that require testing to reduce error-prone setups include:

  - Checking for invalid values in a user-defined configuration (e.g. setting
    the number of replicas to a string such as "two"); see the test sketch
    below
  - Checking for invalid formats in a user configuration
  - Checking that, if a user configuration is lost, we can actually revert
    to the default values within the pillars
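
A possible test for the first corner case, assuming the hypothetical
``validate_user_config`` helper sketched earlier in this document:

.. code-block:: python

   # Hypothetical pytest case: a non-integer replica count must be rejected.
   import pytest

   from metalk8s_service_config import validate_user_config  # hypothetical module

   INVALID_CONFIG = """
   apiVersion: metalk8s.scality.com/v1alpha1
   kind: GrafanaUserConfig
   spec:
     replicas: "two"
   """


   def test_rejects_non_integer_replicas():
       with pytest.raises(ValueError):
           validate_user_config(INVALID_CONFIG)
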
12 changes: 12 additions & 0 deletions docs/glossary.rst
@@ -31,6 +31,18 @@ Glossary
and from where the cluster will be deployed to other machines. It also
serves as the entrypoint for upgrades of the cluster.


ConfigMap
A ConfigMap is a Kubernetes object that allows one to store general
configuration information, such as environment variables, in a key-value
pair format.
ConfigMaps are namespaced objects and, once created, they can be updated
without the need to restart the containers that depend on them.

|see K8s docs|
`ConfigMap <https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/#understanding-configmaps-and-pods/>`_.

Controller Manager
``kube-controller-manager``
The Kubernetes controller manager embeds the core control loops shipped
