update xds doc

googleforgames · XAMPPRocky · May 9, 2022 · Apr 29, 2022
commit dee5dc38e10172e1df519202b5709b33d9bf866b
@@ -4,16 +4,9 @@ In addition to static configuration provided upon startup, a Quiklin proxy's con
 
 Communication between the proxy and management server uses the [xDS gRPC protocol][xDS], similar to an [envoy proxy]. xDS is one of the standard configuration mechanisms for software proxies and as a result, Quilkin can be setup to discover configuration resources from any API compatible server. Also, given that the protocol is [well specified][xDS-protocol], it is similarly straight-forward to implement a custom server to suit any deployment's needs.
 
-> The [go-control-plane] project provides production ready implementations of the API on top of which custom servers can be built relatively easily.
-
 As described within the [xDS-api] documentation, the xDS API comprises a set of resource discovery APIs, each serving a specific set of configuration resource types, while the protocol itself comes in several [variants][xds-variants].
 Quilkin implements the **Aggregated Discovery Service (ADS)** _State of the World (SotW)_ variant with gRPC.
 
-## Sample Control Plane Implementation
-
-A sample control plane can be found [here][control-plane], with both a demo project that uses a configuration file 
-as its data source, as well as an opinionated integration with [Agones] game server orchestration framework.
-
 ## Supported APIs
 
 Since the range of resources configurable by the xDS API extends that of Quilkin's domain (i.e being UDP based, Quilkin does not have a need for HTTP/TCP resources), only a subset of the API is supported. The following lists these relevant parts and any limitation to the provided support as a result:
@@ -36,6 +29,113 @@ Since the range of resources configurable by the xDS API extends that of Quilkin
   * Only the list of [filters][xds-filters] specified in the [filter chain][xds-filter-chain] is used by the proxy - i.e other fields like `filter_chain_match` are ignored. This list also specifies the order that the corresponding filter chain will be constructed.
   * gRPC proto configuration for Quilkin's built-in filters [can be found here][filter-protos]. They are equivalent to the filter's static configuration.
 
+## Available Providers
+The server can be run with the quilkin command named _manage_.
+
+### Agones
+
+1. Cluster information is retrieved from [Agones] - the server watches for `Allocated`
+   [Agones GameServers] and exposes their IP address and Port as [upstream endpoints][upstream-endpoint] to
+   any connected Quilkin proxies.
+   The set of tokens for the associated endpoint can be set as a comma-separated list of standard base64-encoded strings.
+   This list must be added under the annotation `quilkin.dev/tokens` in the [GameServer][Agones GameServers]'s metadata.
+   For example:
+   ```yaml
+   annotations:
+     # Sets two tokens for the corresponding endpoint, with values 1x7ijy6 and 8gj3v2i respectively.
+     quilkin.dev/tokens: MXg3aWp5Ng==,OGdqM3YyaQ==
+   ```
+
+   > Since an Agones GameServer can have multiple ports exposed, if multiple ports are in
+   > use, the server looks for the port named `default` and picks that as the endpoint's
+   > port (otherwise it picks the first port in the port list).
+
+2. The filter chain is configurable on a per-proxy basis. By default, an empty filter chain is used; from there, the filter chain can be configured using a ConfigMap named `quilkin-config` on the proxy's pod.
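+
+To make the first point concrete, here is a minimal sketch of a GameServer manifest that the server would expose as an endpoint once it is `Allocated`. The resource name, image, and container port are placeholders chosen for illustration, and the annotation is shown under `metadata`, where Kubernetes annotations normally live:
+
+```yaml
+apiVersion: agones.dev/v1
+kind: GameServer
+metadata:
+  # Placeholder name, not taken from the Quilkin docs.
+  name: example-gameserver
+  annotations:
+    # Standard base64 of the tokens 1x7ijy6 and 8gj3v2i.
+    quilkin.dev/tokens: MXg3aWp5Ng==,OGdqM3YyaQ==
+spec:
+  ports:
+  # When several ports are exposed, the one named `default` becomes the endpoint's port.
+  - name: default
+    portPolicy: Dynamic
+    containerPort: 7654  # placeholder port
+    protocol: UDP
+  template:
+    spec:
+      containers:
+      - name: game-server
+        # Placeholder image.
+        image: example.com/game-server:latest
+```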
+
+As an example, the following runs the server with the subcommand _agones_ against a cluster (using the default kubeconfig configuration) where Quilkin pods run in the `quilkin` namespace and game-server pods run in the `gameservers` namespace:
+
+```sh
+quilkin manage --port 18000 agones --config-namespace quilkin --gameservers-namespace gameservers
+```
+
+> A proxy's pod must have a `quilkin.dev/role` key in the `quilkin-config` ConfigMap set to the value `proxy` in order for the management server to detect the pod as a proxy and push updates to it.
+
+> Note that currently, the server can only discover resources within a single cluster.
+
+### Filesystem
+
+The file server command is primarily an example and mostly suitable for demo purposes. As a result, some configuration options and features might be missing. This file implementation watches a configuration file on disk and sends updates to proxies whenever that file changes.
+
+It can be started using the subcommand _file_ as follows:
+```sh
+quilkin manage --port 18000 file --config-file-path config.yaml
+```
+
+After running this command, any proxy that connects to port 18000 will receive updates as configured in the `config.yaml` file.
+
+The configuration file schema is:
+```yaml
+# clusters contains a list of clusters.
+# Each entry represents a cluster configuration.
+clusters: [{
+  # Name of the cluster.
+  name: string
+
+  # List of endpoints belonging to the cluster.
+  # Each entry represents an upstream endpoint.
+  endpoints: [{
+    # The endpoint's IP address.
+    ip: string
+    # The endpoint's port.
+    port: int
+    # Opaque metadata that will be the endpoint's metadata.
+    metadata: {}
+  }]
+}]
+
+# filterchain represents the filter chain configuration.
+# It contains a list of filter configurations.
+filterchain: [{
+  # Name of the filter
+  name: string
+
+  # typed_config contains the filter's configuration.
+  typed_config: {
+    # @type must be equivalent to name - the name of the filter.
+    # It is an extra, required field.
+    '@type': string
+    # ...
+    # The rest of the body contains filter specific configuration or
+    # is empty if the filter has no configuration.
+  }
+}]
+```
+Example:
+```yaml
+clusters:
+- name: cluster-a
+  endpoints:
+  - ip: 123.0.0.1
+    port: 29
+    metadata:
+      'quilkin.dev':
+        tokens:
+        - "MXg3aWp5Ng=="
+filterchain:
+- name: quilkin.filters.debug.v1alpha1.Debug
+  typed_config:
+    '@type': quilkin.filters.debug.v1alpha1.Debug
+    id: hello
+```
+
+## Admin server
+
+In addition to the gRPC server, an HTTP server (configurable via `--admin-port`) is also started to serve administrative functionality.
+The following endpoints are provided:
+- `/ready`: Readiness probe that returns a 5xx if communication with the Kubernetes API is problematic.
+- `/live`: Liveness probe that always returns a 200 response.
+- `/metrics`: Exposes Prometheus metrics.
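+
+When the server runs in Kubernetes, these endpoints map naturally onto container probes. The fragment below is a sketch only: the port number is a placeholder and must match whatever is passed to `--admin-port`.
+
+```yaml
+# Fragment of a container spec for the management server.
+livenessProbe:
+  httpGet:
+    path: /live
+    port: 8000  # placeholder: the value given to --admin-port
+readinessProbe:
+  httpGet:
+    path: /ready
+    port: 8000  # placeholder: the value given to --admin-port
+```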
+
 ## Metrics
 
 Quilkin exposes the following metrics around the management servers and its resources:
@@ -61,10 +161,45 @@ Quilkin exposes the following metrics around the management servers and its reso
   The total number of [DiscoveryRequest]s made by the proxy to management servers. This tracks messages flowing in the direction from the proxy to the management server.
 
 
+The following metrics are exposed by the management server.
+
+- `quilkin_management_server_connected_proxies` (Gauge)
+
+   The number of proxies currently connected to the server.
+- `quilkin_management_server_discovery_requests_total{request_type}` (Counter)
+
+   The total number of xDS Discovery requests received across all proxies.
+   - `request_type` = `type.googleapis.com/envoy.config.cluster.v3.Cluster` | `type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment` | `type.googleapis.com/envoy.config.listener.v3.Listener`
+     Type URL of the requested resource
+- `quilkin_management_server_discovery_responses_total` (Counter)
+
+   The total number of xDS Discovery responses sent back across all proxies in response to Discovery Requests.
+   Each Discovery response sent corresponds to a configuration update for some proxy.
+   - `request_type` = `type.googleapis.com/envoy.config.cluster.v3.Cluster` | `type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment` | `type.googleapis.com/envoy.config.listener.v3.Listener`
+     Type URL of the requested resource
+- `quilkin_management_server_endpoints_total` (Gauge)
+
+   The number of active endpoints discovered by the server. The number of active endpoints
+   correlates with the size of the cluster configuration update sent to proxies.
+- `quilkin_management_server_snapshot_generation_errors_total` (Counter)
+
+   The total number of errors encountered while generating a configuration snapshot update for a proxy.
+- `quilkin_management_server_snapshots_generated_total` (Counter)
+
+   The total number of configuration snapshots generated across all proxies. A snapshot corresponds
+   to a point-in-time view of a proxy's configuration. However, it does not necessarily correspond
+   to a proxy update - a proxy only gets the latest snapshot so it might miss intermediate
+   snapshots if it lags behind.
+- `quilkin_management_server_snapshots_cache_size` (Gauge)
+
+   The current number of snapshots in the in-memory snapshot cache. This corresponds 1-1 to
+   proxies that connect to the server. However, the number may be slightly higher than the number
+   of connected proxies since snapshots for disconnected proxies are only periodically cleared
+   from the cache.
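+
+These metrics can feed alerting. As an illustration, a Prometheus alerting rule on snapshot generation errors might look like the following; the group name, alert name, window, and severity are arbitrary choices for this sketch, not recommendations from the Quilkin project:
+
+```yaml
+groups:
+- name: quilkin-management-server
+  rules:
+  - alert: QuilkinSnapshotGenerationErrors
+    # True while at least one error was recorded in the trailing 5-minute window.
+    expr: increase(quilkin_management_server_snapshot_generation_errors_total[5m]) > 0
+    # The condition must hold for 5 minutes before the alert fires.
+    for: 5m
+    labels:
+      severity: warning
+    annotations:
+      summary: Quilkin management server is failing to generate configuration snapshots.
+```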
+
 [xDS]: https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol#xds-rest-and-grpc-protocol
 [envoy proxy]: https://www.envoyproxy.io/docs/envoy/latest/
 [xDS-protocol]: https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol#the-xds-transport-protocol
-[go-control-plane]: https://github.com/envoyproxy/go-control-plane
 [xDS-api]: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/operations/dynamic_configuration
 [CDS]: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/operations/dynamic_configuration#cds
 [EDS]: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/operations/dynamic_configuration#eds
@@ -86,3 +221,6 @@ Quilkin exposes the following metrics around the management servers and its reso
 [endpoint-metadata]: ./proxy.md#endpoint-metadata
 [control-plane]: https://github.com/googleforgames/quilkin/tree/main/xds
 [Agones]: https://agones.dev
+[Kubernetes]: https://kubernetes.io/
+[Agones GameServers]: https://agones.dev/site/docs/getting-started/create-gameserver/
+[upstream-endpoint]: https://googleforgames.github.io/quilkin/main/book/proxy.html#upstream-endpoint