Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update xDS doc #528

Merged
merged 2 commits into from
May 9, 2022
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
update xds doc
rezvaneh committed Apr 29, 2022

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
commit dee5dc38e10172e1df519202b5709b33d9bf866b
154 changes: 146 additions & 8 deletions docs/src/xds.md
Original file line number Diff line number Diff line change
@@ -4,16 +4,9 @@ In addition to static configuration provided upon startup, a Quiklin proxy's con

Communication between the proxy and management server uses the [xDS gRPC protocol][xDS], similar to an [envoy proxy]. xDS is one of the standard configuration mechanisms for software proxies and as a result, Quilkin can be setup to discover configuration resources from any API compatible server. Also, given that the protocol is [well specified][xDS-protocol], it is similarly straight-forward to implement a custom server to suit any deployment's needs.

> The [go-control-plane] project provides production ready implementations of the API on top of which custom servers can be built relatively easily.
As described within the [xDS-api] documentation, the xDS API comprises a set of resource discovery APIs, each serving a specific set of configuration resource types, while the protocol itself comes in several [variants][xds-variants].
Quilkin implements the **Aggregated Discovery Service (ADS)** _State of the World (SotW)_ variant with gRPC.

## Sample Control Plane Implementation

A sample control plane can be found [here][control-plane], with both a demo project that uses a configuration file
as its data source, as well as an opinionated integration with [Agones] game server orchestration framework.

## Supported APIs

Since the range of resources configurable by the xDS API extends that of Quilkin's domain (i.e being UDP based, Quilkin does not have a need for HTTP/TCP resources), only a subset of the API is supported. The following lists these relevant parts and any limitation to the provided support as a result:
@@ -36,6 +29,113 @@ Since the range of resources configurable by the xDS API extends that of Quilkin
* Only the list of [filters][xds-filters] specified in the [filter chain][xds-filter-chain] is used by the proxy - i.e other fields like `filter_chain_match` are ignored. This list also specifies the order that the corresponding filter chain will be constructed.
* gRPC proto configuration for Quilkin's built-in filters [can be found here][filter-protos]. They are equivalent to the filter's static configuration.

## Available Providers
The server can be run by a quilkin commmand name _manage_.

### Agones

1. Cluster information is retrieved from [Agones] - the server watches for `Allocated`
[Agones GameServers] and exposes their IP address and Port as [upstream endpoints][upstream-endpoint] to
any connected Quilkin proxies.
The set of tokens for the associated endpoint can be set by adding a comma separated standard base64 encoded strings.
This must be added under an annotation `quilkin.dev/tokens` in the [GameServer][Agones GameServers]'s spec.
For example:
```yaml
annotations:
Sets two tokens for the corresponding endpoint with values 1x7ijy6 and 8gj3v2i respectively.
quilkin.dev/tokens: MXg3aWp5Ng==,OGdqM3YyaQ==
```
> Since an Agones GameServer can have multiple ports exposed, if multiple ports are in
> use, the server looks for the port named `default` and picks that as the endpoint's
> port (otherwise it picks the first port in the port list).

2. Filter chain is configurable on a per-proxy basis. By default an empty filter chain is used and from there the filter chain can configured using a configMap name `quilkin-config` on the proxy's pod.

As an example, the following runs the server with subcommnad _agones_ against a cluster (using default kubeconfig configuration) where Quilkin pods run in the `quilkin` namespace and game-server pods run in the `gameservers` namespace:

```sh
quilkin manage --port 18000 agones --config-namespace quilkin --gameservers-namespace gameservers
```

> A proxy's pod must have a `quilkin.dev/role` key in `quilkin-config` configMap set to the value `proxy` in order for the management server to detect the pod as a proxy and push updates to it.

> Note that currently, the server can only discover resources within a single cluster.

### Filesystem

The file server command is primarily an example and mostly suitable for demo purposes. As a result, some configuration options and features might be missing. This file implementation watches a configuration file on disk and sends updates to proxies whenever that file changes.

It can be started with using subcommnad _file_ as the following:
```sh
quilkin manage --port 18000 file --config-file-path config.yaml
```

After running this command, any proxy that connects to port 18000 will receive updates as configured in `config.yaml` file.

The configuration file schema is:
```yaml
# clusters contain a list of clusters.
# Each entry represents a cluster configuration.
clusters: [{
# Name of the cluster.
name: string
# List of endpoints belonging to the cluster.
# Each entry represents an upstream endpoint.
endpoints: [{
# The endpoint's IP address.
ip: int
# The endpoint's port.
port: int
# Opaque metadata that will be the endpoint's metadata.
metadata: {}
}]
}]
# filterchain represents the filter chain configuration.
# It contains a list of filter configurations.
filterchain: [{
# Name of the filter
name: string
# typed_config contains the filter's configuration.
typed_config: {
# @type must be equivalent to name - the name of the filter.
# It is an extra, required field.
'@type': string
# ...
# The rest of the body contains filter specific configuration or
# is empty if the filter has no configuration.
}
}]
```
Example:
```yaml
clusters:
- name: cluster-a
endpoints:
- ip: 123.0.0.1
port": 29
metadata:
'quilkin.dev':
tokens:
- "MXg3aWp5Ng=="
filterchain:
- name: quilkin.filters.debug.v1alpha1.Debug
typed_config:
'@type': quilkin.filters.debug.v1alpha1.Debug
id: hello
```

## Admin server

In addition the gRPC server, a http server (configurable via `--admin-port`is also started to serve administrative functionality.
The following endpoints are provided:
- `/ready`: Readiness probe that returns a 5xx if communication with the Kubernetes api is problematic.
- `/live`: Liveness probe that always returns a 200 response.
- `/metrics`: Exposes Prometheus metrics.

## Metrics

Quilkin exposes the following metrics around the management servers and its resources:
@@ -61,10 +161,45 @@ Quilkin exposes the following metrics around the management servers and its reso
The total number of [DiscoveryRequest]s made by the proxy to management servers. This tracks messages flowing in the direction from the proxy to the management server.


The following metrics are exposed by the management server.

- `quilkin_management_server_connected_proxies` (Gauge)

The number of proxies currently connected to the server.
- `quilkin_management_server_discovery_requests_total{request_type}` (Counter)

The total number of xDS Discovery requests received across all proxies.
- `request_type` = `type.googleapis.com/envoy.config.cluster.v3.Cluster` | `type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment` | `type.googleapis.com/envoy.config.listener.v3.Listener`
Type URL of the requested resource
- `quilkin_management_server_discovery_responses_total` (Counter)

The total number of xDS Discovery responses sent back across all proxies in response to Discovery Requests.
Each Discovery response sent corresponds to a configuration update for some proxy.
- `request_type` = `type.googleapis.com/envoy.config.cluster.v3.Cluster` | `type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment` | `type.googleapis.com/envoy.config.listener.v3.Listener`
Type URL of the requested resource
- `quilkin_management_server_endpoints_total` (Gauge)

The number of active endpoints discovered by the server. The number of active endpoints
correlates with the size of the cluster configuration update sent to proxies.
- `quilkin_management_server_snapshot_generation_errors_total` (Counter)

The total number of errors encountered while generating a configuration snapshot update for a proxy.
- `quilkin_management_server_snapshots_generated_total` (Counter)

The total number of configuration snapshot generated across all proxies. A snapshot corresponds
to a point in time view of a proxy's configuration. However it does not necessarily correspond
to a proxy update - a proxy only gets the latest snapshot so it might miss intermediate
snapshots if it lags behind.
- `quilkin_management_server_snapshots_cache_size` (Gauge)

The current number of snapshots in the in-memory snapshot cache. This corresponds 1-1 to
proxies that connect to the server. However the number may be slightly higher than the number
of connected proxies since snapshots for disconnected proxies are only periodically cleared
from the cache.

[xDS]: https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol#xds-rest-and-grpc-protocol
[envoy proxy]: https://www.envoyproxy.io/docs/envoy/latest/
[xDS-protocol]: https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol#the-xds-transport-protocol
[go-control-plane]: https://github.com/envoyproxy/go-control-plane
[xDS-api]: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/operations/dynamic_configuration
[CDS]: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/operations/dynamic_configuration#cds
[EDS]: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/operations/dynamic_configuration#eds
@@ -86,3 +221,6 @@ Quilkin exposes the following metrics around the management servers and its reso
[endpoint-metadata]: ./proxy.md#endpoint-metadata
[control-plane]: https://github.com/googleforgames/quilkin/tree/main/xds
[Agones]: https://agones.dev
[Kubernetes]: https://kubernetes.io/
[Agones GameServers]: https://agones.dev/site/docs/getting-started/create-gameserver/
[upstream-endpoint]: https://googleforgames.github.io/quilkin/main/book/proxy.html#upstream-endpoint
173 changes: 0 additions & 173 deletions xds/README.md

This file was deleted.