Adding docs for leader election timings #635

Merged
merged 8 commits into from
Nov 13, 2023
@@ -18,6 +18,9 @@ providers.kubernetes_leaderelection:
# qps: 5
# burst: 10
#leader_lease: agent-k8s-leader-lock
#leader_retryperiod: 2
#leader_leaseduration: 15
#leader_renewdeadline: 10
----

`enabled`:: (Optional) Defaults to true. To explicitly disable the LeaderElection provider,
@@ -30,6 +33,9 @@ Supported options are `qps` and `burst`. If not set, the Kubernetes client's
default QPS and burst settings are used.
`leader_lease`:: (Optional) Specify the name of the leader lease.
This is set to `elastic-agent-cluster-leader` by default.
`leader_retryperiod`:: (Optional) Default value 2 (in sec). How long {agent}s wait between attempts to acquire or renew the leader role.
`leader_leaseduration`:: (Optional) Default value 15 (in sec). How long the leader {agent} holds the "leader" state without renewing it; non-leader {agent}s wait this long before trying to take over the lease.
`leader_renewdeadline`:: (Optional) Default value 10 (in sec). How long the current leader keeps retrying to renew the "leader" role before giving it up.

The available key is:

@@ -42,6 +48,24 @@ The available key is:

|===


[discrete]
= Understanding leader timings

As described above, the LeaderElection configuration offers the following parameters: lease duration (`leader_leaseduration`), renew deadline (`leader_renewdeadline`), and
retry period (`leader_retryperiod`). Based on the configured values, each {agent} sends {k8s} API requests to check the status of the lease.

NOTE: The number of leader API calls is proportional to the number of {agent}s installed, because every {agent} sends a leader API request once per `leader_retryperiod`. Setting `leader_retryperiod` to a value greater than the default (2 sec) means that fewer API requests are made to the {k8s} Control API, but it also increases the period during which metrics collected by the leader {agent} might be lost.

The library applies https://github.com/kubernetes/client-go/blob/master/tools/leaderelection/leaderelection.go#L76[specific checks] to the timing parameters, and if those checks fail the {agent} exits with a `panic` error.

In general:

- `leader_leaseduration` must be greater than `leader_renewdeadline`.
- `leader_renewdeadline` must be greater than `leader_retryperiod` * JitterFactor.

NOTE: The constant `JitterFactor` is 1.2, as defined in the https://pkg.go.dev/gopkg.in/kubernetes/client-go.v11/tools/leaderelection[leaderelection library].


[discrete]
= Enabling configurations only when on leadership

@@ -62,3 +86,5 @@ metricset only when the leadership lock is acquired:
period: 10s
condition: ${kubernetes_leaderelection.leader} == true
----