
Improve docs of leaderelection configuration #20601

Merged
13 commits merged into elastic:master on Sep 2, 2020

Conversation

ChrsMark
Member

Follow-up of #20512.

@botelastic botelastic bot added the needs_team (Indicates that the issue/PR needs a Team:* label) label Aug 14, 2020
@ChrsMark ChrsMark self-assigned this Aug 14, 2020
@ChrsMark ChrsMark added the Team:Platforms (Label for the Integrations - Platforms team) label Aug 14, 2020
@elasticmachine
Collaborator

Pinging @elastic/integrations-platforms (Team:Platforms)

@botelastic botelastic bot removed the needs_team (Indicates that the issue/PR needs a Team:* label) label Aug 14, 2020
@masci

masci commented Aug 14, 2020

Following up #20512 (comment)

I would say that the current approach is smoother for every case. I don't think we should remove the Deployment completely for now, since in many cases it can be really helpful when it comes to scaling, where someone may want to collect cluster-wide metrics from big clusters.

I'm not sure we should chase complex use cases; I'd rather focus on onboarding instead.

With the current approach, users don't benefit from the simplified configuration provided by leader election out of the box: on the contrary, they have to learn the specifics of the feature through the comments and change the manifest in several places to simplify the setup, which is a bit of a paradox.

If I'm running a big, complex cluster that requires fine-tuned observability, chances are I won't even use the manifests as they are in the repo, because the default resources provided to the standalone pods won't fit every possible case.

To sum up, I think that with the current setup we're not providing a turnkey solution to anybody, while with leader election we have the chance to build a good user story and simplify onboarding.
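For reference, the leader election being discussed is enabled through the `unique: true` option of the Kubernetes autodiscover provider, which lets the cluster-scope metricsets live directly in the DaemonSet ConfigMap. A minimal sketch of that configuration follows; the metricset list and the kube-state-metrics host are illustrative, not the exact manifest from the repo:

```yaml
metricbeat.autodiscover:
  providers:
    - type: kubernetes
      scope: cluster
      node: ${NODE_NAME}
      # Only the Pod that currently holds the leader lease starts the templates
      # below, so no dedicated Deployment is needed for cluster-wide metrics.
      unique: true
      templates:
        - config:
            - module: kubernetes
              hosts: ["kube-state-metrics:8080"]
              period: 10s
              metricsets:
                - state_node
                - state_deployment
                - state_pod
                - state_container
```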

@elasticmachine
Collaborator

elasticmachine commented Aug 14, 2020

💚 Build Succeeded



Build stats

  • Build Cause: [Pull request #20601 updated]

  • Start Time: 2020-09-02T08:51:51.369+0000

  • Duration: 61 min 20 sec

Test stats 🧪

  • Failed: 0
  • Passed: 2732
  • Skipped: 722
  • Total: 3454

Steps errors


  • Name: Install Go 1.14.7

    • Description: .ci/scripts/install-go.sh

    • Duration: 1 min 32 sec

    • Start Time: 2020-09-02T09:15:37.801+0000


  • Name: Install docker-compose 1.21.0

    • Description: .ci/scripts/install-docker-compose.sh

    • Duration: 1 min 33 sec

    • Start Time: 2020-09-02T09:15:47.840+0000


@ChrsMark
Member Author

@exekias @jsoriano what do you think about @masci's proposal of completely removing the old proposed manifests that include the Deployment and keeping only the new approach with the leader election configuration within the DaemonSet?

I'm ok with this; however, I'm only concerned about users that already use the Deployment manifests (breaking change?).

@jsoriano
Member

@exekias @jsoriano what do you think about @masci's proposal of completely removing the old proposed manifests that include the Deployment and keeping only the new approach with the leader election configuration within the DaemonSet?

Maybe we have to check how big a deployment needs to be to require a dedicated pod. There are two main reasons why this may be needed:

  • Collection of cluster-level metrics requires significantly more resources, and configuring these resources in the DaemonSet will make most of the pods reserve much more resources than they need.
  • Performance problems with leader election caused by having too many candidates.

If we are more or less sure that in most cases this is not going to be a problem, then I am ok with keeping the new approach only.

I'm ok with this; however, I'm only concerned about users that already use the Deployment manifests (breaking change?).

We can keep a dummy deployment with the same name and zero replicas to replace the previous one, avoiding a breaking change.
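A hypothetical sketch of such a placeholder follows; the object name, namespace, and image tag are assumptions chosen to mirror the old manifests, not the actual proposal:

```yaml
# Hypothetical placeholder Deployment: reusing the old object name means
# `kubectl apply` of the new manifests overwrites the previous Deployment
# instead of leaving it running, while `replicas: 0` removes its Pods.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metricbeat        # assumed: same name as the Deployment in the old manifests
  namespace: kube-system  # assumed namespace
spec:
  replicas: 0
  selector:
    matchLabels:
      k8s-app: metricbeat
  template:
    metadata:
      labels:
        k8s-app: metricbeat
    spec:
      containers:
        - name: metricbeat
          image: docker.elastic.co/beats/metricbeat:7.10.0  # tag illustrative
```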

@ChrsMark
Member Author

ChrsMark commented Sep 1, 2020

@exekias @jsoriano what do you think about @masci's proposal of completely removing the old proposed manifests that include the Deployment and keeping only the new approach with the leader election configuration within the DaemonSet?

Maybe we have to check how big a deployment needs to be to require a dedicated pod. There are two main reasons why this may be needed:

  • Collection of cluster-level metrics requires significantly more resources, and configuring these resources in the DaemonSet will make most of the pods reserve much more resources than they need.
  • Performance problems with leader election caused by having too many candidates.

If we are more or less sure that in most cases this is not going to be a problem, then I am ok with keeping the new approach only.

My feeling is that this kind of performance issue will hit users with really big clusters, and if they have to fine-tune their setup to resolve these issues it shouldn't be a big deal for them to evaluate the solution of a singleton Pod using a Deployment. For most of our users, keeping only the new approach from now on would improve their onboarding experience.

I'm ok with this; however, I'm only concerned about users that already use the Deployment manifests (breaking change?).

We can keep a dummy deployment with the same name and zero replicas to replace the previous one, avoiding a breaking change.

I think that having a dummy Deployment spec inside our manifests would make things more complicated for users that are not so familiar with the whole setup. On the other hand, I'm not sure this can be considered a breaking change, since the docs are linked to the versions of the manifests, and only existing users that want to adopt the new approach might end up with "orphan" Deployment Pods on their clusters; however, this is not so unusual and it's not that bad.

Given this, I'm leaning towards keeping only the new approach.
@masci @jsoriano @exekias let me know what you think.

@exekias
Contributor

exekias commented Sep 1, 2020

It sounds like the general case would benefit from going only with the DaemonSet, and then we can document how to switch to a cluster-wide Deployment for big clusters.

This is not really a breaking change, as it won't break existing deployments; we are just changing how new ones will be done.
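As a rough illustration of what that documentation could point big clusters to, the cluster-scope collection can be moved back into a single-replica Deployment whose resources are sized independently of the DaemonSet. In the sketch below the names, image tag, and resource figures are assumptions, and the config volume plus RBAC objects are omitted for brevity:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metricbeat-cluster   # hypothetical name for the dedicated cluster-scope instance
  namespace: kube-system
spec:
  replicas: 1                # a single Pod collects the cluster-wide metrics
  selector:
    matchLabels:
      k8s-app: metricbeat-cluster
  template:
    metadata:
      labels:
        k8s-app: metricbeat-cluster
    spec:
      serviceAccountName: metricbeat
      containers:
        - name: metricbeat
          image: docker.elastic.co/beats/metricbeat:7.10.0  # tag illustrative
          args: ["-c", "/etc/metricbeat.yml", "-e"]
          # Requests sized for cluster-level collection, without inflating
          # every node-level Pod in the DaemonSet.
          resources:
            requests:
              cpu: 200m
              memory: 300Mi
            limits:
              memory: 500Mi
```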

@jsoriano
Member

jsoriano commented Sep 1, 2020

Ok, should we mention something in the docs in case someone upgrades by using the new manifests?

"orphan" Deployment Pods on their clusters however this is something that not so unusual and it's not that bad.

Well, these orphan deployments will be collecting duplicated metrics, with an old version.

@ChrsMark
Member Author

ChrsMark commented Sep 1, 2020

Ok, should we mention something in the docs in case someone upgrades by using the new manifests?

The new docs will only point to the new manifests and only refer to the new approach. And wouldn't it be overkill to mention this in the old versions' docs?

"orphan" Deployment Pods on their clusters however this is something that not so unusual and it's not that bad.

Well, these orphan deployments will be collecting duplicated metrics, with an old version.

@jsoriano
Member

jsoriano commented Sep 1, 2020

The new docs will only point to the new manifests and only refer to the new approach. And wouldn't it be overkill to mention this in the old versions' docs?

I meant to mention only something like "if you deployed these manifests before 7.10, you may need to manually remove the Deployment". But we are not saying anything about upgrades so far, so maybe it is OK not to say anything and let users just change the versions in the manifests they are using.

metricbeat/docs/running-on-kubernetes.asciidoc: 3 review comments (outdated, resolved)
@jsoriano jsoriano added the needs_backport (PR is waiting to be backported to other branches.) and v7.10.0 labels Sep 1, 2020
@ChrsMark ChrsMark requested a review from masci September 1, 2020 14:22

@masci masci left a comment


Left a few wording suggestions for the docs but this LGTM!

metricbeat/docs/running-on-kubernetes.asciidoc: 9 review comments (outdated, resolved)
@masci masci linked an issue Sep 2, 2020 that may be closed by this pull request
@ChrsMark ChrsMark merged commit 6d7213f into elastic:master Sep 2, 2020
ChrsMark added a commit to ChrsMark/beats that referenced this pull request Sep 2, 2020
@ChrsMark ChrsMark removed the needs_backport (PR is waiting to be backported to other branches.) label Sep 2, 2020
ChrsMark added a commit that referenced this pull request Sep 2, 2020
v1v added a commit to v1v/beats that referenced this pull request Sep 2, 2020
…ne-2.0

* upstream/master: (87 commits)
  [packaging] Normalise GCP bucket folder structure (elastic#20903)
  [Metricbeat] Add billing metricset into googlecloud module (elastic#20812)
  Include python docs in devguide index (elastic#20917)
  Avoid generating incomplete configurations in autodiscover (elastic#20898)
  Improve docs of leaderelection configuration (elastic#20601)
  Document how to set the ES host and Kibana URLs in Ingest Manager (elastic#20874)
  docs: Update beats for APM (elastic#20881)
  Adding cborbeat to community beats (elastic#20884)
  Bump kibana version to 7.9.0 in x-pack/metricbeat (elastic#20899)
  Kubernetes state_daemonset metricset for Metricbeat (elastic#20649)
  [Filebeat][zeek] Add new x509 fields to zeek (elastic#20867)
  [Filebeat][Gsuite] Add note about admin in gsuite docs (elastic#20855)
  Ensure kind cluster has RFC1123 compliant name (elastic#20627)
  Setup python paths in test runner configuration (elastic#20832)
  docs: Add  `processor.event` info to Logstash output (elastic#20721)
  docs: update cipher suites (elastic#20697)
  [ECS] Update ecs to 1.6.0 (elastic#20792)
  Fix path in hits docs (elastic#20447)
  Update filebeat azure module documentation (elastic#20815)
  Remove duplicate ListGroupsForUsers in aws/cloudtrail (elastic#20788)
  ...
melchiormoulin pushed a commit to melchiormoulin/beats that referenced this pull request Oct 14, 2020
Labels
Team:Platforms (Label for the Integrations - Platforms team), v7.10.0

Successfully merging this pull request may close these issues:

  • Implement kubernetes agent cluster scope leader election

5 participants