Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add k8s manifest leveraging leaderelection #20512

Merged
merged 7 commits into from
Aug 14, 2020

Conversation

ChrsMark
Copy link
Member

@ChrsMark ChrsMark commented Aug 10, 2020

What does this PR do?

This PR proposes new Kubernetes manifests for shake of #19731, leveraging unique Autodiscover provider implemented at #20281.

With these manifests, only Metricbeat's Deamonset will be able to monitor whole k8s cluster since one Deamonset Pod each time will hold the leadership being responsible to coordinate metricsets that collect cluster wide metrics.

We will might need a meta issue to keep track of deprecating Deployment manifests (if needed).

Why is it important?

To get rid of the requirement to maintain/handle two different deployment strategies.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Test the manifests similarly to the testing steps of #20281 (comment).

  1. Prepare a multinode cluster (ie on GKE)
  2. Edit metricbeat-leaderelection-kubernetes.yml properly to set the proper image (ie docker.elastic.co/beats/metricbeat:7.10.0-SNAPSHOT) and the proper ES output (ie on Elastic Cloud)
  3. Deploy the metricbeat-leaderelection-kubernetes.yml manifest and make sure that all the desired metricsets are shipping events and that k8s related Dashboards are populated with data correctly.
  4. kubectl delete the leader pod and make sure that the leadership is either transfered to another Pod or it is gained again by the new replacement Pod.
  5. Add static or hints-based autodiscovery configs/providers and make sure that work all together.
    Example:
metricbeat.autodiscover:
      providers:
        # To enable hints based autodiscover uncomment this:
        #- type: kubernetes
        #  node: ${NODE_NAME}
        #  hints.enabled: true
        - type: kubernetes
          node: ${NODE_NAME}
          templates:
            - condition:
                contains:
                  kubernetes.pod.name: "nats"
              config:
                - module: nats
                  hosts: "${data.host}:${data.port}"
        - type: kubernetes
          scope: cluster
          node: ${NODE_NAME}
          unique: true
          identifier: gke-lease
          templates:
            - config:
                - module: kubernetes
                  hosts: ["kube-state-metrics:8080"]
                  period: 10s
                  add_metadata: true
                  metricsets:
                    - state_node
                    - state_deployment
                    - state_replicaset

Related issues

@ChrsMark ChrsMark added enhancement in progress Pull request is currently in progress. needs_backport PR is waiting to be backported to other branches. [zube]: In Progress autodiscovery v7.10.0 labels Aug 10, 2020
@ChrsMark ChrsMark self-assigned this Aug 10, 2020
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Aug 10, 2020
@ChrsMark ChrsMark added the Team:Platforms Label for the Integrations - Platforms team label Aug 10, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations-platforms (Team:Platforms)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Aug 10, 2020
@elasticmachine
Copy link
Collaborator

elasticmachine commented Aug 10, 2020

💚 Build Succeeded

Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: [Pull request #20512 updated]

  • Start Time: 2020-08-14T08:21:21.993+0000

  • Duration: 60 min 6 sec

Test stats 🧪

Test Results
Failed 0
Passed 3252
Skipped 727
Total 3979

@ChrsMark ChrsMark added [zube]: In Review needs_reviewer PR needs to be assigned a reviewer review labels Aug 13, 2020
@ChrsMark
Copy link
Member Author

ChrsMark commented Aug 13, 2020

Tested on a 3-node cluster on GKE:

  • Unique provider does the job for us and starts state_* metricsets to monitor cluster-wide services
  • Hints-based autodiscover provider works along with unique provider
  • Template-based autodiscover provider works along with unique provider

Reviews more than welcome at this point.

Signed-off-by: chrismark <[email protected]>
Signed-off-by: chrismark <[email protected]>
@ChrsMark ChrsMark requested review from exekias, masci and jsoriano August 13, 2020 10:13
@jsoriano
Copy link
Member

I wonder if we want to maintain the two sets of manifests for Metricbeat. Having a provider with leader election or a deployment to do the same thing is an implementation detail. I think that having an only strategy is more clear, and having leader election by default seems a better option as it only requires the DaemonSet.
Also having a mostly duplicated configuration can be more problematic for us, we may forget to apply some change in one of the configurations, it already happens to us when we add something to the manifest of some beat but we don't add it to the ones of other beats that could benefit from it.

One way to help on the migration while keeping the current set of manifests would be:

  • Add the leadership configuration to current DaemonSet.
  • Add replicas: 0 to current Deployment.
  • Add a comment in the leadership configuration explaining to comment out this provider and the replicas: 0 line in the Deployment if old behaviour is wanted.
  • Update docs that mention the Deployment.

Having the deployment with replicas: 0 sould allow to use the new manifest with existing deployments that use the manifests as we provide them. It will keep the Deployment, but it won't start any pod.

@jsoriano
Copy link
Member

Or we can also add the leadership configuration to current manifests but commented out, and add a comment about deleting the deployment if this configuration is wanted.

@jsoriano
Copy link
Member

jsoriano commented Aug 13, 2020

Oh, another thought come to my mind that could affect the decision of deprecating the deployment manifest. How does leader election work when there are many candidates? In a cluster with thousands of nodes there will be thousands of candidates competing for the lease.

@ChrsMark
Copy link
Member Author

Oh, another thought come to my mind that could affect the decision of deprecating the deployment manifest. How does leader election work when there are many candidates? In a cluster with thousands of nodes there will be thousands of candidates competing for the lease.

Hmm, good question but I don't think we can know about it if we don't try 🙂 . The thousands of leader candidates will try to get the lock and we rely on the locking mechanism of the leaderelection library for this. I don't see any difference if there are 3 or 3K candidates since all of them will try to acquire the lock from the API server. Having said this, I see an "overhead" on each candidate since it will have one extra routine to keep trying to obtain the lock and I guess that this will also add extra load to the k8s API server (spinlock behaviour requesting for the lock). But all these do not look really bad to me on cluster that carries a lot of workloads/operators-controllers etc.

Is this what you had in mind?

Signed-off-by: chrismark <[email protected]>
Signed-off-by: chrismark <[email protected]>
@jsoriano
Copy link
Member

Oh, another thought come to my mind that could affect the decision of deprecating the deployment manifest. How does leader election work when there are many candidates? In a cluster with thousands of nodes there will be thousands of candidates competing for the lease.

Hmm, good question but I don't think we can know about it if we don't try slightly_smiling_face . The thousands of leader candidates will try to get the lock and we rely on the locking mechanism of the leaderelection library for this. I don't see any difference if there are 3 or 3K candidates since all of them will try to acquire the lock from the API server. Having said this, I see an "overhead" on each candidate since it will have one extra routine to keep trying to obtain the lock and I guess that this will also add extra load to the k8s API server (spinlock behaviour requesting for the lock). But all these do not look really bad to me on cluster that carries a lot of workloads/operators-controllers etc.

Is this what you had in mind?

Yes, this is what I was thinking, with 3K candidates, if a leader change is needed, 3K interactions with the API will be done more or less suddenly. But as you say in a cluster this size the API is probably sized accordingly.

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Config LGTM, added some suggestions about the docs.

metricbeat/docs/running-on-kubernetes.asciidoc Outdated Show resolved Hide resolved
metricbeat/docs/running-on-kubernetes.asciidoc Outdated Show resolved Hide resolved
Signed-off-by: chrismark <[email protected]>
@ChrsMark ChrsMark merged commit 7b7fb3b into elastic:master Aug 14, 2020
ChrsMark added a commit to ChrsMark/beats that referenced this pull request Aug 14, 2020
@ChrsMark ChrsMark removed the needs_backport PR is waiting to be backported to other branches. label Aug 14, 2020
Copy link

@masci masci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry for the posthumous review here but I didn't expect we would merge right ahead.

I'm not sure I fully understand why we're keeping around the single-pod deployment. I know we need to keep backward compatibility but the setup for new users onboarding with 7.10 seems a bit clumsy, having to manually remove the deployment definition or (worse) set replicas to 0.

Can we discuss what's the use case that would impact backward compatibility if we remove the deployment?

Comment on lines +232 to +235
Users can enable the respective parts the Daemonset ConfigMap and
set the `replicas` of the Deployment to `0` in order to only deploy
the Daemonset on the cluster with the leader election provider enabled
in order to collect cluster-wide metrics:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChrsMark Something doesn't work with the wording here, we should make this statement a bit clearer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines +23 to +27
# Uncomment the following to enable leader election provider that handles
# singleton instance configuration across the Daemonset Pods of the whole cluster
# in order to monitor some unique data sources, like kube-state-metrics.
# When enabling this remember to also delete the Deployment or just set the replicas of the
# Deployment to 0.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too much wording IMO, if we do the docs right this should only be

# uncomment the following to enable collection from kube-state-metrics

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I will try to improve it

# scope: cluster
# node: ${NODE_NAME}
# unique: true
# # identifier: <lease_lock_name>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be confusing: under which circumstances should I uncomment? What the value should be in that case?
I'd remove this line altogether.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I will remove it

@@ -258,6 +293,8 @@ metadata:
labels:
k8s-app: metricbeat
spec:
# Set to 0 if using leader election provider with the Daemonset
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if we remove the Deployment instead? I don't think the UX is great here...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 Removing the deployment completely will have the same result -> 0 Deployment Pods.

@ChrsMark
Copy link
Member Author

I'm sorry for the posthumous review here but I didn't expect we would merge right ahead.

I'm not sure I fully understand why we're keeping around the single-pod deployment. I know we need to keep backward compatibility but the setup for new users onboarding with 7.10 seems a bit clumsy, having to manually remove the deployment definition or (worse) set replicas to 0.

Can we discuss what's the use case that would impact backward compatibility if we remove the deployment?

Thanks for the comments @masci! I would say that the current approach is smoother for every case. I don't think we should remove the Deployment completely by now since in many cases it can be really helpful when it comes to scaling where someone may want to collect cluster-wide metrics from big clusters.

In this regard, introducing the new unique provider as an optional-additional approach within the commented out section gives us the option to see it in action optionally by the users that actually want it.

@ChrsMark
Copy link
Member Author

Let's continue the discussion at #20601

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
autodiscovery enhancement in progress Pull request is currently in progress. needs_reviewer PR needs to be assigned a reviewer review Team:Platforms Label for the Integrations - Platforms team v7.10.0 [zube]: In Progress [zube]: In Review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement kubernetes agent cluster scope leader election
4 participants