Add Agent standalone k8s manifest #23679

Merged
merged 4 commits into from
Feb 16, 2021

@ChrsMark ChrsMark commented Jan 26, 2021

What does this PR do?

This PR adds a k8s manifest for running Elastic Agent in standalone mode with the k8s integration enabled by default. It deploys Agent as DaemonSet Pods on all k8s nodes and as a single Deployment Pod on the cluster. The DaemonSet Pods collect metrics from each node's kubelet API and from kube-proxy, and try to autodiscover the k8s Scheduler Pod and the k8s Controller Manager Pod (which are deployed on the master node(s)) so that they can start collecting from them dynamically using the respective metricsets. The Deployment Pod collects cluster-wide metrics from the kube_state_metrics service running on the cluster.
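For orientation, the two workloads described above could be laid out roughly as follows. This is only a skeleton and not the manifest from this PR: the object names, the namespace and the labels are assumptions made for illustration, and only the image tag is taken from the diff under review.

```yaml
# Illustrative skeleton only. Names, namespace and labels are assumptions.
apiVersion: apps/v1
kind: DaemonSet              # one Agent Pod per node (kubelet, kube-proxy, scheduler, controller manager)
metadata:
  name: elastic-agent
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: elastic-agent
  template:
    metadata:
      labels:
        k8s-app: elastic-agent
    spec:
      containers:
        - name: elastic-agent
          image: docker.elastic.co/beats/elastic-agent:7.12.0-SNAPSHOT
---
apiVersion: apps/v1
kind: Deployment             # a single Agent Pod for cluster-wide metrics (kube-state-metrics)
metadata:
  name: elastic-agent-cluster        # hypothetical name
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: elastic-agent-cluster # hypothetical label, kept distinct from the DaemonSet's
  template:
    metadata:
      labels:
        k8s-app: elastic-agent-cluster
    spec:
      containers:
        - name: elastic-agent
          image: docker.elastic.co/beats/elastic-agent:7.12.0-SNAPSHOT
```

Keeping the labels of the two workloads distinct avoids their selectors matching each other's Pods.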

@blakerouse @masci @ph @ruflin I would love your feedback here.

Disclaimer: The manifest works if we disable the dynamic inputs part. Find full information about the issues at the bottom of this description: #23685

How to test this PR locally

  1. Run a kind cluster locally using the following:
# three node (two workers) cluster config
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker

kind create cluster --config kind-mutly.yaml
  2. Uncomment the scheduler and controllermanager config section and deploy Agent: kubectl apply -f elastic-agent-standalone-kubernetes.yml
  3. Verify that all data streams ship data:

  • kubernetes.volume
  • kubernetes.container
  • kubernetes.pod
  • kubernetes.proxy
  • kubernetes.node
  • kubernetes.system
  • kubernetes.controllermanager
  • kubernetes.scheduler
  • kubernetes.state_container
  • kubernetes.state_pod 
  • kubernetes.state_replicaset
  • kubernetes.state_node
  • kubernetes.state_deployment 
  • kubernetes.state_service  
  • kubernetes.state_storageclass
  • kubernetes.apiserver
  • kubernetes.event
  • kubernetes.state_persistentvolumeclaim
  • kubernetes.state_statefulset 
  • kubernetes.state_cronjob 
  • kubernetes.state_persistentvolume 
  • kubernetes.state_resourcequota
  4. Install the Kubernetes integration from the Fleet UI and verify that the Dashboards as well as the Metrics UI work properly.

Related issues

Open Issues

  1. Dynamic inputs / Former Autodiscover
    Dynamic inputs setup to automatically discover scheduler and controllermanager Pods does not completely work right now and we get the following error:
2021-01-25T15:36:29.224Z	DEBUG	application/periodic.go:40	Failed to read configuration, error: could not emit configuration: could not create the AST from the configuration: missing field accessing 'inputs' (source:'/etc/agent.yml')

Converting ${NODE_NAME} placeholders to ${env.NODE_NAME} does not fix the problem and even if we remove all other datastream configs and leave only the dynamic one it still gives the error:

- id: >-
    kubernetes/metrics-kubernetes.controllermanager-3d50c483-2327-40e7-b3e5-d877d4763fe1
  data_stream:
    dataset: kubernetes.controllermanager
    type: metrics
  metricsets:
    - controllermanager
  hosts:
    - '${kubernetes.pod.ip}:10252'
  period: 10s
  condition: ${kubernetes.pod.labels.component} == 'kube-controller-manager'

In addition, if we remove the dynamic inputs part and keep only ${env.NODE_NAME}, we still get the same error.

Given this, there might be a bug in Agent that does not allow us to combine these two configuration approaches.
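To make the setup under discussion easier to picture, here is a hedged sketch of where such a conditional stream might sit inside agent.yml. The stream fields are copied from the snippet above; the `providers.kubernetes` options (`node`, `scope`), the input `name` and `use_output` are assumptions about the surrounding config and are not confirmed by this description.

```yaml
# agent.yml fragment: a sketch under assumptions, not the exact config from this PR.
providers.kubernetes:        # assumed provider options
  node: ${NODE_NAME}
  scope: node
inputs:
  - name: kubernetes-controllermanager   # hypothetical name
    type: kubernetes/metrics
    use_output: default                  # assumed output name
    streams:
      - data_stream:
          dataset: kubernetes.controllermanager
          type: metrics
        metricsets:
          - controllermanager
        hosts:
          - '${kubernetes.pod.ip}:10252'
        period: 10s
        # The stream is only instantiated for Pods that match this condition.
        condition: ${kubernetes.pod.labels.component} == 'kube-controller-manager'
```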

  2. Package setup requires manual interaction with the UI
    After deploying the manifests, the package is not automatically installed, and the user has to install it manually from the Fleet UI. This is already known, but I'm putting it here for reference.

@ChrsMark ChrsMark self-assigned this Jan 26, 2021
@botelastic botelastic bot added the needs_team label Jan 26, 2021
@ChrsMark ChrsMark added the Team:Integrations label Jan 26, 2021
@elasticmachine
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@elasticmachine
Collaborator

elasticmachine commented Jan 26, 2021

💚 Build Succeeded


Build stats

  • Build Cause: Pull request #23679 updated

  • Start Time: 2021-02-16T11:20:28.199+0000

  • Duration: 52 min 58 sec

  • Commit: 9955c74


❕ Flaky test report

No test was executed to be analysed.

@blakerouse
Contributor

@ChrsMark Let's file an issue for the dynamic inputs piece. You are using it correctly here, so it should work; we need to track down and fix why it is not working on the Agent side.

There is no way to disable dynamic inputs in Agent either: ${NODE_NAME} would still be treated as a variable by dynamic inputs and would fail to resolve. It should still be ${env.NODE_NAME}.
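A minimal sketch of the ${env.NODE_NAME} form, assuming the node name is injected into the Agent container through the Kubernetes downward API. The `fieldRef` mechanism is standard Kubernetes; the kubelet URL and port in the second fragment are illustrative assumptions.

```yaml
# Pod spec fragment (DaemonSet container): expose the node name as an environment variable.
env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
---
# agent.yml stream fragment: read the value through the env provider, not as a bare variable.
hosts:
  - 'https://${env.NODE_NAME}:10250'   # assumed kubelet endpoint
```

With the `env.` prefix the variable is looked up in the environment, so it is not mistaken for a variable that dynamic inputs has to resolve.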

@ChrsMark
Member Author

ChrsMark commented Feb 16, 2021

@ruflin @blakerouse Heads-up on this: after pulling the latest changes from #23886 (thanks @blakerouse!) it finally works and collects metrics from all k8s data streams. This also shows that dynamic inputs in combination with the kubernetes provider can be used to autodiscover the scheduler and controller-manager Pods and start collecting from them on the master node (https://github.com/elastic/beats/pull/23679/files#diff-7896a70414721b8d0b3d8b90808b92c750d40c56bdf2ad01bf629c9499cde64eR112):

Screenshot 2021-02-16 at 1 13 02 PM

@blakerouse can you share your thoughts here plz? After this one is in we can add parts for system metrics and container logs (better to split them in different PRs)

Contributor

@blakerouse blakerouse left a comment

Looks great! Glad to see this is working.

@blakerouse
Contributor

@ChrsMark I think that with the new hostfs work you did on the inputs, gathering system metrics from the nodes should be possible.

@ChrsMark
Member Author

> @ChrsMark I think that with the new hostfs work you did on the inputs, gathering system metrics from the nodes should be possible.

Yeap, this will be the next one coming.

@ChrsMark
Member Author

ChrsMark commented Feb 16, 2021

Merging this one and let's iterate on it with follow-up PRs to add more functionality.

@ChrsMark ChrsMark merged commit 5538a21 into elastic:master Feb 16, 2021
      dnsPolicy: ClusterFirstWithHostNet
      containers:
        - name: elastic-agent
          image: docker.elastic.co/beats/elastic-agent:7.12.0-SNAPSHOT
Member

Split this manifest into different files and use the %VERSION% placeholder, as is done in other beats.
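For illustration, the image line with the placeholder might look like the fragment below. It assumes the same substitution mechanism that the other Beats manifests rely on to replace %VERSION% when the final manifest is generated; the file split itself is not shown.

```yaml
# Container fragment: a sketch, assuming %VERSION% is substituted when the manifest is generated.
containers:
  - name: elastic-agent
    image: docker.elastic.co/beats/elastic-agent:%VERSION%
```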

            runAsUser: 0
          resources:
            limits:
              memory: 200Mi
Member

If this pod is going to run metricbeat and filebeat we may need to increase this limit. This is the limit used by default for a single beat.

        - >-
          ${ES_HOST}
      username: ${ES_USERNAME}
      password: ${ES_PASSWORD}
Member

Add settings also for cloud id and auth?

Member Author

Actually, this is how the standalone config is produced by Cloud. But sure, we can update it (I'm not sure whether the cloud settings are available in Agent, though).

    k8s-app: elastic-agent
data:
  agent.yml: |-
    id: ef9cc740-5bf0-11eb-8b51-39775155c3f5
Member

What is this id? Is it OK to have the same one for all agents running in the cluster?

Member Author

@blake do you think this would be a problem?

Member Author

comment from working PR to track: #23938 (comment)

Comment on lines +161 to +165
data_stream:
  dataset: kubernetes.pod
  type: metrics
metricsets:
  - pod
Member

This is interesting, there are several metricsets with the same configuration, but they are defined as different streams.

Is this because they need to have different data_stream.dataset?
Is this translated to one module configuration for each metricset in Metricbeat config?

This can be relevant for possible uses of logic at the module level, e.g. the cloudfoundry module keeps a single connection at the module level for many metricsets, and we could do something similar with the state_* metricsets of kubernetes to avoid making the same big request per metricset.

Member Author

Yeap, that's what the standalone config looks like after being exported from the Fleet UI. Regarding fetch optimisation, this is a known enhancement filed at elastic/integrations#601 (point no. 2).
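To illustrate the pattern being discussed, two state_* streams exported this way might look like the sketch below: the same host and period, one metricset each, differing only in `data_stream.dataset`. The `kube-state-metrics:8080` address is an assumption, and the stream ids are omitted.

```yaml
# streams fragment: a sketch of the one-stream-per-metricset layout; host address is assumed.
streams:
  - data_stream:
      dataset: kubernetes.state_pod
      type: metrics
    metricsets:
      - state_pod
    hosts:
      - 'kube-state-metrics:8080'
    period: 10s
  - data_stream:
      dataset: kubernetes.state_container
      type: metrics
    metricsets:
      - state_container
    hosts:
      - 'kube-state-metrics:8080'
    period: 10s
```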

@ChrsMark
Member Author

@jsoriano thanks for your comments, let's continue at #23938. Sorry for merging fast.

Development

Successfully merging this pull request may close these issues.

Add kubernetes manifests for standalone Agent with kubernetes integration
4 participants