
Test and verify that Elastic-Agent with k8s and system integrations run on Openshift #2065

Closed · ChrsMark opened this issue Oct 29, 2021 · 15 comments
Labels: release-pending, Team:Integrations

@ChrsMark (Member) commented Oct 29, 2021

We need to verify that both standalone and managed Agent can run properly on Openshift with the proposed manifests. The system and kubernetes packages should run without problems.

Please take as an example the work done for Metricbeat/Filebeat: elastic/beats#17516
We can verify this on minishift, but running on an actual Openshift deployment is recommended.

ChrsMark added the release-pending and Team:Integrations labels Oct 29, 2021
@elasticmachine commented:

Pinging @elastic/integrations (Team:Integrations)

ChrsMark changed the title from "Test and verify that Elastic-Agent with k8s and system integrations run on OCP" to "Test and verify that Elastic-Agent with k8s and system integrations run on Openshift" Oct 29, 2021
tetianakravchenko self-assigned this Nov 4, 2021
@tetianakravchenko (Contributor) commented Dec 28, 2021

Testing environments:

Minishift - supports only openshift version 3

minishift version:

minishift version
minishift v1.34.3+4b58f89

minishift started with the virtualbox driver (issue: minishift/minishift#3494):

minishift start --vm-driver virtualbox

(works on virtualbox version 6.1.26)
openshift version:

minishift openshift version
openshift v3.11.0+32a500f-598

openshift client (installed from https://mirror.openshift.com/pub/openshift-v3/clients/):

oc version
oc v3.11.587
kubernetes v1.11.0+d4cacc0
features: Basic-Auth

Standalone elastic-agent

How-to: https://www.elastic.co/guide/en/fleet/current/running-on-kubernetes-standalone.html

elastic-agent errors:

6 leaderelection.go:329] error initially creating leader election record: the server could not find the requested resource (post leases.coordination.k8s.io)

Reason: to start the agent we use the following Role:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: elastic-agent
  # should be the namespace where elastic-agent is running
  namespace: kube-system
  labels:
    k8s-app: elastic-agent
rules:
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs: ["get", "create", "update"]
---

(this change was introduced in elastic/beats#24958).

there is no coordination API in the cluster (the command below returns no output):

oc api-resources | grep coordination

I could only find that the Lease API was promoted to v1 in Kubernetes 1.14 - https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.14.md:

The Lease API type in the coordination.k8s.io API group is promoted to v1

Openshift 3.11 actually uses Kubernetes v1.11.

Question here: do we want to support this k8s version? This might be related to elastic/beats#29604.
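If supporting such an old k8s version is needed, one possible workaround (a sketch, not verified in this thread; it assumes the agent version exposes the kubernetes_leaderelection provider toggle described in the standalone docs) is to disable leader election in the agent policy:

providers:
  kubernetes_leaderelection:
    # Assumption: with the provider off, the agent stops calling
    # leases.coordination.k8s.io, which does not exist on k8s v1.11;
    # cluster-scope datasets then need another way to elect a single
    # collecting agent.
    enabled: false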

@tetianakravchenko (Contributor) commented Dec 30, 2021

CRC - openshift version 4

crc version:

crc version
CodeReady Containers version: 1.37.0+3876d27d
OpenShift version: 4.9.10 (bundle installed at /Applications/CodeReady Containers.app/Contents/Resources/crc_hyperkit_4.9.10.crcbundle)

openshift & k8s version:

crc start
eval $(crc oc-env)
% oc version
Client Version: 4.9.10
Server Version: 4.9.10
Kubernetes Version: v1.22.3+ffbb954

Standalone elastic-agent

How-to: https://www.elastic.co/guide/en/fleet/current/running-on-kubernetes-standalone.html

oc login -u kubeadmin
oc apply -f elastic-agent-standalone-kubernetes.yaml

kubernetes-node-metrics input:

✅ works:

  • kubernetes.container
  • kubernetes.node
  • kubernetes.pod
  • kubernetes.system
  • kubernetes.volume

❌ doesn't work with the default config:

  • kubernetes.proxy

error getting processed metrics: error making http request: Get "http://localhost:10249/metrics": dial tcp 127.0.0.1:10249: connect: connection refused

Resolved - ✅ elastic/beats#17863, change the hosts setting (the default Kubernetes entry must be commented out):

hosts:
  # Kubernetes
  # - 'localhost:10249'
  # Openshift
  - 'localhost:29101'

  • kubernetes.controllermanager
    Reason: the default condition doesn't match on Openshift.
    Resolved - ✅ change the condition (the default one must be commented out):

# condition: ${kubernetes.labels.component} == 'kube-controller-manager'
condition: ${kubernetes.labels.app} == 'kube-controller-manager'

  • kubernetes.scheduler
    Reason: the default condition doesn't match on Openshift.
    Resolved - ✅ change the condition (the default one must be commented out):

# condition: ${kubernetes.labels.component} == 'kube-scheduler'
condition: ${kubernetes.labels.app} == 'openshift-kube-scheduler'
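To find which label values to match on a given Openshift version, the control-plane pods can be inspected directly (a quick check added here for reference; the namespace names below are from OCP 4):

oc get pods -n openshift-kube-scheduler --show-labels
oc get pods -n openshift-kube-controller-manager --show-labels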

Resource usage for the kubernetes-node-metrics input (kubernetes.proxy is commented out):

oc adm top pod
NAME                                  CPU(cores)   MEMORY(bytes)
elastic-agent-8vdns                   740m         475Mi

NOTE: the CPU usage is much higher than expected.

system-metrics input:

all datastreams work (system.core, system.cpu, system.diskio, system.filesystem, system.fsstat, system.load, system.memory, system.network, system.process, system.process_summary, system.socket_summary)

kubernetes-cluster-metrics input:

✅ works:

  • kubernetes.apiserver
  • kubernetes.event
  • kubernetes.state_container
  • kubernetes.state_cronjob
  • kubernetes.state_daemonset
  • kubernetes.state_deployment
  • kubernetes.state_job
  • kubernetes.state_node
  • kubernetes.state_persistentvolume
  • kubernetes.state_persistentvolumeclaim
  • kubernetes.state_pod
  • kubernetes.state_replicaset
  • kubernetes.state_resourcequota
  • kubernetes.state_service
  • kubernetes.state_statefulset
  • kubernetes.state_storageclass

The following settings were added to all kubernetes.state_* datasets to access https://kube-state-metrics.openshift-monitoring.svc:8443:

bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
ssl.certificate_authorities:
  - /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
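Put together, one kubernetes.state_* stream ends up looking roughly like this (a sketch; the data_stream/metricsets layout is assumed from the standalone manifest, with kubernetes.state_pod as the example):

- data_stream:
    dataset: kubernetes.state_pod
    type: metrics
  metricsets:
    - state_pod
  hosts:
    # kube-state-metrics shipped with Openshift's cluster-monitoring stack
    - 'https://kube-state-metrics.openshift-monitoring.svc:8443'
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  ssl.certificate_authorities:
    - /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt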

✅ kubernetes.container_logs

❌ kubernetes.audit_logs

Resolved - ✅ customized the log path: /var/log/kube-apiserver/audit.log
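A sketch of that override in the audit-logs stream (the surrounding stream layout is assumed from the standalone manifest; only the path is the actual fix):

- data_stream:
    dataset: kubernetes.audit_logs
    type: logs
  paths:
    # Openshift writes the API server audit log here, not at the path
    # the default manifest expects
    - /var/log/kube-apiserver/audit.log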

system-logs - no sense enabling it, as the log paths don't exist inside the elastic-agent pod: /var/log/auth.log, /var/log/secure*, /var/log/messages*, /var/log/syslog*

❗ For some reason, when everything is enabled, metrics are not received:

2022-01-14T14:48:32.631Z	ERROR	log/reporter.go:36	2022-01-14T14:48:32Z - message: Application: metricbeat--7.16.2[6d04e5ac-1076-4992-afe9-01393daed484]: State changed to FAILED: Missed two check-ins - type: 'ERROR' - sub_type: 'FAILED'

metricbeat errors:

2022-01-27T12:06:06.189Z	ERROR	module/wrapper.go:259	Error fetching data for metricset beat.stats: error making http request: Get "http://unix/stats": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2022-01-27T12:06:06.208Z	ERROR	module/wrapper.go:259	Error fetching data for metricset http.json: error making http request: Get "http://unix/stats": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2022-01-27T12:06:06.461Z	ERROR	module/wrapper.go:259	Error fetching data for metricset beat.state: error making http request: Get "http://unix/state": net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Managed elastic-agent

How-to: https://www.elastic.co/guide/en/fleet/current/running-on-kubernetes-managed-by-fleet.html#running-on-kubernetes-managed-by-fleet

  1. With all inputs enabled (and the corrected conditions for scheduler and controller-manager in place), from time to time the agent gets OOMKilled:
Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137

Errors:

ERROR	status/reporter.go:236	Elastic Agent status changed to: 'error'
2022-01-14T15:04:33.405Z	ERROR	log/reporter.go:36	2022-01-14T15:04:33Z - message: Application: filebeat--7.16.2[dc7c110f-3828-4aec-bbfa-777dffad1527]: State changed to FAILED: Missed two check-ins - type: 'ERROR' - sub_type: 'FAILED'
2022-01-14T15:04:33.456Z	ERROR	log/reporter.go:36	2022-01-14T15:04:33Z - message: Application: metricbeat--7.16.2[dc7c110f-3828-4aec-bbfa-777dffad1527]: State changed to FAILED: Missed two check-ins - type: 'ERROR' - sub_type: 'FAILED'
oc adm top pod
NAME                                  CPU(cores)   MEMORY(bytes)
elastic-agent-764rs                   1499m        448Mi

and event processing seems to just get stuck; the dashboards are empty.
  2. With container logs disabled:

 oc adm top pod
NAME                                  CPU(cores)   MEMORY(bytes)
elastic-agent-764rs                   533m         439Mi

metrics are not consistent:
[screenshot: inconsistent metrics]
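A possible mitigation sketch for the OOMKilled case above (not part of the original thread): raise the agent container's resources in the DaemonSet spec; the numbers below are purely illustrative and depend on cluster size and enabled inputs:

containers:
  - name: elastic-agent
    resources:
      limits:
        memory: 1Gi   # illustrative; pick based on observed usage
      requests:
        cpu: 200m
        memory: 512Mi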

@ChrsMark (Member, Author) commented:

Regarding kube-proxy, there is some feedback at elastic/beats#17863 which shows that the kube-proxy metricsets/data streams can be made to work with the proper configuration on the k8s side. If we verify this, we can document the required k8s-side steps and make kube-proxy available on Openshift for both Metricbeat and Agent.

@tetianakravchenko (Contributor) commented Jan 25, 2022

I've already checked kube-proxy with the provided config and it works; I will add it to the list.

@ChrsMark (Member, Author) commented:

Thank you @tetianakravchenko! Do you also plan to update the documentation accordingly? We have this section -> running-on-kubernetes.html#_red_hat_openshift_configuration about Openshift specifics, so this would be a good fit there I think.

@tetianakravchenko (Contributor) commented Jan 26, 2022

Openshift on GCP:
Errors:

  1. The full certificate name must be used:

error making http request: Get "https://name-openshift-v48hb-worker-b-km2m8:10250/stats/summary": x509: certificate is valid for name-openshift-v48hb-worker-b-km2m8.c.project_name.internal, not name-openshift-v48hb-worker-b-km2m8

-> used 'https://${env.NODE_NAME}.c.project_name.internal:10250', but I think this is specific to GCP.
  2. When "https://kube-state-metrics.openshift-monitoring.svc:8443" is used as the host for the kubernetes.state_* datasets, we might miss some prometheus metrics we rely on due to the metric deny list - https://github.com/openshift/cluster-monitoring-operator/blob/master/assets/kube-state-metrics/deployment.yaml#L39-L51
Options here: either patch the kube-state-metrics deployment of cluster-monitoring-operator, or install kube-state-metrics alongside elastic-agent (note: the resource names shouldn't overlap/override those of the cluster-monitoring-operator kube-state-metrics).

  3. kubernetes.state_* metrics are collected with gaps:

[screenshot: gaps in kubernetes.state_* metrics]

the only errors I see in metricbeat logs:

2022-01-26T17:55:24.433Z	ERROR	module/wrapper.go:259	Error fetching data for metricset beat.stats: error making http request: Get "http://unix/stats": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2022-01-26T17:55:24.433Z	ERROR	module/wrapper.go:259	Error fetching data for metricset beat.state: error making http request: Get "http://unix/state": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2022-01-26T17:54:06.328Z	ERROR	module/wrapper.go:259	Error fetching data for metricset http.json: error making http request: Get "http://unix/stats": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2022-01-26T17:55:24.567Z	ERROR	module/wrapper.go:259	Error fetching data for metricset beat.stats: error making http request: Get "http://unix/stats": dial unix /usr/share/elastic-agent/state/data/tmp/default/metricbeat/metricbeat.sock: connect: connection refused
2022-01-26T17:55:24.574Z	ERROR	module/wrapper.go:259	Error fetching data for metricset http.json: error making http request: Get "http://unix/stats": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2022-01-26T17:55:24.575Z	ERROR	module/wrapper.go:259	Error fetching data for metricset beat.state: error making http request: Get "http://unix/state": dial unix /usr/share/elastic-agent/state/data/tmp/default/metricbeat/metricbeat.sock: connect: connection refused
2022-01-26T17:55:24.575Z	ERROR	module/wrapper.go:259	Error fetching data for metricset http.json: error making http request: Get "http://unix/stats": dial unix /usr/share/elastic-agent/state/data/tmp/default/metricbeat/metricbeat.sock: connect: connection refused

created an issue for that - elastic/beats#30033

After raising the memory limit I don't see this error anymore.
Resource usage:

oc adm top pods | grep elastic
elastic-agent-b5hxm                                  38m          540Mi
elastic-agent-bjnrz                                  85m          575Mi
elastic-agent-cnlg4                                  133m         600Mi
elastic-agent-dg9zq                                  277m         684Mi <- this one is the leader (scrapes kube-state-metrics)
elastic-agent-j57tg                                  28m          530Mi
elastic-agent-kls6z                                  33m          597Mi

Started documentation PR - https://github.com/elastic/beats/compare/master...tetianakravchenko:openshift-documentation?expand=1

@mtojek (Contributor) commented Jan 27, 2022

Hey folks! How about adding support for Openshift in elastic-package, similarly to kind? Will it help or not really?

@ChrsMark (Member, Author) commented:

> Hey folks! How about adding support for Openshift in elastic-package, similarly to kind? Will it help or not really?

That would be great! In the past we had thought about supporting Openshift in our CI, but in the Beats CI it was a bit more complicated. Now with elastic-package it should be more straightforward. I'm wondering whether it would make sense to use a local "cluster" or a cloud one on GCP for a more official approach. The second one would be more expensive if the cluster is always up, but we can verify this with the infra team. I think the ECK team does the same to test the operator.

@mtojek (Contributor) commented Jan 27, 2022

I created the relevant issue and we can take it from there instead of polluting this thread.

@tetianakravchenko (Contributor) commented Jan 27, 2022

@ChrsMark

> Do you also plan to update the documentation accordingly? We have this section -> running-on-kubernetes.html#_red_hat_openshift_configuration about Openshift specifics so this would be a good fit there I think.

elastic/beats#30054
It seems the elastic-agent documentation lives in a different repo; would this be the proper place - https://github.com/elastic/observability-docs/blob/7.16/docs/en/ingest-management/elastic-agent/running-on-kubernetes-standalone.asciidoc ? Any specific process here?

@ChrsMark (Member, Author) commented:

> elastic/beats#30054 it seems the elastic-agent documentation is in a different repo, would it be the proper place - https://github.com/elastic/observability-docs/blob/7.16/docs/en/ingest-management/elastic-agent/running-on-kubernetes-standalone.asciidoc ? any specific process here?

Yes, that's the correct place for the docs. Nothing special there; we are the ones who mostly update these specific docs, so you can just go ahead, open a PR there, and ask for a review from our team.

@tetianakravchenko (Contributor) commented Feb 4, 2022

For future reference, I will keep the installation process for crc and Openshift on GCP in this issue:

CRC:

  1. Create a Red Hat account.
  2. The CRC installer can be downloaded from https://mirror.openshift.com/pub/openshift-v4/clients/crc/ or from the Red Hat account.
  3. crc setup
  4. crc start
    Adjust the cpu/memory configuration for crc as needed (see the sketch below).
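For example (illustrative values; cpus and memory are standard crc config keys, memory is in MiB):

crc config set cpus 6
crc config set memory 16384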

To enable openshift cluster monitoring:

crc config set enable-cluster-monitoring true

This installs the monitoring stack in the openshift-monitoring namespace; use "https://kube-state-metrics.openshift-monitoring.svc:8443" to access kube-state-metrics.
Note: you might miss some metrics (https://github.com/openshift/cluster-monitoring-operator/blob/master/assets/kube-state-metrics/deployment.yaml#L39-L51) if you use the kube-state-metrics provided by openshift/cluster-monitoring-operator; a sketch of installing a separate copy follows.
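A rough sketch of installing a separate kube-state-metrics (assumes the upstream project's standard example manifests; rename the resources/namespace first so they don't clash with the openshift-monitoring copy):

git clone https://github.com/kubernetes/kube-state-metrics
# adjust names/namespace before applying to avoid collisions
oc apply -f kube-state-metrics/examples/standard/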

Openshift on GCP:

Use https://github.com/openshift/installer
NOTE: use the official openshift-install binary, not one built from master - in my case the worker nodes were not created due to an issue when bootstrapping the master nodes (some daemonset was crashing).
The official openshift-install binary can be found in your account at https://console.redhat.com/openshift/create, where the pull secret can be found as well.

  1. Create a google service account with all the required permissions - https://github.com/openshift/installer/blob/master/docs/user/gcp/iam.md - and export its key:
export GCLOUD_KEYFILE_JSON=$(pwd)/elastic-obs-integrations-dev-*.json
  2. Create a folder with the install-config.yml:
    $ mkdir GCP
    $ cat GCP/install-config.yml
apiVersion: v1
baseDomain: cf-obs.elastic.dev
compute:
- hyperthreading: Enabled
  name: worker
  platform:
    gcp:
      type: n2-standard-4
  replicas: 4
controlPlane:
  hyperthreading: Enabled
  name: master
  platform:
    gcp:
      type: n2-standard-4
  replicas: 3
metadata:
  name: tetiana-openshift
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  gcp:
    projectID: elastic-obs-integrations-dev
    region: us-central1
sshKey: ...
pullSecret: ...

Note: add the pullSecret in the format '{"auths":{"cloud.openshift.com":{"auth":...' and the sshKey in the format 'ssh-... ...' - it is your public ssh key for accessing the created nodes.
The full list of settings can be found in the installer documentation.

  3. Create the cluster:
    ./openshift-install create cluster --dir=./GCP --log-level=debug

This creates 1 bootstrap node plus the master and worker nodes configured above. If for some reason you need to ssh into the nodes, use:
ssh core@'bootstrap-public-ip'
  4. Check installation progress:

oc --kubeconfig=$(pwd)/GCP/auth/kubeconfig get clusterversion
  5. Destroy the cluster:
./openshift-install destroy cluster --dir=./GCP --log-level=info

@ChrsMark (Member, Author) commented Feb 7, 2022

Thanks for working on this @tetianakravchenko! It will help a lot to enhance our coverage and supportability for what's around the corner.

@Myasnik2000 commented:

Hey folks, can anyone help me and provide an elastic-agent configmap for the atlassian integration?
