-
Notifications
You must be signed in to change notification settings - Fork 458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/istio metrics #4253
Feature/istio metrics #4253
Conversation
🚀 Benchmarks reportTo see the full report comment with |
🌐 Coverage report
|
packages/istio/data_stream/istiod_metrics/elasticsearch/ingest_pipeline/default.yml
Show resolved
Hide resolved
@@ -15,3 +15,22 @@ The `access_logs` data stream collects Istio access logs. | |||
{{event "access_logs"}} | |||
|
|||
{{fields "access_logs"}} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can u add a section on how to test it locally? Steps how to install istio, config until you see metrics in ELK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think all the instructions on how to test this locally should end up in the official docs. It doesn't seem to be standard practise in other packages. In order to test it "locally" you need to setup all sort of VMs, K8s cluster and elastic stack with custom properties. I don't think this kind of information should end up in this readme. For the istio specific steps to set this up, I am using the getting started at https://istio.io/latest/docs/setup/getting-started/ which is already documented in the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We tried to change this a little: have a look here https://github.com/elastic/integrations/tree/main/packages/nginx_ingress_controller/_dev/build#how-to-setup-and-test-ingress-controller-locally
In general of course not instructions to install all components, you will consider that user has already elk and k8s etc. But reference to starting guide with some additional hints on what to install would be great I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I can add something similar but fortunately there is not much custom configs or manual steps that I needed for either logs or metrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are other packages that have additionla readme files in testing that describe in detail how it should be tested. So users using the package will not see it but any package dev has easy access to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ruflin reading your comment I would suggest that a more appropriate place for testing docs is packages/istio/data_stream/istiod_metrics/_dev/test/system or anywhere else hidden in the test folders. If I understand correctly adding this test docs here (at packages/istio/_dev/build/docs/README.md will modify the packages/istio/docs/README.md which is used to expose the package docs to the package users).
I'm not a fan of mixing testing docs with official user facing docs where we usually document only the fields exposed by the integration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we plan to include any tests ?
hey @gizas about the tests, |
@ChrsMark can you confirm my assumption at #4253 (comment) that I can't test this PR with prometheus metrics as input similarly to how you do that in your Istio beat module? |
If I understand the question correctly: in general you can have system/integration tests for metrics in packages. See for example #4253 (comment). This under the hood spins a target container using the files at https://github.com/elastic/integrations/tree/main/packages/nats/_dev/deploy. The approach must be documented at https://github.com/elastic/elastic-package/blob/main/docs/howto/system_testing.md. In case of Istio though, since it's not quite straightfarward to spin up an istio target container I would suggest playing with a mocked API that would serve a specific sample as response. This has been done for Would that do the trick here too? |
hey @ChrsMark, thanks for the reply. Diving into the docker-compose.yml used in both integration tests I have a couple of questions though:
You would have a file <1st time scrape data>
---
<2nd time scrape data>
---
<3rd time scrape data>
---
<4th time scrape data>
... and that image would provide a different set of metrics data every time is queried. My question is, since all the metrics in both those integrations have only 1 set of metrics (there is no
If my assumptions on both questions are correct, we could use just the nginx image and serve the static file with the prometheus metrics. @MichaelKatsoulis and @ChrsMark and @mtojek could use please explain why we decided to use the two docker images and this setup? BTW, I agree with your reasoning that since Istio setup is quite memory intensive (I can't run everything on my laptop) and for the Istio metrics integration we only have to mock the prometheus metrics, and we can do that with a static file and nginx image (or 2 docker images), we should just do that for now. |
Correct :) .
I think the mock server just responds the same sample over time, at least this what the
Maybe in
@MichaelKatsoulis do you have the answer handy on this? Is it because we want proxie the request to a specific endpoint internally 🤔 ?
From my side if you have an accessible mock API to serve the response you are good to go :), so whatever that works best for you. But let's wait for @MichaelKatsoulis ' answer on your no2 question above.
|
I really don't. Maybe it was easier for me to reuse an example with both containers than finding how to server a text file with nginx. But just nginx container which will serve the static text file is what we need. |
@ChrsMark and @MichaelKatsoulis I am having some troubles understanding how the system tests should work. Here a couple of questions:
|
That is weird. I just tried them out locally and they pass for all data streams.
They way they work is that an elastic agent is spin up and a policy with specific data stream is created and starts collecting logs from an actual or mock system. Then elastic-package just checks that a data-stream (index) for that data set is created in elasticsearch. That's it. It is assumed that data are being collected and shipped. |
Hey let me try to answer those inline.
Hmm it should not fail 🤔 . What exactly do you run? What do you see? I run from latest master/main: elastic-package stack up -d -v --version=8.6.0-SNAPSHOT
eval "$(elastic-package stack shellinit)"
elastic-package test system -v Note that this should be executed under the ....
--- Test results for package: containerd - START ---
╭────────────┬─────────────┬───────────┬───────────┬────────┬───────────────╮
│ PACKAGE │ DATA STREAM │ TEST TYPE │ TEST NAME │ RESULT │ TIME ELAPSED │
├────────────┼─────────────┼───────────┼───────────┼────────┼───────────────┤
│ containerd │ blkio │ system │ default │ PASS │ 25.721272744s │
│ containerd │ cpu │ system │ default │ PASS │ 27.524840255s │
│ containerd │ memory │ system │ default │ PASS │ 29.275314138s │
╰────────────┴─────────────┴───────────┴───────────┴────────┴───────────────╯
--- Test results for package: containerd - END ---
Done
This looks like the policy is not able to be assigned to the Agent. I think it has to do with the policy itself and not the environment (nginx etc). Is the package policy able to be assinged to an Agent manually? Any other logs that would look suspicious? @jsoriano any ideas how we could better define a troubleshooting process in cases like this?
From https://github.com/elastic/elastic-package/blob/main/docs/howto/system_testing.md:
So the idea is that you test if the package can collect data from the endpoint and hence produce events, and if the events' fields are covered by the package's mappings. |
For future reference,
The following commands were run from within the elastic-package build
elastic-package stack up --services package-registry -d |
I have now added some system test that simulate the prometheus metrics just using a static file and an nginx container. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done! Code-wise looks good to me! Only one major comment to address regarding the connectivity to istiod
component.
multi: false | ||
required: true | ||
show_user: true | ||
default: ${kubernetes.labels.app} == 'istiod' and ${kubernetes.annotations.prometheus.io/scrape} == 'true' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing a rehearsal from https://www.elastic.co/blog/istio-monitoring-with-elastic-observability I remember that istiod
could be accessed directly through its service. For example:
Metricbeat config:
- module: istio
metricsets: ['istiod']
period: 10s
hosts: ['istiod.istio-system:15014']
So in that case you don't need a to automatically discover the Pods using a condition, since you can directly access the endpoint through the k8s Service:
istiod.istio-system:15014
where istiod
is the name of the Service and istio-system
is the name of Namespace it belongs to.
Is this still the case or there is sth that has been changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also any update on this @gsantoro ? The Pr can be merged for now and open a small enhancement for this if you think you need time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I prefer to merge and then open a new issue for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But is istiod
exposed through a Service today? We need to clarify this before proceeding.
Also how many istiod
Pods are running per cluster? If it's only one, which is also exposed through a Service, then we should better use istiod.istio-system:15014
as host endpoint along with a leaderelection
condition. Similarly to what we have for cluster level metrics at https://github.com/elastic/integrations/blob/main/packages/kubernetes/data_stream/apiserver/manifest.yml#L21.
If there are more than 1 istiod
Pod per cluster (for high availability maybe) then we should check if those keep and expose the same metrics or each one of them keep and provide different metrics (using a sharding mechanismo for example). In that case we could use the "autodiscovery" approach otherwise if the endpoint of the control plane metrics is unique we should not use the "autodiscovery" approach. See below:
The way we have it now makes it more complicated and resource consuming since we use an "autodiscovery" approach when the endpoint is unique per cluster and static through a Service. This means that every time the istiod
Pod's state is updated we will trigger an event that this Pod is updated and we will re-launch the "autodiscovery" event. This happens at https://github.com/elastic/elastic-agent/blob/main/internal/pkg/composable/providers/kubernetes/pod.go#L218-L232 and if you follow the codebase you can understand that the processing load is quite a lot.
Add to this the fact that "autodiscovery" based inputs are harder to troubleshoot them.
Consequently it'snot a suggested practice to use "autodiscovery" conditions if not really needed and I would prefer fixing it at first place here instead of having a follow up.
Let me know if that makes sense or if I miss anything here.
@gsantoro I remember an issue you had about broken dashboards because datastreams were missing? I can not see a specific filter in the dashboards at the moment so I guess we are ok? Is not it? |
The issue with broken dashboards was because of missing data-view not datastreams and yes there are filters on the dashboards called |
version: 0.1.0 | ||
release: beta | ||
version: 0.2.0 | ||
release: ga |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we going to make it ga
already? Usually we were making integrations ga
at some point after the initial release. For example at #482 there is a list of items (https://github.com/elastic/beats/blob/master/.github/ISSUE_TEMPLATE/module-checklist.md) that need to be verified before the integration to become ga
. I'm not sure if this has changed. @gizas do you know more on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry that was a change at last minute. The title of the Epic for this iteration 8.6 clearly state that this should become GA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gsantoro the title of the Epic might be misleading but to clarify the main goal was to perform the work in order to be ready for GA. But putting the label in the package before letting the integration being tested from our customers can be tricky.
I would suggest before promoting this 0.2.0 version to production, please do a hot-fix and revert the beta label back.
Maybe this is the chance to deal with above #4253 (comment) as well n the same push
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR at #4684
What does this PR do?
Provide an integrations for Istio metrics for both Istiod and sidecar Proxy containers
Checklist
changelog.yml
file.Author's Checklist
How to test this PR locally
Same steps as #3632
Related issues
Screenshots