Refactor kubernetes autodiscover to avoid skipping short-living pods #24742
Conversation
💚 Build Succeeded
Facing the same issue, thanks for your great work! When will this refactor be done?
@hukaixuan thanks for your interest, I expect to dedicate some time to this in the following days, but I cannot give a fixed timeline, stay tuned to this PR 🙂
Hey @jsoriano!
Hi, this is still a draft; it doesn't cover all cases yet, and I actually know it doesn't work well with some basic cases.
Force-pushed from bc37e73 to b3aef73
/test
Overall looks good so far 🍻:
- Spotted one tricky line that might be a bug (maybe not :)). I might need to do a second pass with a clearer mind for hidden bug hunting (I believe there are none though).
- @jsoriano do you think you can improve the testing notes a bit? I guess you have some specific scenarios you used to test this one. It would be nice to have them logged in this PR, both for manual testing and for future reference.
```go
// so templates that need network don't generate a config.
if c.status.State.Running != nil {
	if port.Name != "" && port.ContainerPort != 0 {
		event["ports"] = common.MapStr{port.Name: port.ContainerPort}
```
🤔 Will this just keep only the last "accessed" port of the loop, because we create a new `MapStr` each time?
Yes, I have changed the behaviour here; I have moved `data.ports` from the pod event to each container/port event. Let me know if you think this can be a problem.

Before, `data.ports` was included only in the pod event, so when configurations matched they didn't include the container metadata, only the pod metadata.

The code in this PR doesn't set the whole collection of `data.ports` in the pod event; it sets a single `data.ports.<portname>` field in each container/port event, so when configurations match they also include the container metadata. By doing this, each event with a named port is going to contain a single port.
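A rough Go sketch of that per-port behaviour; the function, package, and event layout here are hypothetical illustrations, not the PR's actual code:

```go
package sketch

import (
	"github.com/elastic/beats/v7/libbeat/common"
	v1 "k8s.io/api/core/v1"
)

// containerPortEvents illustrates the idea described above: one event per
// named container port, each carrying a fresh "ports" map with a single
// entry, instead of one pod-level event with the whole ports collection.
func containerPortEvents(pod *v1.Pod, container v1.Container) []common.MapStr {
	var events []common.MapStr
	for _, port := range container.Ports {
		event := common.MapStr{
			"host": pod.Status.PodIP,
		}
		if port.Name != "" && port.ContainerPort != 0 {
			event["port"] = port.ContainerPort
			// A new MapStr per event, so each event keeps exactly one named
			// port, and templates using ${data.ports.<name>} also match the
			// container metadata attached to that event.
			event["ports"] = common.MapStr{port.Name: port.ContainerPort}
		}
		events = append(events, event)
	}
	return events
}
```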
This also fixes something I found misleading during my tests. As `data.ports` was included only in pod events, templates with conditions matching container fields, like the following one, didn't work. They work now with the changes here.
```yaml
templates:
  - condition:
      contains.kubernetes.container.image: redis
    config:
      - module: redis
        metricsets: ["info", "keyspace"]
        hosts: "${data.host}:${data.ports.redis}"
```
You can see more clearly how this is different now in these test cases: https://github.com/elastic/beats/pull/24742/files#diff-9f24ba274864b933a726842c2b9fb8ee487a13545579938f9e373d0463611142R557
Ok, what doesn't work now is this case. I will revert this change and do some more tests.
```yaml
templates:
  - condition:
      contains.kubernetes.pod.name: redis
    config:
      - module: redis
        hosts: "${data.host}:${data.ports.redis}"
      - module: nginx
        hosts: "${data.host}:${data.ports.http}"
```
We need more testing to cover all these cases 😣
Ah I see, it makes sense!
Reverted in 3301827, I will follow up on this in a separate PR.
@ChrsMark I have some quite simple Kubernetes manifests that represent the cases listed in a private manual test case. I would like to translate these into system or e2e tests at some point.
/test
lgtm, thank you!
/test
LGTM! let's make sure we test this thoroughly before releasing 😇
This pull request is now in conflicts. Could you fix it? 🙏
/test
What does this PR do?
Refactor logic in kubernetes autodiscover that decides when to generate events to try to address #22718.
Kubernetes autodiscover can now generate events without network information (without `host` or `port`/`ports`). This allows generating events for pods that haven't started yet, or that have succeeded/failed before generating a running event. These events still include the container ID, so they can be used to collect logs. Still, no start event is generated if no pod IP and no container IDs are available.

Some helpers have been added to obtain relevant information from pods and their containers.

Some additional small refactors are done to improve readability.
Why is it important?
The current logic checks similar things at the pod and container levels; this PR tries to simplify it by focusing on containers only.

No matter the state of the pod, if there is a container running or trying to run, even if it is unhealthy, some configuration should be generated so logs can be collected.
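A minimal Go sketch of that rule; the function name and structure are hypothetical, not the PR's actual implementation:

```go
package sketch

import v1 "k8s.io/api/core/v1"

// shouldEmitEvents illustrates the decision described above: generate
// autodiscover events whenever there is something usable to build a config
// from, even if the pod has no IP or exposed ports yet.
func shouldEmitEvents(pod *v1.Pod, status v1.ContainerStatus) bool {
	// A container ID is enough to collect logs, regardless of whether the
	// container is waiting, running, terminated, or unhealthy.
	if status.ContainerID != "" {
		return true
	}
	// A pod IP still allows network-based templates to generate a config.
	if pod.Status.PodIP != "" {
		return true
	}
	// No pod IP and no container ID: there is nothing to build a config
	// from, so no start event is generated.
	return false
}
```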
Checklist
- I have made corresponding changes to the documentation
- I have made corresponding change to the default configuration files
- I have added an entry in `CHANGELOG.next.asciidoc` or `CHANGELOG-developer.next.asciidoc`.

Author's Checklist
- Events can be generated without `data.host`. This is logged at the debug level.
- Check templates that use `data.host` or `data.ports`.
- `data.ports.*` doesn't seem to be working, check if this is a regression. Named ports were not added to container events. They are now.

Author's notes for the future
- `data.ports` is not included in container-level events, so it doesn't work when conditions for specific containers are used.

How to test this PR locally
- Start a short-living pod (e.g. `kubectl run --image=redis redis -- echo foo`).

Related issues

- #22718
Use cases
Collect logs from containers in short-living or failing pods.