-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Filebeat kubernetes module. {issue}10812[10812] #10912
Conversation
Pinging @elastic/secops |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alakahakai for this. I think that supporting envoy and coredns is going to be a great addition.
But I think that they should be two different modules, and not linked to kubernetes. Even if both envoy and coredns are services that are usually deployed on kubernetes they can also be run with other orchestrators or even as standalone services.
Filebeat modules should focus on supporting specific services, no matter how they are deployed.
coredns: | ||
enabled: true | ||
envoy: | ||
enabled: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think they should be two modules, one for coredns and another one for envoy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the suggestion. Let me see whether doing these two modules separately will allow them to work in kubernetes deployments. Some additional work might still be needed.
description: > | ||
Fields from coredns logs after normalization | ||
fields: | ||
- name: duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add descriptions for fields in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check out how I handled event.duration
across very different sources in #10274. I think that's a pattern we can likely reuse broadly. This will reduce compilations a lot across the board, if we can always use the same snippet.
type: integer | ||
|
||
- name: response_flags | ||
type: keyword |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use ECS fields for http requests and responses https://github.com/elastic/ecs#-http-fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Jaime's comment here and below. I've added a few more as well.
Is the intent of this module to preserve the original information under envoy.*
, and only copy out to the ECS fields?
In most places we've simply moved all of the data to the ECS field directly, and not defined the field for the source naming. Of course you still need to define all of the fields that don't actually map to ECS.
- name: response_flags | ||
type: keyword | ||
|
||
- name: bytes_received |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use http.request.bytes
for this?
- name: bytes_sent | ||
type: long | ||
|
||
- name: duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
event.duration
? (https://github.com/elastic/ecs#-event-fields)
field: "message" | ||
target_prefix: "coredns" | ||
- drop_fields: | ||
fields: [time] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be better to place all processing in the ingest pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just like for Zeek, I'm carrying the fight over here ;-)
I like the idea of splitting out the logs here. It lets people use Beat processors that are inherently agent-side. There's no PID in this specific log, but one may want to do reverse DNS on the IPs.
I would directly name the fields following ECS here.
"%{timestamp} [%{event.level}] %{client.ip}:%{client.port} ...
And make sure to copy client.* to source.*. We're trying to always have source.*
as a baseline in ECS.
target_prefix: "envoy" | ||
- drop_fields: | ||
fields: [time] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be better to do all processing in the ingest pipeline.
labels: | ||
k8s-app: ingest | ||
--- | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are already deploy files in deploy/kubernetes
, could this be added there?
var: | ||
- name: paths | ||
default: | ||
- /var/lib/docker/containers/*/*-json.log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will take logs from all containers running in the system, we probably want to set here some safe default, or nothing at all so setting something is required.
Take into account that if this module is used with kubernetes or docker autodiscover this path will be easily set to the one of the specific envoy container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see file ingest-k8s.yaml and pull 10911 for addressing this. Note that the file names are using some generated 64-byte id here hence hard to specify without auto-discovery,.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless specifically desired, I think we should as much as possible use the ECS fields directly, without defining the source fields that are mapped cleanly to ECS. Of course all of the fields that don't map should be defined, as you're doing.
Noted a few other minor things.
@@ -244,6 +244,7 @@ https://github.com/elastic/beats/compare/v7.0.0-alpha2...master[Check the HEAD d | |||
- Add ISO8601 timestamp support in syslog metricset. {issue}8716[8716] {pull}10736[10736] | |||
- Add more info to message logged when a duplicated symlink file is found {pull}10845[10845] | |||
- Add Netflow module to enrich flow events with geoip data. {pull}10877[10877] | |||
- Add Filebeat kubernetes module. {issue}10812[10812] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would link the Changelog to this PR which will have all of the action, not the original issue.
- name: upstream_service_time | ||
type: double | ||
|
||
- name: forwarded_for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing for xff in ECS yet
- name: request_id | ||
type: keyword | ||
|
||
- name: authority |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is an RFC Authority (name or IP + port), this can go in .address
- name: forwarded_for | ||
type: keyword | ||
|
||
- name: http_user_agent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ECS user_agent.original
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, all the related ECS fields are added via the beats or ingest pipeline processors. The original fields are kept for reference. I guess I should remove these original fields instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The team has been going back and forth on this.
For all plaintext logs, the cutover has been pretty straightforward and drastic. We parse right into the ECS fields as much as we can, and add custom fields for everything not covered.
For Suricata, we started with the assumption that some pre-existing content or community experience existed, based on the exact Suricata field names & so on, because of the existence of their EVE JSON log format. I'm not sure if that's the case or not.
But since 7.0, we've started "hollowing out" the suricata.eve.*
namespace, and replacing all of the fields that map to ECS with field aliases anyway (examples).
So I think we should do the same here. Especially since this is a brand new module. Let's not create legacy we'll have to clean up in 8.0 ;-)
- name: upstream_port | ||
type: integer | ||
|
||
- name: message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ECS message
field: "message" | ||
target_prefix: "coredns" | ||
- drop_fields: | ||
fields: [time] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just like for Zeek, I'm carrying the fight over here ;-)
I like the idea of splitting out the logs here. It lets people use Beat processors that are inherently agent-side. There's no PID in this specific log, but one may want to do reverse DNS on the IPs.
I would directly name the fields following ECS here.
"%{timestamp} [%{event.level}] %{client.ip}:%{client.port} ...
And make sure to copy client.* to source.*. We're trying to always have source.*
as a baseline in ECS.
description: > | ||
Fields from coredns logs after normalization | ||
fields: | ||
- name: duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check out how I handled event.duration
across very different sources in #10274. I think that's a pattern we can likely reuse broadly. This will reduce compilations a lot across the board, if we can always use the same snippet.
"script": { | ||
"lang": "painless", | ||
"source": "ctx.event.created = ctx['@timestamp']; ctx['@timestamp'] = ctx['coredns']['timestamp']; ctx.coredns.remove('timestamp');", | ||
"ignore_failure" : true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should parse the date in ingest node, to ensure the document content is consistent. Even if the Elasticsearch date
datatype is able to parse many things, we want the source document itself to be consistent.
It's a thing ECS will have to make more clear in the future, IMO. (See what happened with the Zeek Unix ts)
{ | ||
"set": { | ||
"field": "user_agent.original", | ||
"value": "{{envoy.http_user_agent}}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to copy it around to avoid clobbering anymore. Since 7.0, the UA parser re-outputs the .original
.
So you can directly populate user_agent.original
and have:
{
"user_agent": {
"field": "user_agent.original",
"ignore_missing": true
}
}
and the right thing will happen :-)
] | ||
env: | ||
- name: ELASTICSEARCH_HOST | ||
value: 192.168.99.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this IP be here (same for KIBANA_HOST below)? Is this a Kubernetes default for the gateway / your host?
I will try to break this into two separate modules - one for coredns, the other for envoy; And move them to non x-pack. Should additional things required for these to work in Kubernetes, a separate kubernetes module would be added then. |
Close this PR. I will open new issues and PRs for coredns and envoy. |
Add Filebeat kubernetes module. #10812. Closed old PR #10844.