Example of a Suricata datasource configuration #16496
Conversation
Suricata is using the logs input but creates multiple kinds of events, so it's a single input with mixed output. Let's try to see if a type on the stream could work or not.
Pinging @elastic/ingest (Project:fleet)
- id?: {id}
  type: "typeX"
  dataset: suricata.logs
  path: /var/log/suricata/eve.json
@ruflin this is a followup for our discussion. I've looked at the current implementation of the Suricata module. This is indeed a single input type (logs) with mixed outputs (events, alerts, and metrics). All the generated events are extracted from a single source file, eve.json.
Now, I don't think we can express that difference at the stream level; the logic is heavily dependent on the ingest pipeline implementation. Is log the right datasource type here? Maybe event or *file would be more generic and appropriate, or could they be an alias to log?
I think your question is more: how are we targeting the right index for these kinds of scenarios? Because the above example will use logs-{dataset}-{namespace} as the destination.
I think the actual solution is to make sure that all the fields we use (dataset, namespace, and type) are available to the ingest pipeline, and assume that a pipeline can route events if the content is mixed. With our current permission model and final pipeline usage it should just work?
I am not sure that the Suricata case is common.
PS: Beats is also doing that by sending a summary of the stats in the log.
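For illustration, a minimal sketch of what such a routing pipeline could look like, assuming the stream.* fields are shipped with every event (the pipeline name here is made up, not part of any design):

PUT _ingest/pipeline/route-by-stream
{
  "description": "Sketch: derive the target index from the stream.* fields",
  "processors": [
    {
      "set": {
        "field": "_index",
        "value": "{{{stream.type}}}-{{{stream.dataset}}}-{{{stream.namespace}}}"
      }
    }
  ]
}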
+1 on having stream.dataset, stream.type, and stream.namespace available in all events, making it possible for the ingest pipeline to make decisions based on them and put events into different indices if needed.
@andrewkroh Would this make sense for suricata?
@andrewkroh If this is OK with you, I am going to create the related issues to pass down the required information to generate the target index from an ingest pipeline.
@ruflin concerning the values, I presume we are using values from the input when stream.type or stream.namespace aren't defined on the stream?
Yes.
I'm probably missing some context about the current design. So a final pipeline will be installed to dynamically set the _index for all events based on stream.dataset, stream.type, and stream.namespace. Will those fields be present in all events? And then the suricata.logs dataset will overwrite stream.type to alerts or metrics when needed?
The current design is not exactly what you are describing; at the moment the agent will generate the target index based on fields present in the datasource configuration.
If we take the following nginx datasource and only concentrate on the "error" stream:
beats/x-pack/agent/docs/agent_configuration_example.yml
Lines 28 to 61 in 77f5f68
datasources:
  # use the nginx package
  - id?: nginx-x1
    enabled?: true # default to true
    title?: "This is a nice title for human"
    # Package this config group is coming from. On importing, we know where it belongs
    # The package tells the UI which application to link to
    package?:
      name: epm/nginx
      version: 1.7.0
    namespace?: prod
    constraints?:
      # Constraints are not final
      - os.platform: { in: "windows" }
      - agent.version: { ">=": "8.0.0" }
    use_output: long_term_storage
    inputs:
      - type: logs
        processors?:
        streams:
          - id?: {id}
            enabled?: true # default to true
            dataset: nginx.access
            paths: /var/log/nginx/access.log
          - id?: {id}
            enabled?: true # default to true
            dataset: nginx.error
            paths: /var/log/nginx/error.log
      - type: nginx/metrics
        streams:
          - id?: {id}
            enabled?: true # default to true
            dataset: nginx.stub_status
            metricset: stub_status
The agent will take the input type logs, the namespace prod, and the dataset nginx.error, generate the target index "logs-nginx.error-prod", and send the data to that index. We cannot use a final pipeline to generate the index, because the usage contexts (fleet vs standalone) are different and we cannot guarantee the pipeline would be installed beforehand.
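To make the mapping concrete, the streams in the example above would resolve to indices along these lines (the last one assumes the nginx/metrics input contributes metrics as its type, which the example does not spell out):

logs-nginx.access-prod          # input type logs, dataset nginx.access, namespace prod
logs-nginx.error-prod           # input type logs, dataset nginx.error, namespace prod
metrics-nginx.stub_status-prod  # assumed: nginx/metrics input maps to type metrics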
Now, if we look at the Suricata use case, this is the exception that proves the rule: events of several kinds (logs, metrics, and alerts) are coming from the same source (logs), and we want to disambiguate them and route them to the right index. We see this as a more advanced use case where the logic to identify and route events is part of a pipeline definition.
So based on incoming data, and with the aid of the stream.* fields, the pipeline can make a rerouting decision and send the events to the appropriate index.
Note: It could be part of a final pipeline but at the moment it's up to the specific pipeline to do it.
Ok, thanks for the details. I don't see any issue with adding some extra ingest processors to handle modifying the index for logs and alerts.
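As a sketch of what those extra processors could look like, assuming the eve.json event_type ends up under suricata.eve.event_type (as in the current Filebeat module) and the stream.* fields are present on the event; the stream.type values alerts and metrics follow the earlier comment:

{
  "processors": [
    {
      "set": {
        "if": "ctx.suricata?.eve?.event_type == 'alert'",
        "field": "stream.type",
        "value": "alerts"
      }
    },
    {
      "set": {
        "if": "ctx.suricata?.eve?.event_type == 'stats'",
        "field": "stream.type",
        "value": "metrics"
      }
    },
    {
      "set": {
        "field": "_index",
        "value": "{{{stream.type}}}-{{{stream.dataset}}}-{{{stream.namespace}}}"
      }
    }
  ]
}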
Co-Authored-By: Andrew Kroh <[email protected]>
Will create followup issues.
Created #16562 for the stream.* discussion.