Example of a Suricata datasource configuration #16496

Merged 2 commits on Feb 25, 2020
x-pack/agent/docs/agent_configuration_example.yml (17 additions, 0 deletions)
@@ -451,6 +451,23 @@ datasources:
dataset: docker.network
period: 10s

#################################################################################################
### Suricata
#
  - id?: suricata-x1
    title: Suricata's data
    namespace?: "abc"
    package:
      name: suricata
      version: x.x.x
    inputs:
      - type: log
        streams:
          - id?: {id}
            type: "typeX"
            dataset: suricata.logs
            path: /var/log/suricata/eve.json
Comment (Contributor Author):

@ruflin this is a follow-up to our discussion; I've looked at the current implementation of the Suricata module. It is indeed a single input type (logs) with mixed outputs (events, alerts, and metrics). All the generated events are extracted from a single source file, eve.json.

Now, I don't think we can express that difference at the stream level; the logic is heavily dependent on the ingest pipeline implementation. Is log the right datasource type here? Maybe Event or *File would be more generic and appropriate, or could they be aliases for log?

I think your question is really how we target the right index for this kind of scenario, because the above example will use logs-{dataset}-{namespace} as the destination.

I think the actual solution is to make sure that all the fields we use (dataset, namespace, and type) are available to the ingest pipeline, and to assume that a pipeline can route events when the content is mixed. With our current permission model and final pipeline usage it should just work?
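To make that concrete, here is a minimal sketch of what such routing could look like, written as an ingest pipeline fragment in YAML (the same form used for the config examples in this file). The stream.type, stream.dataset, and stream.namespace field names and the {type}-{dataset}-{namespace} index pattern are assumptions based on this discussion, not something this PR defines:

processors:
  # Minimal sketch: rebuild the destination index from the stream.* fields
  # attached to the event. A dataset pipeline running earlier could overwrite
  # stream.type (e.g. to "alerts" or "metrics") to reroute mixed content.
  - set:
      if: "ctx.stream?.type != null && ctx.stream?.dataset != null && ctx.stream?.namespace != null"
      field: "_index"
      value: "{{stream.type}}-{{stream.dataset}}-{{stream.namespace}}"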

I am not sure that the Suricata case is common.

PS: Beats is also doing that by sending a summary of the stats in the log.

Comment (Contributor):

+1 on having stream.dataset, stream.type, and stream.namespace available in all events, making it possible for the ingest pipeline to make decisions based on them and put the data in different indices if needed.

@andrewkroh Would this make sense for suricata?

Comment (Contributor Author):

@andrewkroh If this is OK with you, I am going to create the related issues to pass down the information required to generate the target index from an ingest pipeline.

@ruflin concerning the values, I presume we use the values from the input when stream.type or stream.namespace aren't defined on the stream?

Comment (Contributor):

Yes.

Comment (Member):

I'm probably missing some context about the current design. So a final pipeline will be installed to dynamically set the _index for all events based on stream.dataset, stream.type, and stream.namespace. Will those fields be present in all events? And then the suricata.logs dataset will overwrite stream.type to alerts or metrics when needed?

Comment (Contributor Author):

The current design is not exactly what you are describing: at the moment the agent generates the target index based on fields present in the datasource configuration.

If we take the following nginx datasource and concentrate only on the "error" stream:

datasources:
  # use the nginx package
  - id?: nginx-x1
    enabled?: true # default to true
    title?: "This is a nice title for humans"
    # Package this config group is coming from. On importing, we know where it belongs.
    # The package tells the UI which application to link to.
    package?:
      name: epm/nginx
      version: 1.7.0
    namespace?: prod
    constraints?:
      # Constraints look is not final
      - os.platform: { in: "windows" }
      - agent.version: { ">=": "8.0.0" }
    use_output: long_term_storage
    inputs:
      - type: logs
        processors?:
        streams:
          - id?: {id}
            enabled?: true # default to true
            dataset: nginx.access
            paths: /var/log/nginx/access.log
          - id?: {id}
            enabled?: true # default to true
            dataset: nginx.error
            paths: /var/log/nginx/error.log
      - type: nginx/metrics
        streams:
          - id?: {id}
            enabled?: true # default to true
            dataset: nginx.stub_status
            metricset: stub_status

The agent will take the input type logs, the namespace prod, and the dataset nginx.error, generate the target index "logs-nginx.error-prod", and send the data to that index. We cannot use the final pipeline to generate the index, because the usage contexts (Fleet vs. standalone) are different and we cannot guarantee the pipeline would be installed beforehand.
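For illustration only (not part of the example file), the indices derived from the streams above would look roughly like this, assuming the nginx/metrics input maps to the type "metrics":

nginx.access      -> logs-nginx.access-prod
nginx.error       -> logs-nginx.error-prod
nginx.stub_status -> metrics-nginx.stub_status-prod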

Now, if we look at the Suricata use case, it is the exception that proves the rule: logs, metrics, and alerts all come from the same source (logs), and we want to disambiguate them and route them to the right index. We see this as a more advanced use case where the logic to identify and route events is part of a pipeline definition.

So, based on the incoming data and with the aid of the stream.* fields, the pipeline can make a rerouting decision and send the events to the appropriate index.
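As a hedged sketch of how the suricata.logs pipeline could do this (the suricata.eve.event_type field name comes from the current Filebeat module; the stream.type values "alerts" and "metrics" are assumptions from this thread, not something defined in this PR):

processors:
  # Hypothetical: overwrite stream.type before the index is resolved, so
  # alert and stats events leave the default logs index. All other events
  # keep stream.type as set in the datasource configuration.
  - set:
      if: "ctx?.suricata?.eve?.event_type == 'alert'"
      field: "stream.type"
      value: "alerts"
  - set:
      if: "ctx?.suricata?.eve?.event_type == 'stats'"
      field: "stream.type"
      value: "metrics"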

Comment (Contributor Author):

Note: It could be part of a final pipeline but at the moment it's up to the specific pipeline to do it.

Comment (Member):

Ok, thanks for the details. I don't see any issue with adding some extra ingest processors to handle modifying the index for logs and alerts.


#################################################################################################
### suggestion 1
  - id?: myendpoint-x1