[Agent] Exposed stream.type, stream.dataset and stream.namespace to every events. #16562

ph · 2020-02-25T15:10:34Z

Some modules, like Suricata, define a single input that generates mixed events: metrics, logs, or alerts. The data is often identified in the ingest pipeline, where the type is derived from fields or values in the data.

The agent currently assumes that each input generates a single type of data. This is normally true, but there are exceptions such as Suricata. To allow maximum flexibility, we should add fields to each event so that the target index can be generated inside an ingest pipeline.

To do so, we want to add the following fields to each event:

stream.dataset
stream.type
stream.namespace
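To make the idea concrete, here is a minimal Python sketch of the decision an ingest pipeline could make once these fields are on every event. It is not the actual pipeline. It assumes the Suricata record type is available as a top-level `event_type` field, a hypothetical rule that routes `stats` records as `metrics` and everything else as `logs`, a hypothetical dataset name `suricata.eve`, and a hypothetical `{type}-{dataset}-{namespace}` index naming scheme.

```python
from copy import deepcopy

# Hypothetical rule: Suricata EVE record types that should be routed as metrics.
METRIC_EVENT_TYPES = {"stats"}


def classify(event: dict) -> dict:
    """Return a copy of the event with stream.type derived from its content."""
    routed = deepcopy(event)
    stream = routed.setdefault("stream", {})
    # Assumption: the record type is exposed as a top-level "event_type" field.
    if routed.get("event_type") in METRIC_EVENT_TYPES:
        stream["type"] = "metrics"
    else:
        stream["type"] = "logs"
    return routed


def target_index(event: dict) -> str:
    """Build an index name from the three stream.* fields (hypothetical scheme)."""
    stream = event["stream"]
    return f"{stream['type']}-{stream['dataset']}-{stream['namespace']}"


if __name__ == "__main__":
    raw = {
        "event_type": "stats",
        # The agent stamps the input-level values; "suricata.eve" is hypothetical.
        "stream": {"type": "logs", "dataset": "suricata.eve", "namespace": "prod"},
    }
    print(target_index(classify(raw)))  # metrics-suricata.eve-prod
```

The agent only needs to stamp the input-level values; the pipeline stays free to override `stream.type` per event, which is exactly the flexibility a mixed input like Suricata needs.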

The values for stream.namespace and stream.type are inherited from the datasource and the input definition respectively; stream.dataset comes from the stream itself.

See the following example of the generation rules:

```yaml
datasources:
  - id: nginx-x1
    namespace: prod
    inputs:
      - type: logs
        streams:
          - id: {id}
            enabled: true # optional, defaults to true
            dataset: nginx.access
            paths: /var/log/nginx/access.log
```

This generates an event with the following values:

```yaml
fields:
  type: logs
  namespace: prod
  dataset: nginx.access
```
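The generation rules can be sketched as a small Python function, assuming the configuration has already been parsed into plain dicts. `stream_fields` is a hypothetical helper, not agent code: it takes the namespace from the datasource, the type from the input and the dataset from the stream, skips disabled streams, and rejects a stream-level namespace, which matches the decision reached in this issue.

```python
from typing import Iterator


def stream_fields(datasource: dict) -> Iterator[dict]:
    """Yield the stream.* fields for every enabled stream of a datasource.

    Hypothetical helper: namespace comes from the datasource, type from the
    input and dataset from the stream itself.
    """
    namespace = datasource["namespace"]
    for inp in datasource.get("inputs", []):
        for stream in inp.get("streams", []):
            if not stream.get("enabled", True):  # streams are enabled by default
                continue
            if "namespace" in stream:
                # Namespace is not overridable at the stream level.
                raise ValueError(f"stream {stream.get('id')!r} may not set namespace")
            yield {
                "stream": {
                    "type": inp["type"],
                    "namespace": namespace,
                    "dataset": stream["dataset"],
                }
            }


if __name__ == "__main__":
    nginx = {
        "id": "nginx-x1",
        "namespace": "prod",
        "inputs": [
            {
                "type": "logs",
                "streams": [
                    {"dataset": "nginx.access", "paths": "/var/log/nginx/access.log"},
                ],
            }
        ],
    }
    print(list(stream_fields(nginx)))
    # [{'stream': {'type': 'logs', 'namespace': 'prod', 'dataset': 'nginx.access'}}]
```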


elasticmachine · 2020-02-25T15:10:36Z

Pinging @elastic/ingest (Project:fleet)

ph · 2020-02-25T15:13:14Z

@ruflin I see two issues here:

1. To add fields to the event, do we want to use the add_fields processor or the fields defined in the input? I suppose we want to formalize one way of doing it, and now may be a good time to do so.
2. Do we allow users to define the type and namespace at the stream level? If so, we need to change the index generation code.

ph · 2020-02-25T15:29:33Z

Maybe this could be done on the Fleet side? I worry about adding magic fields or processors, especially since they could affect other user-defined processors: ordering is important, and having values modified without your knowledge is not nice.

ruflin · 2020-02-27T08:10:03Z

I think fields should be deprecated in favor of add_fields. The problem with doing it on the Fleet side is that if someone sets up the agent manually, the fields will not be there. My thinking is that adding these fields should be done by each Beat rather than relying on the agent. This keeps the agent simple, and adding it to the existing modules should be pretty straightforward. This might also allow an easier migration path from Beats to the Agent in the future.

Namespace: let's not allow users to modify the namespace at the stream level, to keep things simple.

There is another reason why having these fields in each event is important. Let's assume we support a Logstash (LS) output in the future and all events are sent through LS. I expect LS to use these fields to decide where to send the data. (@jsvd FYI)

ph · 2020-02-27T13:17:15Z


So having that logic in the Beats would mean that we add type, namespace and dataset as configuration fields in the input, and the input automatically uses add_fields?

ruflin · 2020-02-27T13:39:10Z

Yes. Whether add_fields is used or the fields are hardcoded does not matter in the end.
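Whichever mechanism is used, the result is that every event published by the input carries the three fields. Below is a minimal Python sketch of the hardcoded variant; `InputConfig` and `publish` are hypothetical names, and the code does not model the real add_fields processor or its options.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class InputConfig:
    """Hypothetical input settings carrying the three new options."""
    type: str
    dataset: str
    namespace: str


def publish(cfg: InputConfig, events: Iterable[dict]) -> Iterator[dict]:
    """Attach stream.* fields to every event before it leaves the input."""
    for event in events:
        out = dict(event)
        # Merge into any existing "stream" object instead of replacing it.
        out["stream"] = {
            **event.get("stream", {}),
            "type": cfg.type,
            "dataset": cfg.dataset,
            "namespace": cfg.namespace,
        }
        yield out
```

Doing this inside the input rather than in the agent means a Beat configured by hand still emits the fields, which is the argument made above for keeping the agent simple.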

elasticmachine · 2020-03-27T12:08:23Z

Pinging @elastic/ingest-management (Team:ingest-management)

ph · 2020-03-31T13:46:51Z

@michalpristas @ruflin I think this has fallen through the cracks. Michal, can you take a look?

ph added the Project:fleet label Feb 25, 2020

ph mentioned this issue Feb 25, 2020

Example of a Suricata datasource configuration #16496

Merged

ruflin assigned ph Mar 5, 2020

ph removed their assignment Mar 10, 2020

ph changed the title ~~[Agent] Exposed required data to generate an index from an ingest pipeline.~~ [Agent] Exposed stream.type, stream.dataset and stream.namespace to every events. Mar 17, 2020

ph added the Team:ingest-management label Mar 27, 2020

ruflin mentioned this issue Mar 31, 2020

Use stream.dataset instead of event.dataset elastic/package-registry#315

Closed

ph assigned michalpristas Mar 31, 2020

michalpristas mentioned this issue Apr 3, 2020

[Agent] Expose stream.* data in every event #17468

Merged


michalpristas closed this as completed in #17468 Apr 14, 2020

michalpristas mentioned this issue Apr 15, 2020

Cherry-pick #17468 to 7.x: [Agent] Expose stream.* data in every event #17722

Merged

