[Agent] Exposed stream.type, stream.dataset and stream.namespace to every events. #16562

Closed
ph opened this issue Feb 25, 2020 · 8 comments · Fixed by #17468
@ph
Contributor

ph commented Feb 25, 2020

Some modules, like Suricata, define a single input that generates mixed events: metrics, logs, or alerts. The data is often identified in the ingest pipeline, where the type is derived from fields or values in the data.

The agent assumes that an input generates a single type of data. This is normally true, but there are exceptions such as Suricata. To allow maximum flexibility, we should add fields to the event so that the target index can be generated inside an ingest pipeline.

To do so we want to add the following fields to each event.

  • stream.dataset
  • stream.type
  • stream.namespace

The values for stream.type and stream.namespace are inherited from the datasource and the input definition.

See the following example for the generation rules:

datasources: 
   - id: nginx-x1 
     namespace: prod 
     inputs: 
       - type: logs 
         streams: 
           - id: {id} 
             enabled?: true # defaults to true
             dataset: nginx.access
             paths: /var/log/nginx/access.log 

Will generate an event with the following values:

fields:
 type: logs
 namespace: prod
 dataset: nginx.access
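
The generation rules above can be sketched roughly as follows. This is only an illustration, not the agent's actual code: the function name `stream_fields` and the nested `stream` key are assumptions, and the datasource is represented as a plain Python dict mirroring the YAML example.

```python
def stream_fields(datasource):
    """Return one fields dict per enabled stream of a datasource.

    Hypothetical helper: type comes from the input, namespace from the
    datasource, and dataset from the stream itself.
    """
    result = []
    for inp in datasource.get("inputs", []):
        for stream in inp.get("streams", []):
            # A stream is enabled unless it explicitly says otherwise.
            if not stream.get("enabled", True):
                continue
            result.append({
                "stream": {
                    "type": inp["type"],
                    "namespace": datasource["namespace"],
                    "dataset": stream["dataset"],
                }
            })
    return result


# Same shape as the YAML example above.
datasource = {
    "id": "nginx-x1",
    "namespace": "prod",
    "inputs": [
        {
            "type": "logs",
            "streams": [
                {"dataset": "nginx.access", "paths": "/var/log/nginx/access.log"},
            ],
        }
    ],
}

print(stream_fields(datasource))
```
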
@elasticmachine
Collaborator

Pinging @elastic/ingest (Project:fleet)

@ph
Contributor Author

ph commented Feb 25, 2020

@ruflin I see two issues here:

  1. For adding fields to the event, do we want to use the add_fields processor or the fields defined in the input? I suppose we want to formalize one way to do it, and maybe now is a good time to do so?

  2. Are we allowing a user to define the type and namespace at the stream level? If so, we need to change the index generation code.

@ph
Contributor Author

ph commented Feb 25, 2020

Maybe this could be done on the Fleet side? I worry about adding magic fields or processors, especially since they could impact other user-defined processors: ordering is important, and having values modified without your knowledge is not nice.
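
To make the ordering concern concrete, here is a minimal sketch that models processors as plain functions applied in sequence. Everything in it is hypothetical (`add_stream_fields`, `user_drop_stream`, and `run` are illustration names, not Beats APIs); it only shows that an implicitly added processor gives different results depending on where it lands relative to user-defined ones.

```python
def add_stream_fields(event):
    """Hypothetical 'magic' processor that injects the stream fields."""
    out = dict(event)
    out["stream"] = {"type": "logs", "namespace": "prod", "dataset": "nginx.access"}
    return out


def user_drop_stream(event):
    """Hypothetical user-defined processor that drops a 'stream' key."""
    return {k: v for k, v in event.items() if k != "stream"}


def run(event, processors):
    """Apply processors one after another, in list order."""
    for processor in processors:
        event = processor(event)
    return event


event = {"message": "GET / 200"}

# Injected first: the user processor silently removes the fields.
before_user = run(event, [add_stream_fields, user_drop_stream])

# Injected last: the fields survive.
after_user = run(event, [user_drop_stream, add_stream_fields])

print("stream" in before_user, "stream" in after_user)
```
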

@ruflin
Contributor

ruflin commented Feb 27, 2020

I think fields should be deprecated in favor of add_fields. The problem with doing it on the Fleet side is that if someone sets up the agent manually, the fields will not be there. My thinking here is that adding these fields should be done by each Beat and not rely on the agent. This keeps the agent simple, and adding it to the existing modules should be pretty straightforward. This might also allow an easier migration path from Beats to Agent in the future.

Namespace: Let's not allow the user to modify the namespace at the stream level, to keep things simple.

There is another reason why having these fields in each event is important. Let's assume we support LS output in the future and all events are sent through LS. I expect LS to use these fields to make the right decision on where to send the data. (@jsvd FYI)
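
As a rough illustration of that routing decision, the sketch below builds a destination name purely from the three fields carried by the event. The `type-dataset-namespace` pattern and the function name `target_index` are assumptions made for this example; the thread does not specify the naming scheme.

```python
def target_index(event):
    """Build a destination index name from the event's stream fields.

    Assumed pattern: <type>-<dataset>-<namespace>.
    """
    stream = event["stream"]
    return "{}-{}-{}".format(stream["type"], stream["dataset"], stream["namespace"])


event = {
    "message": "GET / 200",
    "stream": {"type": "logs", "dataset": "nginx.access", "namespace": "prod"},
}

print(target_index(event))
```

Because the decision depends only on the event itself, the same logic could run in an ingest pipeline or in any intermediate hop.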

@ph
Contributor Author

ph commented Feb 27, 2020

I think fields should be deprecated in favor of add_fields. The problem with doing it on the Fleet side is that if someone sets up the agent manually, the fields will not be there. My thinking here is that adding these fields should be done by each Beat and not rely on the agent. This keeps the agent simple, and adding it to the existing modules should be pretty straightforward. This might also allow an easier migration path from Beats to Agent in the future.

Having that logic in Beats would mean that we add type, namespace, and dataset as configuration fields in the input, and the input magically uses add_fields?

@ruflin
Contributor

ruflin commented Feb 27, 2020

Yes. Whether add_fields is used or the fields are hardcoded does not matter in the end.

@ruflin ruflin assigned ph Mar 5, 2020
@ph ph removed their assignment Mar 10, 2020
@ph ph changed the title from "[Agent] Exposed required data to generate an index from an ingest pipeline." to "[Agent] Exposed stream.type, stream.dataset and stream.namespace to every events." Mar 17, 2020
@elasticmachine
Collaborator

Pinging @elastic/ingest-management (Team:ingest-management)

@ph
Contributor Author

ph commented Mar 31, 2020

@michalpristas @ruflin I think this has fallen through the cracks. Michal, can you take a look?
