Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fields to integrations by default #949

Closed
ishleenk17 opened this issue Aug 25, 2022 · 44 comments
Closed

Add fields to integrations by default #949

ishleenk17 opened this issue Aug 25, 2022 · 44 comments
Labels
discuss Team:Fleet Label for the Fleet team

Comments

@ishleenk17
Copy link

As part of elastic package, it would be good to have the below fields in the default template of elastic package create, so that every integration inherits these.

  1. Tags
  2. Processors
  3. ignore-older

These fields can be optional for the users.

The above would provide us unanimity across integrations and more flexibility to users.

cc: @ruflin

@jlind23 jlind23 added the Team:Ecosystem Label for the Packages Ecosystem team label Aug 25, 2022
@ishleenk17 ishleenk17 assigned ishleenk17 and unassigned ishleenk17 Aug 25, 2022
@mtojek
Copy link
Contributor

mtojek commented Aug 25, 2022

pinging @elastic/ecosystem and @mukeshelastic for prioritization

@ruflin
Copy link
Contributor

ruflin commented Aug 25, 2022

One of the options is to add these to each package. The other one is to have it as a feature of Fleet like data_stream.dataset value.

I think ignore_older falls under its own category as it only applies to the logs input.

@jsoriano
Copy link
Member

jsoriano commented Aug 25, 2022

I agree that they would offer a lot of flexibility to users, but I think that there are already other efforts intended to cover these areas.

For example custom processing is being added to Fleet. Having it in Fleet makes it available for all packages without adding anything. Any improvement in this area will benefit all packages, including existing ones.
When adding custom processing you will end up needing custom mappings or other features, that is better to solve in a single place.
Adding processors in packages would offer another way of doing custom processing, but limited, and not aligned with other efforts. Also, this processing would happen on the agent side, when the current tendency is to bring processing to the ingest nodes.

Tags would be covered by any solution intended for custom processing. If you can do custom processing, you can add tags.
If we want to have specific settings to add tags instead of using generic custom processing, I think that this should also be implemented in Fleet.

Regarding ignore_older. This is a specific setting of an specific input. There are many other settings and many other inputs, it is not possible to expose them all in all packages. For this a better solution would be input packages, another effort already in progress, that will allow to expose input-specific settings, and make them available for its use in combination with other packages.

@ruflin
Copy link
Contributor

ruflin commented Aug 26, 2022

We need to split up a few things here:

  • global processing: should be done in ingest pipelines
  • local processing where local knowledge is needed: things like looking up host info, dropping filtering events

I don't think the custom processing covers the local case. For tags, it is also a bit different. Assuming we have multiple policies shipping system metrics. All metrics go to the same data stream, but for the "east coast" policy a tag should be added to each event and one for the "west coast" policy. This can't be done globally AFAIK.

@jsoriano
Copy link
Member

Good points on global vs. local processing. I still wonder if, given that we want this in all packages (or all packages of certain kinds), the config for local processing should be also managed by Fleet. This would allow to change the implementation without needing to change all packages.
Imagine for example looking up host info (or any other local info), it could be based now on local processors, but there could be also an implementation based on the enrichment processor that could allow to centralize the collection of this data in one agent, instead of needing implementations and error-prone yaml configurations on each collector of this agent. Something like this has been discussed in some other places.

It is true that we are probably far from having anything based on enrichment processor, but still, I think it'd be preferred if things wanted the same in all packages of a kind are managed by Fleet. For consistency and to reduce the number of things package developers need to take into account.

For tags, it is also a bit different. Assuming we have multiple policies shipping system metrics. All metrics go to the same data stream, but for the "east coast" policy a tag should be added to each event and one for the "west coast" policy. This can't be done globally AFAIK.

Another good point, but this sounds to me more like a limitation of the current approach for custom processing. As a user I would expect to be able to add different custom processing per policy, the data stream is more an implementation detail.

My main point is that requiring to add something to all packages to have some desired functionality sounds like a source of problems: there can be inconsistencies between packages and it increases complexity (and error-proneness) of package development.

@ishleenk17
Copy link
Author

Processors:
I think from the above discussions we agree that local processing is required as well.
The next point for discussion being if we have to do it at package or fleet level.
To avoid inconsistencies across packages, I think it would be a good idea to have these implemented at Fleet level along with custom processing.

Tags:
Is there an agreement that we would be having custom processing at policy level ? That would help us avoid usage of tags.

ignore_older:
@jsoriano : You mentioned that input packages would take care of these. Would this be a default setting available in the appropriate input package or a customer using these input packages for their service would have to add these in config ?

Pinging @elastic/obs-service-integrations

@jsoriano
Copy link
Member

@jsoriano : You mentioned that input packages would take care of these. Would this be a default setting available in the appropriate input package or a customer using these input packages for their service would have to add these in config ?

The log input package would have this option, yes. Ideally these inputs could use the pipelines of other packages, providing a similar functionality to the definition of custom inputs in Filebeat modules configuration.
The UX for this is still under discussion: elastic/kibana#133296.

@ishleenk17
Copy link
Author

@jsoriano : Is there any ticket for custom processing at policy level ?
@ruflin Also, I suppose we should proceed with having processors implementation at fleet level. We can check further on this with Fleet team.

@ruflin
Copy link
Contributor

ruflin commented Aug 29, 2022

The log input package would have this option, yes.

This triggered a more fundamental idea on my end. What if Fleet becomes knowledge about certain "core" inputs like logfile? Instead of each package having to figure out what the necessary fields are for using the logs input instead Fleet would have some basic knowledge. Or a package would inherit it from the log input package? This could be an extension of the input package development elastic/kibana#133296 @kpollich It makes it even more clear that any non input package is basically an extension of an input package (or multiple of them).

@jsoriano
Copy link
Member

Is there any ticket for custom processing at policy level ?

Not that I know, I'd say this is a new idea that has appeared here. Pinging @kpollich for his opinion here.

What if Fleet becomes knowledge about certain "core" inputs like logfile? Instead of each package having to figure out what the necessary fields are for using the logs input instead Fleet would have some basic knowledge. Or a package would inherit it from the log input package?

Yes, I think we should go in the direction of including this information in the specific input packages and not on every package. Fleet would also get the knowledge about these options from the input packages, this knowledge shouldn't be hardcoded in Fleet.

@ishleenk17
Copy link
Author

ishleenk17 commented Aug 30, 2022

Also, I suppose we should proceed with having processors implementation at fleet level. We can check further on this with Fleet team.

@kpollich: Could you please let us know about any dependencies if these processors are required to be implemented at Fleet Level.

@kpollich
Copy link
Member

kpollich commented Aug 30, 2022

Is there any ticket for custom processing at policy level ?

I believe this is the issue we're looking for here: elastic/kibana#122892

Could you please let us know about any dependencies if these processors are required to be implemented at Fleet Level.

I don't think we have any real blockers on our end, as we already support custom processors for the Custom Logs integration. We'd just need to do some definition/architecture around how we'll expand that support to all integrations.

What if Fleet becomes knowledge about certain "core" inputs like logfile? Instead of each package having to figure out what the necessary fields are for using the logs input instead Fleet would have some basic knowledge. Or a package would inherit it from the log input package?

Yes, I think we should go in the direction of including this information in the specific input packages and not on every package. Fleet would also get the knowledge about these options from the input packages, this knowledge shouldn't be hardcoded in Fleet.

I like these thoughts, but I wonder about the feasibility of "cross package lookups". e.g. I have the nginx package which "extends" the logs input package. The logs input package defines support for custom processors, ignore_older, etc. Does Fleet need to go fetch that input package (or otherwise have some internal reference to it) in order to resolve those fields/options? Right now a single package is its own source of truth. If we have this notion of extending "base packages" we break that assumption which feels incorrect to me.

If this isn't the case, and a single package manifest still contains all the package data Fleet needs to operate, then the "base package" notion is not a Fleet concern and instead should be handled by the elastic-package CLI when bootstrapping a package. e.g. we can create an nginx package that extends the logs base package and is pre-populated with the extended fields.

@ruflin
Copy link
Contributor

ruflin commented Aug 31, 2022

I like these thoughts, but I wonder about the feasibility of "cross package lookups". e.g. I have the nginx package which "extends" the logs input package. The logs input package defines support for custom processors, ignore_older, etc. Does Fleet need to go fetch that input package (or otherwise have some internal reference to it) in order to resolve those fields/options? Right now a single package is its own source of truth. If we have this notion of extending "base packages" we break that assumption which feels incorrect to me.

If this isn't the case, and a single package manifest still contains all the package data Fleet needs to operate, then the "base package" notion is not a Fleet concern and instead should be handled by the elastic-package CLI when bootstrapping a package. e.g. we can create an nginx package that extends the logs base package and is pre-populated with the extended fields.

All very good thoughts. To not derail the initial conversation of the Github issue, should we take this to a new issue?

@ishleenk17
Copy link
Author

I don't think we have any real blockers on our end, as we already support custom processors for the Custom Logs integration. We'd just need to do some definition/architecture around how we'll expand that support to all integrations.

@kpollich I have raised a Github ticket for the same. elastic/kibana#139900.
Please let me know if this can be picked as part of next iteration. There have been multiple SDH requests for the same.

I believe this is the issue we're looking for here: elastic/kibana#122892

Also, regarding elastic/kibana#122892, for custom processing at policy level.
Could you please share the completion schedule for the same.

@kpollich
Copy link
Member

kpollich commented Sep 1, 2022

cc @jen-huang @nimarezainia - See above for requested work in 8.6

@ishleenk17 ishleenk17 changed the title Add fields to default elastic package template Add fields to integrtions by default Sep 5, 2022
@ishleenk17 ishleenk17 self-assigned this Sep 5, 2022
@ishleenk17 ishleenk17 changed the title Add fields to integrtions by default Add fields to integrtaions by default Sep 5, 2022
@ishleenk17
Copy link
Author

If we implement processors in Fleet UI, what would be the behaviour of the integrations that already have a processor yaml box and are brought up on the new UI.
@ruflin @kpollich : Thoughts ?

@ishleenk17 ishleenk17 changed the title Add fields to integrtaions by default Add fields to integrations by default Sep 12, 2022
@ishleenk17
Copy link
Author

Gisting out AI's here:

  1. Processors: This is which we expect to be added through Fleet:
    Support of processors across integrations through Fleet kibana#139900.

cc @jen-huang @nimarezainia - See above for requested work in 8.6

@jen-huang @nimarezainia : Could you please incorporate this in 8.6 planning. It would be a great addition for integrations from Fleet.

  1. Tags: If the processing is at policy level, having tags can be avoided.
    @kpollich, @jen-huang @nimarezainia : [Fleet] Add input settings at the agent policy level kibana#122892 Is this planned for 8.6 ?

  2. ignore_older: For now if there is an urgent ask for this, we will add it to specific integration. Once input package are implemented, we expect these to be part of the input

@ruflin
Copy link
Contributor

ruflin commented Sep 12, 2022

If we implement processors in Fleet UI, what would be the behaviour of the integrations that already have a processor yaml box and are brought up on the new UI.

I expect that we have quite a few scenarios in the future where certain configs become obsolete because these will be removed eventually or were taken over by another config. In an ideal scenario we can just migrate users but I guess this is not the case here. One idea would be to mark this fields as deprecated. For any user where the field already contains content, it still shows up and these users have to migrate over to the new feature manually. For any other user, it does not show up.

@ishleenk17
Copy link
Author

As per Kyles comment here, looks like having a generic raw yml box is not being looked at for near future.
In that case all of these processors, tags, condition addition has to be taken up individually.
@ruflin @gizas : Please share your thoughts here

@gizas
Copy link
Contributor

gizas commented Sep 27, 2022

Added a follow up elastic/kibana#139900 (comment) .
I would like to clarify how soonish can any kind of such changes start going into roadmap.

Additionally processors are a little different than conditions, because can live for start only in the top level (if that eases the implementation) and do not interfere with any existing data stream field. So the text box for conditions can start and independently continue the discussions for any new cases like processors that can be more complex

FYI exposing conditions is a critical thing for cloudnative team if we want to support auto-discovery feature in managed agent based on them

@ChrsMark
Copy link
Member

@gizas @ishleenk17 @kpollich let's use elastic/kibana#108525 for the conditions specific topic since it's kind of different one.

@ishleenk17
Copy link
Author

If we are not taking the generic approach of raw yaml, it would be good to carry on discussions for processors/tags here and conditions in the other thread.

@nimarezainia
Copy link

@ishleenk17 Consistency of input processors across all integrations is a goal but we won't be able to address this in the 8.6 timeframe.

What we have done (based on customer requirement) has been to ensure that some of the top integrations already have implemented tags&processors for their datastreams. is there a specific customer request we need to consider? if not we should punt this to a later release and ensure the technical design is appropriate.

We also do plan to address the Global processors on a per policy basis via: https://github.com/elastic/ingest-dev/issues/2442

@ishleenk17
Copy link
Author

@nimarezainia , there are a couple of reasons why we think having this consistency of processors/tags is important:

  • There have been couple of SDH's/customer asks for this enhancement.
  • For all the new integrations we are building this has been an ask in PRD's
  • Solution architects have reached out stating importance of having these while being on call with the customer

if not we should punt this to a later release and ensure the technical design is appropriate.

I agree, the end to end technical design and challenges should be thought through. We will plan to close that in this iteration, post that I suppose we should prioritize this activity.

cc: @rameshelastic @lalit-satapathy

@nimarezainia
Copy link

@ishleenk17 I agree that we need this type of consistency across all integrations and the proposal to do it via fleet seems to be lowest effort approach. Regarding customer requests though, we have addressed those for the most part by modifying the individual packages - so I think there shoudl be less urgency in that sense.

Based on the lack of urgency on this I don' want to disrupt the work/planning that has already started for 8.6 and we will try and pursiue the correct implementation in 8.7 (once prioritized amongst other asks there)

@ishleenk17
Copy link
Author

ishleenk17 commented Nov 8, 2022

we will try and pursiue the correct implementation in 8.7 (once prioritized amongst other asks there)

Hello @nimarezainia , wanted to check if we will be prioritizing this for 8.7 ?
cc: @rameshelastic @lalit-satapathy

@ishleenk17
Copy link
Author

@nimarezainia : There have been a couple of requests for processors in integrations yet again.
This time the ask came in for processors in Redis.
Can we look into this issue and take it up for 8.9 by fixing it through Fleet? It will help us cater to such requests in general across integrations.
cc: @rameshelastic @lalit-satapathy @ruflin @mukeshelastic

@nimarezainia
Copy link

@ishleenk17 our 8.9 is over subscribed. I have added this as a lower priority item for 8.9 but we may have to consider 8.10 or later.

(fyi @kpollich )

@jlind23
Copy link
Collaborator

jlind23 commented May 12, 2023

@andresrc for this is has been push out to alter sprint due to competing priorities.
If this is a high priority on your end feel free to have someone from your team pushing a PR and I will assign someone to review it.

@andresrc andresrc added Team:Fleet Label for the Fleet team and removed Team:Ecosystem Label for the Packages Ecosystem team labels May 12, 2023
@elasticmachine
Copy link
Collaborator

Pinging @elastic/fleet (Team:Fleet)

@nimarezainia
Copy link

The UX proposal for this, as was developed for some of the most utilized integrations, is here: https://docs.google.com/document/d/18x1q_jog3_xgCdsU20wVZcyZz1_ZiI4pXGE5ZMma6Ho/edit?usp=sharing

I'm moving this out of PM definition into tech.

@kpollich
Copy link
Member

kpollich commented Nov 1, 2023

I've created elastic/kibana#170336 to capture the plan for adding default field appending logic for all data streams to Fleet

@kpollich
Copy link
Member

Closing in favor of elastic/kibana#170336

@kpollich kpollich closed this as not planned Won't fix, can't repro, duplicate, stale Jul 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Team:Fleet Label for the Fleet team
Projects
None yet
Development

No branches or pull requests