
Support of processors across integrations through Fleet #139900

Open
ishleenk17 opened this issue Sep 1, 2022 · 8 comments
Assignees
Labels
Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@ishleenk17

Describe the feature:

Processors to be added by default to all integrations through Fleet.

Describe a specific use case for the feature:

Having processors through Fleet helps in the following ways:

  1. Customers get the flexibility to do data enrichment themselves for different integrations/data streams through the UI.
  2. It maintains uniformity across integrations.
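For illustration, processors entered through such a Fleet UI box would end up in the rendered agent policy for a data stream roughly like this (a sketch only; the exact policy shape depends on the integration, and the `add_fields`/`drop_event` values here are illustrative):

```yaml
# Hypothetical rendered agent policy fragment: processors supplied via
# a Fleet UI box are attached to the data stream's input configuration.
inputs:
  - type: logfile
    data_stream:
      dataset: system.auth
    processors:
      - add_fields:
          target: project
          fields:
            name: myproject
      - drop_event:
          when:
            regexp:
              message: "^DBG"
```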
@ishleenk17 ishleenk17 added the Team:Fleet Team label for Observability Data Collection Fleet team label Sep 1, 2022
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@kpollich
Member

Hi @ishleenk17 - I just want to clarify the expectations for the intended work on Fleet's side with this issue.

Are we adding a processors field of type YML to every integration on Fleet's side of things, even if an integration does not define this field in its package manifest?

Can you just clarify my understanding above to make sure we have the right idea about the requirements here? Thank you!

@kpollich kpollich assigned nimarezainia and unassigned kpollich Sep 12, 2022
@kpollich
Member

@nimarezainia Assigning this to you since you're helping with definition and priority here on related issues.

@ishleenk17
Author

Yes @kpollich, the expectation is to have a processors-type box for every integration.

This leads to a further set of questions:

  1. How will Fleet manage integrations where a processors box already exists? Is there a way to overwrite the existing box with the one coming by default through Fleet?

  2. Also, was there any thought of having a generic type of box (one that can handle formats like YAML, text, etc.), rather than separate boxes for processors, tags, namespace, and so on? I'd like to know whether this has been considered before, although it would give our customers a great deal of flexibility, which might lead to breaking our integrations.

@kpollich
Member

I've read through these issues and the threads of comments therein.

As I understand it, the core ask of the Fleet team is that we have an input for processors available on each and every Elastic Agent integration, not just the subset of integrations that include a variable definition for processors. E.g., the system integration defines processors for its logs-system.auth data stream here: https://github.com/elastic/integrations/blob/main/packages/system/data_stream/auth/manifest.yml#L40-L47. We'd want every data stream to have this processors input regardless of whether it's defined in the package spec or not.
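For reference, the linked manifest declares processors as a YAML-typed data stream variable; a sketch of that shape (field names follow the package spec, the title/description text here is illustrative):

```yaml
# Sketch of a per-data-stream variable definition in manifest.yml,
# modeled on the linked system/auth example (values illustrative).
- name: processors
  type: yaml
  title: Processors
  multi: false
  required: false
  show_user: false
  description: >-
    Processors are used to reduce the number of exported fields
    or to enhance the event with additional metadata.
```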

[Screenshot: the processors YAML input as it appears in the policy editor]

If I'm correctly understanding the intent here, then I'd answer this question:

"It would be good to know if architecturally what we are advising is possible in Fleet"

with yes.

It would be technically possible to add functionality in Fleet such that the processors input appears for every data stream. I don't know whether that's the best way to do things, or whether we could simply add tooling for Elastic Agent integration maintainers to add global variables like this to all their data streams in each manifest.yml file.

If we could allow integration maintainers to "bulk edit" integrations to add a processors variable to every data stream, we would not need to commit any Fleet resources to supporting code changes here: everything would be captured in the integration source code.

On to the other questions here

How will Fleet manage integrations where a processors box already exists? Is there a way to overwrite the existing box with the one coming by default through Fleet?

We'd probably want to deprecate the processors variable in the package spec and make Fleet's definition for this variable the source of truth. Any package-level definition for a data stream variable named processors would be ignored in favor of the one defined by Fleet.

If an older version of Kibana installed a newer version of a package from which processors had been removed, the user would no longer be able to configure the processors variable. This is something for integration maintainers to be mindful of with version constraints on their package releases.
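Such a constraint is expressed in the package's top-level manifest via `conditions`; a minimal sketch (the version number below is hypothetical):

```yaml
# Sketch: a package release that drops its own processors variable
# could guard against older Kibana versions with a version condition.
conditions:
  kibana.version: "^8.6.0"
```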

Also, was there any thought of having a generic type of box (one that can handle formats like YAML, text, etc.), rather than separate boxes for processors, tags, namespace, and so on? I'd like to know whether this has been considered before, although it would give our customers a great deal of flexibility, which might lead to breaking our integrations.

Directly managing policies via YML is an intriguing prospect, but one that I think would be a far-out future vision. I think it'd be interesting to rethink the policy editor experience entirely with a "raw YML" editor baked in, but bolting one on to the existing policy editor seems risky and error-prone for the time being.

@gizas
Contributor

gizas commented Sep 27, 2022

Any package-level definition for a data stream variable named processors would be ignored in favor of the one defined by Fleet.

Hello @kpollich, rather than ignoring the data_stream level in favor of the top level, why don't you consider merging both into a single list? This would probably keep consistency with old versions, and we also wouldn't change the current implementation at the data stream level. What do you think?
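The proposed merge semantics could be sketched like this (purely an illustration of the suggestion, not current Fleet behavior; processor names are examples):

```yaml
# Sketch of the proposed merge (not implemented):
# package-defined and Fleet-defined processors are concatenated
# into one list rather than one overriding the other.

# Defined in the package's data stream manifest:
processors:
  - add_host_metadata: ~

# Added through the Fleet-wide processors box:
processors:
  - drop_fields:
      fields: ["agent.ephemeral_id"]

# Resulting merged list applied to the data stream:
processors:
  - add_host_metadata: ~
  - drop_fields:
      fields: ["agent.ephemeral_id"]
```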

but bolting one on to the existing policy editor seems risky and error-prone for the time being.

This is the constant balance between UX and feature enhancement, I guess. As long as we keep giving customers more freedom, it is indeed more likely that configs will break, but at some point we need to narrow the gap between standalone and managed, don't we? So I think it is better to train our customers to use the new UI.
The entire "raw YML" approach is, I'd guess, at least as error-prone as a text-box addition, and should be a user's last resort for making things happen.

So, from the technical side, do you see any big problem with adding more features inside the UI?

@kpollich
Member

Hello @kpollich, rather than ignoring the data_stream level in favor of the top level, why don't you consider merging both into a single list? This would probably keep consistency with old versions, and we also wouldn't change the current implementation at the data stream level. What do you think?

I'm a little confused by this. I didn't realize we were talking about global integration-level processors in addition to the already-supported data-stream-level processors. This adds significant implementation work around precedence, overrides, etc.

I was under the impression that processors would continue to exist at the data-stream level, just that the definition for that variable would move to Fleet as its source of truth. Thus, if a package contains a processors variable for a given data stream, that definition would be ignored in favor of Fleet's "apply to all data streams" definition for processors.

This is the constant balance between UX and feature enhancement, I guess. As long as we keep giving customers more freedom, it is indeed more likely that configs will break, but at some point we need to narrow the gap between standalone and managed, don't we? So I think it is better to train our customers to use the new UI.
The entire "raw YML" approach is, I'd guess, at least as error-prone as a text-box addition, and should be a user's last resort for making things happen.

So from technical side do you see any big problem to add more features inside UI?

Fleet's policy editor is absolutely massive. I recorded myself just scrolling through the Kubernetes integration and clicking around to demonstrate just how much stuff lives in this UI today:

Screen.Recording.2022-09-27.at.9.10.48.AM.mov

We've been piling features onto this UI for a long time now, and it's becoming quite difficult to maintain. We're also not providing a very good UX with this interface in general. Adding raw YML editing for every field, or something like that, is an intriguing idea, but it would be virtually impossible to take on without rethinking this UI from the ground up. I'd like to be very diligent about what we add to this UI and how we ease the maintainability burdens it presents.

I'd love to chat more about this in the future, but I think this is quite out of scope for this current issue.

@nimarezainia
Contributor

nimarezainia commented Jan 17, 2023

@kpollich I'd like to revive this conversation, which we briefly touched on in the past.

For the sake of consistency among all the integrations, adding the processors input for each data stream would, in my opinion, be beneficial if it can be done via Fleet.

Requirements:

  1. Add a processors box for each data stream in each integration.
  2. Address the UI sprawl by including a "+ processor" control that the user can click to add their processor to a specific data stream. This could live in the advanced section of any integration's settings.
  3. A handful of integrations already have this processors block; to avoid a breaking change, we need to consider migrating processors that are already configured.

I don't want to offer a processor at a global integration level, as you rightly point out it could be very problematic with regard to order of precedence. We have a project to add processors globally to the policy, and that addresses the main use case.
