Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fleet] Add default variables to all integration data streams #170336

Open
kpollich opened this issue Nov 1, 2023 · 8 comments
Open

[Fleet] Add default variables to all integration data streams #170336

kpollich opened this issue Nov 1, 2023 · 8 comments
Labels
QA:Needs Validation Issue needs to be validated by QA Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@kpollich
Copy link
Member

kpollich commented Nov 1, 2023

Ref elastic/elastic-package#949

There are several common variables that are defined across many integrations that have broad use cases and generic appeal to users, e.g.

  • tags - string[]
  • processors - string (yml box)
  • ignore_older - string
  • preserve_original_event - boolean

These variables are almost always copy/pasted between various data streams that want to support them. This is a manual process, and incurs a maintenance burden for integrations that want to adopt these universally applicable variables. If we wanted to adopt these across all integrations today, we'd also need to publish a new version of every single integrations or accept a slow pace of adoption - neither of which is particularly appealing.

Instead, we should add logic to Fleet to append these fields to all data streams across all integrations.

When a user opens policy editor to create/edit a package policy, Fleet will check a hardcoded list of global variable definitions and append those variables to the policy editor section for each data stream.

The nginx integration's access dataset is a good example of what the "end state" would look like for all data streams:

image

The highlighted section of "advanced fields" are defined by the nginx integration, but with this change in Fleet they'd appear in the same spot for all data streams across all integrations.

There's a few open questions and caveats to consider, e.g:

  • What should we do when an integration already defines one of these fields?
    • Defer to the integration manifest
  • What does an upgrade look like between package versions where these fields have moved from the integration source to Fleet?
  • What is the long term migration path? Should this logic always live in Fleet, or should this eventually be handled by "integration templates" or a similar concept?

To start, let's limit the approach here to only the processors variable. This will let us validate the approach and iron out any of the edge cases around upgrades, deferring to integration manifests, etc.

Proposed configuration schema

Each variable definition should conform to an object that extends the package spec's variable definition. However, we should enforce a few caveats for the v1 of this implementation in the interest of avoiding complexity:

  • Default variables are always placed in the Show advanced collapsible region
  • Default variables are never marked as required
interface DefaultIntegrationVariable {
  name: string;
  type: string;
  title: string;
  description: string;
  multi: boolean;
  default: any; // can probably do some TypeScript wizardry to make this typesafe
  data_stream_types: Array<"logs" | "metrics">
}
@kpollich kpollich added the Team:Fleet Team label for Observability Data Collection Fleet team label Nov 1, 2023
@kpollich kpollich self-assigned this Nov 1, 2023
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

@ishleenk17
Copy link

The highlighted section of "advanced fields" are defined by the nginx integration, but with this change in Fleet they'd appear in the same spot for all data streams across all integrations.

@kpollich: Not all of these fields would be valid for all datastreams. For Eg: ignore_older,tags is true only for logs datastream.

@kpollich
Copy link
Member Author

kpollich commented Nov 2, 2023

Thanks, @ishleenk17

Not all of these fields would be valid for all datastreams. For Eg: ignore_older,tags is true only for logs datastream.

We could define a baseline compatibility condition in Fleet by adding a condition to each variable definition like data_stream_types: [logs, metrics].

@ishleenk17
Copy link

@kpollich : On the same lines as ignore_older, there is another variable "Preserve Original Events" which is meant to be present for all the logs datastream by default.

cc: @lalit-satapathy

@kpollich
Copy link
Member Author

kpollich commented Nov 8, 2023

Thanks @ishleenk17, I've updated the description to capture that variable.

@kpollich
Copy link
Member Author

@nimarezainia - Can you confirm that we're okay to start here with just the processors variable to validate the implementation? I think we had talked about reducing our scope here to just processors for now but would like to confirm. Thanks.

@nimarezainia
Copy link
Contributor

@kpollich yes we should go with processors first. add_fields or add_tags can be used to achieve the tagging requirement if needed.

@nimarezainia
Copy link
Contributor

@kpollich could we revisit this? Does it need to be in sprint 41? ranking is 20 and also I wonder if we really want to do this now given OTel. I also haven't seen any customer impacted as yet. Not discounting the need for the topic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
QA:Needs Validation Issue needs to be validated by QA Team:Fleet Team label for Observability Data Collection Fleet team
Projects
None yet
Development

No branches or pull requests

4 participants