
[Logs+] Assign a default @timestamp if missing #95551

Closed
felixbarny opened this issue Apr 25, 2023 · 12 comments · Fixed by #95971
Labels
:Data Management/Data streams Data streams and their lifecycles Team:Data Management Meta label for data/management team

Comments

@felixbarny
Member

Multiple ways to do that were investigated in the past. The most generic is probably to install an ingest pipeline for logs-*-* data streams during setup. This is currently not possible in Elasticsearch, so this task should begin with investigation of the feasibility of this option. If we can add this capability natively to Elasticsearch, we will be able to benefit from it for other things we plan to do through ingest pipelines later on, like automatic detection of JSON inputs and rerouting (to enable automatic dataset separation, for example based on automatically discovered services).
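For context, the kind of pipeline under discussion could be sketched with Elasticsearch's ingest API as follows (the pipeline name here is illustrative, not necessarily what would ship): a `set` processor with `override: false` assigns the ingest timestamp to `@timestamp` only when the field is missing or null.

```
PUT _ingest/pipeline/logs-default-timestamp
{
  "processors": [
    {
      "set": {
        "field": "@timestamp",
        "value": "{{{_ingest.timestamp}}}",
        "override": false
      }
    }
  ]
}
```

The open question in this issue is not the pipeline itself but how to install such a pipeline automatically for `logs-*-*` data streams during setup.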

@felixbarny felixbarny added the :Data Management/Data streams Data streams and their lifecycles label Apr 25, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Apr 25, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@dakrone
Member

dakrone commented Apr 25, 2023

The most generic is probably to install an ingest pipeline for logs-*-* data streams during setup. This is currently not possible in Elasticsearch, so this task should begin with investigation of the feasibility of this option.

For anyone doing the implementation, please add this functionality to StackTemplateRegistry (IndexTemplateRegistry is the abstract class where it can be added).

@felixbarny
Member Author

Yep, that was the plan 👍

@eyalkoren eyalkoren changed the title Assign a default @timestamp if missing [Logs+] Assign a default @timestamp if missing Apr 30, 2023
@eyalkoren
Contributor

I started implementing this, and while searching for a way to represent and load pipelines as JSON files I bumped into #95198, which seems to be pretty much identical to what I started doing, only done as part of Enterprise Search.
Should we:

  • add an additional, generic, capability to define auto-installable pipelines, to which Enterprise Search may or may not choose to switch later?
  • move this one to be the generic one?

@eyalkoren
Contributor

eyalkoren commented May 2, 2023

@jbaiera @dakrone when adding composable templates, we first verify that all required component templates exist as well.
Now we are about to allow setting index.default_pipeline and index.final_pipeline within index settings.

  • Should we do a similar verification for configured pipeline existence before requesting to add a component?
  • Should we fail a component template addition if it declares a pipeline that is not yet installed?
  • Any additional verifications you can think of that we need to apply?

@eyalkoren
Contributor

One more question in this regard: since all components are added through asynchronous client calls, what prevents a race condition between their additions? For example, if a composable template C is composed of templates A and B, and all of those are added during the same ClusterChangedEvent handling, what guarantees the proper order of actual addition?

@dakrone
Member

dakrone commented May 2, 2023

Should we:

  • add an additional, generic, capability to define auto-installable pipelines, to which Enterprise Search may or may not choose to switch later?
  • move this one to be the generic one?

Either way, the one that was added is in the ent-search module, so it can't be used by other modules. I haven't actually looked at the one the enterprise search team added, so I think it's reasonable to move it to core as a separate PR so it can be evaluated and reviewed.

Now we are about to allow setting index.default_pipeline and index.final_pipeline within index settings.

  • Should we do a similar verification for configured pipeline existence before requesting to add a component?

I'm generally +1 on verifying, so I think it'd be great if we could verify that the pipeline exists from within the registry when the default_pipeline or final_pipeline settings are present in an automatically-installed template. I'm less sure about doing it from the actual template API itself. For example, we don't do this when an index is created with those settings, so doing it from the template feels like over-validating.

  • Should we fail a component template addition if it declares a pipeline that is not yet installed?

I think this would be a breaking change if we did this at the template level, and since we don't do it for index creation, I don't think we should.

  • Any additional verifications you can think of that we need to apply?

The registry-based validation I think we should investigate (to see how hard it is at the least).

One more question in this regard- since all components are added through asynchronous client calls, what prevents a race condition between their addition? For example, if a composable template C is composed of templates A and B, and all of those are added at the same ClusterChangedEvent handling, what guarantees proper order of actual addition?

Putting the composable template performs a check that all the required component templates already exist, so it won't add the composable template until it passes that validation.

@eyalkoren
Contributor

Thanks for the answers!

I'm generally +1 on verifying, so I think it'd be great if we could verify that the pipeline exists from within the registry when the default_pipeline or final_pipeline settings are present in an automatically-installed template. I'm less sure about doing it from the actual template API itself. For example, we don't do this when an index is created with those settings, so doing it from the template feels like over-validating.

I think we mean the same thing, but just to be sure: my intention is to add a validation within the registry exactly like the one you referenced, and I take it you are +1 on that.

I think this would be a breaking change if we did this at the template level, and since we don't do it for index creation, I don't think we should.

Right, I meant in the registry, where it can't be a breaking change since this behavior isn't supported until we add it now.

Putting the composable template performs a check that all the required component templates already exist, so it won't add the composable template until it passes that validation.

That is exactly what I meant to ask: if a composable template C is composed of templates A and B, and all of those are added during the same ClusterChangedEvent handling, what guarantees the proper order of actual addition? Are you saying that nothing guarantees it, and the addition of C may fail and only succeed during the handling of the next ClusterChangedEvent?

@eyalkoren
Contributor

eyalkoren commented May 3, 2023

Any additional verifications you can think of that we need to apply?

The registry-based validation I think we should investigate (to see how hard it is at the least).

For example, I think we may consider upgrading this verification to observe the versions of the installed components. So that if a composable template C is composed of A and B and the required version of C is 3, then A and B need not only to be installed, but also have the right version. Does this make sense?
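The version-aware check proposed here could be sketched as follows. This is an illustrative Python sketch, not Elasticsearch code, and the function name is hypothetical: before installing composable template C at version v, require that every component template it depends on is already installed at version v or higher (allowing higher versions, since manually installing a newer template is a supported escape hatch, as noted below).

```python
# Hypothetical sketch of the proposed version-aware dependency check.
# Names are illustrative; this is not an Elasticsearch API.

def dependencies_up_to_date(required_version, dependencies, installed_versions):
    """True only if every dependency is installed at required_version or higher."""
    return all(
        installed_versions.get(dep, 0) >= required_version
        for dep in dependencies
    )

installed = {"A": 3, "B": 2}
blocked = dependencies_up_to_date(3, ["A", "B"], installed)  # False: B is on version 2
installed["B"] = 3
allowed = dependencies_up_to_date(3, ["A", "B"], installed)  # True: A and B are on version 3
```

Under this sketch, C[3] would not be installed while B is still on version 2, avoiding the transient mismatch discussed further down in the thread.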

@dakrone
Member

dakrone commented May 3, 2023

if a composable template C is composed of templates A and B, and all of those are added at the same ClusterChangedEvent handling, what guarantees proper order of actual addition? Are you saying that nothing guarantees that and the addition of C may fail and only succeed at the handling of the next ClusterChangedEvent?

They aren't all added in the same ClusterChangedEvent, here's an example of how it would work:

  1. Cluster starts up with a clean/empty state.
  2. A new cluster state comes in (such as the master node being elected).
    2a. The ClusterChangedEvent handler is invoked, and all three methods — addLegacyTemplatesIfMissing (going to skip it for now), addComponentTemplatesIfMissing, and addComposableTemplatesIfMissing — are invoked.
    2b. addComponentTemplatesIfMissing sees that there are no component templates in the current state, so it kicks off two asynchronous requests to install A and B.
    2c. addComposableTemplatesIfMissing sees that C is not yet present, but that C relies on A and B which are not present, so it returns early and is a no-op.
    2d. The installation of A and B completes (the order is not important here since they are independent).
  3. A new cluster state comes in (likely because of the installation of the A and B component templates) — the same three add* methods are invoked.
    3a. addComponentTemplatesIfMissing sees that A and B already exist, so it is a no-op.
    3b. addComposableTemplatesIfMissing sees that C is (still) not present, and that C relies on A and B, which are present in the cluster state, so it kicks off a request to add composable template C to the cluster.
    3c. The installation of C completes.
  4. A new cluster state comes in (probably because of the installation of C) — the three add* methods are invoked.
  5. All three methods, addLegacyTemplatesIfMissing, addComponentTemplatesIfMissing, and addComposableTemplatesIfMissing, see that their pieces have already been installed, so they are no-ops.

Hopefully that helps clarify? Let me know if not.
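The reconciliation behavior described in the steps above can be simulated in a few lines. This is an illustrative Python sketch, not Elasticsearch code: on each cluster state change, missing component templates are installed first, and a composable template is only installed once all of its component templates are visible in the current cluster state; each installation produces a new cluster state, which triggers another pass.

```python
# Illustrative simulation of the event-driven template installation
# described above (not actual Elasticsearch code).

def handle_cluster_changed_event(state, components, composables):
    """One ClusterChangedEvent pass; returns True if anything was installed."""
    snapshot = set(state)  # templates visible in the current cluster state
    changed = False
    for name in components:
        if name not in snapshot:
            state.add(name)  # addComponentTemplatesIfMissing (async in reality)
            changed = True
    for name, deps in composables.items():
        if name not in snapshot and all(dep in snapshot for dep in deps):
            state.add(name)  # addComposableTemplatesIfMissing
            changed = True
        # else: dependencies not yet visible -> no-op until the next pass
    return changed

state = set()
passes = 0
# Each change produces a new cluster state, triggering another pass.
while handle_cluster_changed_event(state, ["A", "B"], {"C": ["A", "B"]}):
    passes += 1

# Pass 1 installs A and B (C is skipped because its dependencies are not
# yet in the observed state); pass 2 installs C; pass 3 is a no-op.
```

The key point the walkthrough makes is captured by the `snapshot`: each pass only acts on what the current cluster state already contains, so C converges one event later than A and B rather than racing them.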

For example, I think we may consider upgrading this verification to observe the versions of the installed components. So that if a composable template C is composed of A and B and the required version of C is 3, then A and B need not only to be installed, but also have the right version. Does this make sense?

I am not convinced about this validation. For example, we allow installing a higher version of a template (component or composable) as an escape hatch for when a user needs to quickly make a configuration change that will not be overridden; adding this validation would break that behavior. Why should we validate this strictly? Are you concerned about the two being out of version alignment?

@eyalkoren
Contributor

Hopefully that helps clarify? Let me know if not.

Extremely helpful, that was exactly what I was looking for, thanks! What I was mainly missing is the fact that each cluster state change, including those caused by added components, triggers the event handling sequence again, which is what guarantees that these additions happen sequentially.

Why should we validate this strictly? Are you concerned about the two being out of version alignment?

Maybe not likely as a permanent state, but I think this does break the ordering guarantee you described above, at least temporarily, or permanently if a dependent template gets rejected for some reason.
Using the same A, B, and C from before: if all of them are already installed with version n, then trying to upgrade all of them to n+1 may cause a temporary state where C[n+1] gets installed (because A and B exist) while A and B are still on version n, in which case I don't know what the implications may be. If A[n+1] gets rejected for some reason, then this version mismatch becomes a permanent state.
Whether this can actually happen or is a real concern, I don't know; just raising the possibility 🤷‍♂️

@eyalkoren
Contributor

This issue is currently blocked on #95782
