[Logs Explorer] Add support for cross cluster search #172905

ruflin · 2023-12-08T07:58:05Z

In addition to what Discover offers, Logs Explorer lets users prefilter data by Integrations and Datasets. Behind the scenes, this creates ad hoc data views, and because they are ad hoc, users can't configure them to include remote clusters. Logs Explorer plans to better support predefined data views (#172469), where cross-cluster search (CCS) could be used with all the features in Kibana. The focus of this discussion, however, is how to support CCS for the Integrations and Dataset selector.

The architecture in mind for this proposal is a "logs search" cluster that is connected to a list of remote clusters, potentially in different regions; by default, users want to search across all clusters. In addition, in most scenarios the users querying the data should not have to worry about the number of remote clusters changing. The assumption is that one user manages all the remote clusters through the Elasticsearch API or the Stack Management UI. The list of connected remote clusters can be retrieved through the /_remote/info API or internal APIs in Kibana. Importantly, no special modifications are made to the index patterns: if a user searches on logs-foo-*, the same index pattern is used on all remote clusters. To keep things simple, managing the cluster list per user or per space is not supported.
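A minimal sketch of turning the remote cluster info response (GET /_remote/info, an object keyed by cluster alias) into the list of clusters a "logs search" cluster could query. It only looks at the connected flag of each entry; the function name and the example aliases are hypothetical, and in Kibana the response would come from the Elasticsearch client rather than a hand-written object.

```typescript
// Shape of one entry in the GET /_remote/info response (only the fields used here).
interface RemoteClusterInfo {
  connected: boolean;
  mode?: string;
  skip_unavailable?: boolean;
}

// The response is an object keyed by remote cluster alias.
type RemoteInfoResponse = Record<string, RemoteClusterInfo>;

// Hypothetical helper: aliases of currently connected remote clusters, sorted for a stable UI order.
function connectedRemoteClusters(response: RemoteInfoResponse): string[] {
  return Object.entries(response)
    .filter(([, info]) => info.connected)
    .map(([alias]) => alias)
    .sort();
}

// Example response, trimmed to the relevant fields (aliases are made up).
const exampleRemoteInfo: RemoteInfoResponse = {
  eu_west: { connected: true, mode: "sniff", skip_unavailable: true },
  us_east: { connected: true, mode: "proxy", skip_unavailable: false },
  ap_south: { connected: false, mode: "sniff" },
};

console.log(connectedRemoteClusters(exampleRemoteInfo)); // [ 'eu_west', 'us_east' ]
```

Because the list is read at query time, clusters added or removed by the admin would be picked up without users having to change anything.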

This is not an implementation issue; its purpose is to discuss ideas and approaches for tackling the problem in phases. Here is an initial proposal split into 4 phases. Not all 4 phases are necessarily needed; each phase would already enable a set of users.

Phase 1 - Toggle

In phase 1, a toggle in the Logs Explorer UI includes remote clusters. When it is enabled, the index pattern of every ad hoc data view that gets created is prefixed with *:, for example *:logs-nginx.access-* for nginx access logs. By default, this would query the data on the remote clusters too.
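The toggle could boil down to a small pattern-building step like the sketch below. The helper name is hypothetical. One assumption worth flagging: in Elasticsearch's CCS syntax, a *: prefix matches every connected remote cluster but not the local one, so this sketch keeps the unprefixed local pattern next to the prefixed one to search both.

```typescript
// Hypothetical helper for the Phase 1 toggle: builds the index pattern of an ad hoc data view.
// "*:<pattern>" targets all connected remote clusters; the local cluster is not included by it,
// so the local pattern is kept as well (an assumption about the desired behavior).
function buildIndexPattern(pattern: string, includeRemote: boolean): string {
  // Accept comma-separated patterns such as "logs-a-*,logs-b-*".
  const parts = pattern
    .split(",")
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  if (!includeRemote) {
    return parts.join(",");
  }
  // Local patterns first, then the same patterns prefixed with "*:" for all remote clusters.
  return parts.concat(parts.map((p) => `*:${p}`)).join(",");
}

console.log(buildIndexPattern("logs-nginx.access-*", false)); // logs-nginx.access-*
console.log(buildIndexPattern("logs-nginx.access-*", true)); // logs-nginx.access-*,*:logs-nginx.access-*
```

Using *: rather than explicit aliases keeps the pattern valid when remote clusters are added or removed, which matches the goal that users should not have to track the cluster list.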

The Integrations and dataset selector will only know about the integrations and datasets installed on the local cluster. If remote clusters contain other integrations or datasets, these would NOT show up.

Phase 2 - Toggle+

In Phase 1, remote datasets and integrations would not show up. In Phase 2, we could offer an easy way for users to create the same integrations and datasets locally. This requires manual work and is not ideal. Instead of querying remote datasets ad hoc, we could offer a sync command that pulls in information about remote integrations and datasets, then installs the remote integrations locally and creates the datasets with an empty data stream. This is not ideal either, but it would avoid the overhead of frequently making expensive calls to fetch all the remote data. To keep data selection fast, the dataset and integration selector might not show the same information for remote clusters, such as the number of documents.

As the connection to the remote clusters goes purely through Elasticsearch (there is no Kibana connection), the information about which integrations are installed would have to be taken from the metadata of the data streams. Integrations that were created directly in Kibana and are not available in the registry could only be supported as datasets.
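A sketch of the sync step that reads integrations out of data stream metadata, under two labeled assumptions: that the remote GET _data_stream entries expose a _meta object whose package.name identifies the Fleet integration, and that names follow the type-dataset-namespace naming scheme. Streams without package metadata fall back to plain datasets, mirroring the point above. All function and type names are hypothetical.

```typescript
// Minimal view of one entry in the GET _data_stream response (only the fields used here).
// ASSUMPTION: integrations installed by Fleet record their package name in _meta.package.name.
interface DataStreamInfo {
  name: string;
  _meta?: { package?: { name?: string } };
}

interface RemoteCatalog {
  integrations: Map<string, string[]>; // package name -> datasets it provides
  datasets: string[]; // datasets without package metadata, offered as plain datasets
}

// Parses "<type>-<dataset>-<namespace>" and returns the dataset for logs streams only.
function logsDatasetOf(name: string): string | undefined {
  const parts = name.split("-");
  if (parts.length !== 3 || parts[0] !== "logs") {
    return undefined; // not a logs stream following the naming scheme
  }
  return parts[1];
}

// Hypothetical sync step: group remote data streams by integration, falling back to datasets.
function buildRemoteCatalog(streams: DataStreamInfo[]): RemoteCatalog {
  const integrations = new Map<string, string[]>();
  const datasets = new Set<string>();
  for (const stream of streams) {
    const dataset = logsDatasetOf(stream.name);
    if (dataset === undefined) continue;
    const pkg = stream._meta?.package?.name;
    if (pkg) {
      const list = integrations.get(pkg) ?? [];
      if (!list.includes(dataset)) list.push(dataset);
      integrations.set(pkg, list);
    } else {
      datasets.add(dataset);
    }
  }
  return { integrations, datasets: Array.from(datasets).sort() };
}

// Example input with made-up stream names.
const remoteCatalog = buildRemoteCatalog([
  { name: "logs-nginx.access-default", _meta: { package: { name: "nginx" } } },
  { name: "logs-nginx.error-prod", _meta: { package: { name: "nginx" } } },
  { name: "logs-myapp-default" },
  { name: "metrics-system.cpu-default", _meta: { package: { name: "system" } } },
]);
console.log(remoteCatalog.integrations.get("nginx"), remoteCatalog.datasets);
```

The resulting catalog is what a sync command could use to decide which integrations to install locally and which datasets to create with an empty data stream.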

Phase 3 - Select remote clusters

In the previous phases, the selection of remote clusters was all or nothing. In phase 3, the list of available remote clusters can be read from the Elasticsearch API and offered to the user for selection next to the integrations / dataset / data view selector. The user can pick which remote clusters to search on. For now, the list of datasets / integrations shown would not change dynamically; it would always reflect all remote clusters.
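Once users can pick clusters, the *: wildcard would be replaced by the chosen aliases. A minimal sketch, with a hypothetical helper name and the assumption that the local cluster stays included unless explicitly turned off:

```typescript
// Hypothetical Phase 3 helper: scope a dataset pattern to the remote clusters the user picked.
// Aliases would come from the remote cluster info; includeLocal keeps the unprefixed pattern.
function scopeToClusters(pattern: string, selectedAliases: string[], includeLocal = true): string {
  // De-duplicate and sort so the same selection always yields the same pattern.
  const aliases = Array.from(new Set(selectedAliases)).sort();
  const parts = aliases.map((alias) => `${alias}:${pattern}`);
  if (includeLocal) {
    parts.unshift(pattern);
  }
  if (parts.length === 0) {
    throw new Error("Select at least one cluster to search");
  }
  return parts.join(",");
}

console.log(scopeToClusters("logs-nginx.access-*", ["us_east", "eu_west"]));
// logs-nginx.access-*,eu_west:logs-nginx.access-*,us_east:logs-nginx.access-*
```

The trade-off compared with *: is that an explicit alias list does not automatically pick up newly connected clusters, so the UI would need a way to distinguish "all clusters" from a fixed selection.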

Phase 4 - Dynamic integrations and datasets

So far, the list of datasets and integrations did not change dynamically when the selection of remote clusters changed. To show users more accurate information about what is available on remote clusters, the dataset and integration selection would change dynamically based on which remote clusters are selected. This information would be synced automatically in the background from time to time, or the sync could be triggered manually.
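A sketch of the two pieces Phase 4 adds: filtering a synced per-cluster catalog by the current cluster selection, and deciding when a background re-sync is due. The catalog shape, the "local" key, and the max-age policy are all assumptions for illustration, not part of the proposal.

```typescript
// Hypothetical shape of what a background sync could store: cluster alias -> datasets found there.
// "local" stands for the local cluster in this sketch.
type ClusterCatalog = Record<string, string[]>;

// Datasets to offer for the current selection: the union over all selected clusters.
function datasetsForSelection(catalog: ClusterCatalog, selected: string[]): string[] {
  const result = new Set<string>();
  for (const alias of selected) {
    for (const dataset of catalog[alias] ?? []) {
      result.add(dataset);
    }
  }
  return Array.from(result).sort();
}

// A background job would re-sync once the cached catalog is older than maxAgeMs;
// a manual "sync now" action would simply bypass this check.
function isStale(lastSyncedMs: number, nowMs: number, maxAgeMs: number): boolean {
  return nowMs - lastSyncedMs >= maxAgeMs;
}

// Example catalog with made-up aliases and datasets.
const clusterCatalog: ClusterCatalog = {
  local: ["nginx.access", "system.syslog"],
  eu_west: ["nginx.access", "myapp"],
  us_east: ["k8s.container_logs"],
};
console.log(datasetsForSelection(clusterCatalog, ["local", "eu_west"]));
// [ 'myapp', 'nginx.access', 'system.syslog' ]
```

A union is used so a dataset appears as soon as any selected cluster has it; an intersection would instead hide datasets that exist on only some of the selected clusters.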

Related issues


elasticmachine · 2023-12-08T07:58:07Z

Pinging @elastic/obs-ux-logs-team (Team:obs-ux-logs)

weltenwort · 2023-12-08T12:53:28Z

💭 About Phase 2:

Instead of querying remote datasets ad hoc, we could offer a sync command that pulls in info about remote integrations and dataset and then installs the remote integrations and creates the datasets with an empty data stream.

Would we locally install the "real" integrations that we detect remotely, or lightweight "remote proxy" integrations that just mimic the log datasets? The former would bring in all the ES and Kibana assets, which might be useful but might also clutter the cluster. The latter could not offer the assets, but it would be identifiable as a "remote proxy" and thereby record the user's intent for later workflows.

About Phase 3:

Would it make sense to switch phase 2 and 3? The latter reads like a small increment over phase 1.

ruflin · 2023-12-08T13:15:01Z

Would it make sense to switch phase 2 and 3?

Yes. If 1 is good enough for most, 3 makes sense before 2.

On the integration installation: the lightweight remote proxy would be a neat thing, but I don't know how much it would complicate the feature. One argument for installing the full integration is that the dashboards could then be used too. The problem here is that dashboards are tied to the logs-* data view and not (yet) to the "integration" data view.

gbamparop · 2023-12-11T10:26:21Z

Thank you for writing this up and for splitting it in implementation phases. I think that the first phase should include the work mentioned in the second one, or at least part of it.

As CCS is supported for data streams, only supporting integrations that are already installed locally seems like a limitation on our end, and the first phase would deliver only part of the solution. It can also make troubleshooting more complex, as there can be cases where you have to access the remote cluster to see which integrations are installed.

An alternative option would be to only add support for CCS when All logs or uncategorised datasets are selected as part of phase 1 and move integration support to the second phase.

ruflin added the Team:obs-ux-logs Observability Logs User Experience Team label Dec 8, 2023

ruflin mentioned this issue Dec 8, 2023

[Logs Explorer][Meta] Data selector improvements #172908

Open
