[Logs Explorer] Add support for cross cluster search #172905

Open
ruflin opened this issue Dec 8, 2023 · 8 comments
Labels
Team:obs-ux-logs Observability Logs User Experience Team

Comments

@ruflin
Contributor

ruflin commented Dec 8, 2023

In addition to what Discover offers, Logs Explorer lets users prefilter data by Integrations and Datasets. Behind the scenes this creates ad hoc data views, and because they are ad hoc, the user can't configure them to include remote clusters. Logs Explorer plans to better support predefined data views (#172469), where CCS could be used with all the features in Kibana. The focus of this discussion, however, is how to support CCS for the Integrations and Datasets selector.

The architecture in mind for the following proposal is a "logs search" cluster that is connected to a list of remote clusters, potentially in different regions, where by default users want to search across all clusters. In addition, in most scenarios the users querying the data should not have to worry about the number of remote clusters changing. The assumption is that one user manages all the remote clusters through the Elasticsearch API or the Stack Management UI. The list of connected remote clusters can be retrieved through the /_remote/info API or internal APIs in Kibana. The important part is that no special modifications are made to the index patterns. If a user searches on logs-foo-*, the same index pattern is used on all remote clusters. To simplify things, managing the cluster list per user or per space is not supported.
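As a minimal sketch of the cluster discovery step, the snippet below filters a payload shaped like a `GET /_remote/info` response down to the aliases that are currently connected. The cluster aliases and the sample values are made up for illustration, and the helper name `connected_clusters` is hypothetical; no request is sent.

```python
from typing import Any, Dict, List

# Sample payload shaped like a `GET /_remote/info` response.
# The aliases and values are invented for illustration only.
SAMPLE_REMOTE_INFO: Dict[str, Dict[str, Any]] = {
    "eu_west": {"connected": True, "mode": "sniff", "skip_unavailable": True},
    "us_east": {"connected": True, "mode": "proxy", "skip_unavailable": False},
    "ap_south": {"connected": False, "mode": "sniff", "skip_unavailable": True},
}


def connected_clusters(remote_info: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return the aliases of remote clusters that report `connected: true`."""
    return sorted(
        alias for alias, info in remote_info.items() if info.get("connected")
    )


print(connected_clusters(SAMPLE_REMOTE_INFO))
```

Because the list is derived on every call, users querying the data never have to track clusters being added or removed.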

This is not an implementation issue but a place to discuss ideas and approaches for tackling the problem in phases. Here is an initial proposal split into 4 phases. Not all 4 phases are necessarily needed; each phase on its own would already enable a set of users.

Phase 1 - Toggle

In phase 1, a toggle exists in the Logs Explorer UI to include remote clusters. What it does is prefix all ad-hoc data views that are created with *:, for example *:logs-nginx.access-* for nginx access logs. This would by default query all the data on the remote clusters too.
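The toggle could be sketched as a small pattern rewrite, shown below. The function name `with_remote_clusters` is hypothetical. One assumption is worth calling out: in CCS syntax a `*:` prefix addresses remote clusters only, so this sketch keeps the unprefixed pattern next to the prefixed one in order to search local and remote data together.

```python
def with_remote_clusters(index_pattern: str, include_remote: bool) -> str:
    """Expand an index pattern so it also targets all remote clusters.

    When the toggle is off the pattern is returned unchanged. When it is on,
    every comma-separated part is kept as is (local cluster) and repeated
    with a `*:` prefix (all remote clusters).
    """
    if not include_remote:
        return index_pattern
    parts = [part.strip() for part in index_pattern.split(",") if part.strip()]
    remote_parts = [f"*:{part}" for part in parts]
    return ",".join(parts + remote_parts)


print(with_remote_clusters("logs-nginx.access-*", include_remote=True))
```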

The Integrations and Datasets selector will only know about the integrations and datasets installed on the local cluster. If remote clusters contain other integrations or datasets, these would NOT show up.

Phase 2 - Toggle+

In Phase 1, remote datasets and integrations would not show up. In Phase 2 we could offer an easy way for users to create the same integrations and datasets locally, but that requires manual work and is not ideal. Instead of querying remote datasets ad hoc, we could offer a sync command that pulls in info about remote integrations and datasets, then installs the remote integrations and creates the datasets with an empty data stream. This is not ideal either, but it would avoid frequent, expensive calls to fetch all the remote data. To keep data selection fast, the dataset and integration selector might not show all the same info for remote clusters, such as the number of docs.

As the connection to the remote clusters is purely over Elasticsearch (no Kibana connection), the information about which integrations are installed would have to be taken from the meta information of the data streams. Integrations that were created directly in Kibana and are not available in the registry could only be supported as datasets.
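A rough sketch of deriving integrations from data stream metadata follows. It assumes the `{type}-{dataset}-{namespace}` data stream naming scheme and assumes that managed data streams expose the package under `_meta.package.name`; both the input shape and the helper names are illustrative, not a confirmed API contract. Streams without package metadata fall back to a plain dataset entry.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

UNCATEGORIZED = "uncategorized"  # hypothetical bucket for streams without a package


def parse_data_stream_name(name: str) -> Tuple[str, str, str]:
    """Split `{type}-{dataset}-{namespace}` into its three parts."""
    parts = name.split("-")
    if len(parts) < 3:
        raise ValueError(f"not a type-dataset-namespace name: {name!r}")
    return parts[0], "-".join(parts[1:-1]), parts[-1]


def group_by_integration(streams: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group dataset names by the package recorded in each stream's `_meta`."""
    grouped = defaultdict(set)
    for stream in streams:
        _, dataset, _ = parse_data_stream_name(stream["name"])
        package = (stream.get("_meta") or {}).get("package") or {}
        grouped[package.get("name", UNCATEGORIZED)].add(dataset)
    return {name: sorted(datasets) for name, datasets in grouped.items()}


# Invented sample input, shaped like entries from a data stream listing.
streams = [
    {"name": "logs-nginx.access-default", "_meta": {"package": {"name": "nginx"}}},
    {"name": "logs-nginx.error-default", "_meta": {"package": {"name": "nginx"}}},
    {"name": "logs-custom.app-prod"},
]
print(group_by_integration(streams))
```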

Phase 3 - Select remote clusters

In the previous phases, the selection of remote clusters was all or nothing. In phase 3, the list of available remote clusters can be read from the Elasticsearch API and offered to the user for selection next to the integrations / dataset / data view selector. The user can pick which remote clusters to search on. For now, the list of datasets / integrations shown would not change dynamically; it would always reflect all remote clusters.
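With an explicit selection, the pattern can name each chosen cluster instead of using a wildcard. The sketch below assumes the standard `cluster_alias:index` CCS syntax; the function name `pattern_for_clusters` and the `include_local` flag are hypothetical.

```python
from typing import Iterable


def pattern_for_clusters(
    index_pattern: str, clusters: Iterable[str], include_local: bool = True
) -> str:
    """Build a CCS index pattern that targets only the selected clusters."""
    targets = [index_pattern] if include_local else []
    targets += [f"{alias}:{index_pattern}" for alias in clusters]
    if not targets:
        raise ValueError("at least one cluster (local or remote) must be selected")
    return ",".join(targets)


print(pattern_for_clusters("logs-nginx.access-*", ["eu_west", "us_east"]))
```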

Phase 4 - Dynamic integrations and datasets

So far, the list of datasets and integrations did not change dynamically when the selection of remote clusters was changed. To show users more accurate info on what is available on remote clusters, the dataset and integration selection would change dynamically based on which remote clusters are selected. A sync of this info would happen automatically in the background from time to time or could be triggered manually.
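One way to picture the background sync is a per-cluster cache that the selector reads from. The sketch below is only an illustration: the class `RemoteDatasetCatalog`, its methods, and the five-minute default are all hypothetical, and the actual fetching of datasets from a remote cluster is left out.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class RemoteDatasetCatalog:
    """Cache of dataset names per remote cluster, refreshed by a sync job."""

    max_age_seconds: float = 300.0
    _datasets: Dict[str, Set[str]] = field(default_factory=dict)
    _synced_at: Dict[str, float] = field(default_factory=dict)

    def sync(self, cluster: str, datasets: Iterable[str], now: Optional[float] = None) -> None:
        """Store the datasets found on one cluster (background or manual sync)."""
        self._datasets[cluster] = set(datasets)
        self._synced_at[cluster] = time.time() if now is None else now

    def is_stale(self, cluster: str, now: Optional[float] = None) -> bool:
        """True if the cluster was never synced or the entry is too old."""
        if cluster not in self._synced_at:
            return True
        current = time.time() if now is None else now
        return current - self._synced_at[cluster] > self.max_age_seconds

    def datasets_for(self, selected_clusters: Iterable[str]) -> List[str]:
        """Union of cached datasets across the currently selected clusters."""
        found: Set[str] = set()
        for cluster in selected_clusters:
            found |= self._datasets.get(cluster, set())
        return sorted(found)


catalog = RemoteDatasetCatalog()
catalog.sync("eu_west", ["nginx.access"], now=0.0)
catalog.sync("us_east", ["nginx.access", "mysql.error"], now=0.0)
print(catalog.datasets_for(["eu_west", "us_east"]))
```

Changing the cluster selection then only changes which cached entries are merged, so the selector stays fast even when a sync is overdue.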

Related issues

@ruflin ruflin added the Team:obs-ux-logs Observability Logs User Experience Team label Dec 8, 2023
@elasticmachine
Contributor

Pinging @elastic/obs-ux-logs-team (Team:obs-ux-logs)

@weltenwort
Member

💭 About Phase 2:

Instead of querying remote datasets ad hoc, we could offer a sync command that pulls in info about remote integrations and dataset and then installs the remote integrations and creates the datasets with an empty data stream.

Would we install the "real" integrations locally that we detect remotely? Or would we install lightweight "remote proxy" integrations that just mimic the log datasets? The implication of the former is that it brings all the ES+Kibana assets, which might be useful but might also clutter the cluster. The latter would not be able to offer the assets, but it would be identifiable as a "remote proxy" and thereby record the intent of the user for later workflows.

About Phase 3:

Would it make sense to switch phase 2 and 3? The latter reads like a small increment over phase 1.

@ruflin
Contributor Author

ruflin commented Dec 8, 2023

Would it make sense to switch phase 2 and 3?

Yes. If 1 is good enough for most, 3 makes sense before 2.

On the integration installation: the lightweight remote proxy would be a neat thing, but I don't know how much it would complicate the feature. One argument for installing the full integration is that the dashboards can then be used too. The problem here is that dashboards are tied to the logs-* data view and not the "integration" data view (yet).

@gbamparop
Contributor

Thank you for writing this up and for splitting it into implementation phases. I think that the first phase should include the work mentioned in the second one, or at least part of it.

As CCS is supported for data streams, only supporting integrations that are already installed locally seems like a limitation on our end, and the first phase would deliver only part of the solution. It can also make troubleshooting more complex, as there can be cases where you have to access the remote cluster to see which integrations are installed.

An alternative option would be to only add support for CCS when All logs or uncategorised datasets are selected as part of phase 1 and move integration support to the second phase.

4 participants