[Logs Explorer] Add support for cross cluster search #172905
Comments
Pinging @elastic/obs-ux-logs-team (Team:obs-ux-logs)
💭 About Phase 2:
Would we install the "real" integrations locally that we detect remotely, or would we install lightweight "remote proxy" integrations that just mimic the log datasets? The former would bring all the ES+Kibana assets, which might be useful but might also clutter the cluster. The latter would not be able to offer the assets, but it would be identifiable as a "remote proxy" and thereby record the user's intent for later workflows.
About Phase 3:
Would it make sense to switch phases 2 and 3? Phase 3 reads like a small increment over phase 1.
Yes. If phase 1 is good enough for most users, phase 3 makes sense before phase 2. On the integration installation: the lightweight remote proxy would be neat, but I don't know how much it would complicate the feature. One argument for installing the full integration is that the dashboards could then be used as well. The problem is that dashboards are tied to the logs-* data view and not (yet) to the "integration" data view.
Thank you for writing this up and for splitting it into implementation phases. I think that the first phase should include the work mentioned in the second one, or at least part of it. As CCS is supported for data streams, it seems a limitation on our end to only support integrations that are already installed locally, and the first phase would deliver only part of the solution. It can also make troubleshooting more complex, as there can be cases where you have to access the remote cluster to see which integrations are installed. An alternative option would be to only add support for CCS when
In addition to what Discover provides, Logs Explorer offers prefiltering of data by integrations and datasets. Behind the scenes this creates ad hoc data views, but because they are ad hoc, the user can't configure them to add remote clusters. Logs Explorer plans to better support predefined data views (#172469), where CCS could be used with all the features in Kibana. The focus of this discussion, however, is how to support CCS for the integrations and dataset selector.
The architecture in mind for the following proposal is a "logs search" cluster that is connected to a list of remote clusters, potentially in different regions, where users by default want to search across all clusters. In addition, users querying the data should in most scenarios not have to worry about the number of remote clusters changing. The assumption is that one user manages all the remote clusters through the Elasticsearch API or the Stack Management UI. The list of connected remote clusters can be retrieved through the `/_remote/info` API or internal APIs in Kibana. The important part is that no special modifications are made to the index patterns. If a user searches on `logs-foo-*`, the same index pattern is used on all remote clusters. To simplify things, managing the cluster list per user or per space is not supported.
This is not an implementation issue but a place to discuss ideas and approaches on how we could tackle the problem in phases. Here is an initial proposal that is split into 4 phases. Not all 4 phases are necessarily needed; each phase would already enable a set of users.
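As a rough illustration of reading the cluster list, the sketch below filters a remote cluster info response down to the aliases that are currently connected. It assumes the response is a JSON object keyed by cluster alias with a boolean `connected` field; the function name and the sample aliases are hypothetical, and the sample data is hand-written rather than real output.

```python
from typing import Any, Dict, List


def connected_remote_clusters(remote_info: Dict[str, Any]) -> List[str]:
    """Return the aliases of remote clusters that report an active connection."""
    return sorted(
        alias
        for alias, details in remote_info.items()
        if details.get("connected", False)
    )


# Hand-written sample shaped like a remote cluster info response (assumption).
sample_remote_info = {
    "eu-west": {"connected": True, "mode": "sniff", "skip_unavailable": True},
    "us-east": {"connected": True, "mode": "proxy", "skip_unavailable": False},
    "ap-south": {"connected": False, "mode": "sniff", "skip_unavailable": True},
}

print(connected_remote_clusters(sample_remote_info))  # ['eu-west', 'us-east']
```

Because the index patterns stay unmodified, this list is the only per-deployment input the later phases would need.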
Phase 1 - Toggle
In phase 1, a toggle exists in the Logs Explorer UI to include remote clusters. It prefixes all ad hoc data views that are created with `*:`, for example `*:logs-nginx.access-*` for nginx access logs. By default this would query all the data on the remote clusters too.
The integrations and dataset selector will only know about the integrations and datasets installed on the local cluster. If remote clusters contain other integrations or datasets, these would NOT show up.
Phase 2 - Toggle+
In Phase 1, remote datasets and integrations would not show up. In Phase 2 we could offer an easy way for users to create the same integrations and datasets locally. Doing this by hand requires manual work and is not ideal. Instead of querying remote datasets ad hoc, we could offer a sync command that pulls in info about remote integrations and datasets, then installs the remote integrations and creates the datasets with an empty data stream. This is not ideal either, but it would remove the overhead of frequent, expensive calls to fetch all the remote data. To keep data selection fast, the dataset and integration selector might not show all the same info about remote clusters, such as the number of docs.
As the connection to the remote clusters is purely over Elasticsearch (no Kibana connection), the information about which integrations are installed would have to be taken from the meta information of the data streams. Integrations that were created directly in Kibana and are not available in the registry could only be supported as datasets.
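A sync step along these lines might group remote data streams by integration using only data stream names and metadata. The sketch below assumes the `{type}-{dataset}-{namespace}` naming scheme and assumes that managed data streams carry a `_meta.package.name` entry; that field layout, the function names, and the sample entries are assumptions for illustration, not confirmed behavior.

```python
from typing import Any, Dict, List, Optional, Set

DATASET_ONLY = "(dataset only)"  # hypothetical bucket for streams without package info


def parse_dataset(data_stream_name: str) -> Optional[str]:
    """Extract the dataset from a `{type}-{dataset}-{namespace}` name."""
    name = data_stream_name.split(":", 1)[-1]  # drop an optional `cluster:` prefix
    parts = name.split("-")
    return parts[1] if len(parts) == 3 else None


def integrations_from_data_streams(
    data_streams: List[Dict[str, Any]],
) -> Dict[str, Set[str]]:
    """Map integration (package) name to the datasets found for it."""
    result: Dict[str, Set[str]] = {}
    for stream in data_streams:
        dataset = parse_dataset(stream["name"])
        if dataset is None:
            continue  # name does not follow the naming scheme
        meta = stream.get("_meta") or {}
        package = (meta.get("package") or {}).get("name")  # assumed metadata layout
        result.setdefault(package or DATASET_ONLY, set()).add(dataset)
    return result


# Hand-written sample entries (not real output).
sample = [
    {"name": "logs-nginx.access-default", "_meta": {"package": {"name": "nginx"}}},
    {"name": "eu-west:logs-nginx.error-default", "_meta": {"package": {"name": "nginx"}}},
    {"name": "logs-custom.app-prod"},
    {"name": "unrelated_stream"},
]
print(integrations_from_data_streams(sample))
```

Streams without package metadata land in the dataset-only bucket, which mirrors the limitation described above for integrations that are not in the registry.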
Phase 3 - Select remote clusters
In the previous phases, the selection of remote clusters was all or nothing. In phase 3, the list of available remote clusters can be read from the Elasticsearch API and offered to the user for selection next to the integrations / dataset / data view selector. The user can pick which remote clusters to search on. For now, the list of datasets / integrations shown would not change dynamically; it would always come from all remote clusters.
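With a cluster picker, the wildcard prefix could be replaced by one `cluster:pattern` entry per selected alias. This is a minimal sketch; the function name, the `include_local` flag, and the aliases are hypothetical.

```python
from typing import List


def pattern_for_clusters(
    pattern: str, clusters: List[str], include_local: bool = True
) -> str:
    """Build a comma-separated target list for the selected remote clusters."""
    targets = [pattern] if include_local else []
    targets += [f"{cluster}:{pattern}" for cluster in clusters]
    return ",".join(targets)


print(pattern_for_clusters("logs-nginx.access-*", ["eu-west", "us-east"]))
# logs-nginx.access-*,eu-west:logs-nginx.access-*,us-east:logs-nginx.access-*
```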
Phase 4 - Dynamic integrations and datasets
So far, the list of datasets and integrations did not change dynamically when the selection of remote clusters was changed. To show users more accurate info on what is available on remote clusters, the dataset and integration selection would change dynamically based on which remote clusters are selected. A sync of this info would happen automatically in the background from time to time or could be triggered manually.
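One way to picture the dynamic selector is a per-cluster inventory that the background sync keeps up to date, with the visible list computed as the union over the selected clusters. The inventory structure and all names below are assumptions for illustration only.

```python
from typing import Dict, Iterable, List, Set


def visible_datasets(
    inventory: Dict[str, Set[str]], selected: Iterable[str]
) -> List[str]:
    """Return the datasets available on at least one selected cluster."""
    visible: Set[str] = set()
    for cluster in selected:
        # Clusters that have not been synced yet contribute nothing.
        visible |= inventory.get(cluster, set())
    return sorted(visible)


# Hypothetical inventory as a sync might have stored it.
inventory = {
    "eu-west": {"nginx.access", "nginx.error"},
    "us-east": {"nginx.access", "system.syslog"},
}
print(visible_datasets(inventory, ["us-east"]))  # ['nginx.access', 'system.syslog']
```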
Related issues
`logs-*` and `metrics-*` index patterns get overwritten on install/removal/upgrade of packages, breaking runtime fields and CCS #120340