Cross Cluster search causes UI to hang while getting cross cluster field names/types #167706
Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)
Thx for reporting, I'm interested: are there frozen data tiers included in those CCS searches? Where do you experience the UI hang? In Discover when loading or switching data views? When trying to add filters? In KQL? Thx!
Anywhere Kibana is using data views, so everywhere. If the index pattern includes remote clusters, this call uses the Elasticsearch client, specifically the field caps API, without any options, so the request takes the default requestTimeout of 30000 ms. After that the data view is cached client side, so after the first load it is fast, until you open a new tab or refresh. If one of the remote clusters doesn't have connectivity or is over a slow connection, this call takes 30 seconds every time you select an index, for every user (hundreds), every time they refresh or open a new tab. What I was hoping is that the results of the field_caps call could be cached server side in Kibana. This would speed up the call for every user that opens Kibana. The server side caching should be easy to implement here.
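For illustration only, a minimal sketch of the call pattern described above using the `@elastic/elasticsearch` client; the node URL, index pattern, and timeout values are assumptions, not the actual Kibana code:

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' }); // node URL is a placeholder

async function loadFields() {
  // With no per-request options, the call waits up to the client's default
  // requestTimeout (30000 ms per the description above) for every remote
  // cluster to answer.
  const slow = await client.fieldCaps({ index: '*:logs-*', fields: '*' });

  // Passing an explicit, shorter timeout per request bounds the wait instead
  // of inheriting the 30 s default.
  const bounded = await client.fieldCaps(
    { index: '*:logs-*', fields: '*' },
    { requestTimeout: 5000 }
  );

  return { slow, bounded };
}
```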
We can't serve a cached version because that would potentially circumvent field level security - https://www.elastic.co/guide/en/elasticsearch/reference/current/field-level-security.html While it's not unusual for cross cluster requests to take longer, this sounds extreme. Is the frozen data tier involved? We currently have a couple of efforts focused on improving worst case field loading scenarios, but it's hard to tell if this would help your case. Is your problematic cluster just slow? Or is it inconsistent? It's tricky to work around an unreliable data source.
The _fields_for_wildcard request consistently takes between 20 and 38 s and returns only 12.6 kb. Our ILM only has hot and warm.
I don't think the field metadata circumvents field level security, because no data is being pulled in this call. Unless field-level security even prevents users from knowing that a field exists.
This actually appears to be compounded by a deeper issue with the http router handling requests sequentially. I booted all my users and scaled my Kibana instances down to 1. The request for _fields_for_wildcard took exactly 2.3 seconds. I hit that endpoint a bunch of times and noticed that the requests were handled sequentially. This means that the more users we have, the worse the problem is (which is why the UI consistently hangs for 20-30 seconds), if not longer. Even waiting the 2.3 seconds to gather the fields from the remote clusters is too long from a UI/user perspective.
@desean1625 thx for sharing, we are currently aiming to reduce and optimize the requests for fields. When you have multiple users, the requests for fields should not be handled sequentially per user. The screenshot you were sharing is from your browser's dev tools, right? In which part of Kibana did you see that pattern of so many requests for the same fields? thx
@kertal The screenshots were from the dev tools. It was a manual test to simulate multiple requests. I created an example repo that does a "stress test" (only 10 requests) to show the router handles the requests sequentially. git clone it into kibana/plugins and build the plugin. |
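Conceptually, the stress test amounts to something like the sketch below (illustrative only; the Kibana URL, credentials, and index pattern are placeholders, not the actual example repo):

```ts
// Fire N identical requests at once and log when each one resolves. If the
// responses complete back to back at roughly one-request-duration intervals
// rather than all finishing together, they were handled sequentially.
const KIBANA = 'http://localhost:5601';
const PATTERN = '*:logs-*';

async function stressTest(n = 10) {
  const started = Date.now();
  await Promise.all(
    Array.from({ length: n }, async (_, i) => {
      const res = await fetch(
        `${KIBANA}/api/index_patterns/_fields_for_wildcard?pattern=${encodeURIComponent(PATTERN)}`,
        { headers: { Authorization: 'Basic ' + Buffer.from('user:pass').toString('base64') } }
      );
      await res.json();
      console.log(`request ${i} finished after ${Date.now() - started} ms (${res.status})`);
    })
  );
}

stressTest().catch(console.error);
```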
Unfortunately that's exactly what it does. @desean1625 What are you doing in Kibana that kicks off so many requests? The number of fields you're using sounds very reasonable and shouldn't be causing performance issues.
Absolutely. I would expect much faster times based on your description. If you're willing, providing a HAR file that captures the slow loading might be helpful.
You can kick off the requests by clicking on the index pattern in Discover. The popover doesn't close until the request is completed, so you can click multiple times and reinitiate the request. Users do this because it takes 9-35 seconds for the response.
Yeah, I believe browsers do this in case the first request returns cache headers, in which case subsequent requests should be served from the cache (the odd case where caching is actually slower). But it's an interesting point to raise regardless, because if we have instances in Kibana where X number of the same field caps requests are fired at once (we do, unfortunately), then this makes the problem X times worse. It's not the root cause or solution to this performance issue, but since we know the
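Not speaking to Kibana's actual internals, but one common mitigation for the "X identical requests at once" part is to share a single in-flight promise per key; a rough sketch, where `fetchFieldCaps` and the key shape are assumptions:

```ts
// Share one in-flight promise per pattern so identical, concurrent field-caps
// requests collapse into a single network call. Purely illustrative.
const inflight = new Map<string, Promise<unknown>>();

function dedupedFieldCaps(
  pattern: string,
  fetchFieldCaps: (p: string) => Promise<unknown>
): Promise<unknown> {
  const existing = inflight.get(pattern);
  if (existing) return existing;

  const request = fetchFieldCaps(pattern).finally(() => inflight.delete(pattern));
  inflight.set(pattern, request);
  return request;
}
```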
@desean1625 So in order to kick off so many requests, you're selecting different data views before the previous one has finished loading? If that's the case, then we should focus on how we can speed up a particular fields_for_wildcard request.
Which popover? Can you provide a screenshot?
@desean1625 Slow field lists are definitely something that needs to be addressed and that we're actively looking to improve, but as an aside I wonder if some of our planned CCS improvements would also be helpful for this use case: #164350. Not all of the plans have been shared publicly yet, but in general we're looking to give users greater control over their clusters, such as notifying them of problematic/slow clusters and providing quick options in the UI to exclude them. Just curious if it seems like these types of improvements could help the issue from a slightly different angle?
Yes, I believe these planned improvements will help because they fully integrate some of the plugins we have created. Specifically, the capability from our "Advanced cross cluster search" plugin that allows users to globally turn off specific clusters is being implemented in this ticket: #99100. Our implementation didn't cover all cases because it was just a hook into the searchInterceptor, and not everything is routed through @kbn/data-plugin, namely TSVB and other routes that do serverside requests like
@mattkime if you want to simulate what our experience is like, change this line to the following
@desean1625 I think the quickest way to improve your setup is to learn why the field_caps requests are slow. It would be helpful if you could use the kibana dev tools to verify that direct requests to ES take about the same amount of time as the fields_for_wildcard responses via the kibana dev console -
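For example, the two timings could be compared outside the browser with a small script like this (illustrative only; the Elasticsearch/Kibana URLs, credentials, and index pattern are placeholders, not taken from this thread):

```ts
import { Client } from '@elastic/elasticsearch';

// Compare a direct Elasticsearch field_caps call against Kibana's
// _fields_for_wildcard route for the same pattern.
const es = new Client({
  node: 'https://elasticsearch:9200',
  auth: { username: 'user', password: 'pass' },
});
const KIBANA = 'https://kibana:5601';
const PATTERN = '*:logs-*';

async function compare() {
  let t = Date.now();
  await es.fieldCaps({ index: PATTERN, fields: '*' });
  console.log(`direct field_caps: ${Date.now() - t} ms`);

  t = Date.now();
  const res = await fetch(
    `${KIBANA}/api/index_patterns/_fields_for_wildcard?pattern=${encodeURIComponent(PATTERN)}`,
    { headers: { Authorization: 'Basic ' + Buffer.from('user:pass').toString('base64') } }
  );
  await res.json();
  console.log(`_fields_for_wildcard: ${Date.now() - t} ms`);
}

compare().catch(console.error);
```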
@mattkime
Just note that from my local box the Elasticsearch endpoint isn't exposed, so the curl was run from the same server that hosts Kibana, making the curl one less hop. But for 19 kb I wouldn't expect a significant difference.
@desean1625 we're working on that: #167221
So when you're running a curl directly on the Kibana server it takes just 1.5 s, vs 4-16 s when you run it via Console in Kibana in the browser? There shouldn't be so much difference in this case. It's clear that curl is faster here because, as you said, it's one less hop. Could you look at the timing of the request in the browser's dev tools? It's the proxy request in the Network tab. It would be interesting how fast the server responds and how long the Content Download takes, to get more insight into the communication between Kibana and your browser.
(screenshots of request timings for the second, third, and fourth runs)
Thx for sharing. So there seems to be a wide range of response times, but unfortunately this isn't something we can fix, since the request is sent to your CCS cluster and it takes that long until all fields of all CCS instances are returned. Let's aim to fix what you reported initially; I've created an issue for that, #169360, and #167221 should make switching data views fast again.
The initial request is to cache the fields response, so users don't have to wait up to 30 seconds when adding a map layer, switching data views, or trying to build a Lens visualization. This is an issue that plagues all of Kibana and makes it difficult to use. Users cannot do anything but wait.
Here is the basic concept for caching: it accounts for the current user and their associated roles, while always trying to keep the cache current. Maybe you could cache it as a stored object instead of keeping it in memory?
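A rough sketch of that concept (not working Kibana code; the key shape, TTL, and fetch helper are assumptions):

```ts
// Cache field-caps responses per user + roles so field-level security is not
// circumvented, and refresh stale entries in the background so readers get a
// fast, possibly slightly stale answer.
interface CacheEntry {
  fields: unknown;
  fetchedAt: number;
  refreshing: boolean;
}

const TTL_MS = 60_000;
const cache = new Map<string, CacheEntry>();

async function getFields(
  username: string,
  roles: string[],
  pattern: string,
  fetchFields: () => Promise<unknown>
): Promise<unknown> {
  const key = `${username}:${[...roles].sort().join(',')}:${pattern}`;
  const entry = cache.get(key);

  if (entry) {
    // Serve the cached copy immediately; kick off a background refresh if stale.
    if (Date.now() - entry.fetchedAt > TTL_MS && !entry.refreshing) {
      entry.refreshing = true;
      fetchFields()
        .then((fields) => cache.set(key, { fields, fetchedAt: Date.now(), refreshing: false }))
        .catch(() => {
          entry.refreshing = false;
        });
    }
    return entry.fields;
  }

  // First request for this user/pattern pays the full cost once.
  const fields = await fetchFields();
  cache.set(key, { fields, fetchedAt: Date.now(), refreshing: false });
  return fields;
}
```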
We can't do that because of security concerns - different users may get different field lists. I'm exploring caching field requests based on http headers - #168910 |
I took another look at this, and while I think we already addressed most of what was discussed here, one thing is missing. Luckily this shouldn't be too complicated to address: kibana/src/plugins/discover/public/application/main/hooks/utils/change_data_view.ts, lines 28 to 47 in 57b5546
Before the new data view is requested, we should show Discover's loading state.
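Something along these lines, purely as an illustration of the idea (the actual hook, state container API, and names in Discover differ; `setLoading` and `dataViews.get` here are assumptions):

```ts
// Set a loading flag before kicking off the (potentially slow) data view
// fetch, so the UI gives immediate feedback instead of appearing to hang.
async function onChangeDataView(
  nextDataViewId: string,
  setLoading: (loading: boolean) => void,
  dataViews: { get: (id: string) => Promise<unknown> }
) {
  setLoading(true); // show Discover's loading state right away
  try {
    const nextDataView = await dataViews.get(nextDataViewId); // may take seconds on slow CCS
    // ...update the app state with nextDataView...
    return nextDataView;
  } finally {
    setLoading(false);
  }
}
```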
Kibana version:
8.6.2
Describe the bug:
The data view route /api/index_patterns/_fields_for_wildcard causes the UI to not populate for a long time if the cross cluster search includes clusters that are slow to respond. Can a cached version be served while an async process keeps the cache up to date?
@ndmitch311