
Cross Cluster search causes UI to hang while getting cross cluster field names/types #167706

Closed
desean1625 opened this issue Sep 29, 2023 · 28 comments · Fixed by #177240
Assignees
Labels
bug Fixes for quality problems that affect the customer experience :DataDiscovery/fix-it-week impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. loe:small Small Level of Effort Team:DataDiscovery Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL.

Comments

@desean1625
Contributor

desean1625 commented Sep 29, 2023

Kibana version:
8.6.2

Describe the bug:
The data views route /api/index_patterns/_fields_for_wildcard causes the UI not to populate for a long time if a cross cluster search includes clusters that are slow to respond.
Can a cached version be served while an async process keeps the cache up to date?

@ndmitch311

@desean1625 desean1625 added the bug Fixes for quality problems that affect the customer experience label Sep 29, 2023
@botelastic botelastic bot added the needs-team Issues missing a team label label Sep 29, 2023
@jughosta jughosta added the Team:DataDiscovery Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL. label Oct 4, 2023
@elasticmachine
Contributor

Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Oct 4, 2023
@kertal
Member

kertal commented Oct 5, 2023

thx for reporting, I'm interested: are there frozen data tiers included in those CCS searches? Where do you experience the UI hang? In Discover when loading or switching data views? Or when trying to add filters? In KQL? Thx!

@desean1625
Contributor Author

desean1625 commented Oct 6, 2023

Anywhere Kibana is using the data views. So everywhere.
Specifically, what is happening is that the DataViewsService.get method creates a data view, needs the fields to build it, and calls refreshFieldSpecMap.

If the index pattern is *:index-* it needs to get the fields for a wildcard pattern (here in the data_views_api_client) and pushes this to the server to do the request. The Kibana server side then does a cross cluster search and the UI hangs until the results are returned.

This call utilizes the Elasticsearch client, specifically the field caps API, without any options, so the request will take the default requestTimeout of 30000 ms.
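
For illustration, the Elasticsearch JS client accepts per-request transport options, so in principle a shorter timeout could be passed along; a minimal sketch (the node URL and the 5000 ms value are placeholders, not what Kibana does today):

import { Client } from '@elastic/elasticsearch';

async function fetchFieldCapsWithTimeout() {
  // Placeholder connection details; in Kibana the client is provided by core.
  const esClient = new Client({ node: 'https://mycluster:9200' });

  // Same field_caps call, but with an explicit per-request timeout instead of
  // relying on the client's default of 30 000 ms.
  return esClient.fieldCaps(
    { index: '*:index-*', fields: '*' },
    { requestTimeout: 5000 } // transport option accepted by @elastic/elasticsearch
  );
}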

After that it caches the data view client side. So after the first load it is fast, until you open a new tab or refresh.

If one of the remote clusters doesn't have connectivity or is on a slow connection, this call takes 30 seconds every time an index is selected, for every user (hundreds), every time they refresh or open a new tab.

What I was hoping is that the results of the field_caps call could be cached server side in Kibana. This would speed up the call for every user that opens Kibana. The server side caching should be easy to implement here.

@mattkime
Contributor

mattkime commented Oct 6, 2023

We can't serve a cached version because that would potentially circumvent field level security - https://www.elastic.co/guide/en/elasticsearch/reference/current/field-level-security.html

While it's not unusual for cross cluster requests to take longer, this sounds extreme. Is the frozen data tier involved?

We currently have a couple of efforts focused on improving worst case field loading scenarios, but it's hard to tell if this would help your case. Is your problematic cluster just slow, or is it inconsistent? It's tricky to work around an unreliable data source.

@desean1625
Contributor Author

The _fields_for_wildcard call consistently takes between 20 and 38 s and returns only 12.6 kB.

Our ILM only has hot and warm tiers.
We have about 8 clusters, geographically separated, each with 20-80 indices linked to the ILM. The number of fields per index is also high: the rollup of all the fields returns 1178 fields across 382 total indices.

@desean1625
Contributor Author

We can't serve a cached version because that would potentially circumvent field level security - https://www.elastic.co/guide/en/elasticsearch/reference/current/field-level-security.html

I don't think the field metadata circumvents field level security, because no data is being pulled in this call. Unless field level security even prevents users from knowing that a field exists.

@desean1625
Contributor Author

desean1625 commented Oct 6, 2023

This actually appears to be compounded by a deeper issue with the HTTP router handling requests sequentially.

I booted all my users and scaled my Kibana instances down to 1. The request for _fields_for_wildcard took exactly 2.3 seconds. I hit that endpoint a bunch of times and noticed that the requests were handled sequentially.

[screenshot: browser dev tools Network tab showing the _fields_for_wildcard requests handled sequentially]

This means that the more users we have, the worse the problem is (which is why the UI consistently hangs for 20-30 seconds, if not longer).

Even waiting the 2.3 seconds to gather the fields from the remote clusters is too long from a UI/user perspective.
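
For reference, a rough sketch of this kind of manual test from the browser console (the pattern and query string are placeholders, not the exact request Kibana sends):

// Fire 10 identical requests at once and log how long each one takes.
const url = '/api/index_patterns/_fields_for_wildcard?pattern=*:index-*';

await Promise.all(
  Array.from({ length: 10 }, async (_, i) => {
    const start = performance.now();
    await fetch(url, { credentials: 'include' });
    console.log(`request ${i}: ${Math.round(performance.now() - start)} ms`);
  })
);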

@kertal
Member

kertal commented Oct 9, 2023

@desean1625 thx for sharing, we are currently aiming to reduce and optimize the requests for fields. When you have multiple users, the requests for fields should not be handled sequentially per user. The screenshot you shared is from your browser's DevTools, right? In which part of Kibana did you see that pattern of so many requests for the same fields? thx

@desean1625
Contributor Author

desean1625 commented Oct 9, 2023

@kertal The screenshots were from the dev tools. It was a manual test to simulate multiple requests.

I created an example repo that does a "stress test" (only 10 requests) to show the router handles the requests sequentially.
https://github.com/desean1625/kibana_router_test

git clone it into kibana/plugins and build the plugin.

@desean1625
Contributor Author

Disregard the issue with sequential requests. It was being caused by the browser: Edge makes the requests sequential if the URL parameters haven't changed.
Edge: [screenshot showing the requests handled sequentially]
Chrome: [screenshot showing the requests handled in parallel]

@mattkime
Contributor

mattkime commented Oct 9, 2023

Unless the field-level-security even prevents the users from knowing if the field even exists.

Unfortunately that's exactly what it does.

@desean1625 What are you doing in kibana that kicks off so many requests?

The number of fields you're using sounds very reasonable and shouldn't be causing performance issues.

Even waiting the 2.3 seconds to gather the fields from the remote clusters is too long from a ui/user perspective.

Absolutely. I would expect much faster times based on your description.

If you're willing, providing a HAR file that captures the slow loading might be helpful.

@desean1625
Contributor Author

You can kick off the requests by clicking on the index pattern in Discover. The popover doesn't close until the request is completed, so you can click multiple times and reinitiate the request. Users do this because it takes 9-35 seconds for the response.

@davismcphee
Contributor

Disregard the issue with sequential requests. It was being caused by the browser: Edge makes the requests sequential if the URL parameters haven't changed.

Yeah, I believe browsers do this in case the first request returns cache headers, in which case subsequent requests should be served from the cache (the odd case where caching is actually slower). But it's an interesting point to raise regardless, because if we have instances in Kibana where X number of the same field caps requests are fired at once (we do, unfortunately), then this makes the problem X times worse. It's not the root cause or solution to this performance issue, but since we know the _fields_for_wildcard endpoint doesn't use caching, we could avoid making the issue worse by including something like a cache busting query param or Cache-Control header when requesting fields, which should cause the requests to run simultaneously at least.
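
A rough sketch of that idea from the client side (the helper name and extra query param are illustrative, not the actual data views client code, and they assume the endpoint tolerates an unknown query param):

// Make otherwise-identical field requests distinguishable so the browser
// doesn't queue them while waiting to see if the first response is cacheable.
async function fetchFieldsForWildcard(pattern: string): Promise<Response> {
  const params = new URLSearchParams({
    pattern,
    _cb: Date.now().toString(), // hypothetical cache-busting value
  });
  return fetch(`/api/index_patterns/_fields_for_wildcard?${params}`, {
    headers: { 'cache-control': 'no-cache' },
  });
}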

@mattkime
Contributor

mattkime commented Oct 9, 2023

@desean1625 So in order to kick off so many requests, you're selecting different data views before the previous one has finished loading?

If that's the case, then we should focus on how we can speed up a particular fields_for_wildcard request

The popover doesn't close out until the request is completed.

Which popover? Can you provide a screenshot?

@davismcphee
Contributor

@desean1625 Slow field lists are definitely something that needs to be addressed and that we're actively looking to improve, but as an aside I wonder if some of our planned CCS improvements would also be helpful for this use case: #164350. Not all of the plans have been shared publicly yet, but in general we're looking to give users greater control over their clusters, such as notifying them of problematic/slow clusters and providing quick options in the UI to exclude them. Just curious if it seems like these types of improvements could help the issue from a slightly different angle?

@desean1625
Contributor Author

@desean1625 Slow field lists are definitely something that needs to be addressed and that we're actively looking to improve, but as an aside I wonder if some of our planned CCS improvements would also be helpful for this use case: #164350. Not all of the plans have been shared publicly yet, but in general we're looking to give users greater control over their clusters, such as notifying them of problematic/slow clusters and providing quick options in the UI to exclude them. Just curious if it seems like these types of improvements could help the issue from a slightly different angle?

Yes, I believe these planned improvements will help, because they fully integrate some of what our own plugins already do. Specifically, the capability from our "Advanced cross cluster search" plugin that allows users to globally turn off specific clusters is being implemented in ticket #99100.

Our implementation didn't cover all cases because it was just a hook into the searchInterceptor, and not everything is routed through @kbn/data-plugin, namely TSVB and other routes that do server side requests like /api/index_patterns/_fields_for_wildcard.

@desean1625
Contributor Author

Which popover? Can you provide a screenshot?

[screenshot of the data view picker popover in Discover]

@desean1625
Contributor Author

@mattkime if you want to simulate what our experience is like, change this line to the following:

  // Artificial delay of 2-31 seconds before the fields request, to simulate slow remote clusters
  await new Promise((r) => setTimeout(r, (Math.floor(Math.random() * 30) + 2) * 1000));
  const { fields, indices } = await indexPatterns.getFieldsForWildcard({

@mattkime
Contributor

@desean1625 I think the quickest way to improve your setup is to learn why the field_caps requests are slow. It would be helpful if you could verify, via the Kibana dev console, that direct requests to ES take about the same amount of time as the fields_for_wildcard responses:

GET /{index-pattern}/_field_caps?fields=<fields>

@desean1625
Contributor Author

desean1625 commented Oct 11, 2023

@mattkime
curl -k -o /dev/null -s https://elastic:changeme@mycluster:9200/*:myindex-*/_field_caps?fields=* -w "%{time_total}"
1.434182

GET /*:myindex-*/_field_caps?fields=*
4916ms-16305ms

Just note that from my local box the Elasticsearch endpoint isn't exposed, so the curl was run from the same server that hosts Kibana, which is one less hop. But for 19 kB I wouldn't expect a significant difference.

@kertal
Member

kertal commented Oct 13, 2023

Which popover? Can you provide a screenshot?

[screenshot of the data view picker popover in Discover]

@desean1625 we're working on that: #167221

@kertal
Member

kertal commented Oct 17, 2023

@mattkime curl -k -o /dev/null -s https://elastic:changeme@mycluster:9200/*:myindex-*/_field_caps?fields=* -w "%{time_total}" 1.434182

GET /*:myindex-*/_field_caps?fields=* 4916ms-16305ms

Just note that from my local box the Elasticsearch endpoint isn't exposed, so the curl was run from the same server that hosts Kibana, which is one less hop. But for 19 kB I wouldn't expect a significant difference.

So when you're running curl directly on the Kibana server it takes just 1.5 s, vs 4-16 s when you run it via Console in Kibana in the browser? There shouldn't be that much difference in this case. It's clear that curl is faster here because, as you said, it's one less hop. Could you look at the timing of the request in the browser's dev tools? It's the proxy request in the Network tab. It would be interesting to see how fast the server responds and how long the Content Download takes, to get more insight into the communication between Kibana and your browser.

[screenshot: request timing shown in the browser dev tools for a Console request]

@desean1625
Contributor Author

GET /*:myindex-*/_field_caps?fields=*
the timing is as follows

request sent 1.01 ms
Waiting for server response 36.01s
Content Download 58.39ms

second run

request sent 2.64 ms
Waiting for server response 16.21s
Content Download 16.22ms

third run

request sent 1.41 ms
Waiting for server response 2.32s
Content Download 38.98ms

fourth run

request sent 0.96 ms
Waiting for server response 6.68s
Content Download 20.63ms

@kertal
Member

kertal commented Oct 19, 2023

Thx for sharing. So there seems to be a wide range of response times, but unfortunately this isn't something we can fix: the request is sent to your CCS clusters, and it takes that long until the fields of all CCS instances are returned. Let's aim to fix what you reported initially. I've created an issue for that, #169360, and #167221 should make switching data views fast again.

@desean1625
Contributor Author

The initial request is to cache the fields response, so users don't have to wait up to 30 seconds when adding a map layer, switching data views, or trying to build a Lens visualization. This is an issue that plagues all of Kibana and makes it difficult to use. Users cannot do anything but wait.

@desean1625
Contributor Author

Here is the basic concept for caching: it accounts for the current user and their associated roles, while always trying to keep the cache current. Maybe you could cache it as a stored object instead of keeping it in memory?
https://gist.github.com/desean1625/cb6e019a6a3e137468918eef0ad5211d
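
To make the concept concrete, a minimal sketch of the idea as described (not the gist's actual code): entries keyed on the user, their roles, and the pattern, with stale entries refreshed in the background while the cached copy is served.

// Cache field_caps responses per user + roles, refreshing stale entries asynchronously.
interface CacheEntry {
  fields: unknown;
  fetchedAt: number;
}

const cache = new Map<string, CacheEntry>();
const MAX_AGE_MS = 60_000; // illustrative staleness threshold

function cacheKey(username: string, roles: string[], pattern: string): string {
  return `${username}|${[...roles].sort().join(',')}|${pattern}`;
}

async function getFieldsCached(
  username: string,
  roles: string[],
  pattern: string,
  fetchFields: () => Promise<unknown>
): Promise<unknown> {
  const key = cacheKey(username, roles, pattern);
  const entry = cache.get(key);
  if (entry) {
    // Serve the cached copy immediately; refresh it in the background if stale.
    if (Date.now() - entry.fetchedAt > MAX_AGE_MS) {
      fetchFields()
        .then((fields) => cache.set(key, { fields, fetchedAt: Date.now() }))
        .catch(() => {
          // keep the stale entry if the refresh fails
        });
    }
    return entry.fields;
  }
  const fields = await fetchFields();
  cache.set(key, { fields, fetchedAt: Date.now() });
  return fields;
}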

@mattkime
Contributor

Maybe you could cache it as a stored object instead of keeping it in memory?

We can't do that because of security concerns - different users may get different field lists.

I'm exploring caching field requests based on HTTP headers - #168910
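
As a generic illustration of header-based caching (plain Node, not the actual Kibana route code): a Cache-Control: private, max-age response lets each user's own browser reuse their own field list for a short time, with no shared cache that could leak fields across users. loadFieldsForCurrentUser below is a hypothetical stand-in for the real per-user field_caps call.

import { createServer, IncomingMessage } from 'node:http';

// Hypothetical stand-in for the real per-user field_caps lookup.
async function loadFieldsForCurrentUser(_req: IncomingMessage): Promise<unknown> {
  return { fields: [] };
}

const server = createServer(async (req, res) => {
  if (req.url?.startsWith('/fields')) {
    const body = JSON.stringify(await loadFieldsForCurrentUser(req));
    res.writeHead(200, {
      'content-type': 'application/json',
      // "private": only the end user's browser may cache; "max-age": reuse for 5 minutes.
      'cache-control': 'private, max-age=300',
    });
    res.end(body);
    return;
  }
  res.writeHead(404);
  res.end();
});

server.listen(3000);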

@davismcphee davismcphee added loe:large Large Level of Effort impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. and removed feedback_needed labels Jan 4, 2024
@kertal
Member

kertal commented Feb 15, 2024

I took another look at this, and while I think we already mostly addressed what was discussed here, one thing is missing.
When a very slow data view is initially fetching its fields, we don't display any loading indication, so the interface doesn't change and the user will wonder what is happening.

Luckily this shouldn't be too complicated to address:

  services,
  internalState,
  appState,
}: {
  services: DiscoverServices;
  internalState: DiscoverInternalStateContainer;
  appState: DiscoverAppStateContainer;
}
) {
  addLog('[ui] changeDataView', { id });
  const { dataViews, uiSettings } = services;
  const dataView = internalState.getState().dataView;
  const state = appState.getState();
  let nextDataView: DataView | null = null;
  try {
    nextDataView = typeof id === 'string' ? await dataViews.get(id, false) : id;
  } catch (e) {
    //
  }

Before the new data view is requested, we should show Discover's loading state.
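
One possible shape for this, sketched against the snippet above; setDataViewLoading is an assumed state transition here, not the actual Discover API:

  // Show a loading indicator before the potentially slow fields fetch,
  // and clear it once the data view has been resolved (or failed to resolve).
  internalState.transitions.setDataViewLoading(true); // hypothetical transition
  let nextDataView: DataView | null = null;
  try {
    nextDataView = typeof id === 'string' ? await dataViews.get(id, false) : id;
  } catch (e) {
    // swallow, as the existing code does
  } finally {
    internalState.transitions.setDataViewLoading(false);
  }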

@kertal kertal added loe:small Small Level of Effort :DataDiscovery/fix-it-week and removed loe:large Large Level of Effort labels Feb 15, 2024
@kertal kertal self-assigned this Feb 20, 2024