[Searchable Remote Index] Add safeguards to ensure a cluster cannot be over-subscribed #7033

Closed
andrross opened this issue Apr 6, 2023 · 2 comments · Fixed by #8208 or #8606
Labels: bug, distributed framework

Comments

@andrross
Member

andrross commented Apr 6, 2023

A searchable snapshot index consumes cluster resources and requires local disk space to store cached data. Storage is therefore not infinite: there are limits beyond which the user experience degrades to unacceptable levels. The goal of this task is to add safeguards that maintain a reasonable user experience.

We should implement limits to ensure the cluster stays healthy and cannot be unwittingly pushed into an unusable state. In particular, there are two areas to investigate here:

  • (Broken out as separate issue) At search time, if there are enough concurrent searches that there is no capacity remaining in the reserved disk cache, then searches should be rejected with an appropriate error message.
  • At searchable snapshot index creation time, we should implement limits and reject the requests if the cluster capacity is exceeded. This can likely be implemented as a ratio of cluster-based disk cache size to total data size in the remote repository.
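To make the first bullet concrete, here is a minimal sketch of rejecting work once the reserved disk cache has no capacity left. It is an illustration only, not OpenSearch code: `ReservedDiskCache`, `CacheCapacityExceeded`, `reserve`, and `release` are hypothetical names, and the assumption is that each search reserves the bytes it needs before reading and releases them afterwards.

```python
import threading


class CacheCapacityExceeded(Exception):
    """Hypothetical error returned to the caller when a search is rejected."""


class ReservedDiskCache:
    """Hypothetical bookkeeping for a fixed-size disk cache shared by searches."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.in_use_bytes = 0
        self._lock = threading.Lock()  # searches run concurrently

    def reserve(self, nbytes: int) -> None:
        """Claim cache space for a search, or reject it with a clear message."""
        if nbytes < 0:
            raise ValueError("nbytes must be non-negative")
        with self._lock:
            if self.in_use_bytes + nbytes > self.capacity_bytes:
                free = self.capacity_bytes - self.in_use_bytes
                raise CacheCapacityExceeded(
                    f"search rejected: needs {nbytes} bytes of disk cache, "
                    f"only {free} of {self.capacity_bytes} available"
                )
            self.in_use_bytes += nbytes

    def release(self, nbytes: int) -> None:
        """Return cache space once the search no longer needs it."""
        with self._lock:
            self.in_use_bytes = max(0, self.in_use_bytes - nbytes)
```

Failing fast at reservation time keeps the error message specific to the search that could not be served, rather than letting every concurrent search slow down.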

Note that there is future work to implement features that give more of an "infinite" storage experience (at the cost of higher search latencies), but searchable snapshots as implemented today keep some metadata loaded so that data is readily searchable, and are therefore subject to some limits.

@andrross andrross added bug Something isn't working untriaged labels Apr 6, 2023
@anasalkouz anasalkouz moved this to Todo in Concurrent Search May 2, 2023
@anasalkouz anasalkouz changed the title [Searchable Snapshots] Add safeguards to ensure a cluster cannot be over-subscribed [Searchable Remote Index] Add safeguards to ensure a cluster cannot be over-subscribed May 4, 2023
@kotwanikunal kotwanikunal self-assigned this May 10, 2023
@kotwanikunal
Member

kotwanikunal commented May 11, 2023

Digging a bit into this issue -

> This can likely be implemented as a ratio of cluster-based disk cache size to total data size in the remote repository.

I think this will have to be a function of sum_of_all(restored_index_size) and sum_of_all(disk_cache_size).

When restoring an index, we will check whether sum_of_all(restored_index_size) + requested_restored_index_size stays within the configured ratio of sum_of_all(disk_cache_size), i.e. total_disk_cache_size.

The size of the remote repository can be misleading: the user might want to restore only a single index, and a cluster can have multiple repositories.
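A rough sketch of the restore-time check described in this comment, written in Python for brevity. Everything named here is an assumption for illustration: `CacheNode`, `can_restore`, and the `max_data_to_cache_ratio` parameter are hypothetical and do not correspond to an existing OpenSearch API or setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheNode:
    """Hypothetical view of one node that contributes disk cache."""
    name: str
    disk_cache_size_bytes: int


def can_restore(
    restored_index_sizes: list[int],
    requested_index_size: int,
    nodes: list[CacheNode],
    max_data_to_cache_ratio: float,
) -> bool:
    """Return True if the requested restore keeps the cluster within bounds.

    The check compares already-restored data plus the requested index
    against the total disk cache scaled by an assumed, configurable ratio.
    """
    total_disk_cache_size = sum(n.disk_cache_size_bytes for n in nodes)
    if total_disk_cache_size == 0:
        # No cache anywhere in the cluster: nothing can be restored safely.
        return False
    total_restored = sum(restored_index_sizes) + requested_index_size
    return total_restored <= max_data_to_cache_ratio * total_disk_cache_size
```

For example, with two nodes of 100 units of cache each and a ratio of 5, the cluster admits up to 1000 units of restored data; a restore that would raise the total from 800 to 950 passes, while one that would raise it to 1050 is rejected. Basing the check on restored index sizes rather than repository size sidesteps the single-index and multi-repository cases mentioned above.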

@xiaoshi2013

good
