
[RFC] Search Query Sandboxing: User Experience #12342

Open
kiranprakash154 opened this issue Feb 15, 2024 · 24 comments
Labels: discuss, enhancement, experimental, RFC, Roadmap:Stability/Availability/Resiliency, Search:Resiliency, Search, v2.17.0, v2.18.0

Comments

@kiranprakash154
Contributor

kiranprakash154 commented Feb 15, 2024

Co-Author @kkhatua

#8879 - [RFC] High Level Vision for Core Search in OpenSearch
#11061 - [RFC] Query Sandboxing for search requests

A common challenge in managing resources on OpenSearch clusters has been keeping runaway queries in check. Search Backpressure now makes it possible to avoid running out of resources, but there is no way to protect tenants who might be unfairly penalized. The goal of this RFC is to propose a mechanism by which admin users can organize tenants into groups (aka Sandboxes) and limit the cumulative resource utilization of those groups. We will mention an idea of how a sandbox is enforced, but enforcement will likely be a separate RFC due to its complexity.

What is a Sandbox?

A sandbox is a logical construct for running search requests within virtual limits. The sandboxing service will track and aggregate a node's resource usage across sandboxes and, based on the mode of containment, limit or terminate the tasks of any sandbox that exceeds its limits.
A sandbox's definition is stored in the cluster state, so the limits that need to be enforced are available to every node in the cluster.

Tenets

  1. A sandbox can provide constraints on one or more resources.
  2. A user may hint a preference for a sandbox.
  3. A request can map to only one sandbox.
  4. A request that maps to multiple sandboxes will be assigned to the sandbox with the highest similarity score.
  5. A request that maps to no sandbox, or to multiple sandboxes with the same similarity score, will be assigned to a system-defined catch-all sandbox called genPop (general population).
  6. The mapping attributes of a sandbox must be unique compared to other sandboxes.
  7. The sandbox thresholds for a resource cannot cumulatively exceed 100%.

Schemas

Sandbox Definition

The following is an abstract example of a sandbox's definition, broadly broken into 4 essential elements within the document:

{
    "name": "<nameOfSandbox>",
    "attributes": {
        "attributeA": "<condition1,condition2>",
        ...
    },
    "resources": {
        "resourceX": {
            "allocation": "<floatValue:(0-1)>", 
            "threshold": "<floatValue:(0-1)>" 
        },
        ...
    },    
    "enforcement": "<monitor|soft|hard>"
}
  • name : a unique String identifier; internally a UUID is used
  • attributes : a collection of unique attributes that a request must match to be governed by that sandbox. We are considering user_name, indices_name, role, and custom_id
  • resources : the set of unique resources for which this sandbox is being defined. We are considering jvm and cpu
  • enforcement : how the sandbox is enforced. This can be either monitor, soft, or hard

Resource Definition

For each resource, a cluster-level schema is also required. The following is an abstract example:

{
    "resources": {
        "resourceX": {
            "threshold": {
                "rejection" : "<floatValue:(0-1)>",
                "cancellation" : "<floatValue:(0-1)>"
            },
            "enforcement": "<monitor|soft|hard>"
        },
        ...
    }
}
  • resources : the limits of a resource within which all sandboxes must cumulatively operate before requests are rejected or cancelled; rejectionThreshold <= cancellationThreshold
  • enforcement : how containment of sandboxes for this resource is managed. This can be either monitor, soft, or hard, and can potentially override a sandbox's own enforcement mode.
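For concreteness, a filled-in version of the cluster-level resource definition might look like the following. The specific threshold values are illustrative assumptions, not recommendations from the proposal; note that each resource keeps rejection <= cancellation.

```json
{
    "resources": {
        "jvm": {
            "threshold": {
                "rejection": "0.85",
                "cancellation": "0.95"
            },
            "enforcement": "soft"
        },
        "cpu": {
            "threshold": {
                "rejection": "0.80",
                "cancellation": "0.90"
            },
            "enforcement": "monitor"
        }
    }
}
```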

High Level Flow

A request landing on a coordinator node will first need to be mapped to a sandbox, as per the tenets. Once a sandbox has been mapped, all child tasks spawned from the request will inherit the sandbox allocation, irrespective of which node they run on.

[Figure: High Level Flow]

Sandbox Resolution

The sandbox resolution happens on the coordinator node and will persist with all the tasks (and child tasks) of that request for the entirety of its lifecycle.

[Figure: Sandbox resolution]

Thresholds Enforcement

As the sandboxes are enforced at a node level, for each resource, there are 2 thresholds defined cluster-wide:

  • rejection — Reject new tasks if this threshold is breached
  • cancellation — Cancel inflight tasks if this threshold is breached
[Figure: Thresholds life cycle]

Each sandbox's thresholds are always proportional to the node-level thresholds.

[Figure: Sandbox thresholds are proportional to node thresholds]

The following is an example where:

  • Two sandboxes have 20% and 25% thresholds for a resource, and the rest (100% - 20% - 25% = 55%) is part of the general population (aka genPop) sandbox.
  • If a node exceeds 85% usage for a resource, it will actively reject new tasks destined for the overflowing sandboxes.
  • If resource usage exceeds 95%, the node will actively cancel inflight tasks in the overflowing sandboxes.
[Figure: All sandboxes together]
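The worked example above can be expressed as a small calculation. This is a sketch under the assumption that a sandbox's node-local thresholds scale linearly with its allocation; the 85%/95% node-wide values are taken from the example.

```python
# Sketch: effective per-sandbox thresholds on a node, assuming each
# sandbox's thresholds are its allocation scaled by the node-wide
# rejection/cancellation thresholds. Values follow the example above.

NODE_REJECTION = 0.85
NODE_CANCELLATION = 0.95

allocations = {"sandboxA": 0.20, "sandboxB": 0.25}
# Whatever is left over belongs to the catch-all genPop sandbox.
allocations["genPop"] = 1.0 - sum(allocations.values())

def effective_thresholds(allocation):
    return {
        "rejection": allocation * NODE_REJECTION,
        "cancellation": allocation * NODE_CANCELLATION,
    }

for name, alloc in allocations.items():
    t = effective_thresholds(alloc)
    print(name, round(t["rejection"], 4), round(t["cancellation"], 4))
```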

Additional Context

This RFC aims to discuss ideas at a high level. More details are provided in this Google doc. Anyone with the link has comment access, and we would love to gather feedback.

@kiranprakash154 kiranprakash154 added enhancement Enhancement or improvement to existing feature or request untriaged labels Feb 15, 2024
@github-actions github-actions bot added the Search Search query, autocomplete ...etc label Feb 15, 2024
@kiranprakash154 kiranprakash154 self-assigned this Feb 15, 2024
@kiranprakash154 kiranprakash154 added the RFC Issues requesting major changes label Feb 15, 2024
@kiranprakash154
Contributor Author

kiranprakash154 commented Feb 15, 2024

@msfroh @dblock @reta @sohami @Bukhtawar @nknize would love your thoughts !

@kiranprakash154 kiranprakash154 added the discuss Issues intended to help drive brainstorming and decision making label Feb 15, 2024
@msfroh
Collaborator

msfroh commented Feb 19, 2024

Could we piggy-back on the idea of "views" to enforce sandboxes? A view could be associated with a sandbox.

Alternatively, if we wanted to be more flexible, a view could enforce a search pipeline, then the search pipeline could (conditionally) modify the request to associate it with a sandbox.

@peternied
Member

[Triage - attendees 1 2 3 4 5]
@kiranprakash154 Thanks for creating this issue

@msfroh
Collaborator

msfroh commented Feb 21, 2024

My next few messages contain notes from Search Backlog and Triage meeting: https://www.meetup.com/opensearch/events/298411954/. The @ references refer to the people who said the respective lines.

  • @kkhatua: This is a feature aimed at sysadmins. We want to allow them to segregate their user classes and ration resources per class.
  • @kiranprakash154: Enforcement has three levels: Monitor, soft, and hard enforcement. Monitor just gives you stats. Soft lets queries in until the node is under duress. Hard limit starts canceling queries when the sandbox is exceeded, whether the node is in duress or not.

Example from the Google doc that helps clarify things more than the example above:

Two Sandboxes defined for memory and CPU

[
 { 
   "name": "analysts",
   "attributes": {
       "role": "ba-*"
   },
   "resources": {
       "jvm": {
           "allocation": "0.5"
       },
       "cpu": {
           "allocation": "0.3"
       },
   },   
   "enforcement": "soft"
},
{
   "name": "operations",
   "attributes": {
       "role": "tellers",
       "indices_name": "transactions-*"
   },
   "resources": {
       "jvm": {
           "allocation": "0.25"
       }
   },   
   "enforcement": "hard"
 }
]

@reta
Collaborator

reta commented Feb 21, 2024

So I think the usage of the sandbox concept is confusing here: we are not sandboxing anything (we cannot limit resources); we track resource usage and react to it. Better names would be "resource limits group", "query prioritization group", or something along those lines.

@andrross
Member

andrross commented Feb 21, 2024

Linking #1017 as it seems somewhat related

@msfroh
Collaborator

msfroh commented Feb 21, 2024

  • Stavros: Is this here to prevent long-running queries? What if I have a long-running query and want to run it without impacting others. Can I suspend the query?
    • @kkhatua -- Not really. Existing search back pressure will cancel the oldest query if the system is under duress. If another user comes in and spams a lot of queries, we may want to stop those queries and let the long-running query finish.
  • @ansjcy: What do you mean by "usage"? Queries span multiple nodes. Which node do we measure usage from?
    • @kiranprakash154: We assign a query ID when it comes in and track the query across all nodes.
    • @jainankitk: We don't need to track resource usage across all nodes. If we have a 10% limit on CPU usage, we enforce that on every node.
  • @reta: Sandboxing probably isn't the best name. It's more around limits and monitoring. @getsaurabh02: Should probably refer to it as query prioritization.
  • @kiranprakash154: We map requests to sandboxes based on attributes. If a query maps to more than one sandbox, we map it to a default sandbox.
  • @msfroh: Putting sandboxing preference in the hands of users seems unnecessarily complex. Only sysadmins should need to worry about resource utilization.
    • @peternied: Users didn't want the rules in place. Operators want the rules in place.
    • @kkhatua: What if users want to run a load test or something?

@msfroh
Collaborator

msfroh commented Feb 21, 2024

  • @macrakis: This seems like a very low level mechanism. If I'm a sysadmin, I would like to guarantee some performance characteristics, like some queuing system. Classic operating system behavior. How are users supposed to know when they can run their long-running queries effectively?
    • @peternied: Do we want to separate individual users?
    • @macrakis: This is a bottom-up mechanism. We need to think about this top-down. How are users going to manage long-running queries? We don't have a way of putting queries to sleep, but maybe we need to think about how to do that in future. This feels like a very implementation-oriented approach, rather than a functionality-oriented approach.

@msfroh
Collaborator

msfroh commented Feb 21, 2024

  • @kkhatua: Example usage is folks who change their dashboard to refresh every 30 seconds, down from 5 minutes.
    • @macrakis: We can do that by limiting users to different levels. Allocate resources equitably between users. Each of 5 users should be treated equally well. What does equally mean? We can debate that. Maybe the 5th user runs a long-running query that can wait. We try to give them equal access to the system.
    • @kkhatua: This is focused on containing users who can impact the system. Big users don't seem to be concerned with fair allocation of resources, but containing users who are overusing the system.

@msfroh
Collaborator

msfroh commented Feb 21, 2024

  • @kkhatua: Expectation is that sysadmins will first run in monitoring mode, and then set soft/hard limits based on the observed steady state.

@getsaurabh02
Member

Thanks for the RFC @kiranprakash154 and notes @msfroh . Regarding the term "Sandbox" used in the proposal, I'd like to offer another perspective. The term might not fully capture the essence of our approach, as we are not actually pausing or decelerating query execution to manage system load, which is often associated with a sandbox environment. Instead, we're implementing a more granular strategy on the lines of query-level circuit breakers for various resources, such as CPU and memory. This method ensures more precise control over resource allocation.

Additionally, incorporating a priority system for query execution, along with the capability to cancel queries when resource usage exceeds certain thresholds, seems to align with the principles outlined in this proposal.

@peternied
Member

Thanks for writing this all up and taking the time to present the document today. The level of detail and consideration for low level scenarios shows through. In the context of advancing this proposal - I would recommend creating a proof of concept and publishing it as a draft pull request. This approach will not only help us visualize the proposed mechanisms but also enable a hands-on exploration of scenarios that significantly impact users.

As we move forward I would suggest probing into the following areas:

  • How much effort and understanding is needed by a sysadmin to enable and manage sandboxes? A poorly tuned threshold can make the cluster unavailable; it's important to experiment with how intuitive this process is, especially with dynamically changing workloads.
  • Clusters evolve over time by scaling horizontally (additional nodes) or vertically (larger instances). How does changing these constraints impact sandboxing? What needs to be built so that sandboxing works without manual intervention after a scaling event?
  • There is overhead in additional monitoring systems and in keeping 'head room' in thresholds, which directly impacts billing. How can we make sure that the cluster is tuned to be responsive without underutilizing resources?
  • When an enforcement action is performed (queries are rejected or canceled), the feedback loop for the query author is crucial. How are users informed of these actions, and what guidance are they given to help them adjust their queries?
  • Specifically in OpenSearch Dashboards, how these enforcement actions are communicated to the end user is key to a smooth user experience. What methods will be used to handle these actions gracefully and avoid user confusion?

@jainankitk
Collaborator

@peternied - Thank you for reviewing the proposal and providing detailed feedback. Please find responses to some of the questions below:

Clusters evolve over time by scaling horizontally (additional nodes) or vertically (larger instances), how does changing these constraints impact sandboxing. What needs to be built so that after a scaling event sandboxing works without manual intervention?

Scaling the cluster horizontally or vertically will not impact sandboxing, as the constraints are defined such that they are agnostic to these events.

There is overhead with additional monitoring systems and keeping 'head room' in thresholds that directly impacting billing. How can we make sure that the cluster is well tuned to be responsive without underutilization of resources?

Underutilization is an interesting aspect and something we discussed at length while coming up with the proposal. Every sandbox has the option of a soft enforcement mode that allows it to exceed its allocated quota, assuming the node is not under duress. If the node is under duress, the sandbox will be hard-enforced to its pre-allocated quota, since we want to prevent the node from going down at any cost.
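The monitor/soft/hard distinction described in this thread can be sketched as a simple decision rule. This is a minimal sketch of the behavior as described, not the actual implementation; the function name and parameters are hypothetical.

```python
# Sketch of the enforcement decision described above: a "soft" sandbox
# may exceed its quota while the node is healthy but is clamped to its
# quota once the node is under duress; a "hard" sandbox is always held
# to its quota; "monitor" only records stats. Names are illustrative.

def should_contain(enforcement, over_quota, node_in_duress):
    if enforcement == "monitor":
        return False                      # observe only, never reject/cancel
    if enforcement == "hard":
        return over_quota                 # enforce the quota unconditionally
    if enforcement == "soft":
        return over_quota and node_in_duress
    raise ValueError(f"unknown enforcement mode: {enforcement}")
```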

When an enforcement an action is performed (queries are rejected or canceled), the feedback loop for the query author is crucial. How are users informed of these actions and what guidance are they provided to help them adjust their queries?

While we will not provide immediate guidance on adjusting queries, the error message should clearly inform users of the reason for rejection. In later phases, we can leverage query insights to provide actionable feedback on making a query more efficient.

@macrakis

macrakis commented Feb 21, 2024 via email

@Bukhtawar
Collaborator

I echo @reta's thoughts and had similar comments on labelling this as "sandbox", since this is more a tenant-specific resource limit. While this is a good starting point, we need to see how we tie in the bigger picture with other multi-tenant capabilities like:

  1. Index level encryption
  2. Tenant specific shard placement for index patterns where users can choose to allocate indices on certain node groups for better query isolation
  3. Prioritised queues to run some queries in the background based on constraints.

@kaushalmahi12
Contributor

Thanks @macrakis for providing these scenarios

  • I have a steady stream of small live queries (Web users) where I want to bound latency (mean < 100ms, P99 < 1000ms, Pmax < 2000ms).

The 1st use case is more about end-to-end latency, which is affected by multiple factors and involves load characteristics across multiple nodes. Since this feature introduces node-level resiliency, that case doesn't fall under the purview of this RFC.

  • I mix in some longer-running, lower-priority queries which may use a lot of resources, but which can be re-run if they fail.

This one hints at async completion of cancellable queries at the node level, which is something we can support in a 2nd iteration. Piggybacking too many things on this RFC will only clutter and complicate its delivery.

  • I am also performing updates and I don’t want my search catalog to lag more than 10 sec behind the data feed.

This is controlled using the index.refresh_interval setting, if I'm right, so it is already available.

@kaushalmahi12
Contributor

kaushalmahi12 commented Apr 2, 2024

Thanks @Bukhtawar for taking the time to provide your insightful thoughts on this. Naming is something we can all come to an agreement on and then use throughout.

Regarding tying this feature with other multi-tenant capabilities

Index level encryption

Not sure how this one correlates with node-level resiliency; can you be more specific?

Tenant specific shard placement for index patterns where users can choose to allocate indices on certain node groups for better query isolation

Given this is a node-level resiliency feature, and the sharding mechanism distributes data fairly well, tenant-specific placement could potentially create skewness in data distribution. Since we will track and limit resource usage at the node level for the search workload, this will work fine as long as those query groups are not oversubscribing the resources. We can only isolate search queries when indexing traffic is not shared with search traffic.

Prioritised queue to run some queries in the background based on constraints.

I suppose you are hinting at async completion of cancellable queries, or do you mean something else?

Since we are not introducing a priority concept for these query groups in the first iteration, we can take this up later.

Regarding a name other than Sandbox for this feature, what should we go with? (a few initial suggestions:)

  1. QueryGroupResourceBucket
  2. QuerySquadron
  3. QueryConvoy

@stephen-crawford
Contributor

Hi @kaushalmahi12, @jainankitk, @kiranprakash154 I just wanted to check in over here.

I know we've chatted a bit about this and I want to share that I am neither for/nor against making this or any of the related changes from a code perspective at the moment. I went ahead and left some basic comments on @kaushalmahi12's draft and don't have too many suggestions at this point.

It seems like there is some confusion around the use case for this feature; while I appreciate how detailed this RFC is (wow!), could you point to some other issues or comments that ask for something like this?

@smacrakis

Looking this over again, several of us have pointed out that "sandbox" is probably the wrong name. Besides implying incorrectly that it has something to do with testing or with security, it doesn't explicitly mention that it is about resources or groups of users.
I like @reta's suggestion of "resource limits group".

@kaushalmahi12
Contributor

kaushalmahi12 commented Jun 19, 2024

Based on discussion with folks, we decided to move the APIs to a plugin for the following reason:

  • Not to pollute the core code with feature-specific APIs

We will make it a core plugin for the following reasons:

  1. The feature sits mostly in core, as the tracking and cancellation logic will reside there.
  2. The fundamental construct for this feature, QueryGroup, also resides in core because of the tracking and cancellation.
  3. It would be hard to correlate the feature if the changes are partially in core and partially in the plugin.
  4. We don't expect iterative development on the plugin side.

@andrross
Member

@kaushalmahi12 Should it be a module? If the reasons to separate it from the core code are for modularity/architectural reasons, but the majority of the feature actually sits in the core, then it sounds like a module might be a better fit. There are two major differences between a module and a plugin:

  • Modules use the same classloader as core, whereas each plugin gets its own classloader
  • Modules are installed by default, with no reasonable way for users to opt out of them

This plugin appears to have no new dependencies, so I'm not sure the classloader difference is important. The other point is an important difference though. Are there cases where you would not want this feature to be installed?

@kaushalmahi12
Contributor

@andrross These are important factors to consider when deciding whether to make the feature a plugin or a module. I think it might not make sense for all users, since this feature will need explicit enablement, so the code might sit dead in the artifact.
If that is fine with the community, then it should be fine to move it to a module.
@andrross What are your thoughts on this?

@andrross
Member

code might sit dead in the artifact

Core plugins are in the min distribution artifact. They aren't loaded into the JVM unless installed, but the overhead of loading these classes is likely negligible.

feature will need enablement

If the act of installing the plugin is the enablement mechanism, then a plugin is probably the right place; otherwise you'd have to build some other enabling mechanism.


@msfroh What do you think regarding plugin vs module here?

@kkhatua kkhatua moved this from New to In Progress in OpenSearch Roadmap Aug 26, 2024
@kkhatua kkhatua added the v2.18.0 Issues and PRs related to version 2.18.0 label Oct 7, 2024
@kkhatua kkhatua added the experimental Marks issues as experimental label Oct 28, 2024