[RFC] Enhancements for OSB Workloads #253
Comments
The bullets in the implementation notes cover a lot of cases, which is great! I just wanted to call out some issues that I've encountered recently (which I would love to see covered by future benchmarks):
@msfroh Those are all very good points that will need to be kept in mind as the coverage of data types is improved. We'll initially focus on adding queries for the data types that are not currently exercised, and then add others for cases like the ones you mention above.
Customer-representative workloads. The workloads included with OSB were inherited from ESRally. They were based on publicly available data corpora and sample sets of queries developed by the authors to demonstrate how the tool might be used. They don't necessarily represent real-life user queries.
The title and the last disclaimer run counter to each other. Can you clarify whether this represents customer workloads (representative, not the exact same queries per se), so we can estimate real-world performance?
@nandi-github The current workload is generated using public data, but it doesn't necessarily reflect customer workloads, as it's not generated using actual customer data. It's almost impossible to create one workload that will be representative of all customers, but a combination of these generic workloads along with other workloads focused on the additional areas mentioned in this RFC would help cover the broader surface area.
Some of the fields should have very high cardinality, while others should be repetitive.
@gashutos, the current implementation for …
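To make the cardinality mix requested above concrete, here is a minimal Python sketch (our illustration, not part of any existing OSB workload) that generates synthetic documents combining one very-high-cardinality field with one highly repetitive field. The field names and distributions are placeholder assumptions.

```python
import json
import random
import uuid

# Hypothetical field values; a real workload would mirror an actual mapping.
STATUS_VALUES = ["ok", "error", "timeout"]  # low-cardinality, repetitive

def generate_docs(count):
    """Yield documents mixing a very-high-cardinality field (request_id)
    with a highly repetitive one (status)."""
    for _ in range(count):
        yield {
            "request_id": str(uuid.uuid4()),         # ~unique per document
            "status": random.choice(STATUS_VALUES),  # repeats constantly
            "latency_ms": random.randint(1, 5000),
        }

if __name__ == "__main__":
    # Emit newline-delimited JSON, the format OSB corpora use.
    for doc in generate_docs(10):
        print(json.dumps(doc))
```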
I remembered something from one of the benchmark systems I worked with previously: it would let you measure both red-line QPS and latency under normal load. Essentially, it would benchmark in two phases:
Usually a change that reduced the work involved in running a query would help both the red-line QPS and the observed latencies. Something like concurrent search would help observed latency while hurting red-line QPS (unless it falls back to single-threaded execution under heavy load). I don't know enough about OSB to know whether doing something like the above would require a special workload or changes to OSB itself.
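To make the two-phase idea concrete, the following is a minimal Python sketch of the methodology, independent of OSB itself: phase one pushes unthrottled traffic to estimate red-line QPS, and phase two replays traffic at a fixed fraction of that rate while recording latencies. The query_cluster stub, the single-client loop, and the 50% load factor are placeholder assumptions; a real harness would drive many concurrent clients.

```python
import statistics
import time

def query_cluster():
    """Placeholder: issue one search request to the target cluster."""
    time.sleep(0.002)  # stand-in for real network and search time

def measure_redline_qps(duration_s=10.0):
    """Phase 1: send requests as fast as possible and count completions.

    Single client for brevity; a real harness would use many concurrent
    clients to saturate the cluster.
    """
    count, deadline = 0, time.monotonic() + duration_s
    while time.monotonic() < deadline:
        query_cluster()
        count += 1
    return count / duration_s

def measure_latency_at(qps, duration_s=10.0):
    """Phase 2: throttle to a fixed QPS and record per-request latency."""
    latencies, interval = [], 1.0 / qps
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        start = time.monotonic()
        query_cluster()
        latencies.append(time.monotonic() - start)
        time.sleep(max(0.0, interval - (time.monotonic() - start)))
    return latencies

if __name__ == "__main__":
    redline = measure_redline_qps()
    # "Reasonable load" chosen here as 50% of red-line; the right
    # fraction depends on the headroom policy being modeled.
    lat = measure_latency_at(qps=redline * 0.5)
    print(f"red-line ~{redline:.0f} QPS, "
          f"p50 {statistics.median(lat) * 1000:.1f} ms at 50% load")
```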
@msfroh Good suggestion. I agree with your comments. It still needs the query traffic pattern to be defined; otherwise it is too subjective.
@msfroh Are you looking for a gradual ramp-up feature to identify the max threshold of a cluster, especially when running with multiple concurrent clients?
That's part of it. Step 1 is to identify that max QPS threshold (i.e., what is the capacity of the cluster?). Then step 2 is to run traffic (with multiple clients) at some proportion of max QPS, to simulate "reasonable load" and measure latency under "normal conditions" (since a well-run system won't run at 100% all the time -- what happens when a data center goes down?). As @nandi-github mentioned above:
In the project where we were doing this kind of benchmarking, we had hundreds of thousands of queries from our production logs. While different queries had different latencies and put different loads on the system, the overall pool was big enough that any random 10k queries probably had about the same distribution of costs.
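A hedged sketch of that sampling approach: draw a uniform random subset from a large query log, so the sample's cost distribution approximates the full pool's. The file path and sample size below are placeholders.

```python
import random

def sample_queries(path, k=10_000, seed=42):
    """Uniformly sample k queries from a newline-delimited query log.

    With a large enough pool, a uniform sample of this size should have
    roughly the same cost distribution as the full log.
    """
    with open(path) as f:
        pool = [line.strip() for line in f if line.strip()]
    random.seed(seed)  # fixed seed keeps benchmark runs repeatable
    return random.sample(pool, min(k, len(pool)))

# Hypothetical usage:
# queries = sample_queries("production-queries.log")
```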
@msfroh, this is a feature other folks have been interested in as well, although the nuances of how it would work have been described in various ways. There is an issue that touches on adding the capability for OSB to auto-scale and ramp up the intensity of the workload. As you mentioned, this would need to take into account the differences between operations; for instance, a complex aggregation behaves quite differently from a term query. Scaling up will also need to take into account the capabilities of distributed workload generation. Once the features requested in the issue above are implemented, that will be the right point to address your request.
Related to #199 and opensearch-project/neural-search#430, I think we have recently been adding many more options in OpenSearch for improving search relevance. Users can use neural queries, neural sparse queries, custom re-rankers, etc. to achieve better-quality results. On top of this, on the ML side, they may have several models to choose from for each configuration. With all of those options, it can be really hard for users to converge on an optimal configuration for search. While OSB lets users get metrics around performance (latency, throughput, etc.), they are unable to determine the search-relevance benefits they are getting. I know it may not make sense to measure relevance while running throughput/latency-focused steps, due to the overhead, but I think it would make sense to incorporate certain steps dedicated to relevance in the dedicated search workloads. That way, a user only needs to run one workload to understand the tradeoffs with respect to performance and relevance. Please let me know if it makes sense to discuss this in a separate issue.
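As an illustration of what a relevance-focused step could report, here is a minimal NDCG@k computation in Python. The judgment map and result lists are hypothetical; a real integration would pull judgments from a curated set and doc IDs from the OSB search responses.

```python
import math

def dcg(grades):
    """Discounted cumulative gain over a ranked list of relevance grades."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(grades))

def ndcg_at_k(result_doc_ids, judgments, k=10):
    """NDCG@k for a result list, given a {doc_id: grade} judgment map."""
    grades = [judgments.get(doc_id, 0) for doc_id in result_doc_ids[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(grades) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical judgments and result lists for a single query:
judgments = {"d1": 3, "d2": 2, "d3": 1}
print(ndcg_at_k(["d3", "d1", "d9"], judgments))  # e.g. a lexical ranker
print(ndcg_at_k(["d1", "d2", "d3"], judgments))  # e.g. a neural ranker -> 1.0
```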
Referencing the issue related to the autoscale feature: #373
[RFC] Enhancements for OSB Workloads
Synopsis
This is an RFC for a proposal to enhance the workloads available with OpenSearch Benchmark. It encompasses both adding new workloads and extending the capabilities of the existing ones.
OpenSearch Benchmark currently ships with a small set of workloads used to evaluate the ingestion and query performance of OpenSearch clusters. The limited number of workloads, the small size of the associated data corpora and the incomplete coverage of OpenSearch capabilities by the supplied queries are hindrances to effective performance, scale and longevity testing.
This RFC outlines options to address some of these issues.
Motivation
OpenSearch Benchmark (OSB) is a performance testing framework for evaluating the performance of OpenSearch. OSB is a workload generator that ships with a set of workloads: the scenarios that the generator executes against a target cluster. It is a fork of ESRally and can be used against OpenSearch clusters as well as Elasticsearch clusters running v7.10 and earlier. When OSB was forked, most of the workloads that ship with Rally were forked as well.
A common use of OSB is to benchmark different OpenSearch versions and compare them. For example, different versions can be installed on the same hardware configuration and the same workloads run against each. The performance of these versions can then be compared (for example, as latency vs. throughput curves) to see where one version does better than another, or whether the performance of a newer version has regressed in some regard. OSB's capabilities include running benchmarks, recording the results, tracking system metrics on the target cluster and comparing tests.
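As a sketch of that kind of comparison, the following Python snippet takes latency-vs-throughput samples from two versions, measured at the same target-throughput levels, and reports where one regresses against the other. The data points and the 10% threshold are placeholders for illustration, not real benchmark output.

```python
def latency_regressions(baseline, contender, threshold=1.10):
    """Compare two latency-vs-throughput curves.

    Each input maps a target throughput (QPS) to an observed latency
    (ms), measured at the same throughput levels on both versions.
    Returns the levels where the contender exceeds the baseline's
    latency by more than the given threshold (10% by default).
    """
    return {
        qps: (baseline[qps], contender[qps])
        for qps in sorted(baseline)
        if qps in contender and contender[qps] > baseline[qps] * threshold
    }

# Placeholder numbers for illustration only, not measured results.
v1 = {100: 12.0, 200: 15.0, 400: 30.0}  # hypothetical baseline version
v2 = {100: 12.5, 200: 21.0, 400: 29.0}  # hypothetical newer version
print(latency_regressions(v1, v2))  # -> {200: (15.0, 21.0)}
```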
Workloads have associated test scenarios (termed test_procedures) that include a set of operations including creating and deleting indices, checking the health of the cluster, merging segments, ingesting data and running queries. From the perspective of OpenSearch performance, the latter two are evidently the ones of most interest.
For a user, the notion of “performance” exists in a certain context. Most users and organizations are interested in how their cluster performs with regard to their own workload, which could be quite different from the workload run by a different user. Organizations are usually reluctant to share their workloads, which often contain proprietary data. Therefore, the task of coming up with a representative set of workloads for OpenSearch in general is not a trivial one.
The workloads that ship with OSB contain an assorted set of scenarios from various domains, including search and log analytics, with a range of document types, mappings and queries that the authors of Rally put together from publicly available data corpora. They exercise various use cases and do provide good insight into OpenSearch performance. However, there are areas that are not covered well, leading to performance regressions such as this one. This RFC focuses on the creation of additional workloads and enhancements to the currently included ones, so that they exercise additional scenarios of interest in measuring OpenSearch performance.
Areas for Enhancement
There are a few major areas of improvement with regard to the workloads available with OSB. Some of them are enumerated below, not necessarily in priority order:
Stakeholders
Proposed Priorities
As indicated above, enhancing the OSB workloads will be a multifaceted, complex, long-term and ongoing endeavor. It will need to be approached in prioritized phases. Each of the areas outlined above will need in-depth research and analysis before it can be embarked upon. Community engagement will help expedite the process, and is indeed crucial to enlarging the domains covered by the workloads.
With that in mind, this RFC suggests that a couple of the most pressing items in the list be addressed first: providing a mechanism to increase the size of the data corpora and improving coverage of OpenSearch data types. The solutions proposed below would substantially alleviate the issues in the short term, but these capabilities will need ongoing enhancement as well.
The priorities of the items to be tackled subsequently still need to be evaluated, and community feedback will be helpful in this regard.
Requirements
Here are the anticipated requirements for the two proposed areas of focus:
Increasing the size of the data corpora
Better coverage of supported data types in OpenSearch
Use Cases
Implementation Notes
For now, these rudimentary notes cover aspects of the first two proposed projects. More details will be added as the investigation proceeds.
Better coverage of supported data types in OpenSearch
Increasing the size of the data corpora
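One possible mechanism for growing a corpus (an assumption on our part, not a committed design) is to replay the existing documents some number of times while perturbing identifier and timestamp fields, so the expanded corpus preserves realistic value distributions without containing exact duplicates. A minimal Python sketch:

```python
import json
import random
import uuid

def expand_corpus(src_path, dst_path, factor=10):
    """Write `factor` perturbed copies of each source document.

    The perturbed fields ("id" and an epoch-seconds "@timestamp") are
    placeholders; a real implementation would be driven by the
    workload's mapping.
    """
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            doc = json.loads(line)
            for _ in range(factor):
                clone = dict(doc)
                clone["id"] = str(uuid.uuid4())  # keep IDs unique
                if isinstance(clone.get("@timestamp"), (int, float)):
                    clone["@timestamp"] += random.randint(-3600, 3600)
                dst.write(json.dumps(clone) + "\n")

# Hypothetical usage: expand_corpus("documents.json", "documents-10x.json")
```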
How Can You Help?
Next Steps
We will incorporate feedback and add more details on design, implementation and prototypes as they become available.