[RFC] OpenSearch Performance Testing Proposal #7499
Comments
We also apparently have some automation already (on a limited set of workloads, opensearch-project/opensearch-build#129), but it is for releases only. Once we have a nightly one, it should not be needed anymore, I think. Certainly +1 to having benchmarks run regularly and results made public, similar to https://home.apache.org/~mikemccand/lucenebench/ and https://elasticsearch-benchmarks.elastic.co/
Yes, we should do that. PERFORMANCE.md can explain the guidelines on how to run performance testing and point users to OpenSearch Benchmark and Workloads.
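For illustration, here is a minimal sketch of the kind of quick-start such a PERFORMANCE.md could point to: driving OpenSearch Benchmark against an already-running cluster from a small script. This assumes the `opensearch-benchmark` CLI is installed (`pip install opensearch-benchmark`); the subcommand and flag names have varied between versions, so check the OpenSearch Benchmark documentation for the authoritative invocation.

```python
# Hypothetical quick-start sketch: run an OpenSearch Benchmark workload against
# an existing cluster and keep the summary for a later report.
# Assumes `pip install opensearch-benchmark` and an OpenSearch node on localhost:9200;
# subcommand and flag names may differ between OSB versions.
import subprocess

def run_benchmark(workload: str = "geonames", target: str = "localhost:9200") -> None:
    cmd = [
        "opensearch-benchmark", "execute-test",
        f"--workload={workload}",
        f"--target-hosts={target}",
        "--pipeline=benchmark-only",               # benchmark an already-running cluster
        f"--results-file={workload}-results.md",   # keep the summary for the report
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_benchmark()
```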
Currently, we have a repo for the tool (OpenSearch Benchmark) and a repo for datasets/workloads, but we don't have a place to record performance testing plans and results for core features, which is why I am proposing a separate repo for that purpose. In addition, it will give us a sign-off mechanism for deciding whether or not to ship a feature based on the performance testing results.
We will use the new repo to publish the details of performance testing results for new features, and we can certainly use blog posts to share the summaries.
@anasalkouz My 2 cents... Even if you end up creating your own repo for storing your specific workloads, I would also recommend adding any core-specific workloads to the https://github.com/opensearch-project/opensearch-benchmark-workloads repo to make it the single source of truth for all OpenSearch workloads. Our goal is to execute and surface performance metrics to the community at the core engine level (using generic and specific workloads), plugin level, and distribution level, for multiple versions and on a regular cadence, along with providing a run-book and template so the community can reproduce the setup with simple steps for localized testing as well. You should be able to quickly set up the infrastructure required to run your own tests against a workload branch to get metrics for a blog post and share the summary, but eventually we want all of these tests to run on a regular cadence and surface the metrics to the community for better visibility and transparency.
I think there is a misunderstanding. The newly proposed repo is only to track the performance testing plans and results of specific features, such as SegRep. If you have new or customized workloads, those should still be added to opensearch-project/opensearch-benchmark-workloads. TL;DR
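For contributors adding a new or customized workload to opensearch-project/opensearch-benchmark-workloads, the sketch below shows roughly what a minimal workload definition could look like, generated here as `workload.json` from Python purely for illustration. The field names follow the general shape of the published workloads (indices, corpora, schedule), but the exact schema is defined by OpenSearch Benchmark, so treat this as an assumption and copy from an existing workload in the repo instead.

```python
# Illustrative-only sketch of a minimal custom workload definition.
# The shape shown (indices / corpora / schedule) mirrors published workloads in
# broad strokes but is not authoritative; consult the workloads repo for the schema.
import json

workload = {
    "version": 2,
    "description": "Hypothetical feature-specific workload (e.g. for SegRep testing)",
    "indices": [{"name": "myfeature-index", "body": "index.json"}],
    "corpora": [{
        "name": "myfeature-corpus",
        "documents": [{
            "source-file": "documents.json.bz2",
            "document-count": 1000000,
        }],
    }],
    "schedule": [
        {"operation": {"operation-type": "delete-index"}},
        {"operation": {"operation-type": "create-index"}},
        {"operation": {"operation-type": "bulk", "bulk-size": 5000}, "clients": 8},
        {"operation": {"operation-type": "search", "body": {"query": {"match_all": {}}}},
         "clients": 4, "warmup-iterations": 100, "iterations": 1000},
    ],
}

with open("workload.json", "w") as f:
    json.dump(workload, f, indent=2)
```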
Would it possibly make sense to start off by putting this data in a new folder within the OpenSearch repo itself? It's easy enough to split out to a new repo if that becomes necessary or desirable. We may end up doing that quite quickly, but it'll be easier to have an opinion about this once it is concrete with a specific example.
Sure, I think this makes sense. But I am not sure whether this should be part of the main OpenSearch repo or opensearch-project/opensearch-benchmark-workloads.
First, +1 - anything we do that improves our testing is a good thing. Second, why do we need a repo for test plans? The opensearch-project/opensearch-benchmark-workloads repo is (or can be made to be) self-documenting. Many contributions could reuse the existing datasets with different tests. The only thing not covered is where to store test results. Maybe there's some benefit to centralizing all results, but why not have a folder in each of the workload folders to store test results, and also send them to the OpenSearch cluster that backs the read-only Dashboards? So: Workload - choose an existing data set, or create a more targeted one
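As a sketch of the "send results to the cluster backing the read-only Dashboards" idea, the snippet below indexes a single summary document into an OpenSearch cluster using the opensearch-py client. The index name, document shape, endpoint, and metric values are all illustrative assumptions; no such publishing pipeline is defined by this proposal yet.

```python
# Hypothetical sketch: publish a benchmark summary document to the OpenSearch
# cluster that backs the read-only results Dashboards. Index name, field names,
# endpoint, and values are illustrative placeholders only.
from datetime import datetime, timezone
from opensearchpy import OpenSearch  # pip install opensearch-py

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

result_doc = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "workload": "geonames",
    "opensearch_version": "2.8.0",        # version under test (placeholder)
    "commit": "abc1234",                   # commit under test (placeholder)
    "metrics": {
        "indexing_throughput_docs_s": 52000,
        "query_latency_p90_ms": 18.4,
    },
}

client.index(index="benchmark-results", body=result_doc)
```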
Thanks all for the feedback.
Overview
OpenSearch performance is crucial to the success of the software. Having a mechanism to measure performance on a daily basis and flag any degradation, and providing a standardized process to test the performance of new features, are both important to keeping performance at a high standard.
Problem Statement
The issue has two main aspects. First, the lack of a standardized process for contributors to follow in order to test the performance of their core features before releasing them (i.e. before merging the code or moving it out of experimental), such as Segment Replication GA. Second, the lack of automated tools to identify ad-hoc changes and small commits that may degrade the performance of the software, such as the change to the GeoJson Point format that caused a degradation in point-data indexing and had to be reverted. The lack of proper processes often results in delayed feature releases and erodes confidence in our software.
Proposal
Core Features: A proactive mechanism to identify any performance degradation in core features ahead of time, before the feature is released. We propose developing a public performance testing process that can be followed by all contributors, with the following set of requirements:
In order to achieve the previously mentioned requirements, we propose creating a new repository under the OpenSearch project for performance testing. The repo will have different performance testing templates per use-case, and users can submit a new issue/PR to add templates that cover missing use-cases. The repository will also include templates for reporting testing results. The owner of a feature will submit the testing report as a PR, and two or more of the repository's maintainers must approve the PR to meet the sign-off criteria. Initially, we will start with a simple process that covers a few use-cases, then evolve and improve it over time as needed, based on feedback and changes to the software.
Ad-hoc Changes: A reactive mechanism to identify any commits that may cause a performance degradation and address them promptly. It is particularly effective in cases where we cannot anticipate the potential for performance degradation until the code is merged. This approach also helps address small, individually unmeasurable slowdowns that accumulate over time (the boiling-frog problem) and result in an overall drop in the software's performance.
To achieve this, we need a system that runs a nightly benchmark covering the most common use-cases, such as logging and geospatial, and generates a public, read-only dashboard for reviewing runs and comparing them against previous ones. This effort, which is similar to Lucene's nightly benchmarks, is already in progress and you can track it here. After completing the foundation of the nightly benchmarks, there may be opportunities for further enhancements, such as:
Over time we should keep enriching our nightly benchmarks with more test cases to increase coverage. Even so, they will remain limited to a certain number of workloads and use-cases. By adding the mechanism described above to test and report the performance of new features, we can keep the software's performance at a high bar, and users will have more confidence to upgrade and adopt new features.
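To make the "compare against previous runs" step of the nightly benchmarks concrete, here is a small, self-contained sketch of the kind of regression check a nightly job could run over two runs' summary metrics. The metric names and the 5% threshold are assumptions for illustration; the nightly benchmark effort linked above may structure its data and thresholds differently.

```python
# Hypothetical nightly regression check: compare tonight's summary metrics against
# the previous run and flag anything that moved more than a threshold in the wrong
# direction. Metric names and the threshold are illustrative assumptions.

# For each metric, record whether "higher is better" (throughput) or not (latency).
HIGHER_IS_BETTER = {
    "indexing_throughput_docs_s": True,
    "query_latency_p90_ms": False,
}
THRESHOLD = 0.05  # flag regressions larger than 5%

def find_regressions(previous: dict, current: dict) -> list[str]:
    regressions = []
    for metric, higher_better in HIGHER_IS_BETTER.items():
        before, after = previous[metric], current[metric]
        change = (after - before) / before
        regressed = change < -THRESHOLD if higher_better else change > THRESHOLD
        if regressed:
            regressions.append(f"{metric}: {before} -> {after} ({change:+.1%})")
    return regressions

if __name__ == "__main__":
    prev = {"indexing_throughput_docs_s": 52000, "query_latency_p90_ms": 18.4}
    curr = {"indexing_throughput_docs_s": 47000, "query_latency_p90_ms": 19.0}
    for line in find_regressions(prev, curr):
        print("REGRESSION:", line)
```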
We are looking forward to your feedback and support for this proposal.
Related Issues:
opensearch-project/opensearch-benchmark#102
#3983
References:
https://blog.mikemccandless.com/2011/04/catching-slowdowns-in-lucene.html
https://webtide.com/the-jetty-performance-effort/