[META] Analysis, Results Aggregation, and Reporting #102

Closed
9 tasks done
achitojha opened this issue Jan 6, 2022 · 5 comments
Labels: enhancement (New feature or request)

Comments

achitojha (Contributor) commented Jan 6, 2022

OpenSearch Benchmark currently reports various performance metrics and creates a detailed report which can be published to an OpenSearch instance. The goal here is to:

  • Create aggregations across many of these metrics to provide a summary report.
  • Publish this report to a data store (a minimal sketch of both steps follows this list).
  • Analysis: the report should include insights about the quality of the test, e.g. whether it was anomalous or whether it ran successfully.
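
As a rough illustration of the aggregation and publishing steps above, here is a minimal sketch using the opensearch-py client. The index names (benchmark-metrics-*, benchmark-summaries), field names, and test-execution ID are assumptions for illustration only, not the actual OSB metrics schema.

```python
# Hypothetical sketch: aggregate raw benchmark metric documents into a
# per-operation summary and publish it back to an OpenSearch datastore.
# Index and field names ("benchmark-metrics-*", "benchmark-summaries",
# "test-execution-id", "operation", "value") are assumptions, not the
# actual OSB schema.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

test_id = "nightly-2022-01-06"  # placeholder run identifier
query = {
    "size": 0,
    "query": {"term": {"test-execution-id": test_id}},
    "aggs": {
        "per_operation": {
            "terms": {"field": "operation"},
            "aggs": {
                "latency_p90": {
                    "percentiles": {"field": "value", "percents": [90]}
                }
            },
        }
    },
}
resp = client.search(index="benchmark-metrics-*", body=query)

# Build a compact summary document from the aggregation buckets.
summary = {
    "test-execution-id": test_id,
    "operations": [
        {
            "operation": bucket["key"],
            "latency_p90_ms": bucket["latency_p90"]["values"]["90.0"],
        }
        for bucket in resp["aggregations"]["per_operation"]["buckets"]
    ],
}
client.index(index="benchmark-summaries", body=summary)
```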

Acceptance Criteria

  • We have a comprehensive dashboard that records the performance stats on a regular basis and that can be reproduced by anyone (e.g., using CDK to create the entire stack)
  • We can view associated PRs and other changes corresponding to a nightly build run
  • We can view aggregate stats of performance test runs at any point in time
  • We can drill into the raw stats for a specific performance run at any point in time for deeper analysis
  • The dashboard is updated automatically after every performance test run
  • The dashboard is accessible by everyone (read-only access enabled for anonymous users)
  • Automated alerting is enabled to report on regressions/anomalies
  • Admin users have the ability to create new dashboards/visualizations as needed (integrate with OIDC?)
  • The dashboard is maintained on a regular basis (up-to-date patches, upgrades, etc.)
bbarani (Member) commented Feb 8, 2023

We should create a comprehensive reporting engine that can provide an aggregated summary for offline performance test runs along with a build-related performance summary (including the merged PRs, commits, etc.) as needed. We should be able to drill in and view the specifics, with automated monitoring and alerting support.

bbarani changed the title from "Analysis, Results Aggregation, and Reporting" to "[META] Analysis, Results Aggregation, and Reporting" on Feb 10, 2023
bbarani added the enhancement (New feature or request) label on Feb 10, 2023
rishabh6788 (Collaborator) commented Apr 14, 2023

We are proposing an architecture similar to the one currently in use to run nightly benchmarks; see here for the existing logic.

  • Establish VPC peering between the Jenkins VPC and the testing account VPC in which the OS clusters will be spun up.
  • Use the opensearch-cluster-cdk package to spin up multiple OS clusters with varying configurations. Currently this is done using another repo, which is private, supports only single-node clusters, and doesn't provide the flexibility to add new config at run time.
  • Use the opensearch-benchmark Docker image to run the OSB process on the Jenkins agent node.
  • As in the current scenario, immutable parameters such as the network config, testing AWS account, and region for setting up the OS cluster will be fetched from S3, and the remaining parameters to configure the cluster will be provided by the user.
  • Use an internal NLB to resolve the OS cluster; this gives us the advantage of running benchmarks against single- and multi-node clusters. Internal NLBs can be reached in a VPC peering scenario by allowing the CIDR of the peered VPC in the target EC2 security group (a CDK sketch of this rule follows this list).
  • Use a managed OS cluster as the datastore for performance metrics. These metrics are emitted by the OSB coordinator node running on the Jenkins agent node; therefore, the OS cluster needs to be accessible by the agent node.
  • We propose spinning up the managed OS cluster in the Jenkins infra account and in the same VPC as the Jenkins infra so that it is easily accessible by the agent node.
  • Once the nightly performance benchmark data starts flowing, we will work with the Dashboards and UX teams to create analyses and dashboards on performance run metrics, and open them to the public with read-only access so users can view and compare against their own benchmarking runs.
    [Architecture diagram: vpc-peering]
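
As a rough illustration of the security-group rule mentioned in the NLB bullet above, here is a minimal AWS CDK (Python) sketch. The VPC ID, CIDR, and port are placeholders, and this is not the actual opensearch-cluster-cdk code.

```python
# Hypothetical sketch: allow traffic from the peered Jenkins VPC CIDR so the
# OS cluster nodes behind the internal NLB can be reached across the peering
# connection. VPC ID, CIDR, and port are placeholders.
from aws_cdk import Stack, aws_ec2 as ec2
from constructs import Construct


class ClusterSecurityStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Look up the existing testing-account VPC (requires account/region
        # to be set in the stack env at synth time).
        vpc = ec2.Vpc.from_lookup(self, "TestingVpc", vpc_id="vpc-0123456789abcdef0")

        cluster_sg = ec2.SecurityGroup(
            self,
            "OsClusterSg",
            vpc=vpc,
            description="OS cluster nodes behind the internal NLB",
        )
        # Internal NLBs preserve the client IP, so the target security group
        # must allow the peered (Jenkins) VPC CIDR rather than the NLB itself.
        cluster_sg.add_ingress_rule(
            peer=ec2.Peer.ipv4("10.1.0.0/16"),  # Jenkins VPC CIDR (placeholder)
            connection=ec2.Port.tcp(9200),      # OpenSearch HTTP port
        )
```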

rishabh6788 (Collaborator) commented

The P0 iteration of the product has been released and the dashboards are now available at opensearch.org/benchmarks.
The features include:

  1. Run performance tests against single-node and multi-node clusters.
  2. Parameters such as 50% heap usage and the number of data, master, and ML nodes are now configurable.
  3. Experimental or additional features such as segment replication and remote store can now be benchmarked nightly.
  4. Additional metadata tags can be added to performance metrics to generate dedicated visualizations for different use cases (a query sketch follows this list).
  5. The datastore is a self-managed OpenSearch cluster with the Dashboards server exposed to the public.
  6. Admins can create and update dashboards.
  7. Anonymous access is enabled so that anyone has read access to the dashboards and metric data for further analysis.
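
As an illustration of item 4, a dedicated visualization could be backed by a query that filters on a metadata tag. The sketch below uses opensearch-py; the index name, tag field, and metric fields are assumed for illustration and do not reflect the actual datastore schema.

```python
# Hypothetical sketch: nightly p90 latency trend for one tagged use case.
# Index and field names ("benchmark-metrics-*", "user-tags.use-case",
# "operation", "value", "test-execution-timestamp") are assumptions.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

query = {
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {"term": {"user-tags.use-case": "segment-replication"}},
                {"term": {"operation": "index-append"}},
            ]
        }
    },
    "aggs": {
        "nightly": {
            "date_histogram": {
                "field": "test-execution-timestamp",
                "calendar_interval": "day",
            },
            "aggs": {
                "latency_p90": {
                    "percentiles": {"field": "value", "percents": [90]}
                }
            },
        }
    },
}
resp = client.search(index="benchmark-metrics-*", body=query)
for bucket in resp["aggregations"]["nightly"]["buckets"]:
    print(bucket["key_as_string"], bucket["latency_p90"]["values"]["90.0"])
```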

rishabh6788 (Collaborator) commented

P1 goals:

  1. Add alerting to track performance regressions (a minimal detection sketch follows this list).
  2. Add support to run the nightly benchmark against the min distribution of OpenSearch.
  3. Add support to choose the EC2 instance type for data nodes and ML nodes.
  4. Annotate dashboard data points with the corresponding commits.
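
For goal 1, regression detection could be as simple as comparing the latest nightly value against a trailing baseline. Below is a minimal, hypothetical sketch; the tolerance, window size, and input values are placeholders, not the production alerting logic.

```python
# Hypothetical regression check: flag a run if its p90 latency exceeds the
# trailing mean of recent runs by more than a fixed tolerance.
from statistics import mean

TOLERANCE = 0.15  # flag regressions worse than 15% over the baseline (placeholder)


def is_regression(history_p90_ms: list, latest_p90_ms: float) -> bool:
    """Return True if the latest nightly p90 regressed against the baseline."""
    if not history_p90_ms:
        return False  # nothing to compare against yet
    baseline = mean(history_p90_ms[-7:])  # trailing window of recent runs
    return latest_p90_ms > baseline * (1 + TOLERANCE)


# Example: previous nightly p90 latencies (ms) and the latest run (made-up data).
history = [102.0, 99.5, 101.2, 100.8, 98.9, 103.1, 100.4]
latest = 121.7
if is_regression(history, latest):
    print(f"Possible regression: p90 {latest} ms vs baseline ~{mean(history[-7:]):.1f} ms")
```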

rishabh6788 (Collaborator) commented

Closing this issue. Will create issues for each of the individual tasks mentioned above.
