[Discussion] Benchmarking and regression testing setup #3048
Comments
I'd like to add that it's critical to test on production networks (betanet/testnet) in addition to the aforementioned simulated conditions. Otherwise we have no visibility into things that simply don't work in prod (a recent example: we couldn't easily reach the claimed 200 TPS outside of some peak blocks, and RPC performance degrades badly under load). Previously we also had situations where the chain team declared everything was working fine (because blocks were being produced) while RPC was completely overwhelmed: none of the app-level tools worked, but testing and monitoring weren't done with app-level tools, so nobody noticed. Basically, we need more end-to-end testing happening on production networks. This involves using the real tools we ship rather than some internal version optimized just for the benchmark, which doesn't make the same HTTP queries an app would.
For that we need to collect organic traffic and then replay it.
@nearmax this is very useful and we should do it, but it's an orthogonal thing. We need to test on actual production networks; the simulated workload already exposes a lot of problems.
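To make the capture-and-replay idea above concrete, here is a minimal sketch, assuming requests have been logged as one JSON-RPC body per line; the endpoint URL, capture file name, and metrics reported are illustrative assumptions, not an existing tool:

```python
# Hypothetical sketch: replay previously captured JSON-RPC traffic against a node.
# Assumes a capture file with one JSON-RPC request body per line (jsonlines);
# the endpoint and file name are placeholders.
import json
import time

import requests  # pip install requests

RPC_ENDPOINT = "https://rpc.testnet.near.org"  # or a betanet/mocknet node


def replay(capture_path: str, endpoint: str = RPC_ENDPOINT) -> None:
    latencies = []
    with open(capture_path) as f:
        for line in f:
            body = json.loads(line)  # e.g. {"jsonrpc": "2.0", "method": "query", ...}
            start = time.monotonic()
            resp = requests.post(endpoint, json=body, timeout=30)
            latencies.append(time.monotonic() - start)
            resp.raise_for_status()
    latencies.sort()
    p50 = latencies[len(latencies) // 2]
    p99 = latencies[int(len(latencies) * 0.99)]
    print(f"replayed {len(latencies)} requests, p50={p50:.3f}s p99={p99:.3f}s")


if __name__ == "__main__":
    replay("captured_rpc_traffic.jsonl")
```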
We also have an issue open already for PagerDuty alerts + Grafana setup: https://github.com/nearprotocol/near-ops/issues/61.
Perhaps another metric to consider is performance persistence over time. For example, the existing mocknet load test running in nightly passes with a fresh network (newly restarted from genesis), but fails in subsequent runs. This indicates to me there could be some performance degradation over time.
Note: mocknet is not yet automatically updated in nightly, so there is no change to the network between these three runs; the only difference is how long the network has been up.
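One way to turn this observation into a test is sketched below; `measure_tps()` is a placeholder for whatever the existing mocknet load test already reports, and the sampling interval and tolerance are assumptions for illustration:

```python
# Hypothetical degradation check: sample throughput periodically and fail if it
# drops more than a given fraction below the first sample.
import time


def measure_tps() -> float:
    """Placeholder: run the load test once and return observed transactions/sec."""
    raise NotImplementedError


def check_no_degradation(samples: int = 3, interval_s: float = 3600.0,
                         max_drop: float = 0.10) -> None:
    baseline = measure_tps()
    for i in range(1, samples):
        time.sleep(interval_s)
        tps = measure_tps()
        drop = (baseline - tps) / baseline
        assert drop <= max_drop, (
            f"run {i}: TPS dropped {drop:.0%} from {baseline:.1f} to {tps:.1f}"
        )
```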
@chefsale Nice, so you are covering some health metrics. @birchmd I agree, we should have special tests for degradation.
What would be the purpose of running it in nightly if it is not automatically updated to the new code? I thought nightly runs attempt to fetch the most recent code and run tests against it. Also, I think the current nightly setup can be the platform for benchmarking and regression testing that we build on.
@nearmax Nightly still fetches the most recent test code (the code it runs to execute tests), but it does not re-deploy a binary to the remote nodes which form the mocknet (they are all on their own machines in various gcloud regions). This is something we'll set up eventually, it's in the backlog. But for now, since I am actively working with the mocknet anyways, I manually update all the nodes once per day (by "manually", I mean "run a script" because there is already automation for this, just not integrated with nightly yet).
Motivation
We need to have a way to benchmark our nodes in various scenarios and especially track regressions. We also need to have a clean and reproducible way to measure improvements to our core metrics.
Metrics
The following metrics are important for us and need to be tracked consistently for each release:
Core metrics
Chain metrics
Runtime metrics
Health metrics
Setups
Type of traffic
Ideally we want to measure every metric mentioned above for every combination of setup x type of traffic. We can start with 9 setups and work up to 3 * 2^3 = 24 setups to cover all combinations, then measure each setup against each of the 6 types of traffic. Ideally that yields 6 * 24 = 144 combinations, but we can narrow it down to a few of the most important ones for now.
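A small sketch of how the matrix arithmetic works out; the dimension values below are placeholders, not the final list of setups or traffic types:

```python
# Illustrative enumeration of the benchmark matrix (setups x traffic types).
from itertools import product

network_sizes = ["small", "medium", "large"]        # 3 base setups
toggles = list(product([False, True], repeat=3))    # 2^3 on/off variants per setup
traffic_types = ["traffic_1", "traffic_2", "traffic_3",
                 "traffic_4", "traffic_5", "traffic_6"]  # 6 types of traffic

setups = list(product(network_sizes, toggles))      # 3 * 2^3 = 24 setups
matrix = list(product(setups, traffic_types))       # 24 * 6 = 144 combinations
print(len(setups), len(matrix))                     # 24 144
```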
We would perform this benchmarking automatically on each release and then use these numbers for the following:
Please discuss: which metrics, setups, and types of traffic we should cover first.
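For the regression-tracking part, a minimal sketch of how a per-release check could compare fresh numbers against the previous release's baseline; the JSON file layout, metric names, and 5% threshold are assumptions for illustration:

```python
# Hypothetical per-release regression gate: compare this release's metrics against
# a stored baseline and fail if any metric regresses by more than a threshold.
import json
import sys

THRESHOLD = 0.05  # allow up to 5% regression per metric


def check_regressions(baseline_path: str, current_path: str) -> int:
    with open(baseline_path) as f:
        baseline = json.load(f)   # e.g. {"tps": 200.0, "block_time_ms": 1100.0}
    with open(current_path) as f:
        current = json.load(f)

    failures = []
    for name, old in baseline.items():
        new = current.get(name)
        if new is None:
            continue
        if name.endswith("_ms"):   # latency-style metric: lower is better
            regressed = new > old * (1 + THRESHOLD)
        else:                      # throughput-style metric: higher is better
            regressed = new < old * (1 - THRESHOLD)
        if regressed:
            failures.append(f"{name}: baseline={old} current={new}")

    for line in failures:
        print("REGRESSION", line)
    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(check_regressions("baseline_metrics.json", "current_metrics.json"))
```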