proposal: Add tracking latencies and flamegraphs in CI #11266
Comments
I think even redline QPS would be an amazing contribution here; everything else proposed seems like gravy. +1000.
See #961. I desperately want this. This will require a lot of thought in terms of how to structure repeatable tests, but yes, we really need to do this.
I don't know if this exists yet, but it would also be good to have a few relatively small benchmark scenarios that can be used for A/B comparison of performance after changes to data plane components, especially in cases where we expect some performance impact. Tracking performance data for the small benchmarks over time on a calibrated environment would be great.
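For illustration, here is a minimal sketch of what an A/B latency gate over two such benchmark runs could look like. It assumes each run's percentiles have already been extracted into a flat JSON file of the hypothetical shape `{"p50": 0.0004, "p90": 0.0007, "p99": 0.0012}` (values in seconds); Nighthawk's real output is richer and would need a small adapter:

```python
# Sketch: compare latency percentiles from a baseline and a candidate run.
# The flat JSON input shape is an assumption for illustration only.
import json
import sys

THRESHOLD = 0.05  # flag regressions larger than 5%


def load_percentiles(path):
    with open(path) as f:
        return json.load(f)


def compare(baseline_path, candidate_path):
    baseline = load_percentiles(baseline_path)
    candidate = load_percentiles(candidate_path)
    regressed = False
    for percentile, base_value in baseline.items():
        cand_value = candidate[percentile]
        delta = (cand_value - base_value) / base_value
        print(f"{percentile}: {base_value:.6f}s -> {cand_value:.6f}s ({delta:+.1%})")
        if delta > THRESHOLD:
            regressed = True
    return regressed


if __name__ == "__main__":
    sys.exit(1 if compare(sys.argv[1], sys.argv[2]) else 0)
```

A CI job could run the same scenario against both builds on one calibrated machine and fail the check when the script exits non-zero.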
I have started exploring this. Tracking progress here.
cc @marcomagdy, who is also interested in helping with this effort.
We'd also be interested in this, so let me know how I can help to move this forward.
Update: a good part of this is in review over at envoyproxy/nighthawk#337. Nighthawk is eating its own dogfood via a new CI task, and is producing simple visualizations per test (example). CPU profiles are also collected, but flamegraphing needs more work, as we need to consider the binaries and libraries involved in generating the profile to get sensible output.
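As a hedged illustration of the symbolization difficulty mentioned above: rendering a collected CPU profile into a flamegraph generally requires the exact unstripped binary (and shared libraries) that produced it. A sketch using the standalone `pprof` tool from github.com/google/pprof, assuming a gperftools-style profile file; flag spellings can vary across pprof versions:

```python
# Sketch: render a CPU profile to SVG with pprof. Assumes `pprof` is on PATH
# and that binary_path is the unstripped binary that generated the profile;
# without it, symbolization fails and the flamegraph output is not sensible.
import subprocess


def render_profile(binary_path: str, profile_path: str, output_svg: str) -> None:
    subprocess.run(
        ["pprof", "-svg", "-output", output_svg, binary_path, profile_path],
        check=True,
    )


# Hypothetical usage in a CI step:
# render_profile("bazel-bin/nighthawk_client", "profile.prof", "flame.svg")
```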
Any updates?
Well, I got sidetracked for a bit, but this has been happily test-driving in NH's own CI. So far so good; for example, see the .html files in the artefacts of a recent PR. Some important improvements that others have expressed interest in tackling remain.
For more detailed status, see https://github.com/envoyproxy/nighthawk/tree/master/benchmarks#todos
Hello folks. We have a design doc for a framework that we'd like your comments on.
@abaptiste thanks. My super high-level comment is that as a developer and performance engineer (user story), I'd like to be able to have control over the benchmark execution environment. So, any framework should be capable of running 100% locally. It's fine to make it also available as a SaaS via buckets or e-mail, but I think we're limiting applicability if those are the only options.
+1 I left a bunch of comments around this. I also want to make sure we have a clear post-MVP path for CI integration, as IMO this is the thing we really want to unlock ASAP. Thank you for working on this!
Thank you for the comments. I've captured the major themes from the review.
If there are additional items I may have inadvertently missed or misunderstood, please let me know.
@abaptiste that list LGTM, and it matches our offline conversation. Thanks for working on this! This will be awesome.
I posted a separate doc based on the feedback from the initial review. Please feel free to take a look and comment.
@abaptiste the new doc LGTM, tagging @oschaaf @mattklein123 @antoniovicente @mum4k @pamorgan @snowp for comments/sign-off.
I looked at the doc and at a high level it looks great to me. Very excited for this work!
Looks good to me!
This is the initial 'official' commit for the Salvo tool, which aims to abstract the execution of Nighthawk to benchmark a given Envoy version. See this issue for some background. The two design docs for this project are referenced here. In this commit, Salvo is placed into a separate directory of the envoy-perf repository and is referenced from the main README.md. Testing: unit tests included; addressed as many pylint3 issues as feasible. Issue: envoyproxy/envoy#11266 Signed-off-by: Alvin Baptiste <[email protected]>
Any updates? We are interested in the integration; is there any help I can offer?
Hi @gyohuangxin, we will gladly accept help. We expect to be able to staff this work in about 6 months, but I would gladly work with you in the meantime if you have the cycles. If you are able to help, it would be good to get in touch and discuss priorities and direction. Are you on the Envoy Slack by any chance?
@mum4k Thank you! Yes, let's discuss on Slack.
What's the latest on this effort? This would be extremely beneficial.
This effort has been de-staffed temporarily. If there is anyone who wants to pick it up in the meantime, I will gladly transfer the latest state and/or provide guidance and code review as desired.
Filing this issue to get a feel for interest in this.
Goal:
Add a means to track and persist latency numbers and perf visualizations like flamegraphs over time in CI. This would allow us to see how we're doing over time, and to have perf information at hand when a latency regression is observed.
Description:
Nighthawk uses a lightweight python-based framework for integration testing.
This framework serves as a basis for writing NH's own benchmarks.
With a small amount of modification, it could be extended to serve the goal above (see the sketch below).
More details, and some concrete scripts that give an idea of what this would look like, can be found here.
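To give an idea of the shape this could take, here is a hedged sketch of a benchmark written on top of that python framework. The fixture and method names (`http_test_server_fixture`, `runNighthawkClient`, `getTestServerRootUri`) follow the framework's conventions, but exact signatures are assumptions and may have changed:

```python
# Sketch of a CI benchmark built on the integration test framework. Names
# and signatures are assumed from the framework's conventions and may differ.
import json


def test_http_h1_latency_tracking(http_test_server_fixture):
    # Drive a fixed, repeatable load against the built-in test server.
    parsed_json, _ = http_test_server_fixture.runNighthawkClient([
        http_test_server_fixture.getTestServerRootUri(),
        "--rps", "100",
        "--duration", "30",
    ])
    # Persist the raw statistics as a CI artifact so latencies can be
    # tracked over time and compared across revisions.
    with open("benchmark_latencies.json", "w") as f:
        json.dump(parsed_json, f)
```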
/cc @danzh2010 @htuch