Agent load testing #1467
Conversation
💔 Tests Failed
Just a few typos, but LGTM!
Co-authored-by: Manuel de la Peña <[email protected]>
Looking great! Let's try to merge this in soon so we can give it a try and iterate from there.
Co-authored-by: Victor Martinez <[email protected]>
Is JJBB required?
Yes, it is. I was going to do a follow-up PR with those changes.
To add to the wishlist for a future iteration, it would be great to be able to compare two runs with different agent configurations but otherwise identical settings. The most common use case would be to determine the overhead of the agent or of a particular agent feature.
@felixbarny Absolutely! I'd like to do this along with the (already requested) ability to do multiple runs for each scenario, and then also have it output an average for each set of runs per scenario. (Of course it would still output individual runs as well.)
@felixbarny I think all the review feedback has been addressed at this point, so if you'd like to give this a final review and 👍, we should be able to get this in and start testing.
.ci/load/Jenkinsfile
      }
    }
    stage('Test application') {
      agent { label 'metal' }
Are the workers guaranteed to be the same run-to-run? How many CPU cores do they have?
This is an important discussion point. Thanks for bringing it up!
The tl;dr here is that they should be the same, but an absolute guarantee is challenging. For example, the ES Performance Testing team has a fleet of machines but has discovered over time that there are going to be variances even when they try hard to avoid them: SSDs in arrays fail, machines are tagged the same way by the provider but aren't physically identical, and so on.
So, where does that leave us? I think the best thing to do here is to be cautious about comparing results between runs, while continuing to enhance this pipeline so that it can run a single test scenario multiple times on what we can guarantee to be the same machine(s), and perhaps adding some sort of comparative logic as well. (So, run scenario A and then scenario B on the same machine and output the results.)
I'm also going to file an issue in the infra repo to try to get an audit underway so we can know a bit more about what divergence we currently have. As mentioned earlier, it shouldn't be a lot, but there may be some. Additionally, we'll investigate the possibility of creating dedicated groups of machines which we ensure are as similar as we can make them, instead of just assuming that they're similar, which is essentially the strategy in place right now.
Some additional data here:
We have four workers right now, all of which vary in subtle ways. I propose that we make the following changes:
- Pin the stage that runs the application to the same worker every time. We can do this by using the `benchmark` label, which is currently assigned to only one machine (see the sketch after this list). That machine has the following specs:
  - Ubuntu 18.04
  - 6 CPUs
  - 64 GB
- We keep the load-generation stage marked as `metal`, which will allow it to float between the other bare-metal machines with slightly varying specs. However, there's not much reason to believe that they vary enough to modify the behavior of the load-generation script, which doesn't consume a great deal of resources at present.
- We decide on a plan to order some additional machines which we can put into the `benchmark` pool, which will ensure better consistency going forward.
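For illustration, here is a rough sketch of what the label change from the first item could look like in a declarative pipeline. The stage names, ordering, and steps are assumptions for the sake of the example, not copied from the actual `.ci/load/Jenkinsfile`:

```groovy
// Illustrative sketch only; stage names and steps are assumptions,
// not taken verbatim from the real Jenkinsfile.
pipeline {
  agent none
  stages {
    stage('Test application') {
      // Pinned to the single machine carrying the 'benchmark' label so
      // application-side results are more comparable run-to-run.
      agent { label 'benchmark' }
      steps {
        echo 'provision the agent, instrument and start the test application'
      }
    }
    stage('Load generation') {
      // Allowed to float across the bare-metal pool; small spec differences
      // should not meaningfully affect the load-generation script.
      agent { label 'metal' }
      steps {
        echo 'run the load-generation script against the application'
      }
    }
  }
}
```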
(I will link backward from a ticket which provides more info, so we don't link from a public repo into a private one.)
Let me know how this sounds @felixbarny and @v1v and if you give a 👍 I will make the necessary change.
👍
Summary
This adds support for load-testing the Java agent.
Details
This is a new pipeline which introduces the ability to provision any version of the Java agent, instrument a test application with it, and then apply load to that application in order to test the stability and performance characteristics of the instrumented application over time.
When launching the pipeline, users are presented with a set of build parameters. These allow the user to select the version of the agent to deploy, along with the JDK to use and the duration of the test. Load is generated via Locust, and if you wish to specify particulars about how load generation should be conducted, you can do so.
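As a rough illustration of the kind of inputs described above, a declarative `parameters` block might look something like the following. The parameter names, types, and defaults here are assumptions, not the pipeline's actual parameters:

```groovy
// Hypothetical parameter names for illustration only; the real pipeline
// may name and type its parameters differently.
pipeline {
  agent none
  parameters {
    string(name: 'AGENT_VERSION', defaultValue: '', description: 'Java agent version to provision')
    choice(name: 'JDK', choices: ['openjdk11', 'openjdk17'], description: 'JDK used to run the test application')
    string(name: 'DURATION', defaultValue: '10m', description: 'How long to apply load')
    string(name: 'LOCUST_OPTIONS', defaultValue: '', description: 'Extra options forwarded to the Locust load generator')
    booleanParam(name: 'COLLECT_METRICS', defaultValue: false, description: 'Periodically collect OS-level system metrics')
  }
  stages {
    stage('Show parameters') {
      agent { label 'metal' }
      steps {
        echo "agent=${params.AGENT_VERSION} jdk=${params.JDK} duration=${params.DURATION}"
      }
    }
  }
}
```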
The tests run on bare metal, with load generation and the test application residing on separate bare-metal machines.
At the conclusion of the test run, a file is produced which shows the performance of the test application as instrumented with JFR.
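For context, producing that file boils down to running the instrumented application with standard JVM flags for the agent and for Flight Recorder. A minimal sketch, assuming illustrative paths, labels, and recording settings rather than the pipeline's actual ones:

```groovy
// Illustrative sketch only: the agent jar location, application jar, label,
// and recording settings are assumptions, not taken from the real pipeline.
pipeline {
  agent { label 'metal' }
  stages {
    stage('Run instrumented application') {
      steps {
        // Attach the APM agent via -javaagent and start a flight recording
        // that is dumped to a .jfr file when the recording duration elapses.
        sh '''
          java \
            -javaagent:/opt/elastic-apm-agent.jar \
            -XX:StartFlightRecording=duration=600s,filename=apm-load-test.jfr \
            -jar /opt/test-application.jar
        '''
      }
    }
  }
}
```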
If you wish, you may also enable the metrics collection checkbox, which periodically collects system metrics directly from the operating system. (This is useful if you don't wish to rely on JFR, or if you simply want the perspective of the OS itself rather than a view from inside the Java process.)
Deployment
This PR may be merged at any time, but it will not be ready for use until the Bandstand application is also deployed. (This is a separate application developed in conjunction with this one that eases the burden of orchestrating multiple bare-metal machines and handles service discovery, etc.) It is not being released as an OSS application and as such, is not linked from this PR.
Future enhancements and caveats
Presently, this relies on a dedicated machine to receive requests from the agent. Nothing is done with this data, and we would like to find a way to run these tests without requiring a dedicated APM Server, or perhaps to make a version of APM Server which does not require Elasticsearch to run. This is a topic for a follow-up discussion.
Additionally, we would like to make the orchestration application something fully managed within the lifetime of the pipeline, rather than a persistent service. This isn't urgent, however, and can be done later.
Finally, we are interested in potentially gathering metrics from a few other sources which can be displayed with the Elastic Stack. Specifically, it would be nice to include an option to monitor the host with Metricbeat and it would also be nice to find a way to collect and display the load generation metrics. Presently, this requires just looking at the logs and can clearly be improved.