Agent load testing #1467
Are the workers guaranteed to be the same run-to-run? How many CPU cores do they have?
This is an important discussion point. Thanks for bringing it up!
The tl;dr here is that they should be the same, but an absolute guarantee is challenging. For example, the ES Performance Testing team has a fleet of machines but has discovered over time that there will be variances even when they try hard to avoid them: SSDs in arrays fail, machines are tagged the same way by the provider but aren't physically identical, and so on.
So, where does that leave us? I think the best thing to do here is to be cautious about comparing results between runs, while we continue to enhance this pipeline to support scenarios where we can run multiple invocations of a single test scenario on what we can guarantee to be the same machine(s), and maybe add some sort of comparative logic as well. (So, run scenario A and scenario B on the same machine and output the results together.)
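To make the "comparative logic" idea concrete, here is a minimal sketch of what running two scenarios back-to-back on one machine and reporting them together might look like. The `run_scenario` helper, the `./loadgen.sh` command, the scenario names, and the latency summary are all hypothetical stand-ins, not the actual load-generation script:

```python
import statistics
import subprocess
import time

# Hypothetical helper: invoke the load-generation script once per iteration
# and record wall-clock duration in seconds. The command line is made up
# for illustration; the real script and its flags may differ.
def run_scenario(name: str, iterations: int = 3) -> list[float]:
    durations: list[float] = []
    for _ in range(iterations):
        start = time.monotonic()
        subprocess.run(["./loadgen.sh", "--scenario", name], check=True)
        durations.append(time.monotonic() - start)
    return durations

def summarize(name: str, samples: list[float]) -> None:
    print(f"{name}: median={statistics.median(samples):.2f}s "
          f"min={min(samples):.2f}s max={max(samples):.2f}s")

if __name__ == "__main__":
    # Both scenarios run inside the same job, so they share one machine and
    # the comparison is free of cross-worker variance.
    a = run_scenario("scenario-a")
    b = run_scenario("scenario-b")
    summarize("scenario-a", a)
    summarize("scenario-b", b)
```

Because both measurements come from the same host in the same job, any remaining difference is attributable to the scenarios themselves rather than to which worker happened to pick up the build.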
I'm also going to file an issue in the infra repo to get an audit underway so we know a bit more about what divergence we currently have. As mentioned earlier, it shouldn't be a lot, but it may be some. Additionally, we'll investigate the possibility of creating some dedicated groups of machines which we can actively keep as similar as we can make them, instead of just assuming that they're similar, which is essentially the strategy in place right now.
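As an illustration of what such an audit could capture per worker (and of the CPU-core question above), here is a small sketch that prints a hardware fingerprint for the machine it runs on. The fields and output format are assumptions about what would be useful to diff, not what the infra team will necessarily collect:

```python
import json
import os
import platform

def worker_fingerprint() -> dict:
    """Collect a few attributes that commonly diverge between workers."""
    return {
        "hostname": platform.node(),
        "cpu_cores": os.cpu_count(),
        "machine": platform.machine(),
        "kernel": platform.release(),
        "python": platform.python_version(),
    }

if __name__ == "__main__":
    # Run this on each worker and diff the outputs to spot divergence.
    print(json.dumps(worker_fingerprint(), indent=2, sort_keys=True))
```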
Some additional data here:
We have four workers right now, all of which vary in subtle ways. I propose that we make the following changes (a small sketch of how the first one could be enforced follows the list):

1. We keep the benchmark stage pinned to the `benchmark` label, which is currently only assigned to one machine. That machine has the following specs:
2. We keep the load-generation stage marked as `metal`, which will allow it to float between the other bare-metal machines, which have slightly varying specs. However, there's not much reason to believe that they vary enough to change the behavior of the load-generation script, which doesn't consume a great deal of resources at present.
3. We decide on a plan to order some additional machines which we can put into the `benchmark` pool, which will ensure better consistency going forward.

(I will link backward from a ticket which provides more info, so we don't link from a public repo into a private one.)
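To visualize the first change, here is a minimal sketch of a guard the benchmark stage could run before measuring anything, assuming a Jenkins-style worker that exposes its labels through a `NODE_LABELS` environment variable. The variable name, label values, and the guard itself are assumptions drawn from the proposal above, not existing pipeline code:

```python
import os
import sys

# Hypothetical guard: refuse to run the benchmark stage unless the job
# landed on a worker carrying the `benchmark` label, so results always
# come from the pinned machine rather than a floating `metal` one.
REQUIRED_LABEL = "benchmark"

def node_labels() -> set[str]:
    # Assumes the CI system exports labels as a whitespace-separated
    # NODE_LABELS variable (as Jenkins does); adjust for other systems.
    return set(os.environ.get("NODE_LABELS", "").split())

if __name__ == "__main__":
    labels = node_labels()
    if REQUIRED_LABEL not in labels:
        sys.exit(f"refusing to benchmark on {sorted(labels)}; "
                 f"expected the '{REQUIRED_LABEL}' label")
    print("running on a pinned benchmark worker")
```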
Let me know how this sounds, @felixbarny and @v1v; if you give a 👍 I will make the necessary change.
👍