Run a separate nightly schedule for an Agent 'soak' test (8 hrs / 24 hrs) using basic Agent actions + pause + assess (and repeat) #1468

EricDavisX · 2021-08-18T20:14:10Z

As discussed in the Agent team meeting and in elastic/beats#27299, we want to implement a 'soak' test, where we leave Agent running for a number of hours, perform some repeated manipulations, and repeatedly check various status points.

I'll offer the following explicit details that come to mind off the top of my head:

A separate run would be nice, as compared to the regular nightly runs, because these tests will be in play for 8, 16, or 24 hours (or maybe even more). Let's start by assuming we want the duration to be configurable in hours.
A separate feature file would probably be easiest.
The big question is how to most easily create a loop of the desired tasks, all from within the same run.
Agent can be spun up with the available building blocks, like below:
1. Install FS
2. Install Agent
3. Set the policy to the desired one
4. Add or remove an Integration
5. Check the health of the processes
6. Check the health status in the API
7. Check that data is still being ingested after the current time-frame
8. Check memory usage stats + CPU usage stats and record them
Repeat from step 4 until the test duration has elapsed, looping every 30 seconds, or as soon as the previous iteration finishes if it takes longer. (Can we guarantee new data will have been added in the last 30 seconds? Maybe not...)
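The loop over steps 4 to 8 could be sketched roughly as follows. This is only a minimal illustration in Python, not code from the e2e-testing repo: `run_soak`, `SoakResult`, and every callback name are hypothetical, and the real Agent/Fleet calls would have to be supplied by the caller. The clock and sleep functions are injectable so the loop can be exercised without waiting for hours.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SoakResult:
    iterations: int = 0
    failures: List[str] = field(default_factory=list)
    samples: List[Dict[str, float]] = field(default_factory=list)


def run_soak(
    duration_hours: float,
    interval_seconds: float,
    toggle_integration: Callable[[int], None],   # step 4 (hypothetical callback)
    checks: Dict[str, Callable[[], bool]],       # steps 5-7 (hypothetical callbacks)
    sample_usage: Callable[[], Dict[str, float]],  # step 8 (hypothetical callback)
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> SoakResult:
    """Repeat steps 4-8 until the configured number of hours has elapsed."""
    result = SoakResult()
    deadline = clock() + duration_hours * 3600
    while clock() < deadline:
        started = clock()
        toggle_integration(result.iterations)
        for name, check in checks.items():
            try:
                ok = check()
            except Exception as exc:  # a crashing check counts as a failure
                ok, name = False, f"{name} raised {exc!r}"
            if not ok:
                result.failures.append(f"iteration {result.iterations}: {name}")
        result.samples.append(sample_usage())
        result.iterations += 1
        # Wait out the rest of the interval. If the iteration overran,
        # the next one starts immediately.
        remaining = interval_seconds - (clock() - started)
        if remaining > 0:
            sleep(min(remaining, max(0.0, deadline - clock())))
    return result
```

A caller would pass something like `duration_hours=8, interval_seconds=30` plus its own check functions. Failures are collected rather than raised, on the assumption that a soak run should keep going and report everything at the end.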

It could work for more supported OSes/environments, but it may be easier to target just one to start.

If we can modify the environment over time, we could reduce the specs of the Agent host to see when it may start to fail, which can give us minimum requirements recommendations, maybe?

ruflin · 2021-08-20T06:32:57Z

Huge +1 on this request. As Eric mentioned, being able to specify the noted tasks would allow us to modify the test very easily. We can start with a simple case and iterate from there.

What I would like to see is multiple Elastic Agents enrolled. 3 of the Elastic Agents stay enrolled and keep receiving updates, while 3 other Elastic Agents are randomly enrolled and unenrolled. During this time, the policy keeps changing constantly. The goal would be that in 8h we reproduce the number of changes that will likely happen over weeks / months, and with that reproduce edge-case errors and, for example, memory leaks, which implies we need to monitor this run.
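One way to picture that scenario is as a pre-generated plan of actions. The sketch below is an assumption-laden illustration in Python: `Action`, `build_churn_plan`, and the agent and revision labels are all hypothetical, and nothing here calls a real Fleet API. It only decides what to do at each step; a seeded random generator is used so that a run which hits an edge case could be replayed with the same sequence.

```python
import random
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Action:
    step: int
    kind: str    # "enroll", "unenroll", or "update_policy"
    target: str  # agent name or policy revision label (hypothetical)


def build_churn_plan(
    steps: int, stable: List[str], churning: List[str], seed: int = 0
) -> List[Action]:
    """Plan policy changes plus random enroll/unenroll of the churning agents."""
    rng = random.Random(seed)  # seeded so the same plan can be regenerated
    enrolled: Set[str] = set(stable)
    # Stable agents are enrolled once, up front, and never touched again.
    plan = [Action(0, "enroll", name) for name in stable]
    for step in range(1, steps + 1):
        # The policy changes on every step.
        plan.append(Action(step, "update_policy", f"revision-{step}"))
        # Flip the enrollment state of one randomly chosen churning agent.
        agent = rng.choice(churning)
        if agent in enrolled:
            enrolled.remove(agent)
            plan.append(Action(step, "unenroll", agent))
        else:
            enrolled.add(agent)
            plan.append(Action(step, "enroll", agent))
    return plan
```

For example, `build_churn_plan(1000, ["stable-1", "stable-2", "stable-3"], ["churn-1", "churn-2", "churn-3"])` would yield 1000 policy updates interleaved with 1000 enroll/unenroll flips, which an executor could then pace across the 8h window.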

mdelapenya · 2021-08-23T07:37:08Z

I wonder if this type of test falls under the stress testing layer in the testing pyramid. I'm not sure that reusing the e2e-testing repo would be a perfect fit. The framework will resolve the provisioning part, the very same as elastic-package does, but in my opinion the intrinsic behaviours described here should not fall under the e2e test strategy, but under a specific stress testing one.

cachedout · 2021-08-23T07:40:43Z

@ruflin So, I think we've talked a bit about this before but beatbox should be able to accomplish some of this. We can certainly spin up an agent and connect it and then soak test it for a period of time. However, it doesn't have any kind of orchestration layer (like registering an agent for X period, then doing Y, then doing Z). Since it's really just a pile of Ansible and shell scripts, however, there's nothing preventing us from adding something like it.

It's really just a matter of finding the time and resources to finish it, since I'll be leaving very soon on paternity leave and it might be well into 2022 before I'm fully back and able to focus on this. However, we could probably find someone to hand this off to before then. (cc: @kseniia-kolpakova )

ruflin · 2021-08-23T11:36:48Z

No preference on my end on where it should be. I think the most important bit is constantly changing policies which would already get us a long way.

EricDavisX added the Team:Elastic-Agent Label for the Agent team label Aug 18, 2021

EricDavisX mentioned this issue Aug 19, 2021

Fleet: policy aren't assigned to agents (flaky) elastic/beats#27299

Closed

kuisathaverat added the Testing label Dec 30, 2022
