This repository has been archived by the owner on Sep 17, 2024. It is now read-only.

Run a separate nightly schedule for an Agent 'soak' test (8 hrs / 24 hrs) using basic Agent actions + pause + assess (and repeat) #1468

Open
EricDavisX opened this issue Aug 18, 2021 · 4 comments
Labels
Team:Elastic-Agent Label for the Agent team Testing

Comments

@EricDavisX
Contributor

As discussed in the Agent team meeting, and in this issue: elastic/beats#27299,
we want to implement a 'soak' test, where we leave Agent running for a number of hours, apply a repeated set of manipulations, and check various status points after each round.

I'll offer the following explicit details that come to mind off the top of my head:

  • a separate run would be nice, as compared to the regular nightly runs, because these will be in play for 8 hours or 16 or 24 (or maybe even more?). Let's start by assuming we want to make the duration configurable in hours.
  • probably a separate feature file would be easiest.
  • the big question is what the easiest way is to loop over the desired tasks, all from within the same run
  • Agent can be spun up with the available building blocks... like below:
  • 1 install FS
  • 2 install Agent
  • 3 set policy to desired
  • 4 add or remove an Integration
  • 5 check health of the processes
  • 6 check health status in API
  • 7 check data is still being ingested after the current time-frame
  • 8 check memory usage stats + cpu usage stats and record them
  • repeat from step 4 until the configured test duration has elapsed, starting a new iteration every 30 seconds, or as soon as possible if the previous iteration took longer (can we guarantee new data will have been added in the last 30 seconds? maybe not...)
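
The looping part of the list above (steps 4 to 8, repeated on a 30-second cadence until a configurable number of hours has passed) could be sketched roughly as follows. This is only an illustration in Python, not code from the e2e-testing repo: `run_soak` and the step names are hypothetical, and each step is a placeholder callable standing in for a real check (process health, API status, ingestion, resource stats).

```python
import time
from typing import Callable, Dict, List


def run_soak(
    steps: Dict[str, Callable[[], object]],
    duration_s: float,
    interval_s: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, object]]:
    """Run every step in order, repeatedly, until duration_s has elapsed.

    Returns one dict per iteration, mapping step name to that step's result.
    `clock` and `sleep` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + duration_s
    results: List[Dict[str, object]] = []
    while clock() < deadline:
        started = clock()
        # Hypothetical steps 4-8: each callable performs one action or check.
        results.append({name: step() for name, step in steps.items()})
        # Wait out the rest of the interval; if the iteration overran,
        # start the next one immediately ("as possible when last iteration finishes").
        remaining = interval_s - (clock() - started)
        if remaining > 0:
            sleep(min(remaining, max(0.0, deadline - clock())))
    return results
```

For an 8-hour run, the duration would be passed as `duration_s=8 * 3600`, with the hour count read from whatever configuration the nightly job exposes. Recording the per-iteration results, rather than failing on the first bad check, keeps the memory and CPU readings from step 8 available for later comparison.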

It could work for more supported OSes/environments, but it may be easier to target just one to start.

If we can modify the environment over time, we could reduce the specs of the Agent host to see when it may start to fail, which can give us minimum requirements recommendations, maybe?

@ruflin

ruflin commented Aug 20, 2021

Huge +1 on this request. As Eric mentioned, being able to specify the noted tasks would allow us to modify the test very easily. We can start with a simple case and iterate from there.

What I would like to see is having multiple Elastic Agents enrolled. 3 of the Elastic Agents stay enrolled and keep receiving updates, while 3 other Elastic Agents are randomly enrolled and unenrolled. During this time, the policy keeps changing constantly. The goal would be that in 8h we reproduce the number of changes that would likely happen over weeks / months, and with it reproduce edge-case errors and, for example, memory leaks, which implies we need to monitor this run.
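
To make the scenario above concrete, here is a rough Python sketch of how such a churn plan could be generated ahead of the run. Everything in it is an assumption for illustration: the `Action` type, `build_churn_plan`, the agent names, and the single `shared-policy` target are hypothetical, and the plan only describes what to do; it does not call Fleet or Agent.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Action:
    tick: int    # iteration at which the action is applied
    kind: str    # "enroll", "unenroll" or "policy_change"
    target: str  # hypothetical agent or policy name


def build_churn_plan(ticks: int, stable: int = 3, churning: int = 3, seed: int = 0) -> List[Action]:
    """Build a reproducible plan: stable agents enroll once, churning agents flip state."""
    rng = random.Random(seed)  # seeded, so a failing run can be replayed exactly
    enrolled = {f"stable-{i}" for i in range(stable)}
    churn_pool = [f"churn-{i}" for i in range(churning)]
    # Stable agents are enrolled up front and never touched again.
    plan = [Action(0, "enroll", name) for name in sorted(enrolled)]
    for tick in range(1, ticks + 1):
        # Pick one churning agent at random and toggle its enrollment.
        agent = rng.choice(churn_pool)
        if agent in enrolled:
            enrolled.remove(agent)
            plan.append(Action(tick, "unenroll", agent))
        else:
            enrolled.add(agent)
            plan.append(Action(tick, "enroll", agent))
        # The policy changes on every tick, so all enrolled agents keep receiving updates.
        plan.append(Action(tick, "policy_change", "shared-policy"))
    return plan
```

Seeding the random generator is a deliberate choice here: if a particular sequence of enrollments and policy changes triggers an edge case, the same seed reproduces that sequence.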

@mdelapenya
Contributor

I wonder if this type of test falls under the stress testing layer in the testing pyramid. I'm not sure that reusing the e2e-testing repo would be a perfect fit. The framework will resolve the provisioning part, the very same as the elastic-package, but the intrinsic behaviours described here should not, in my opinion, fall under the e2e tests strategy, but under a specific stress testing one.

@cachedout
Contributor

@ruflin So, I think we've talked a bit about this before but beatbox should be able to accomplish some of this. We can certainly spin up an agent and connect it and then soak test it for a period of time. However, it doesn't have any kind of orchestration layer (like registering an agent for X period, then doing Y, then doing Z). Since it's really just a pile of Ansible and shell scripts, however, there's nothing preventing us from adding something like it.

It's really just a matter of finding the time and resources to finish it, since I'll be leaving very soon on paternity leave and it might be well into 2022 before I'm fully back and able to focus on this. However, we could probably find a resource to hand this off to before then. (cc: @kseniia-kolpakova)

@ruflin

ruflin commented Aug 23, 2021

No preference on my end on where it should be. I think the most important bit is constantly changing policies, which would already get us a long way.


5 participants