Add ability to deploy custom elastic-agents on different OS or runtimes #787

marc-gr · 2022-04-11T08:37:22Z

Some integrations might require elastic-agents with custom configurations for the agent or its container, e.g. winlogbeat requires a Windows container, auditbeat requires special container capabilities, etc.

I initially created #786, which adds the ability to deploy custom agents as test services; code specific to the Windows scenario is still missing.

I'm opening this thread to discuss other approaches that might avoid adding complexity to the test runner, if possible.

EDIT:

As mentioned in #787 (comment), there is now support in elastic-package to:

- run tests in independent Elastic Agents (for now, only on Linux):
  - these independent Elastic Agents can be customized (with capabilities and scripts).
  - each system test writes into its own data streams.
- run system tests in parallel:
  - the maximum number of routines to run in parallel can be set through an environment variable.

What remains pending here is allowing Elastic Agents to run on other OSes (e.g. Windows) or in other runtimes (VMs?).


mtojek · 2022-04-11T14:48:37Z

Regarding #786:

I recommend not modifying the test runner much/at all. We'd like to rewrite that code eventually as it has too many responsibilities and it's error-prone now.

I see a few approaches that we can apply here.

Extend "profiles" with local patches

When a user or CI executes the elastic-package stack up command, it checks whether there are any local profile patches and creates a custom profile (if it doesn't exist) for that particular integration.

Extend Compose stack definition with environment variables

Let's not add anything special except allowing for var customizations in the Compose stack. Whenever a user or CI executes the elastic-package stack up command, it will also load local vars.

Problem: it doesn't solve the problem of elastic-package stack booting on the Windows machine. Should we move it to another, separate issue? Actually, you can pair it with image overrides.

Hack: retag elastic-agent image

It's a hack but may work temporarily. Before running elastic-package stack up, we need to build the agent's Docker image for Windows and replace/retag the current one. It might be hard due to different stack versions and configuration changes.

mtojek · 2022-04-11T14:49:16Z

pinging @jsoriano for his thoughts around this and #786.

jsoriano · 2022-04-11T21:00:51Z

Another option could be to move agent initialization to the system test runner, and remove it from the default stack definition. Having it in the runner would give it full control of the started agents, allowing it to start them with different options and to handle platform-specific needs, as could be the case for Windows.
It could also allow starting different tests with different agent configurations, for example to test different auditbeat configurations with different capabilities.

We already have a custom agent for the Kubernetes service deployer, we could follow a similar strategy on any other deployer.
There could be general options, such as options to select the version to use, or things like capabilities to add. And there could be platform specific options, that would also allow things like selecting daemonset vs deployment in Kubernetes (#465). On a second iteration these options could be easily overridden with flags or environment variables.
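A minimal Go sketch of how the split between general and platform-specific options could look. All names here (`AgentOptions`, `KubernetesOptions`, `Workload`) and the example values are hypothetical and only illustrate the idea; they are not the existing elastic-package API.

```go
package main

import (
	"errors"
	"fmt"
)

// AgentOptions is a hypothetical set of options a deployer could accept.
type AgentOptions struct {
	// General options, valid for any deployer.
	Version      string   // agent version to run
	Capabilities []string // extra Linux capabilities to add to the container

	// Platform-specific options; nil when the platform is not used.
	Kubernetes *KubernetesOptions
}

// KubernetesOptions holds options that only make sense on Kubernetes.
type KubernetesOptions struct {
	Workload string // "daemonset" or "deployment"
}

// Validate rejects option combinations that a deployer could not honor.
func (o AgentOptions) Validate() error {
	if o.Version == "" {
		return errors.New("agent version is required")
	}
	if k := o.Kubernetes; k != nil && k.Workload != "daemonset" && k.Workload != "deployment" {
		return fmt.Errorf("unsupported kubernetes workload %q", k.Workload)
	}
	return nil
}

func main() {
	// Illustrative values only: an auditbeat-like agent needing extra capabilities.
	opts := AgentOptions{
		Version:      "8.2.0",
		Capabilities: []string{"AUDIT_CONTROL", "AUDIT_READ"},
	}
	if err := opts.Validate(); err != nil {
		fmt.Println("invalid options:", err)
		return
	}
	fmt.Println("options accepted")
}
```

Keeping platform-specific options in their own optional struct would let each deployer ignore what it does not understand, and flags or environment variables could later override individual fields.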

If we remove agent from the default stack definition, we could still have a stack subcommand to start agents for manual tests. Something like this could also cover #548.

I think this could be a more future-proof option, but it could require a significant effort.

Another option, for the use case of starting an agent but no service, would be to add a new "system" deployer that just starts an agent with a given configuration, intended for system-level monitoring. This could help with packages for auditbeat or for the system module itself. This could be extended in the future to start completely different OSs using VMs.

This would be more in line with #786, but without needing to hack over the current test runner and compose deployer.

mtojek · 2022-04-12T07:35:40Z

> Another option could be to move agent initialization to the system test runner, and remove it from the default stack definition. Having it in the runner would give it full control of the started agents, allowing it to start them with different options and to handle platform-specific needs, as could be the case for Windows.
> It could also allow starting different tests with different agent configurations, for example to test different auditbeat configurations with different capabilities.

There are two constraints related to this approach:

- Don't forget that this is the mode we also use for development purposes. You can simply start the stack and have everything ready. It is really convenient.
- Agent enrollment with Fleet Server takes time. We considered this option at the early stage and decided to follow the "enroll once" approach at startup. It's also easier to debug if you have the agent instance present, not wiped out.

> We already have a custom agent for the Kubernetes service deployer, we could follow a similar strategy on any other deployer.
> There could be general options, such as options to select the version to use, or things like capabilities to add. And there could be platform specific options, that would also allow things like selecting daemonset vs deployment in Kubernetes (#465). On a second iteration these options could be easily overridden with flags or environment variables.

I like the approach of having the custom agent setup. You're right that we could apply similar logic as for kind, to spawn a new agent. This way we don't need to modify test runners at all. Most likely we will need two setups: custom image properties and Windows.

> If we remove agent from the default stack definition, we could still have a stack subcommand to start agents for manual tests. Something like this could also cover #548.

I had that in mind before, hence the issue, but always considered its complexity as +Inf. Maybe we can evaluate it as a good first issue and "rebuild the stack command"?

To sum up, my vote would go to custom agent setup.

jsoriano · 2022-04-12T08:37:24Z

> > We already have a custom agent for the Kubernetes service deployer, we could follow a similar strategy on any other deployer.
> > There could be general options, such as options to select the version to use, or things like capabilities to add. And there could be platform specific options, that would also allow things like selecting daemonset vs deployment in Kubernetes (#465). On a second iteration these options could be easily overridden with flags or environment variables.
>
> I like the approach of having the custom agent setup. You're right that we could apply similar logic as for kind, to spawn a new agent. This way we don't need to modify test runners at all. Most likely we will need two setups: custom image properties and Windows.

Could this be done without modifying runners?

mtojek · 2022-04-12T09:16:23Z

Yes, I think so, in the same way that we enclosed most of the changes for the Kubernetes service deployer in this file. There might be one inconvenience: the agent will be deployed during the first run of the system test.

jsoriano · 2022-04-12T11:29:27Z

Ah ok, but it would be modifying service deployers. Would you prefer to add an agent to the compose deployer, or to add a new deployer for these use cases?

mtojek · 2022-04-12T12:05:43Z

> Would you prefer to add an agent to the compose deployer

It looks like it depends on the final infrastructure setup. Not sure if that option will work for @marc-gr and Windows containers.

> add a new deployer for these use cases?

This option seems to be pluggable and flexible in terms of specific configuration properties or OS-specific logic. It also has an extra benefit: it avoids copying custom agent code to multiple places.

I'm now wondering whether we aren't close to introducing a feature for using an agent under development. This way you could even use standalone builds. Maybe we should implement a proxy instead :)

jsoriano · 2022-05-03T08:47:51Z

I discussed this offline with Marc; he is going to explore the option of implementing something like #786, but as a new deployer, so the runner is not modified. This could cover the current auditbeat needs.

We also discussed that we probably need something like VMs for system tests. This will be necessary to support running tests on Windows, or even on Linux if not enough privileges can be granted to containers for some use cases.
We could run these tests on specialized CI workers, as in elastic/integrations#1713 for Windows, but it'd be nice to have something in elastic-package so developers working on Mac/Linux can also run these tests locally.

cmacknz · 2024-01-30T14:32:39Z

> but it'd be nice to have something in elastic-package so developers working on Mac/Linux can also run these tests locally.

For Linux, https://multipass.run/ is a good cross-platform solution as long as you are fine with only supporting Ubuntu VMs. For Windows there is no cross-platform equivalent; you have to provision cloud VMs.

This is generally what we do in the Elastic Agent test framework, https://github.com/elastic/elastic-agent/blob/main/docs/test-framework-dev-guide.md. You can test locally against multipass Ubuntu VMs, otherwise we are provisioning Linux and Windows machines in the cloud. MacOS VM support is TBD.

It would be good if we could align the provisioning here with the agent framework so we aren't maintaining this functionality twice. The only quirk with the agent test framework is that it uses https://github.com/adam-stokes/ogc for provisioning; we'd prefer to use Terraform, but we haven't gotten that implemented yet (elastic/elastic-agent#2935).

cmacknz · 2024-01-30T16:38:21Z

CC @blakerouse

mrodm · 2024-06-03T17:00:05Z

When enabling independent Elastic Agents, some packages take around 3 hours to finish their tests (mainly system tests).

I added a new PR to create a new Agent Policy for each test executed: #1866

This will:

- bring us one step closer to running system tests in parallel, since every test is going to use a different data stream to ingest docs.
- reduce complexity when using stages in system tests (e.g. the --no-provision flag).

mrodm · 2024-06-10T18:16:41Z

Two new PRs were created to change how test runners work in elastic-package:

- Move getting all tests to run inside each runner #1895
- Update Tester instances to trigger just one test #1898
  - requires #1895 (Move getting all tests to run inside each runner) to be merged first

These two PRs introduce two different interfaces to manage runners and tests:

- Tester interface:
  - it handles the execution of just one test with its own lifecycle.
  - each test runner can define its own specific tests, for instance:
    - asset tests: just one test for the whole package.
    - system tests: one test per configuration file and variant.
    - policy tests: one test per configuration file.
    - ...
- TestRunner interface:
  - it handles the creation (and destruction) of global resources required for tests.
  - it handles the creation of Tester instances, one per test defined in the package.
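The split of responsibilities described above can be sketched in Go roughly as follows. This is a simplified illustration under assumptions: the method names (`Run`, `TearDown`, `SetupRunner`, `GetTests`, `TearDownRunner`), the `TestResult` type and the fake implementations are hypothetical, and the interfaces in the linked PRs may differ.

```go
package main

import "fmt"

// TestResult is a hypothetical, simplified outcome of a single test.
type TestResult struct {
	Name   string
	Passed bool
}

// Tester handles the execution of just one test with its own lifecycle.
type Tester interface {
	Run() (TestResult, error)
	TearDown() error
}

// TestRunner creates global resources and one Tester per test in the package.
type TestRunner interface {
	SetupRunner() error
	GetTests() ([]Tester, error)
	TearDownRunner() error
}

// runAll drives a runner: set up shared resources, run every test, clean up.
func runAll(r TestRunner) ([]TestResult, error) {
	if err := r.SetupRunner(); err != nil {
		return nil, err
	}
	defer r.TearDownRunner() // tear-down error ignored for brevity

	testers, err := r.GetTests()
	if err != nil {
		return nil, err
	}
	var results []TestResult
	for _, t := range testers {
		res, err := t.Run()
		// Always tear the test down; keep the first error seen.
		if tdErr := t.TearDown(); err == nil {
			err = tdErr
		}
		if err != nil {
			return results, err
		}
		results = append(results, res)
	}
	return results, nil
}

// fakeTester and fakeRunner are toy implementations used only for illustration.
type fakeTester struct{ name string }

func (f fakeTester) Run() (TestResult, error) { return TestResult{Name: f.name, Passed: true}, nil }
func (f fakeTester) TearDown() error          { return nil }

type fakeRunner struct{ configs []string }

func (f fakeRunner) SetupRunner() error    { return nil }
func (f fakeRunner) TearDownRunner() error { return nil }

// GetTests returns one Tester per configuration file, as a system test runner might.
func (f fakeRunner) GetTests() ([]Tester, error) {
	testers := make([]Tester, 0, len(f.configs))
	for _, c := range f.configs {
		testers = append(testers, fakeTester{name: c})
	}
	return testers, nil
}

func main() {
	results, err := runAll(fakeRunner{configs: []string{"test-default-config.yml", "test-tls-config.yml"}})
	fmt.Println(len(results), err)
}
```

Because each Tester owns the lifecycle of exactly one test, a caller is free to schedule Testers sequentially, as here, or concurrently.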

mrodm · 2024-06-17T10:09:29Z

The next step is adding support for running system tests in parallel in elastic-package.

This work is being done in two different PRs:

- Add main configuration file for tests package-spec#759: allows defining a new global configuration file for tests.
- Run system tests in parallel #1909: adds support in elastic-package for running system tests in parallel.

This will allow us to run system tests in parallel in packages with a large number of system tests, like network_traffic or zeek.

mrodm · 2024-06-17T17:05:41Z

I ran some tests in this integrations PR with just 2 packages (network_traffic and zeek): elastic/integrations#10161

Comparing times among the different settings:

| Package | Sequential (stack Elastic Agent) | Sequential (independent Elastic Agents) | Up to 3 routines | Up to 5 routines | Up to 8 routines |
|---|---|---|---|---|---|
| network_traffic | 1h 20min | 2h 20min | 1h 7min | 42min | Error (timeouts) |
| zeek | 1h 10min | 2h 40min | 1h 8min | 42min | Error (timeouts) |

CI builds:

Sequential times from this comment: #787 (comment)
Up to 3 routines: https://buildkite.com/elastic/integrations/builds/12562
Up to 5 routines: https://buildkite.com/elastic/integrations/builds/12568
Up to 5 routines (2nd attempt): https://buildkite.com/elastic/integrations/builds/12575
Up to 5 routines (3rd attempt): https://buildkite.com/elastic/integrations/builds/12577
Up to 8 routines: https://buildkite.com/elastic/integrations/builds/12573
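To put the numbers in perspective, the speedup of the "up to 5 routines" runs over the sequential runs with independent Elastic Agents can be computed directly from the table. A small Go sketch of that arithmetic, with durations converted to minutes (the `speedup` helper is just for illustration):

```go
package main

import "fmt"

// speedup returns how many times faster a run is compared to a baseline,
// with both durations given in minutes.
func speedup(baselineMin, runMin float64) float64 {
	return baselineMin / runMin
}

func main() {
	// Durations in minutes, copied from the table:
	// sequential with independent agents vs. up to 5 routines.
	fmt.Printf("network_traffic: %.2fx\n", speedup(140, 42)) // 2h 20min vs 42min
	fmt.Printf("zeek: %.2fx\n", speedup(160, 42))            // 2h 40min vs 42min
}
```

Compared with the sequential runs that use the stack Elastic Agent (80 and 70 minutes), the same 42-minute runs are a smaller improvement.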

mrodm · 2024-06-17T17:41:39Z

I was wondering about closing this issue once this PR (#1909) is merged.

All the support related to independent Elastic Agents and running system tests in parallel would be completed at that point.

What would still be missing:

- release a new version of elastic-package and integrate it into the integrations repository.
- enable independent Elastic Agents (through an environment variable) and enable parallel system tests in the network_traffic and zeek packages.

For that, a follow-up issue could be created to enable those features in the integrations repository. Some packages will need to be updated while doing so: at least auditd_manager and oracle (see the related PoC PR with the required changes, elastic/integrations#9862).

Another issue could be created to run the system tests using independent Elastic Agents by default. Could this be done as part of a different issue too?

However, that means that developers would be triggering the tests using the Elastic Agent from the stack, while CI would be using the new independent Elastic Agents. Developers who want to run independent Elastic Agents would need to set the environment variable: ELASTIC_PACKAGE_TEST_ENABLE_INDEPENDENT_AGENT=true

WDYT about closing this one (once the PR is merged) in favor of creating those new issues? @jsoriano @kpollich

mrodm · 2024-06-17T17:44:39Z

Just to add to the previous comment, the docs should also be updated to cover these new settings.

I'll update the current PR with the required docs changes:
https://github.com/elastic/elastic-package/blob/main/docs/howto/system_testing.md#running-system-tests-with-independent-elastic-agents-in-each-test-technical-preview

EDIT: updated in 419f8ea

jsoriano · 2024-06-18T10:13:13Z

> I was wondering about closing this issue once this PR (#1909) is merged.

Yep, I mostly agree with closing this once we can enable independent agents more generally. But please take into account that one of the original motivations for this issue was to be able to run winlog tests, and we are still unable to run Windows agents for this.
If we close this issue, please ensure that we keep some issue open for use cases on different operating systems.

mrodm · 2024-06-18T16:03:58Z

> But please take into account that one of the original motivations for this issue was to be able to run winlog tests, and we are still unable to run Windows agents for this.
> If we close this issue, please ensure that we keep some issue open for use cases on different operating systems.

That's true. I could keep this issue open (since there are other issues already linked to this one) even if the above-mentioned PRs are merged, until we can find time to work on adding support for running Elastic Agents in other OSes or runtimes.

mrodm · 2024-06-19T15:06:29Z

Created package-spec release 3.2.0 (elastic/package-spec#764), which includes the definition of the new configuration files to enable or disable parallel system tests.

mrodm · 2024-06-20T10:43:31Z

As a summary of what has been achieved so far with the latest merged Pull Requests linked to this issue, there is now support in elastic-package to:

- run tests in independent Elastic Agents (for now, only on Linux):
  - these independent Elastic Agents can be customized (with capabilities and scripts).
  - each system test writes into its own data streams.
- run system tests in parallel:
  - the maximum number of routines to run in parallel can be set through an environment variable.

As a follow-up, I created this issue to enable these features in the integrations repository:
elastic/integrations#10201

What remains pending here is allowing Elastic Agents to run on other OSes (e.g. Windows) or in other runtimes (VMs?).

Updated title and description accordingly.

cc @jsoriano @kpollich

kpollich · 2024-07-02T13:31:06Z

Thanks for providing a summary of where we are today, @mrodm. I'm moving this into a quality sprint for now as we'll need to dedicate a large amount of time here if we prioritize adding cross-platform support to this new type of test.

mrodm · 2024-09-04T14:04:37Z

One use case where this feature could be helpful is the system_audit package in the integrations repository (running Elastic Agents in different VMs).

This would allow running the system tests with Elastic Agents on Linux distributions other than Ubuntu, e.g. Fedora, so we could test that the agent can collect the required logs from the rpm package manager.
Related to elastic/integrations#11000

marc-gr added the discuss label Apr 11, 2022

jlind23 added the Team:Ecosystem label Apr 11, 2022

mtojek mentioned this issue May 9, 2022

Add config option to deploy custom elastic-agents as test services #786

Merged

marc-gr mentioned this issue May 24, 2022

[system tests] Add support to deploy custom elastic-agents in VMs #829

Open

marc-gr mentioned this issue Oct 31, 2023

[epic] Integrations can be tested on windows both locally and on CI #1527

Open

jsoriano mentioned this issue Jan 10, 2024

[System tests] Run system tests for integrations with a non root Elastic Agent #1586

Closed

jsoriano self-assigned this Feb 14, 2024

mrodm self-assigned this Mar 7, 2024

mrodm mentioned this issue Mar 13, 2024

Create different agent per each test execution #1724

Merged

15 tasks

This was referenced Apr 10, 2024

Allow to run multiple independent agents in kubernetes #1759

Merged

Remove variant data from agent deployer package #1762

Merged

Allow to define custom agents in system test configuration files #1765

Merged

jsoriano mentioned this issue Apr 12, 2024

Allow packages without service deployer to have system tests #1768

Merged

This was referenced Apr 12, 2024

Remove definitionsDir from k8s in agentdeployer #1769

Merged

Remove created agents in service deployer #1771

Merged

Set user depending on agent.privileges.root field from manifest #1789

Merged

This was referenced Apr 22, 2024

Drop all capabilities by default in Elastic Agent containers #1794

Merged

WIP Test independent agents - DO NOT MERGE elastic/integrations#9660

Closed

Add support to define expose ports for independent agents #1795

Merged

This was referenced May 20, 2024

Update installation/uninstallation package process in system tests #1845

Closed

Update azure network watcher packages docker scenario elastic/integrations#9940

Merged

Create Agent policies per each test execution #1866

Merged

kpollich unassigned jsoriano May 28, 2024

This was referenced Jun 17, 2024

Run system tests in parallel #1909

Merged

Add main configuration file for tests elastic/package-spec#759

Merged

mrodm mentioned this issue Jun 17, 2024

Test parallel tests - DO NOT MERGE elastic/integrations#10161

Closed

This was referenced Jun 19, 2024

Create different deployer folders per each test #1919

Merged

[Buildkite] Enable independent Elastic Agents and for some packages enable parallel system tests elastic/integrations#10201

Closed

mrodm changed the title ~~Add ability to deploy custom elastic-agents~~ Add ability to deploy custom elastic-agents on different OS or runtimes Jun 20, 2024

mrodm removed their assignment Jul 2, 2024

mrodm mentioned this issue Jul 4, 2024

Run independent Elastic Agents by default #1954

Closed

kpollich assigned mrodm Aug 13, 2024
