fix: update stack deployment to follow Users&Roles best practices #2064

mdelapenya · 2022-01-27T18:37:16Z

bump stack version 8.1.0-aa69d697
fix: use new kibana roles and users
feat: add a method for checking ES cluster health
fix: create fleet-server after getting a service token from elasticsearch
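
As an illustration of the cluster health check named in the commits above, here is a minimal Go sketch. It is not the code from this PR: the names `clusterHealth`, `isHealthy`, and `checkClusterHealth` are hypothetical, and treating "yellow" as healthy is an assumption that suits single-node test clusters. It relies only on the Elasticsearch `GET /_cluster/health` endpoint, whose response carries a `status` field.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// clusterHealth holds the fields of the GET /_cluster/health response
// that this sketch cares about.
type clusterHealth struct {
	ClusterName string `json:"cluster_name"`
	Status      string `json:"status"` // "green", "yellow" or "red"
}

// isHealthy reports whether a status is acceptable. Accepting "yellow"
// is an assumption made for single-node test clusters.
func isHealthy(status string) bool {
	return status == "green" || status == "yellow"
}

// checkClusterHealth (hypothetical helper) queries Elasticsearch and returns
// an error when the cluster is unreachable or reports an unhealthy status.
func checkClusterHealth(client *http.Client, baseURL, user, password string) error {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/_cluster/health", nil)
	if err != nil {
		return err
	}
	req.SetBasicAuth(user, password)

	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected HTTP status %d", resp.StatusCode)
	}

	var health clusterHealth
	if err := json.NewDecoder(resp.Body).Decode(&health); err != nil {
		return err
	}
	if !isHealthy(health.Status) {
		return fmt.Errorf("cluster %q is %s", health.ClusterName, health.Status)
	}
	return nil
}
```

A caller would typically retry `checkClusterHealth` with a backoff until it returns nil before bootstrapping the rest of the stack.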

What does this PR do?

This PR tries to address the recent issue for the project:

it bumps the versions, cherry-picking [automation] update elastic stack version for testing 8.1.0-aa69d697 #2052
it mimics what the APM ITs project does to load users and roles for Elasticsearch. To that end, we add the admin user as the superuser, included in the kibana_system role.
it retrieves an Elasticsearch service token that will be used by fleet-server to start
it removes fleet-server from the stack profile, and manually adds it to the deployment with proper configuration, including the aforementioned service token
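
The service token step described above can be sketched roughly as follows. This is a hedged illustration, not the PR's implementation: `parseServiceToken` and `createFleetServerToken` are hypothetical names, and the sketch assumes the Elasticsearch "create service account token" endpoint for the `elastic/fleet-server` service account, which answers with a `created` flag and a `token.value` field.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
)

// serviceTokenResponse models the body returned by
// POST /_security/service/elastic/fleet-server/credential/token/<name>.
type serviceTokenResponse struct {
	Created bool `json:"created"`
	Token   struct {
		Name  string `json:"name"`
		Value string `json:"value"`
	} `json:"token"`
}

// parseServiceToken extracts the token value from a response body.
func parseServiceToken(body []byte) (string, error) {
	var r serviceTokenResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	if !r.Created || r.Token.Value == "" {
		return "", errors.New("elasticsearch did not create a service token")
	}
	return r.Token.Value, nil
}

// createFleetServerToken (hypothetical helper) asks Elasticsearch for a new
// fleet-server service token, authenticating with basic auth.
func createFleetServerToken(client *http.Client, baseURL, user, password, tokenName string) (string, error) {
	url := fmt.Sprintf("%s/_security/service/elastic/fleet-server/credential/token/%s", baseURL, tokenName)
	req, err := http.NewRequest(http.MethodPost, url, nil)
	if err != nil {
		return "", err
	}
	req.SetBasicAuth(user, password)

	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected HTTP status %d: %s", resp.StatusCode, body)
	}
	return parseServiceToken(body)
}
```

The returned value would then be handed to the fleet-server container when it is added to the deployment; how exactly it is passed (for example through an environment variable) is not shown here.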

Why is it important?

Three major things:

tests with security (users-roles)
bumps to latest versions
fix fleet-server communication with kibana/elasticsearch

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have made corresponding changes to the default configuration files
I have added tests that prove my fix is effective or that my feature works
I have run the Unit tests (make unit-test), and they are passing locally
I have run the End-2-End tests for the suite I'm working on, and they are passing locally
I have noticed new Go dependencies (run make notice in the proper directory)

Related issues

Bumps versions from [automation] update elastic stack version for testing 8.1.0-aa69d697 #2052

We are going to use 'admin' everywhere

mdelapenya · 2022-01-27T18:38:10Z

I'm skipping the backport because the cherry-picked commit includes versions for 8.1.

I'll revert that commit once this is fixed and add the labels for the proper backports.

elasticmachine · 2022-01-27T19:21:03Z

💔 Tests Failed

Build stats

Start Time: 2022-02-07T08:58:43.677+0000
Duration: 51 min 55 sec

Test stats 🧪

Test	Results
Failed	38
Passed	194
Skipped	0
Total	232

Test errors

Showing only the first 10 test failures:

`Initializing / End-To-End Tests / fleet_centos8_amd64_fleet_mode_agent / Un-enrolling the agent deactivates the agent – Fleet Mode Agent`
Step: the agent is un-enrolled (no stacktrace)

`Initializing / End-To-End Tests / fleet_centos8_arm64_fleet_mode_agent / Un-enrolling the agent deactivates the agent – Fleet Mode Agent`
Step: the agent is un-enrolled (no stacktrace)

`Initializing / End-To-End Tests / fleet_debian_arm64_fleet_mode_agent / Revoking the enrollment token for the agent – Fleet Mode Agent`
Step: the enrollment token is revoked (no stacktrace)

`Initializing / End-To-End Tests / fleet_debian_amd64_fleet_mode_agent / Re-enrolling the agent activates the agent in Fleet – Fleet Mode Agent`
Step: the agent is un-enrolled (no stacktrace)

`Initializing / End-To-End Tests / fleet_debian_amd64_fleet_mode_agent / Revoking the enrollment token for the agent – Fleet Mode Agent`
Step: the enrollment token is revoked (no stacktrace)

`Initializing / End-To-End Tests / fleet_sles15_fleet_mode_agent / Re-enrolling the agent activates the agent in Fleet – Fleet Mode Agent`
Step: the agent is un-enrolled (no stacktrace)

`Initializing / End-To-End Tests / fleet_sles15_fleet_mode_agent / Revoking the enrollment token for the agent – Fleet Mode Agent`
Step: the enrollment token is revoked (no stacktrace)

`Initializing / End-To-End Tests / fleet_centos8_amd64_fleet_mode_agent / Re-enrolling the agent activates the agent in Fleet – Fleet Mode Agent`
Step: the agent is un-enrolled (no stacktrace)

`Initializing / End-To-End Tests / fleet_centos8_amd64_fleet_mode_agent / Revoking the enrollment token for the agent – Fleet Mode Agent`
Step: the enrollment token is revoked (no stacktrace)

`Initializing / End-To-End Tests / fleet_centos8_arm64_fleet_mode_agent / Re-enrolling the agent activates the agent in Fleet – Fleet Mode Agent`
Step: the agent is un-enrolled (no stacktrace)

Steps errors

Showing only the first 10 step failures:

`Shell Script`

Took 13 min 5 sec.
Description: ssh -tt -o TCPKeepAlive=yes -o ServerAliveInterval=60 -o ServerAliveCountMax=10 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /var/lib/jenkins/workspace/PR-2064-43-00cebabf-e170-4d9c-9c28-d8db56213f36/e2essh [email protected] -- 'sudo bash /home/admin/e2e-testing/.ci/scripts/functional-test.sh "backend_processes && ~@nightly && ~@skip:amd64" '

`Archive the artifacts`

Took 0 min 0 sec.
Description: [2022-02-07T09:44:59.909Z] Archiving artifacts script returned exit code 2

`Shell Script`

Took 12 min 15 sec.
Description: ssh -tt -o TCPKeepAlive=yes -o ServerAliveInterval=60 -o ServerAliveCountMax=10 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /var/lib/jenkins/workspace/PR-2064-43-9ca2af07-33b5-4ab8-b5ce-22fcb6faef78/e2essh [email protected] -- 'sudo bash /home/ec2-user/e2e-testing/.ci/scripts/functional-test.sh "backend_processes && ~@nightly && ~@skip:amd64" '

`Archive the artifacts`

Took 0 min 0 sec.
Description: [2022-02-07T09:44:48.798Z] Archiving artifacts script returned exit code 2

`Shell Script`

Took 9 min 27 sec.
Description: ssh -tt -o TCPKeepAlive=yes -o ServerAliveInterval=60 -o ServerAliveCountMax=10 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /var/lib/jenkins/workspace/PR-2064-43-a87b85ca-1a06-4d67-bd11-75c3f3633d53/e2essh [email protected] -- 'sudo bash /home/admin/e2e-testing/.ci/scripts/functional-test.sh "running_on_beats && ~@nightly && ~@skip:arm64" '

`Archive the artifacts`

Took 0 min 0 sec.
Description: [2022-02-07T09:41:38.795Z] Archiving artifacts script returned exit code 2

`Shell Script`

Took 10 min 43 sec.
Description: ssh -tt -o TCPKeepAlive=yes -o ServerAliveInterval=60 -o ServerAliveCountMax=10 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /var/lib/jenkins/workspace/PR-2064-43-dc281c02-17c2-45a1-a1d8-24580b700939/e2essh [email protected] -- 'sudo bash /home/admin/e2e-testing/.ci/scripts/functional-test.sh "running_on_beats && ~@nightly && ~@skip:amd64" '

`Archive the artifacts`

Took 0 min 0 sec.
Description: [2022-02-07T09:41:28.061Z] Archiving artifacts script returned exit code 2

`Shell Script`

Took 8 min 0 sec.
Description: ssh -tt -o TCPKeepAlive=yes -o ServerAliveInterval=60 -o ServerAliveCountMax=10 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /var/lib/jenkins/workspace/PR-2064-43-0a1452f4-5838-419c-8a08-b0c8f312b4e9/e2essh [email protected] -- 'sudo bash /home/ec2-user/e2e-testing/.ci/scripts/functional-test.sh "running_on_beats && ~@nightly && ~@skip:amd64" '

`Archive the artifacts`

Took 0 min 0 sec.
Description: [2022-02-07T09:41:24.727Z] Archiving artifacts script returned exit code 2

🐛 Flaky test report

❕ There are test failures but not known flaky tests.

Genuine test errors

💔 There are test failures but not known flaky tests, most likely a genuine test failure.

Name: Initializing / End-To-End Tests / fleet_centos8_amd64_fleet_mode_agent / Un-enrolling the agent deactivates the agent – Fleet Mode Agent
Name: Initializing / End-To-End Tests / fleet_centos8_arm64_fleet_mode_agent / Un-enrolling the agent deactivates the agent – Fleet Mode Agent
Name: Initializing / End-To-End Tests / fleet_debian_arm64_fleet_mode_agent / Revoking the enrollment token for the agent – Fleet Mode Agent
Name: Initializing / End-To-End Tests / fleet_debian_amd64_fleet_mode_agent / Re-enrolling the agent activates the agent in Fleet – Fleet Mode Agent
Name: Initializing / End-To-End Tests / fleet_debian_amd64_fleet_mode_agent / Revoking the enrollment token for the agent – Fleet Mode Agent
Name: Initializing / End-To-End Tests / fleet_sles15_fleet_mode_agent / Re-enrolling the agent activates the agent in Fleet – Fleet Mode Agent
Name: Initializing / End-To-End Tests / fleet_sles15_fleet_mode_agent / Revoking the enrollment token for the agent – Fleet Mode Agent
Name: Initializing / End-To-End Tests / fleet_centos8_amd64_fleet_mode_agent / Re-enrolling the agent activates the agent in Fleet – Fleet Mode Agent
Name: Initializing / End-To-End Tests / fleet_centos8_amd64_fleet_mode_agent / Revoking the enrollment token for the agent – Fleet Mode Agent
Name: Initializing / End-To-End Tests / fleet_centos8_arm64_fleet_mode_agent / Re-enrolling the agent activates the agent in Fleet – Fleet Mode Agent
Name: Initializing / End-To-End Tests / fleet_centos8_arm64_fleet_mode_agent / Revoking the enrollment token for the agent – Fleet Mode Agent
Name: Initializing / End-To-End Tests / fleet_sles15_running_on_beats / Deploying the Elastic-Agent with enroll and then run on top of filebeat #2 – Running on top of Beats
Name: Initializing / End-To-End Tests / fleet_debian_arm64_running_on_beats / Deploying the Elastic-Agent with enroll and then run on top of filebeat #2 – Running on top of Beats
Name: Initializing / End-To-End Tests / fleet_debian_amd64_apm_server / Deploying a default stand-alone agent with the Elastic APM integration – APM Integration
Name: Initializing / End-To-End Tests / fleet_debian_amd64_apm_server / Deploying a ubi8 stand-alone agent with the Elastic APM integration – APM Integration
Name: Initializing / End-To-End Tests / fleet_debian_arm64_backend_processes / Un-enrolling the agent stops backend processes – Backend Processes
Name: Initializing / End-To-End Tests / fleet_debian_arm64_backend_processes / Re-enrolling the agent starts the elastic-agent process – Backend Processes
Name: Initializing / End-To-End Tests / fleet_debian_arm64_backend_processes / Un-installing the installed agent – Backend Processes
Name: Initializing / End-To-End Tests / fleet_debian_arm64_backend_processes / Un-enrolling Elastic Agent stops Elastic Endpoint – Backend Processes
Name: Initializing / End-To-End Tests / fleet_debian_arm64_backend_processes / Removing Endpoint from Agent policy stops the connected Endpoint – Backend Processes
Name: Initializing / End-To-End Tests / fleet_sles15_backend_processes / Un-enrolling the agent stops backend processes – Backend Processes
Name: Initializing / End-To-End Tests / fleet_sles15_backend_processes / Re-enrolling the agent starts the elastic-agent process – Backend Processes
Name: Initializing / End-To-End Tests / fleet_sles15_backend_processes / Un-installing the installed agent – Backend Processes
Name: Initializing / End-To-End Tests / fleet_sles15_backend_processes / Un-enrolling Elastic Agent stops Elastic Endpoint – Backend Processes
Name: Initializing / End-To-End Tests / fleet_sles15_backend_processes / Removing Endpoint from Agent policy stops the connected Endpoint – Backend Processes
Name: Initializing / End-To-End Tests / fleet_debian_amd64_backend_processes / Un-enrolling the agent stops backend processes – Backend Processes
Name: Initializing / End-To-End Tests / fleet_debian_amd64_backend_processes / Re-enrolling the agent starts the elastic-agent process – Backend Processes
Name: Initializing / End-To-End Tests / fleet_debian_amd64_backend_processes / Un-installing the installed agent – Backend Processes
Name: Initializing / End-To-End Tests / fleet_debian_amd64_backend_processes / Un-enrolling Elastic Agent stops Elastic Endpoint – Backend Processes
Name: Initializing / End-To-End Tests / fleet_debian_amd64_backend_processes / Removing Endpoint from Agent policy stops the connected Endpoint – Backend Processes
Name: Initializing / End-To-End Tests / fleet_debian_arm64_fleet_mode_agent / Un-installing the installed agent – Fleet Mode Agent
Name: Initializing / End-To-End Tests / fleet_debian_amd64_fleet_mode_agent / Un-installing the installed agent – Fleet Mode Agent
Name: Initializing / End-To-End Tests / fleet_sles15_fleet_mode_agent / Un-installing the installed agent – Fleet Mode Agent
Name: Initializing / End-To-End Tests / fleet_centos8_amd64_fleet_mode_agent / Un-installing the installed agent – Fleet Mode Agent
Name: Initializing / End-To-End Tests / fleet_centos8_arm64_fleet_mode_agent / Un-installing the installed agent – Fleet Mode Agent
Name: Initializing / End-To-End Tests / fleet_sles15_running_on_beats / Deploying the Elastic-Agent with enroll and then run on top of metricbeat #2 – Running on top of Beats
Name: Initializing / End-To-End Tests / fleet_debian_amd64_running_on_beats / Deploying the Elastic-Agent with enroll and then run on top of metricbeat #2 – Running on top of Beats
Name: Initializing / End-To-End Tests / fleet_debian_arm64_running_on_beats / Deploying the Elastic-Agent with enroll and then run on top of metricbeat #2 – Running on top of Beats

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.

Instead of calling the compose file, we call the bootstrapping code of the Fleet test suite, but without any valid tag. Because we set DEVELOPER_MODE=true for the stack node, the stack is kept alive even after the scenarios have finished. We also pass a non-existing Gherkin tag so that only the bootstrap code runs and no scenario is executed.
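
To make the trick above concrete, here is a small, self-contained Go simulation. It does not use the project's real test runner: `scenario`, `matches`, `selectScenarios`, `developerModeEnabled`, and `runSuite` are all hypothetical names, and the `&&` / `~` tag syntax is a simplified version of the expressions visible in the CI logs of this thread. The point is only that a tag matching no scenario still lets the bootstrap hook run, and that developer mode skips the teardown.

```go
package main

import (
	"os"
	"strings"
)

// scenario is a hypothetical stand-in for a Gherkin scenario and its tags.
type scenario struct {
	Name string
	Tags []string
}

// matches evaluates a simplified tag expression such as "@a && ~@b":
// every term must hold, and a leading "~" negates a term.
func matches(expr string, tags []string) bool {
	has := make(map[string]bool, len(tags))
	for _, t := range tags {
		has[t] = true
	}
	for _, term := range strings.Split(expr, "&&") {
		term = strings.TrimSpace(term)
		if term == "" {
			continue
		}
		if strings.HasPrefix(term, "~") {
			if has[strings.TrimPrefix(term, "~")] {
				return false
			}
			continue
		}
		if !has[term] {
			return false
		}
	}
	return true
}

// selectScenarios returns the scenarios that the expression lets through.
func selectScenarios(expr string, all []scenario) []scenario {
	var out []scenario
	for _, s := range all {
		if matches(expr, s.Tags) {
			out = append(out, s)
		}
	}
	return out
}

// developerModeEnabled reads the DEVELOPER_MODE variable mentioned above.
func developerModeEnabled() bool {
	return os.Getenv("DEVELOPER_MODE") == "true"
}

// runSuite always runs bootstrap, counts the selected scenarios, and
// skips teardown when developer mode is on, so the stack stays up.
func runSuite(expr string, all []scenario, developerMode bool, bootstrap, teardown func()) int {
	bootstrap()
	ran := len(selectScenarios(expr, all))
	if !developerMode {
		teardown()
	}
	return ran
}
```

Calling `runSuite("@non-existing-tag", scenarios, developerModeEnabled(), bootstrap, teardown)` with `DEVELOPER_MODE=true` runs the bootstrap, selects no scenario, and leaves the stack running.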

.ci/ansible/tasks/setup_test_script.yml

* main: chore: double wait time for SSH (elastic#2066)

Signed-off-by: Adam Stokes <[email protected]>

.ci/ansible/playbook.yml

adam-stokes · 2022-01-31T13:51:02Z

@mdelapenya looks like there is some sort of internal server error happening now

Co-authored-by: Noémi Ványi <[email protected]>

Signed-off-by: Adam Stokes <[email protected]>

Instead of calling stop and then start, we leverage the services' ability to be restarted. On Linux, systemctl uses "restart"; on macOS, it uses "stop and start"; on Windows, it's not supported yet.
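
The per-OS behaviour in that commit message could be modelled along these lines. This is a sketch under assumptions rather than the project's code: `restartCommands` is a hypothetical helper, and using `launchctl stop` followed by `launchctl start` on macOS is my guess at what "stop and start" maps to.

```go
package main

import "fmt"

// restartCommands returns the commands needed to restart a service on the
// given operating system (values follow runtime.GOOS naming).
func restartCommands(goos, service string) ([][]string, error) {
	switch goos {
	case "linux":
		// systemd offers a single restart verb.
		return [][]string{{"systemctl", "restart", service}}, nil
	case "darwin":
		// Assumption: no single restart verb is used, so stop and then start.
		return [][]string{
			{"launchctl", "stop", service},
			{"launchctl", "start", service},
		}, nil
	default:
		// Windows (and anything else) is not supported yet.
		return nil, fmt.Errorf("restarting services is not supported on %s", goos)
	}
}
```

Calling `restartCommands(runtime.GOOS, "elastic-agent")` would pick the commands for the current platform; each returned slice can then be passed to `exec.Command`.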

mdelapenya · 2022-02-07T14:50:41Z

I'd say this PR puts the code base into a state that satisfies the latest changes in Kibana and Fleet. This is ready to merge on my side, although we are still seeing 30-40 test errors that are more or less consistent. I guess they will be solved once #2096 is addressed by @michel-laterman.

@adam-stokes if you agree, I think we should merge this one.

adam-stokes · 2022-02-07T17:07:24Z

@mdelapenya +1 on merging

mdelapenya · 2022-02-07T19:10:42Z

Given this pipeline is almost green https://beats-ci.elastic.co/blue/organizations/jenkins/e2e-tests%2Fe2e-testing-mbp/detail/PR-2064/44/pipeline, we are going to merge this PR and open issues for the following items:

APM Server scenarios: the agent never gets to the online status. I checked locally and it works, so my guess is that the code tries to bootstrap another stack because it uses the docker-compose provider to start the agent container. Maybe using a vanilla testcontainers container request would help 🤔
k8s-autodiscover for elastic-agent: an issue already exists: k8s-autodiscover for elastic-agent fails #1992

mdelapenya · 2022-02-07T19:15:39Z

@jlind23 @ph we are in a much better situation to start with the support matrix once we have merged this little baby. I'm gonna send a post-mortem with all the things we had to do (listed as commits in this PR)

Please refer to #2064 (comment) for the existing failures

elasticmachine · 2022-02-07T19:20:36Z

💔 Tests Failed

Build stats

Start Time: 2022-02-07T18:13:48.375+0000
Duration: 67 min 37 sec

Test stats 🧪

Test	Results
Failed	2
Passed	247
Skipped	0
Total	249

Test errors

`Initializing / End-To-End Tests / kubernetes-autodiscover_debian_amd64_elastic-agent / [empty] – TEST-x86_64-kubernetes-autodiscover-5f1c0854-2022-02-07-18:47:51.xml`
No error details. Stacktrace: Test report file /var/lib/jenkins/workspace/PR-2064-44-8fa86f6b-cc1b-459c-aa8f-e7a7e92790ba/outputs/18.220.224.56/TEST-x86_64-kubernetes-autodiscover-5f1c0854-2022-02-07-18:47:51.xml was length 0

`Initializing / End-To-End Tests / fleet_debian_amd64_apm_server / Deploying a default stand-alone agent with the Elastic APM integration – APM Integration`
Step: the "Elastic APM" integration is "added" in the policy (no stacktrace)

Steps errors

`Shell Script`

Took 12 min 49 sec.
Description: ssh -tt -o TCPKeepAlive=yes -o ServerAliveInterval=60 -o ServerAliveCountMax=10 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /var/lib/jenkins/workspace/PR-2064-44-97946160-e15f-4ffa-8b08-4167aebf7904/e2essh [email protected] -- 'sudo bash /home/admin/e2e-testing/.ci/scripts/functional-test.sh "apm_server && ~@nightly && ~@skip:amd64" '

`Archive the artifacts`

Took 0 min 0 sec.
Description: [2022-02-07T18:59:18.718Z] Archiving artifacts script returned exit code 2

🐛 Flaky test report

❕ There are test failures but not known flaky tests.

Genuine test errors

💔 There are test failures but not known flaky tests, most likely a genuine test failure.

Name: Initializing / End-To-End Tests / kubernetes-autodiscover_debian_amd64_elastic-agent / [empty] – TEST-x86_64-kubernetes-autodiscover-5f1c0854-2022-02-07-18:47:51.xml
Name: Initializing / End-To-End Tests / fleet_debian_amd64_apm_server / Deploying a default stand-alone agent with the Elastic APM integration – APM Integration

)

* bump stack version 8.1.0-aa69d697
* fix: use new kibana roles and users
  We are going to use 'admin' everywhere
* feat: add a method for checking ES cluster health
* fix: create fleet-server after getting a service token from elasticsearch
* fix: start stack using Fleet's test suite code
  Instead of calling the compose, we are calling the bootstrapping code for the Fleet test suite but without any valid tag. Because we are setting DEVELOPER_MODE=true for the stack node, it will keep the stack even though the scenarios and tags finished. We also pass a non-existing gherkin tag, to avoid running any scenario but the bootstrap code, only.
* fix: typo
* fix: selective execution of the .env for fleet suite
* chore: try 'not in'
* chore: use AND conditionals as a list
* fix: check for stckRunner to be defined
* fix: pass stackRunner var to the stack creation
* fix: check for suite is defined first
* fix: check for suite var
* chore: use multiline for when condirtionals
* fix docker install
  Signed-off-by: Adam Stokes <[email protected]>
* fix suite definition for autodiscover
  Signed-off-by: Adam Stokes <[email protected]>
* add kubectl to path
  Signed-off-by: Adam Stokes <[email protected]>
* chore: bump elastic-agent versions to 8.1.hashed snapshot
* fix: use docker provider for APM integration
  It will run on Debian AMD/ARM and SLES15
* chore: add client alive SSH settings
* Revert "chore: add client alive SSH settings"
  This reverts commit 306551c.
* chore: define SSHD server settings for runners
* chore: skip ubi8 scenarios
  We need to adapt them to the dnew deployment model.
  See #2088
* fix: transform response from bytes to string
* fix: properly read Input Streams and Vars
* fix: expose port for 0.0.0.0
* fix: streams could go empty
* fix: expose port for 0.0.0.0
* fix: support checking for process count in containers
* chore: unskip apm-server on ubi8
* chore: always install docker on runners
* chore: bump elastic-package to v0.36.0
* chore: use elastic-package for apm-server scenarios
* chore: use elastic-package for apm-server scenarios
* Revert "chore: use elastic-package for apm-server scenarios"
  This reverts commit b5896a8.
* Revert "chore: use elastic-package for apm-server scenarios"
  This reverts commit c6c29ac.
* chore: run dockerised tests only on debian
* chore: do not print out tar extract command
* fix: keep a Docker deployer for docker-based tests
* fix: install docker for ARM
* fix: remove invalid role vars for ARM
* fix: install python-pip on ARM first
* fix: install docker on ARM properly
  See https://www.docker.com/blog/getting-started-with-docker-for-arm-on-linux/
* fix: install docker on Suse
* fix: typo
  Co-authored-by: Noémi Ványi <[email protected]>
* Fix error checking for revoked enroll token
  Signed-off-by: Adam Stokes <[email protected]>
* fix logging
  Signed-off-by: Adam Stokes <[email protected]>
* chore: use empty streams if error
* fix: pass fleet-server policy to fleet-server on bootstrap
* fix: retrieve default fleet-server policy instead of creating a new one
* chore: restart services with restart command
  Instead of calling stop & start right after it, we are leveraging services ability to be restarted.
  For linux, systemctl will use "restart", for MacOS it will use "stop and start", for Windows, it's not supported yet
* chore: increase expire timeout of the service token to the max (1h)

Co-authored-by: apmmachine <[email protected]>
Co-authored-by: Adam Stokes <[email protected]>
Co-authored-by: Noémi Ványi <[email protected]>
(cherry picked from commit a31f807)

# Conflicts:
# cli/config/compose/profiles/fleet/docker-compose.yml
# go.mod
# go.sum

)

* bump stack version 8.1.0-aa69d697
* fix: use new kibana roles and users
  We are going to use 'admin' everywhere
* feat: add a method for checking ES cluster health
* fix: create fleet-server after getting a service token from elasticsearch
* fix: start stack using Fleet's test suite code
  Instead of calling the compose, we are calling the bootstrapping code for the Fleet test suite but without any valid tag. Because we are setting DEVELOPER_MODE=true for the stack node, it will keep the stack even though the scenarios and tags finished. We also pass a non-existing gherkin tag, to avoid running any scenario but the bootstrap code, only.
* fix: typo
* fix: selective execution of the .env for fleet suite
* chore: try 'not in'
* chore: use AND conditionals as a list
* fix: check for stckRunner to be defined
* fix: pass stackRunner var to the stack creation
* fix: check for suite is defined first
* fix: check for suite var
* chore: use multiline for when condirtionals
* fix docker install
  Signed-off-by: Adam Stokes <[email protected]>
* fix suite definition for autodiscover
  Signed-off-by: Adam Stokes <[email protected]>
* add kubectl to path
  Signed-off-by: Adam Stokes <[email protected]>
* chore: bump elastic-agent versions to 8.1.hashed snapshot
* fix: use docker provider for APM integration
  It will run on Debian AMD/ARM and SLES15
* chore: add client alive SSH settings
* Revert "chore: add client alive SSH settings"
  This reverts commit 306551c.
* chore: define SSHD server settings for runners
* chore: skip ubi8 scenarios
  We need to adapt them to the dnew deployment model.
  See #2088
* fix: transform response from bytes to string
* fix: properly read Input Streams and Vars
* fix: expose port for 0.0.0.0
* fix: streams could go empty
* fix: expose port for 0.0.0.0
* fix: support checking for process count in containers
* chore: unskip apm-server on ubi8
* chore: always install docker on runners
* chore: bump elastic-package to v0.36.0
* chore: use elastic-package for apm-server scenarios
* chore: use elastic-package for apm-server scenarios
* Revert "chore: use elastic-package for apm-server scenarios"
  This reverts commit b5896a8.
* Revert "chore: use elastic-package for apm-server scenarios"
  This reverts commit c6c29ac.
* chore: run dockerised tests only on debian
* chore: do not print out tar extract command
* fix: keep a Docker deployer for docker-based tests
* fix: install docker for ARM
* fix: remove invalid role vars for ARM
* fix: install python-pip on ARM first
* fix: install docker on ARM properly
  See https://www.docker.com/blog/getting-started-with-docker-for-arm-on-linux/
* fix: install docker on Suse
* fix: typo
  Co-authored-by: Noémi Ványi <[email protected]>
* Fix error checking for revoked enroll token
  Signed-off-by: Adam Stokes <[email protected]>
* fix logging
  Signed-off-by: Adam Stokes <[email protected]>
* chore: use empty streams if error
* fix: pass fleet-server policy to fleet-server on bootstrap
* fix: retrieve default fleet-server policy instead of creating a new one
* chore: restart services with restart command
  Instead of calling stop & start right after it, we are leveraging services ability to be restarted.
  For linux, systemctl will use "restart", for MacOS it will use "stop and start", for Windows, it's not supported yet
* chore: increase expire timeout of the service token to the max (1h)

Co-authored-by: apmmachine <[email protected]>
Co-authored-by: Adam Stokes <[email protected]>
Co-authored-by: Noémi Ványi <[email protected]>
(cherry picked from commit a31f807)

# Conflicts:
# cli/config/compose/profiles/fleet/docker-compose.yml
# cli/config/compose/services/elastic-agent/docker-compose.yml
# go.mod
# go.sum
# internal/deploy/base_test.go

…les best practices (#2101)

* fix: update stack deployment to follow Users&Roles best practices (#2064)
* bump stack version 8.1.0-aa69d697
* fix: use new kibana roles and users
  We are going to use 'admin' everywhere
* feat: add a method for checking ES cluster health
* fix: create fleet-server after getting a service token from elasticsearch
* fix: start stack using Fleet's test suite code
  Instead of calling the compose, we are calling the bootstrapping code for the Fleet test suite but without any valid tag. Because we are setting DEVELOPER_MODE=true for the stack node, it will keep the stack even though the scenarios and tags finished. We also pass a non-existing gherkin tag, to avoid running any scenario but the bootstrap code, only.
* fix: typo
* fix: selective execution of the .env for fleet suite
* chore: try 'not in'
* chore: use AND conditionals as a list
* fix: check for stckRunner to be defined
* fix: pass stackRunner var to the stack creation
* fix: check for suite is defined first
* fix: check for suite var
* chore: use multiline for when condirtionals
* fix docker install
  Signed-off-by: Adam Stokes <[email protected]>
* fix suite definition for autodiscover
  Signed-off-by: Adam Stokes <[email protected]>
* add kubectl to path
  Signed-off-by: Adam Stokes <[email protected]>
* chore: bump elastic-agent versions to 8.1.hashed snapshot
* fix: use docker provider for APM integration
  It will run on Debian AMD/ARM and SLES15
* chore: add client alive SSH settings
* Revert "chore: add client alive SSH settings"
  This reverts commit 306551c.
* chore: define SSHD server settings for runners
* chore: skip ubi8 scenarios
  We need to adapt them to the dnew deployment model.
  See #2088
* fix: transform response from bytes to string
* fix: properly read Input Streams and Vars
* fix: expose port for 0.0.0.0
* fix: streams could go empty
* fix: expose port for 0.0.0.0
* fix: support checking for process count in containers
* chore: unskip apm-server on ubi8
* chore: always install docker on runners
* chore: bump elastic-package to v0.36.0
* chore: use elastic-package for apm-server scenarios
* chore: use elastic-package for apm-server scenarios
* Revert "chore: use elastic-package for apm-server scenarios"
  This reverts commit b5896a8.
* Revert "chore: use elastic-package for apm-server scenarios"
  This reverts commit c6c29ac.
* chore: run dockerised tests only on debian
* chore: do not print out tar extract command
* fix: keep a Docker deployer for docker-based tests
* fix: install docker for ARM
* fix: remove invalid role vars for ARM
* fix: install python-pip on ARM first
* fix: install docker on ARM properly
  See https://www.docker.com/blog/getting-started-with-docker-for-arm-on-linux/
* fix: install docker on Suse
* fix: typo
  Co-authored-by: Noémi Ványi <[email protected]>
* Fix error checking for revoked enroll token
  Signed-off-by: Adam Stokes <[email protected]>
* fix logging
  Signed-off-by: Adam Stokes <[email protected]>
* chore: use empty streams if error
* fix: pass fleet-server policy to fleet-server on bootstrap
* fix: retrieve default fleet-server policy instead of creating a new one
* chore: restart services with restart command
  Instead of calling stop & start right after it, we are leveraging services ability to be restarted.
  For linux, systemctl will use "restart", for MacOS it will use "stop and start", for Windows, it's not supported yet
* chore: increase expire timeout of the service token to the max (1h)

Co-authored-by: apmmachine <[email protected]>
Co-authored-by: Adam Stokes <[email protected]>
Co-authored-by: Noémi Ványi <[email protected]>
(cherry picked from commit a31f807)

# Conflicts:
# cli/config/compose/profiles/fleet/docker-compose.yml
# go.mod
# go.sum

* fix: resolve conflicts

Co-authored-by: Manuel de la Peña <[email protected]>

…oles best practices (#2103)

* fix: update stack deployment to follow Users&Roles best practices (#2064)
* bump stack version 8.1.0-aa69d697
* fix: use new kibana roles and users
  We are going to use 'admin' everywhere
* feat: add a method for checking ES cluster health
* fix: create fleet-server after getting a service token from elasticsearch
* fix: start stack using Fleet's test suite code
  Instead of calling the compose, we are calling the bootstrapping code for the Fleet test suite but without any valid tag. Because we are setting DEVELOPER_MODE=true for the stack node, it will keep the stack even though the scenarios and tags finished. We also pass a non-existing gherkin tag, to avoid running any scenario but the bootstrap code, only.
* fix: typo
* fix: selective execution of the .env for fleet suite
* chore: try 'not in'
* chore: use AND conditionals as a list
* fix: check for stckRunner to be defined
* fix: pass stackRunner var to the stack creation
* fix: check for suite is defined first
* fix: check for suite var
* chore: use multiline for when condirtionals
* fix docker install
  Signed-off-by: Adam Stokes <[email protected]>
* fix suite definition for autodiscover
  Signed-off-by: Adam Stokes <[email protected]>
* add kubectl to path
  Signed-off-by: Adam Stokes <[email protected]>
* chore: bump elastic-agent versions to 8.1.hashed snapshot
* fix: use docker provider for APM integration
  It will run on Debian AMD/ARM and SLES15
* chore: add client alive SSH settings
* Revert "chore: add client alive SSH settings"
  This reverts commit 306551c.
* chore: define SSHD server settings for runners
* chore: skip ubi8 scenarios
  We need to adapt them to the dnew deployment model.
  See #2088
* fix: transform response from bytes to string
* fix: properly read Input Streams and Vars
* fix: expose port for 0.0.0.0
* fix: streams could go empty
* fix: expose port for 0.0.0.0
* fix: support checking for process count in containers
* chore: unskip apm-server on ubi8
* chore: always install docker on runners
* chore: bump elastic-package to v0.36.0
* chore: use elastic-package for apm-server scenarios
* chore: use elastic-package for apm-server scenarios
* Revert "chore: use elastic-package for apm-server scenarios"
  This reverts commit b5896a8.
* Revert "chore: use elastic-package for apm-server scenarios"
  This reverts commit c6c29ac.
* chore: run dockerised tests only on debian
* chore: do not print out tar extract command
* fix: keep a Docker deployer for docker-based tests
* fix: install docker for ARM
* fix: remove invalid role vars for ARM
* fix: install python-pip on ARM first
* fix: install docker on ARM properly
  See https://www.docker.com/blog/getting-started-with-docker-for-arm-on-linux/
* fix: install docker on Suse
* fix: typo
  Co-authored-by: Noémi Ványi <[email protected]>
* Fix error checking for revoked enroll token
  Signed-off-by: Adam Stokes <[email protected]>
* fix logging
  Signed-off-by: Adam Stokes <[email protected]>
* chore: use empty streams if error
* fix: pass fleet-server policy to fleet-server on bootstrap
* fix: retrieve default fleet-server policy instead of creating a new one
* chore: restart services with restart command
  Instead of calling stop & start right after it, we are leveraging services ability to be restarted.
  For linux, systemctl will use "restart", for MacOS it will use "stop and start", for Windows, it's not supported yet
* chore: increase expire timeout of the service token to the max (1h)

Co-authored-by: apmmachine <[email protected]>
Co-authored-by: Adam Stokes <[email protected]>
Co-authored-by: Noémi Ványi <[email protected]>
(cherry picked from commit a31f807)

# Conflicts:
# cli/config/compose/profiles/fleet/docker-compose.yml
# cli/config/compose/services/elastic-agent/docker-compose.yml
# go.mod
# go.sum
# internal/deploy/base_test.go

* fix: resolve conflicts
* chore: run go mod tidy
* fix: resolve more conflicts

Co-authored-by: Manuel de la Peña <[email protected]>

apmmachine and others added 4 commits January 27, 2022 11:55

bump stack version 8.1.0-aa69d697

dbf4132

fix: use new kibana roles and users

dd004a0

We are going to use 'admin' everywhere

feat: add a method for checking ES cluster health

ba5d9f8
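As an illustration of what a cluster health check can look like, here is a minimal sketch that is not taken from this PR. It assumes the check reads the JSON body returned by Elasticsearch's `GET /_cluster/health` endpoint and treats `green` or `yellow` as ready. The `clusterHealth` type and the `isClusterReady` function are hypothetical names, and the readiness rule is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clusterHealth holds the only field this sketch reads from the
// _cluster/health response. Hypothetical type, not the PR's code.
type clusterHealth struct {
	Status string `json:"status"`
}

// isClusterReady reports whether the health body has a green or yellow status.
// Treating yellow as ready is an assumption suited to single-node test stacks.
func isClusterReady(body []byte) (bool, error) {
	var h clusterHealth
	if err := json.Unmarshal(body, &h); err != nil {
		return false, fmt.Errorf("decoding cluster health: %w", err)
	}
	return h.Status == "green" || h.Status == "yellow", nil
}

func main() {
	// Sample body for illustration only; a real caller would fetch it over HTTP.
	ready, err := isClusterReady([]byte(`{"cluster_name":"e2e","status":"yellow"}`))
	fmt.Println(ready, err)
}
```

Keeping the parsing separate from the HTTP call makes the readiness rule easy to unit test without a running cluster.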

fix: create fleet-server after getting a service token from elasticsearch

5254105

mdelapenya added the backport-skip label (Skip notification from the automated backport with mergify) Jan 27, 2022

mdelapenya self-assigned this Jan 27, 2022

mdelapenya requested a review from a team January 27, 2022 18:37

mdelapenya commented Jan 27, 2022

.ci/ansible/tasks/setup_test_script.yml (outdated, resolved)

mdelapenya and others added 10 commits January 27, 2022 21:53

fix: typo

ad786f5

fix: selective execution of the .env for fleet suite

019f166

chore: try 'not in'

7608d4b

chore: use AND conditionals as a list

42239f0

fix: check for stckRunner to be defined

f146866

fix: pass stackRunner var to the stack creation

15b819d

fix: check for suite is defined first

2a56928

fix: check for suite var

ad87b88

Merge branch 'main' into fix-fleet-server-tokens

5cffc9f

* main: chore: double wait time for SSH (elastic#2066)

chore: use multiline for when condirtionals

e60bbb4

adam-stokes approved these changes Jan 28, 2022

adam-stokes added 3 commits January 28, 2022 13:26

fix docker install

0c7d9b7

Signed-off-by: Adam Stokes <[email protected]>

fix suite definition for autodiscover

6dd4774

Signed-off-by: Adam Stokes <[email protected]>

add kubectl to path

f90337f

Signed-off-by: Adam Stokes <[email protected]>

mdelapenya mentioned this pull request Jan 29, 2022

Add test for Kibana bootstrap using service token #1613

Open

mdelapenya commented Jan 31, 2022

.ci/ansible/playbook.yml (outdated, resolved)

Merge branch 'main' into fix-fleet-server-tokens

2f4d8aa

chore: bump elastic-agent versions to 8.1.hashed snapshot

1278d32

mdelapenya and others added 7 commits February 3, 2022 16:03

fix: typo

6d0187d

Co-authored-by: Noémi Ványi <[email protected]>

Fix error checking for revoked enroll token

4b384e9

Signed-off-by: Adam Stokes <[email protected]>

fix logging

c0f7c6c

Signed-off-by: Adam Stokes <[email protected]>

chore: use empty streams if error

f43683e

fix: pass fleet-server policy to fleet-server on bootstrap

49f64e6

fix: retrieve default fleet-server policy instead of creating a new one

be2e305

chore: restart services with restart command

34eedbe

Instead of calling stop & start right after it, we are leveraging services ability to be restarted. For linux, systemctl will use "restart", for MacOS it will use "stop and start", for Windows, it's not supported yet
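The per-OS behaviour described in that commit message could be sketched as follows. This is a hedged illustration, not the PR's code: the `restartCommands` helper, its signature, and the use of `launchctl stop`/`launchctl start` on macOS are assumptions. The commit message only says Linux uses systemctl "restart", macOS uses "stop and start", and Windows is not supported yet.

```go
package main

import "fmt"

// restartCommands returns the command lines needed to restart a service on
// the given OS (named as in runtime.GOOS). Hypothetical helper for illustration.
func restartCommands(goos, service string) ([][]string, error) {
	switch goos {
	case "linux":
		// systemctl restarts the unit in a single call.
		return [][]string{{"systemctl", "restart", service}}, nil
	case "darwin":
		// Assumed launchctl usage: stop the service, then start it again.
		return [][]string{
			{"launchctl", "stop", service},
			{"launchctl", "start", service},
		}, nil
	default:
		// Windows (and anything else) is reported as unsupported.
		return nil, fmt.Errorf("restarting services is not supported on %s", goos)
	}
}

func main() {
	for _, goos := range []string{"linux", "darwin", "windows"} {
		cmds, err := restartCommands(goos, "elastic-agent")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(goos, cmds)
	}
}
```

Returning the command lines instead of running them keeps the OS switch testable on any platform; a caller would pass each entry to `os/exec`.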

mdelapenya mentioned this pull request Feb 4, 2022

Request 400 error during enrollment #2096

Closed

chore: increase expire timeout of the service token to the max (1h)

fc29242

mdelapenya merged commit a31f807 into elastic:main Feb 7, 2022

adam-stokes added the backport-v7.16.0, backport-v7.17.0, and backport-v8.0.0 labels (Automated backport with mergify) and removed the backport-skip label (Skip notification from the automated backport with mergify) Feb 7, 2022

mergify bot mentioned this pull request Feb 7, 2022

[8.0](backport #2064) fix: update stack deployment to follow Users&Roles best practices #2101

Merged

mergify bot mentioned this pull request Feb 7, 2022

[7.17](backport #2064) fix: update stack deployment to follow Users&Roles best practices #2102

Merged

mergify bot mentioned this pull request Feb 7, 2022

[7.16](backport #2064) fix: update stack deployment to follow Users&Roles best practices #2103

Merged

mdelapenya mentioned this pull request Feb 8, 2022

Remove dependency on Fleet Default policy #2039

Closed

mdelapenya deleted the fix-fleet-server-tokens branch March 9, 2022 06:42

elasticmachine commented Jan 27, 2022 (edited)

💔 Tests Failed

Build stats

Test stats 🧪

Test errors

Initializing / End-To-End Tests / fleet_centos8_amd64_fleet_mode_agent / Un-enrolling the agent deactivates the agent – Fleet Mode Agent

Initializing / End-To-End Tests / fleet_centos8_arm64_fleet_mode_agent / Un-enrolling the agent deactivates the agent – Fleet Mode Agent

Initializing / End-To-End Tests / fleet_debian_arm64_fleet_mode_agent / Revoking the enrollment token for the agent – Fleet Mode Agent

Initializing / End-To-End Tests / fleet_debian_amd64_fleet_mode_agent / Re-enrolling the agent activates the agent in Fleet – Fleet Mode Agent

Initializing / End-To-End Tests / fleet_debian_amd64_fleet_mode_agent / Revoking the enrollment token for the agent – Fleet Mode Agent

Initializing / End-To-End Tests / fleet_sles15_fleet_mode_agent / Re-enrolling the agent activates the agent in Fleet – Fleet Mode Agent

Initializing / End-To-End Tests / fleet_sles15_fleet_mode_agent / Revoking the enrollment token for the agent – Fleet Mode Agent

Initializing / End-To-End Tests / fleet_centos8_amd64_fleet_mode_agent / Re-enrolling the agent activates the agent in Fleet – Fleet Mode Agent

Initializing / End-To-End Tests / fleet_centos8_amd64_fleet_mode_agent / Revoking the enrollment token for the agent – Fleet Mode Agent

Initializing / End-To-End Tests / fleet_centos8_arm64_fleet_mode_agent / Re-enrolling the agent activates the agent in Fleet – Fleet Mode Agent

Steps errors

Shell Script

Archive the artifacts

Shell Script

Archive the artifacts

Shell Script

Archive the artifacts

Shell Script

Archive the artifacts

Shell Script

Archive the artifacts

🐛 Flaky test report

Genuine test errors

🤖 GitHub comments

adam-stokes commented Jan 31, 2022

mdelapenya commented Feb 7, 2022

adam-stokes commented Feb 7, 2022

mdelapenya commented Feb 7, 2022

mdelapenya commented Feb 7, 2022

elasticmachine commented Feb 7, 2022

💔 Tests Failed

Build stats

Test stats 🧪

Test errors

Initializing / End-To-End Tests / kubernetes-autodiscover_debian_amd64_elastic-agent / [empty] – TEST-x86_64-kubernetes-autodiscover-5f1c0854-2022-02-07-18:47:51.xml

Initializing / End-To-End Tests / fleet_debian_amd64_apm_server / Deploying a default stand-alone agent with the Elastic APM integration – APM Integration

Steps errors

Shell Script

Archive the artifacts

🐛 Flaky test report

Genuine test errors

🤖 GitHub comments
