This repository has been archived by the owner on Sep 17, 2024. It is now read-only.

Run Fleet's 'upgrade agent' tests in nightly builds (keep them skipped in PR CI) #652

Closed
mdelapenya opened this issue Jan 25, 2021 · 9 comments
Labels: automation, enhancement (New feature or request), Team:Elastic-Agent (Label for the Agent team)

@mdelapenya
Contributor

These tests are currently skipped, but we'd like to run them in our nightly builds.

For that:

  • we should add the @nightly annotation to the scenarios, removing @skip
  • we should remove @nightly from PRs
  • we should support passing tags from scheduled jobs to the main, general-purpose job, in the form of a Jenkins input parameter
  • we should pass the @nightly annotation from the scheduled, nightly jobs
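The tag-driven selection described above could work roughly like this. Below is a minimal Python sketch of how a tag expression (using godog-style `&&`, `,` and `~` operators) could decide which scenarios run in a given build; the scenario names, tags, and filter strings are hypothetical examples, not the real suite.

```python
# Illustrative sketch only: how a tag filter passed from a Jenkins job
# could select scenarios. Scenario names and tags are made up.

def matches(tags, expression):
    """Evaluate a simple tag expression against a scenario's tag set.
    Supports '&&' (AND between terms), ',' (OR within a term) and
    '~' (negation), mirroring the godog-style syntax."""
    for term in expression.split("&&"):
        alternatives = [t.strip() for t in term.split(",")]
        ok = False
        for alt in alternatives:
            if alt.startswith("~"):
                ok = ok or alt[1:] not in tags
            else:
                ok = ok or alt in tags
        if not ok:
            return False
    return True

# Hypothetical scenarios and their tags.
scenarios = {
    "Upgrading the agent": {"@nightly", "@fleet_mode"},
    "Deploying the agent": {"@fleet_mode"},
}

# A PR build would exclude @nightly scenarios; a nightly build would not.
pr_filter = "~@nightly"
pr_run = [name for name, tags in scenarios.items() if matches(tags, pr_filter)]
print(pr_run)  # only "Deploying the agent" survives the PR filter
```

The scheduled job would pass a different expression (e.g. one without `~@nightly`) through the Jenkins input parameter, so the same runner executes a wider set of scenarios.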

Thoughts? @elastic/observablt-robots @ph @EricDavisX @michalpristas

@mdelapenya mdelapenya added enhancement New feature or request automation Team:Elastic-Agent Label for the Agent team labels Jan 25, 2021
@mdelapenya mdelapenya self-assigned this Jan 25, 2021
@ph
Contributor

ph commented Jan 25, 2021

Can we link the blocker issue that forced us to skip these tests?

@EricDavisX
Contributor

I think this would work. @ph here is the only reference I have for the disabling problem:
#537

@mdelapenya
Contributor Author

I've been thinking about this issue more thoroughly: I'd prefer to define what runs in the PR build, and run all tests in the nightly build.

For that, I'd like to engage the entire Elastic Agent team (devs and PM) so that together we define the priorities for what to run. IMHO we must:

  1. visit all scenarios, so that each party knows about them
  2. identify the priority for each one. This priority should be aligned across functional teams: product managers, devs, testing. I suggest each team define priorities separately, then share and compare results. A good initial set of criteria would be:
    • P1: critical priority, run on each PR AND nightly builds. If the scenario fails, the user is not able to perform the main goal of the software under test.
    • P2: medium priority, run on nightly builds. If the scenario fails, the user has a workaround to continue.
    • P3: low priority, run on nightly builds. If the scenario fails, the impact is cosmetic or unrelated to the user's goal.
  3. tag each scenario with the proper priority label (@p1, @p2, @p3), so that we are able to run the P1s in PR builds.
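The three priority levels could be encoded as a small mapping from tag to the builds that should run it. A hedged Python illustration of the proposal (the tag and build names follow the comment above; the helper function itself is hypothetical):

```python
# Illustrative mapping of the proposed priority tags to build types.
PRIORITY_BUILDS = {
    "@p1": {"pr", "nightly"},   # critical: blocks the main goal if failing
    "@p2": {"nightly"},         # medium: a workaround exists
    "@p3": {"nightly"},         # low: cosmetic or unrelated failures
}

def runs_in(build, scenario_tags):
    """Return True if any priority tag on the scenario targets this build."""
    return any(build in PRIORITY_BUILDS.get(tag, set())
               for tag in scenario_tags)

print(runs_in("pr", {"@p1"}))       # True: P1s run everywhere
print(runs_in("pr", {"@p2"}))       # False: P2s are nightly-only
print(runs_in("nightly", {"@p3"}))  # True: nightly runs everything
```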

After this prioritisation session we should have a clear understanding of what is tested, when, and most importantly why.

As a benefit of this initiative, the PR jobs will have shorter build times, improving the time to receive feedback after a PR is sent (builds are starting to take 30-40 minutes, including a lot of wait-for-results situations; sometimes it takes 5-10 minutes to receive agent events).

@ph @kseniia-kolpakova I'm open to any new ideas on this.

@EricDavisX
Contributor

I have a few thoughts. I like prioritization, but the 'prioritization' here overloads the term in a bad way: the upgrade tests are extremely important, we just can't easily run them during a PR. I'd rather have a '@skip-pr' tag for any test that should be skipped during PR CI, regardless of its priority. And we can prioritize them too, for when we need to reduce scope. We'll need to update the linting rules to allow more tags, FYI.

More discourse: I think we should attempt to keep as many tests as possible running during PR CI in the short term. We don't yet have so many tests that I'd recommend tackling the problem by reducing the number of tests executed. One theory is that if a test is important enough to be written, we should keep it running with some expected value (even if we adjust it over time). The 'upgrade' test is the only one we've found that can't easily be written to run against PRs, and it may be the only one for a long time. I think we'll get the most out of all the test work if it informs developers, precisely when they push commits, whether they broke a test. It is the absolute best way to engage them to fix and update tests, and to scale the team.

Having said that, I don't mind using P1/P2/P3 notation, it has advantages for sure, and pulling in the Devs / PM / Leads is a great practice.

@mdelapenya
Contributor Author

Regarding the implementation details, adding a @nightly tag would be the simplest approach: we can instrument the pipeline to add it for branch builds and not for PRs.
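A sketch of how the pipeline could compute the tag filter per build type, assuming the Jenkins multibranch convention that `CHANGE_ID` is set only for pull request builds; the function name and exact filter strings are illustrative, not the actual pipeline code.

```python
import os

def tag_filter_for_build(env=os.environ):
    """Compute the scenario tag filter for this build.
    CHANGE_ID is set by Jenkins multibranch pipelines for pull requests,
    so its presence distinguishes PR builds from branch/scheduled builds."""
    is_pr = bool(env.get("CHANGE_ID"))
    # PR builds exclude @nightly scenarios; branch builds only drop @skip.
    return "~@nightly && ~@skip" if is_pr else "~@skip"

print(tag_filter_for_build({"CHANGE_ID": "652"}))  # ~@nightly && ~@skip
print(tag_filter_for_build({}))                    # ~@skip
```

The same idea translates directly to a Jenkinsfile conditional that sets the tags parameter before invoking the test runner.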

@ph
Contributor

ph commented Jan 26, 2021

I would like to raise here, as @EricDavisX said, that we want to keep as many tests as possible. Looking at the prioritization and tagging, are we effectively looking at the right problem? Given @mdelapenya's comment concerning the wait-for-results, should we instead focus on reducing the overall run time? As the suite grows we might want a way to select by priority, but I am not sure we need to cross that bridge yet.

@mdelapenya Can you provide some stats concerning the run time, so we can take a look at how we could make the tests faster?

@mdelapenya
Contributor Author

mdelapenya commented Jan 26, 2021

Yes, I'm building a dashboard in Kibana for test times, using our jenkins-stats cluster. I will share results soon.

SPOILER: I'm not a Kibana user

@EricDavisX EricDavisX changed the title Run Fleet's update tests in nightly builds Run Fleet's 'upgrade agent' tests in nightly builds (keep them skipped in PR CI) Feb 3, 2021
@EricDavisX
Contributor

We got tests passing today, and I see some merges, like the one above. Can we re-sync and see what else needs to be done? If nothing, we can make notes in the e2e-testing docs to make sure devs know the expectations and how to work with the system.

@mdelapenya
Contributor Author

This task has been accomplished. Closing.
