Elastic Agent docker: new revisions are not getting released #24198

Closed
mtojek opened this issue Feb 24, 2021 · 14 comments

mtojek (Contributor) commented Feb 24, 2021

Hi,

Due to a bug introduced in the 7.x (and 7.12) branches, the latest snapshot is failing. The bug has been fixed in #24163 and #24161, but the Docker image hasn't been released due to other issues in the unified release process.

With this problem in the SNAPSHOT, package development has been blocked for a few days in elastic/integrations and elastic/elastic-package (all statuses are red). We could hardcode the correct Docker image in both repos, but that means a lot of operational work for us: start each day by reviewing all Beats branches to check for blockers (failed builds), then update a map of hardcoded image references in elastic-package (for 7.13, 7.12, 7.11).

The goal of this issue is to figure out and implement a way of publishing the Agent's images independently of other, potentially problematic parts of the release.

cc @andresrc @ph @ruflin @ycombinator

@mtojek mtojek added the Agent label Feb 24, 2021
@botelastic botelastic bot added the needs_team label Feb 24, 2021
ruflin (Contributor) commented Feb 24, 2021

I'm not sure the goal should be to introduce a separate build process. It is unfortunate that the build is broken, and we should ensure on the Beats / Agent end that we detect such problems before they are merged.

Is package development really blocked? It seems the master snapshots are available, so development against master should still work?

mtojek (Contributor, Author) commented Feb 24, 2021

It is, because we can't test against current/older releases. Let me bring up a few issues:

  • elastic/integrations#736 - I can't re-verify whether the problem is fixed for 7.11.
  • elastic/elastic-package#260 - updating the package spec and checking whether it's correct (packages can be installed in a particular stack version).
  • elastic/integrations#740 - package compatibility check with older stacks (work in progress, but won't be possible).
  • elastic/elastic-package#261 - can't verify whether we can bump the stack to 7.13 and everything still works. We would have to do it blindly and keep our fingers crossed that it eventually works.

Also:

  • Currently every CI status blinks red for master; we can't just change it to 8.0.0, as there is a risk of introducing faulty PRs (valid for 8.0.0, not valid for 7.12/7.13).
  • Integrations devs will start ignoring the CI status, as it can't even bring the Elastic stack to a stable state ("CI issues unrelated").

> I'm not sure the goal should be to introduce a separate build process. It is unfortunate that the build is broken, and we should ensure on the Beats / Agent end that we detect such problems before they are merged.

My impression is that there is too much coupling in the build process, such that even correct products/bugfixes are blocked from being released by a single failing item.

@mtojek mtojek removed the needs_team label Feb 24, 2021
@botelastic botelastic bot added the needs_team label Feb 24, 2021
@mtojek mtojek added the Team:Elastic-Agent label Feb 24, 2021
elasticmachine (Collaborator) commented:

Pinging @elastic/agent (Team:Agent)

@botelastic botelastic bot removed the needs_team label Feb 24, 2021
@mtojek mtojek added the needs_team label Feb 24, 2021
@botelastic botelastic bot removed the needs_team label Feb 24, 2021
botelastic bot commented Feb 24, 2021

This issue doesn't have a Team:<team> label.

andresrc (Contributor) commented:

Why are old versions a problem?

mtojek (Contributor, Author) commented Feb 24, 2021

It's because of the affected branches (backports not yet released):

  • 7.x contains changes scheduled for 7.13
  • 7.12

7.11 is old enough that we'll skip testing for some packages (those compatible with newer Kibana versions).

EDIT:

Yet another build has just failed, which probably means another ~12h of delay.

ph (Contributor) commented Feb 24, 2021

@mtojek Just to clarify: the problem is that the artifacts produced by the unified process are old and do not include the latest fixes, because the new builds aren't completing?

First, are we responsible for the failure of the build? Are our Beats or Elastic Agent breaking the build? That we can control and prioritize; if that is not the case, we need to look into it with infra, who are looking into better ways to notify us.

Like @ruflin said, I don't think introducing a new build is the solution. We have many moving parts (Elastic Agent, Endpoint, Beats, ES, and Kibana); there are a lot of dependencies and things that need to happen to have confidence in the binary.

mtojek (Contributor, Author) commented Feb 24, 2021

> @mtojek Just to clarify: the problem is that the artifacts produced by the unified process are old and do not include the latest fixes, because the new builds aren't completing?

Yes.

> First, are we responsible for the failure of the build? Are our Beats or Elastic Agent breaking the build? That we can control and prioritize; if that is not the case, we need to look into it with infra, who are looking into better ways to notify us.

I think we should put more emphasis on ownership here and be more proactive in emergency situations, not just sit and wait until the next build appears. In this particular situation the image was built correctly, but it was discovered later that it doesn't boot up correctly. I had an interesting conversation around this issue with @ycombinator. Of course we can improve the test coverage, but there may always be a situation in which it is insufficient. Such cases should be treated differently to reduce the blast radius and not block downstream consumers (e.g. integrations developers).

What do you think about introducing a tagging-based solution for Docker images? Let's say the "unified" builder tags the latest built image with a -STABLE tag. If faulty behavior is detected, we control the tag and can simply "revert" to a known-good image by retagging.
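
To illustrate the idea, a minimal sketch of what such a retag could look like (the image path, the commit-suffixed tag, and the 8.0.0-STABLE tag are all hypothetical, and this assumes the older known-good snapshot image is still available in the registry):

# Hypothetical example: point the STABLE tag back at the last known-good snapshot image.
docker pull docker.elastic.co/beats/elastic-agent:8.0.0-abc1234-SNAPSHOT
docker tag docker.elastic.co/beats/elastic-agent:8.0.0-abc1234-SNAPSHOT docker.elastic.co/beats/elastic-agent:8.0.0-STABLE
docker push docker.elastic.co/beats/elastic-agent:8.0.0-STABLE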

ruflin (Contributor) commented Feb 25, 2021

I expect future community developers to develop against stable versions of the stack, so I would assume this should not happen. Similarly on our end, I expect us to become more and more able to develop against stable / released versions.

Broken / failed builds will keep happening from time to time, whether we are in control or someone else is. One thing that was great during the development of the package-registry was that each PR and each commit to master had its own Docker image / tag, so in case of a broken "latest/SNAPSHOT", a temporary tag could be used. It would be nice to have something similar for the SNAPSHOT builds, so that not only 8.0.0-SNAPSHOT but also 8.0.0-3ac34-SNAPSHOT were available, and we could go back a few days in case things are broken. AFAIK this does not exist yet? This is very similar to what @mtojek proposed, or rather would be a requirement for it, because a STABLE tag cannot be introduced if the older images no longer exist. @kuisathaverat you might know more here?
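
A hedged sketch of how a consumer could fall back to such a commit-pinned snapshot if it existed (the 8.0.0-3ac34-SNAPSHOT tag reuses the hypothetical naming above; as noted, such tags don't exist in the official registry yet):

# Hypothetical example: pull a known-good commit-pinned snapshot and retag it locally
# so that tooling expecting 8.0.0-SNAPSHOT picks up the older, working image.
docker pull docker.elastic.co/beats/elastic-agent:8.0.0-3ac34-SNAPSHOT
docker tag docker.elastic.co/beats/elastic-agent:8.0.0-3ac34-SNAPSHOT docker.elastic.co/beats/elastic-agent:8.0.0-SNAPSHOT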

kuisathaverat (Contributor) commented Mar 2, 2021

We have something like that: recently we added the packaging step to the main pipeline, so every time you merge into master and the build reaches the elastic-agent package stage, if that stage ends well, a bunch of Docker images is published to our repository:

docker push docker.elastic.co/observability-ci/elastic-agent:8.0.0-SNAPSHOT-amd64
docker push docker.elastic.co/observability-ci/elastic-agent:a84508c749455ef9228ba1024580279e4cc86ab7-amd64
docker push docker.elastic.co/observability-ci/elastic-agent:8.0-SNAPSHOT-amd64
docker push docker.elastic.co/observability-ci/elastic-agent-ubi8:8.0.0-SNAPSHOT-amd64
docker push docker.elastic.co/observability-ci/elastic-agent-ubi8:a84508c749455ef9228ba1024580279e4cc86ab7-amd64
docker push docker.elastic.co/observability-ci/elastic-agent-ubi8:8.0-SNAPSHOT-amd64

ARM Docker images are also published.

PRs also publish Docker images:

docker push docker.elastic.co/observability-ci/elastic-agent:pr-24220-amd64
docker push docker.elastic.co/observability-ci/elastic-agent:896efa2c57bb8be6eeca1d5a62b76a613960a614-amd64
docker push docker.elastic.co/observability-ci/elastic-agent-ubi8:pr-24220-amd64
docker push docker.elastic.co/observability-ci/elastic-agent-ubi8:896efa2c57bb8be6eeca1d5a62b76a613960a614-amd64
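
For reference, a hedged sketch of how one of these CI images could be consumed locally, assuming you have credentials for the docker.elastic.co registry (the observability-ci namespace is not public, see below):

# Hypothetical usage: authenticate against the registry, then pull a PR-specific image.
docker login docker.elastic.co
docker pull docker.elastic.co/observability-ci/elastic-agent:pr-24220-amd64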

mtojek (Contributor, Author) commented Mar 2, 2021

Are these images publicly available (docker.elastic.co/observability-ci/elastic-agent)? For community contributors, we'll need something that doesn't require any special auth.

kuisathaverat (Contributor) commented:

You need to log in to access that namespace; we can publish them in another, public place.

mtojek (Contributor, Author) commented Mar 3, 2021

> You need to log in to access that namespace; we can publish them in another, public place.

Yes, that would be a good idea. I don't know the green/red stats for Elastic Agent packaging, but I hope failures are relatively rare, so we can benefit from such tags.

botelastic bot commented Mar 3, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@botelastic botelastic bot added the Stalled label Mar 3, 2022
@botelastic botelastic bot closed this as completed Aug 30, 2022