Worker pool image automation #47
Comments
Also, we can probably apply similar ideas to our Azure images.
To expand a bit on how "in-repo image upgrades" would work: I think, to avoid too many unused pools lying around, we could have a check that inspects which pools each configured project is using, and warns when there are unused pools / images.
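A minimal sketch of what such a check could look like, assuming a hypothetical ci-config layout with `projects.yml` and `worker-pools.yml` files (the real file names and schemas will differ):

```python
# Hypothetical unused-pool check. The file layout and schemas below are
# illustrative assumptions, not the real ci-config structure.
from pathlib import Path

import yaml


def find_unused_pools(config_dir: str) -> set[str]:
    """Return worker pools that no configured project references."""
    config = Path(config_dir)
    projects = yaml.safe_load((config / "projects.yml").read_text())
    pools = yaml.safe_load((config / "worker-pools.yml").read_text())

    defined = set(pools)  # all pool names defined in ci-config
    used = {
        pool
        for project in projects.values()
        for pool in project.get("worker-pools", [])
    }
    return defined - used


if __name__ == "__main__":
    for pool in sorted(find_unused_pools(".")):
        print(f"warning: worker pool {pool} is not used by any project")
```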
I don't think this approach scales, and it gives too much control to ci-config, which is outside of project authority. I would prefer an approach where projects have full autonomy, something like how in-tree Dockerfiles are used for building docker images under the full control of the development team. I think we need to provide APIs and services that allow teams to build their own images, rather than owning the image configurations ourselves and allowing people to use what we created.
Something along the lines of taskcluster/taskcluster-rfcs#122.
Having full control of image building from within each project sounds wonderful! Is this something the TC team is planning to work towards? It might still be worth tackling some of this in the meantime, as we're feeling the pain and need some kind of improvement here soon :) I'm not sure why this approach wouldn't scale, however; to me it seems like it could be almost entirely automated other than needing to merge PRs, but perhaps I'm missing something. I agree that having full in-project control is the ideal, I guess I'm just struggling to see a concrete path that gets us there.
With the Azure images, much of this is in line with what we are doing and planning on doing.
While not necessarily a blocker, a potential downside to doing this is that it would put image building in the critical path of running builds/tests. Obviously this is already the case for docker images, but depending on how much time it adds to the critical path it may have some significant downside. This is one of the big upsides of putting a reference to an already-existing image in the tree: you gain in-tree control over what things are built on without putting anything new in the critical path. (I also agree that the ideal state is everything in the tree, however.)
Apologies, it is certainly a more automated approach than we currently have and a definite improvement. By not scaling, I really mean that any time a human from a different team needs to intervene to approve something (such as merging a PR), we potentially block each other. We don't have 24/7 coverage in teams, so people will invariably need to wait. The more images that are created and managed, the more human resources you need to handle the requests; you are always constrained by the number of people who can respond to them. Agreed, it is a lot better than the current approach, but I think it would be good to aim for one that doesn't require any central approval, so teams can have full autonomy.
I think a key here is that it would be project maintainers merging the PRs, not releng or relops (well, we would merge the PRs for new images, but not for new worker pools). Tbh, I really don't feel comfortable about this stuff just automatically going live in production without any human intervention. Edit: Re-reading your comment I don't think that's what you're suggesting, and I think you misunderstood my proposal. I'm proposing we move away from centralized gatekeepers here. See this line from the initial comment: "Maintainers for each repo could decide to merge or close the PR."
/me notes that "decentralizing" some of this does change security boundaries, at least for the Fx CI case. I.e. an RRA (Rapid Risk Assessment) at some point, please.
Following a monopacker cross-training session with aerickson, Ben and I had a chat around potential avenues for automation. I wanted to jot down some of the ideas while they are fresh in my mind. We can figure out how to put them into action via proper RFCs later.
I'll try to order them from easiest to wildest.
Automate monopacker builds
One of the pain points aerickson mentioned was that it is difficult to tell whether you've broken another image build while working on the current one (the builds often share scripts). A simple first step could be to have tasks that build each image definition (without publishing). Then it's clear when something breaks and what broke it.
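A minimal sketch of that CI harness, assuming image definitions live as YAML files under a `builders/` directory and that some command builds (but does not publish) a single definition; both the layout and the `BUILD_CMD` placeholder are assumptions, as monopacker's real layout and CLI may differ:

```python
# Build every image definition and report which ones break.
import subprocess
import sys
from pathlib import Path

BUILD_CMD = ["make", "build"]  # placeholder; substitute the real build command

failures = []
for builder in sorted(Path("builders").glob("*.yaml")):
    result = subprocess.run(
        [*BUILD_CMD, builder.stem], capture_output=True, text=True
    )
    status = "FAIL" if result.returncode else "PASS"
    print(f"{status} {builder.stem}")
    if result.returncode:
        failures.append(builder.stem)
        print(result.stderr)

# Fail the task if any definition broke, so regressions surface immediately.
sys.exit(1 if failures else 0)
```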
Automate image dependency upgrades
The next step could be to have cron tasks that look for new versions of certain dependencies, such as Taskcluster, generic-worker, and worker-runner. When a new version appears, these tasks could build and publish updated images, and we can move pools to them at our leisure.
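A sketch of the version-check half of such a cron task, using the public GitHub releases API; the `PINNED` dict is a stand-in for wherever the image definitions actually record their dependency versions:

```python
import json
from urllib.request import urlopen

# Hypothetical pins; in practice these would be parsed out of the image
# definitions. generic-worker and worker-runner are released from the
# taskcluster/taskcluster monorepo, so one check covers all three.
PINNED = {"taskcluster/taskcluster": "v44.0.0"}

for repo, pinned in PINNED.items():
    url = f"https://api.github.com/repos/{repo}/releases/latest"
    with urlopen(url) as response:
        latest = json.load(response)["tag_name"]
    if latest != pinned:
        print(f"{repo}: {pinned} -> {latest}; rebuild and publish images")
```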
aerickson mentioned image storage as a potential concern; we may need a strategy for cleaning up old, unused images.
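One possible cleanup strategy, sketched here for AWS: deregister AMIs past a retention cutoff that no worker pool still references. The `monopacker` tag filter and the `KEEP` set are illustrative assumptions about how images would be labelled and tracked, and associated snapshots would also need deleting in practice:

```python
from datetime import datetime, timedelta, timezone

import boto3

KEEP = {"ami-0123456789abcdef0"}  # hypothetical: image IDs pools still use
CUTOFF = datetime.now(timezone.utc) - timedelta(days=90)

ec2 = boto3.client("ec2")
images = ec2.describe_images(
    Owners=["self"],
    Filters=[{"Name": "tag-key", "Values": ["monopacker"]}],
)["Images"]

for image in images:
    created = datetime.fromisoformat(image["CreationDate"].replace("Z", "+00:00"))
    if image["ImageId"] not in KEEP and created < CUTOFF:
        print(f"deregistering {image['ImageId']} ({image.get('Name', '?')})")
        ec2.deregister_image(ImageId=image["ImageId"])
```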
Automate worker pool upgrades
The next step would be to automate using these images in the various pools. We could have a cron task that runs out of ci-config, looks for new images, and then creates a pull request to update them (bear in mind ci-config is moving to GitHub). I don't think we would want these changes to go live automatically, but automated PRs or phab revisions would be very welcome!
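A rough sketch of that cron task, assuming the pools live in a `worker-pools.yml` inside a git checkout of ci-config and that the GitHub CLI (`gh`) is available; the file name, schema, image lookup, and branch naming are all illustrative assumptions:

```python
import subprocess

import yaml

POOLS_FILE = "worker-pools.yml"


def latest_image_for(current_image: str) -> str:
    # Hypothetical: query the cloud provider for the newest published
    # image in the same family as current_image. Returning the input
    # unchanged keeps this sketch runnable.
    return current_image


pools = yaml.safe_load(open(POOLS_FILE))
changed = False
for name, pool in pools.items():
    latest = latest_image_for(pool["image"])
    if latest != pool["image"]:
        print(f"{name}: {pool['image']} -> {latest}")
        pool["image"] = latest
        changed = True

if changed:
    yaml.safe_dump(pools, open(POOLS_FILE, "w"))
    subprocess.run(["git", "checkout", "-b", "update-worker-pool-images"], check=True)
    subprocess.run(["git", "commit", "-am", "Update worker pool images"], check=True)
    # A human still reviews and merges; nothing goes live automatically.
    subprocess.run(["gh", "pr", "create", "--fill"], check=True)
```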
In-repo image upgrades
This is the pinnacle of automation. The same cron task from the previous section (in ci-config) would run and look for newer versions of the images, except in this case the image a pool uses would no longer be pinned in ci-config. That is, the images themselves would still be defined in ci-config, but the image a pool uses would be chosen in-repo. The cron task would iterate through all projects, look at what images each repo is using, and if there is a newer one, create a PR to update it. Maintainers for each repo could decide to merge or close the PR.
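A sketch of the in-repo variant, with loud assumptions: each project declares its pool's image in a `taskcluster/worker-image.yml` file (a made-up location and schema), the project list is illustrative, and `newest_image` stands in for a real image-registry lookup:

```python
import subprocess

import yaml

# Illustrative; in practice this list would be iterated out of the
# ci-config project definitions.
PROJECTS = ["mozilla-mobile/fenix"]
IMAGE_FILE = "taskcluster/worker-image.yml"  # hypothetical in-repo location


def newest_image(family: str) -> str:
    # Hypothetical lookup of the newest published image for a family.
    return f"{family}-20210101"


for project in PROJECTS:
    checkout = project.split("/")[1]
    subprocess.run(["gh", "repo", "clone", project, checkout], check=True)
    path = f"{checkout}/{IMAGE_FILE}"
    declared = yaml.safe_load(open(path))
    latest = newest_image(declared["family"])
    if latest == declared["image"]:
        continue  # already on the newest image
    declared["image"] = latest
    yaml.safe_dump(declared, open(path, "w"))
    # Commit on a branch; opening the PR (gh pr create --fill from inside
    # the checkout) follows. Project maintainers merge or close it.
    subprocess.run(
        ["git", "-C", checkout, "checkout", "-b", "update-worker-image"],
        check=True,
    )
    subprocess.run(
        ["git", "-C", checkout, "commit", "-am", f"Update worker image to {latest}"],
        check=True,
    )
```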
This approach has many benefits, chief among them giving each project autonomy over when it picks up new images.
It's worth noting that Gecko won't be using pull requests, so we'd need to submit a phab revision in that case.