-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switched to Run Checks for Building Images. #11276
Conversation
The CI and PROD Docker Images for the build are prepared in a separate "Build Image" workflow, You can checks the status of those images in The workflow run |
@potiuk This is better than it was, certainly, but a question: Is it possible to have the build image workflow run against the actual PR as a second workflow, as we do with the "Up to Date" check, WIP, or the CodeQL? I.e. what is special about the Build Image workflow that means it has to be against master, not the PR? ("I didn't have time to look" is of course a perfectly acceptable answer, just catching up here as this build feature was added when I was off on paterntiy leave and don't want to duplicate any effort) |
The Build Workflow run is cancelling this PR. The job has been cancelled by another workflow. |
It runs as 'workflow_run' which is the only way to get "write" permission that allows to push the build image to the registry. There is no other way to do it.
Sure :) I woudl be more than happy that more people get to know the build process. |
We cannot do it easily. Currently, there is no easy way to find out which of the workflows in the "workflow_run" is the one that runs the build. This link is missing in the current API. It might require even more complex querying which is the workflow_run tha we should link to and it requires another custom Github Action likely to loop through all the workflow runs running and finding the right one. I (or maybe someone else who would like to take care about it) might want to do it, but I want to do now minimal set of changes that are possible. |
The CI and PROD Docker Images for the build are prepared in a separate "Build Image" workflow, You can checks the status of those images in The workflow run |
FYI - this one was really helpful in this case. By looking at the link in the comment, you will find that the build of 3.8 image failed because of some transient network problem resulting in ("Unknown blob"). that's why I re-pushed it. @turbaszek -> I think it is rather straightforward to go to the link and see that the image build has been canceled ? Any improvements here that we can think of? |
Ahhh, PRs on forked repos only have read permissions, got it. Makes sense.
Ah yeah, just looked at how this all works. Got it. |
And yeah. This is complex. Far complex than if GA had better support for canceling workflows and (highly requested) possibility of having a selective tokens (for example only allowing to push to github registry by the PR builds). But we currently have to leave with "PR = read" , " Worfklow_run" = "Write" and that it is a feature not fully, perfectly integrated with the UI of GitHub. I think it will get better over time, but if we can get our users to wait 30% less for the builds (and that we are not queuing continuously) and in very near future after #10507 got implemented, it will allow us reduce it by 60% or more, then I think we should use it. I think we would not be able to survive the MySQL8 added without it because then every job added the build time for images which is now pretty much gone. |
@@ -307,6 +316,20 @@ jobs: | |||
- name: "Push PROD images ${{ matrix.python-version }}:${{ github.event.workflow_run.id }}" | |||
run: ./scripts/ci/images/ci_push_production_images.sh | |||
if: matrix.image-type == 'PROD' && steps.defaults.outputs.proceed == 'true' | |||
- name: GitHub Checks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- name: GitHub Checks | |
- name: Add details link to image build job |
I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They actually add Github Checks but yeah. I can make it clearer
Oh yeah, the "build once" is definitely a huge win and was on my list of "back-burner niggles to look at at some point". I'm a little bit confused where this details shows up though? I looked in : |
This is where actual jobs are executed and you can see errors that result from - for example - wrong dependencies. Here you do not have much information and can't easily have as I explained in #11276 (comment) - happy if someone writes another action (I already added quite a lot to the "cancel workflow" action to get it all possible using GitHub APIS. that might be a good exercise for someone try to learn about the process. Those jobs are basically polling for images to appear in Github Registry. This is by far the easiest way to check if the images are ready. Trying to use GithHub APIs to retrieve this information will be significantly more complex. GitHub does not have good support for workflow_runs linking to the PR :(. So I am using "Github Checks" to simulate that. So I am reporting status using checks from the other workflow. Unfortunately When I try to pass "details_url" to this workflow, Github apparently overrides the detail_url with link to the "Check" and not link to the "workflow build job". So instead of passing the URL in "Details_url" I am passing it in the "summary message" - this way from the "Build Image Check" you can get the link to the "Build Image Job". |
You can see how it will look like after this PR gets merged: https://github.com/potiuk/airflow/pull/107/checks?check_run_id=1208262319 In the CI Builds you get "Build CI/PROD Image X.Y" checks - that show the status (in_progress/succeeded/failed) and have a link top the actual "Build CI/PROD Image" job from the other workflow. |
Ah right. Think my confusion was do to do with the change not showing up in this PR, but given the build has to happen from master, and that can't securely use the scripts from this PR, that makes sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving, but please change the name of the "GitHub Checks" jobs to be a little bit clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one
Yes it's clearer but what should I do as a contributor (without right to re-run jobs) to start check on my PR if the "image building" is failing and I have no influence over that? |
Those images do not fail "like that" - they fail for some reasons and now the user will at least have a chance to know what to do and where to look at. So it's the same as before but you at least know where to look for a problem. Image building will fail if a) there is a transient error or b) when you made a change - for example in the Dockerfile that made it to fail. c) someone else merged a failing change. In a) case you can commit --amend and re-push the change. I think previously it was all the same but you did not know which of the a) b) c) cases you are at. This change actually makes it possible to find it out easily (by following the link in the comment and exploring the logs). |
Do you thin we should explain it somewhere @turbaszek ? I think those three cases are general cases that people running CI build experience - no matter if it's building the image failurre, running test failure, doc check failure, spell check failure, static check failure. In all those cases those a) b) c) cases are what can happen. Should we maybe start to gather some kind of cheatsheet for all those cases like "if you see this and that in the loigs -> it's probably transient error". We already have quite a number of cases like that that we can tell from the top of our head. Do you think it makes sense to do something like that? Or maybe we should do it differently ? Maybe you'd like to start such a "cheatsheet" ? |
To be honest, I'm not sure what we should do. I just express my confusion that comes from what I see now in PRs that I'm reviewing. For example:
and even more, it shows "When sending tasks to celery from a sub-process, rese..." PR which is totally unrelated to the original change. I would prefer to not require our contributors to scratch their heads and wonder what's going on. Also, pinging people on slack to make CI runnable is not a best approach imho. |
Replaces the annoying comments with "workflow_run" links with Run Checks. Now we will be able to see the "Build Image" checks in the "Checks" section including their status and direct link to the steps running the image builds as "Details" link. Unfortunately Github Actions do not handle well the links to details - even if you provide details_url to link to the other run, the "Build Image" checks appear in the original workflow, that's why we had to introduce another link in the summary of the Build Image check that links to the actual workflow.
The CI and PROD Docker Images for the build are prepared in a separate "Build Image" workflow, You can checks the status of those images in The workflow run |
Apparently there is a transient problem with the image building this time so a) case. |
This is precisely what I mean by cheatsheet. If we have such cheatsheet we could somehow explain to the users "if you see unknown blob" then it is likely a transient error which requires to "commit --amend and push". It's rather difficult to foresee all kind of transient errors you might have - such as "403 error" when downloading an apt package. Do you complain about the link now or about the message printed in case of the "unknown blob" transient error?
I cannot do much about it - workflow runs run with the "latest master" and those "latest master" are the only worflows than can have write access that can push stuff to Github Registry. I already provided all the necessary information in the name of the "Build Info" step above - when you click at it you will see the original branch of such pull request among others. Do you have any suggestions what we can do instead? I would love to collaborate on making it clearer for the users. Any constructive suggestions are welcome. |
I updated the names of the steps to be more accurate, but I also changed something else. In order to differentiate between the "Build Image step" and "Build Image Check" I named them visibly differently now: In the original workflow you will see the "Status of the image build" checks: Those have links to the "Build image" jobs. Then the link will lead you to the "Workflow_run" where you will see the "Build image" jobs themselves: |
I think we should merge this one and adjust with the time being. I'm sorry but I don't have enough capacity to fully understand the problem now |
Yep. This is something I am planning to iterate over continuously (doing it for years now). And GitHub Actions will eventually improve the UI. They are adding features very fast, but it means that some of the UI around that (like missing links to workflow_runs) is not perfect. But I think our problems with CI are mostly around how to make it really fast and minimize unnecessary overhead as we are often blocked by the queue. |
FY: This is even nicer. I could not see it reliably working in my fork, but I see that it nicely works in Airflow. The "Status of the Image Build" checks are also showing up now directly in the "Checks" window in the Pull request - from there you can click the "Image Buld" URL and get to the "Image Build job". I think this is as close as we can get it now - I hope Github Actions will integrate this kind of Workflow better in the future, but for now I think I fixed all the missing features there:
|
Replaces the annoying comments with "workflow_run" links with Run Checks. Now we will be able to see the "Build Image" checks in the "Checks" section including their status and direct link to the steps running the image builds as "Details" link. Unfortunately Github Actions do not handle well the links to details - even if you provide details_url to link to the other run, the "Build Image" checks appear in the original workflow, that's why we had to introduce another link in the summary of the Build Image check that links to the actual workflow. (cherry picked from commit a33a919)
Replaces the annoying comments with "workflow_run" links with Run Checks. Now we will be able to see the "Build Image" checks in the "Checks" section including their status and direct link to the steps running the image builds as "Details" link. Unfortunately Github Actions do not handle well the links to details - even if you provide details_url to link to the other run, the "Build Image" checks appear in the original workflow, that's why we had to introduce another link in the summary of the Build Image check that links to the actual workflow. (cherry picked from commit a33a919)
Replaces the annoying comments with "workflow_run" links with Run Checks. Now we will be able to see the "Build Image" checks in the "Checks" section including their status and direct link to the steps running the image builds as "Details" link. Unfortunately Github Actions do not handle well the links to details - even if you provide details_url to link to the other run, the "Build Image" checks appear in the original workflow, that's why we had to introduce another link in the summary of the Build Image check that links to the actual workflow. (cherry picked from commit a33a919)
Replaces the annoying comments with "workflow_run" links with Run Checks. Now we will be able to see the "Build Image" checks in the "Checks" section including their status and direct link to the steps running the image builds as "Details" link. Unfortunately Github Actions do not handle well the links to details - even if you provide details_url to link to the other run, the "Build Image" checks appear in the original workflow, that's why we had to introduce another link in the summary of the Build Image check that links to the actual workflow. (cherry picked from commit a33a919)
Replaces the annoying comments with "workflow_run" links with Run Checks. Now we will be able to see the "Build Image" checks in the "Checks" section including their status and direct link to the steps running the image builds as "Details" link. Unfortunately Github Actions do not handle well the links to details - even if you provide details_url to link to the other run, the "Build Image" checks appear in the original workflow, that's why we had to introduce another link in the summary of the Build Image check that links to the actual workflow. (cherry picked from commit a33a919)
Replaces the annoying comments with "workflow_run" links with Run Checks. Now we will be able to see the "Build Image" checks in the "Checks" section including their status and direct link to the steps running the image builds as "Details" link. Unfortunately Github Actions do not handle well the links to details - even if you provide details_url to link to the other run, the "Build Image" checks appear in the original workflow, that's why we had to introduce another link in the summary of the Build Image check that links to the actual workflow. (cherry picked from commit a33a919)
Replaces the annoying comments with "workflow_run" links
with Run Checks. Now we will be able to see the "Build Image"
checks in the "Checks" section including their status and direct
link to the steps running the image builds as "Details" link.
Unfortunately Github Actions do not handle well the links to
details - even if you provide details_url to link to the other
run, the "Build Image" checks appear in the original workflow,
that's why we had to introduce another link in the summary of
the Build Image check that links to the actual workflow.
This is the best I can do now:
The "Build Image" checks appear in the "checks" of the PR. You can see their status (in_progress/ok/failed) and when you click such a "Build Image check" you will see the link to the other workflow running the build:
I think it's a very good solution - much better than the annoying comment.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.