Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switched to Run Checks for Building Images. #11276

Merged
merged 1 commit into from
Oct 5, 2020

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Oct 5, 2020

Replaces the annoying comments with "workflow_run" links
with Run Checks. Now we will be able to see the "Build Image"
checks in the "Checks" section including their status and direct
link to the steps running the image builds as "Details" link.

Unfortunately Github Actions do not handle well the links to
details - even if you provide details_url to link to the other
run, the "Build Image" checks appear in the original workflow,
that's why we had to introduce another link in the summary of
the Build Image check that links to the actual workflow.

This is the best I can do now:

The "Build Image" checks appear in the "checks" of the PR. You can see their status (in_progress/ok/failed) and when you click such a "Build Image check" you will see the link to the other workflow running the build:

Screenshot from 2020-10-05 11-17-12

I think it's a very good solution - much better than the annoying comment.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@github-actions
Copy link

github-actions bot commented Oct 5, 2020

The CI and PROD Docker Images for the build are prepared in a separate "Build Image" workflow,
that you will not see in the list of checks (you will see "Wait for images" jobs instead).

You can checks the status of those images in The workflow run

@ashb
Copy link
Member

ashb commented Oct 5, 2020

@potiuk This is better than it was, certainly, but a question: Is it possible to have the build image workflow run against the actual PR as a second workflow, as we do with the "Up to Date" check, WIP, or the CodeQL? I.e. what is special about the Build Image workflow that means it has to be against master, not the PR?

("I didn't have time to look" is of course a perfectly acceptable answer, just catching up here as this build feature was added when I was off on paterntiy leave and don't want to duplicate any effort)

@github-actions
Copy link

github-actions bot commented Oct 5, 2020

The Build Workflow run is cancelling this PR. The job has been cancelled by another workflow.

@ashb
Copy link
Member

ashb commented Oct 5, 2020

We should also show such a message on the "Wait for X images" stage:

image

@potiuk
Copy link
Member Author

potiuk commented Oct 5, 2020

@potiuk This is better than it was, certainly, but a question: Is it possible to have the build image workflow run against the actual PR as a second workflow, as we do with the "Up to Date" check, WIP, or the CodeQL? I.e. what is special about the Build Image workflow that means it has to be against master, not the PR?

It runs as 'workflow_run' which is the only way to get "write" permission that allows to push the build image to the registry. There is no other way to do it.

("I didn't have time to look" is of course a perfectly acceptable answer, just catching up here as this build feature was added when I was off on paterntiy leave and don't want to duplicate any effort)

Sure :) I woudl be more than happy that more people get to know the build process.

@potiuk
Copy link
Member Author

potiuk commented Oct 5, 2020

We should also show such a message on the "Wait for X images" stage:

We cannot do it easily. Currently, there is no easy way to find out which of the workflows in the "workflow_run" is the one that runs the build. This link is missing in the current API. It might require even more complex querying which is the workflow_run tha we should link to and it requires another custom Github Action likely to loop through all the workflow runs running and finding the right one. I (or maybe someone else who would like to take care about it) might want to do it, but I want to do now minimal set of changes that are possible.

@github-actions
Copy link

github-actions bot commented Oct 5, 2020

The CI and PROD Docker Images for the build are prepared in a separate "Build Image" workflow,
that you will not see in the list of checks (you will see "Wait for images" jobs instead).

You can checks the status of those images in The workflow run

@potiuk
Copy link
Member Author

potiuk commented Oct 5, 2020

The Build Workflow run is cancelling this PR. The job has been cancelled by another workflow.

FYI - this one was really helpful in this case. By looking at the link in the comment, you will find that the build of 3.8 image failed because of some transient network problem resulting in ("Unknown blob"). that's why I re-pushed it. @turbaszek -> I think it is rather straightforward to go to the link and see that the image build has been canceled ? Any improvements here that we can think of?

@ashb
Copy link
Member

ashb commented Oct 5, 2020

@potiuk This is better than it was, certainly, but a question: Is it possible to have the build image workflow run against the actual PR as a second workflow, as we do with the "Up to Date" check, WIP, or the CodeQL? I.e. what is special about the Build Image workflow that means it has to be against master, not the PR?

It runs as 'workflow_run' which is the only way to get "write" permission that allows to push the build image to the registry. There is no other way to do it.

Ahhh, PRs on forked repos only have read permissions, got it. Makes sense.

We should also show such a message on the "Wait for X images" stage:

We cannot do it easily. Currently, there is no easy way to find out which of the workflows in the "workflow_run" is the one that runs the build. This link is missing in the current API. It might require even more complex querying which is the workflow_run tha we should link to and it requires another custom Github Action likely to loop through all the workflow runs running and finding the right one. I (or maybe someone else who would like to take care about it) might want to do it, but I want to do now minimal set of changes that are possible.

Ah yeah, just looked at how this all works. Got it.

@potiuk
Copy link
Member Author

potiuk commented Oct 5, 2020

And yeah. This is complex. Far complex than if GA had better support for canceling workflows and (highly requested) possibility of having a selective tokens (for example only allowing to push to github registry by the PR builds). But we currently have to leave with "PR = read" , " Worfklow_run" = "Write" and that it is a feature not fully, perfectly integrated with the UI of GitHub. I think it will get better over time, but if we can get our users to wait 30% less for the builds (and that we are not queuing continuously) and in very near future after #10507 got implemented, it will allow us reduce it by 60% or more, then I think we should use it.

I think we would not be able to survive the MySQL8 added without it because then every job added the build time for images which is now pretty much gone.

@@ -307,6 +316,20 @@ jobs:
- name: "Push PROD images ${{ matrix.python-version }}:${{ github.event.workflow_run.id }}"
run: ./scripts/ci/images/ci_push_production_images.sh
if: matrix.image-type == 'PROD' && steps.defaults.outputs.proceed == 'true'
- name: GitHub Checks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- name: GitHub Checks
- name: Add details link to image build job

I think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They actually add Github Checks but yeah. I can make it clearer

@ashb
Copy link
Member

ashb commented Oct 5, 2020

Oh yeah, the "build once" is definitely a huge win and was on my list of "back-burner niggles to look at at some point".

I'm a little bit confused where this details shows up though? I looked in :

@potiuk
Copy link
Member Author

potiuk commented Oct 5, 2020

I'm a little bit confused where this details shows up though? I looked in :

This is where actual jobs are executed and you can see errors that result from - for example - wrong dependencies.

Here you do not have much information and can't easily have as I explained in #11276 (comment) - happy if someone writes another action (I already added quite a lot to the "cancel workflow" action to get it all possible using GitHub APIS. that might be a good exercise for someone try to learn about the process. Those jobs are basically polling for images to appear in Github Registry. This is by far the easiest way to check if the images are ready. Trying to use GithHub APIs to retrieve this information will be significantly more complex.

GitHub does not have good support for workflow_runs linking to the PR :(. So I am using "Github Checks" to simulate that.
Github Checks is the way that for example Travis CI builds can report their progress in Github PR. They provide status and optional link that you can use to see the details, but usually it is a 3rd-party link. In this case, it is not really "3rd party" but another Workflow run - loosely related to the original PR.

So I am reporting status using checks from the other workflow. Unfortunately When I try to pass "details_url" to this workflow, Github apparently overrides the detail_url with link to the "Check" and not link to the "workflow build job". So instead of passing the URL in "Details_url" I am passing it in the "summary message" - this way from the "Build Image Check" you can get the link to the "Build Image Job".

@potiuk
Copy link
Member Author

potiuk commented Oct 5, 2020

You can see how it will look like after this PR gets merged:

https://github.com/potiuk/airflow/pull/107/checks?check_run_id=1208262319

In the CI Builds you get "Build CI/PROD Image X.Y" checks - that show the status (in_progress/succeeded/failed) and have a link top the actual "Build CI/PROD Image" job from the other workflow.

@ashb
Copy link
Member

ashb commented Oct 5, 2020

Ah right. Think my confusion was do to do with the change not showing up in this PR, but given the build has to happen from master, and that can't securely use the scripts from this PR, that makes sense.

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, but please change the name of the "GitHub Checks" jobs to be a little bit clearer.

Copy link
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one

@turbaszek
Copy link
Member

@turbaszek -> I think it is rather straightforward to go to the link and see that the image build has been canceled ? Any improvements here that we can think of?

Yes it's clearer but what should I do as a contributor (without right to re-run jobs) to start check on my PR if the "image building" is failing and I have no influence over that?

@potiuk
Copy link
Member Author

potiuk commented Oct 5, 2020

Yes it's clearer but what should I do as a contributor (without right to re-run jobs) to start check on my PR if the "image building" is failing and I have no influence over that?

Those images do not fail "like that" - they fail for some reasons and now the user will at least have a chance to know what to do and where to look at. So it's the same as before but you at least know where to look for a problem.

Image building will fail if a) there is a transient error or b) when you made a change - for example in the Dockerfile that made it to fail. c) someone else merged a failing change.

In a) case you can commit --amend and re-push the change.
In b) case you can fix it
in c) case you can look at the build history and see if others experienced it and ping the slack/devlist.

I think previously it was all the same but you did not know which of the a) b) c) cases you are at. This change actually makes it possible to find it out easily (by following the link in the comment and exploring the logs).

@potiuk
Copy link
Member Author

potiuk commented Oct 5, 2020

I think previously it was all the same but you did not know which of the a) b) c) cases you are at. This change actually makes it possible to find it out easily (by following the link in the comment and exploring the logs).

Do you thin we should explain it somewhere @turbaszek ? I think those three cases are general cases that people running CI build experience - no matter if it's building the image failurre, running test failure, doc check failure, spell check failure, static check failure. In all those cases those a) b) c) cases are what can happen. Should we maybe start to gather some kind of cheatsheet for all those cases like "if you see this and that in the loigs -> it's probably transient error".

We already have quite a number of cases like that that we can tell from the top of our head.

Do you think it makes sense to do something like that? Or maybe we should do it differently ? Maybe you'd like to start such a "cheatsheet" ?

@turbaszek
Copy link
Member

To be honest, I'm not sure what we should do. I just express my confusion that comes from what I see now in PRs that I'm reviewing. For example:
This link #10829 (comment)
leads to this build:
https://github.com/apache/airflow/actions/runs/289022244
which shows:

unknown blob
258
###########################################################################################
259
                   EXITING WITH STATUS CODE 1
260
###########################################################################################
261

262
Finished the script ci_push_ci_images.sh
263
Elapsed time spent in the script: 90 seconds
264
Exit code 1
265

266
Error: Process completed with exit code 1.

and even more, it shows "When sending tasks to celery from a sub-process, rese..." PR which is totally unrelated to the original change. I would prefer to not require our contributors to scratch their heads and wonder what's going on. Also, pinging people on slack to make CI runnable is not a best approach imho.

Replaces the annoying comments with "workflow_run" links
with Run Checks. Now we will be able to see the "Build Image"
checks in the "Checks" section including their status and direct
link to the steps running the image builds as "Details" link.

Unfortunately Github Actions do not handle well the links to
details - even if you provide details_url to link to the other
run, the "Build Image" checks appear in the original workflow,
that's why we had to introduce another link in the summary of
the Build Image check that links to the actual workflow.
@github-actions
Copy link

github-actions bot commented Oct 5, 2020

The CI and PROD Docker Images for the build are prepared in a separate "Build Image" workflow,
that you will not see in the list of checks (you will see "Wait for images" jobs instead).

You can checks the status of those images in The workflow run

@potiuk
Copy link
Member Author

potiuk commented Oct 5, 2020

unknown blob

Apparently there is a transient problem with the image building this time so a) case.

@potiuk
Copy link
Member Author

potiuk commented Oct 5, 2020

unknown blob

Apparently there is a transient problem with the image building this time so a) case.

This is precisely what I mean by cheatsheet. If we have such cheatsheet we could somehow explain to the users "if you see unknown blob" then it is likely a transient error which requires to "commit --amend and push".

It's rather difficult to foresee all kind of transient errors you might have - such as "403 error" when downloading an apt package.

Do you complain about the link now or about the message printed in case of the "unknown blob" transient error?

"When sending tasks to celery from a sub-process, rese..."

I cannot do much about it - workflow runs run with the "latest master" and those "latest master" are the only worflows than can have write access that can push stuff to Github Registry. I already provided all the necessary information in the name of the "Build Info" step above - when you click at it you will see the original branch of such pull request among others.

Do you have any suggestions what we can do instead? I would love to collaborate on making it clearer for the users. Any constructive suggestions are welcome.

@potiuk
Copy link
Member Author

potiuk commented Oct 5, 2020

I updated the names of the steps to be more accurate, but I also changed something else.

In order to differentiate between the "Build Image step" and "Build Image Check" I named them visibly differently now:

In the original workflow you will see the "Status of the image build" checks:

Screenshot from 2020-10-05 17-35-56

Those have links to the "Build image" jobs. Then the link will lead you to the "Workflow_run" where you will see the "Build image" jobs themselves:

Screenshot from 2020-10-05 17-39-25

@turbaszek
Copy link
Member

Do you have any suggestions what we can do instead? I would love to collaborate on making it clearer for the users. Any constructive suggestions are welcome.

I think we should merge this one and adjust with the time being. I'm sorry but I don't have enough capacity to fully understand the problem now

@potiuk potiuk merged commit a33a919 into apache:master Oct 5, 2020
@potiuk potiuk deleted the check-runs branch October 5, 2020 16:35
@potiuk
Copy link
Member Author

potiuk commented Oct 5, 2020

I think we should merge this one and adjust with the time being. I'm sorry but I don't have enough capacity to fully understand the problem now

Yep. This is something I am planning to iterate over continuously (doing it for years now). And GitHub Actions will eventually improve the UI. They are adding features very fast, but it means that some of the UI around that (like missing links to workflow_runs) is not perfect. But I think our problems with CI are mostly around how to make it really fast and minimize unnecessary overhead as we are often blocked by the queue.

@potiuk
Copy link
Member Author

potiuk commented Oct 6, 2020

FY: This is even nicer. I could not see it reliably working in my fork, but I see that it nicely works in Airflow.

The "Status of the Image Build" checks are also showing up now directly in the "Checks" window in the Pull request - from there you can click the "Image Buld" URL and get to the "Image Build job".

I think this is as close as we can get it now - I hope Github Actions will integrate this kind of Workflow better in the future, but for now I think I fixed all the missing features there:

  • you can see the progress of builds image in the PR
  • you can click through from te details to get to the actual jobs building the images
  • when the build image fails you get a comment with the link that leads you directly to the failing build image jobs

Screenshot from 2020-10-06 08-43-23

potiuk added a commit that referenced this pull request Oct 6, 2020
Replaces the annoying comments with "workflow_run" links
with Run Checks. Now we will be able to see the "Build Image"
checks in the "Checks" section including their status and direct
link to the steps running the image builds as "Details" link.

Unfortunately Github Actions do not handle well the links to
details - even if you provide details_url to link to the other
run, the "Build Image" checks appear in the original workflow,
that's why we had to introduce another link in the summary of
the Build Image check that links to the actual workflow.

(cherry picked from commit a33a919)
RaviTezu pushed a commit to RaviTezu/airflow that referenced this pull request Oct 25, 2020
Replaces the annoying comments with "workflow_run" links
with Run Checks. Now we will be able to see the "Build Image"
checks in the "Checks" section including their status and direct
link to the steps running the image builds as "Details" link.

Unfortunately Github Actions do not handle well the links to
details - even if you provide details_url to link to the other
run, the "Build Image" checks appear in the original workflow,
that's why we had to introduce another link in the summary of
the Build Image check that links to the actual workflow.

(cherry picked from commit a33a919)
kaxil pushed a commit that referenced this pull request Nov 12, 2020
Replaces the annoying comments with "workflow_run" links
with Run Checks. Now we will be able to see the "Build Image"
checks in the "Checks" section including their status and direct
link to the steps running the image builds as "Details" link.

Unfortunately Github Actions do not handle well the links to
details - even if you provide details_url to link to the other
run, the "Build Image" checks appear in the original workflow,
that's why we had to introduce another link in the summary of
the Build Image check that links to the actual workflow.

(cherry picked from commit a33a919)
@potiuk potiuk added the type:misc/internal Changelog: Misc changes that should appear in change log label Nov 14, 2020
@potiuk potiuk added this to the Airflow 1.10.13 milestone Nov 14, 2020
potiuk added a commit that referenced this pull request Nov 16, 2020
Replaces the annoying comments with "workflow_run" links
with Run Checks. Now we will be able to see the "Build Image"
checks in the "Checks" section including their status and direct
link to the steps running the image builds as "Details" link.

Unfortunately Github Actions do not handle well the links to
details - even if you provide details_url to link to the other
run, the "Build Image" checks appear in the original workflow,
that's why we had to introduce another link in the summary of
the Build Image check that links to the actual workflow.

(cherry picked from commit a33a919)
kaxil pushed a commit that referenced this pull request Nov 18, 2020
Replaces the annoying comments with "workflow_run" links
with Run Checks. Now we will be able to see the "Build Image"
checks in the "Checks" section including their status and direct
link to the steps running the image builds as "Details" link.

Unfortunately Github Actions do not handle well the links to
details - even if you provide details_url to link to the other
run, the "Build Image" checks appear in the original workflow,
that's why we had to introduce another link in the summary of
the Build Image check that links to the actual workflow.

(cherry picked from commit a33a919)
cfei18 pushed a commit to cfei18/incubator-airflow that referenced this pull request Mar 5, 2021
Replaces the annoying comments with "workflow_run" links
with Run Checks. Now we will be able to see the "Build Image"
checks in the "Checks" section including their status and direct
link to the steps running the image builds as "Details" link.

Unfortunately Github Actions do not handle well the links to
details - even if you provide details_url to link to the other
run, the "Build Image" checks appear in the original workflow,
that's why we had to introduce another link in the summary of
the Build Image check that links to the actual workflow.

(cherry picked from commit a33a919)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:dev-tools type:misc/internal Changelog: Misc changes that should appear in change log
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants