-
Notifications
You must be signed in to change notification settings - Fork 14.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow committer builds to use scripts/ci, dev and actions from the PR #37057
Allow committer builds to use scripts/ci, dev and actions from the PR #37057
Conversation
Once we merge that one, committers will have no more the problem that their PRs modifying |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the answer to my question is yes, then LGTM
2c649de
to
0cbf5ff
Compare
I will do some more testing in my own fork, to see if I have everything righ. And I'd appreciate a bit more thorough review for that one. This is a sensitive part of our actions and it would be good to scrutinize it a bit more :) |
00b00e0
to
a3460d0
Compare
In our process, we generally do not let the scripts in the "build images" workflow to use `scripts/ci`, `dev` and `action` scripts to come from the PR. This is a security feature that prevent Pull Requests from forks to run code on a worker that can potentially access sensitive information - such as GITHUB_TOKEN with write access to Github Registry. This, however, causes troubles, because in order to test any changes in those scripts affecting building image, you have to close your PR from the fork and push one directly to Apache repository (there in-line build workflows are used from "Test" workflow and those PRs are safe to run, because only committers can push directly to the `apache/airflow` repository branches. This PR changes default behaviour for committer PRs. Rather than do the same as "regular" PRs, those PRs will not use scripts from the target branch, instead they will use scripts from the incoming PRs of the committers. This is equally safe as running PRs from the `apache/airflow` branch - because we have a reviewed list of committers in our code and "selective checks" job that checks it is run always in the context and with the code of the "target" branch, which means that you cannot manipulate the list of actors. The Girhub actor is retrieved from pull requests github context (event/pull_request/user/login) so it is impossible to spoof it by the incoming PR. As part of this PR - list of available selective checks and documentation of PR labels and selective checks (wrongly named as "static checks") were reviewed and updated. While impolementing this, we also realised that we can simplify branch information retrieval. The code that we had in workflow was written a long time ago, when the target branch was always "main" - so we had to check-out the target commit to be able to retrieve branch_defaults.py and get the branches from there. However it's already for quite some time that "pull request workflow" uses "base_ref" as the base commit, which means that in `main` it is `main` and in `v2-8-test` it is `v2-8-stable`, which means that we already have the correct `AIRFLOW_BRANCH` and the `DEFAULT_AIRFLOW_CONSTRAINTS_BRANCH` without having to check the incoming commit. Which means that we do not need to override scripts in the build-info step, we only need to check it out temporarily to fetch the incoming PR and it's parent to see what files are changed in the incoming PR.
a3460d0
to
89acfeb
Compare
All right I did some testing and applied a few smal fixes (optimiized away the checkout for target-branch for commiter build as it turned out to be not needed For anyone looking here (and I would love at least 2 reviews) for security reasons. I've run some simulated builds from my higrys -> potiuk PRs after pushing the branch to main: Here is an example of committer-originated build:
Example where non-committer is the actor in PR
The builds here (you can see different actors): |
Well done! |
cc: @ephraimbuddy FYI -> this is something I wil cherry-pick shortly (together with other recent refactors) to v2-8-test and do some testing if it works as expected (there are some smalll caveats in retrieving DEFAULT_BRANCH and DEFAULT_CONSTRAINTS_BRANCH that I need to test there, so don't be surprised :). |
…#37057) In our process, we generally do not let the scripts in the "build images" workflow to use `scripts/ci`, `dev` and `action` scripts to come from the PR. This is a security feature that prevent Pull Requests from forks to run code on a worker that can potentially access sensitive information - such as GITHUB_TOKEN with write access to Github Registry. This, however, causes troubles, because in order to test any changes in those scripts affecting building image, you have to close your PR from the fork and push one directly to Apache repository (there in-line build workflows are used from "Test" workflow and those PRs are safe to run, because only committers can push directly to the `apache/airflow` repository branches. This PR changes default behaviour for committer PRs. Rather than do the same as "regular" PRs, those PRs will not use scripts from the target branch, instead they will use scripts from the incoming PRs of the committers. This is equally safe as running PRs from the `apache/airflow` branch - because we have a reviewed list of committers in our code and "selective checks" job that checks it is run always in the context and with the code of the "target" branch, which means that you cannot manipulate the list of actors. The Girhub actor is retrieved from pull requests github context (event/pull_request/user/login) so it is impossible to spoof it by the incoming PR. As part of this PR - list of available selective checks and documentation of PR labels and selective checks (wrongly named as "static checks") were reviewed and updated. While impolementing this, we also realised that we can simplify branch information retrieval. The code that we had in workflow was written a long time ago, when the target branch was always "main" - so we had to check-out the target commit to be able to retrieve branch_defaults.py and get the branches from there. However it's already for quite some time that "pull request workflow" uses "base_ref" as the base commit, which means that in `main` it is `main` and in `v2-8-test` it is `v2-8-stable`, which means that we already have the correct `AIRFLOW_BRANCH` and the `DEFAULT_AIRFLOW_CONSTRAINTS_BRANCH` without having to check the incoming commit. Which means that we do not need to override scripts in the build-info step, we only need to check it out temporarily to fetch the incoming PR and it's parent to see what files are changed in the incoming PR. (cherry picked from commit 07fd364)
…#37057) In our process, we generally do not let the scripts in the "build images" workflow to use `scripts/ci`, `dev` and `action` scripts to come from the PR. This is a security feature that prevent Pull Requests from forks to run code on a worker that can potentially access sensitive information - such as GITHUB_TOKEN with write access to Github Registry. This, however, causes troubles, because in order to test any changes in those scripts affecting building image, you have to close your PR from the fork and push one directly to Apache repository (there in-line build workflows are used from "Test" workflow and those PRs are safe to run, because only committers can push directly to the `apache/airflow` repository branches. This PR changes default behaviour for committer PRs. Rather than do the same as "regular" PRs, those PRs will not use scripts from the target branch, instead they will use scripts from the incoming PRs of the committers. This is equally safe as running PRs from the `apache/airflow` branch - because we have a reviewed list of committers in our code and "selective checks" job that checks it is run always in the context and with the code of the "target" branch, which means that you cannot manipulate the list of actors. The Girhub actor is retrieved from pull requests github context (event/pull_request/user/login) so it is impossible to spoof it by the incoming PR. As part of this PR - list of available selective checks and documentation of PR labels and selective checks (wrongly named as "static checks") were reviewed and updated. While impolementing this, we also realised that we can simplify branch information retrieval. The code that we had in workflow was written a long time ago, when the target branch was always "main" - so we had to check-out the target commit to be able to retrieve branch_defaults.py and get the branches from there. However it's already for quite some time that "pull request workflow" uses "base_ref" as the base commit, which means that in `main` it is `main` and in `v2-8-test` it is `v2-8-stable`, which means that we already have the correct `AIRFLOW_BRANCH` and the `DEFAULT_AIRFLOW_CONSTRAINTS_BRANCH` without having to check the incoming commit. Which means that we do not need to override scripts in the build-info step, we only need to check it out temporarily to fetch the incoming PR and it's parent to see what files are changed in the incoming PR. (cherry picked from commit 07fd364)
In our process, we generally do not let the scripts in the "build images" workflow to use
scripts/ci
,dev
andaction
scripts to come from the PR. This is a security feature that prevent Pull Requests from forks to run code on a worker that can potentially access sensitive information - such as GITHUB_TOKEN with write access to Github Registry.This, however, causes troubles, because in order to test any changes in those scripts affecting building image, you have to close your PR from the fork and push one directly to Apache repository (there in-line build workflows are used from "Test" workflow and those PRs are safe to run, because only committers can push directly to the
apache/airflow
repository branches.This PR changes default behaviour for committer PRs. Rather than do the same as "regular" PRs, those PRs will not use scripts from the target branch, instead they will use scripts from the incoming PRs of the committers. This is equally safe as running PRs from the
apache/airflow
branch - because we have a reviewed list of committers in our code and "selective checks" job that checks it is run always in the context and with the code of the "target" branch, which means that you cannot manipulate the list of actors.The Girhub actor is retrieved from pull requests github context (event/pull_request/user/login) so it is impossible to spoof it by the incoming PR.
As part of this PR - list of available selective checks and documentation of PR labels and selective checks (wrongly named as "static checks") were reviewed and updated.
While impolementing this, we also realised that we can simplify
branch information retrieval. The code that we had in workflow
was written a long time ago, when the target branch was always
"main" - so we had to check-out the target commit to be able to
retrieve branch_defaults.py and get the branches from there. However
it's already for quite some time that "pull request workflow"
uses "base_ref" as the base commit, which means that in
main
itis
main
and inv2-8-test
it isv2-8-stable
, which means thatwe already have the correct
AIRFLOW_BRANCH
and theDEFAULT_AIRFLOW_CONSTRAINTS_BRANCH
without having to checkthe incoming commit. Which means that we do not need to override
scripts in the build-info step, we only need to check it out
temporarily to fetch the incoming PR and it's parent to see
what files are changed in the incoming PR.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rst
or{issue_number}.significant.rst
, in newsfragments.