Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow committer builds to use scripts/ci, dev and actions from the PR #37057

Merged
merged 1 commit into from
Jan 28, 2024

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Jan 28, 2024

In our process, we generally do not let the scripts in the "build images" workflow to use scripts/ci, dev and action scripts to come from the PR. This is a security feature that prevent Pull Requests from forks to run code on a worker that can potentially access sensitive information - such as GITHUB_TOKEN with write access to Github Registry.

This, however, causes troubles, because in order to test any changes in those scripts affecting building image, you have to close your PR from the fork and push one directly to Apache repository (there in-line build workflows are used from "Test" workflow and those PRs are safe to run, because only committers can push directly to the apache/airflow repository branches.

This PR changes default behaviour for committer PRs. Rather than do the same as "regular" PRs, those PRs will not use scripts from the target branch, instead they will use scripts from the incoming PRs of the committers. This is equally safe as running PRs from the apache/airflow branch - because we have a reviewed list of committers in our code and "selective checks" job that checks it is run always in the context and with the code of the "target" branch, which means that you cannot manipulate the list of actors.

The Girhub actor is retrieved from pull requests github context (event/pull_request/user/login) so it is impossible to spoof it by the incoming PR.

As part of this PR - list of available selective checks and documentation of PR labels and selective checks (wrongly named as "static checks") were reviewed and updated.

While impolementing this, we also realised that we can simplify
branch information retrieval. The code that we had in workflow
was written a long time ago, when the target branch was always
"main" - so we had to check-out the target commit to be able to
retrieve branch_defaults.py and get the branches from there. However
it's already for quite some time that "pull request workflow"
uses "base_ref" as the base commit, which means that in main it
is main and in v2-8-test it is v2-8-stable, which means that
we already have the correct AIRFLOW_BRANCH and the
DEFAULT_AIRFLOW_CONSTRAINTS_BRANCH without having to check
the incoming commit. Which means that we do not need to override
scripts in the build-info step, we only need to check it out
temporarily to fetch the incoming PR and it's parent to see
what files are changed in the incoming PR.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@potiuk
Copy link
Member Author

potiuk commented Jan 28, 2024

Once we merge that one, committers will have no more the problem that their PRs modifying dev scripts will have to close it and re-open it from the apache repository. It should just work.

Copy link
Member

@hussein-awala hussein-awala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the answer to my question is yes, then LGTM

@potiuk potiuk marked this pull request as draft January 28, 2024 14:09
@potiuk potiuk force-pushed the allow-dev-scripts-for-committer-builds branch from 2c649de to 0cbf5ff Compare January 28, 2024 14:59
@potiuk potiuk marked this pull request as ready for review January 28, 2024 14:59
@potiuk
Copy link
Member Author

potiuk commented Jan 28, 2024

I will do some more testing in my own fork, to see if I have everything righ.

And I'd appreciate a bit more thorough review for that one. This is a sensitive part of our actions and it would be good to scrutinize it a bit more :)

@potiuk potiuk force-pushed the allow-dev-scripts-for-committer-builds branch 3 times, most recently from 00b00e0 to a3460d0 Compare January 28, 2024 18:35
In our process, we generally do not let the scripts in the "build
images" workflow to use `scripts/ci`, `dev` and `action` scripts to come
from the PR. This is a security feature that prevent Pull Requests from
forks to run code on a worker that can potentially access sensitive
information - such as GITHUB_TOKEN with write access to Github
Registry.

This, however, causes troubles, because in order to test any changes
in those scripts affecting building image, you have to close your
PR from the fork and push one directly to Apache repository (there
in-line build workflows are used from "Test" workflow and those
PRs are safe to run, because only committers can push directly to the
`apache/airflow` repository branches.

This PR changes default behaviour for committer PRs. Rather than
do the same as "regular" PRs, those PRs will not use scripts from
the target branch, instead they will use scripts from the incoming
PRs of the committers. This is equally safe as running PRs from
the `apache/airflow` branch - because we have a reviewed list
of committers in our code and "selective checks" job that
checks it is run always in the context and with the code of
the "target" branch, which means that you cannot manipulate the
list of actors.

The Girhub actor is retrieved from pull requests github
context (event/pull_request/user/login) so it is impossible to
spoof it by the incoming PR.

As part of this PR - list of available selective checks and
documentation of PR labels and selective checks (wrongly named
as "static checks") were reviewed and updated.

While impolementing this, we also realised that we can simplify
branch information retrieval. The code that we had in workflow
was written a long time ago, when the target branch was always
"main" - so we had to check-out the target commit to be able to
retrieve branch_defaults.py and get the branches from there. However
it's already for quite some time that "pull request workflow"
uses "base_ref" as the base commit, which means that in `main` it
is `main` and in `v2-8-test` it is `v2-8-stable`, which means that
we already have the correct `AIRFLOW_BRANCH` and the
`DEFAULT_AIRFLOW_CONSTRAINTS_BRANCH` without having to check
the incoming commit. Which means that we do not need to override
scripts in the build-info step, we only need to check it out
temporarily to fetch the incoming PR and it's parent to see
what files are changed in the incoming PR.
@potiuk potiuk force-pushed the allow-dev-scripts-for-committer-builds branch from a3460d0 to 89acfeb Compare January 28, 2024 18:40
@potiuk
Copy link
Member Author

potiuk commented Jan 28, 2024

All right I did some testing and applied a few smal fixes (optimiized away the checkout for target-branch for commiter build as it turned out to be not needed

For anyone looking here (and I would love at least 2 reviews) for security reasons.

I've run some simulated builds from my higrys -> potiuk PRs after pushing the branch to main:

Here is an example of committer-originated build:

Example where non-committer is the actor in PR

The builds here (you can see different actors):

Screenshot 2024-01-28 at 20 03 26

@potiuk potiuk merged commit 07fd364 into apache:main Jan 28, 2024
81 checks passed
@potiuk potiuk deleted the allow-dev-scripts-for-committer-builds branch January 28, 2024 20:02
@hussein-awala
Copy link
Member

Well done!

@potiuk
Copy link
Member Author

potiuk commented Jan 29, 2024

cc: @ephraimbuddy FYI -> this is something I wil cherry-pick shortly (together with other recent refactors) to v2-8-test and do some testing if it works as expected (there are some smalll caveats in retrieving DEFAULT_BRANCH and DEFAULT_CONSTRAINTS_BRANCH that I need to test there, so don't be surprised :).

potiuk added a commit that referenced this pull request Feb 7, 2024
…#37057)

In our process, we generally do not let the scripts in the "build
images" workflow to use `scripts/ci`, `dev` and `action` scripts to come
from the PR. This is a security feature that prevent Pull Requests from
forks to run code on a worker that can potentially access sensitive
information - such as GITHUB_TOKEN with write access to Github
Registry.

This, however, causes troubles, because in order to test any changes
in those scripts affecting building image, you have to close your
PR from the fork and push one directly to Apache repository (there
in-line build workflows are used from "Test" workflow and those
PRs are safe to run, because only committers can push directly to the
`apache/airflow` repository branches.

This PR changes default behaviour for committer PRs. Rather than
do the same as "regular" PRs, those PRs will not use scripts from
the target branch, instead they will use scripts from the incoming
PRs of the committers. This is equally safe as running PRs from
the `apache/airflow` branch - because we have a reviewed list
of committers in our code and "selective checks" job that
checks it is run always in the context and with the code of
the "target" branch, which means that you cannot manipulate the
list of actors.

The Girhub actor is retrieved from pull requests github
context (event/pull_request/user/login) so it is impossible to
spoof it by the incoming PR.

As part of this PR - list of available selective checks and
documentation of PR labels and selective checks (wrongly named
as "static checks") were reviewed and updated.

While impolementing this, we also realised that we can simplify
branch information retrieval. The code that we had in workflow
was written a long time ago, when the target branch was always
"main" - so we had to check-out the target commit to be able to
retrieve branch_defaults.py and get the branches from there. However
it's already for quite some time that "pull request workflow"
uses "base_ref" as the base commit, which means that in `main` it
is `main` and in `v2-8-test` it is `v2-8-stable`, which means that
we already have the correct `AIRFLOW_BRANCH` and the
`DEFAULT_AIRFLOW_CONSTRAINTS_BRANCH` without having to check
the incoming commit. Which means that we do not need to override
scripts in the build-info step, we only need to check it out
temporarily to fetch the incoming PR and it's parent to see
what files are changed in the incoming PR.

(cherry picked from commit 07fd364)
@potiuk potiuk added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Feb 8, 2024
@potiuk potiuk added this to the Airflow 2.8.2 milestone Feb 8, 2024
ephraimbuddy pushed a commit that referenced this pull request Feb 22, 2024
…#37057)

In our process, we generally do not let the scripts in the "build
images" workflow to use `scripts/ci`, `dev` and `action` scripts to come
from the PR. This is a security feature that prevent Pull Requests from
forks to run code on a worker that can potentially access sensitive
information - such as GITHUB_TOKEN with write access to Github
Registry.

This, however, causes troubles, because in order to test any changes
in those scripts affecting building image, you have to close your
PR from the fork and push one directly to Apache repository (there
in-line build workflows are used from "Test" workflow and those
PRs are safe to run, because only committers can push directly to the
`apache/airflow` repository branches.

This PR changes default behaviour for committer PRs. Rather than
do the same as "regular" PRs, those PRs will not use scripts from
the target branch, instead they will use scripts from the incoming
PRs of the committers. This is equally safe as running PRs from
the `apache/airflow` branch - because we have a reviewed list
of committers in our code and "selective checks" job that
checks it is run always in the context and with the code of
the "target" branch, which means that you cannot manipulate the
list of actors.

The Girhub actor is retrieved from pull requests github
context (event/pull_request/user/login) so it is impossible to
spoof it by the incoming PR.

As part of this PR - list of available selective checks and
documentation of PR labels and selective checks (wrongly named
as "static checks") were reviewed and updated.

While impolementing this, we also realised that we can simplify
branch information retrieval. The code that we had in workflow
was written a long time ago, when the target branch was always
"main" - so we had to check-out the target commit to be able to
retrieve branch_defaults.py and get the branches from there. However
it's already for quite some time that "pull request workflow"
uses "base_ref" as the base commit, which means that in `main` it
is `main` and in `v2-8-test` it is `v2-8-stable`, which means that
we already have the correct `AIRFLOW_BRANCH` and the
`DEFAULT_AIRFLOW_CONSTRAINTS_BRANCH` without having to check
the incoming commit. Which means that we do not need to override
scripts in the build-info step, we only need to check it out
temporarily to fetch the incoming PR and it's parent to see
what files are changed in the incoming PR.

(cherry picked from commit 07fd364)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:dev-tools changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants