Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-17621: [CI] Audit workflows #14155

Merged
merged 17 commits into from
Sep 22, 2022
Merged

ARROW-17621: [CI] Audit workflows #14155

merged 17 commits into from
Sep 22, 2022

Conversation

assignUser
Copy link
Member

@assignUser assignUser commented Sep 16, 2022

In this PR I:

  • reduced the scope of the automatically generated GITHUB_TOKEN as much as possible (technically contents:none would be the minimum but it is a bit unintuitive as it does not prevent checkout of public repos, I set contents:read in those cases)
  • update all actions used to the newest version (checking for breaking changes, only case is actions/github-script which remains on v3 for that reason -> follow up)
  • move the creation of envvars containing secrets as close to their usage as possible (-> the step they are used in), this duplicates them in workflows with multiple jobs but is safer.

I have opted NOT to pin the different actions by SHA as recommended in some places as the con outweigh the possible protection in my opinion. The main danger with pinning tags or branches is that a malicious actor changes the commit the tag points to and exfiltrates secrets (either repository secrets or in case of private repos code/ip) or takes some other damaging action like deleting branches, rewriting history etc..

We only ever pass actions the GITHUB_TOKEN which is ephemeral (deleted after workflow is finished) and scope limited so exfiltration of that token would worst case allow an attacker to create/delete labels and pr comments as well as modify PR branches (if the submitter activated the checkbox for maintainer access). Actions can not access secrets without the workflow author explicitly passing them as input (envvars might reveal them though)

The Apache Org limits the actions that can be used in repos, so we only use well known allow-listed actions, while this does of course not prevent malicious actions it reduces the risk substantially.

Pinning SHAs would mitigate these risks (provided the action at that sha was audited...) but would also necessitate regularly checking + re-auditing the actions as to not miss security patches in these actions (e.g. here). IMHO that would be a considerable effort (+ needing real expertise in typescript/node to spot any malicious additions outside of blatant secret exfiltration or nuking) resulting in a small gain.

@assignUser
Copy link
Member Author

assignUser commented Sep 16, 2022

It looks like Dependabot can also scan & update actions used in workflows: https://docs.github.com/en/code-security/dependabot/working-with-dependabot/keeping-your-actions-up-to-date-with-dependabot

This would remove most of the workload of updating the actions, while this would still not protect against properly obfuscated attacks combined with the other mitigations in this PR it might be a good compromise. @amol- @raulcd thoughts?

@github-actions
Copy link

@github-actions
Copy link

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@assignUser assignUser marked this pull request as ready for review September 16, 2022 18:16
@pitrou pitrou requested a review from kou September 20, 2022 09:08
@amol-
Copy link
Member

amol- commented Sep 20, 2022

It looks like Dependabot can also scan & update actions used in workflows

Dependabot makes a PR to suggest the update right?
My concern would only be to contain the risk for sudden breaking changes, but if those updates are conducted in a dedicated PR, then the PR would prove that the CI still passes before getting merged.

@assignUser
Copy link
Member Author

Yes Dependabot opens a PR, I think we can even configure a PR title format via a config yaml to match our needs.

@assignUser
Copy link
Member Author

(I'll create a follow up jira for dependabot)

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you rebase on the master to resolve known CI failures?

.github/workflows/dev_pr.yml Outdated Show resolved Hide resolved
.github/workflows/go.yml Outdated Show resolved Hide resolved
.github/workflows/java.yml Outdated Show resolved Hide resolved
.github/workflows/r.yml Outdated Show resolved Hide resolved
.github/workflows/cpp.yml Show resolved Hide resolved
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@kou kou merged commit 36928ec into apache:master Sep 22, 2022
uses: actions/[email protected]
(github.event.action == 'opened' ||
github.event.action == 'synchronize')
uses: actions/labeler@4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to use @v4 here...

kou added a commit to kou/arrow that referenced this pull request Sep 23, 2022
This is a follow-up of ARROW-17621 / apache#14155.
kou added a commit that referenced this pull request Sep 23, 2022
This is a follow-up of ARROW-17621 / #14155.

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
@ursabot
Copy link

ursabot commented Sep 23, 2022

Benchmark runs are scheduled for baseline = 44ae852 and contender = 36928ec. 36928ec is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️1.02% ⬆️0.07%] test-mac-arm
[Failed ⬇️0.82% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.25% ⬆️0.04%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 36928ec3 ec2-t3-xlarge-us-east-2
[Failed] 36928ec3 test-mac-arm
[Failed] 36928ec3 ursa-i9-9960x
[Finished] 36928ec3 ursa-thinkcentre-m75q
[Finished] 44ae8523 ec2-t3-xlarge-us-east-2
[Failed] 44ae8523 test-mac-arm
[Failed] 44ae8523 ursa-i9-9960x
[Finished] 44ae8523 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

zagto pushed a commit to zagto/arrow that referenced this pull request Oct 7, 2022
In this PR I:
- reduced the scope of the automatically generated `GITHUB_TOKEN` as much as possible (technically `contents:none` would  be the minimum but it is a bit unintuitive as it does not prevent checkout of public repos, I set `contents:read` in those cases)  
- update all actions used to the newest version (checking for breaking changes, only case is actions/github-script which remains on v3 for that reason -> follow up)
- move the creation of envvars containing secrets as close to their usage as possible (-> the step they are used in), this duplicates them in workflows with multiple jobs but is safer.

I have opted **NOT** to pin the different actions by SHA as recommended in some places as the con outweigh the possible protection in my opinion. The main danger with pinning tags or branches is that a malicious actor changes the commit the tag points to and exfiltrates secrets (either repository secrets or in case of private repos code/ip) or takes some other damaging action like deleting branches, rewriting history etc..

We only ever pass actions the `GITHUB_TOKEN` which is ephemeral (deleted after workflow is finished) and scope limited so exfiltration of that token would worst case allow an attacker to create/delete labels and pr comments as well as modify PR branches (if the submitter activated the checkbox for maintainer access). Actions can not access secrets without the workflow author explicitly passing them as input (envvars might reveal them though)

The Apache Org limits the actions that can be used in repos, so we only use well known allow-listed actions, while this does of course not prevent malicious actions it reduces the risk substantially.

Pinning SHAs would mitigate these risks (provided the action at that sha was audited...) but would also necessitate regularly checking + re-auditing the actions as to not miss security patches in these actions (e.g. [here](https://github.com/matlab-actions/setup-matlab/releases/tag/v1.1.1)). IMHO that would be a considerable effort (+ needing real expertise in typescript/node to spot any malicious additions outside of blatant secret exfiltration or nuking) resulting in a small gain.


Lead-authored-by: Jacob Wujciak-Jens <[email protected]>
Co-authored-by: assignUser <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
zagto pushed a commit to zagto/arrow that referenced this pull request Oct 7, 2022
This is a follow-up of ARROW-17621 / apache#14155.

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
fatemehp pushed a commit to fatemehp/arrow that referenced this pull request Oct 17, 2022
In this PR I:
- reduced the scope of the automatically generated `GITHUB_TOKEN` as much as possible (technically `contents:none` would  be the minimum but it is a bit unintuitive as it does not prevent checkout of public repos, I set `contents:read` in those cases)  
- update all actions used to the newest version (checking for breaking changes, only case is actions/github-script which remains on v3 for that reason -> follow up)
- move the creation of envvars containing secrets as close to their usage as possible (-> the step they are used in), this duplicates them in workflows with multiple jobs but is safer.

I have opted **NOT** to pin the different actions by SHA as recommended in some places as the con outweigh the possible protection in my opinion. The main danger with pinning tags or branches is that a malicious actor changes the commit the tag points to and exfiltrates secrets (either repository secrets or in case of private repos code/ip) or takes some other damaging action like deleting branches, rewriting history etc..

We only ever pass actions the `GITHUB_TOKEN` which is ephemeral (deleted after workflow is finished) and scope limited so exfiltration of that token would worst case allow an attacker to create/delete labels and pr comments as well as modify PR branches (if the submitter activated the checkbox for maintainer access). Actions can not access secrets without the workflow author explicitly passing them as input (envvars might reveal them though)

The Apache Org limits the actions that can be used in repos, so we only use well known allow-listed actions, while this does of course not prevent malicious actions it reduces the risk substantially.

Pinning SHAs would mitigate these risks (provided the action at that sha was audited...) but would also necessitate regularly checking + re-auditing the actions as to not miss security patches in these actions (e.g. [here](https://github.com/matlab-actions/setup-matlab/releases/tag/v1.1.1)). IMHO that would be a considerable effort (+ needing real expertise in typescript/node to spot any malicious additions outside of blatant secret exfiltration or nuking) resulting in a small gain.


Lead-authored-by: Jacob Wujciak-Jens <[email protected]>
Co-authored-by: assignUser <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
fatemehp pushed a commit to fatemehp/arrow that referenced this pull request Oct 17, 2022
This is a follow-up of ARROW-17621 / apache#14155.

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants