ci: update azure SP federated credentials #1442

akashsinghal · 2024-05-06T20:20:21Z

Description

What this PR does / why we need it:

This PR migrates the Azure SPN used for AKS e2e tests to use federated identities via OIDC

It removes dependence on SPN secret
It adds a new aks-deploy environment for jobs that require azure auth
It force bumps the az cli version so that it is >=2.60.0 (Note: this can be removed after the default github runner image uses a newer version of the az cli https://github.com/actions/runner-images?tab=readme-ov-file#available-images)
It caches access tokens for registry scope and key vault scope so that azure resource operations will not fail auth after 5 minutes.

Once PR merges to dev:

delete ALL secrets in SP
each of the maintainers can choose to create a federated credential to their own fork under environment aks-deploy

Which issue(s) this PR fixes (optional, using `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when the PR gets merged):

Fixes #1435

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Helm Chart Change (any edit/addition/update that is necessary for changes merged to the main branch)
This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Please also list any relevant details for your test configuration

Test A
Test B

Checklist:

Does the affected code have corresponding tests?
Are the changes documented, not just with inline documentation, but also with conceptual documentation such as an overview of a new feature, or task-based documentation like a tutorial? Consider if this change should be announced on your project blog.
Does this introduce breaking changes that would require an announcement or bumping the major version?
Do all new files have appropriate license header?

Post Merge Requirements

MAINTAINERS: manually trigger the "Publish Package" workflow after merging any PR that indicates Helm Chart Change

codecov · 2024-05-06T20:25:26Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.16%. Comparing base (1bd347c) to head (5f9861c).
Report is 14 commits behind head on dev.

Additional details and impacted files

@@            Coverage Diff             @@
##              dev    #1442      +/-   ##
==========================================
+ Coverage   66.76%   68.16%   +1.39%     
==========================================
  Files         116      119       +3     
  Lines        6030     6134     +104     
==========================================
+ Hits         4026     4181     +155     
+ Misses       1620     1559      -61     
- Partials      384      394      +10

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

susanshi · 2024-05-07T08:15:11Z

.github/workflows/build-pr.yml

@@ -69,6 +69,7 @@ jobs:
    permissions:
      id-token: write
      contents: read
+    environment: aks-deploy


This was mentioned during the Meeting (but i forgot sorry) , could you remind me how the environment property is used?

without the env, will the cached token be accessible?

Sure. Environment are a github construct to represent a target deployment type. So testing, dev, prod, etc. In the environment, you can setup workflows, secrets, and variables that target that environment. So in our case, we would define a new environement azure-test and this environment contains all the necessary creds for our azure e2e tests. Read more about environments here.

In our workflows, we use Azure/login action to auth. The federated credential accepts environment as one of the scopes for federated credential. This means for each unique <org>/ratify combination, we could setup one federated credential that will target an environment azure-test. This means that for deislabs we'd have one cred and for each person's fork we would have another. There are other scopes supported for federated credentials like branches however it becomes cumbersome if we want to manually trigger tests from a specific feature branch. This is why environment is helpful.

Now there is some downside (potentially) for using environment. Now, the new environment will show up on the main github repository name.

And what's unfortunate is that for each aks run, it generates 6 entries under Deployments and if there are failures in the latest one (this seems arbitrary what is latest of the matrix), then the deployment shows as failed.

So in conclusion we have 2 options for federated credential scope.

Use branch scope. This means it'll be very cumbersome to trigger AKS tests directly from a feature branch. We would need to settle on the dev branch to run all tests on your fork.

Use environments with the failure reporting following the screenshots above.

susanshi

LGTM. thanks for the update! just a question , but just for learning purposes. (FYI, Binbin is investigating a aks e2e test failure)

akashsinghal · 2024-05-07T18:10:46Z

@susanshi Here are some findings on current state of the aks test cleanup:

The aks-test-cleanup job is currently set to need the aks-test job in order to run. Currently, the needs trigger by definition requires success of the job it needs. In our case, not all AKS tests usually succeed so we want a trigger that will only run after the AKS test if it is triggered BUT regardless if the AKS tests succeeded or failed. I looked at the docs and it makes it seem like it is possible: https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#example-not-requiring-successful-dependent-jobs but this doesn't seem to work. I tried and with always() the cleanup step always runs irrespective of if AKS test ran or not which is not what we want. I also tried to add a conditional if that is the same as the AKS test trigger but keep the needs. That also does not seem to work.
The current AKS test has a cleanup() function that deletes the resource group upon a failure/exit. That seems to work however it may not always work? I do sometimes see that there are dangling resource groups.

I'm thinking maybe this cleanup optimization is out of scope for this PR?

susanshi

Lgtm

akashsinghal · 2024-05-07T23:32:23Z

verified on local fork at least one test passes:

binbin-li · 2024-05-08T01:22:07Z

@susanshi Here are some findings on current state of the aks test cleanup:

The aks-test-cleanup job is currently set to need the aks-test job in order to run. Currently, the needs trigger by definition requires success of the job it needs. In our case, not all AKS tests usually succeed so we want a trigger that will only run after the AKS test if it is triggered BUT regardless if the AKS tests succeeded or failed. I looked at the docs and it makes it seem like it is possible: https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#example-not-requiring-successful-dependent-jobs but this doesn't seem to work. I tried and with always() the cleanup step always runs irrespective of if AKS test ran or not which is not what we want. I also tried to add a conditional if that is the same as the AKS test trigger but keep the needs. That also does not seem to work.

The current AKS test has a cleanup() function that deletes the resource group upon a failure/exit. That seems to work however it may not always work? I do sometimes see that there are dangling resource groups.

I'm thinking maybe this cleanup optimization is out of scope for this PR?

Thanks for the investigation. We added the cleanup job because the cleanup() function could be skipped. I agree it's not in scope of this PR, we can fix it later.

Signed-off-by: akashsinghal <[email protected]>

akashsinghal added 4 commits May 6, 2024 18:25

ci: update Azure SP federated credentials

d417d10

fix token scope

d7b5bc4

upgrade az cli

0608d04

fix

096f28e

akashsinghal requested review from binbin-li, jimmyraywv, luisdlp, susanshi and toddysm as code owners May 6, 2024 20:20

susanshi reviewed May 7, 2024

View reviewed changes

update clean up trigger

d869324

akashsinghal temporarily deployed to aks-deploy May 7, 2024 17:16 — with GitHub Actions Inactive

akashsinghal added 3 commits May 7, 2024 17:19

update trigger

c2a4647

Merge branch 'dev' into akashsinghal/azurespn

fa46c6a

updates

7e64999

susanshi approved these changes May 7, 2024

View reviewed changes

binbin-li approved these changes May 8, 2024

View reviewed changes

Merge branch 'dev' into akashsinghal/azurespn

5f9861c

akashsinghal merged commit 3639957 into ratify-project:dev May 8, 2024
17 checks passed

akashsinghal added a commit to akashsinghal/ratify that referenced this pull request Sep 13, 2024

ci: update azure SP federated credentials (ratify-project#1442)

fae4064

Signed-off-by: akashsinghal <[email protected]>

binbin-li pushed a commit to binbin-li/ratify that referenced this pull request Sep 14, 2024

ci: update azure SP federated credentials (ratify-project#1442)

4627415

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: update azure SP federated credentials #1442

ci: update azure SP federated credentials #1442

akashsinghal commented May 6, 2024

codecov bot commented May 6, 2024 •

edited

Loading

susanshi May 7, 2024 •

edited

Loading

susanshi May 7, 2024

akashsinghal May 7, 2024

akashsinghal May 7, 2024

susanshi left a comment

akashsinghal commented May 7, 2024

susanshi left a comment

akashsinghal commented May 7, 2024 •

edited

Loading

binbin-li commented May 8, 2024

ci: update azure SP federated credentials #1442

ci: update azure SP federated credentials #1442

Conversation

akashsinghal commented May 6, 2024

Description

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, using fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when the PR gets merged):

Type of change

How Has This Been Tested?

Checklist:

Post Merge Requirements

codecov bot commented May 6, 2024 • edited Loading

Codecov Report

susanshi May 7, 2024 • edited Loading

Choose a reason for hiding this comment

susanshi May 7, 2024

Choose a reason for hiding this comment

akashsinghal May 7, 2024

Choose a reason for hiding this comment

akashsinghal May 7, 2024

Choose a reason for hiding this comment

susanshi left a comment

Choose a reason for hiding this comment

akashsinghal commented May 7, 2024

susanshi left a comment

Choose a reason for hiding this comment

akashsinghal commented May 7, 2024 • edited Loading

binbin-li commented May 8, 2024

Which issue(s) this PR fixes (optional, using `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when the PR gets merged):

codecov bot commented May 6, 2024 •

edited

Loading

susanshi May 7, 2024 •

edited

Loading

akashsinghal commented May 7, 2024 •

edited

Loading