Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: update azure SP federated credentials #1442

Merged
merged 9 commits into from
May 8, 2024

Conversation

akashsinghal
Copy link
Collaborator

Description

What this PR does / why we need it:

This PR migrates the Azure SPN used for AKS e2e tests to use federated identities via OIDC

  • It removes dependence on SPN secret
  • It adds a new aks-deploy environment for jobs that require azure auth
  • It force bumps the az cli version so that it is >=2.60.0 (Note: this can be removed after the default github runner image uses a newer version of the az cli https://github.com/actions/runner-images?tab=readme-ov-file#available-images)
  • It caches access tokens for registry scope and key vault scope so that azure resource operations will not fail auth after 5 minutes.

Once PR merges to dev:

  • delete ALL secrets in SP
  • each of the maintainers can choose to create a federated credential to their own fork under environment aks-deploy

Which issue(s) this PR fixes (optional, using fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when the PR gets merged):

Fixes #1435

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Helm Chart Change (any edit/addition/update that is necessary for changes merged to the main branch)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Please also list any relevant details for your test configuration

  • Test A
  • Test B

Checklist:

  • Does the affected code have corresponding tests?
  • Are the changes documented, not just with inline documentation, but also with conceptual documentation such as an overview of a new feature, or task-based documentation like a tutorial? Consider if this change should be announced on your project blog.
  • Does this introduce breaking changes that would require an announcement or bumping the major version?
  • Do all new files have appropriate license header?

Post Merge Requirements

  • MAINTAINERS: manually trigger the "Publish Package" workflow after merging any PR that indicates Helm Chart Change

Copy link

codecov bot commented May 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.16%. Comparing base (1bd347c) to head (5f9861c).
Report is 14 commits behind head on dev.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1442      +/-   ##
==========================================
+ Coverage   66.76%   68.16%   +1.39%     
==========================================
  Files         116      119       +3     
  Lines        6030     6134     +104     
==========================================
+ Hits         4026     4181     +155     
+ Misses       1620     1559      -61     
- Partials      384      394      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -69,6 +69,7 @@ jobs:
permissions:
id-token: write
contents: read
environment: aks-deploy
Copy link
Collaborator

@susanshi susanshi May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was mentioned during the Meeting (but i forgot sorry) , could you remind me how the environment property is used?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without the env, will the cached token be accessible?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Environment are a github construct to represent a target deployment type. So testing, dev, prod, etc. In the environment, you can setup workflows, secrets, and variables that target that environment. So in our case, we would define a new environement azure-test and this environment contains all the necessary creds for our azure e2e tests. Read more about environments here.

In our workflows, we use Azure/login action to auth. The federated credential accepts environment as one of the scopes for federated credential. This means for each unique <org>/ratify combination, we could setup one federated credential that will target an environment azure-test. This means that for deislabs we'd have one cred and for each person's fork we would have another. There are other scopes supported for federated credentials like branches however it becomes cumbersome if we want to manually trigger tests from a specific feature branch. This is why environment is helpful.

Now there is some downside (potentially) for using environment. Now, the new environment will show up on the main github repository name.
image

And what's unfortunate is that for each aks run, it generates 6 entries under Deployments and if there are failures in the latest one (this seems arbitrary what is latest of the matrix), then the deployment shows as failed.

image

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in conclusion we have 2 options for federated credential scope.

  1. Use branch scope. This means it'll be very cumbersome to trigger AKS tests directly from a feature branch. We would need to settle on the dev branch to run all tests on your fork.
  2. Use environments with the failure reporting following the screenshots above.

Copy link
Collaborator

@susanshi susanshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. thanks for the update! just a question , but just for learning purposes. (FYI, Binbin is investigating a aks e2e test failure)

@akashsinghal
Copy link
Collaborator Author

@susanshi Here are some findings on current state of the aks test cleanup:

  1. The aks-test-cleanup job is currently set to need the aks-test job in order to run. Currently, the needs trigger by definition requires success of the job it needs. In our case, not all AKS tests usually succeed so we want a trigger that will only run after the AKS test if it is triggered BUT regardless if the AKS tests succeeded or failed. I looked at the docs and it makes it seem like it is possible: https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#example-not-requiring-successful-dependent-jobs but this doesn't seem to work. I tried and with always() the cleanup step always runs irrespective of if AKS test ran or not which is not what we want. I also tried to add a conditional if that is the same as the AKS test trigger but keep the needs. That also does not seem to work.
  2. The current AKS test has a cleanup() function that deletes the resource group upon a failure/exit. That seems to work however it may not always work? I do sometimes see that there are dangling resource groups.

I'm thinking maybe this cleanup optimization is out of scope for this PR?

Copy link
Collaborator

@susanshi susanshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

@akashsinghal
Copy link
Collaborator Author

akashsinghal commented May 7, 2024

verified on local fork at least one test passes:
image

@binbin-li
Copy link
Collaborator

@susanshi Here are some findings on current state of the aks test cleanup:

  1. The aks-test-cleanup job is currently set to need the aks-test job in order to run. Currently, the needs trigger by definition requires success of the job it needs. In our case, not all AKS tests usually succeed so we want a trigger that will only run after the AKS test if it is triggered BUT regardless if the AKS tests succeeded or failed. I looked at the docs and it makes it seem like it is possible: https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#example-not-requiring-successful-dependent-jobs but this doesn't seem to work. I tried and with always() the cleanup step always runs irrespective of if AKS test ran or not which is not what we want. I also tried to add a conditional if that is the same as the AKS test trigger but keep the needs. That also does not seem to work.
  2. The current AKS test has a cleanup() function that deletes the resource group upon a failure/exit. That seems to work however it may not always work? I do sometimes see that there are dangling resource groups.

I'm thinking maybe this cleanup optimization is out of scope for this PR?

Thanks for the investigation. We added the cleanup job because the cleanup() function could be skipped. I agree it's not in scope of this PR, we can fix it later.

@akashsinghal akashsinghal merged commit 3639957 into ratify-project:dev May 8, 2024
17 checks passed
akashsinghal added a commit to akashsinghal/ratify that referenced this pull request Sep 13, 2024
binbin-li pushed a commit to binbin-li/ratify that referenced this pull request Sep 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate Azure e2e tests to use federated credentials with service principal
3 participants