Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1.6.0 versions Pre login Fails on self hosted runners #403

Closed
kristeey opened this issue Jan 16, 2024 · 37 comments · Fixed by #407
Closed

1.6.0 versions Pre login Fails on self hosted runners #403

kristeey opened this issue Jan 16, 2024 · 37 comments · Fixed by #407
Assignees
Labels
bug Something isn't working

Comments

@kristeey
Copy link

Hi,

Description:
The 1.6.0 version introduces an error for people using self hosted runners installing az cli in a step before using azure/login action in a following step inside the same GHA job. Hence, the new release will do a pre cleanup which uses az cli before the cli is installed -> the job failes.

Workaround:
Downgrade to 1.5.1.

Possible solutions

  1. Make pre/post cleanup optional.
  2. make az cli install step a part of the action itself.
@kristeey kristeey added the need-to-triage Requires investigation label Jan 16, 2024
@MoChilia MoChilia self-assigned this Jan 16, 2024
@MoChilia
Copy link
Member

For anyone affected by this issue, please first pin the version to v1.5.1. We will prepare a fix by skipping pre-cleanup if az is not installed. But for security concerns, may I ask about your usage scenario? Are you running azure/login in a container? Is it indeed a pure environment with no residual Azure account on your runner?

MoChilia added a commit that referenced this issue Jan 16, 2024
@kristeey
Copy link
Author

Our workers are ephemeral and starts up in k8s pods upon request. I.e. The runtime environment are clean Linux containers that are killed when the pipeline finishes.

@VincentVerweij
Copy link

VincentVerweij commented Jan 16, 2024

Same issue here. We also use empty Linux containers, on which in one of the steps of the workflow we install the Azure CLI. But now, this pre step gives an error because it is not available yet at time of executing the pre step. Basically messing up our deployment process.

Wouldn't it be an idea of bumping the major number in case of such change, what is considered as breaking?

@mattloretitsch-od
Copy link

For anyone affected by this issue, please first pin the version to v1.5.1. We will prepare a fix by skipping pre-cleanup if az is not installed. But for security concerns, may I ask about your usage scenario? Are you running azure/login in a container? Is it indeed a pure environment with no residual Azure account on your runner?

Sure I can fill in that blank. We are using the Github Runner ARC controller. It spins fresh runners in k8s with every run. The image is very basic and has nothing pre-installed on it.

@nuryupin-kr
Copy link

nuryupin-kr commented Jan 16, 2024

We install az cli in a step right before calling azure/login step. Could you guys point your v1 tag to same commit as 1.5.1?

@ohbriansung
Copy link

Please consider rolling back this change and then creating a v2.0.0 since this is a breaking change for anyone running self-hosted runner without pre-installing the az cli.

@YanaXu YanaXu added bug Something isn't working and removed need-to-triage Requires investigation labels Jan 17, 2024
@YanaXu
Copy link
Collaborator

YanaXu commented Jan 17, 2024

Hi all,

Sorry for introducing this issue.
First, please pin to v1.5.1 to workaround this issue.

What we'll do next:
Release a new fix in v1.6.1. If it's OK, align v1 to v1.6.1.

What we want to clarify:

  • v1.6.0 fixes a security issue. It cleans login cache, which is important for regular use scenarios, especially for self-hosted runners on VMs.
  • We do understand it may change the workflow since we add pre and post steps. That's why we create an announcement 📢 Azure/login v1.6.0 Pre-Release Announcement #399 to collect your feedbacks. We'd like to collect your advice on this: how to notify you that there will be a new version of this action? Please leave a comment if you have a better suggestion.
  • We won't introduce breaking changes in v1. If we do find it's a breaking change, we'll fix it. No worries.
  • We do not expect so many self-hosted runner use cases. To be honest, there is still a long way to well support self-hosted runners. There are too many different use scenarios for self-hosted runners. Actually, we're glad to see you provide your use scenarios here and we definitely will add them to our support paths.

@YanaXu YanaXu self-assigned this Jan 17, 2024
@JasonRSeequent
Copy link

JasonRSeequent commented Jan 17, 2024

We have also run into warnings with v1.6.0, where we are calling azure/login from a composite action.
We receive this warning to say "pre" is not supported.
Warning: pre execution is not supported for local action from ./.github/actions/my-action

@YanaXu
Copy link
Collaborator

YanaXu commented Jan 17, 2024

@JasonRSeequent , it seems a different issue. Could you create a new issue and provide your workflow file and debug log? Thanks.

@nickwb
Copy link

nickwb commented Jan 17, 2024

We also have workflows that are expecting the managed identity of the VM running the self-hosted runner to be available from the beginning of the workflow run. They were broken with 1.6.0 due to the new pre-step.

It's simple enough for us to make a change, and I understand why you would want to make the change, but the breaking change to the v1 tag is not ideal. We have critical workflows that are impacted.

Edit: As an addendum, we use ephemeral Github Runners so logins leaking between workflows are not actually an issue for us.

@YanaXu
Copy link
Collaborator

YanaXu commented Jan 17, 2024

We also have workflows that are expecting the managed identity of the VM running the self-hosted runner to be available from the beginning of the workflow run. They were broken with 1.6.0 due to the new pre-step.

It's simple enough for us to make a change, and I understand why you would want to make the change, but the breaking change to the v1 tag is not ideal. We have critical workflows that are impacted.

Edit: As an addendum, we use ephemeral Github Runners so logins leaking between workflows are not actually an issue for us.

@nickwb , I don't quite get your problem. If you're using managed identity, that means you're using a VM from Azure. If your job in workflow file works before, it means you install az in it or before the job. Do you uninstall az in the job too?

@nickwb
Copy link

nickwb commented Jan 17, 2024

@YanaXu - The issue is our workflows did not previously contain a step for azure/login@v1 with auth-type: IDENTITY. For us, that login was already done "automatically" by our VM image before running each workflow. However, now that azure/login has added a pre-step, az account clear is being called and our "automatic login" is no longer available.

We don't install or uninstall the Azure CLI, it's baked in to our build runner VM image.

@YanaXu
Copy link
Collaborator

YanaXu commented Jan 17, 2024

@nickwb, correct me if I'm wrong. Current issue #403 is describing a use scenario like this: the workflow file runs on a fresh clean image (not a VM); the image is based on linux and without az installed; the workflow file contains one job; the job will install az and call azure login action afterwards.
Is this your use scenario? If it's not, please provide the details of yours.

@MoChilia
Copy link
Member

Hi @kristeey, @VincentVerweij, @mattloretitsch-od, @nuryupin-kr, @ohbriansung, @nickwb, we have created a preview test tag v1.6.1. Could you kindly try azure/[email protected] to verify if the fix works for you?

@kristeey
Copy link
Author

@MoChilia Looks like v1.6.1 did solve my issue 👍 Just out of curiosity... Why cannot installation of azure cli (if not installed) be integrated with the action it self?

@Topper1987
Copy link

Topper1987 commented Jan 18, 2024

Hello,
v1.6.1 sadly does not solve issue for us. We are passing creds arguments to action as result of previous step which loads credentials (JSON) from vault, so those credentials are not known at point of time of that pre-step which then fails (I guess)..

      - name: Azure Login
        uses: azure/[email protected]
        with:
          creds: ${{ toJson(fromJson(steps.prepare.outputs.variables)['CREDS']) }}

creds looks like (example from input at v1.5.1 where it normally worked)

creds: {
    "clientId": "***",
    "clientSecret": "***",
    "subscriptionId": "***",
    "tenantId": "***"
    }

with v1.6.0/v1.6.1
new "Pre Azure login" then fails with:

The template is not valid. $ORGNAME/$REPONAME/.github/workflows/workflowname.yml@main (Line: 287, Col: 18): Error reading JToken from JsonReader. Path '', line 0, position 0.

that position equals: creds:
${{ toJson(fromJson(steps.prepare.outputs.variables)['CREDS']) }}

@MoChilia
Copy link
Member

@Topper1987, your issue is not related to #403, please check the solution in #404 (comment).

@MoChilia
Copy link
Member

@kristeey, you've raised a good point. But GitHub action has a complex ecosystem, the integration of azure-cli and azure/login is not straight-forward. We need assess this thoroughly and explore its feasibility.

@Topper1987
Copy link

@Topper1987, your issue is not related to #403, please check the solution in #404 (comment).

Thank you! You are right, I missed that. It solved our issue.

@sblackstone
Copy link

Hi @kristeey, @VincentVerweij, @mattloretitsch-od, @nuryupin-kr, @ohbriansung, @nickwb, we have created a preview test tag v1.6.1. Could you kindly try azure/[email protected] to verify if the fix works for you?

1.6.1 resolves my issue, will v1 be pointed to v1.6.1?

@martin-traverse
Copy link

We have a multi-cloud project with a reusable workflow for integration that we use for all 3 cloud targets. The Azure auth step is conditional and only runs when the matrix build target is Azure. This change doesn't just break our Azure integration, it breaks GCP and AWS as well because the pre-step is still generated even when the conditional Azure auth step doesn't run.

We have upgraded to 1.6.1 which does let the build pass but is still generating warnings, and again those warnings appear for all three cloud targets not just Azure. It would be great if there was a way to disable the pre/post steps, we don't need them for our setup, the build container is destroyed anyway. At the very least we should be able to turn them off with the same condition we use to control the login step itself.

Also +1 for making v1 point to v1.6.1, currently using @v1 is still broken for us.

https://github.com/finos/tracdap/blob/main/.github/workflows/integration-cloud.yaml

@MoChilia
Copy link
Member

Hi @kristeey, @VincentVerweij, @mattloretitsch-od, @nuryupin-kr, @ohbriansung, @nickwb, we have created a preview test tag v1.6.1. Could you kindly try azure/[email protected] to verify if the fix works for you?

1.6.1 resolves my issue, will v1 be pointed to v1.6.1?

@sblackstone, yes, v1 will be pointed to v1.6.1 soon.

@MoChilia
Copy link
Member

We have a multi-cloud project with a reusable workflow for integration that we use for all 3 cloud targets. The Azure auth step is conditional and only runs when the matrix build target is Azure. This change doesn't just break our Azure integration, it breaks GCP and AWS as well because the pre-step is still generated even when the conditional Azure auth step doesn't run.

We have upgraded to 1.6.1 which does let the build pass but is still generating warnings, and again those warnings appear for all three cloud targets not just Azure. It would be great if there was a way to disable the pre/post steps, we don't need them for our setup, the build container is destroyed anyway. At the very least we should be able to turn them off with the same condition we use to control the login step itself.

Also +1 for making v1 point to v1.6.1, currently using @v1 is still broken for us.

https://github.com/finos/tracdap/blob/main/.github/workflows/integration-cloud.yaml

@martin-traverse, v1 will be pointed to v1.6.1 soon. Thanks for sharing your use case with us. It's really helpful. We will consider how to optimize the pre and post steps.

@VincentVerweij
Copy link

Hi @kristeey, @VincentVerweij, @mattloretitsch-od, @nuryupin-kr, @ohbriansung, @nickwb, we have created a preview test tag v1.6.1. Could you kindly try azure/[email protected] to verify if the fix works for you?

The team and I just implemented the v1.6.1 tag and it allows us to continue our deployments, thanks for that. As expected the following warning is given in the Pre step: Warning: Login cleanup failed with Error: Unable to locate executable file: az. Please verify either the file path exists or the file can be found within a directory specified by the PATH environment variable. Also check the file mode to verify the file is executable.. Cleanup will be skipped.

@roschart
Copy link

Hi @kristeey, @VincentVerweij, @mattloretitsch-od, @nuryupin-kr, @ohbriansung, @nickwb, we have created a preview test tag v1.6.1. Could you kindly try azure/[email protected] to verify if the fix works for you?

We have tested version v1.6.1 and it works well in our environment. It would be ideal if you could point version v1 to v1.6.1.

@MoChilia
Copy link
Member

Hi all, v1 has been updated to v1.6.1. Thank you again for bringing this issue to our attention, sharing your use cases, and helping us test!
cc @kristeey, @ohbriansung, @sblackstone, @martin-traverse, @VincentVerweij, @roschart.

@bb-froggy
Copy link
Contributor

We are not using a self-hosted runner, but our workflow was also affected. It is using the container mcr.microsoft.com/powershell from Docker Hub. I wonder if we are doing something stupid, since all others here seem to be using self-hosted runners, so other people with GitHub-hosted runners seem not to be affected.

It is important for our use case to install az on our own, because the workflow realizes a test matrix and we install either the current or the preview version of az. So, if login installed az on its own as in @kristeey's question, it would break our test.

@MoChilia
Copy link
Member

Hi @bb-froggy, your scenario is quite similar with the one in this issue. You are all using azure/login in a clean container. We have just fixed the bug in v1.6.1, now aligned with aligned with v1. I think there should be no block to your test? And I've observed that your workflow has successfully passed. Is there still any obstacle to your test?

@bb-froggy
Copy link
Contributor

Yes, v1.6.1 fixed the problem for us, too, and there are no obstacles to the test. Sorry for the confusion and thanks for the fix! I was just worried about two things:

  1. I was just surprised that all others affected have the self-hosted runners and nobody used a GitHub-hosted runner with a clean container. I see the similarity and why both are affected, but I had assumed that my use case was more common than theirs. It seems not to be so and I wondered whether we were doing something very strange and we should do it differently. If you told me that we shouldn't have been affected by the bug in the first place, because we were holding it wrong, then I would be glad about that feedback.
  2. I was concerned that you would solve the technical difficulties in installing az in login and found it a very good idea. I wanted to add that it wouldn't be good for me, and maybe there are others who also wouldn't like it. So if you ever did this, I hope you keep my use case in mind and for example add an option to skip it.

@MoChilia
Copy link
Member

@bb-froggy, your use case is not strange, and I think it's reasonable. We are now exploring the use cases of azure/login within GitHub Actions. There are so many scenarios! And from my perspective, these scenarios are also fair. We aspire to broaden our support to encompass these cases more stably. Thank you for bringing your use case to our attention!

Additionally, installing az by azure/login is not a straightforward one and there are numerous factors to consider, just as your mentioned, many users wouldn't like to do this. We will keep your use case in mind and continue collecting more use cases to try our best to avoid breaking your scenarios while improve the user experience of azure/login.

@MoChilia
Copy link
Member

Closing the issue for now. Feel free to reopen it if you have any further question.

@acrogenesis
Copy link

I have this problem with v2 Login failed with Error: Unable to locate executable file: az..
Adding az to the self-hosted github runner fixed it.

CleanShot 2024-04-21 at 14 07 47@2x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.