Docker auth.config not working anymore in 1.9.0 #24181

Closed
dani opened this issue Oct 11, 2024 · 19 comments · Fixed by #24215

Comments

@dani

dani commented Oct 11, 2024

Nomad version

Nomad v1.9.0
BuildDate 2024-10-10T07:13:43Z
Revision 7ad36851ec02f875e0814775ecf1df0229f0a615

Operating system and Environment details

AlmaLinux 9, using pre-built amd64 bin

Issue

With the docker plugin configured with an auth config like

plugin "docker" {
  config {
    auth {
      config = "/opt/nomad/docker/auth.json"
    }
[...]

With /opt/nomad/docker/auth.json looking like

{
  "auths": {
    "oci.ehtrace.local": {
      "auth": "XXXXXXX"
    }
  }
}

This doesn't work anymore: the allocation fails to start with

Failed to pull `oci.ehtrace.local/kafka-connect:3.8.0-45.0.0-SNAPSHOT-1`: Error response from daemon: Head "https://oci.ehtrace.local/v2/kafka-connect/manifests/3.8.0-45.0.0-SNAPSHOT-1": no basic auth credentials

The exact same setup was working fine in 1.8.4 (and previous versions)
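
One way to rule out a problem with the file itself is to confirm that it contains an entry for the registry host taken from the image name. The following is a minimal Python sketch, not part of Nomad: the helper name `credentials_for` is hypothetical, and it assumes the `auth` value is the base64 encoding of `username:password`, which is the format `docker login` writes when no credential helper is in use.

```python
import base64
import json


def credentials_for(config_text, registry):
    """Return (username, password) for registry, or None if no usable entry exists.

    Hypothetical helper: assumes the "auth" field is base64("username:password").
    """
    auths = json.loads(config_text).get("auths", {})
    entry = auths.get(registry)
    if not entry or "auth" not in entry:
        return None
    decoded = base64.b64decode(entry["auth"]).decode("utf-8")
    # Split on the first colon only, since passwords may contain colons.
    username, _, password = decoded.partition(":")
    return username, password


if __name__ == "__main__":
    # Sample data shaped like the auth.json above; the credentials are made up.
    sample = json.dumps({
        "auths": {
            "oci.ehtrace.local": {
                "auth": base64.b64encode(b"user:secret").decode("ascii")
            }
        }
    })
    print(credentials_for(sample, "oci.ehtrace.local"))
    print(credentials_for(sample, "quay.io"))
```

If the lookup returns an entry for the registry in the failing image name, the file is probably fine and the problem lies in how the credentials are passed on to Docker.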

Reproduction steps

Try to start an allocation whose image comes from a registry requiring authentication, with credentials provided in auth.config

Expected Result

Credentials should be passed to Docker

Actual Result

Job file (if appropriate)

Any job file is affected

@dani dani added the type/bug label Oct 11, 2024
@mhallmark

mhallmark commented Oct 11, 2024

Running into the same issue in our environment.

GCP VMs running Nomad 1.9.0, pulling from a GCP artifact repo

Failed to pull `REDACTED`: Error response from daemon: Get "REDACTED": denied: Unauthenticated request. Unauthenticated requests do not have permission "artifactregistry.repositories.downloadArtifacts" on resource "REDACTED" (or it may not exist)

This is using a GCP service account that has the IAM permissions required for this. Previous nomad versions were working correctly.

Rolling the Nomad clients back to 1.8.4 resolves the issue.

@ahjohannessen

I am having the same issue here.

plugin "docker" {
    config {
        auth {
            config = "/etc/nomad.d/.docker-creds"
        }
    }
}

With Nomad 1.8.4 the above works; bumping to 1.9.0 results in:

Failed to pull `quay.io/our-repo/app:0.0.1`: Error response from daemon: unauthorized: access to the requested resource is not authorized

//cc @tgross

@replay111

replay111 commented Oct 12, 2024

Hi,
I got the same. I had to downgrade to 1.8.3 to bring my images back to a running state.
But I am running an insecure internal registry, and I have no idea why Nomad adds the https:// prefix to it and ignores the fact that this registry is declared as insecure in the Docker configuration.

I did a simple check from the CLI using only Docker:

docker pull 192.168.10.10:5005/tools/fabio:1.6.3

and this works perfectly. The same image is specified in the Nomad job:

image = "192.168.10.10:5005/tools/fabio:1.6.3"

In the logs I can see that the "https://" prefix is added, and an error occurs saying that basic auth is not set.
Moreover, I added an auth section with user and password for this job, but it still failed with the same error.

@dcarbone

Same issue here. Downgrading to 1.8.4 and docker auth is once again functional.

@salehjafarli

salehjafarli commented Oct 14, 2024

Same issue here: docker pull works on the machine, but when Nomad tries to pull the image, it gets an unauthorized error.

@pkazmierczak
Contributor

Hey all, just wanting to let you know that the issue is being looked at and worked on, and we'll provide a fix soon. Thanks for reporting this!

@schmichael schmichael pinned this issue Oct 14, 2024
@replay111

@pkazmierczak then we're waiting for the fix ;-)

@roman-vynar
Contributor

I was also getting similar issues, with Nomad having a hard time pulling Docker images.
It worked OK on all versions <=1.8.4.

After the 1.9.0 upgrade, Nomad clients started throwing errors intermittently: sometimes pulls worked, sometimes not. I didn't have any other quick options to investigate, so I downgraded back to 1.8.4, which broke everything even more because of a Raft version mismatch on the Nomad servers.

Anyway, on the original issue my errors were:

   2024-10-11T12:21:04.920Z [ERROR] client.alloc_runner.task_runner: running driver failed: 
alloc_id=c7245a90-b9d5-85fd-6508-6ba37f711c82 task=dashboard-ui 
error="Failed to pull `localhost/dashboard-ui:1.0.150`: 
Error response from daemon: pull access denied for localhost/dashboard-ui, 
repository does not exist or may require 'docker login': denied: requested access to the resource is denied"
    2024-10-11T11:39:09.335Z [ERROR] client.alloc_runner.task_runner: running driver failed: 
alloc_id=4cbf8384-d01d-5353-0aa8-5c790f8f28ad task=llm-arbiter 
error="failed to fetch docker daemon info: 
Error response from daemon: client version 1.46 is too new. Maximum supported API version is 1.45"

My nomad config on both server/client agents looks like this:

plugin "docker" {
  config {
    auth {
      config = "/nomad/etc/docker-credentials.json"
    }
...
}

@pkazmierczak
Contributor

Hey everyone, we merged the fix and it will be released with 1.9.1.

@ndobbs

ndobbs commented Oct 17, 2024

> Hey everyone, we merged the fix and it will be released with 1.9.1.

I'm glad I finally decided to check this project's repo; I have been racking my brain over this for a few days now.

Thanks for getting a fix in! Reverting to 1.8.4 worked for me like others suggested.

@shantanugadgil
Contributor

I assume this is not cloud-specific and the fix will also cover AWS ECR? (We are hitting the issue with AWS ECR private repos.)

Of course, switching back to v1.8.4 makes things work again! 🥹

@pkazmierczak
Contributor

> I assume this is not cloud specific

It is not.

@tgross tgross unpinned this issue Oct 23, 2024
@tgross
Member

tgross commented Oct 23, 2024

Unpinning now that 1.9.1 has shipped.

@RafalGoslawski

I'm still running into this issue with version 1.9.3. Pulling with docker on that host works fine, but Nomad fails with:

[ERROR] client.alloc_runner.task_runner: running driver failed: [...] task=server error="Failed to pull `[REDACTED]`: Error response from daemon: Head \"[REDACTED]/manifests/1.0.0\": no basic auth credentials"
# nomad --version
Nomad v1.9.3
BuildDate 2024-11-11T16:35:41Z
Revision d92bf1014886c0ff9f882f4a2691d5ae8ad8131c
# cat /etc/nomad.d/client.hcl 
[...]
plugin "docker" {
  auth {
    config = "/root/.docker/config.json"
  }
}

@dcarbone

dcarbone commented Dec 19, 2024

@RafalGoslawski:

For what it's worth, the 1.9.1 release absolutely fixed this issue for me. Are you able to execute docker pull against that registry from the host directly with that config?

@RafalGoslawski

RafalGoslawski commented Dec 19, 2024

@dcarbone Yes, running docker pull on the host directly works with the same config. What might differ from the previous cases in this issue is that it's a self-hosted private registry with an address and port, registry.service.consul:5000, and it uses basic auth.

edit: Another thing is that I currently have the Docker service and the Docker plugin configuration only on the Nomad clients and not on the servers, though I don't know if that matters.
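
Since the credentials in a Docker config file are keyed by registry host, it can help to check which host an image reference resolves to, port included. A rough Python sketch of that split follows. It is a simplified approximation of Docker's reference parsing, not Nomad's code, and the function name `registry_host` is hypothetical: it treats the first path component as a registry only if it contains a `.` or `:` or equals `localhost`, and otherwise falls back to `docker.io`.

```python
def registry_host(image):
    """Return the registry host (with port, if any) for an image reference.

    Simplified approximation: the first path component counts as a registry
    only if it contains "." or ":" or is exactly "localhost".
    """
    first, slash, _ = image.partition("/")
    if slash and ("." in first or ":" in first or first == "localhost"):
        return first
    # No explicit registry in the reference: Docker defaults to Docker Hub.
    return "docker.io"


if __name__ == "__main__":
    for ref in (
        "registry.service.consul:5000/app:1.0.0",  # "app" is a made-up image name
        "192.168.10.10:5005/tools/fabio:1.6.3",
        "localhost/dashboard-ui:1.0.150",
        "redis:7",
    ):
        print(ref, "->", registry_host(ref))
```

For registries other than Docker Hub, the returned host is normally the key to look for under `auths` in the config file.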

@tgross
Member

tgross commented Dec 19, 2024

@RafalGoslawski yeah this looks like it's specifically an issue with basic auth... would you be willing to open a new issue so we can get that tracked?

@dcarbone

dcarbone commented Dec 19, 2024

@RafalGoslawski @tgross So I also saw this with a local private registry, against which I use basic auth on the hosts through the vault-login docker credential helper. My current registry is simply the v2.8.3 distribution image.

One possible difference is I have a local zone for which I have an LE certificate.

@RafalGoslawski

@tgross I've created a separate ticket #24717
