Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expected mirror to respond with 200 OK but received: 500 Internal Server Error #634

Open
talsh-oasis opened this issue Nov 11, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@talsh-oasis
Copy link

talsh-oasis commented Nov 11, 2024

Spegel version

v0.0.27

Kubernetes distribution

EKS

Kubernetes version

1.29

CNI

Amazon VPC CNI

Describe the bug

configure spegel to work with jfrog Artifactory.
I deploy the pod on a specific node that takes 102 sec, and then deploy him on another node at the same time.
It looks like the mirroring is not working.
this is the error logs I see:

{"time":"2024-11-11T14:14:00.157098779Z","level":"INFO","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handleMirror","file":"/build/pkg/registry/registry.go","line":214},"msg":"handling mirror request from external node","key":"sha256:9c62cf755b1ae1744e25e66cbcbdb968a971a3964f30aaa99e10da3cbea34df5","path":"/v2/cloudburst-release/workflows-manager/blobs/sha256:9c62cf755b1ae1744e25e66cbcbdb968a971a3964f30aaa99e10da3cbea34df5"}

{"time":"2024-11-11T14:14:00.161329624Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handle.func1","file":"/build/pkg/registry/registry.go","line":139},"msg":"","err":"could not determine size of blob with digest sha256:411364086e6428707c19012a51c9304ebcf5a082dc362fbdddcc103bf59b807b: content not found\ncontent digest sha256:411364086e6428707c19012a51c9304ebcf5a082dc362fbdddcc103bf59b807b: not found","path":"/v2/cloudburst-release/workflows-manager/blobs/sha256:411364086e6428707c19012a51c9304ebcf5a082dc362fbdddcc103bf59b807b","status":500,"method":"GET","latency":"1.072824ms"}

{"time":"2024-11-11T14:14:00.161757617Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handle.func1","file":"/build/pkg/registry/registry.go","line":139},"msg":"","err":"could not determine size of blob with digest sha256:9c62cf755b1ae1744e25e66cbcbdb968a971a3964f30aaa99e10da3cbea34df5: content not found\ncontent digest sha256:9c62cf755b1ae1744e25e66cbcbdb968a971a3964f30aaa99e10da3cbea34df5: not found","path":"/v2/cloudburst-release/workflows-manager/blobs/sha256:9c62cf755b1ae1744e25e66cbcbdb968a971a3964f30aaa99e10da3cbea34df5","status":500,"method":"GET","latency":"1.881825ms"}

{"time":"2024-11-11T14:14:00.162898384Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handleMirror.func2","file":"/build/pkg/registry/registry.go","line":281},"msg":"request to mirror failed","key":"sha256:9c62cf755b1ae1744e25e66cbcbdb968a971a3964f30aaa99e10da3cbea34df5","path":"/v2/cloudburst-release/workflows-manager/blobs/sha256:9c62cf755b1ae1744e25e66cbcbdb968a971a3964f30aaa99e10da3cbea34df5","err":"expected mirror to respond with 200 OK but received: 500 Internal Server Error","attempt":1}

{"time":"2024-11-11T14:14:00.168729017Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handleMirror.func2","file":"/build/pkg/registry/registry.go","line":281},"msg":"request to mirror failed","key":"sha256:9c62cf755b1ae1744e25e66cbcbdb968a971a3964f30aaa99e10da3cbea34df5","path":"/v2/cloudburst-release/workflows-manager/blobs/sha256:9c62cf755b1ae1744e25e66cbcbdb968a971a3964f30aaa99e10da3cbea34df5","err":"expected mirror to respond with 200 OK but received: 500 Internal Server Error","

there is any specific configuration that needs to be added when using jfrog?

@talsh-oasis talsh-oasis added the bug Something isn't working label Nov 11, 2024
@phillebaba
Copy link
Member

This error seems to be similar to #620. I do not believe that the error comes from the TTL running out because the image pulls are right after each other. I am going to see if I can reproduce this outside of EKS.

Are you able to share your Containerd configuration?

@talsh-oasis
Copy link
Author

This error seems to be similar to #620. I do not believe that the error comes from the TTL running out because the image pulls are right after each other. I am going to see if I can reproduce this outside of EKS.

Are you able to share your Containerd configuration?

I am deploying spegel on eks. Is there any special configuration that needs to be done when working with artifactory jfrog?

@phillebaba
Copy link
Member

There are no specific requirements for the upstream registry. Spegel does not communicate with the registry, if the image does not exist within the cluster the request will fail and Containerd will be responsible to pull from the original registry.

There is however specific configuration required for EKS. As Spegel is running I assume that you have already followed the compatibility guide.
https://github.com/spegel-org/spegel/blob/main/docs/COMPATIBILITY.md#eks

@talsh-oasis
Copy link
Author

There are no specific requirements for the upstream registry. Spegel does not communicate with the registry, if the image does not exist within the cluster the request will fail and Containerd will be responsible to pull from the original registry.

There is however specific configuration required for EKS. As Spegel is running I assume that you have already followed the compatibility guide. https://github.com/spegel-org/spegel/blob/main/docs/COMPATIBILITY.md#eks

I missed this required configuration for the EKS, I will check that as soon as possible.
i must say that when I test the spegel, the pod pulls the image on the specific node and then I delete the pod and create him again on another node(and then it didn't work).
i will check the link you sent.

@craig-seeman
Copy link

@phillebaba I'm going to try to do some testing here as well; I spotted this thread and know for certain it's something we see quite often here on our deployment. In the past 7 days I see ~100k 500's (~15% of logs).

I looked into these a bit a month or two ago and at the time figured it was exactly what you mentioned, that the node had cleaned up the manifest/blob and was still advertising it; but I'm noticing now when I look closer that there are actually multiple spegel mirrors 500ing on the same request chain for the same blob.

@quynhlab
Copy link

quynhlab commented Nov 24, 2024

I have same problem when run spegel on talos

❯ kubectl get nodes -o wide
NAME   STATUS   ROLES           AGE     VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION   CONTAINER-RUNTIME
c1     Ready    control-plane   3h44m   v1.31.2   10.16.255.244   <none>        Talos (v1.8.3)   6.6.60-talos     containerd://2.0.0
c2     Ready    control-plane   3h44m   v1.31.2   10.16.255.221   <none>        Talos (v1.8.3)   6.6.60-talos     containerd://2.0.0
c3     Ready    control-plane   3h44m   v1.31.2   10.16.255.191   <none>        Talos (v1.8.3)   6.6.60-talos     containerd://2.0.0
w1     Ready    <none>          3h44m   v1.31.2   10.16.255.180   <none>        Talos (v1.8.3)   6.6.60-talos     containerd://2.0.0
w2     Ready    <none>          3h44m   v1.31.2   10.16.255.121   <none>        Talos (v1.8.3)   6.6.60-talos     containerd://2.0.0
w3     Ready    <none>          3h44m   v1.31.2   10.16.255.87    <none>        Talos (v1.8.3)   6.6.60-talos     containerd://2.0.0
w4     Ready    <none>          3h44m   v1.31.2   10.16.255.138   <none>        Talos (v1.8.3)   6.6.60-talos     containerd://2.0.0
w5     Ready    <none>          3h44m   v1.31.2   10.16.255.186   <none>        Talos (v1.8.3)   6.6.60-talos     containerd://2.0.0
w6     Ready    <none>          3h44m   v1.31.2   10.16.255.234   <none>        Talos (v1.8.3)   6.6.60-talos     containerd://2.0.0

The logs error:

time":"2024-11-24T15:37:08.739340422Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handle.func1","file":"/build/pkg/registry/registry.go","line":139},"msg":"","err":"mirror with image component sha256:268bb8ee4a96f76136d3a565ef6ac1426243b2f5aefc2511bcdbd6b78c0cebf4 could not be found\nrequests to 2 mirrors failed, all attempts have been exhausted or timeout has been reached","path":"/v2/longhornio/longhorn-ui/blobs/sha256:268bb8ee4a96f76136d3a565ef6ac1426243b2f5aefc2511bcdbd6b78c0cebf4","status":404,"method":"GET","latency":"4.436375ms","ip":"10.37.3.169","handler":"mirror"}
{"time":"2024-11-24T15:37:14.789826177Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handle.func1","file":"/build/pkg/registry/registry.go","line":139},"msg":"","err":"could not determine size of blob with digest sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1: content not found\ncontent digest sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1: not found","path":"/v2/longhornio/longhorn-ui/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1","status":500,"method":"GET","latency":"706.282µs","ip":"10.37.3.169","handler":"blob"}
{"time":"2024-11-24T15:37:14.789996505Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handleMirror.func2","file":"/build/pkg/registry/registry.go","line":281},"msg":"request to mirror failed","key":"sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1","path":"/v2/longhornio/longhorn-ui/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1","ip":"10.37.3.169","err":"expected mirror to respond with 200 OK but received: 500 Internal Server Error","attempt":1}
{"time":"2024-11-24T15:37:14.79129223Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handleMirror.func2","file":"/build/pkg/registry/registry.go","line":281},"msg":"request to mirror failed","key":"sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1","path":"/v2/longhornio/longhorn-ui/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1","ip":"10.37.3.169","err":"expected mirror to respond with 200 OK but received: 500 Internal Server Error","attempt":2}
{"time":"2024-11-24T15:37:14.792336113Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handleMirror.func2","file":"/build/pkg/registry/registry.go","line":281},"msg":"request to mirror failed","key":"sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1","path":"/v2/longhornio/longhorn-ui/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1","ip":"10.37.3.169","err":"expected mirror to respond with 200 OK but received: 500 Internal Server Error","attempt":3}
{"time":"2024-11-24T15:37:14.79237457Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handle.func1","file":"/build/pkg/registry/registry.go","line":139},"msg":"","err":"mirror with image component sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1 could not be found\nrequests to 3 mirrors failed, all attempts have been exhausted or timeout has been reached","path":"/v2/longhornio/longhorn-ui/blobs/sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1","status":404,"method":"GET","latency":"3.746523ms","ip":"10.37.3.169","handler":"mirror"}
{"time":"2024-11-24T15:37:16.097130736Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handleMirror.func2","file":"/build/pkg/registry/registry.go","line":281},"msg":"request to mirror failed","key":"sha256:3bb1e816eaee627e1890de9bb4d5bf5701f1113e91dc114798c3864febc8df41","path":"/v2/longhornio/longhorn-ui/blobs/sha256:3bb1e816eaee627e1890de9bb4d5bf5701f1113e91dc114798c3864febc8df41","ip":"10.37.3.169","err":"expected mirror to respond with 200 OK but received: 500 Internal Server Error","attempt":1}
{"time":"2024-11-24T15:37:16.098190557Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handleMirror.func2","file":"/build/pkg/registry/registry.go","line":281},"msg":"request to mirror failed","key":"sha256:3bb1e816eaee627e1890de9bb4d5bf5701f1113e91dc114798c3864febc8df41","path":"/v2/longhornio/longhorn-ui/blobs/sha256:3bb1e816eaee627e1890de9bb4d5bf5701f1113e91dc114798c3864febc8df41","ip":"10.37.3.169","err":"expected mirror to respond with 200 OK but received: 500 Internal Server Error","attempt":2}
{"time":"2024-11-24T15:37:16.098231583Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handle.func1","file":"/build/pkg/registry/registry.go","line":139},"msg":"","err":"mirror with image component sha256:3bb1e816eaee627e1890de9bb4d5bf5701f1113e91dc114798c3864febc8df41 could not be found\nrequests to 2 mirrors failed, all attempts have been exhausted or timeout has been reached","path":"/v2/longhornio/longhorn-ui/blobs/sha256:3bb1e816eaee627e1890de9bb4d5bf5701f1113e91dc114798c3864febc8df41","status":404,"method":"GET","latency":"4.990692ms","ip":"10.37.3.169","handler":"mirror"}
{"time":"2024-11-24T15:37:21.060903987Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handleMirror.func2","file":"/build/pkg/registry/registry.go","line":281},"msg":"request to mirror failed","key":"sha256:259310f1ab9038f1de90836b615670f31d3d30cb446da70b536eecb5fc7c0b87","path":"/v2/longhornio/longhorn-ui/blobs/sha256:259310f1ab9038f1de90836b615670f31d3d30cb446da70b536eecb5fc7c0b87","ip":"10.37.3.169","err":"expected mirror to respond with 200 OK but received: 500 Internal Server Error","attempt":1}
{"time":"2024-11-24T15:37:21.062145902Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handleMirror.func2","file":"/build/pkg/registry/registry.go","line":281},"msg":"request to mirror failed","key":"sha256:259310f1ab9038f1de90836b615670f31d3d30cb446da70b536eecb5fc7c0b87","path":"/v2/longhornio/longhorn-ui/blobs/sha256:259310f1ab9038f1de90836b615670f31d3d30cb446da70b536eecb5fc7c0b87","ip":"10.37.3.169","err":"expected mirror to respond with 200 OK but received: 500 Internal Server Error","attempt":2}
{"time":"2024-11-24T15:37:21.062195654Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handle.func1","file":"/build/pkg/registry/registry.go","line":139},"msg":"","err":"mirror with image component sha256:259310f1ab9038f1de90836b615670f31d3d30cb446da70b536eecb5fc7c0b87 could not be found\nrequests to 2 mirrors failed, all attempts have been exhausted or timeout has been reached","path":"/v2/longhornio/longhorn-ui/blobs/sha256:259310f1ab9038f1de90836b615670f31d3d30cb446da70b536eecb5fc7c0b87","status":404,"method":"GET","latency":"4.641855ms","ip":"10.37.3.169","handler":"mirror"}
{"time":"2024-11-24T15:37:21.513311115Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handleMirror.func2","file":"/build/pkg/registry/registry.go","line":281},"msg":"request to mirror failed","key":"sha256:907101d2006a5f6cc9696ab23bc7b8ee14e184f126c439689caa0b469323e262","path":"/v2/longhornio/longhorn-ui/blobs/sha256:907101d2006a5f6cc9696ab23bc7b8ee14e184f126c439689caa0b469323e262","ip":"10.37.3.169","err":"expected mirror to respond with 200 OK but received: 500 Internal Server Error","attempt":1}
{"time":"2024-11-24T15:37:21.514629521Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handleMirror.func2","file":"/build/pkg/registry/registry.go","line":281},"msg":"request to mirror failed","key":"sha256:907101d2006a5f6cc9696ab23bc7b8ee14e184f126c439689caa0b469323e262","path":"/v2/longhornio/longhorn-ui/blobs/sha256:907101d2006a5f6cc9696ab23bc7b8ee14e184f126c439689caa0b469323e262","ip":"10.37.3.169","err":"expected mirror to respond with 200 OK but received: 500 Internal Server Error","attempt":2}
{"time":"2024-11-24T15:37:21.514690076Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handle.func1","file":"/build/pkg/registry/registry.go","line":139},"msg":"","err":"mirror with image component sha256:907101d2006a5f6cc9696ab23bc7b8ee14e184f126c439689caa0b469323e262 could not be found\nrequests to 2 mirrors failed, all attempts have been exhausted or timeout has been reached","path":"/v2/longhornio/longhorn-ui/blobs/sha256:907101d2006a5f6cc9696ab23bc7b8ee14e184f126c439689caa0b469323e262","status":404,"method":"GET","latency":"4.746729ms","ip":"10.37.3.169","handler":"mirror"}
{"time":"2024-11-24T15:38:00.964460875Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handle.func1","file":"/build/pkg/registry/registry.go","line":139},"msg":"","err":"could not determine size of blob with digest sha256:65306f2c6d2efa4167a923eae849f9221536b3b429b9e287d5dc940d490494d7: content not found\ncontent digest sha256:65306f2c6d2efa4167a923eae849f9221536b3b429b9e287d5dc940d490494d7: not found","path":"/v2/descheduler/descheduler/blobs/sha256:65306f2c6d2efa4167a923eae849f9221536b3b429b9e287d5dc940d490494d7","status":500,"method":"GET","latency":"653.208µs","ip":"10.37.0.141","handler":"blob"}
{"time":"2024-11-24T15:38:00.964574487Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handleMirror.func2","file":"/build/pkg/registry/registry.go","line":281},"msg":"request to mirror failed","key":"sha256:65306f2c6d2efa4167a923eae849f9221536b3b429b9e287d5dc940d490494d7","path":"/v2/descheduler/descheduler/blobs/sha256:65306f2c6d2efa4167a923eae849f9221536b3b429b9e287d5dc940d490494d7","ip":"10.37.0.141","err":"expected mirror to respond with 200 OK but received: 500 Internal Server Error","attempt":1}
{"time":"2024-11-24T15:38:00.96609684Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handleMirror.func2","file":"/build/pkg/registry/registry.go","line":281},"msg":"request to mirror failed","key":"sha256:65306f2c6d2efa4167a923eae849f9221536b3b429b9e287d5dc940d490494d7","path":"/v2/descheduler/descheduler/blobs/sha256:65306f2c6d2efa4167a923eae849f9221536b3b429b9e287d5dc940d490494d7","ip":"10.37.0.141","err":"expected mirror to respond with 200 OK but received: 500 Internal Server Error","attempt":2}
{"time":"2024-11-24T15:38:00.967478837Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handleMirror.func2","file":"/build/pkg/registry/registry.go","line":281},"msg":"request to mirror failed","key":"sha256:65306f2c6d2efa4167a923eae849f9221536b3b429b9e287d5dc940d490494d7","path":"/v2/descheduler/descheduler/blobs/sha256:65306f2c6d2efa4167a923eae849f9221536b3b429b9e287d5dc940d490494d7","ip":"10.37.0.141","err":"expected mirror to respond with 200 OK but received: 500 Internal Server Error","attempt":3}
{"time":"2024-11-24T15:38:00.967540131Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handle.func1","file":"/build/pkg/registry/registry.go","line":139},"msg":"","err":"mirror with image component sha256:65306f2c6d2efa4167a923eae849f9221536b3b429b9e287d5dc940d490494d7 could not be found\nrequests to 3 mirrors failed, all attempts have been exhausted or timeout has been reached","path":"/v2/descheduler/descheduler/blobs/sha256:65306f2c6d2efa4167a923eae849f9221536b3b429b9e287d5dc940d490494d7","status":404,"method":"GET","latency":"4.347066ms","ip":"10.37.0.141","handler":"mirror"}

@phillebaba
Copy link
Member

I have been able to reproduce this with Talos, so I assume it is related to Containerd v2. I am going to add an e2e test with Talos to debug the issue and validate a possible solution.

@talsh-oasis
Copy link
Author

talsh-oasis commented Dec 2, 2024

@phillebaba
Hi, so after some testing I have more information:

  1. I am using EKS ami type  AL2_x86_64
  2. Kubernetes version - v.1.31
  3. containerd version : containerd://1.7.23
  4. pulling the following image agent-handler:rc-186"
  5. I make sure that the user data the nodes used include the following code:
    image
  6. I pull one of my images which is located on jfrog on a specific node, and then try on another node, but I don't see any improvements in the pulling times.
    the logs include the IP of the New node IP that the pod is running on.
    spegel_logs.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants