Controller Keeps Reattempting Failed Jobs #463

Open
CH-BrianJurgess opened this issue Dec 28, 2024 · 0 comments

CH-BrianJurgess commented Dec 28, 2024

In our cluster, the agent controller keeps reattempting a number of jobs even though these jobs have been either failed or cancelled in the UI. It recreates the pods over and over, no matter how many times I manually clean them up. When I grab the logs from the agent container, I see that the agent container exits properly. However, when I grab the logs from container-0, I see that the buildkite agent has segfaulted and the container never exits. Every one of these pipelines eventually failed for one reason or another on its original run.
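
For anyone trying to compare the two containers on an affected pod, here is a minimal sketch of pulling both logs, assuming `kubectl` access to the cluster. The container names `agent` and `container-0` are the ones from this report; the `collect_logs` helper and the `KUBECTL` override variable are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print the logs of both containers in one job pod.
# "agent" and "container-0" are the container names seen in this report.
# collect_logs and the KUBECTL override are illustrative, not part of any tool.
collect_logs() {
  local pod="$1" namespace="$2"
  # Allow the kubectl binary to be swapped out (e.g. for a dry run).
  local kubectl="${KUBECTL:-kubectl}"
  local container
  for container in agent container-0; do
    echo "=== ${container} ==="
    "$kubectl" logs "$pod" --namespace "$namespace" --container "$container"
  done
}
```

Usage would look like `collect_logs <pod-name> jenkins`, where `jenkins` is the namespace shown in the agent tags below and `<pod-name>` is one of the recreated pods.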

Agent Container Logs

2024-12-28 21:52:34 NOTICE Starting buildkite-agent v3.87.0 with PID: 8
2024-12-28 21:52:34 NOTICE The agent source code can be found here: https://github.com/buildkite/agent
2024-12-28 21:52:34 NOTICE For questions and support, email us at: [email protected]
2024-12-28 21:52:34 INFO   Configuration loaded path=/home/agent/buildkite/config
2024-12-28 21:52:34 INFO   Build Path doesn't exist, creating it (/workspace/build)
2024-12-28 21:52:34 INFO   Registering agent with Buildkite...
2024-12-28 21:52:34 INFO   Successfully registered agent "buildkite-01940a29-f1c4-44bf-88d2-672719abb132-9f4f2" with tags [k8s:agent-stack-version=v0.20.1, k8s:service-account=buildkite-build-agent, k8s:namespace=jenkins, k8s:node=ip-10-16-70-42.us-west-2.compute.internal, queue=default-queue]
2024-12-28 21:52:34 INFO   Starting 1 Agent(s)
2024-12-28 21:52:34 INFO   You can press Ctrl-C to stop the agents
2024-12-28 21:52:34 INFO   buildkite-01940a29-f1c4-44bf-88d2-672719abb132-9f4f2 Connecting to Buildkite...
2024-12-28 21:52:34 INFO   buildkite-01940a29-f1c4-44bf-88d2-672719abb132-9f4f2 Attempting to acquire job 01940a29-f1c4-44bf-88d2-672719abb132...
2024-12-28 21:52:34 WARN   buildkite-01940a29-f1c4-44bf-88d2-672719abb132-9f4f2 Buildkite rejected the call to acquire the job (PUT https://agent.buildkite.com/v3/jobs/01940a29-f1c4-44bf-88d2-672719abb132/acquire: 422 Unprocessable Entity: Cannot acquire job 01940a29-f1c4-44bf-88d2-672719abb132 as it's been assigned to agent 01940a2a-0455-4b09-8f2c-cf413549dc3d)
2024-12-28 21:52:34 INFO   buildkite-01940a29-f1c4-44bf-88d2-672719abb132-9f4f2 Disconnecting...
2024-12-28 21:52:34 INFO   buildkite-01940a29-f1c4-44bf-88d2-672719abb132-9f4f2 Disconnected
failed to acquire job: job acquisition rejected: PUT https://agent.buildkite.com/v3/jobs/01940a29-f1c4-44bf-88d2-672719abb132/acquire: 422 Unprocessable Entity: Cannot acquire job 01940a29-f1c4-44bf-88d2-672719abb132 as it's been assigned to agent 01940a2a-0455-4b09-8f2c-cf413549dc3d

container-0 logs

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x139cd23]
goroutine 1 [running]:
github.com/buildkite/agent/v3/internal/job.(*Executor).Run(0xc000125888, {0x1bdb188?, 0xc0000c35e0?})
	/work/internal/job/executor.go:103 +0x2e3
github.com/buildkite/agent/v3/clicommand.init.func12(0xc0001d3b98?)
	/work/clicommand/bootstrap.go:519 +0xd32
github.com/urfave/cli.HandleAction({0x150ae00?, 0x191ce58?}, 0x9?)
	/gomodcache/github.com/urfave/[email protected]/app.go:524 +0x50
github.com/urfave/cli.Command.Run({{0x180443d, 0x9}, {0x0, 0x0}, {0x0, 0x0, 0x0}, {0x182a3ef, 0x1b}, {0x0, ...}, ...}, ...)
	/gomodcache/github.com/urfave/[email protected]/command.go:175 +0x67c
github.com/urfave/cli.(*App).Run(0xc000485880, {0xc00003e060, 0x2, 0x2})
	/gomodcache/github.com/urfave/[email protected]/app.go:277 +0xb3b
main.main()
	/work/main.go:80 +0x1ec