Docs: How to purge old workers (status: Not responding) #597

Open
RaiaN opened this issue Feb 10, 2025 · 5 comments · May be fixed by #598
Labels
documentation Improvements or additions to documentation

Comments

@RaiaN

RaiaN commented Feb 10, 2025

Documentation Issue

Hi there,

We are trying to figure out how to properly purge old workers.

Due to active development, we have many workers with status: Not responding. How can we delete them via the deadline tool?

Best wishes,
Petr

RaiaN added the documentation and needs triage labels Feb 10, 2025
@leongdl
Contributor

leongdl commented Feb 10, 2025

Hi,

Thanks for your question. Could you try the delete-worker API from the AWS CLI?
https://awscli.amazonaws.com/v2/documentation/api/latest/reference/deadline/delete-worker.html

For example,
aws deadline delete-worker --farm-id <farm-12345> --fleet-id <fleet-98765> --worker-id <worker-abcdef>
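
When there are many stale workers, running the command above once per worker gets tedious. The following is a minimal sketch, not an official tool, of how the same cleanup might be scripted. It assumes a boto3-style Deadline Cloud client that exposes `list_workers` and `delete_worker` with `farmId`, `fleetId`, `workerId`, and `nextToken` parameters, and that the status shown as "Not responding" is reported as `NOT_RESPONDING`. The helper names `iter_workers` and `purge_not_responding` are hypothetical.

```python
from typing import Any, Dict, Iterator, List


def iter_workers(client: Any, farm_id: str, fleet_id: str) -> Iterator[Dict[str, Any]]:
    """Yield every worker summary in the fleet, following nextToken pagination."""
    kwargs = {"farmId": farm_id, "fleetId": fleet_id}
    while True:
        page = client.list_workers(**kwargs)
        yield from page.get("workers", [])
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token


def purge_not_responding(
    client: Any, farm_id: str, fleet_id: str, dry_run: bool = True
) -> List[str]:
    """Return the IDs of NOT_RESPONDING workers; delete them unless dry_run is set."""
    # Collect the full list first so deletions do not interleave with pagination.
    stale = [
        worker["workerId"]
        for worker in iter_workers(client, farm_id, fleet_id)
        if worker.get("status") == "NOT_RESPONDING"  # assumed status string
    ]
    if not dry_run:
        for worker_id in stale:
            client.delete_worker(farmId=farm_id, fleetId=fleet_id, workerId=worker_id)
    return stale
```

With boto3 installed and credentials configured, the client would presumably be created with `boto3.client("deadline")` and passed in together with your own farm and fleet IDs. The default `dry_run=True` only lists the candidates, so you can review them before deleting anything. The service may still reject the delete call for some workers.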

Question: why did the workers get stuck at Not responding? Is this a Customer Managed Fleet or a Service Managed Fleet?

@mwiebe
Contributor

mwiebe commented Feb 10, 2025

We also have a sample fleet health check that you can deploy via CloudFormation. It monitors the health of workers, cleans them up, and notifies you if there are sustained issues.

https://github.com/aws-deadline/deadline-cloud-samples/tree/mainline/cloudformation/farm_templates/cmf_templates

RaiaN linked a pull request that will close this issue Feb 10, 2025
epmog removed the needs triage label Feb 10, 2025
@RaiaN
Author

RaiaN commented Feb 11, 2025

Thank you @leongdl @mwiebe
We are using a CMF (Windows nodes). I didn't know we could use the AWS CLI; I thought we should use deadline (basically this repo) to deal with Deadline workers, so I created a PR: #598

> Question, why did the workers get stuck at Not responding

We've been using the Windows Deadline worker service, and sometimes we hit this bug. In other cases the Deadline service failed to start (e.g. there was a timezone bug on the CMF fleet: the clock wasn't properly synchronized with global time, so the worker couldn't start; see the error message below).

Error:

API.Resp 📥 [deadline:AssumeFleetRoleForWorker](403) error={'Message': 'Signature not yet current: 20250207T232743Z is still later than 20250207T213245Z (20250207T212745Z + 5 min.)', 'Code': 'InvalidSignatureException'} params={} request_id=93c366be-d351-4f4e-8bb5-54299b43a44c
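
To check how far off the worker's clock was, the timestamps in this message can be compared directly. The sketch below assumes the usual reading of this kind of signature error: the first timestamp is when the worker signed the request, and the one in parentheses is the service's own time at that moment. The function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Compact ISO 8601 timestamps as they appear in the message, e.g. 20250207T232743Z.
_STAMP = re.compile(r"\d{8}T\d{6}Z")


def parse_stamp(stamp: str) -> datetime:
    """Parse a compact UTC timestamp such as 20250207T232743Z."""
    return datetime.strptime(stamp, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)


def signature_skew(message: str) -> timedelta:
    """Return (signing time - service time) from a 'Signature not yet current' message."""
    stamps = _STAMP.findall(message)
    if len(stamps) < 3:
        raise ValueError("expected three timestamps in the message")
    signed_at = parse_stamp(stamps[0])   # assumed: time the worker signed the request
    service_now = parse_stamp(stamps[2])  # assumed: service time, shown in parentheses
    return signed_at - service_now


message = (
    "Signature not yet current: 20250207T232743Z is still later than "
    "20250207T213245Z (20250207T212745Z + 5 min.)"
)
skew = signature_skew(message)
print(skew.total_seconds())  # 7198.0, about two hours ahead
```

Under that reading, the worker's clock ran roughly two hours ahead of the service, well beyond the 5 minute tolerance quoted in the message. A skew that is almost exactly a whole number of hours would fit a local time being treated as UTC rather than ordinary clock drift.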

Any comments?

@RaiaN
Author

RaiaN commented Feb 11, 2025

https://github.com/aws-deadline/deadline-cloud-samples/tree/mainline/cloudformation/farm_templates/cmf_templates

This is really good. We can use that I think. Is it compatible with CMF?

@mwiebe
Contributor

mwiebe commented Feb 11, 2025

> https://github.com/aws-deadline/deadline-cloud-samples/tree/mainline/cloudformation/farm_templates/cmf_templates
>
> This is really good. We can use that I think. Is it compatible with CMF?

Yes, this template was created for a CMF with an EC2 Auto Scaling Group and event-based auto scaling enabled.


4 participants