Implement ECS task concurrency prevention for registry-sweepers #105

Open
alexdunnjpl opened this issue Apr 3, 2024 · 5 comments

@alexdunnjpl

💡 Description

Currently, if a sweeper executes for longer than its schedule cadence, multiple instances of the sweeper will run concurrently.

This adds cost in two ways: redundant processing, and a slowdown of all jobs caused by the increased database load. It could also affect service if the database is loaded heavily enough.

Implement configuration to allow execution of <=1 container instance per task definition (i.e. node) at any point in time.
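
As a rough illustration of the "<=1 instance per task definition" requirement, here is a minimal sketch of a launch-time guard. It is not the configuration-based approach requested in this issue, only an alternative way to express the same rule in code. It assumes a boto3-style ECS client is passed in (`list_tasks` and `run_task` are real boto3 ECS client methods); the function names, and the idea of calling this from whatever schedules the sweeper, are hypothetical.

```python
from typing import Any


def sweeper_already_running(ecs_client: Any, cluster: str, family: str) -> bool:
    """Return True if at least one RUNNING task of this task-definition family exists.

    ecs_client is assumed to behave like boto3.client("ecs").
    """
    response = ecs_client.list_tasks(
        cluster=cluster,
        family=family,
        desiredStatus="RUNNING",
    )
    return len(response.get("taskArns", [])) > 0


def launch_if_idle(ecs_client: Any, cluster: str, family: str, **run_task_kwargs: Any) -> bool:
    """Start one task for the family unless one is already running.

    Returns True if a task was launched, False if the launch was skipped.
    Extra keyword arguments (e.g. launchType, networkConfiguration) are
    passed through to run_task unchanged.
    """
    if sweeper_already_running(ecs_client, cluster, family):
        return False
    # With no revision given, ECS uses the latest ACTIVE revision of the family.
    ecs_client.run_task(cluster=cluster, taskDefinition=family, count=1, **run_task_kwargs)
    return True
```

Note that the check and the launch are two separate calls, so two schedulers firing at nearly the same moment could both pass the check. A limit enforced by the service itself would not have that gap.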

@jordanpadams this isn't blocking anything, but the sooner it's done, the shorter we can make our sweeper cadence, and the performance/cost impact is nontrivial.

@jordanpadams jordanpadams transferred this issue from NASA-PDS/operations Apr 3, 2024
@jordanpadams
Member

@alexdunnjpl when you say "implement configuration" is this an event scheduler configuration?

@alexdunnjpl
Author

@jordanpadams I'm fuzzy on the details, but I think it requires defining a cluster for each task definition and setting a container limit on each cluster. Simply, "do some AWS Console stuff"

@sjoshi-jpl will have a better idea of the details I suspect

@jordanpadams
Member

Thanks @alexdunnjpl. As a task, this is 100% going to get lost in the 100s of tickets we have open right now. I will try to keep track of this and add to our overall release plan.

@alexdunnjpl
Author

alexdunnjpl commented Apr 12, 2024

The need for this should be somewhat mitigated (though not completely avoided) by NASA-PDS/registry-sweepers#115 as now, only provenance should result in any redundant work being done.

EDIT: Actually, this is incorrect. There is still a concern of multiple instances tripping over each other if an influx of data pushes container runtime past the schedule cadence (runtime > cadencePeriod).
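
To put a number on the overlap, here is a small worked sketch. It assumes a simplified model that is not taken from the sweepers themselves: a fixed launch interval and a constant runtime per run. The function name is hypothetical, and real runtimes will grow further as concurrent instances compete for the database.

```python
def max_concurrent_instances(runtime_s: int, cadence_s: int) -> int:
    """Peak number of overlapping runs when a task is launched every
    cadence_s seconds and each run lasts runtime_s seconds.

    A run launched at time t occupies the half-open interval
    [t, t + runtime_s), so a run that finishes exactly when the next
    one starts does not count as overlapping.
    """
    if runtime_s <= 0 or cadence_s <= 0:
        raise ValueError("runtime_s and cadence_s must be positive")
    # Integer ceiling of runtime_s / cadence_s.
    return (runtime_s + cadence_s - 1) // cadence_s


# Hourly cadence, run takes 2.5 hours: three instances overlap at peak.
print(max_concurrent_instances(150 * 60, 60 * 60))  # 3
# Run fits within the cadence: no overlap.
print(max_concurrent_instances(45 * 60, 60 * 60))  # 1
```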
