-
Notifications
You must be signed in to change notification settings - Fork 434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bring Azure machinery auto-scaling up to date. #2243
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
let me know when is ready to merge |
@doomedraven I've tested trying to get the semaphore to lock up in various situations, but it looks good now. Should be good to merge. |
ChrisThibodeaux
force-pushed
the
master
branch
2 times, most recently
from
July 22, 2024 15:28
d3d751d
to
86c5e0a
Compare
@doomedraven I believe I have found a better way of managing machine deletion. Please hold off on the merge until I can get the commits together. |
@doomedraven Okay, fix is in. Now operating as expected. |
…nd ephemeral OS disk type.
…_tag` to the name.
…o sqlalchemy.orm.exc.StaleDataError.
thank you for update |
6 tasks
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Updated the Azure machinery, specifically with auto-scaling, to a usable state out-of-the-box.
Updates include:
machines
key in the config, which was leading to errors.pool_tag
to the end of the VMSS name.modules/machinery/az.py
methodstop
when the scale-set is set tois_scaling_down
, which was leading to aStaleDataError
.lib/cuckoo/core/guest.py
correcting the order of variables.Concerns:
The fix for the
StaleDataError
is not ideal. It means that the process of scaling down when no relevant tasks are available must wait for all current tasks to finish AND for the monitor thread to wake up and run. What would be a much better fix would be to find a way to signal the machine for deletion in az.py immediately after the AnalysisManager releases the machine. Any tips are greatly appreciated on how that might be done. (Edit: This no longer applies. Now correctly deleting machines in such cases.)If there are any ideas on how to better handle the issue with the semaphore, I would love to better apply a solution there. I am not entirely comfortable with what I have in place here. The main issues leading to deadlocking that I have noticed are:
update_limit
-- The_limit_value
is only updated when there are machines, and the amount of machines is less than the_upper_limit
. So no update for_limit_value
occurs when we are at the upper limit.check_for_starvation
-- Theavailable_count
is the number of unlocked machines. If machines are waiting at the semaphore, they are already locked. This makes updating the_value
here impossible.As a check for releasing the semaphore, I used the condition of
locked machines <= configured machine limit
, but I think that should maybe betotal machines (locked or unlocked) <= configured machine limit
.Thoughts?