sqlalchemy StaleDataError with Azure machinery #2242
Comments
Correction to the setup steps in my post. This error is also happening when using a single VMSS instance. All other info applies.
Found the issue. I will have a fix for this in the PR bringing the Azure machinery up to date.
Hey Chris, what's up? I'm getting a "Traceback (most recent call last):" error. Do you think it's the same issue you faced? If so, can you share the solution you made?
@dor15 I think it is exactly like the issue I faced. You are going to want to look at where stop and release happen in the flow of a sandbox run: https://github.com/kevoreilly/CAPEv2/blob/master/lib/cuckoo/core/analysis_manager.py#L292-L341. Notice that the stop occurs first, before the release.
You can see the fix that I made for the Azure machinery:
Honestly, I'm surprised you are running into that now and no one else has reported it before. If you do happen to find the problem and the above fixes it for you, please create a pull request with the changes. It goes a long way!
Maybe it's only related to cloud providers? I don't have those issues with KVM.
@doomedraven I think it is most likely how az.py/aws.py have chosen to deal with autoscaling. The az module was having issues because it was deleting the machine from the database in the middle of that stop/release flow. I don't think any other machinery module explicitly removes machines from the database as part of the normal flow.
@dor15 I would be very interested in knowing a bit more about your setup. Have I correctly guessed that you are using autoscaling?
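For anyone following along, here is a minimal, standalone sketch of the failure mode being described: one session holds an ORM object for a machine, a second session deletes the row out from under it (which is effectively what a machinery module removing machines from the database mid-run does), and the first session's next flush raises StaleDataError because its UPDATE matches zero rows. The model, attribute names, and values below are made up for illustration; this is not CAPE's schema or code.

```python
from sqlalchemy import Column, Integer, String, create_engine, delete
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Machine(Base):
    __tablename__ = "machines"
    id = Column(Integer, primary_key=True)
    label = Column(String)
    locked = Column(Integer, default=1)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

# Seed one machine row.
with Session(engine) as setup:
    setup.add(Machine(id=1, label="vm-0"))
    setup.commit()

# Session A loads the machine, as the analysis flow does before stopping/releasing it.
sess_a = Session(engine)
stale_machine = sess_a.query(Machine).filter_by(label="vm-0").one()

# Session B deletes the same row behind session A's back.
with Session(engine) as sess_b:
    sess_b.execute(delete(Machine).where(Machine.label == "vm-0"))
    sess_b.commit()

# Session A now flushes an UPDATE for its stale copy; it matches 0 rows and
# SQLAlchemy raises sqlalchemy.orm.exc.StaleDataError.
stale_machine.locked = 0
sess_a.commit()
```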
So maybe we need to backport that AWS feature to az?
I'm going to keep this open for some time until we see what we're going to do, as it's summer and I don't have a lot of time to properly concentrate on big tasks here.
More than that, in the original code, in "def _delete_machine_form_db(self, label):", it tries to create a new "session", and that raised an exception in /opt/CAPEv2/modules/machinery/aws.py. So I used the existing session with "self.db.session" to proceed with running the code, and then I got the error I described above.
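Roughly, the two session-handling patterns being contrasted in this comment look like the sketch below. The function names, and the assumption that the database wrapper exposes both a session factory and a shared session attribute, are illustrative; this is not the actual aws.py code or the exact change that was made.

```python
from sqlalchemy.exc import SQLAlchemyError

def delete_machine_with_new_session(db, Machine, label):
    """Pattern in the original code: spawn a fresh session just for the delete."""
    session = db.Session()  # assumes the database wrapper exposes a session factory
    try:
        machine = session.query(Machine).filter_by(label=label).first()
        if machine:
            session.delete(machine)
            session.commit()
    except SQLAlchemyError:
        session.rollback()
    finally:
        session.close()

def delete_machine_with_existing_session(db, Machine, label):
    """Pattern described above: reuse the wrapper's already-open db.session instead."""
    machine = db.session.query(Machine).filter_by(label=label).first()
    if machine:
        db.session.delete(machine)
        db.session.commit()
```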
Yes - from aws.conf: "Enable auto-scale in cuckoo (by setting autoscale = yes). Recommended for better performance."
@dor15 I don't see it. Anyway, try using this and see if it fixes the problem.
I don't know what
Yes, I meant this line: https://github.com/kevoreilly/CAPEv2/blob/d0732ee9d4b01184f86136336ee4ed5af98c34ca/modules/machinery/aws.py#L108. Instead of this, I use "self.db.session" before any session action. I'll try to use your fix and update. Thanks!
Trying to apply your fix and now getting a new error. It's probably related to the way I create/use the current session in "_delete_machine_form_db":
def _delete_machine_form_db(self, label):
The issue was solved after I commented out the last line, and now it deletes the "old" machine and creates a new one, and the whole process ends successfully. But I'm not sure how it will affect the rest of the code (if it will affect anything). These are the final changes I made in the code:
@dor15 Total changes would now look like this:
I think the machinery file should do as little session spawning as possible, but that's just me. Let me know if this fixes the last part of the problem.
But are we speaking about an Azure issue or an AWS one?
@doomedraven The Azure issue was solved with #2243. This is a separate issue with AWS. I have a PR draft waiting for confirmation that these changes fix it.
ah ok thank you |
Hey, I'm seeing this log line:
2024-08-29 09:17:58,140 [modules.machinery.aws] INFO: Stopping vm i-0eef625a9c17d557b
@dor15 You're right, I said that wrong. The code was what I had intended, though. I don't understand where you are getting that log line from.
You are passing the label (instance id), not the Machine.name. |
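For clarity, the distinction being pointed out is between two different identifiers stored on the same machine record: the name CAPE configured for the machine versus the provider-side label (here the EC2 instance id) that gets passed to stop. A toy illustration, with made-up values and a stand-in model rather than CAPE's real Machine table:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Machine(Base):
    __tablename__ = "machines"
    id = Column(Integer, primary_key=True)
    name = Column(String)   # the name CAPE configured for the machine
    label = Column(String)  # the provider-side id, e.g. the EC2 instance id

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as s:
    s.add(Machine(name="cape-aws-0", label="i-0eef625a9c17d557b"))
    s.commit()
    # The log line "Stopping vm i-0eef625a9c17d557b" prints the label,
    # because stop is handed the label, not the name.
    by_label = s.query(Machine).filter_by(label="i-0eef625a9c17d557b").one()
    by_name = s.query(Machine).filter_by(name="cape-aws-0").one()
    assert by_label is by_name  # same row, two different identifiers
```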
No, I don't have a fork, but the line is from the original codebase.
@ethhart, you are right; now it's working. So the changes that @ChrisThibodeaux suggested work and solve the issue.
So, now that we merged the fix, is this issue still relevant? It says Azure, not AWS.
@doomedraven I should have asked dor15 to split this off into a separate ticket sooner. I think we can close now. |
Thanks |
About accounts on capesandbox.com
This is open source and you are getting free support so be friendly!
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
When using Azure machinery with a VMSS, analysis should complete without database errors.
Current Behavior
When the last available task has been assigned to a machine, any other finishing analysis causes a sqlalchemy.orm.exc.StaleDataError. After that, remaining tasks will always remain in the running state.
Failure Information (for bugs)
Please help provide information about the failure if this is a bug. If it is not a bug, please remove the rest of this template.
Steps to Reproduce
Please provide detailed steps for reproducing the issue.
Context
SQLAlchemy version = 1.4.50
Using Azure machinery with auto-scaling. The ScalingBoundSemaphore usage in the master branch has a bug when used with Azure auto-scaling that consistently leads to deadlocking. I have implemented a workaround that updates the semaphore in analysis_manager immediately before the machine is locked. If that is relevant, I can share the fix I have for that as well.
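The workaround itself isn't shown in this issue, but the idea described (refreshing the semaphore's upper bound right before a machine is locked, so autoscaled machine counts are respected) can be sketched roughly as below. The class and method names are illustrative and are not CAPE's actual ScalingBoundSemaphore API.

```python
import threading

class ResizableSemaphore:
    """Toy semaphore whose limit can be raised or lowered as machines scale in and out."""

    def __init__(self, limit):
        self._cond = threading.Condition()
        self._limit = limit
        self._in_use = 0

    def update_limit(self, new_limit):
        # Called right before locking a machine, with the current machine count.
        with self._cond:
            self._limit = new_limit
            self._cond.notify_all()

    def acquire(self):
        with self._cond:
            while self._in_use >= self._limit:
                self._cond.wait()
            self._in_use += 1

    def release(self):
        with self._cond:
            self._in_use -= 1
            self._cond.notify_all()

# The described workaround amounts to: refresh the bound, then lock.
# sem.update_limit(current_available_machine_count); sem.acquire()
```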
Failure Logs
Task 8 was the last task. It began running while task 6 was running. When task 6 finished, I got the error above. This is after successfully reporting on tasks 1-5 and 7.
Thoughts on where to begin looking for the cause?