Deadlock found when trying to get lock #15545
Seeing here #6398 (comment) that reports to invalidate are mentioned as well. Might be the same issue. How often do you see this error per day? Are you also using the log importer? Is there any chance you could enable the MySQL error log so we can get more information about the deadlock?
@tsteur, do you have another idea on how to get several archivers synchronized so that only one of them processes a given site?
@pardvm have you tried increasing the sleep to, say, one minute (assuming you launch them hourly)? And I suppose the sleep is executed correctly, with different sleep intervals for each archiver? Looking at the code this should work. Let me know if this doesn't work or if you confirm it's set up like this. I suppose we could also add some minor random sleep interval at the beginning of the script to avoid these issues in general. @pardvm as a result of this, do you also see deadlocks?
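A minimal sketch of how that staggering could be done, assuming a small wrapper script per archiver rather than a change to Matomo itself; the console path and URL below are placeholders:

```php
<?php
// Hypothetical wrapper: each archiver launched at the same time (e.g. by cron)
// gets its own fixed offset in seconds as the first CLI argument, so they do
// not all hit the database at the exact same moment.
$offsetInSeconds = isset($argv[1]) ? (int) $argv[1] : 0;
sleep($offsetInSeconds);

// Placeholder path and URL; adjust for your install.
passthru('php /var/www/matomo/console core:archive --url=https://example.org/');
```

Launching the second archiver with an offset of 60 would match the one-minute gap suggested above.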
@tsteur, I'll keep observing and I'll come back here with the results.
@tsteur,
Awesome @pardvm, fingers crossed. Let me know if they come back otherwise. @eramirezprotec might be worth a try as well in case you're starting multiple archivers at the same time? I will keep this issue open to at least add a tiny random sleep at the beginning of the archive script to prevent this issue a bit better. @diosmosis @mattab thinking of adding some sleep to the beginning of the archiver script, for example:

```php
// we randomly sleep up to half a second in case multiple archivers are being launched at the same time.
// this can prevent deadlocks see https://github.com/matomo-org/matomo/issues/15545 although ideally we should
// sleep here a bit longer to better avoid this.
usleep(Common::getRandomInt(0, 500000));
```

This would randomly sleep between 0 and 0.5 seconds. From the above we can see, though, that this might not be enough and we'd need to sleep randomly maybe between 0 and 10 or 0 and 30 seconds to minimise this issue. Not sure that's a problem? It might still not fully prevent any race conditions, but at least they would be less likely. Fixing the actual race conditions would be tricky. I would prefer not adding a new parameter just for that, as it then doesn't work out of the box. Alternatively, we could have an FAQ describing this behaviour, and we would also need to add a sleep to that example: #4903 (comment). However, FAQs may not be viewed, so if it worked out of the box it would be even better. I thought there maybe was an FAQ already but couldn't find it; https://matomo.org/docs/setup-auto-archiving/ doesn't describe it. So as part of that guide it could maybe be mentioned how to launch multiple archivers with the sleep instead of changing the code. refs #14217
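A minimal sketch of the longer delay discussed above (not the committed change), assuming it would sit at the start of the archiving script; `sleep()` is used instead of `usleep()` because `usleep()` values above one second are not guaranteed to be portable:

```php
use Piwik\Common;

// Spread the start-up of concurrently launched archivers over up to 30 seconds,
// making it less likely that two of them invalidate the same reports at once.
sleep(Common::getRandomInt(0, 30));
```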
@tsteur, P.S. FYI @eramirezprotec is a colleague at work and we are both trying to figure this problem out.
@tsteur, @eramirezprotec
I wonder if it's related to something I've seen yesterday, where we do a lot of deletes on the option table and the general cache: https://github.com/matomo-org/matomo/blob/3.13.2/core/Archive/ArchiveInvalidator.php#L178-L195 As a result, some other requests might try to insert/update these option entries while another one is trying to delete them. As the above runs in a for loop, there could be such edge conditions maybe. @pardvm when this happens it would be great, if you can, to execute this query on the database shortly afterwards: SHOW ENGINE INNODB STATUS. This would maybe let us better understand which lock it was waiting for. If that's not easily possible, you could raise the log level and also make sure to add logging to a file, see https://matomo.org/faq/troubleshooting/faq_115/. If that's not easily possible, you could temporarily change the line linked above.
Hope this helps, and thanks for your help troubleshooting this.
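A minimal sketch of capturing that output right after a failed invalidation, assuming code running inside Matomo where `\Piwik\Db` is bootstrapped; running `SHOW ENGINE INNODB STATUS` directly in the mysql client works just as well (it requires the PROCESS privilege either way):

```php
use Piwik\Db;

// The InnoDB monitor output contains a "LATEST DETECTED DEADLOCK" section
// with the two conflicting statements and the locks they were waiting for.
$rows = Db::fetchAll('SHOW ENGINE INNODB STATUS');
if (!empty($rows)) {
    // The monitor text is returned in the "Status" column of the single row.
    file_put_contents('/tmp/innodb-status-' . time() . '.txt', $rows[0]['Status']);
}
```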
@tsteur, also, I'll evaluate doing the temporary change in the code to flush. Thanks a lot for your support.
@tsteur, hope this helps.
@pardvm thanks very much for this. The lock here actually seems a bit different and not for that particular case:
refs #14619
@tsteur,
We are seeing the same error while running ver 3.13.1:
@Marc-Whiteman,
Cheers for this input. This helps me understand things better. Created #15603 which hopefully fixes it. Feel free to give it a try already if you can. Hoping this can make it into the release on Monday/Tuesday; if not, it'll be in 3.13.4 I reckon. If you could give us feedback on whether that helps, that would be great. Copying some info from the txt in here.
This should hopefully be fixed. Let us know if it still happens with the latest release and I'll reopen.
@tsteur,
Sorry about that @pardvm. By the looks of it, this might be a different deadlock issue now.
We actually had this issue yesterday as well:
Not sure you still need it, but...
We discussed yesterday that it may help to prefix these invalidate-report keys with random characters so they might reserve a different index gap. So the goal of this issue would be to prefix these values with a random 4-character prefix.
As it's all contained in the ArchiveInvalidator class, this might be easy to do, and we can then see how often deadlocks still happen.
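A minimal sketch of that prefix idea, assuming a hypothetical helper; the real option-name format and the exact place the prefix would be applied inside ArchiveInvalidator would need to match the existing code:

```php
use Piwik\Common;

// Hypothetical helper: prepend a random 4-character prefix to the
// "report to invalidate" option name, so concurrent archivers insert into
// different index gaps instead of all contending for the same gap lock.
function buildRandomlyPrefixedOptionName($baseOptionName)
{
    $characters = 'abcdefghijklmnopqrstuvwxyz0123456789';
    $prefix = '';
    for ($i = 0; $i < 4; $i++) {
        $prefix .= $characters[Common::getRandomInt(0, strlen($characters) - 1)];
    }
    return $prefix . '_' . $baseOptionName;
}
```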
fyi, not really seeing a reduction in the number of deadlocks since applying #15666, but it's hard to say as we don't get many anyway. Further improvement should come from the archive refactoring, which should mostly avoid this issue in the first place by no longer executing these queries.
Moving this issue for now to 3.13.5, as #15666 in combination with the archive refactoring might fix the issue. It would be good to hear once 3.13.4 is released whether this works.
With the recent 3.13.5 patches we haven't seen any deadlocks yet. Closing this for now. Let me know if this is still an issue after 3.13.5.
Hello, Command:
Any ideas how to resolve this? Thanks!
Hi!
I'm getting this error every day since I updated Matomo to version 3.13.1:
INFO [2020-02-09 04:07:32] 9595 Failed to invalidate archived reports: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction
I found some tickets about the deadlock situation but none of them speak about the deadlock happening when trying to invalidate archived reports.
Thank you very much.