Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRM-18011 improve lock handling for mysql 5.7.5+ #11720

Closed
wants to merge 1 commit into from

Conversation

eileenmcnaughton
Copy link
Contributor

@eileenmcnaughton eileenmcnaughton commented Feb 26, 2018

Overview

Improve mysql lock support for mysql 5.7.5

Before

Prior to mysql 5.7.5 GETting a mysql LOCK will release other locks held by the process. In order to not do that for the (all important) civimail worker lock we bypass calling GET LOCK if that process has already got a lock.

After

We only bypass getting the lock for maria DB & mysql versions prior to 5.7.5.

Technical Details

From https://dev.mysql.com/doc/refman/5.7/en/miscellaneous-functions.html#function_get-lock

Before 5.7.5, only a single simultaneous lock can be acquired and GET_LOCK() releases any existing lock.

In MySQL 5.7.5, GET_LOCK() was reimplemented using the metadata locking (MDL) subsystem and its capabilities were extended. Multiple simultaneous locks can be acquired and GET_LOCK() does not release any existing locks. It is even possible for a given session to acquire multiple locks for the same name. Other sessions cannot acquire a lock with that name until the acquiring session releases all its locks for the name.

Comments

Also from the documentation (but outside scope of this change)

"GET_LOCK() is unsafe for statement-based replication. A warning is logged if you use this function when binlog_format is set to STATEMENT."


5.7.5+ supports multiple mysql threads in a process. We can be non-hacky once that is in play
* @return string
*/
public static function getDatabaseVersion() {
return CRM_Core_DAO::singleValueQuery('SELECT VERSION()');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First reaction -- this seems like something to cache in Civi::$STATICS[__CLASS__]['version']. But maybe it makes more sense to grep and see how often acquire() is used:

  • Most calls to acquire() deal with backend jobs. The impact of a few select version() queries seems pretty small in that context. This observation pushed me toward "Don't care/doesn't matter."
  • There is an acquire() call inCRM_Core_BAO_Cache::setItem(). That seems like it might get hit multiple times, although I can't really say it's a big deal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - I'm leaning towards "Don't care/doesn't matter." since I don't have real skin in this - I implemented something a bit similar to this in Omnipay & put up a PR for core at the same time since this has been a missing fix for a while

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I mean caching would make sense but I don't think this change needs to require that - it could happen at any point when someone feels like adding it)

@eileenmcnaughton
Copy link
Contributor Author

@totten is this good to merge? I'm not that inclined to change the caching at this stage - but if it's the only blocker I can

@eileenmcnaughton
Copy link
Contributor Author

@totten - I think this is good to merge & I am aware of some shops that would benefit from it (@lcdservices I think) but I want to get it closed or merged. If it can't be reviewed to the point of merging then I will close & let someone else re-open it when they have hit the locks failures & found the fix (hopefully) via jira

@eileenmcnaughton
Copy link
Contributor Author

Closing this since it has not been reviewed & I don't think it makes sense to keep it open without any evidence of anyone being willing to review it

@xurizaemon
Copy link
Member

xurizaemon commented May 13, 2018

Dropping some thoughts from chat.civicrm.org here for posterity or whatever.

Looking at the above & the lock use, I have a few questions and thoughts. I don't feel I fully understand our motivation for using db locks to handle this (versus a DB semaphore etc).

  1. I think the above PR could be expanded to permit multiple locks on MariaDB also. If I read their docs right, MariaDB has multiple named locks since 10.0.2. But above currently says no multilocks for MariaDB ever, if I read it right? (And then there's Percona too, right? Version detection doesn't feel right here.)

  2. I think the job naming needs to include the site, or the current structure will run into locking issues where multiple CiviCRM sites are hosted on the same server. MySQL locks are server-wide and CiviCRM's lock naming is job-based (not site:job based), so it seems to me that servers with multiple CiviCRM installs would compete for the same lock names. (cc @torrance @petednz on this specifically)

  3. Keep in mind that a second get_lock() will automatically release any currently held lock for that DB connection. I don't know if that will come up in "normal" program flow, but if that locking mechanism is widely used as a generic utility function, an extension which used locking (say in a hook?) or some other subprocess might then mess with CiviCRM's locking and make it release too early?

Short story, I think the semaphore approach as used by Drupal etc might be more robust than trusting the DB to do it for us with GET_LOCK() (and semaphore read/write likely would have comparable performance to SELECT VERSION() + SELECT GET_LOCK()). Semaphores don't have unexpected behaviour in passing (releasing existing locks), don't spread outside the current site (db-specific not server-specific), and don't require version-matching (version-sniffing feels so wrong in 2018). Which leads me to think that I must be missing some understanding as to why CiviCRM uses GET_LOCK() instead, esp if we find it to be problematic.

I know this is general to the locking approach in CiviCRM in general, and not to this specific currently-closed PR which might even have improved things, but I don't have a better place to drop this so here it is 😁

@eileenmcnaughton
Copy link
Contributor Author

I think the issues you list above all affect mysql < 5.7.5.

Once on 5.7.5 then it's possible to have named locks & hence they can be limited to one site & not inadvertently released - (although we don't actually release them pre 5.7.5 - we treat anything other than civimail as something that could continue without a lock at a pinch & civimail as a 'true' lock).

If this PR were reviewed & merged it should mean that on 5.7.5+ the mysql locks would work fine

(as an aside there is a civimail-lock-per-site setting somewhere)

@xurizaemon
Copy link
Member

xurizaemon commented May 13, 2018

I've tried to link to supporting docs for each issue raised. The screenshot below seems to confirm this for me.

MySQL docs appear to be quite specific about the issue of locks being server-wide (point 2 above, links etc). The below appears to confirm this - root/mysql is competing with test/test for the same lock name. I don't see that CiviCRM's lock naming (again, link above) handles this at all. But, you know it far better than me.

image

Maybe I should raise these over in Gitlab, feels like we're talking past each other here.

@eileenmcnaughton
Copy link
Contributor Author

Yes - mysql is server wide - but Civi can prefix domain already I think

@xurizaemon
Copy link
Member

xurizaemon commented May 13, 2018

got it!, CiviCRM does already handle (2). You were right about the releasing issue I raised in (3) changing behaviour with 5.7 (not sure about other MySQL derivatives), and (1) only applied to this PR. Thanks, I learned a couple things here about MySQL locking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants