Add possibility to run multiple archivers in parallel #4903
Comments
In 5fcbc09: refs #4903 started to work on the possibility to run multiple archivers in parallel for faster archiving. There were multiple issues; for instance, arrays of siteIds were read and written in Options, but Options caches all values in a class property, so an update of an option does not get picked up by another running archiver. Also, all sites were reprocessed because of the time_before_today_archive_considered_outdated setting if the last archiving by another archiver was 10 seconds or longer ago. To prevent this, only maintaining a list of to-be-processed siteIds in the db / filesystem helps so far.
Great idea!!
Maybe this is not only interesting for archiving, but also for other processes in Piwik...
I have opened a new global ticket for the collection of ideas:
To run, for instance, 10 archivers in parallel, one can execute the following command:
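(The exact command is not preserved in this thread; the sketch below shows one plausible way to launch 10 archivers, assuming the standard `./console core:archive` command. The `--url` value and log paths are placeholders.)

```bash
# Sketch only: launch 10 core:archive workers in the background and wait for all of them.
# The --url value and log file paths are placeholders, not taken from the original comment.
for i in $(seq 1 10); do
  ./console core:archive --url=http://your-piwik-domain/ \
    > /var/log/piwik-archive-$i.log 2>&1 &
done
wait  # block until all 10 archivers have finished
```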
FYI: it is not a good idea to do this if your archiving tables use the MyISAM engine. More on this will follow.
In a0acaac: refs #4903 removing this lock as it does not work anyway. After locking the table, the lock will be immediately released when inserting the new archiving id, as there is a second GET_LOCK call. According to the MySQL documentation, and as tests have proven, a lock is released when you execute a new GET_LOCK().
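(For reference, the behaviour described above can be reproduced from a single connection with the `mysql` client. This is a sketch only: the lock names are placeholders, and the release-on-second-GET_LOCK behaviour applies to MySQL versions before 5.7.5, which allow at most one user-level lock per session.)

```bash
# All three statements run on one connection; on pre-5.7.5 MySQL the second
# GET_LOCK() silently releases the first lock. Lock names are placeholders.
mysql -e "SELECT GET_LOCK('archive_lock_a', 10);  -- returns 1, first lock acquired
          SELECT GET_LOCK('archive_lock_b', 10);  -- acquiring a second lock...
          SELECT IS_FREE_LOCK('archive_lock_a');  -- ...returns 1: the first lock was released"
```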
Ran some tests in a setup with 2 nodes: one node running PHP and one node running the DB. In this first test the PHP node was rather weak (2 cores, 8 GB RAM) and it was not worth running more than 2 workers; there was no performance improvement when running 3 or 4 workers. Here it took about 40 hours to archive 4k websites with 1 worker and 20 hours with 2 workers (estimated). After running everything (PHP + DB) on the powerful DB node (16 cores, 64 GB RAM), this changed. It was easily possible to run 12 workers (maybe even 16 or 20 would work, depending on the amount of data etc.). Here it took about 8 hours to archive 4k websites with 4 workers, 4.5 hours with 8 workers, and 3.5 hours with 12 workers (estimated). More information:
Such a job well done, Thomas!
…le archivers in parallel for faster archiving. There were multiple issues; for instance, arrays of siteIds were read and written in Options, but Options caches all values in a class property, so an update of an option does not get picked up by another running archiver. Also, all sites were reprocessed because of the time_before_today_archive_considered_outdated setting if the last archiving by another archiver was 10 seconds or longer ago. To prevent this, only maintaining a list of to-be-processed siteIds in the db / filesystem helps so far
…ide a better solution later
…er uses the correct log level in case the logger instance was already created; if the test proxy is used, do not go two directories upwards
…g writers hoping no test fails because of this
…fter locking the table, the lock will be immediately released when inserting the new archiving id, as there is a second GET_LOCK call. According to the MySQL documentation, and as tests have proven, a lock is released when you execute a new GET_LOCK()
…ncurrent requests. This is set to three concurrent processes in the core:archive command. Refs matomo-org#4903
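(If the option kept this name when the change landed, the per-website concurrency can presumably be tuned when invoking the command. The flag name below is an assumption based on later Piwik/Matomo releases, not confirmed by this thread.)

```bash
# Assumption: the option name matches later releases of the core:archive command.
./console core:archive --url=http://your-piwik-domain/ --concurrent-requests-per-website=3
```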
Add possibility to run multiple archivers via CLI in parallel to make archiving of >10k websites faster.
If a server has enough power / resources, a user will be able to start, for instance, 5 archivers in parallel and they will all archive different site IDs. Currently, all started archivers would archive the same site IDs.