Improve multi threading in archiving to be more linear #5363
Maybe this could also fit into this topic: …
See the related ticket where this originates from: #5396
Maybe we need a new parameter …
Yes, I think that is exactly what is needed. It would also be great if this parameter limited the total number of threads, i.e. we could process 10 segments, or 10 sites, or a mix in any other proportion.
Of course we like that idea :-)
It's a bit more complicated than we initially thought. I'm moving this to 2.8.0 to prevent a change in …
Moved to 2.10.0 as we don't have enough time left.
I'm moving this out of the current milestone, because we need to think a bit more about this project, in particular our mid-term goals around scheduling archiving jobs. Notes:
Maybe this issue depends on the discussion in #6638
I guess what we need here is a simple new plugin that implements a job queue for certain …
@mattab I don't understand how the work queue is going to help. Actually, I think I don't understand the issue: from what I could gather from the ticket, the problem is too many threads / not enough control over those threads?
Right now each core:archive script will trigger 1-N new calls, and there could already be some running in the background, which could overload the server. Having a queue (e.g. FIFO) gives us the ability to process only 1 or N jobs at a time (at most), giving us a controlled environment. Does that make sense?
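To make the idea concrete, here is a minimal sketch of such a queue backed by a database table, so that every core:archive invocation enqueues work instead of spawning its own processes. The table name (`archive_job_queue`), its columns, and the function names are assumptions for illustration only, not Matomo's actual schema or API:

```php
<?php
// Hypothetical FIFO job queue for archiving requests (sketch only;
// archive_job_queue is an assumed table, not part of Matomo's schema).
$pdo = new PDO('mysql:host=localhost;dbname=piwik', 'user', 'pass');

// Each core:archive invocation enqueues its archiving URLs here
// instead of firing off processes directly.
function enqueueJob(PDO $pdo, string $url): void
{
    $pdo->prepare(
        "INSERT INTO archive_job_queue (url, status, created_at)
         VALUES (?, 'pending', NOW())"
    )->execute([$url]);
}

// A worker atomically claims the oldest pending job (FIFO), so several
// workers can run in parallel without picking up the same job twice.
function claimNextJob(PDO $pdo): ?array
{
    $pdo->beginTransaction();
    $stmt = $pdo->query(
        "SELECT id, url FROM archive_job_queue
         WHERE status = 'pending'
         ORDER BY created_at ASC
         LIMIT 1 FOR UPDATE"
    );
    $job = $stmt->fetch(PDO::FETCH_ASSOC);
    if ($job) {
        $pdo->prepare("UPDATE archive_job_queue SET status = 'running' WHERE id = ?")
            ->execute([$job['id']]);
    }
    $pdo->commit();
    return $job ?: null;
}
```

With a queue like this, "process 1 or N jobs at a time" becomes a matter of how many workers are allowed to claim jobs, rather than how many processes each script happens to spawn.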
Thanks, I see the point of the queue now.
Is it because core:archive doesn't wait for the requests to finish processing? Or is it because we can run multiple core:archive in parallel (but in that case this is a problem we create ourselves)? Or is there another reason why "there could be already some running in the background" (assuming archiving in the browser is disabled)?
There are several reasons this could happen, for example if users add 100 sites overnight, or many segments (see also #7483)... It is by design that a user can trigger several archiving scripts, e.g. to make better use of multiple CPUs on the server. When a script is already running (it could run for days), blocking other core:archive calls would mean that data for 'yesterday', 'today', etc. may be missing. That would not be a good solution, which is why I think we need some kind of Job Queue and then a way to order the jobs (e.g. FIFO or some other logic we decide at the time). I think I will leave it in …
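As one sketch of what "some other logic" for ordering could look like on top of the queue above: jobs for recent periods could take priority over backfill jobs, so a long-running script never starves 'yesterday'/'today' data. The priority scheme and the `period`/`created_at` fields are assumptions for illustration:

```php
// Hypothetical ordering policy (sketch): recent-period jobs jump ahead
// of backlog jobs; within the same priority, FIFO order is kept.
function jobPriority(array $job): int
{
    // Lower value = processed first; 'day' covers 'today'/'yesterday'.
    return $job['period'] === 'day' ? 0 : 1;
}

usort($jobs, function (array $a, array $b): int {
    return [jobPriority($a), $a['created_at']]
       <=> [jobPriority($b), $b['created_at']];
});
```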
Not needed for now.
Currently we can manually trigger many core:archive commands, and each of them triggers a separate process for archiving idsites using a common queue. However, each of those commands can also spawn up to 3 (or more, if changed in the file) processes computing segments. As a result, we cannot trigger as many archiving processes for idsites as we would like, because in the worst case we end up with 3x more processes computing segmented data at the same time. It would therefore be good to have a common limit on how many processes can be spawned in total. Basically, it should work the same for idsites as it works for segments now. That way we could set an upper limit on the total number of processes, regardless of whether they are working on idsites or segments. It would also make managing the number of processes easier: instead of multiple lines in the crontab, you would change a single parameter to increase the number of threads.
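To illustrate what a single global limit could look like, here is a rough worker loop combining the queue sketch above with one shared cap. The constant name and the `processJob()` helper are hypothetical; the point is only that all 'running' jobs are counted together, whether they archive idsites or segments:

```php
// Hypothetical global cap on concurrent archiving processes (sketch).
// The value would come from a single new parameter instead of per-script
// segment limits; processJob() stands in for the actual archiving request.
const MAX_CONCURRENT = 10;

function countRunningJobs(PDO $pdo): int
{
    return (int) $pdo->query(
        "SELECT COUNT(*) FROM archive_job_queue WHERE status = 'running'"
    )->fetchColumn();
}

// Every core:archive invocation runs this same loop, so the cap bounds
// the total number of concurrent jobs across all invocations, regardless
// of whether a job archives an idsite or a segment.
while (true) {
    if (countRunningJobs($pdo) >= MAX_CONCURRENT) {
        sleep(1);      // all slots taken; wait for one to free up
        continue;
    }
    $job = claimNextJob($pdo);
    if ($job === null) {
        break;         // queue drained, nothing left to do
    }
    processJob($job);  // hypothetical: performs the archiving request
    // (a real implementation would mark the job 'done' afterwards and
    // guard against the race between counting and claiming)
}
```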