Improve multi threading in archiving to be more linear #5363
Maybe this could also fit into this topic: …
See the related ticket where this originates from: #5396
Maybe we need a new parameter …
Yes, I think that is exactly what is needed. It would also be great if this parameter limited the total number of threads, i.e. we could process 10 segments, or 10 sites, or a mix in any other proportion.
Of course we like that idea :-)
It's a bit more complicated than we initially thought. I'm moving this to 2.8.0 to prevent a change in …
Moved to 2.10.0 as we don't have enough time left.
I'm moving this out of the current milestone, because we need to think a bit more about this project, in particular our mid-term goals around scheduling archiving jobs. Notes:
Maybe this issue depends on the discussion in #6638
I guess what we need here is a simple new plugin that implements a job queue for certain …
@mattab I don't understand how the work queue is going to help. Actually, I think I don't understand the issue: from what I could gather from the ticket, the problem is too many threads / not enough control over those threads?
Right now each core:archive script will trigger 1-N new calls, and there could already be some running in the background, which could overload the server. Having a queue (e.g. FIFO) gives us the ability to process only 1 or N jobs at a time (at most), giving us a controlled environment. Does that make sense?
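To make the idea concrete, here is a minimal sketch of such a queue backed by a database table, so that every core:archive invocation enqueues work instead of spawning its own processes. The table name (`archive_job_queue`), its columns, and the function names are assumptions for illustration only, not Matomo's actual schema or API:

```php
<?php
// Hypothetical FIFO job queue for archiving requests (sketch only;
// archive_job_queue is an assumed table, not part of Matomo's schema).
$pdo = new PDO('mysql:host=localhost;dbname=piwik', 'user', 'pass');

// Each core:archive invocation enqueues its archiving URLs here
// instead of firing off processes directly.
function enqueueJob(PDO $pdo, string $url): void
{
    $pdo->prepare(
        "INSERT INTO archive_job_queue (url, status, created_at)
         VALUES (?, 'pending', NOW())"
    )->execute([$url]);
}

// A worker atomically claims the oldest pending job (FIFO), so several
// workers can run in parallel without picking up the same job twice.
function claimNextJob(PDO $pdo): ?array
{
    $pdo->beginTransaction();
    $stmt = $pdo->query(
        "SELECT id, url FROM archive_job_queue
         WHERE status = 'pending'
         ORDER BY created_at ASC
         LIMIT 1 FOR UPDATE"
    );
    $job = $stmt->fetch(PDO::FETCH_ASSOC);
    if ($job) {
        $pdo->prepare("UPDATE archive_job_queue SET status = 'running' WHERE id = ?")
            ->execute([$job['id']]);
    }
    $pdo->commit();
    return $job ?: null;
}
```

With a queue like this, "process 1 or N jobs at a time" becomes a matter of how many workers are allowed to claim jobs, rather than how many processes each script happens to spawn.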
Thanks, I see the point of the queue now.
Is it because core:archive doesn't wait for the requests to finish processing? Or is it because we can run multiple core:archive in parallel (but in that case this is a problem we create ourselves)? Or is there another reason why "there could be already some running in the background" (assuming archiving in the browser is disabled)?
There are several reasons this could happen, for example if users add 100 sites overnight, or many segments (see also #7483)... It is by design that a user can trigger several archiving scripts, e.g. to make better use of multiple CPUs on the server. When a script is already running (it could run for days), blocking other core:archive calls would mean that data for 'yesterday', 'today', etc. may be missing. That would not be a good solution, which is why I think we need some kind of Job Queue and then a way to order the jobs (e.g. FIFO or some other logic we decide at the time). I think I will leave it in …
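As one sketch of what "some other logic" for ordering could look like on top of the queue above: jobs for recent periods could take priority over backfill jobs, so a long-running script never starves 'yesterday'/'today' data. The priority scheme and the `period`/`created_at` fields are assumptions for illustration:

```php
// Hypothetical ordering policy (sketch): recent-period jobs jump ahead
// of backlog jobs; within the same priority, FIFO order is kept.
function jobPriority(array $job): int
{
    // Lower value = processed first; 'day' covers 'today'/'yesterday'.
    return $job['period'] === 'day' ? 0 : 1;
}

usort($jobs, function (array $a, array $b): int {
    return [jobPriority($a), $a['created_at']]
       <=> [jobPriority($b), $b['created_at']];
});
```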
Not needed for now.
Currently we can manually trigger many core:archive commands, and each of them triggers a separate process for archiving idsites using a common queue. However, each of those commands can also spawn up to 3 (or more, if changed in the file) processes computing segments. As a result, we cannot trigger as many archiving processes for idsites as we would like, because in the worst case we end up with 3x more processes computing segmented data at the same time. It would therefore be good to have a common limit on how many processes can be spawned in total. Basically, it should work the same for idsites as it works for segments now. That way we could set an upper limit on the total number of processes, regardless of whether they are working on idsites or segments. It would also make managing the number of processes easier: instead of multiple lines in the crontab, you would change a single parameter to increase the number of threads.
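To illustrate what a single global limit could look like, here is a rough worker loop combining the queue sketch above with one shared cap. The constant name and the `processJob()` helper are hypothetical; the point is only that all 'running' jobs are counted together, whether they archive idsites or segments:

```php
// Hypothetical global cap on concurrent archiving processes (sketch).
// The value would come from a single new parameter instead of per-script
// segment limits; processJob() stands in for the actual archiving request.
const MAX_CONCURRENT = 10;

function countRunningJobs(PDO $pdo): int
{
    return (int) $pdo->query(
        "SELECT COUNT(*) FROM archive_job_queue WHERE status = 'running'"
    )->fetchColumn();
}

// Every core:archive invocation runs this same loop, so the cap bounds
// the total number of concurrent jobs across all invocations, regardless
// of whether a job archives an idsite or a segment.
while (true) {
    if (countRunningJobs($pdo) >= MAX_CONCURRENT) {
        sleep(1);      // all slots taken; wait for one to free up
        continue;
    }
    $job = claimNextJob($pdo);
    if ($job === null) {
        break;         // queue drained, nothing left to do
    }
    processJob($job);  // hypothetical: performs the archiving request
    // (a real implementation would mark the job 'done' afterwards and
    // guard against the race between counting and claiming)
}
```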