Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvement: Multithreaded BPM analysis #8686

Closed
mixxxbot opened this issue Aug 23, 2022 · 13 comments
Closed

Improvement: Multithreaded BPM analysis #8686

mixxxbot opened this issue Aug 23, 2022 · 13 comments
Labels
Milestone

Comments

@mixxxbot
Copy link
Collaborator

Reported by: JosepMaJAZ
Date: 2016-11-11T16:14:20Z
Status: Fix Released
Importance: Wishlist
Launchpad Issue: lp1641153


The current implementation of the BPM and graphic analysis (analysis view) is single threaded (one file at a time).

While it is good that it doesn't hog the CPU for the rest of the application, it's not that good that the CPU usage stays at 20% on a high end machine.

My proposed solution (initial, previous to verify how it all is coded) is like this:

  • Get a thread-synchronized cue of tasks to do (the easiest model is simply the list of tracks selected to analyze).

  • Calculate number of threads that the processor has.
    There are several ways to get this information, depending on the platform, but the standard way is this: http://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency

Note: As a safety measure, programs that can use multiple threads tend to allow the user to specify it explicitly. It could be added as an option in the BPM preference page of Mixxx settings. 0 = automatic >0 = use that value. There would always be at least one thread.

  • Prepare a stack of threads filled with "hardware_concurrency-1" threads. All them with idle thread priority.
    That is, we will have at much Max-1 threads for analysis, so there's always one processor thread free for the rest of the application.
    This could be improved as in: If mixxx is not playing, use all available threads. If it is playing, max-1.

  • Modify the track analysis done on track load to use this stack of threads. Concretely, ensure that loading tracks use the same thread limit. If all threads are in use, discard one that is doing the backgroud analysis and assign it to the track(s) being loaded.

  • When a thread finishes doing its task (i.e. finishes the track analysis), it "pop"s a new tasks from the task list and starts processing it.
    If no more tasks are ready, the thread ends.

This would require also a small change in the UI, in how we report the current track being analyzed.

We could also study if it would be good to first do the bpm/grid/replaygain analysis, and when this finishes for all tracks, do the graphic analysis of those same tracks. the UI message could indicate "Part 1/2, Part 2/2)" or any other message to let the user know that this is a two step action.

@mixxxbot
Copy link
Collaborator Author

Commented by: JosepMaJAZ
Date: 2016-11-12T17:26:07Z


I've taken a look at the code.

This is what we have now:

  • A class AnalyzerQueue that extends from QThread and contains a QQueue of TrackPointer to process, and a QList of Analyzer* to use.

  • Two instances of this class (i.e. two threads): One used by the library (playlist, crates and analyzer view, created in AnalyzerFeature) and the other used by the engine (decks, samplers and preview deck, created in PlayerManager). It probably is this way since the option to not do the waveform analysis when doing the batch analysis was added in 2.0 (player always require waveform analysis).

  • When the thread runs, it loads one track, prepares all the analyzers and does the analysis in blocks of samples (currently 4096 frames).
    It can be stopped in any iteration of this loop.

  • There's a priorization option which makes a track to be analyzed immediately, stopping an ongoing track analysis if it needs to. It is currently unused (Because it has two independent threads), but the implementation is cumbersome. Instead of adding the new track to the front of the queue, it checks all the tracks to see if any of them is a track from a deck and use that instead of the one that is next in the queue. (a QQueue is a QList after all).

  • While this runs, some signals are emited, so that different screen updates can happen.

  • When it finishes, each analyzer saves its data to the track and other signals are emited. One of these signals calls analyzerqueue.stop() and the thread ends.

_
_
This is how we could change it:

  • New class AnalyzerWorker that will have most of the functionality of the AnalyzerQueue (i.e. it will be the thread and contain a list of analyzer pointers).

  • Modify the class AnalyzerQueue so that it becomes the manager of the workers. It contains the instances of the analizers, the queue of tracks to process and the pool of workers. Add the feature to calculate the number of threads of the CPU and/or read the max threads from the configuration.

-AnalyzerQueue becomes a single-instance class and so playermanager and library will use again the same instance.

-AnalyzerQueue contains a method to add to the queue, with a priority boolean parameter. If the amount of workers in the pool is less than the maximum number of threads allowed, add a new one and let it run. (We can assume that if there is any other worker, it is working already).
If the track was marked with priority, it is added in front of the the track queue, selects one worker from the pool and tells it to skip the current work and start with the next in the queue. (if one worker has just been created, do nothing, as it will just get the track once the semaphore lets it run)

  • The run and analyze methods will not change much, except to cleanup part of the old priority code.

  • Signals and slots need to be adapted (or we need to call analyzerqueue from the workers to emit the existing ones)

  • Worker threads will die just like they do now when they have no more work to do.

@mixxxbot
Copy link
Collaborator Author

Commented by: JosepMaJAZ
Date: 2016-11-12T17:33:52Z


I forgot to mention one thing.

The idea of the worker having only pointers to the analyzers is not so much an efficiency decision as it is a decision to dynamically allocate the analyzers for a concrete track.

This way, when we prioritize one track (which we assume comes from the playermanager), the worker that will take it can be configured to include the waveform analyzer. The exact way to implement this can change, but maybe the queue of tracks could have an instance of another class that could contain the priority flag.

We cannot assume that we can always get a worker and make it skip its work. We should only do that on workers that are working on a non-prioritized track (think about loading three decks on a 2 core machine).

@mixxxbot
Copy link
Collaborator Author

Commented by: rryan
Date: 2016-11-12T19:38:43Z


Sounds good to support a thread pool of analyzers dequeueing work from the pool. The UI interaction is a little tricky so it'd be nice to discuss that a little more before starting work on implementation.

I am worried about the plan of sizing the analyzer thread pool based on the number of cores. My concern is that no matter what our heuristic, it will go wrong for some folks. For these people, a problem like this can make Mixxx unsuitable for live use (e.g. Bug #⁠529614). To this end, I think we should expose the # of analyzer threads as a setting to the user in Preferences so they have a recourse if something goes wrong. Maybe we can set the default based on # of available hardware threads.

Some thoughts on thread priorities: In 2014 we re-organized our thread priorities into priority classes, Bug #⁠1270583.

QThread::setPriority(QThread::IdlePriority) translates to pthread SCHED_IDLE policy with priority 0 (BSD, OSX and Linux) and THREAD_PRIORITY_IDLE on Windows. Let's refer to concrete details when discussing priority changes. Anything above QThread::IdlePriority still uses the thread's original scheduling policy (e.g. SCHED_OTHER) with pthreads.

We have had problems with IdlePriority in the past causing tracks to take a long time to analyze (e.g. we had one user report analyzing a 3 minute track took 1 hour IIRC). I would use caution here. Ultimately, the "priority group" scheme we introduced in Bug #⁠1270583 addresses expressing the relative importance of Mixxx threads to the OS. I'm not sure we need to do more.

Setting thread priorities (especially SCHED_IDLE policy) on a CPU bound thread can also have unintended consequences of causing excessive thread pre-emption which can slow down analysis massively because every pre-emption causes CPU caches and TLB to flush.

@mixxxbot
Copy link
Collaborator Author

Commented by: daschuer
Date: 2016-11-12T20:57:31Z


One more thing to consider is, that the track analysis needs a CPU core, memory access and HDD access, which is shared with Sqlite and CachingReaders.

We have already users complaining about audio dropouts when analyzing the track on the fly.

The assumption: "If it is playing, max-1" is probably wrong, since Mixxx uses already several threads.

Multitheading analysis will be a great benefit when Mixxx is not playing.
Than we can use all Hardware recourses.

We may also consider to split the analysis itself into parallel task. Only one should require HDD.

Has anyone OpenMP experience? This could also be interesting for us.

@mixxxbot
Copy link
Collaborator Author

Commented by: MelGrubb
Date: 2016-11-14T12:50:59Z


As an end user, I think that keeping total resource usage low during a performance is the most important thing. From a very high level, I would think the prioritization logic should be more along the lines of:

If a track is currently playing, stop processing from the queue and do only on-demand analysis when a track is loaded. Alternately, limit to a user configurable max thread value, defaulted to 1.
If there is no track currently playing, use all available resources.

Then again, I don't understand why anyone would load up the analysis queue while performing. That is something I only do offline, and usually only when I know I've recently added a lot of new music to the computer. I rescan the library, select all "new" and hit Analyze. Then I go get dinner or watch TV for a while.

@mixxxbot
Copy link
Collaborator Author

Commented by: JosepMaJAZ
Date: 2016-11-14T20:24:33Z


Let's put everything in context first:

A) The batch analysis of tracks is only performed when the user explicitly presses analyze on the analyze view.
B) This analysis can be stopped if needed by pressing the same button that started it.
C) We are trying to move from a single thread analysis to a multithread analysis where the CPU resources are maximized (as in used if available).

D) The "on demand" analisys is that which happens when the track is loaded into a deck, and such track has not been analyzed yet.

If an user does "A" during a live performance, we can only provide a degree of glitch-free response dependant on the OSs ability of priorizing threads. And if it is not enough, the user has option "B".

About "C", this is about maximizing performance so we are interested in using the resources, if they are available. It is usual to offer the option to the user to change the amount of threads to use in case the detected setting isn't working as good as it should.

In this context, we can only think about "D" as being a "live performance" case, and the truth is that this case is not changing, because that is one single analysis. I am not proposing to do each analysis type into a different thread, since that requires thread synchronization and most probably the waveform analysis takes most of the time, so the whole analysis time would not improve much. (We might try that at some point, but it's not the main idea in this bug)
_
_
_
Now, how would I do it?

If there isn't any value for our "multithreaded_analysis" setting, or its value is zero, calculate the number of threads and set it.
We have two options to make this available to the user:
A) have an editbox to enter the value, and use always this value, except if it is less than 1 or higher than max-detected-threads, where we would change it to max-detected-threads and show it to the user that we changed it.
B) have an editbox to enter the value and another editbox, non-editable, that shows the actual number of threads that we are going to use, and use always this value, except if it is less than 1 or higher than max-detected-threads, where we would show to that we use max-detected-threads.
_
_
_
The process itself, as i described above consists of generating the desired amount of threads, and have them "consume" an initially filled queue of tracks.
All threads work independently and the queue is guarded so that only one thread at a time can update it.

In this context, thread priority was supposed to play a role since this is by definition a "long-running" task, and we still want to continue working on other parts of the application.
As said by RJ Ryan, we should not use idle priority since that doesn't work too well (seems that it's more geared to servers, where these tasks would only work when there is no user interaction).

Still, we can give hints with LOW_PRIORITY, NORMAL_PRIORITY and HIGH_PRIORITY.
Since we are not necessarily blocking other things, we should not find ourselves with problems. (Not sure about that cachereaderworker... )

About the amount of threads: The amount of threads only matters when they need to do something, since it will take longer for a lower priority thread to get its share of CPU if other more priorized threads are active.
But then, if a thread doesn't have anything to do (as in it is sleeping, waiting for events, synchronizing, or otherwise not ready to run), we are not making anything go slower by having a low priority thread run for the whole time-slice.

The real problem happens if we need an immediate response. An OS, except for interruptions, will not pause a running thread if it hasn't had it's time-slice. Since this is dependant on the OS, our priorized task might need to wait for a whole time-slice to start (and that's when there's no other similarly priorized task or other types of locks).

At last, when i said about reducing by one the number of threads if a track is playing, it is only so that in that case we know it is going to require some extra CPU. (And we UI graphics count too)
There's yet another option to play nicely with the user, reduce the number of threads to use if we detect an audio xrun. Do so until there is just one thread.
_
_
_
I started looking at how are threads best done with QT, and the more i read about, the more I'm convinced that I can only use "barebones" QThread (just like we do now). We need a Per-thread instance of analyzers (but not a per-task one), we have a constant number of threads and we want them all to do the same, taking as input a synchronized queue of tracks to process.
I.e. We are not in a scenario of list of black-box-like tasks to do (that include everything they need in order to acomplish that task) that are passed to a waiting thread that just tells them that they can run.

@mixxxbot
Copy link
Collaborator Author

Commented by: JosepMaJAZ
Date: 2016-11-25T20:47:14Z


I started working on this some days ago and I've got what I consider to be the workers ready.
Now I have to implement the worker manager, which is the one that will create the threads and feed them with the tracks to analyze.

It was a bit complicated to understand correctly how do the slots/signals operate in a multithreaded scenario, and combine them correctly with mutex, waitconditions and the likes.

I've done it this way:

  • Manager creates the worker and thread and assigns them (todo)
  • worker creates the analyzers (each worker needs their own instances) when starting (not in the constructor) sends a signal to the manager to ask for a track and waits on a waitcondition.
  • The manager receives the message and calls the worker directly feeding it one track (todo. The manager will know the worker because the worker itself tells how it is with the message).
  • The worker wakes up and processes the track. It sends signals to the manager to inform about the progress.
  • The manager receives the progress signals, and sends more signals for UI update.
  • The worker finishes analyzing the track and sends again the message to the manager that it needs a new track.
  • If the manager has no more tracks to analyze, it calls the worker method to stop.
  • The worker then sends a signal to the manager to indicate that it is finishing, and also sends a signal to the thread and they both clean up automatically.

I.e. threads will be alive only while they are required.

The scenario of prioritizing tracks (i.e. analysis of the player tracks) goes as follow:

  • If the manager has allocated max number of threads, take one of the existing workers and tell it to pause. Pause means wait on a waitcondition similarly to what they do to get a new track. The granularity is each processed buffer.
  • Then, it allocates a new worker, which is then configured for the foreground analysis type (i.e. ensuring waveform analysis is on). This worker will work similarly to other workers and when it finishes, it will ask too for a new track.
  • If there is no more priority tracks to analyze, the manager will tell the priority worker to stop and cleanup, and then it will tell the paused worker to continue.

Most of the manager part is still on todo, but the parts on the worker that the manager will use are already there.

Hope this will be good enough and not too complicate to understand either (hey.. it wasn't that easy right now anyway)

@mixxxbot
Copy link
Collaborator Author

Commented by: sblaisot
Date: 2016-11-25T21:17:27Z


can you also please explain the failing/error handling scenarios, like if a worker thread crash/hang and never send back any signal to the manager ?
Is there some sort of "timeout" or something which can be used by the manager to terminate the worker thread ?

@mixxxbot
Copy link
Collaborator Author

Commented by: JosepMaJAZ
Date: 2016-11-25T22:12:07Z


Mmmm. I didn't think about that. If i didn't overlook it, there's nothing in this regard right now either.

The crashing scenario might require a simple try-catch on the main process method, so that it can get captured and inform the manager that it ends.

The hang scenario is a bit more controversial. Maybe it could get detected (checking the periodicity of calls to progress), but nevertheless, it needs a forced stop, which means telling the thread directly to die. Maybe a QTimer once it has asked the thread to stop could test if it is still alive and then force it to stop.

@mixxxbot
Copy link
Collaborator Author

Commented by: JosepMaJAZ
Date: 2016-12-03T15:25:23Z


First version of the changes here:

https://github.com/JosepMaJAZ/mixxx/tree/multithreaded-analysis

@mixxxbot
Copy link
Collaborator Author

Commented by: daschuer
Date: 2016-12-05T08:06:49Z


Cool!
If you like an early review, please issue a normal PR against master and describe the unfinished state in the description.

@mixxxbot
Copy link
Collaborator Author

Commented by: uklotzde
Date: 2018-04-18T17:55:04Z


Superseded by the responsive analysis framework that supports multi-threading implicitly by design:
#1624

Sorry, JosepMa, I didn't plan to capture this task from you! The multitude of issues around our existing analysis framework just became apparent while digging deeper and deeper into the topic, leading to a complete rewrite literally.

@mixxxbot
Copy link
Collaborator Author

Issue closed with status Fix Released.

@mixxxbot mixxxbot transferred this issue from another repository Aug 24, 2022
@mixxxbot mixxxbot added this to the 2.3.0 milestone Aug 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant