Replace fibers with threadpool #674
Conversation
src/cosim/utility/thread_pool.hpp (outdated)
```cpp
if (!work_queue_.empty()) {
    auto task = work_queue_.front();
    work_queue_.pop();
    task();

    lck.unlock();
    cv_.notify_one();
} else {
    std::this_thread::yield();
}
```
Thanks for working on this PR. Would it be possible to have an option to run FMUs without spawning a separate worker thread (i.e. running everything in a main thread sequentially)? For some use cases, the communication overhead between the main and the worker threads seems quite significant.
This is possible today, and was in fact the motivation for using fibers in the first place. (Not that fibers are required for sequential execution of FMU functions, but they provide a nice framework for combining sequential execution with asynchronous I/O so one can run both local and remote slaves without the overhead of threads.)
The problem is just that the remote-slave implementation we're using today (proxyfmu) is not built around async I/O, so it requires a worker thread per slave anyway. That is why fibers simply are an extra overhead that comes in addition to threading and I/O costs, which is what I suppose motivated @markaren's PR.
As I see it, there are two ways to improve performance in this area: Drop fibers and go all-in for proxyfmu and one-thread-per-slave (as this PR proposes), or keep fibers and implement async communication with slaves (as was the original plan).
Thanks for the explanation @kyllingstad.
I was able to achieve the required simulation performance by removing `std::thread` for slaves, as well as recursive fiber creation, in the libcosim master branch. I was not able to achieve this via `pseudo_async_slave`, because it still creates fibers recursively in the FMU interface methods (the variable getters/setters and `do_step`).

But now the problem is that I cannot simulate some FMUs, because they run directly in fibers with limited resources (created by `slave_simulator::impl::do_step`). And I cannot replace or extend `slave_simulator`, as it is always added in `execution::add_slave`.

This PR seems to have fixed my issue, but I still had to remove the worker thread to avoid communication overhead (the worker thread also consumes CPU time polling for messages).
> Would it be possible to have an option to run FMUs without spawning a separate worker thread?
Yes, and this is one of the motivations for this PR. IMO the master algorithm itself should decide how it handles execution. In the case of this implementation, it could take `numThreads` as an argument, where a value < 1 means no pool.
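The `numThreads` convention suggested above could be sketched roughly as follows (hypothetical names and structure, not libcosim's actual API):

```cpp
#include <functional>
#include <vector>

// Hypothetical sketch of how a master algorithm might honour a
// numThreads argument where a value < 1 means "no pool": every slave's
// step task runs sequentially in the calling (main) thread, so there is
// no inter-thread communication overhead at all.
void run_step_tasks(int numThreads, const std::vector<std::function<void()>>& tasks)
{
    if (numThreads < 1) {
        // No worker threads: execute everything in the main thread.
        for (const auto& task : tasks) task();
    } else {
        // Pool path (omitted in this sketch): submit the tasks to
        // numThreads workers and wait for completion. We fall back to
        // sequential execution here to keep the example self-contained.
        for (const auto& task : tasks) task();
    }
}
```

This keeps the threading decision inside the algorithm, as suggested, rather than baking it into the slave abstraction.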
proxy-fmu was not a motivation for this PR. My motivation was to simplify the code base and make it run faster. I've found that the fiber solution is slower than a threaded solution with or without proxy-fmu.
This is an important thing to address if this is something we want to move forward with. Do we have to support old compilers that are not C++17 feature-complete?
If the target branch gets the OK, we should also make it possible to specify the count in the XML config. Edit: This was meant as a comment in #680.
Adding the functionality to set the number of worker threads for SSP was trivial, but not so much for the OSP alternative, as it does not provide algorithm-specific configuration options. This is related to #404.
I see your point, but I don't think this can be solved separately. It should not be a blocker for this PR.
What's the reason for this PR to remain in draft mode? Any remaining issues that must be resolved?
Nothing other than significant vetting. And yeah, the concurrent file-locking test needs to be fixed (it is currently commented out).
Good work! Just added some comments
Test included and passing now.
As fibers are removed, I think …
@markaren, if you are ok with the latest changes to …
How are the observed differences in usage/speed/accuracy on your side? All good?
Yes. I've added a … We seem to be seeing ~15–20% improvements in simulation speed with this PR over the fiber implementation, at least with example projects like dp-ship.
Great work! I don't have much to add except some stylistic nitpicks here and there. I didn't look at the thread pool implementation in any detail, since it seems to have been thoroughly reviewed by others. Everything else looks good to merge as far as I'm concerned.
```cpp
    boost::fibers::condition_variable condition_;
};

/**
 * A shared mutex à la `std::shared_mutex`, but with support for fibers.
```
I don't think we need this class at all anymore; we can just use `std::shared_mutex`.
Nevermind, I see that @ljamt already suggested we do this as a separate PR.
Yes, we noticed this as well. In addition, `cosim::utility::shared_mutex` was used in `utility_concurrency_unittest.cpp` with the intention of testing that file locking functioned correctly with the custom mutex, which was in turn only necessary because of fibers. With the removal of fibers, that test can be removed as well, given that there is no point for us in unit testing `std::shared_mutex`. We wanted to push this as a separate PR to avoid more noise in this one, since `cosim::utility::shared_mutex` is used throughout `concurrency.hpp`/`.cpp`.
One more thing: you may want to consider running clang-format on everything before merging. I see there are some includes that are out of alphabetical order after the …
This PR replaces #671. It uses a thread pool rather than `std::for_each` so that compilers older than GCC 9 will work.

Note that slave state is unimplemented in this PR. It will eventually be included when I have figured out how best to add it back after removing `async_slave`.

Additionally, the `concurrency` test has been commented out. The test is about file locking.