-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split PBF Parser into a single Read and a single Parse thread to increase parsing speed. #20
Conversation
…ease parsing speed.
@@ -32,6 +32,84 @@ or see http://www.gnu.org/licenses/agpl.txt. | |||
#include "ExtractorStructs.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since concurrent_queue is not just a small struct, it is justified to put into its own class file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will split up into its own file.
Great addition, I put a few comments into the pull request |
|
||
bool empty() const | ||
{ | ||
boost::mutex::scoped_lock lock(queue_mutex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a lock on the empty function really necessary. This is a concurrent read and it does not ensure that the queue does not change after the lock goes out of scope.
Split PBF Parser into a single Read and a single Parse thread to increase parsing speed.
Thanks |
I noticed extracting was inefficient in the original threading model, and I wasn't making any significant gains by increasing the number of threads to more than 1 (the default). I split the PBF Parser code into two separate threads instead - one thread reads all data from the PBF, and another thread parses all the data. The Read thread places all data into a thread-safe queue. I added a maximum queue size limit to avoid issues with reading all of the data into RAM (data is read quicker than it is parsed). I noticed an improvement of around 30% over original parsing method. Threading is done with Boost threads. I removed the OMP pragmas as they were no longer necessary.
Note - Read/Parse methods are NOT thread safe any longer. Only one thread may Read and Parse at the same time. I experimented with multiple Parse threads, but the improvement was negligible due to greedy locking of the STXXLContainers object. I'm guessing a better locking mechanism may improve the performance, though I did not have time to test it out.