DBD::Pg Array and string optimization patch #21
base: master
Conversation
Hi, so this is a pretty huge rewrite of the guts of DBD::Pg. When using large numbers of placeholders (10,000 - 50,000) the performance boost is dramatic. I'd like to get some reviews of this if anyone has the time available. Given the size of the patch, it's quite possible I've goofed somewhere. If you have any questions or concerns, please let me know.
Replace string operations with an efficient string concat library
Replace all linked lists with arrays
Use arrays to remove a quadratic-complexity algorithm
Fix CPAN RT #105492
Hi Matt, Thanks for the contribution. I note the following test errors introduced by your patch:
These don't exist in master, so they're definitely introduced by your patch; hopefully you'll have some quick insights there. Can you also add a test file for the high-number-of-placeholders use case you mention? Have you done any timing tests for the more common use case of a smaller number of placeholders? Also, was this original code you wrote for the efficient concat library? Best, David
Woah, sorry about those test errors. I don't see those test errors myself. What platform or distro are you using? I'm using Fedora 24. t/03smethod.t returns no errors for me, which is strange. I haven't done any timing tests for small placeholder lists. I'll do that and report back. I can also add tests for the larger lists in the next couple of days. Yes, the string concat library was written by me from scratch.
Fix test error in t/03smethod.t
Hi David, Sorry about that, an installed version of DBD::Pg was interfering with the tests. I've pushed a new commit that fixes the issue. I'm also getting a test error in t/30unicode.t, but it's like that without my patch too. I'll add the large placeholder test and the benchmarks in the next couple of days. Thanks for the quick response! Regards,
Hi Matt, Thanks for the updates; the t/30unicode.t failure is known at this point and is actually a broken test. I'll review the code; I assume the issue was a naïve O(n^2) algorithm for multiple concats which wasn't revealed in the typical use case with a more pedestrian number of bind vars. :-) Thanks, David
Hi David, I wrote this patch over a year ago, so my memory is a bit fuzzy. The main issues addressed by replacing linked lists with arrays are on removed line 2754
And this section on removed line 2223.
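Since the removed diff lines aren't quoted in this thread, here is a hypothetical C sketch of the general pattern being replaced; the struct and field names are illustrative and are not the actual DBD::Pg ph_t/seg_t definitions. Finding the Nth placeholder by walking a linked list costs O(n) per lookup, so binding all n placeholders costs O(n^2); a flat array makes each lookup O(1).

```c
#include <stddef.h>

/* Hypothetical illustration only -- not the real DBD::Pg structures.
 * Old style: walk a singly linked list to find placeholder number n.
 * Each lookup is O(n), so binding n placeholders is O(n^2) overall. */
typedef struct ph_node {
    int             number;     /* placeholder number (1-based) */
    char           *value;      /* bound value                  */
    struct ph_node *next;
} ph_node;

static ph_node *find_ph_list(ph_node *head, int number)
{
    for (ph_node *p = head; p != NULL; p = p->next)
        if (p->number == number)
            return p;
    return NULL;
}

/* New style: keep placeholders in a flat array indexed by number,
 * so each lookup is O(1) and binding n placeholders is O(n). */
typedef struct {
    char **values;   /* values[i] holds placeholder i+1 */
    int    count;
} ph_array;

static char **find_ph_array(ph_array *a, int number)
{
    if (number < 1 || number > a->count)
        return NULL;
    return &a->values[number - 1];
}
```

With tens of thousands of placeholders, that O(n^2)-to-O(n) change is where most of the speedup reported in the benchmarks below comes from.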
The final performance issue happens when building the query string to dispatch to Postgres. Repeated calls to strchr() and strcat() were causing a serious slowdown. It's not really noticeable with a dozen placeholders, but with 20,000 arguments the query string can be a megabyte or more in size, which causes a noticeable performance hit. I found all of this when porting bugzilla.redhat.com from MySQL 5.0 to Postgres 9.2. With the amount of data we have in Bugzilla, it can generate some pretty insane queries.
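The patch's actual concat library isn't shown in this thread, so the following is a minimal sketch, under my own naming, of the general technique it describes: keep the buffer's current length alongside the buffer so each append copies only the new bytes, instead of strcat() rescanning the whole destination for its terminating NUL on every call.

```c
#include <stdlib.h>
#include <string.h>

/* Minimal sketch of the general technique (not the patch's actual library):
 * a growable buffer that remembers its own length, so each append copies
 * only the new bytes instead of letting strcat() rescan the destination.
 *
 * Usage: strbuf sb = {0}; strbuf_append(&sb, "SELECT 1", 8); ... free(sb.buf);
 */
typedef struct {
    char   *buf;
    size_t  len;   /* bytes used, excluding the trailing NUL */
    size_t  cap;   /* bytes allocated                        */
} strbuf;

static int strbuf_append(strbuf *sb, const char *s, size_t n)
{
    if (sb->len + n + 1 > sb->cap) {
        size_t newcap = sb->cap ? sb->cap : 64;
        while (newcap < sb->len + n + 1)
            newcap *= 2;                 /* amortized O(1) growth */
        char *p = realloc(sb->buf, newcap);
        if (!p)
            return -1;
        sb->buf = p;
        sb->cap = newcap;
    }
    memcpy(sb->buf + sb->len, s, n);     /* no rescan of existing data */
    sb->len += n;
    sb->buf[sb->len] = '\0';
    return 0;
}
```

Appending this way keeps building even a megabyte-sized statement linear in the total output size, whereas a loop of strcat() calls degrades quadratically as the string grows.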
results.txt - CSV file with some benchmarks, with placeholder counts from 5 to 100 in increments of 5 and from 1000 to 65000 in increments of 1000.
reproducer.txt - Perl script that I used to generate those results.
You can run the reproducer in valgrind and then look at the output in kcachegrind, which will show you where the CPU spends all its time. Note that you will probably need the perl, dbd and dbd-pg debug packages installed for this to give useful output.
Hi @mtyson01, Has this patch been updated since the last correspondence in CPAN RT#105492? I'm just wondering if you consider this a little more bulletproof than you seemed to in that ticket. I'm going to review the code in a little more depth now that it is passing tests. Best, David
Hi David,
Yes, the patch in the CPAN RT only replaces the placeholder linked list (ph_t). This patch on GitHub has been expanded to replace the segment (seg_t) linked list as well.
I'm fairly confident that the patch is good for my use case. The patch in the CPAN RT was originally written against 2.19.3 (what we are running in production now) and then ported to 3.5. The 2.19.3 version of the patch has been running in production for a year with no problems. I'm pretty confident for my use case that it's bulletproof (this does not include the seg_t array). I've noticed that DBD::Pg makes an effort to support postgres-isms such as named and positional placeholders. We don't actually use these, so those code paths have not had much testing. I did some quick tests to check that they still worked, and it seemed OK, but it hasn't received anywhere near the usage that the DBI ? style placeholders have.
I'm not in a rush to get this merged. I can maintain my fork for as long as needed. If you like, we can wait until I push this version of the patch into production (though this could be a long time).
I tried implementing this recently, but ran into a number of compilation issues with the new files. Will try again someday, but anyone else is welcome to take a stab at it.
Anyone want to take another pass at this? I'm still having issues, but maybe you two (@mtyson01 @machack666) will have better luck?