DBD::Pg Array and string optimization patch #21

mtyson01 · 2016-10-09T04:27:54Z

Replace string operations with an efficient string concat library
Replace all linked lists with arrays
Use arrays to remove quadratic complexity algorithm.

Fix CPAN RT #105492

mtyson01 · 2016-10-09T04:30:19Z

Hi, so this is a pretty huge rewrite of the guts of DBD::Pg. When using large numbers of placeholders (10,000 - 50,000) the performance boost is dramatic.

I'd like to get some reviews of this if anyone has the time available. Given the size of the patch it's quite possible I've goofed somewhere.

If you have any questions or concerns please let me know.

Replace string operations with an efficient string concat library Replace all linked lists with arrays Use arrays to remove quadratic complexity algorithm. Fix CPAN RT #105492

machack666 · 2016-10-10T14:22:35Z

Hi Matt,

Thanks for the contribution. I note the following test errors introduced by your patch:

t/03smethod.t ....... 1/122
#   Failed test 'Statement handle attribute pg_numbound returns 0 if no placeholders bound yet'
#   at t/03smethod.t line 622.
#          got: '2'
#     expected: '0'

#   Failed test 'Statement handle attribute pg_numbound returns 1 if one placeholder bound'
#   at t/03smethod.t line 626.
#          got: '2'
#     expected: '1'

#   Failed test 'Statement handle attribute pg_numbound returns 1 if one placeholders bound as NULL'
#   at t/03smethod.t line 635.
#          got: '2'
#     expected: '1'
t/03smethod.t ....... 122/122 # Looks like you failed 3 tests of 122.

These don't exist in master, so they're definitely introduced by your patch; hopefully you'll have some quick insights there. Can you also add a test file for the high number of placeholders use case you mention?

Have you done any timing tests for the more common use case of smaller number of placeholders?

Also, was this original code you wrote for the efficient concat library?

Best,

David

mtyson01 · 2016-10-10T20:18:44Z

Woah, sorry about those test errors. I don't see those test errors myself. What platform or distro are you using? I'm using fedora 24. t/03smethod.t returns no errors for me, which is strange.

I haven't done any timing tests for small placeholder lists. I'll do that and report back. I can also add tests for the larger lists in the next couple of days.

Yes, the string concat library was written by myself from scratch.

Fix test error in t/03smethod.t

mtyson01 · 2016-10-10T21:05:19Z

Hi David,

Sorry about that, an installed version of DBD::Pg was interfering with the tests. I've pushed a new commit that fixes the issue.

I'm also getting a test error in t/30unicode.t, It's like that without my patch though.
BD::Pg::st execute failed: ERROR: invalid byte sequence for encoding "UTF8": 0xc9 0x6d at t/30unicode.t line 117.

I'll add the large placeholder test and the bencmarks in the next couple of days.

Thanks for the quick response!

Regards,
Matt.

machack666 · 2016-10-10T21:10:51Z

Hi Matt,

Thanks for the updates; the t/30unicode.t is a known failure at this point and is actually a broken test.

I'll review the code; I assume the issue was a naïve O(n^2) algorithm for multiple concats which wasn't revealed in the typical use case with a more pedestrian number of bind vars. :-)

Thanks,

David

mtyson01 · 2016-10-11T00:02:44Z

Hi David,

I wrote this patch over a year ago, so my memory is a bit fuzzy.

The main issues addressed by replacing linked lists with arrays is on removed line 2754

for (x=1,currph=imp_sth->ph; NULL != currph; currph = currph->nextph,x++) {
    if (x==phnum){
        break;
    }
}

And this section on removed line 2223.
Looking at it now, It looks like I've removed this and not replaced it properly. I'll have to check this further.

/* Let the correct segment(s) point to it */
for (currseg=imp_sth->seg; NULL != currseg; currseg=currseg->nextseg) {
     if (currseg->placeholder==xint) {
         currseg->ph = newph;
     }
}

The final performance issue is happens when building the query string to dispatch to postgres. Repeated calls to strchr() and strcat() were causing a serious slowdown. It's not really noticeable with a dozen placeholders, but with 20,000 arguments the query string can be a megabyte or more in size, which causes a noticeable performance hit.

I found all of this when porting bugzilla.redhat.com from MySQL 5.0 to Postgres 9.2. With the amount of data we have in Bugzilla, it can generate some pretty insane queries.

mtyson01 · 2016-10-11T10:37:11Z

Have you done any timing tests for the more common use case of smaller number of placeholders?

results.txt
reproducer.txt

results.txt - CSV file with some benchmarks with placeholder counts between 5 - 100 in increments of 5, and 1000 - 65000 in increments of 1000.

reproducer.txt - perl script that I used to generate those results.

You can run the reproducer in valgrind, and then look at the output in kcachegrind which will show you where the CPU spends all its time.

Note that you will probably need the perl, dbd and dbd-pg debug packages installed for this to give useful output.

$ valgrind --tool=callgrind perl reproducer.pl
$ kcachegrind callgrind.out.1234

machack666 · 2016-10-14T14:11:07Z

Hi @mtyson01,

Has this patch been updated since the last correspondence in CPAN RT#105492? I'm just wondering if you consider this a little more bulletproof than you seemed in that ticket.

I'm going to review the code in a little more depth now that it is passing tests.

Best,

David

mtyson01 · 2016-10-16T23:08:20Z

Hi David,

Has this patch been updated since the last correspondence in CPAN RT#105492?

Yes, The patch in CPAN RT only replaces the placeholder linked list (ph_t). This patch in github has been expanded to replace the segment (seg_t) linked list as well.

I'm just wondering if you consider this a little more bulletproof than you seemed in that ticket.

I'm fairly confident that the patch is good for my use case. The patch in the CPAN RT was originally written against 2.19.3 (What we are running in production now) and then ported to 3.5.

The 2.19.3 version of the patch has been running in production for a year with no problems. I'm pretty confident for my use case that it's bulletproof. (this does not include the seg_t array)

I've noticed that DBD::Pg makes an effort to support postgres-isms such has named and positional placeholders. We don't actually use these, so those codepaths have not had much testing. I did some quick tests to check that they still worked, and it seemed ok, but it hasn't received anywhere near the usage that the DBI ? style placeholders have.

mtyson01 · 2016-10-16T23:38:36Z

I'm not in a rush to get this merged. I can maintain my fork for as long as needed. If you like we can wait until I push this version of the patch into production (this could be a long time though)

turnstep · 2020-06-18T17:25:40Z

I tried implementing this recently, but ran into a number of compilation issues with the new files. Will try again someday, but anyone else is welcome to take a stab at it.

turnstep · 2021-05-24T14:51:30Z

Anyone want to take another pass at this? I'm still having issues, but maybe you two (@mtyson01 @machack666) will have better luck?

DBD::Pg Array and string optimization patch

98a6abe

Replace string operations with an efficient string concat library Replace all linked lists with arrays Use arrays to remove quadratic complexity algorithm. Fix CPAN RT #105492

mtyson01 force-pushed the master branch from 9f1a1ae to 98a6abe Compare October 9, 2016 23:27

DBD::Pg Array and string optimization patch

d8e0ef1

Fix test error in t/03smethod.t

turnstep self-assigned this May 18, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DBD::Pg Array and string optimization patch #21

DBD::Pg Array and string optimization patch #21

mtyson01 commented Oct 9, 2016

mtyson01 commented Oct 9, 2016

machack666 commented Oct 10, 2016

mtyson01 commented Oct 10, 2016

mtyson01 commented Oct 10, 2016

machack666 commented Oct 10, 2016

mtyson01 commented Oct 11, 2016

mtyson01 commented Oct 11, 2016

machack666 commented Oct 14, 2016

mtyson01 commented Oct 16, 2016

mtyson01 commented Oct 16, 2016

turnstep commented Jun 18, 2020

turnstep commented May 24, 2021

DBD::Pg Array and string optimization patch #21

Are you sure you want to change the base?

DBD::Pg Array and string optimization patch #21

Conversation

mtyson01 commented Oct 9, 2016

mtyson01 commented Oct 9, 2016

machack666 commented Oct 10, 2016

mtyson01 commented Oct 10, 2016

mtyson01 commented Oct 10, 2016

machack666 commented Oct 10, 2016

mtyson01 commented Oct 11, 2016

mtyson01 commented Oct 11, 2016

machack666 commented Oct 14, 2016

mtyson01 commented Oct 16, 2016

mtyson01 commented Oct 16, 2016

turnstep commented Jun 18, 2020

turnstep commented May 24, 2021