Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DBD::Pg Array and string optimization patch #21

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

mtyson01
Copy link

@mtyson01 mtyson01 commented Oct 9, 2016

Replace string operations with an efficient string concat library
Replace all linked lists with arrays
Use arrays to remove quadratic complexity algorithm.

Fix CPAN RT #105492

@mtyson01
Copy link
Author

mtyson01 commented Oct 9, 2016

Hi, so this is a pretty huge rewrite of the guts of DBD::Pg. When using large numbers of placeholders (10,000 - 50,000) the performance boost is dramatic.

I'd like to get some reviews of this if anyone has the time available. Given the size of the patch it's quite possible I've goofed somewhere.

If you have any questions or concerns please let me know.

Replace string operations with an efficient string concat library
Replace all linked lists with arrays
Use arrays to remove quadratic complexity algorithm.

Fix CPAN RT #105492
@machack666
Copy link
Collaborator

Hi Matt,

Thanks for the contribution. I note the following test errors introduced by your patch:

t/03smethod.t ....... 1/122
#   Failed test 'Statement handle attribute pg_numbound returns 0 if no placeholders bound yet'
#   at t/03smethod.t line 622.
#          got: '2'
#     expected: '0'

#   Failed test 'Statement handle attribute pg_numbound returns 1 if one placeholder bound'
#   at t/03smethod.t line 626.
#          got: '2'
#     expected: '1'

#   Failed test 'Statement handle attribute pg_numbound returns 1 if one placeholders bound as NULL'
#   at t/03smethod.t line 635.
#          got: '2'
#     expected: '1'
t/03smethod.t ....... 122/122 # Looks like you failed 3 tests of 122.

These don't exist in master, so they're definitely introduced by your patch; hopefully you'll have some quick insights there. Can you also add a test file for the high number of placeholders use case you mention?

Have you done any timing tests for the more common use case of smaller number of placeholders?

Also, was this original code you wrote for the efficient concat library?

Best,

David

@mtyson01
Copy link
Author

Woah, sorry about those test errors. I don't see those test errors myself. What platform or distro are you using? I'm using fedora 24. t/03smethod.t returns no errors for me, which is strange.

I haven't done any timing tests for small placeholder lists. I'll do that and report back. I can also add tests for the larger lists in the next couple of days.

Yes, the string concat library was written by myself from scratch.

Fix test error in t/03smethod.t
@mtyson01
Copy link
Author

Hi David,

Sorry about that, an installed version of DBD::Pg was interfering with the tests. I've pushed a new commit that fixes the issue.

I'm also getting a test error in t/30unicode.t, It's like that without my patch though.
BD::Pg::st execute failed: ERROR: invalid byte sequence for encoding "UTF8": 0xc9 0x6d at t/30unicode.t line 117.

I'll add the large placeholder test and the bencmarks in the next couple of days.

Thanks for the quick response!

Regards,
Matt.

@machack666
Copy link
Collaborator

Hi Matt,

Thanks for the updates; the t/30unicode.t is a known failure at this point and is actually a broken test.

I'll review the code; I assume the issue was a naïve O(n^2) algorithm for multiple concats which wasn't revealed in the typical use case with a more pedestrian number of bind vars. :-)

Thanks,

David

@mtyson01
Copy link
Author

Hi David,

I wrote this patch over a year ago, so my memory is a bit fuzzy.

The main issues addressed by replacing linked lists with arrays is on removed line 2754

for (x=1,currph=imp_sth->ph; NULL != currph; currph = currph->nextph,x++) {
    if (x==phnum){
        break;
    }
}

And this section on removed line 2223.
Looking at it now, It looks like I've removed this and not replaced it properly. I'll have to check this further.

/* Let the correct segment(s) point to it */
for (currseg=imp_sth->seg; NULL != currseg; currseg=currseg->nextseg) {
     if (currseg->placeholder==xint) {
         currseg->ph = newph;
     }
}

The final performance issue is happens when building the query string to dispatch to postgres. Repeated calls to strchr() and strcat() were causing a serious slowdown. It's not really noticeable with a dozen placeholders, but with 20,000 arguments the query string can be a megabyte or more in size, which causes a noticeable performance hit.

I found all of this when porting bugzilla.redhat.com from MySQL 5.0 to Postgres 9.2. With the amount of data we have in Bugzilla, it can generate some pretty insane queries.

@mtyson01
Copy link
Author

Have you done any timing tests for the more common use case of smaller number of placeholders?

results.txt
reproducer.txt

results.txt - CSV file with some benchmarks with placeholder counts between 5 - 100 in increments of 5, and 1000 - 65000 in increments of 1000.

reproducer.txt - perl script that I used to generate those results.

You can run the reproducer in valgrind, and then look at the output in kcachegrind which will show you where the CPU spends all its time.

Note that you will probably need the perl, dbd and dbd-pg debug packages installed for this to give useful output.

$ valgrind --tool=callgrind perl reproducer.pl
$ kcachegrind callgrind.out.1234

@machack666
Copy link
Collaborator

Hi @mtyson01,

Has this patch been updated since the last correspondence in CPAN RT#105492? I'm just wondering if you consider this a little more bulletproof than you seemed in that ticket.

I'm going to review the code in a little more depth now that it is passing tests.

Best,

David

@mtyson01
Copy link
Author

Hi David,

Has this patch been updated since the last correspondence in CPAN RT#105492?

Yes, The patch in CPAN RT only replaces the placeholder linked list (ph_t). This patch in github has been expanded to replace the segment (seg_t) linked list as well.

I'm just wondering if you consider this a little more bulletproof than you seemed in that ticket.

I'm fairly confident that the patch is good for my use case. The patch in the CPAN RT was originally written against 2.19.3 (What we are running in production now) and then ported to 3.5.

The 2.19.3 version of the patch has been running in production for a year with no problems. I'm pretty confident for my use case that it's bulletproof. (this does not include the seg_t array)

I've noticed that DBD::Pg makes an effort to support postgres-isms such has named and positional placeholders. We don't actually use these, so those codepaths have not had much testing. I did some quick tests to check that they still worked, and it seemed ok, but it hasn't received anywhere near the usage that the DBI ? style placeholders have.

@mtyson01
Copy link
Author

I'm not in a rush to get this merged. I can maintain my fork for as long as needed. If you like we can wait until I push this version of the patch into production (this could be a long time though)

@turnstep turnstep self-assigned this May 18, 2020
@turnstep
Copy link
Contributor

I tried implementing this recently, but ran into a number of compilation issues with the new files. Will try again someday, but anyone else is welcome to take a stab at it.

@turnstep
Copy link
Contributor

Anyone want to take another pass at this? I'm still having issues, but maybe you two (@mtyson01 @machack666) will have better luck?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants