Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize truth value testing for strings #10269

Merged
merged 6 commits into from
Apr 6, 2021

Conversation

pranavrajpal
Copy link
Contributor

This causes code that tests for the boolean value of a string to test the length directly instead of going through PyObject_IsTrue, as mentioned in mypyc/mypyc#768.

This should make code that uses this faster, but my benchmarks seem to show that this makes code slower than it would be without this optimization. I was hoping you could take a look at my code and the benchmarks and see if I'm doing anything wrong.

Test Plan

I added a test that compiles and runs code that uses this, as well as an irbuild test, and ran all of the mypyc tests to make sure they were still working.

Benchmarks

This is my benchmark code:

import time

def run(s: str) -> None:
    t0 = time.time()
    for i in range(1_000_000):
        if s:
            pass
    t1 = time.time()
    print(f'{t1 - t0:.3f}')

run('')
run('A' * 100)

My results from running this benchmark several times each:

Interpreted version without mypyc: 0.022 to 0.024 seconds
Compiled version, from master (without these changes): 0.006 to 0.007 seconds
Compiled version, with these changes: 0.008 to 0.009 seconds

This causes code that uses 'if x' where x is a string to check the
length of the string directly instead of using PyObject_IsTrue, which
should make it faster.
This fixes the expected IR for some tests that had checks to see if
strings were true, which was changed due to
76adcac.
Add a test case to make sure that the previous changes to how `if x`
works for strings don't break anything.
This adds an irbuild test to make sure that truth value testing with
strings checks the length directly instead of using the generic
PyObject_IsTrue function.
r2 = r1 >= 0 :: signed
r3 = truncate r1: int32 to builtins.bool
if r3 goto L1 else goto L2 :: bool
r1 = CPyObject_Size(r0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably the bottleneck. This may be about as fast/slow as PyObject_IsTrue, so you aren't seeing an improvement. You'd need to specialize this operation for str values, and access the size of the string directly using C struct access. There are examples of how to do this for list and tuple values. Strings should be fairly similar (details may be different).

This changes the check if a string is true (i.e. it's length isn't zero)
into a primitive, instead of implementing it with a call to a generic
version of `builtins.len`. This should hopefully make code that uses
`if x` faster when x is a string.
Update the expected output of irbuild tests that had conversions from
strings to booleans, which changed when
bed74c9 moved that conversion to a
primitive.
@pranavrajpal
Copy link
Contributor Author

Ok, I have moved the check for whether a string's length is 0 to a primitive, and this appears to have made it faster.

Using the previously mentioned benchmark, I got these times:
Interpreted: 0.022 to 0.024 seconds
Compiled version, without these changes: 0.006 to 0.007 seconds
Compiled version, with these changes: 0.003 to 0.004 seconds

That appears to show that these changes make that benchmark approximately twice as fast.

Copy link
Collaborator

@TH3CHARLie TH3CHARLie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, good job!

@JukkaL
Copy link
Collaborator

JukkaL commented Apr 6, 2021

Nice performance win! Thanks!

@JukkaL JukkaL merged commit 1e96f1d into python:master Apr 6, 2021
sthagen added a commit to sthagen/python-mypy that referenced this pull request Apr 7, 2021
[mypyc] Optimize truth value testing for strings (python#10269)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants