Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-25433: Align bytearray strip methods to those found in byteobject.c #14771

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

rompe
Copy link

@rompe rompe commented Jul 14, 2019

According to bpo-25433, knowledge about whitespace should be implemented only once.

bytearray.strip, .rstrip, .lstrip all had their own definitions of whitespace.
The implementations for bytes is cleaner and a also a little bit faster, so I adopted the code from there and modified it to work with bytearrays and to fulfil the documentation which states that for bytearrays these functions always return copies, even if no changes have been applied.

I checked for performance regressions:

Current master:

./python -m timeit -s 'bla = bytearray(b"  foo  ")' 'bla.lstrip()'
1000000 loops, best of 5: 245 nsec per loop
./python -m timeit -s 'bla = bytearray(b"  foo  ")' 'bla.rstrip()'
1000000 loops, best of 5: 245 nsec per loop
./python -m timeit -s 'bla = bytearray(b"  foo  ")' 'bla.strip()'
1000000 loops, best of 5: 260 nsec per loop

Using this branch:

./python -m timeit -s 'bla = bytearray(b"  foo  ")' 'bla.lstrip()'
1000000 loops, best of 5: 235 nsec per loop
./python -m timeit -s 'bla = bytearray(b"  foo  ")' 'bla.rstrip()'
1000000 loops, best of 5: 235 nsec per loop
./python -m timeit -s 'bla = bytearray(b"  foo  ")' 'bla.strip()'
1000000 loops, best of 5: 239 nsec per loop

Timings using custom byte sequences to strip showed no differences at all between the two branches, no matter if one or multiple characters long.

https://bugs.python.org/issue25433

Ulf Rompe added 2 commits July 13, 2019 16:55
The previous implementations differed a lot from the the ones found in
bytesobject.c and all three of them included hardcoded lists of
whitespace characters. Knowledge about whitespace already exists in
pyctype.c and should not be duplicated.
@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Our records indicate we have not received your CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

@mangrisano
Copy link
Contributor

/cc @tiran @serhiy-storchaka @pitrou

@serhiy-storchaka
Copy link
Member

I think that most of the code can be moved into bytes_methods.c and shared between bytes and bytearray implementations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting review performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants