gh-107089: Improve Shelf.clear method performance #107090
Thanks for the clear description of the problem. The solution looks ok, but I am a bit concerned it is not fully backwards compatible with classes derived from
See: #107122
+1
It appears that the

Revised test script:

```python
import os
import shelve
import tempfile
import timeit


def clearshelf(db):
    keys = list(db.keys())
    for k in keys:
        del db[k]


def test_clear(use_clear_method):
    with tempfile.TemporaryDirectory() as tempdir:
        filename = os.path.join(tempdir, "test-shelf")
        with shelve.open(filename) as db:
            items = {str(x): x for x in range(10000)}
            db.update(items)
            if use_clear_method:
                db.clear()
            else:
                clearshelf(db)
            assert len(db) == 0


test_with_clear = lambda: test_clear(True)
test_without_clear = lambda: test_clear(False)

if __name__ == "__main__":
    TRIALS = 5
    method_time = timeit.timeit(test_with_clear, number=TRIALS, globals=globals()) / TRIALS
    print(f"method:\t{method_time:.3}")
    func_time = timeit.timeit(test_without_clear, number=TRIALS, globals=globals()) / TRIALS
    print(f"func:\t{func_time:.3}")
```

Running on my branch:
Edited Jul 26: Finally got to test the other dbm backends, using Rocky Linux 9.2 on aarch64. With gdbm:
By forcing
The times are a lot closer on gdbm and practically the same on
@jtcave Would you like to rebase the PR?
The clear method used to be implemented by inheriting a mix-in from the MutableMapping ABC. It was a poor fit for shelves, and a better implementation is now in place.
Because pythongh-107089 is specific to the implementation details of dbm objects, it would be less disruptive to implement the fix in the DbfilenameShelf class, which is used for calls to shelve.open. Since the backing object is known to be one of the dbm objects, its clear method (see pythongh-107122) can be used with no fallback code.
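A minimal sketch of the approach the commit message describes, assuming the standard `shelve.DbfilenameShelf` attributes `dict` (the dbm object) and `cache` (the writeback cache). `FastClearShelf` and `count_after_clear` are hypothetical names used for illustration, not the merged CPython code. The `hasattr` fallback is an addition of this sketch so that it also runs on Python versions whose dbm objects predate #107122; the commit itself relies on the dbm `clear` method directly.

```python
import os
import shelve
import tempfile


class FastClearShelf(shelve.DbfilenameShelf):
    """Hypothetical subclass sketching a DbfilenameShelf-level clear().

    Assumes self.dict is the dbm object and self.cache is the writeback
    cache, as set up by shelve.Shelf.__init__.
    """

    def clear(self):
        # Drop cached (writeback) entries first, so sync()/close() cannot
        # write them back into the freshly emptied database.
        self.cache.clear()
        backing = self.dict
        if hasattr(backing, "clear"):
            # Newer dbm objects provide clear() (added via gh-107122);
            # dbm.dumb already had one through MutableMapping.
            backing.clear()
        else:
            # Assumed fallback for older gdbm/ndbm objects: take a single
            # snapshot of the keys, then delete them one by one.
            for key in list(backing.keys()):
                del backing[key]


def count_after_clear(directory, n=1000):
    """Fill a shelf with n items, clear it, and return the remaining length."""
    path = os.path.join(directory, "demo-shelf")
    with FastClearShelf(path) as db:
        db.update({str(x): x for x in range(n)})
        db.clear()
        return len(db)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(count_after_clear(tmp))  # prints 0
```

Clearing `cache` before the backing store matters for `writeback=True` shelves: `Shelf.sync()` writes every cached entry back, so without it, values assigned or read earlier could reappear after `close()`.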
lgtm!
`Shelf` inherits its `clear` method from `MutableMapping`, but that implementation is poorly suited to the class. The `clear` mix-in is implemented by calling `popitem` in a loop. Each call to `popitem` constructs a new iterator over the shelf, which is a generator that does byte-to-str conversion on the keys in the `dbm` object. Unfortunately, `dbm` objects are non-iterable, and their `keys` method simply returns a list of all keys. Since each `popitem` call performs a full scan of the key space, the `clear` method ends up having O(n²) performance (assuming the delete and read operations amortize to O(1)). By having the shelf iterate over the list of `dbm` keys just once, this is avoided.
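To make the quadratic cost concrete, here is a small self-contained sketch. `FakeDbm`, `MiniShelf`, `SinglePassShelf` and `keys_scanned_by_clear` are hypothetical names invented for illustration; they only imitate the behaviour described above (a non-iterable store whose `keys()` returns a full list, wrapped by a mapping whose iterator calls `keys()`), and they are not the real `shelve` or `dbm` classes. Counting the keys copied by `keys()` gives n + (n - 1) + ... + 1 = n(n + 1)/2 for the inherited `popitem` loop versus n for a single pass.

```python
from collections.abc import MutableMapping


class FakeDbm:
    """Hypothetical dbm stand-in: not iterable, and keys() builds a fresh
    list on every call. keys_scanned counts how many keys those lists held."""

    def __init__(self, n):
        self._data = {str(i).encode(): b"value" for i in range(n)}
        self.keys_scanned = 0

    def keys(self):
        self.keys_scanned += len(self._data)
        return list(self._data)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __len__(self):
        return len(self._data)


class MiniShelf(MutableMapping):
    """Simplified shelf: str keys outside, bytes keys in the backing store.
    clear() is inherited from MutableMapping, i.e. a popitem() loop."""

    def __init__(self, store):
        self.dict = store

    def __iter__(self):
        # Each new iterator calls keys(), so every popitem() rescans everything.
        for k in self.dict.keys():
            yield k.decode()

    def __len__(self):
        return len(self.dict)

    def __getitem__(self, key):
        return self.dict[key.encode()]

    def __setitem__(self, key, value):
        self.dict[key.encode()] = value

    def __delitem__(self, key):
        del self.dict[key.encode()]


class SinglePassShelf(MiniShelf):
    def clear(self):
        # One keys() snapshot, then one delete per key: n keys scanned in total.
        for k in self.dict.keys():
            del self.dict[k]


def keys_scanned_by_clear(shelf_cls, n):
    shelf = shelf_cls(FakeDbm(n))
    shelf.clear()
    assert len(shelf) == 0
    return shelf.dict.keys_scanned


print(keys_scanned_by_clear(MiniShelf, 1000))        # 500500
print(keys_scanned_by_clear(SinglePassShelf, 1000))  # 1000
```

The toy leaves out a further cost of the real mix-in: `MutableMapping.popitem` also reads each value via `self[key]`, which on a real `Shelf` means unpickling every entry before it is deleted.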
`Shelf.clear()` has very poor performance #107089