Use deque for ~4x speedup when reading array columns #164

Merged
xzkostyan merged 1 commit into mymarilyn:master from dourvaris:perf/arraycolumn-speedup on Sep 14, 2020

@dourvaris (Contributor) commented Sep 13, 2020

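This PR speeds up reading of large array columns by roughly 4x by using a collections.deque instead of a queue.Queue. queue.Queue is built for passing items between threads, so every put() and get() acquires a lock and notifies a condition variable; in the queue.Queue profile, that bookkeeping (queue.py put/get/empty, threading.py notify/_is_owned, lock acquire) accounts for a large share of the runtime. Here the queue is only used as a FIFO buffer while decoding, so collections.deque provides the same behavior without that overhead.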
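
Both containers act as a first-in, first-out buffer. A minimal sketch of the two patterns, with hypothetical function names (this is not the clickhouse_driver code); the deque side uses appendleft and pop, the methods that show up in the deque profile:

```python
import queue
from collections import deque


def drain_with_queue(items):
    """FIFO via queue.Queue: each put/get takes a lock and signals a condition."""
    q = queue.Queue()
    for item in items:
        q.put(item)
    out = []
    while not q.empty():
        out.append(q.get_nowait())
    return out


def drain_with_deque(items):
    """FIFO via collections.deque: push on the left, pop from the right, no locking."""
    d = deque()
    for item in items:
        d.appendleft(item)
    out = []
    while d:
        out.append(d.pop())
    return out


data = [3, 1, 4, 1, 5, 9]
# Both return items in insertion order, so the swap is behavior-preserving.
assert drain_with_queue(data) == drain_with_deque(data) == data
```

Both functions yield the same order; only the deque version avoids a lock acquire and a condition notification on every operation.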

The profiles are below. For reference, the ClickHouse query itself took ~100 ms to return the data, while clickhouse_driver took ~1300 ms (queue.Queue) vs. ~300 ms (collections.deque) to convert that data into Python objects.
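
To check the per-operation cost in isolation, a rough micro-benchmark sketch follows. N is an assumption chosen to match the ~30k put/get calls in the profiles; it does not exercise clickhouse_driver itself, and absolute timings will vary with machine and Python version.

```python
import queue
import timeit
from collections import deque

N = 30_000  # roughly the number of put/get pairs seen in the profiles


def queue_roundtrip(n=N):
    """Push n items through queue.Queue and drain them; returns items drained."""
    q = queue.Queue()
    for i in range(n):
        q.put(i)
    drained = 0
    while not q.empty():
        q.get_nowait()
        drained += 1
    return drained


def deque_roundtrip(n=N):
    """Same FIFO traffic through collections.deque; returns items drained."""
    d = deque()
    for i in range(n):
        d.appendleft(i)
    drained = 0
    while d:
        d.pop()
        drained += 1
    return drained


if __name__ == "__main__":
    # Best of 5 single runs, to reduce noise from other processes.
    t_queue = min(timeit.repeat(queue_roundtrip, number=1, repeat=5))
    t_deque = min(timeit.repeat(deque_roundtrip, number=1, repeat=5))
    print(f"queue.Queue: {t_queue * 1000:.1f} ms")
    print(f"deque:       {t_deque * 1000:.1f} ms")
    print(f"ratio:       {t_queue / t_deque:.1f}x")
```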

I think there is still room for improvement by reducing the number of function calls, but that may be for another PR.
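
One common way to lower the number of function calls is to decode many fixed-width values with a single struct call instead of one read() and one unpack() per value (the ~30k size_unpack and Struct.unpack calls in the profiles are this kind of per-value overhead). A minimal sketch, assuming the values are contiguous little-endian unsigned 64-bit integers; read_offsets_one_by_one and read_offsets_batched are hypothetical helpers, not clickhouse_driver functions, and io.BytesIO stands in for the driver's buffered reader:

```python
import io
import struct

OFFSET = struct.Struct("<Q")  # one little-endian unsigned 64-bit integer


def read_offsets_one_by_one(buf, n):
    """One read() and one unpack() per value: about 2*n calls."""
    return [OFFSET.unpack(buf.read(OFFSET.size))[0] for _ in range(n)]


def read_offsets_batched(buf, n):
    """One read() and one unpack() for all n values."""
    return list(struct.unpack(f"<{n}Q", buf.read(n * OFFSET.size)))


# Example: cumulative offsets for three arrays of lengths 2, 0 and 3.
payload = struct.pack("<3Q", 2, 2, 5)
assert read_offsets_one_by_one(io.BytesIO(payload), 3) == [2, 2, 5]
assert read_offsets_batched(io.BytesIO(payload), 3) == [2, 2, 5]
```

Both helpers return the same values; the batched one replaces n Python-level calls with a constant number, which is where the savings would come from.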

With queue.Queue

         1024218 function calls (1024203 primitive calls) in 1.382 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        3    0.397    0.132    1.318    0.439 arraycolumn.py:126(_read)
    30206    0.125    0.000    0.329    0.000 queue.py:121(put)
    60412    0.122    0.000    0.201    0.000 threading.py:335(notify)
    30206    0.112    0.000    0.310    0.000 queue.py:153(get)
    30209    0.048    0.000    0.075    0.000 queue.py:96(empty)
    30243    0.046    0.000    0.049    0.000 {method 'read' of 'clickhouse_driver.bufferedreader.BufferedReader' objects}
    30203    0.046    0.000    0.109    0.000 arraycolumn.py:45(size_unpack)
    60412    0.041    0.000    0.056    0.000 threading.py:243(__exit__)
    60412    0.041    0.000    0.079    0.000 threading.py:255(_is_owned)
    60412    0.039    0.000    0.067    0.000 threading.py:240(__enter__)
    60412    0.038    0.000    0.038    0.000 {method 'acquire' of '_thread.lock' objects}
    60415    0.035    0.000    0.050    0.000 queue.py:208(_qsize)
       38    0.034    0.001    0.034    0.001 {method 'read' of '_ssl._SSLSocket' objects}
    60412    0.029    0.000    0.029    0.000 {method '__enter__' of '_thread.lock' objects}
    30206    0.026    0.000    0.336    0.000 queue.py:192(get_nowait)
    52513    0.023    0.000    0.023    0.000 base.py:101(<genexpr>)
    30206    0.022    0.000    0.029    0.000 queue.py:212(_put)
    30216    0.020    0.000    0.020    0.000 {method 'unpack' of 'Struct' objects}
    30206    0.018    0.000    0.025    0.000 queue.py:216(_get)
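
Summing the queue-related cumulative times in this listing gives a rough estimate of how much of _read went to queue bookkeeping. This is an estimate, assuming the empty() calls come from _read and that get_nowait's cumulative time already includes the get() it calls:

```python
# Cumulative times (seconds) copied from the queue.Queue profile.
read_cumtime = 1.318  # arraycolumn.py:126(_read)
queue_cumtimes = {
    "put": 0.329,         # queue.py:121(put)
    "get_nowait": 0.336,  # queue.py:192(get_nowait), includes get()
    "empty": 0.075,       # queue.py:96(empty)
}

queue_overhead = sum(queue_cumtimes.values())
share = queue_overhead / read_cumtime

print(f"queue overhead: {queue_overhead:.3f} s ({share:.0%} of _read)")
# → queue overhead: 0.740 s (56% of _read)
```

For comparison, the appendleft and pop entries in the deque profile add up to about 0.015 s.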

With collections.deque

             299247 function calls (299232 primitive calls) in 0.430 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        3    0.180    0.060    0.370    0.123 arraycolumn.py:121(_read)
    30243    0.045    0.000    0.048    0.000 {method 'read' of 'clickhouse_driver.bufferedreader.BufferedReader' objects}
    30203    0.044    0.000    0.106    0.000 arraycolumn.py:40(size_unpack)
       38    0.032    0.001    0.032    0.001 {method 'read' of '_ssl._SSLSocket' objects}
    52513    0.025    0.000    0.025    0.000 base.py:101(<genexpr>)
    30216    0.019    0.000    0.019    0.000 {method 'unpack' of 'Struct' objects}
        6    0.016    0.003    0.054    0.009 base.py:94(_read_data)
    60441    0.014    0.000    0.014    0.000 {method 'append' of 'list' objects}
    30215    0.011    0.000    0.011    0.000 {built-in method builtins.isinstance}
        1    0.010    0.010    0.430    0.430 <string>:1(<module>)
    30206    0.008    0.000    0.008    0.000 {method 'appendleft' of 'collections.deque' objects}
    30206    0.007    0.000    0.007    0.000 {method 'pop' of 'collections.deque' objects}
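
The listings above are in the format printed by the standard library's cProfile/pstats, sorted by internal time (tottime). A sketch of how a comparable report can be produced; workload() is a hypothetical stand-in for the real query, and the PR does not say exactly how its profiles were captured:

```python
import cProfile
import io
import pstats
from collections import deque


def workload():
    """Hypothetical stand-in for executing an array-heavy query."""
    d = deque()
    for i in range(30_000):
        d.appendleft(i)
    while d:
        d.pop()


def profile_report(func, limit=20):
    """Profile func() and return the top `limit` rows as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    # "tottime" is what pstats labels "Ordered by: internal time".
    pstats.Stats(profiler, stream=out).sort_stats("tottime").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report(workload))
```

cProfile.run(statement, sort="tottime") is a one-line alternative; because it executes a statement string, it would also produce a <string>:1(<module>) entry like the one in the deque listing.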

dourvaris force-pushed the perf/arraycolumn-speedup branch 5 times, most recently from a1075e6 to 46a15d7 on September 13, 2020 19:36
@dourvaris (Contributor, Author) commented:

Removed some unrelated changes made by the Python code formatter. Ready for review.

dourvaris changed the title from "Use deque for higher performance when reading array columns" to "Use deque for ~4x speedup when reading array columns" on Sep 13, 2020
dourvaris force-pushed the perf/arraycolumn-speedup branch from 46a15d7 to f890e94 on Sep 13, 2020 20:24
dourvaris force-pushed the perf/arraycolumn-speedup branch from f890e94 to ecf9f07 on Sep 14, 2020 01:23
@coveralls commented Sep 14, 2020

Coverage decreased (-0.003%) to 96.53% when pulling ecf9f07 on dourvaris:perf/arraycolumn-speedup into 3e99054 on mymarilyn:master.

xzkostyan merged commit 059bba7 into mymarilyn:master on Sep 14, 2020