Accelerated api for processing n x m entities #29

hardbyte · 2017-05-30T01:42:22Z

At the moment in entitymatch.cffi_filter_similarity_k we pass in two lists of Bloom filters, but we end up calling the native code n times. This issue is to move that iteration into C.
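To make the target behaviour concrete, here is a pure-Python sketch of what a single batched call could compute. It is not the project's implementation: the name `similarity_k_batched`, the use of the Dice coefficient as the similarity measure, and the `(i, j, score)` return shape are all assumptions made for illustration.

```python
import heapq


def popcount(data: bytes) -> int:
    """Number of set bits in a packed filter."""
    return bin(int.from_bytes(data, "big")).count("1")


def dice(a: bytes, b: bytes) -> float:
    """Dice coefficient of two equal-length packed filters (assumed metric)."""
    both = int.from_bytes(a, "big") & int.from_bytes(b, "big")
    total = popcount(a) + popcount(b)
    return 2.0 * bin(both).count("1") / total if total else 0.0


def similarity_k_batched(filters1, filters2, k, threshold):
    """Hypothetical batched entry point: all n x m comparisons in one call.

    For each filter i in filters1, keep up to k filters j in filters2
    whose score is at least threshold, best first.
    """
    results = []
    for i, a in enumerate(filters1):
        scored = ((dice(a, b), j) for j, b in enumerate(filters2))
        best = heapq.nlargest(k, (s for s in scored if s[0] >= threshold))
        results.extend((i, j, score) for score, j in best)
    return results


matches = similarity_k_batched(
    [b"\x0f", b"\xf0"], [b"\x0f", b"\xff", b"\xf0"], k=1, threshold=0.5
)
print(matches)
```

In the accelerated version the two nested loops would live in C, so Python crosses the native boundary once rather than n times.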

We are also dealing with "nice" Python bitarrays, which require some manipulation (1) before being passed into native code. We might want to consider adding an accelerated interface that takes our custom bit-packed data as plain Python bytes.

1: [ffi.new("char[128]", bytes(f[0].tobytes())) for f in filters1]
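One possible shape for the bytes-based interface is to pack all filters into a single contiguous buffer instead of allocating one `char[128]` per filter. The sketch below is stdlib-only; `pack_filters` and `filter_at` are hypothetical helper names, and the 128-byte width is taken from the `char[128]` in the snippet above.

```python
FILTER_BYTES = 128  # matches the char[128] used in the current conversion


def pack_filters(filters, width=FILTER_BYTES):
    """Concatenate equal-width packed filters into one contiguous buffer."""
    buf = bytearray()
    for f in filters:
        if len(f) != width:
            raise ValueError("expected %d bytes, got %d" % (width, len(f)))
        buf += f
    return bytes(buf)


def filter_at(packed, index, width=FILTER_BYTES):
    """Zero-copy view of the index-th filter inside the packed buffer."""
    return memoryview(packed)[index * width:(index + 1) * width]


first = bytes([0xAA]) * FILTER_BYTES
second = bytes([0x55]) * FILTER_BYTES
packed = pack_filters([first, second])
print(len(packed), bytes(filter_at(packed, 1)) == second)
```

With a layout like this, the native side would only need a pointer and a filter count, and cffi's `ffi.from_buffer` could expose the buffer to C without a per-filter `ffi.new`.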

I've started experimenting in two branches:

Branch feature-chunked-speedup for a C implementation of many x many comparisons.
Branch feature-direct-cffi builds on top of that with a look at accessing bitarray data from C without a memory copy. It only does a bitarray popcount for now.

# Assume ba is a bitarray
addr = ba.buffer_info()[0]
pntr = ffi.cast("char *", addr)
lib.popcount(pntr)
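The same idea can be tried without building any native code. This sketch is an analogue rather than the code on the branch: it swaps in `ctypes` for cffi and a `bytearray` for the bitarray, and `popcount_at` is a hypothetical stand-in for `lib.popcount` that also takes an explicit length.

```python
import ctypes


def popcount_at(addr, nbytes):
    """Count set bits in nbytes of memory at addr, reading it in place."""
    view = (ctypes.c_ubyte * nbytes).from_address(addr)
    return sum(bin(b).count("1") for b in view)


buf = bytearray(b"\xff\x0f\x00\x01")

# from_buffer shares memory with buf instead of copying it
arr = (ctypes.c_ubyte * len(buf)).from_buffer(buf)
addr = ctypes.addressof(arr)

before = popcount_at(addr, len(buf))

# A write through the original object is visible at the same address,
# which confirms that no copy was made.
buf[2] = 0xFF
after = popcount_at(addr, len(buf))
print(before, after)
```

The raw address is only valid while the underlying object stays alive and is not resized, so the caller has to keep a reference to it for the duration of the native call.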


unzvfu · 2018-02-21T04:12:56Z

This issue has been split into issues #66 and #67.

hardbyte added the enhancement label May 30, 2017

hardbyte added a commit that referenced this issue Aug 18, 2017

First cut at issue #29

f6b0c8c

hardbyte added the optimisation label Jan 9, 2018

unzvfu self-assigned this Feb 7, 2018

unzvfu added this to the Sprint 2018-02-12 milestone Feb 7, 2018

unzvfu mentioned this issue Feb 7, 2018

bit shuffling #18

Closed

gusmith modified the milestones: Sprint 2018-02-12, Sprint 2018-02-26 Feb 21, 2018

This was referenced Feb 21, 2018

Avoid copying data back and forth between the Python runtime and the C++ library #66

Open

Accelerated API for processing n x m entities #67

Closed

unzvfu closed this as completed Feb 21, 2018
