Accelerated API for processing n x m entities #29

Closed
hardbyte opened this issue May 30, 2017 · 1 comment
Comments

@hardbyte
Collaborator

hardbyte commented May 30, 2017

At the moment, in entitymatch.cffi_filter_similarity_k, we pass in two lists of Bloom filters but end up calling the native code n times from Python. This issue is to move that iteration into C.
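
To make the target shape concrete, here is a rough pure-Python sketch of a single call that covers all n x m pairs and keeps the best k matches per row. The function name similarity_k_many is hypothetical, and the Dice coefficient on set bits is an assumption about the similarity measure; the point is only that the outer loop sits inside one function, which is the part that would move into C.

```python
import heapq


def popcount(bits: bytes) -> int:
    """Number of set bits in a bytes object."""
    return bin(int.from_bytes(bits, "big")).count("1")


def dice(a: bytes, b: bytes) -> float:
    """Dice coefficient of two equal-length bit-packed filters (assumed measure)."""
    both = bytes(x & y for x, y in zip(a, b))
    total = popcount(a) + popcount(b)
    return 2.0 * popcount(both) / total if total else 0.0


def similarity_k_many(filters1, filters2, k):
    """Hypothetical batched entry point: one call handles every (i, j) pair.

    Returns, for each filter i in filters1, up to k tuples (i, j, score)
    ordered from best to worst score.
    """
    results = []
    for i, f1 in enumerate(filters1):
        scored = ((dice(f1, f2), j) for j, f2 in enumerate(filters2))
        best = heapq.nlargest(k, scored)
        results.append([(i, j, score) for score, j in best])
    return results
```

A native version would keep the same signature and return shape, so callers cross the Python/C boundary once instead of n times.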

We are also dealing with "nice" Python bitarrays, which require some manipulation (1) before being passed into native code. We might want to consider adding an accelerated interface that takes our custom bit-packed data as plain Python bytes.

1: [ffi.new("char[128]", bytes(f[0].tobytes())) for f in filters1]
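
As a sketch of what a bytes-based interface could accept, the filters can be concatenated once into a single contiguous buffer and addressed by offset, instead of allocating one char[128] per filter. The names FILTER_BYTES, pack_filters and filter_at are hypothetical; the 128-byte width is taken from the char[128] in the footnote above, and each input is assumed to already be bytes (for a bitarray, the result of tobytes()).

```python
FILTER_BYTES = 128  # width taken from the char[128] allocation in the footnote


def pack_filters(filters):
    """Concatenate equal-length filters into one contiguous bytes object."""
    for f in filters:
        if len(f) != FILTER_BYTES:
            raise ValueError(
                "expected %d bytes per filter, got %d" % (FILTER_BYTES, len(f))
            )
    return b"".join(filters)


def filter_at(packed, index):
    """Return a view of filter number `index` without copying its bytes."""
    start = index * FILTER_BYTES
    return memoryview(packed)[start:start + FILTER_BYTES]
```

With this layout a native function would only need a pointer to the packed buffer and the number of filters, since filter i starts at byte offset i * 128.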

I've started experimenting in two branches:

  • Branch feature-chunked-speedup for a C implementation of many x many comparisons.
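  • Branch feature-direct-cffi builds on top of that, looking at accessing bitarray data from C without a memory copy. It only does a bitarray popcount for now.

# Assume ba is a bitarray, and that ffi and lib come from the compiled cffi module
addr = ba.buffer_info()[0]       # address of the bitarray's underlying buffer
pntr = ffi.cast("char *", addr)  # treat that address as a C char pointer, no copy
lib.popcount(pntr)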
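
The same idea can be tried without a compiled extension. The sketch below swaps cffi for the standard-library ctypes module and uses a bytearray as a stand-in for the bitarray's buffer; popcount_via_address is a hypothetical name. It obtains the raw address of the existing buffer and reads the bits back through that address, so no copy of the data is made.

```python
import ctypes


def popcount_via_address(buf: bytearray) -> int:
    """Count set bits by reading buf through its raw memory address."""
    n = len(buf)
    # from_buffer shares memory with buf rather than copying it.
    shared = (ctypes.c_ubyte * n).from_buffer(buf)
    addr = ctypes.addressof(shared)
    # Reinterpret the bare address, as C code handed a pointer would.
    raw = (ctypes.c_ubyte * n).from_address(addr)
    return sum(bin(byte).count("1") for byte in raw)
```

As with the buffer_info() address above, the pointer is only valid while the underlying object is alive and has not been resized, so the caller has to keep a reference to it for the duration of the native call.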
@unzvfu

unzvfu commented Feb 21, 2018

This issue has been split into issues #66 and #67.

@unzvfu unzvfu closed this as completed Feb 21, 2018