Added rle_fast C extension to improve speed #2

AshishS-1123 · 2021-01-13T10:45:40Z

Changes Made

Created rle_fast extension
Added tests for testing the extension
Updated setup.py to build and install the extension

Why?

While the algorithm for the encode and decode operations in rle/__init__.py is efficient, the operations themselves do not run very fast. The reason is that the loop runs in pure Python, so every iteration pays the interpreter's overhead.
Python provides a C API for writing extension modules. This way, we get the speed of C with the flexibility of Python.

Here, I have used the C API to rewrite the algorithm from rle/__init__.py in C, added the code for building this extension, and written a few tests. For the few input values that I tried, the speed improved by at least 4x.

How?

The code for this extension is present in the folder rle_fast.
It contains three files:

rle_fast_extension.c
rle_utils.h
rle_docs.h

Wrapper for extension

The wrapper code for the extension is present in the file rle_fast/rle_fast_extension.c.
This code contains two functions, encode_c and decode_c, which are called for the encode and decode operations respectively. They are responsible for receiving the arguments, parsing them, performing type checking, and raising the appropriate exceptions.
In short, they act as an interface between Python and the C code.

Besides these functions, the file holds the function-level and module-level definitions: the names under which the functions are called from Python, the number of arguments they take, the docstrings for the functions and the module, and the name of the module.

Encode and Decode Operations

The file rle_fast/rle_utils.h contains the actual algorithm for performing the encode and decode operations.
It contains two functions, encode_sequence and decode_sequence.

The algorithm used in these two functions is the same as that in rle/__init__.py.
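For readers who have not seen the pure-Python version, the logic can be sketched roughly as follows. This is a minimal illustration written for this description, not the code from rle/__init__.py or rle_utils.h. It reuses the names encode_sequence and decode_sequence for readability, and the (values, counts) return shape is an assumption.

```python
def encode_sequence(seq):
    """Collapse consecutive equal items into two parallel lists (values, counts)."""
    values, counts = [], []
    for item in seq:
        if values and item == values[-1]:
            counts[-1] += 1          # the current run continues
        else:
            values.append(item)      # a new run starts
            counts.append(1)
    return values, counts


def decode_sequence(values, counts):
    """Expand (values, counts) back into the original sequence as a list."""
    if len(values) != len(counts):
        raise ValueError("values and counts must have the same length")
    decoded = []
    for value, count in zip(values, counts):
        decoded.extend([value] * count)
    return decoded


values, counts = encode_sequence("aaabccdddd")
print(values, counts)                    # ['a', 'b', 'c', 'd'] [3, 1, 2, 4]
print(decode_sequence(values, counts))   # the ten original characters, as a list
```

Both functions make a single pass over their input, so the C version has the same linear cost and only removes the per-iteration interpreter overhead.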

Documentation

The docstrings are present in the file rle_fast/rle_docs.h.
These are merely variables containing strings describing the module and the methods in it.
In rle_fast_extension.c, these docstrings are used in the module and function definitions. After building and installing the extension, they can be read with the built-in help function or the __doc__ attribute, just like with a normal Python package.

Installation

The code for installing the rle_fast extension is present in the setup.py file.

To build the extension,
python setup.py build

To install the extension,
python setup.py install

Usage

To import the package,

from rle.rle_fast import encode
from rle.rle_fast import decode
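Since the extension has to be compiled, it may not be available on every machine. One possible pattern, sketched below, is to prefer rle.rle_fast and fall back to the pure-Python rle package. The helper load_rle_backend is hypothetical and not part of this PR, and the sketch assumes both modules expose encode and decode.

```python
import importlib


def load_rle_backend():
    """Return (name, encode, decode) for the first importable backend, else None."""
    for name in ("rle.rle_fast", "rle"):
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # not installed or not built; try the next candidate
        if hasattr(module, "encode") and hasattr(module, "decode"):
            return name, module.encode, module.decode
    return None


backend = load_rle_backend()
if backend is None:
    print("no rle backend is installed in this environment")
else:
    name, encode, decode = backend
    print("using backend:", name)
```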

Changes as compared to PR #1

In my previous pull request, I mentioned that the extension fails for non-integer values. I have fixed that bug.

Previously, the encode_sequence function converted the elements of the input sequence to integers before comparing them. This is why the code failed for non-integer inputs.
In this version, I use the C API function PyObject_RichCompareBool, which compares two Python objects directly.
Now the code works for almost all data types, including integers, floats, complex numbers, and characters.
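The difference between the two approaches can be illustrated in Python. The sketch below is not the extension's code: encode_with, same_as_int, and same_by_equality are hypothetical helpers. same_by_equality mirrors what PyObject_RichCompareBool does with Py_EQ, which treats identical objects as equal and otherwise evaluates a == b.

```python
def encode_with(seq, same):
    """Run-length encode seq, using same(a, b) to decide whether a run continues."""
    values, counts = [], []
    for item in seq:
        if values and same(values[-1], item):
            counts[-1] += 1
        else:
            values.append(item)
            counts.append(1)
    return values, counts


def same_as_int(a, b):
    # Mimics the earlier approach: convert both elements to integers first.
    return int(a) == int(b)


def same_by_equality(a, b):
    # Python-level analogue of PyObject_RichCompareBool(a, b, Py_EQ).
    return a is b or bool(a == b)


mixed = ["x", "x", 1.5, 1.5, 2 + 3j, 2 + 3j]
print(encode_with(mixed, same_by_equality))   # (['x', 1.5, (2+3j)], [2, 2, 2])

try:
    encode_with(mixed, same_as_int)
except (TypeError, ValueError) as exc:
    print("integer conversion failed:", exc)

# Integer conversion also merges distinct floats into one run.
print(encode_with([1.2, 1.9], same_as_int))   # ([1.2], [2])
```

Comparing the objects themselves avoids both problems: no conversion error for strings or complex numbers, and no loss of precision for floats.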

TO-DO

Write better tests for the extension
Find cases where the extension might fail, and fix them.

AshishS-1123 · 2021-02-21T10:37:52Z

@tnwei
I know you are super busy right now, but could you please review this PR? I am starting to wonder if there is some problem in the code.

As for the problem of distributing the package on multiple platforms, the best solution I could find was to use GitHub Actions. I am not that familiar with how to go about doing that, so I decided to check how other packages that use C extensions build their wheels. Check out this link from scikit-learn. It looks promising.

AshishS-1123 · 2021-04-15T13:07:12Z

@tnwei It's been months since this PR was opened, yet you haven't given any response.
Please comment below whether you are looking into this, or if you want to reject my contribution. Otherwise, I will have no choice but to call this a dead project and create and publish my own fork.

AshishS-1123 added 5 commits January 13, 2021 09:30

Added wrapper code for building extension

3532399

Added utility functions for encode and decode operations

4c101fa

Added docstrings for module and functions

68dddce

Updated setup.py to build extension

0e57e39

Added unittests for testing rle_fast

eb4c1c4
