Implemented rle_fast Extension for Real Time Encoding #1
Changes Made
Why?
The algorithm used for encoding and decoding is efficient, but it is written in pure Python, and interpreter overhead makes it slow.
Python provides a C API that lets users write extensions in C and call them from Python.
The extension I wrote uses an algorithm similar to the one in the rle package, but it runs almost 5x faster.
How?
The extension I built is present in the folder rle_fast.
Wrapper For Extension
1. The file rle_fast/rle_fast_extension.c contains the wrapper code for the extension: the module definition, the method declarations, and two wrapper methods, one for encoding and one for decoding.
2. These methods are named encode_c and decode_c.
3. The first step in these functions is to get and parse the arguments.
4. The next step is to check that the given arguments are valid and raise an appropriate error if they are not.
5. After that, we call the corresponding function from the header file rle_utils.h.
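The three steps above (parse, validate, delegate) can be sketched in Python. This is only an illustration of the control flow, not the C code in rle_fast_extension.c: the names encode_wrapper and _encode_impl are hypothetical, and the specific checks and exception types are assumptions.

```python
from itertools import groupby


def _encode_impl(seq):
    # Hypothetical stand-in for the utility function declared in rle_utils.h.
    values, counts = [], []
    for value, run in groupby(seq):
        values.append(value)
        counts.append(sum(1 for _ in run))
    return values, counts


def encode_wrapper(*args):
    # Step 1: get and parse the arguments.
    if len(args) != 1:
        raise TypeError(f"expected exactly 1 argument, got {len(args)}")
    (seq,) = args

    # Step 2: check the arguments and raise an error if they are invalid.
    # (Assumed check: the input must be iterable.)
    try:
        iter(seq)
    except TypeError:
        raise TypeError("expected an iterable sequence") from None

    # Step 3: hand the validated input to the utility function.
    return _encode_impl(seq)
```

Keeping validation in the wrapper means the utility function can assume well-formed input.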
Utility Functions for Encode and Decode
1. The functions in the file rle_fast/rle_utils.h are responsible for the actual encoding and decoding operations.
2. The algorithm used in these functions is the same as the one in the rle/__init__.py file. The only difference is how the individual operations are performed.
Operations such as creating a new empty list or getting an iterator are performed through Python C API functions, which are documented on the official Python site.
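For reference, here is a minimal Python sketch of run-length encoding and decoding. It assumes the encoder returns a pair of lists (values and counts); the exact return shape and the code in rle/__init__.py may differ. The comments name Python C API calls that perform the equivalent operation; which calls rle_utils.h actually uses is not confirmed here.

```python
def encode(seq):
    """Run-length encode seq into (values, counts)."""
    values, counts = [], []          # in C: PyList_New(0)
    for item in seq:                 # in C: PyObject_GetIter / PyIter_Next
        if values and values[-1] == item:
            counts[-1] += 1          # extend the current run
        else:
            values.append(item)      # in C: PyList_Append
            counts.append(1)         # start a new run
    return values, counts


def decode(values, counts):
    """Expand (values, counts) back into the original list."""
    out = []
    for value, count in zip(values, counts):
        out.extend([value] * count)
    return out
```

For example, encode([1, 1, 2, 3, 3, 3]) gives ([1, 2, 3], [2, 1, 3]), and passing that result to decode returns the original list.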
Docstrings
No software is complete without documentation. In a C extension, the docstrings for methods and modules are attached in the module definitions, which here are in rle_fast_extension.c.
The docstring text itself is in the file rle_fast/rle_docs.h.
You can either read the docs in this file or use the help function after installing the extension.
Installation
The code needed to build and install the extension has already been added to setup.py.
Run the following commands from the terminal.
To build the package
python setup.py build
To install the package
python setup.py install
TO-DO
The extension doesn't yet support encoding sequences that contain strings or characters, because the code assumes it is operating on numbers only. Given such an input, it raises a NotImplementedError.
For this reason, I have commented out the failing test in the file tests/test_encode_rlefast.py.
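The limitation can be illustrated with a small Python sketch. This mimics the described behaviour only; encode_numeric_only is a hypothetical name, and checking each element against numbers.Number is an assumption about how the C code detects unsupported input.

```python
from numbers import Number


def encode_numeric_only(seq):
    """Run-length encode seq, rejecting non-numeric elements."""
    values, counts = [], []
    for item in seq:
        # Assumed check: anything that is not a number is unsupported.
        if not isinstance(item, Number):
            raise NotImplementedError("only numeric sequences are supported")
        if values and values[-1] == item:
            counts[-1] += 1
        else:
            values.append(item)
            counts.append(1)
    return values, counts
```

With this sketch, a list of numbers encodes normally, while a string such as "aab" or a list of strings raises NotImplementedError.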
The README file also needs to be updated to cover the extension.
Since this seems like a major improvement to the package, maybe we can release it as a major version?