Adding operator+[=] and operator/[=] for FloatingArray and IntegerArray. #58
Conversation
…nippets and cmake modules * Build native.pyx stub and libpandas.so * In-place cmake build + rpath on linux, at least * Library scaffolding and build infrastructure for libpandas native core experiments to evaluate replacing Series and DataFrame internals and custom dtypes with a consistent set of wrapper types, array, and table data structures. Incorporates several bits of BSD / MIT / Apache-licensed code, and various copyright notices have been added to LICENSES. * Do not include pandas/ * Array API stubs * Import a bunch of utility code from Arrow, refactoring
Please leave code comments on Gerrit: https://review.gerrithub.io/#/c/298035/ Author: Wes McKinney <[email protected]> Closes wesm#40 from wesm/array-views and squashes the following commits: 27a194b [Wes McKinney] Change merge-pr script to target pandas-dev/pandas2 ea2017e [Wes McKinney] * Prototype array view interface * Buffer slicing and copying * Test ArrayView constructors and copy/move assignment * Test ArrayView::Slice, EnsureMutable, and some array attrs Change-Id: I1f942ec1160590c33b68a12eb88fe923dd135911
Help keep things clean. This style is based on LLVM and Google C++ styles -- we can always tweak a bit later. Author: Wes McKinney <[email protected]> Closes wesm#48 from wesm/clang-format and squashes the following commits: eed45dd [Wes McKinney] Built "make format" target using clang-format based on arrow-cpp and clean up existing codebase Change-Id: Id697c3805d18107ad7e20e9b81ba09350d7cd219
…or reporting, bit manipulation) with libarrow Closes wesm#44. This will spare us developing similar functionality in two places for the time being -- we can keep an eye on the relationship between the libraries in case it does make sense to split in the future (but it may not, particularly if we use Arrow to implement nested data natively in pandas). Author: Wes McKinney <[email protected]> Closes wesm#49 from wesm/link-libarrow and squashes the following commits: 1d176b1 [Wes McKinney] Refactor libpandas to use common constructs and utility code from libarrow: Status, MemoryPool, Buffer, etc. Change-Id: I3140a3ed8dd17db376be21241fc259523a16cc88
For test automation for now. This won't pass until the libarrow artifacts are updated in conda-forge Closes wesm#41 Author: Wes McKinney <[email protected]> Closes wesm#50 from wesm/circle-ci and squashes the following commits: c68b56b [Wes McKinney] Skip Cython extensions for now e2bdb62 [Wes McKinney] Add external project include, google version numbers 5d4a536 [Wes McKinney] conda create fix 5986e72 [Wes McKinney] conda create with env path af44d4d [Wes McKinney] Docker mysteries 1391c87 [Wes McKinney] Tweaks, not sure why failing 3fc6219 [Wes McKinney] Add basic Circle CI setup Change-Id: I2fefc2548d5e60a8e6f9ea39fa57efb9cd0b2e79
* Changing PrimitiveType to NumericType * Removing the macro for declaring sub-types of NumericType and instead using template arguments * Adding a static SINGLETON member to the NumericType base class instead of the macros for methods for creating singletons for each numeric type * Removing NullType's inheritance of PrimitiveType * Removing intermediate base class between {Integer,Floating}ArrayImpl and PrimitiveType and just making a concrete template based class {Integer,Floating}Array * Changing FLOAT and DOUBLE to FLOAT32 and FLOAT64 Author: Joshua Storck <[email protected]> Closes wesm#55 from joshuastorck/pandas-2.0 and squashes the following commits: 1614d44 [Joshua Storck] * Fixing native.p{xd,yx} so that it matches the latest C++ code * Changing PrimitiveType to NumericType * Removing the macro for declaring sub-types of NumericType and instead using template arguments * Adding a static SINGLETON member to the NumericType base class instead of the macros for methods for creating singletons for each numeric type * Removing NullType's inheritance of PrimitiveType * Removing intermediate base class between {Integer,Floating}ArrayImpl and PrimitiveType and just making a concrete template based class {Integer,Floating}Array * Changing FLOAT and DOUBLE to FLOAT32 and FLOAT64
auto other_data = other.data();
if (HasNulls()) {
  for (auto ii = 0; ii < length; ++ii, ++this_data, ++other_data) {
    if (!IsNull(ii)) { *this_data += *other_data; }
This check may or may not be a good idea from a performance perspective. Are you sure the conditional check is cheaper than just doing the arithmetic always? Even when there are only a few null values?
I would say to do the arithmetic always and use byte/word-wise & with the respective bitmaps (if they are byte-aligned, otherwise do it bitwise) to propagate nulls separately.
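A minimal sketch of that branch-free approach, assuming byte-aligned validity bitmaps where a set bit means "valid"; the names and signature here are illustrative, not the actual libpandas API:

#include <cstdint>

void AddWithBitmaps(const int32_t* left, const uint8_t* left_valid,
                    const int32_t* right, const uint8_t* right_valid,
                    int32_t* out, uint8_t* out_valid, int64_t length) {
  // Unconditional arithmetic: no per-element branch on null-ness.
  for (int64_t i = 0; i < length; ++i) {
    out[i] = left[i] + right[i];
  }
  // Propagate nulls separately with a byte-wise AND of the validity
  // bitmaps: the result is valid only where both inputs are valid.
  for (int64_t i = 0; i < (length + 7) / 8; ++i) {
    out_valid[i] = left_valid[i] & right_valid[i];
  }
}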
That would have been my preference. It just wasn't clear from the existing code if floating point arrays intentionally didn't have bitmaps.
// Copy the list of nulls (aka valid bits) to a buffer
Status CopyNulls(int64_t length, std::shared_ptr<Buffer>* out) const {
  // Pre-condition is that we already have valid_bits_ set
  return valid_bits_->Copy(this->offset_, BitUtil::BytesForBits(length), out);
Offset needs to be used as a bit offset. Can use CopyBitmap here (which hasn't been properly implemented yet, but the signature is okay)
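For illustration, a naive version of what a bit-offset-aware copy needs to do (assuming LSB-first bit numbering within each byte; this is not the actual CopyBitmap implementation, which would add a byte-aligned fast path):

#include <cstdint>

// Copy `length` bits starting at bit `bit_offset` of `src` into `dst`,
// starting at bit 0 of `dst`. Bit-by-bit for clarity only.
void CopyBitmapNaive(const uint8_t* src, int64_t bit_offset, int64_t length,
                     uint8_t* dst) {
  for (int64_t i = 0; i < length; ++i) {
    const int64_t j = bit_offset + i;
    const bool bit = (src[j / 8] >> (j % 8)) & 1;
    if (bit) {
      dst[i / 8] |= static_cast<uint8_t>(1 << (i % 8));
    } else {
      dst[i / 8] &= static_cast<uint8_t>(~(1 << (i % 8)));
    }
  }
}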
    plus, plus_inplace);
OperatorTest<IntegerArray, UInt32Type, IntegerArray, UInt32Type>().TestInplaceOperator(
    plus, plus_inplace);
OperatorTest<IntegerArray, UInt64Type, IntegerArray, UInt64Type>().TestInplaceOperator(
Guess we may want some macros (or other templates) to be able to say "Do this thing for the cartesian product of certain types"
We could also use type traits to associate a class template (e.g. IntegerArray<Int32Type>) with just the type Int32Type.
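A sketch of that trait idea (the trait name and specializations are hypothetical, not the actual test scaffolding):

// Map a type tag to the array class template that stores it.
template <typename T>
struct ArrayTypeTraits;

template <>
struct ArrayTypeTraits<Int32Type> {
  using ArrayType = IntegerArray<Int32Type>;
};

template <>
struct ArrayTypeTraits<FloatType> {
  using ArrayType = FloatingArray<FloatType>;
};

// Tests could then be driven by type tags alone, e.g.
// OperatorTest<typename ArrayTypeTraits<T>::ArrayType, T, ...>.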
}

template <typename TYPE>
Status NumericArray<TYPE>::EnsureMutableAndCheckChange(bool& changed) {
Google C++ guide discourages mutable references, preferring bool* changed
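That is, take the output through a pointer; a sketch of the adjusted in-class declaration:

Status EnsureMutableAndCheckChange(bool* changed);  // out-param per Google style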
IntegerArray<TYPE>& operator+=(const IntegerArray<OTHER_TYPE>& other) {
  // TODO: check Status and throw exception
  EnsureMutable();
  auto length = std::min(this->length_, other.length());
Thinking out loud here. I have been thinking about a slightly different approach to implementing operators in generality, by instead unpacking the containers into simpler data structures, like:
// SFINAE
template <typename T, typename ENABLED = void>
struct TypeData {};

template <typename T>
struct TypeData<T, typename std::enable_if<std::is_integral<T>::value>::type> {
  T* data;
  uint8_t* valid_bits;
  int64_t length;
  int64_t offset;
};
and then have arithmetic methods
template <typename LEFT_TYPE, typename RIGHT_TYPE, typename OUT_TYPE>
void add(TypeData<LEFT_TYPE> left, TypeData<RIGHT_TYPE> right, TypeData<OUT_TYPE> out) {
  ...
  for (int64_t i = 0; i < left.length; ++i) {
    out.data[i] = left.data[i] + right.data[i];
  }
  ...
}
In the case of inplace-add, in the operator dispatcher, you would have:
TypeData<LEFT_TYPE> left(...);
TypeData<RIGHT_TYPE> right(...);
add(left, right, left);
You would have to deal with whether to allocate the validity/null bitmap prior to invoking this function, but I think it's probably better to handle the mutability checks / memory allocation / etc. separately from the analytics. Then you can box the output as you wish.
Of course, you need to have a forward declaration someplace
// (A plain declaration; the explicit instantiations for the supported
// type matrix live in a single translation unit.)
template <typename LEFT_TYPE, typename RIGHT_TYPE, typename OUT_TYPE>
void add(TypeData<LEFT_TYPE> left, TypeData<RIGHT_TYPE> right, TypeData<OUT_TYPE> out);
and then instantiate the matrix of types you're choosing to support. Some analytical kernels might need to allocate memory based on what's in the data, and so we could do something like
void kernel_name(OperatorContext* ctx, TypeData<ARG1_TYPE> arg1, ...);
I'd also like to maximize code sharing amongst related operator implementations.
I'm going to take a crack at implementing part of Take under this regime (see https://github.com/pandas-dev/pandas/blob/master/pandas/src/algos_take_helper.pxi). Will keep thinking about this.
Also, per #46 we are contemplating moving to bitmaps for all null handling, which would make the code for integers and floats identical (outside some type promotion details, maybe).
If you assume that all numeric types will have a null bitmap, then I think NumericArray in the current code would probably hold something equivalent to what you are suggesting with TypeData.
I'm not following your SFINAE code, but I think you are driving at a design that looks a bit more like this:
#include <cstdint>
#include <type_traits>
#include <utility>

// Assumed for this sketch (not part of the original comment): a concrete
// TypeData with data/valid_bits/length members.
template <typename T>
struct TypeData {
  T* data = nullptr;
  uint8_t* valid_bits = nullptr;
  int64_t length = 0;
  int64_t offset = 0;
};

template <typename KERNEL,
          typename LEFT_TYPE,
          typename RIGHT_TYPE,
          typename OUT_TYPE,
          typename... KERNEL_ARGS>
void RunKernel(TypeData<LEFT_TYPE> const& left,
TypeData<RIGHT_TYPE> const& right,
TypeData<OUT_TYPE>& out,
KERNEL_ARGS&&... kernel_args)
{
KERNEL kernel(std::forward<KERNEL_ARGS>(kernel_args)...);
kernel(left, right, out);
}
class GenericKernel
{
public:
template <typename LEFT_TYPE, typename RIGHT_TYPE, typename OUT_TYPE, typename FUNCTOR>
void Run(const TypeData<LEFT_TYPE>& left,
FUNCTOR& functor,
const TypeData<RIGHT_TYPE>& right,
TypeData<OUT_TYPE>& out)
{
// Do a bitmap OR for nulls first and then do ffs
// and branch
for (auto ii = 0; ii < left.length; ++ii)
{
functor(left.data[ii], right.data[ii], out.data[ii]);
}
}
};
// This would get initialized once at startup and would
// have information about the instruction set available
// on the machine (AVX vs. AVX2)
class SIMDConfiguration
{
};
template <typename LEFT_TYPE, typename RIGHT_TYPE>
class SIMDSinglePrecisionPossible;
template <typename LEFT_TYPE, typename RIGHT_TYPE>
class SIMDSinglePrecisionPossible : public std::false_type
{
};
template <>
class SIMDSinglePrecisionPossible<float, float> : public std::true_type
{
};
template <typename LEFT_TYPE, typename RIGHT_TYPE>
class SIMDDoublePrecisionPossible;
template <typename LEFT_TYPE, typename RIGHT_TYPE>
class SIMDDoublePrecisionPossible : public std::false_type
{
};
template <>
class SIMDDoublePrecisionPossible<double, double> : public std::true_type
{
};
template <typename LEFT_TYPE, typename RIGHT_TYPE>
class SIMDImpossible
{
public:
static constexpr auto value = !(SIMDDoublePrecisionPossible<LEFT_TYPE, RIGHT_TYPE>::value ||
SIMDSinglePrecisionPossible<LEFT_TYPE, RIGHT_TYPE>::value);
};
class AddKernel : public GenericKernel
{
public:
AddKernel(SIMDConfiguration& config) : config_(config)
{
}
template <typename LEFT_TYPE,
typename RIGHT_TYPE,
typename OUT_TYPE>
void operator()(const TypeData<LEFT_TYPE>& left,
const TypeData<RIGHT_TYPE>& right,
TypeData<OUT_TYPE>& out)
{
RunOptimized<LEFT_TYPE, RIGHT_TYPE>(left, right, out);
}
template <typename LEFT_TYPE, typename RIGHT_TYPE, typename OUT_TYPE>
typename std::enable_if<SIMDSinglePrecisionPossible<LEFT_TYPE, RIGHT_TYPE>::value>::type
RunOptimized(const TypeData<LEFT_TYPE>& left,
const TypeData<RIGHT_TYPE>& right,
TypeData<OUT_TYPE>& out)
{
// Use _mm_add_ps or _mm256_add_ps depending on
// config_, fallback to Run
}
template <typename LEFT_TYPE, typename RIGHT_TYPE, typename OUT_TYPE>
typename std::enable_if<SIMDDoublePrecisionPossible<LEFT_TYPE, RIGHT_TYPE>::value>::type
RunOptimized(const TypeData<LEFT_TYPE>& left,
const TypeData<RIGHT_TYPE>& right,
TypeData<OUT_TYPE>& out)
{
// Use _mm_add_pd or _mm256_add_pd depending on
// config_, fallback to Run
}
template <typename LEFT_TYPE, typename RIGHT_TYPE, typename OUT_TYPE>
typename std::enable_if<SIMDImpossible<LEFT_TYPE, RIGHT_TYPE>::value>::type
RunOptimized(const TypeData<LEFT_TYPE>& left,
const TypeData<RIGHT_TYPE>& right,
TypeData<OUT_TYPE>& out)
{
static auto add = [](auto const& left, auto const& right, auto& out)
{
out = left + right;
};
Run(left, add, right, out);
}
private:
SIMDConfiguration& config_;
};
int main(int argc, char* argv[])
{
// Just to see if it compiles
TypeData<float> a;
TypeData<int> b;
TypeData<int> c;
SIMDConfiguration config;
RunKernel<AddKernel>(a, b, c, config);
}
I would expect FloatingArray and IntegerArray to control what operations are allowed using SFINAE, in the same fashion that I've currently implemented it. The main difference would be that those functions would just invoke RunKernel and would set up the correct return type based on the desired rules for promotion of types.
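A hedged sketch of one way the promotion rule feeding RunKernel's OUT_TYPE could be expressed (the actual pandas2 promotion rules were still under discussion; this simply follows the built-in C++ arithmetic conversions):

#include <cstdint>
#include <type_traits>
#include <utility>

// Output element type of an addition, per the usual arithmetic conversions.
// A real rule set might differ, e.g. forcing double for integer division.
template <typename LEFT_TYPE, typename RIGHT_TYPE>
struct PromoteAdd {
  using type = decltype(std::declval<LEFT_TYPE>() + std::declval<RIGHT_TYPE>());
};

static_assert(std::is_same<PromoteAdd<int32_t, float>::type, float>::value,
              "int32 + float promotes to float");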
static constexpr auto needs_double =
    (max_value > std::numeric_limits<typename FloatType::c_type>::max());

using type = typename std::conditional<needs_double, DoubleType, FloatType>::type;
In pure Python and NumPy at least, integer division always yields double
We could support integer division with //, but that's certainly a lower priority.
Just to be clear, this is just choosing the right size of floating point type based on the inputs.
@@ -74,7 +74,7 @@ endif()
 # GCC cannot always verify whether strict aliasing rules are indeed followed due to
 # fundamental limitations in escape analysis, which can result in subtle bad code generation.
 # This has a small perf hit but worth it to avoid hard to debug crashes.
-set(CXX_COMMON_FLAGS "-std=c++11 -fno-strict-aliasing -msse4.2 -Wall -Wno-sign-compare -Wno-deprecated -pthread -D__STDC_FORMAT_MACROS")
+set(CXX_COMMON_FLAGS "-std=c++1y -fno-strict-aliasing -msse4.2 -Wall -Wno-sign-compare -Wno-deprecated -pthread -D__STDC_FORMAT_MACROS")
This will increase the minimum required GCC version to 5?
I think c++1y is required by clang; we can do c++14 on gcc 4.9, I believe.
The thing that comes to my mind is that the typical toolchains for packages that work on multiple Linux distros (i.e. manylinux1 wheels and the redhat toolchain in conda) all depend on code compiling with GCC 4.8.
Redhat devtoolset-3 has gcc 4.9.x, so I imagine by the time pandas2 is ready for primetime this upgrade will have happened. There will be other packages that need C++14; might as well push the issue.
Ok, then let's hope for a manylinux2 standard and enjoy the new C++14 features ;). Looking forward to std::make_unique.
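For reference, a standalone illustration (not project code) of two of the C++14 features at stake in this flag change:

#include <cstdint>
#include <memory>

int main() {
  // std::make_unique is new in C++14.
  auto buffer = std::make_unique<uint8_t[]>(64);
  // Generic lambdas (auto parameters) are also C++14-only; the kernel
  // sketch earlier in this thread relies on them.
  auto add = [](auto a, auto b) { return a + b; };
  buffer[0] = add(1, 2);
  return buffer[0] - 3;  // 0
}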
Force-pushed from bad99a1 to bcb8a9e.
This includes a design change that obviates the need for an ArrayView. Instead, every array has an internal offset. Shallow copy is achieved by copy constructor, though the current set of copy constructors don't yet support a slice. Deep copy is still achieved through the Copy virtual function. More detailed explanation of the changes: * Adding copy/move constructors for {Floating,Integer,Numeric}Array * Adding various methods for marking/getting nulls (valid bits) in integer arrays * Changing data() and mutable_data() in NumericArray so that they return a pointer that starts at the array's offset * Addition of Addable/Divisable classes (similar to Boost operators) for easy support of operator[+/] * Implementing IntegerArray::operator/, IntegerArray::operator+=, FloatingArray::operator/=, FloatingArray::operator+= * Adding meta programming class TypeList, which is used as scaffolding in array-test to simplify the declaration of every permutation of arithmetic operation for all types.
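A hedged sketch of what a TypeList driving that per-permutation scaffolding might look like (names and shape assumed here; the actual class in the patch may differ):

#include <initializer_list>

template <typename... TYPES>
struct TypeList {};

// Invoke f's call operator for one left-hand type L against every R in RS.
template <typename L, typename F, typename... RS>
void ForEachRight(F f, TypeList<RS...>) {
  (void)std::initializer_list<int>{(f.template operator()<L, RS>(), 0)...};
}

// Expand the full cartesian product of two type lists.
template <typename F, typename... LS, typename... RS>
void ForEachPair(F f, TypeList<LS...>, TypeList<RS...> rs) {
  (void)std::initializer_list<int>{(ForEachRight<LS>(f, rs), 0)...};
}

// Usage (hypothetical functor wrapping the OperatorTest calls seen above):
// struct TestPlus {
//   template <typename L, typename R>
//   void operator()() {
//     OperatorTest<IntegerArray, L, IntegerArray, R>().TestInplaceOperator(
//         plus, plus_inplace);
//   }
// };
// ForEachPair(TestPlus{}, TypeList<Int32Type, Int64Type>{},
//             TypeList<Int32Type, Int64Type>{});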
Force-pushed from bcb8a9e to d8d55cf.
There's a bunch of bits here that can be extracted and merged as useful utilities. I want to wait on pulling in operator implementations until the operator interface has been thought through some more: https://docs.google.com/document/d/1YmsV48iO6YNSxCIC84Xig5z-i9g4g7rzYTmCgUAxKRk/edit#heading=h.pzfqkdqgusm7 I'd like to take a crack at implementing some operators in a way that can be evaluated either in single- or multi-threaded mode (without excessive code duplication -- so the code that runs things in parallel won't be too knowledgeable about the details of the operators except how to invoke their virtual functions).
I cherry-picked some of this as a7f2a82. I'm going to do a little prototyping on operator evaluation generally, so will return to this hopefully soon.
git diff upstream/master | flake8 --diff

This includes a design change that obviates the need for an ArrayView. Instead, every array has an internal offset. Shallow copy is achieved by copy constructor, though the current set of copy constructors don't yet support a slice. Deep copy is still achieved through the Copy virtual function.

More detailed explanation of the changes:

* Adding copy/move constructors for {Floating,Integer,Numeric}Array
* Adding various methods for marking/getting nulls (valid bits) in integer arrays
* Changing data() and mutable_data() in NumericArray so that they return a pointer that starts at the array's offset
* Addition of Addable/Divisable classes (similar to Boost operators) for easy support of operator[+/]
* … side types on arithmetic operators
* Implementing IntegerArray::operator/, IntegerArray::operator+=, FloatingArray::operator/=, FloatingArray::operator+=