This repository has been archived by the owner on Apr 10, 2024. It is now read-only.

Adding operator+[=] and operator/[=] for FloatingArray and IntegerArray. #58

Closed
wants to merge 7 commits

Conversation

joshuastorck

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master | flake8 --diff
  • whatsnew entry

This includes a design change that obviates the need for an ArrayView.
Instead, every array has an internal offset. Shallow copy is achieved by the
copy constructor, though the current set of copy constructors doesn't yet
support a slice. Deep copy is still achieved through the Copy virtual
function. (A minimal sketch of this layout follows the list below.)

More detailed explanation of the changes:

  • Adding copy/move constructors for {Floating,Integer,Numeric}Array
  • Adding various methods for marking/getting nulls (valid bits) in
    integer arrays
  • Changing data() and mutable_data() in NumericArray so that they
    return a pointer that starts at the array's offset
  • Addition of Addable/Divisable classes (similar to Boost operators)
    for easy support of operator+ and operator/
  • Unit test scaffolding for testing permutations of left/right hand
    side types on arithmetic operators
  • Implementing IntegerArray::operator/, IntegerArray::operator+=,
    FloatingArray::operator/=, FloatingArray::operator+=
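
For orientation, a minimal sketch of the offset-based layout described above (illustrative names only, not the PR's actual classes):

#include <cstdint>
#include <memory>
#include <vector>

// The buffer is shared; a shallow copy shares it and copies offset/length,
// while a deep copy (the PR's virtual Copy) would clone the buffer itself.
struct Buffer {
  std::vector<double> data;
};

class ArraySketch {
 public:
  ArraySketch(std::shared_ptr<Buffer> buffer, int64_t offset, int64_t length)
      : buffer_(std::move(buffer)), offset_(offset), length_(length) {}

  // Shallow copy via the copy constructor, as in the PR description.
  ArraySketch(const ArraySketch&) = default;

  // data() returns a pointer that starts at the array's offset.
  const double* data() const { return buffer_->data.data() + offset_; }
  int64_t length() const { return length_; }

 private:
  std::shared_ptr<Buffer> buffer_;
  int64_t offset_;
  int64_t length_;
};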

wesm and others added 6 commits October 12, 2016 15:07
…nippets and cmake modules

* Build native.pyx stub and libpandas.so
* In place cmake build + rpath on linux, at least
* Library scaffolding and build infrastructure for libpandas native core
  experiments to evaluate replacing Series and DataFrame internals and custom
  dtypes with a consistent set of wrapper types, array, and table data
  structures. Incorporates several bits of BSD / MIT / Apache-licensed code and
  various copyright notices have been added to LICENSES.
* Do not include pandas/
* Array API stubs
* Import bunch of utility code from Arrow, refactoring
Please leave code comments on Gerrit:
https://review.gerrithub.io/#/c/298035/

Author: Wes McKinney <[email protected]>

Closes wesm#40 from wesm/array-views and squashes the following commits:

27a194b [Wes McKinney] Change merge-pr script to target pandas-dev/pandas2
ea2017e [Wes McKinney] * Prototype array view interface * Buffer slicing and copying * Test ArrayView constructors and copy/move assignment * Test ArrayView::Slice, EnsureMutable, and some array attrs

Change-Id: I1f942ec1160590c33b68a12eb88fe923dd135911
Help keep things clean. This style is based on LLVM and Google C++
styles -- we can always tweak a bit later.

Author: Wes McKinney <[email protected]>

Closes wesm#48 from wesm/clang-format and squashes the following commits:

eed45dd [Wes McKinney] Built "make format" target using clang-format based on arrow-cpp and clean up existing codebase

Change-Id: Id697c3805d18107ad7e20e9b81ba09350d7cd219
…or reporting, bit manipulation) with libarrow

Closes wesm#44. This will spare us developing similar functionality in two
places for the time being -- we can keep an eye on the relationship
between the libraries in case it does make sense to split in the
future (but it may not, particularly if we use Arrow to implement
nested data natively in pandas).

Author: Wes McKinney <[email protected]>

Closes wesm#49 from wesm/link-libarrow and squashes the following commits:

1d176b1 [Wes McKinney] Refactor libpandas to use common constructs and utility code from libarrow: Status, MemoryPool, Buffer, etc.

Change-Id: I3140a3ed8dd17db376be21241fc259523a16cc88
For test automation for now. This won't pass until the libarrow
artifacts are updated in conda-forge. Closes wesm#41

Author: Wes McKinney <[email protected]>

Closes wesm#50 from wesm/circle-ci and squashes the following commits:

c68b56b [Wes McKinney] Skip Cython extensions for now
e2bdb62 [Wes McKinney] Add external project include, google version numbers
5d4a536 [Wes McKinney] conda create fix
5986e72 [Wes McKinney] conda create with env path
af44d4d [Wes McKinney] Docker mysteries
1391c87 [Wes McKinney] Tweaks, not sure why failing
3fc6219 [Wes McKinney] Add basic Circle CI setup

Change-Id: I2fefc2548d5e60a8e6f9ea39fa57efb9cd0b2e79
* Changing PrimitiveType to NumericType
* Removing the macro for declaring sub-types of NumericType and
  instead using template arguments
* Adding a static SINGLETON member to the NumericType base class
  instead of the macros for methods for creating singletons for
  each numeric type
* Removing NullType's inheritance of PrimitiveType
* Removing the intermediate base class between {Integer,Floating}ArrayImpl
  and PrimitiveType and just making a concrete template-based class
  {Integer,Floating}Array
* Changing FLOAT and DOUBLE to FLOAT32 and FLOAT64

Author: Joshua Storck <[email protected]>

Closes wesm#55 from joshuastorck/pandas-2.0 and squashes the following commits:

1614d44 [Joshua Storck] * Fixing native.p{xd,yx} so that it matches the latest C++ code * Changing PrimitiveType to NumericType * Removing the macro for declaring sub-types of NumericType and instead using template arguments * Adding a static SINGLETON member to the NumericType base class instead of the macros for methods for creating singletons for each numeric type * Removing NullType's inheritance of PrimitiveType * Removing the intermediate base class between {Integer,Floating}ArrayImpl and PrimitiveType and just making a concrete template-based class {Integer,Floating}Array * Changing FLOAT and DOUBLE to FLOAT32 and FLOAT64
auto other_data = other.data();
if (HasNulls()) {
  for (auto ii = 0; ii < length; ++ii, ++this_data, ++other_data) {
    if (!IsNull(ii)) { *this_data += *other_data; }
@shoyer commented Oct 27, 2016:
This check may or may not be a good idea from a performance perspective. Are you sure the conditional check is cheaper than just doing the arithmetic always? Even when there are only a few null values?

wesm (Owner):
I would say to do the arithmetic always and use byte/word-wise & with the respective bitmaps (if they are byte-aligned, otherwise do it bitwise) to propagate nulls separately
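
A sketch of what that could look like under the byte-aligned assumption (a hypothetical free function, not code from this PR):

#include <cstdint>

void AddAndPropagateNulls(const int64_t* left, const uint8_t* left_valid,
                          const int64_t* right, const uint8_t* right_valid,
                          int64_t length, int64_t* out, uint8_t* out_valid) {
  // Branch-free arithmetic: add every slot, null or not.
  for (int64_t i = 0; i < length; ++i) {
    out[i] = left[i] + right[i];
  }
  // Byte-wise AND of the validity bitmaps (1 = valid) propagates nulls
  // separately; a bit-offset version would have to shift as it goes.
  const int64_t nbytes = (length + 7) / 8;
  for (int64_t i = 0; i < nbytes; ++i) {
    out_valid[i] = left_valid[i] & right_valid[i];
  }
}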

joshuastorck (Author):
That would have been my preference. It just wasn't clear from the existing code if floating point arrays intentionally didn't have bitmaps.

// Copy the list of nulls (aka valid bits) to a buffer
Status CopyNulls(int64_t length, std::shared_ptr<Buffer>* out) const {
  // Pre-condition is that we already have valid_bits_ set
  return valid_bits_->Copy(this->offset_, BitUtil::BytesForBits(length), out);
wesm (Owner):
Offset needs to be used as a bit offset. Can use CopyBitmap here (which hasn't been properly implemented yet, but the signature is okay)
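
To illustrate, a bit-offset-aware copy has to address individual bits rather than bytes; a naive sketch (not Arrow's actual CopyBitmap implementation):

#include <cstdint>

inline bool GetBit(const uint8_t* bits, int64_t i) {
  return (bits[i / 8] >> (i % 8)) & 1;
}

inline void SetBitTo(uint8_t* bits, int64_t i, bool value) {
  if (value) {
    bits[i / 8] |= static_cast<uint8_t>(1 << (i % 8));
  } else {
    bits[i / 8] &= static_cast<uint8_t>(~(1 << (i % 8)));
  }
}

// Copy `length` bits starting at bit `offset` of `src` into the front of
// `dst`, bit by bit for clarity; a real version works a word at a time.
void CopyBitmap(const uint8_t* src, int64_t offset, int64_t length, uint8_t* dst) {
  for (int64_t i = 0; i < length; ++i) {
    SetBitTo(dst, i, GetBit(src, offset + i));
  }
}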

    plus, plus_inplace);
OperatorTest<IntegerArray, UInt32Type, IntegerArray, UInt32Type>().TestInplaceOperator(
    plus, plus_inplace);
OperatorTest<IntegerArray, UInt64Type, IntegerArray, UInt64Type>().TestInplaceOperator(
wesm (Owner):
Guess we may want some macros (or other templates) to be able to say "Do this thing for the cartesian product of certain types"
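
One macro-free shape for that, sketched with hypothetical names: a TypeList plus recursive helpers that instantiate a two-parameter test template for every (left, right) pair.

template <typename... TYPES>
struct TypeList {};

// Run F<L, R>::Run() for a fixed L against every R in a TypeList.
template <typename L, typename LIST, template <typename, typename> class F>
struct ForEachRight;

template <typename L, template <typename, typename> class F>
struct ForEachRight<L, TypeList<>, F> {
  static void Run() {}
};

template <typename L, typename R, typename... RS,
          template <typename, typename> class F>
struct ForEachRight<L, TypeList<R, RS...>, F> {
  static void Run() {
    F<L, R>::Run();
    ForEachRight<L, TypeList<RS...>, F>::Run();
  }
};

// Run F for every (L, R) in the cartesian product LEFTS x RIGHTS.
template <typename LEFTS, typename RIGHTS, template <typename, typename> class F>
struct ForEachPair;

template <typename RIGHTS, template <typename, typename> class F>
struct ForEachPair<TypeList<>, RIGHTS, F> {
  static void Run() {}
};

template <typename L, typename... LS, typename RIGHTS,
          template <typename, typename> class F>
struct ForEachPair<TypeList<L, LS...>, RIGHTS, F> {
  static void Run() {
    ForEachRight<L, RIGHTS, F>::Run();
    ForEachPair<TypeList<LS...>, RIGHTS, F>::Run();
  }
};

A test would then wrap the OperatorTest call in a two-parameter class template (say, PlusTest) and invoke ForEachPair<IntTypes, IntTypes, PlusTest>::Run() once instead of spelling out each permutation.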

wesm (Owner):
We could also use type traits to associate a class template (e.g. IntegerArray<Int32Type>) with just the type Int32Type
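
Roughly, assuming the PR's logical types and array templates exist:

// Assumed from the PR's codebase:
struct Int32Type;
struct FloatType;
template <typename T> class IntegerArray;
template <typename T> class FloatingArray;

// Trait associating a logical type with its array class template.
template <typename T>
struct ArrayTypeTraits;  // primary template intentionally undefined

template <>
struct ArrayTypeTraits<Int32Type> {
  using ArrayType = IntegerArray<Int32Type>;
};

template <>
struct ArrayTypeTraits<FloatType> {
  using ArrayType = FloatingArray<FloatType>;
};

A test could then be parameterized on Int32Type alone and recover the array class as typename ArrayTypeTraits<Int32Type>::ArrayType.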

}

template <typename TYPE>
Status NumericArray<TYPE>::EnsureMutableAndCheckChange(bool& changed) {
wesm (Owner):
Google C++ guide discourages mutable references, preferring bool* changed
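
For reference, the style-guide-preferred shape would be (a hypothetical revision, not code from the PR):

// Out-parameter by pointer, so mutation is visible at the call site.
Status EnsureMutableAndCheckChange(bool* changed);

// Call site:
//   bool changed = false;
//   RETURN_NOT_OK(array.EnsureMutableAndCheckChange(&changed));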

IntegerArray<TYPE>& operator+=(const IntegerArray<OTHER_TYPE>& other) {
  // TODO: check Status and throw exception
  EnsureMutable();
  auto length = std::min(this->length_, other.length());
wesm (Owner):
Thinking out loud here. I have been thinking about a slightly different approach to implementing operators in generality, by unpacking the containers into simpler data structures, like:

// SFINAE
template <typename T, typename ENABLED = void>
struct TypeData {};

template <typename T>
struct TypeData<T, typename std::enable_if<std::is_integral<T>::value>::type> {
  T* data;
  uint8_t* valid_bits;
  int64_t length;
  int64_t offset;
};

and then have arithmetic methods

template <typename LEFT_TYPE, typename RIGHT_TYPE, typename OUT_TYPE>
void add(TypeData<LEFT_TYPE> left, TypeData<RIGHT_TYPE> right, TypeData<OUT_TYPE> out) {
  ...
  for (int64_t i = 0; i < left.length; ++i) {
    out.data[i] = left.data[i] + right.data[i];
  } 
  ...
}

In the case of inplace-add, in the operator dispatcher, you would have:

TypeData<LEFT_TYPE> left(...);
TypeData<RIGHT_TYPE> right(...);
add(left, right, left);

You would have to deal with whether to allocate the validity/null bitmap prior to invoking this function, but I think it's probably better to handle the mutability checks / memory allocation / etc. separate from the analytics. Then you can box the output as you wish.

Of course, you need to have a forward declaration someplace

template <typename LEFT_TYPE, typename RIGHT_TYPE, typename OUT_TYPE>
void add(TypeData<LEFT_TYPE> left, TypeData<RIGHT_TYPE> right, TypeData<OUT_TYPE> out);

and then instantiate the matrix of types you're choosing to support. Some analytical kernels might need to allocate memory based on what's in the data, and so we could do something like

void kernel_name(OperatorContext* ctx, TypeData<ARG1_TYPE> arg1, ...);

I'd also like to maximize code sharing amongst related operator implementations.

I'm going to take a crack at implementing part of Take under this regime (see https://github.com/pandas-dev/pandas/blob/master/pandas/src/algos_take_helper.pxi). Will keep thinking about this.

Also, per #46 we are contemplating going to bitmaps for all null handling, which would make the code for integers and floats identical (outside some type promotion details, maybe)

joshuastorck (Author):
If you assume that all numeric types will have a null bitmap, then I think NumericArray in the current code would probably hold something equivalent to what you are suggesting with TypeData.

I'm not following your SFINAE code, but I think you are driving at a design that looks a bit more like this:

template <typename KERNEL,
          typename LEFT_TYPE, 
          typename RIGHT_TYPE, 
          typename OUT_TYPE,  
          typename... KERNEL_ARGS>
void RunKernel(TypeData<LEFT_TYPE> const& left, 
               TypeData<RIGHT_TYPE> const& right,
               TypeData<OUT_TYPE>& out,                    
               KERNEL_ARGS&&... kernel_args)
{
    KERNEL kernel(std::forward<KERNEL_ARGS>(kernel_args)...);
    kernel(left, right, out);
}

class GenericKernel
{

public:

    template <typename LEFT_TYPE, typename RIGHT_TYPE, typename OUT_TYPE, typename FUNCTOR>
    void Run(const TypeData<LEFT_TYPE>& left, 
             FUNCTOR& functor,
             const TypeData<RIGHT_TYPE>& right,
             TypeData<OUT_TYPE>& out)
    {
        // Do a bitmap OR for nulls first and then do ffs
        // and branch 
        for (auto ii = 0; ii < left.length; ++ii)
        {
            functor(left.data[ii], right.data[ii], out.data[ii]);
        }
    }    

};

// This would get initialized once at startup and would 
// have information about the instruction set available
// on the machine (AVX vs. AVX2)
class SIMDConfiguration
{
};

template <typename LEFT_TYPE, typename RIGHT_TYPE>
class SIMDSinglePrecisionPossible;

template <typename LEFT_TYPE, typename RIGHT_TYPE>
class SIMDSinglePrecisionPossible : public std::false_type
{
};

template <>
class SIMDSinglePrecisionPossible<float, float> : public std::true_type
{
};

template <typename LEFT_TYPE, typename RIGHT_TYPE>
class SIMDDoublePrecisionPossible;

template <typename LEFT_TYPE, typename RIGHT_TYPE>
class SIMDDoublePrecisionPossible : public std::false_type
{
};

template <>
class SIMDDoublePrecisionPossible<double, double> : public std::true_type
{
};

template <typename LEFT_TYPE, typename RIGHT_TYPE>
class SIMDImpossible 
{
public:
    static constexpr auto value = !(SIMDDoublePrecisionPossible<LEFT_TYPE, RIGHT_TYPE>::value ||
                                    SIMDSinglePrecisionPossible<LEFT_TYPE, RIGHT_TYPE>::value);
};


class AddKernel : public GenericKernel
{

public:

    AddKernel(SIMDConfiguration& config) : config_(config)
    {
    }

    template <typename LEFT_TYPE,
              typename RIGHT_TYPE,
              typename OUT_TYPE>
    void operator()(const TypeData<LEFT_TYPE>& left, 
                    const TypeData<RIGHT_TYPE>& right,
                    TypeData<OUT_TYPE>& out)
    {
        RunOptimized<LEFT_TYPE, RIGHT_TYPE>(left, right, out);
    }

    template <typename LEFT_TYPE, typename RIGHT_TYPE, typename OUT_TYPE>
    typename std::enable_if<SIMDSinglePrecisionPossible<LEFT_TYPE, RIGHT_TYPE>::value>::type
    RunOptimized(const TypeData<LEFT_TYPE>& left, 
                 const TypeData<RIGHT_TYPE>& right,
                 TypeData<OUT_TYPE>& out)
    {
        // Use _mm_add_ps or _mm256_add_ps depending on
        // config_, fallback to Run
    }

    template <typename LEFT_TYPE, typename RIGHT_TYPE, typename OUT_TYPE>
    typename std::enable_if<SIMDDoublePrecisionPossible<LEFT_TYPE, RIGHT_TYPE>::value>::type
    RunOptimized(const TypeData<LEFT_TYPE>& left, 
                 const TypeData<RIGHT_TYPE>& right,
                 TypeData<OUT_TYPE>& out)
    {
        // Use _mm_add_pd or _mm256_add_pd depending on 
        // config_, fallback to Run
    }

    template <typename LEFT_TYPE, typename RIGHT_TYPE, typename OUT_TYPE>
    typename std::enable_if<SIMDImpossible<LEFT_TYPE, RIGHT_TYPE>::value>::type
    RunOptimized(const TypeData<LEFT_TYPE>& left, 
                 const TypeData<RIGHT_TYPE>& right,
                 TypeData<OUT_TYPE>& out)
    {
        static auto add = [](auto const& left, auto const& right, auto& out) {
            out = left + right;
        };
        Run(left, add, right, out);
    }


private:

    SIMDConfiguration& config_;

};

int main(int argc, char* argv[])
{
    // Just to see if it compiles
    TypeData<float> a;
    TypeData<int> b;
    TypeData<int> c;
    SIMDConfiguration config;
    RunKernel<AddKernel>(a, b, c, config);
}

I would expect FloatingArray and IntegerArray to control what operations are allowed using SFINAE, in the same fashion that I've currently implemented it. The main difference would be that those functions would just invoke RunKernel and would set up the correct return type based on the desired rules for type promotion.
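
For instance, the promotion rules might be encoded in a trait that the SFINAE-gated operators consult; a sketch with hypothetical names, using std::common_type as a stand-in for whatever rules pandas settles on:

#include <type_traits>

// Hypothetical trait: the output C type for a binary op on LEFT and RIGHT.
template <typename LEFT, typename RIGHT>
struct PromotedType {
  using type = typename std::common_type<LEFT, RIGHT>::type;
};

// SFINAE gate: this alias is only well-formed for arithmetic operands, so
// operators constrained on it drop out of overload resolution otherwise.
template <typename LEFT, typename RIGHT,
          typename = typename std::enable_if<std::is_arithmetic<LEFT>::value &&
                                             std::is_arithmetic<RIGHT>::value>::type>
using AddResult = typename PromotedType<LEFT, RIGHT>::type;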

static constexpr auto needs_double =
    (max_value > std::numeric_limits<typename FloatType::c_type>::max());

using type = typename std::conditional<needs_double, DoubleType, FloatType>::type;
wesm (Owner):
In pure Python and NumPy at least, integer division always yields double

Reviewer:
We could support integer division with //, but that's certainly a lower priority.

joshuastorck (Author):

Just to be clear, this is just choosing the right size of floating point type based on the inputs.

@@ -74,7 +74,7 @@ endif()
 # GCC cannot always verify whether strict aliasing rules are indeed followed due to
 # fundamental limitations in escape analysis, which can result in subtle bad code generation.
 # This has a small perf hit but worth it to avoid hard to debug crashes.
-set(CXX_COMMON_FLAGS "-std=c++11 -fno-strict-aliasing -msse4.2 -Wall -Wno-sign-compare -Wno-deprecated -pthread -D__STDC_FORMAT_MACROS")
+set(CXX_COMMON_FLAGS "-std=c++1y -fno-strict-aliasing -msse4.2 -Wall -Wno-sign-compare -Wno-deprecated -pthread -D__STDC_FORMAT_MACROS")
Reviewer:

This will increase the minimum required GCC version to 5?

@wesm (Owner) commented Oct 28, 2016:

I think c++1y is required by clang, we can do c++14 on gcc 4.9 I believe

Reviewer:

The thing that comes to my mind is that the typical toolchains for packages that work on multiple Linux distros (i.e. manylinux1 wheels and the redhat toolchain in conda) all depend on code compiling with GCC 4.8.

wesm (Owner):

Redhat devtoolset-3 has gcc 4.9.x, so I imagine by the time pandas2 is ready for primetime this upgrade will have happened. There will be other packages that need C++14, might as well push the issue

Reviewer:

OK, let's hope for a manylinux2 standard then and enjoy the new C++14 features ;). Looking forward to std::make_unique.

This includes a design change that obviates the need for an ArrayView.
Instead, every array has an internal offset. Shallow copy is achieved by the
copy constructor, though the current set of copy constructors doesn't yet
support a slice. Deep copy is still achieved through the Copy virtual
function.

More detailed explanation of the changes:

* Adding copy/move constructors for {Floating,Integer,Numeric}Array
* Adding various methods for marking/getting nulls (valid bits) in
  integer arrays
* Changing data() and mutable_data() in NumericArray so that they
  return a pointer that starts at the array's offset
* Addition of Addable/Divisable classes (similar to Boost operators)
  for easy support of operator+ and operator/
* Implementing IntegerArray::operator/, IntegerArray::operator+=,
  FloatingArray::operator/=, FloatingArray::operator+=
* Adding the metaprogramming class TypeList, which is used as
  scaffolding in array-test to simplify declaring every permutation
  of arithmetic operation for all types.
@wesm (Owner) commented Dec 10, 2016:

There's a bunch of bits here that can be extracted and merged as useful utilities. I want to wait on pulling in operator implementations until the operator interface has been thought through some more: https://docs.google.com/document/d/1YmsV48iO6YNSxCIC84Xig5z-i9g4g7rzYTmCgUAxKRk/edit#heading=h.pzfqkdqgusm7

I'd like to take a crack at implementing some operators in a way that can be evaluated in either single- or multi-threaded mode, without excessive code duplication -- so the code that runs things in parallel won't be too knowledgeable about the details of the operator except how to invoke their virtual functions.
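
A minimal sketch of that separation, assuming a hypothetical Operator interface: the driver only splits the row range and calls a virtual method, knowing nothing else about the operator.

#include <algorithm>
#include <cstdint>
#include <thread>
#include <vector>

class Operator {
 public:
  virtual ~Operator() = default;
  // Evaluate rows [begin, end); implementations must be safe to call
  // concurrently on disjoint ranges.
  virtual void EvalRange(int64_t begin, int64_t end) = 0;
};

void Evaluate(Operator* op, int64_t length, int num_threads) {
  if (num_threads <= 1) {
    op->EvalRange(0, length);  // single-threaded mode: one direct call
    return;
  }
  const int64_t chunk = (length + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    const int64_t begin = t * chunk;
    const int64_t end = std::min<int64_t>(length, begin + chunk);
    if (begin >= end) break;
    workers.emplace_back([op, begin, end] { op->EvalRange(begin, end); });
  }
  for (auto& w : workers) w.join();
}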

@wesm (Owner) commented Dec 13, 2016:

I cherry-picked some of this as a7f2a82

I'm going to do a little prototyping on operator evaluation generally, so will return to this hopefully soon
