ARROW-17263 [C++]: Utility functions for working with RLE #13842
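For orientation, here is a sketch of the layout that the new `RunLengthEncodedArray` wraps. Child 0 holds the run ends and child 1 holds one value per run. Run i covers the logical indices from the previous run end (or 0 for the first run) up to, but excluding, `run_ends[i]`. The helper `DecodeRuns` is hypothetical and not part of Arrow. It works on plain `std::vector`s instead of Arrow arrays and assumes an unsliced array (logical offset 0).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of Arrow): expand an unsliced run-length
// encoded pair of children into one value per logical slot. Run i covers the
// logical indices [run_ends[i - 1], run_ends[i]); the first run starts at 0.
template <typename RunEnd, typename T>
std::vector<T> DecodeRuns(const std::vector<RunEnd>& run_ends,
                          const std::vector<T>& values) {
  // Both children describe the same runs, so they have the same length.
  assert(run_ends.size() == values.size());
  std::vector<T> out;
  int64_t run_start = 0;
  for (std::size_t i = 0; i < run_ends.size(); ++i) {
    // Repeat the run's value once for every logical slot it covers.
    for (int64_t j = run_start; j < run_ends[i]; ++j) {
      out.push_back(values[i]);
    }
    run_start = run_ends[i];
  }
  return out;
}
```

For example, run ends `[2, 3, 6]` with values `["a", "b", "c"]` expand to `a, a, b, c, c, c`.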

Closed: wants to merge 130 commits.
Commits
53323d4
add type-only parts from rle branch
zagto Jul 29, 2022
4eb0b46
handle rle type in ToString for type ids
zagto Jul 20, 2022
b4f94c0
handle RLE in ARROW_GENERATE_FOR_ALL_TYPES
zagto Jul 20, 2022
12df4bc
implement NotImplemented status for rle in MakeFormatterImpl
zagto Jul 20, 2022
bf68567
add RunLengthEncodedArray class
zagto Jul 27, 2022
b90a5fe
type_fwd: add RunLengthEncodedType
zagto Jul 27, 2022
21259d8
include new array_encoded header in array.h
zagto Jul 27, 2022
2689543
introduce type traits for rle/encoding types
zagto Jul 28, 2022
83754f5
add methods for rle in type visitor abstract classes
zagto Jul 28, 2022
735fa04
split ARROW_GENERATE_FOR_ALL_TYPES
zagto Jul 29, 2022
672b2a5
actually add scalar type visitor
zagto Jul 28, 2022
7091760
add comments
zagto Jul 29, 2022
687b344
Merge branch 'scalar-visitor' into rle-type
zagto Jul 29, 2022
e4d480a
stub rle type in various visitors
zagto Jul 28, 2022
05f7404
more stubs
zagto Jul 28, 2022
f62046b
more stubs
zagto Jul 28, 2022
cf548bc
formatting
zagto Jul 29, 2022
8e21a09
update/stub more visitors
zagto Jul 29, 2022
0a33530
one more visitor
zagto Jul 29, 2022
a59335c
gtest_util: add rle to type list
zagto Jul 29, 2022
7741fe9
remove some unused methods for now
zagto Jul 29, 2022
77b5650
add rle_util
zagto Jul 29, 2022
3fcdddd
update the rle_util functions
zagto Jul 30, 2022
3bfef48
type: RLE only has one buffer
zagto Aug 1, 2022
0f391a4
naming
zagto Aug 1, 2022
8380e79
wip: testing for VisitMergedRuns
zagto Aug 1, 2022
6878b5b
array_encoded: add methods for working with RunLengthEncodedArray
zagto Aug 2, 2022
fe8c4a4
add rle array tests
zagto Aug 2, 2022
adfcac5
make HasValidityBitmap return false for rle
zagto Aug 2, 2022
caf19be
make null count of rle arrays always zero
zagto Aug 2, 2022
0ef0128
handle rle in GetNumBuffers
zagto Aug 2, 2022
d503b98
fix setting offset in RunLengthEncodedArray::Make
zagto Aug 2, 2022
47e186b
fix testing status strings
zagto Aug 2, 2022
a0a053c
Merge branch 'rle-type' into rle-util
zagto Aug 2, 2022
1624e1e
mark constructors as explicit
zagto Aug 2, 2022
b65f2a6
doxygen: add group for encoded arrays
zagto Aug 2, 2022
adc4e31
fix comment
zagto Aug 2, 2022
c94e1dd
wip: test for VisitMergedRuns
zagto Aug 3, 2022
25e9e2d
make shared_ptr reference parameters const
zagto Aug 3, 2022
991541d
Merge branch 'rle-type' into rle-util
zagto Aug 3, 2022
8260b92
actually fix const reference parameters
zagto Aug 3, 2022
4324f72
type_traits: remove misleading bytes_required method for rle
zagto Aug 3, 2022
b318d43
type_internal: stub visitor for rle
zagto Aug 3, 2022
557a058
type: move run ends into child and set children array in constructor
zagto Aug 3, 2022
e0bf74b
update RunLengthEncodedArray class
zagto Aug 3, 2022
2565e2f
pandas: correctly detect rle type as not supported
zagto Aug 3, 2022
0133f2e
Merge branch 'rle-type' into rle-util
zagto Aug 3, 2022
d8623d7
fix copy-paste error in VisitMergedRuns test
zagto Aug 3, 2022
23ca398
remove invalid testcase in FindPhysicalOffset test
zagto Aug 3, 2022
13b49ac
order RunLengthEncodedArray arguments like the child arrays in format
zagto Aug 3, 2022
d81cc30
hopefully fix compiling parquet
zagto Aug 3, 2022
587860f
fix test
zagto Aug 3, 2022
634eafe
fix RunLengthEncodedArray constructor calls
zagto Aug 3, 2022
2ef0f2f
Merge branch 'rle-type' into rle-util
zagto Aug 3, 2022
60527e0
rle_util_test: formatting
zagto Aug 3, 2022
aaab59c
update rle utilities for new format
zagto Aug 3, 2022
117da6f
rle_util: formatting
zagto Aug 3, 2022
30d7d67
give rle type one buffer since I found examples of code assuming one
zagto Aug 4, 2022
81520bc
formatting
zagto Aug 4, 2022
05e98c6
fix comment
zagto Aug 4, 2022
9121faf
Merge branch 'rle-type' into rle-util
zagto Aug 4, 2022
c78fbf3
rle_util: fixes and make FindPhysicalOffset take element count instre…
zagto Aug 4, 2022
f032fbc
rle_util tests fixed
zagto Aug 4, 2022
9775fb8
stub rle in another visitor in parquet
zagto Aug 4, 2022
438b268
Merge branch 'rle-type' into rle-util
zagto Aug 4, 2022
ffdfb1c
rle_util: add comments
zagto Aug 4, 2022
8bb5828
fix comments
zagto Aug 4, 2022
d2731a0
builder_base: use VisitScalarTypeInline
zagto Aug 4, 2022
5de04c2
Merge branch 'scalar-visitor' into rle-type
zagto Aug 4, 2022
ff3ab7c
remove no longer used visitor stub
zagto Aug 4, 2022
1b646b6
formatting
zagto Aug 4, 2022
0381dd5
Merge branch 'rle-type' into rle-util
zagto Aug 4, 2022
af3e625
Revert "remove no longer used visitor stub"
zagto Aug 4, 2022
be3ef70
Merge branch 'rle-type' into rle-util
zagto Aug 4, 2022
71e7bde
add more user friendly methods to get physical offset and length
zagto Aug 4, 2022
ec1e004
add test
zagto Aug 4, 2022
63b5ce2
fix GetPhysicalLength method
zagto Aug 4, 2022
1d31bdd
better test for physical offset/length
zagto Aug 4, 2022
6f58c4b
VisitMergedRuns test: test inverted case
zagto Aug 5, 2022
2c211c4
VisitMergedRuns: fix handling both arrays ending inside a run
zagto Aug 5, 2022
1b6fc84
implement rle_util visitor variant for a single array
zagto Aug 10, 2022
66f2d81
add merged rle iterator to replace visitor
zagto Aug 10, 2022
dd3eeb3
MergedRunsIterator: use pointer instead of reference_wrapper
zagto Aug 15, 2022
7f37695
fix single input MergedRunsIterator constructor
zagto Aug 15, 2022
fea250e
fix . -> mixup
zagto Aug 15, 2022
b45bd6a
fix rle iterator
zagto Aug 15, 2022
f4b3f8a
fix AddArtificialOffsetInChildArray
zagto Aug 15, 2022
c2dcab4
also test rle iterator on single input array
zagto Aug 15, 2022
fa74062
naming: DataArray -> ValuesArray
zagto Aug 15, 2022
b894269
remove old VisitMergedRuns/VisitRuns functions that are replaced by i…
zagto Aug 15, 2022
d3e5cac
mark rle_util accessors as inline
zagto Aug 15, 2022
f8ee87a
rename rle iterator test
zagto Aug 15, 2022
fd26ec0
Merge branch 'master' into scalar-visitor
zagto Aug 15, 2022
98cf6e4
Merge branch 'scalar-visitor' into rle-type
zagto Aug 15, 2022
d350081
fix rle stub in parquet path_internal
zagto Aug 15, 2022
281a470
Merge branch 'rle-type' into rle-util
zagto Aug 15, 2022
99a264a
fix GetPhysicalLength function
zagto Aug 18, 2022
bbd81f6
byte-swapping RLE arrays should now just work
zagto Aug 24, 2022
5c60fbd
Merge branch 'rle-type' into rle-util
zagto Aug 24, 2022
59f5fa2
fix handling of 0 length arrays at the end of run ends array
zagto Sep 1, 2022
7c46a05
Merge branch 'master' into scalar-visitor
zagto Sep 1, 2022
5933a21
Merge branch 'scalar-visitor' into rle-type
zagto Sep 1, 2022
07c6446
Merge branch 'rle-type' into rle-util
zagto Sep 1, 2022
a35b651
fix diff error message
zagto Sep 6, 2022
f55717d
fix zero-length arrays in rle iterator
zagto Sep 7, 2022
5bdc984
rle_util_test: avoid using array span objects beyond their lifetime
zagto Sep 7, 2022
286fe6c
rle iterator: add more accessor variants
zagto Sep 7, 2022
f6d78d8
formatting
zagto Sep 7, 2022
25bc98b
mark rle diff as not supported correctly
zagto Sep 8, 2022
2eeb0fd
fix rle type construction add test
zagto Sep 13, 2022
3b5ace1
test rle type string
zagto Sep 13, 2022
6733c8f
rle_util_test: fix too big Slice() on NullArray
zagto Sep 13, 2022
0d80e3a
add diagram for MergedRunsIterator test
zagto Sep 13, 2022
8731559
fix typo
zagto Sep 13, 2022
2272661
rle type/array: support non-int32 run ends arrays
zagto Sep 26, 2022
acf5404
Merge branch 'rle-type' into rle-util
zagto Sep 26, 2022
d3716fa
Merge branch 'master' into scalar-visitor
zagto Sep 26, 2022
4a4a6ac
Merge branch 'scalar-visitor' into rle-type
zagto Sep 26, 2022
54db9a8
Merge branch 'rle-type' into rle-util
zagto Sep 26, 2022
bda49e7
type_fwd: C++17 compatibility
zagto Sep 26, 2022
580e548
Merge branch 'rle-type' into rle-util
zagto Sep 26, 2022
90a0814
rle_util: support different types for run ends array
zagto Oct 7, 2022
8c23788
fix run-ends type detection in GetPhysicalOffset and GetPhysicalLength
zagto Oct 7, 2022
8534f88
test multiple run ends types in rle offset/length test
zagto Oct 7, 2022
306bb09
Merge branch 'master' into scalar-visitor
zagto Nov 29, 2022
3fc809f
Merge branch 'scalar-visitor' into rle-type
zagto Nov 29, 2022
b99c68b
Merge branch 'master' into scalar-visitor
zagto Nov 29, 2022
408f0bb
Merge branch 'scalar-visitor' into rle-type
zagto Nov 29, 2022
64d8790
fix linting error: redundant "virtual"
zagto Dec 1, 2022
150f8e0
Merge branch 'rle-type' into rle-util
zagto Dec 1, 2022
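Several commits (66f2d81, dd3eeb3, 7f37695) replace the `VisitMergedRuns` visitor with a `MergedRunsIterator`, whose interface is not part of the diff shown here. The underlying idea, walking two run-length encoded arrays in lockstep, can be sketched as follows. `MergedRun` and `MergeRuns` are hypothetical names. The sketch assumes both inputs are unsliced and cover the same logical length, whereas the real iterator also has to handle offsets (cf. 2c211c4, both arrays ending inside a run).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One segment of the logical range over which neither input changes value.
struct MergedRun {
  int64_t end;          // exclusive logical end of the segment
  std::size_t index_a;  // run (physical) index into the first array's values
  std::size_t index_b;  // run (physical) index into the second array's values
};

// Hypothetical helper: walk the run ends of two unsliced RLE arrays of equal
// logical length and emit a segment each time either input starts a new run.
std::vector<MergedRun> MergeRuns(const std::vector<int64_t>& ends_a,
                                 const std::vector<int64_t>& ends_b) {
  // Both inputs must cover the same logical length (possibly zero).
  assert((ends_a.empty() && ends_b.empty()) ||
         (!ends_a.empty() && !ends_b.empty() && ends_a.back() == ends_b.back()));
  std::vector<MergedRun> out;
  std::size_t i = 0;
  std::size_t j = 0;
  while (i < ends_a.size() && j < ends_b.size()) {
    // The current segment stops at whichever run ends first.
    const int64_t end = std::min(ends_a[i], ends_b[j]);
    out.push_back({end, i, j});
    // Advance every input whose current run ends exactly here.
    if (ends_a[i] == end) ++i;
    if (ends_b[j] == end) ++j;
  }
  return out;
}
```

With run ends `[3, 5]` and `[2, 5]`, the merged segments end at 2, 3 and 5 and pair the run indices (0, 0), (0, 1) and (1, 1). Each step advances at least one input, so the walk is linear in the total number of runs rather than in the logical length.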

3 changes: 3 additions & 0 deletions cpp/src/arrow/CMakeLists.txt
@@ -140,6 +140,7 @@ set(ARROW_SRCS
 array/array_binary.cc
 array/array_decimal.cc
 array/array_dict.cc
+array/array_encoded.cc
 array/array_nested.cc
 array/array_primitive.cc
 array/builder_adaptive.cc
@@ -217,6 +218,7 @@ set(ARROW_SRCS
 util/key_value_metadata.cc
 util/memory.cc
 util/mutex.cc
+util/rle_util.cc
 util/string.cc
 util/string_builder.cc
 util/task_group.cc
@@ -715,6 +717,7 @@ add_arrow_test(array_test
 array/array_test.cc
 array/array_binary_test.cc
 array/array_dict_test.cc
+array/array_encoded_test.cc
 array/array_list_test.cc
 array/array_struct_test.cc
 array/array_union_test.cc
5 changes: 5 additions & 0 deletions cpp/src/arrow/array.h
@@ -34,10 +34,15 @@
 /// @{
 /// @}
 
+/// \defgroup encoded-arrays Concrete classes for encoded arrays
+/// @{
+/// @}
+
 #include "arrow/array/array_base.h" // IWYU pragma: keep
 #include "arrow/array/array_binary.h" // IWYU pragma: keep
 #include "arrow/array/array_decimal.h" // IWYU pragma: keep
 #include "arrow/array/array_dict.h" // IWYU pragma: keep
+#include "arrow/array/array_encoded.h" // IWYU pragma: keep
 #include "arrow/array/array_nested.h" // IWYU pragma: keep
 #include "arrow/array/array_primitive.h" // IWYU pragma: keep
 #include "arrow/array/data.h" // IWYU pragma: keep
4 changes: 4 additions & 0 deletions cpp/src/arrow/array/array_base.cc
@@ -143,6 +143,10 @@ struct ScalarFromArraySlotImpl {
     return Status::OK();
   }
 
+  Status Visit(const RunLengthEncodedArray& a) {
+    return Status::NotImplemented("Creating scalar from encoded array");
+  }
+
   Status Visit(const ExtensionArray& a) {
     ARROW_ASSIGN_OR_RAISE(auto storage, a.storage()->GetScalar(index_));
     out_ = std::make_shared<ExtensionScalar>(std::move(storage), a.type());
77 changes: 77 additions & 0 deletions cpp/src/arrow/array/array_encoded.cc
@@ -0,0 +1,77 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

#include "arrow/array/array_encoded.h"
#include "arrow/array/util.h"
#include "arrow/util/logging.h"
#include "arrow/util/rle_util.h"

namespace arrow {

// ----------------------------------------------------------------------
// RunLengthEncodedArray

RunLengthEncodedArray::RunLengthEncodedArray(const std::shared_ptr<ArrayData>& data) {
  ARROW_CHECK_EQ(data->type->id(), Type::RUN_LENGTH_ENCODED);
  SetData(data);
}

RunLengthEncodedArray::RunLengthEncodedArray(const std::shared_ptr<DataType>& type,
                                             int64_t length,
                                             const std::shared_ptr<Array>& run_ends_array,
                                             const std::shared_ptr<Array>& values_array,
                                             int64_t offset) {
  ARROW_CHECK_EQ(type->id(), Type::RUN_LENGTH_ENCODED);
  // One placeholder buffer (some code assumes exactly one) and a null count of 0.
  SetData(ArrayData::Make(type, length, {NULLPTR}, 0, offset));
  // Child 0 holds the run ends, child 1 the run values. Array::data() returns a
  // const reference, so the child ArrayData is shared rather than moved.
  data_->child_data.push_back(run_ends_array->data());
  data_->child_data.push_back(values_array->data());
}

Result<std::shared_ptr<RunLengthEncodedArray>> RunLengthEncodedArray::Make(
    const std::shared_ptr<Array>& run_ends_array,
    const std::shared_ptr<Array>& values_array, int64_t logical_length, int64_t offset) {
  if (!RunLengthEncodedType::RunEndsTypeValid(*run_ends_array->type())) {
    return Status::Invalid("Run ends array must be int16, int32 or int64 type");
  }
  if (run_ends_array->null_count() != 0) {
    return Status::Invalid("Run ends array cannot contain null values");
  }

  return std::make_shared<RunLengthEncodedArray>(
      run_length_encoded(run_ends_array->type(), values_array->type()), logical_length,
      run_ends_array, values_array, offset);
}

std::shared_ptr<Array> RunLengthEncodedArray::values_array() const {
  return MakeArray(data()->child_data[1]);
}

std::shared_ptr<Array> RunLengthEncodedArray::run_ends_array() const {
  return MakeArray(data()->child_data[0]);
}

int64_t RunLengthEncodedArray::GetPhysicalOffset() const {
  const ArraySpan span(*this->data_);
  return rle_util::GetPhysicalOffset(span);
}

int64_t RunLengthEncodedArray::GetPhysicalLength() const {
  const ArraySpan span(*this->data_);
  return rle_util::GetPhysicalLength(span);
}

}  // namespace arrow
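`GetPhysicalOffset()` and `GetPhysicalLength()` delegate to `rle_util`, whose implementation is not in this part of the diff. Assuming run ends are strictly increasing, both values can be found with a binary search over the run ends, which matches the O(log(N)) note in the header. The following is a minimal sketch on a plain `std::vector`. The names `RunIndexAt`, `PhysicalOffset` and `PhysicalLength` are hypothetical. The functions are templated on the run end type because the tests cover int16, int32 and int64.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Index of the first run whose (exclusive) end lies strictly after
// `logical_pos`, i.e. the run containing it; O(log N) in the number of runs.
template <typename RunEnd>
int64_t RunIndexAt(const std::vector<RunEnd>& run_ends, int64_t logical_pos) {
  auto it = std::upper_bound(run_ends.begin(), run_ends.end(), logical_pos);
  return static_cast<int64_t>(it - run_ends.begin());
}

// Physical offset: the first run touched by the slice [offset, offset + length).
template <typename RunEnd>
int64_t PhysicalOffset(const std::vector<RunEnd>& run_ends, int64_t offset) {
  return RunIndexAt(run_ends, offset);
}

// Physical length: the number of runs touched by the slice.
template <typename RunEnd>
int64_t PhysicalLength(const std::vector<RunEnd>& run_ends, int64_t offset,
                       int64_t length) {
  assert(length >= 0);
  if (length == 0) return 0;
  const int64_t first = RunIndexAt(run_ends, offset);
  const int64_t last = RunIndexAt(run_ends, offset + length - 1);
  return last - first + 1;
}
```

With the run ends `[100, 200, 300, 400, 500]` from the `OffsetLength` test, this reproduces the expected values there. For example, a slice at offset 199 of length 5 touches runs 1 and 2 (physical offset 1, physical length 2), and a zero-length slice at offset 500 has physical offset 5 and physical length 0.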
87 changes: 87 additions & 0 deletions cpp/src/arrow/array/array_encoded.h
@@ -0,0 +1,87 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

// Array accessor classes for run-length encoded arrays

#pragma once

#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

#include "arrow/array/array_base.h"
#include "arrow/array/data.h"
#include "arrow/result.h"
#include "arrow/status.h"
#include "arrow/type.h"
#include "arrow/type_fwd.h"
#include "arrow/util/checked_cast.h"
#include "arrow/util/macros.h"
#include "arrow/util/visibility.h"

namespace arrow {

/// \addtogroup encoded-arrays
///
/// @{

// ----------------------------------------------------------------------
// RunLengthEncoded

/// Concrete Array class for run-length encoded data
class ARROW_EXPORT RunLengthEncodedArray : public Array {
 public:
  using TypeClass = RunLengthEncodedType;

  explicit RunLengthEncodedArray(const std::shared_ptr<ArrayData>& data);

  RunLengthEncodedArray(const std::shared_ptr<DataType>& type, int64_t length,
                        const std::shared_ptr<Array>& run_ends_array,
                        const std::shared_ptr<Array>& values_array, int64_t offset = 0);

  /// \brief Construct a RunLengthEncodedArray from run ends and values arrays
  ///
  /// The data type is inferred from the types of the two arrays. The
  /// run_ends_array must be of type int16, int32 or int64 and must not contain
  /// nulls; otherwise Status::Invalid is returned. The run_ends_array and
  /// values_array must have the same length.
  static Result<std::shared_ptr<RunLengthEncodedArray>> Make(
      const std::shared_ptr<Array>& run_ends_array,
      const std::shared_ptr<Array>& values_array, int64_t logical_length,
      int64_t offset = 0);

  /// \brief Returns an array holding the value of each run.
  ///
  /// The whole child array is returned; neither the offset of this array nor the
  /// corresponding physical offset is applied to it. Use GetPhysicalOffset() and
  /// GetPhysicalLength() to find the runs covered by this array.
  std::shared_ptr<Array> values_array() const;

  /// \brief Returns an array holding the (exclusive) logical end index of each run.
  ///
  /// As with values_array(), the whole child array is returned without applying
  /// the physical offset.
  std::shared_ptr<Array> run_ends_array() const;

  /// \brief Get the physical offset of the RLE array, i.e. the index of the first
  /// run it covers. Warning: calling this may result in an O(log(N)) binary search
  /// on the run ends buffer.
  int64_t GetPhysicalOffset() const;

  /// \brief Get the physical length of the RLE array, i.e. the number of runs it
  /// covers. Avoid calling this method in a context where you can easily calculate
  /// the value yourself. Calling this can result in an O(log(N)) binary search on
  /// the run ends buffer.
  int64_t GetPhysicalLength() const;
};

/// @}

}  // namespace arrow
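`Make()` checks only the run ends type and the absence of nulls. The comment on `Make()` also requires both children to have the same length, but that is not checked. Assuming every run must be non-empty, a stricter validation could also reject run ends that are not positive and strictly increasing, or that stop before the logical range `[offset, offset + logical_length)` ends. The sketch below is hypothetical (`ValidateRunEnds` is not an Arrow function) and works on a plain `std::vector`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical extra validation beyond the type/null checks in Make(): returns
// an error message, or std::nullopt if the run ends are usable for an RLE array
// with the given logical length and offset.
template <typename RunEnd>
std::optional<std::string> ValidateRunEnds(const std::vector<RunEnd>& run_ends,
                                           int64_t logical_length, int64_t offset) {
  assert(logical_length >= 0 && offset >= 0);
  int64_t previous = 0;
  for (RunEnd end : run_ends) {
    // Every run must be non-empty, so run ends are positive and strictly increasing.
    if (end <= previous) {
      return "Run ends must be positive and strictly increasing";
    }
    previous = end;
  }
  // The last run must reach the end of the logical range [offset, offset + length).
  if (logical_length > 0 && previous < offset + logical_length) {
    return "Run ends do not cover the logical length";
  }
  return std::nullopt;
}
```

For example, the run ends `[100, 200, 300, 400, 500]` are accepted for a logical length of 500, but rejected for a logical length of 501.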
145 changes: 145 additions & 0 deletions cpp/src/arrow/array/array_encoded_test.cc
@@ -0,0 +1,145 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

#include <gtest/gtest.h>

#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

#include "arrow/array.h"
#include "arrow/array/builder_nested.h"
#include "arrow/chunked_array.h"
#include "arrow/status.h"
#include "arrow/testing/builder.h"
#include "arrow/testing/gtest_util.h"
#include "arrow/type.h"
#include "arrow/util/checked_cast.h"

namespace arrow {

using internal::checked_cast;

// ----------------------------------------------------------------------
// Run-length encoded array tests

namespace {

class TestRunLengthEncodedArray
    : public ::testing::TestWithParam<std::shared_ptr<DataType>> {
 protected:
  // int16, int32 or int64, one per test instantiation
  std::shared_ptr<DataType> run_ends_type;
  std::shared_ptr<Array> string_values;
  std::shared_ptr<Array> int32_values;
  std::shared_ptr<Array> int16_values;
  std::shared_ptr<Array> size_values;
  std::shared_ptr<Array> size_only_null;

  void SetUp() override {
    run_ends_type = GetParam();

    string_values = ArrayFromJSON(utf8(), R"(["Hello", "World", null])");
    int32_values = ArrayFromJSON(int32(), "[10, 20, 30]");
    int16_values = ArrayFromJSON(int16(), "[10, 20, 30]");
    size_values = ArrayFromJSON(run_ends_type, "[10, 20, 30]");
    size_only_null = ArrayFromJSON(run_ends_type, "[null, null, null]");
  }
};

TEST_P(TestRunLengthEncodedArray, MakeArray) {
  ASSERT_OK_AND_ASSIGN(auto rle_array,
                       RunLengthEncodedArray::Make(int32_values, string_values, 3));
  auto array_data = rle_array->data();
  auto new_array = MakeArray(array_data);
  ASSERT_ARRAYS_EQUAL(*new_array, *rle_array);
  // should be the exact same ArrayData object
  ASSERT_EQ(new_array->data(), array_data);
  ASSERT_NE(std::dynamic_pointer_cast<RunLengthEncodedArray>(new_array), NULLPTR);
}

TEST_P(TestRunLengthEncodedArray, FromRunEndsAndValues) {
  std::shared_ptr<RunLengthEncodedArray> rle_array;

  ASSERT_OK_AND_ASSIGN(rle_array,
                       RunLengthEncodedArray::Make(size_values, int32_values, 3));
  ASSERT_EQ(rle_array->length(), 3);
  ASSERT_ARRAYS_EQUAL(*rle_array->values_array(), *int32_values);
  ASSERT_ARRAYS_EQUAL(*rle_array->run_ends_array(), *size_values);
  ASSERT_EQ(rle_array->offset(), 0);
  ASSERT_EQ(rle_array->data()->null_count, 0);
  // one dummy buffer, since code may assume there is exactly one buffer
  ASSERT_EQ(rle_array->data()->buffers.size(), 1);

  // explicitly passing offset
  ASSERT_OK_AND_ASSIGN(rle_array,
                       RunLengthEncodedArray::Make(size_values, string_values, 2, 1));
  ASSERT_EQ(rle_array->length(), 2);
  ASSERT_ARRAYS_EQUAL(*rle_array->values_array(), *string_values);
  ASSERT_ARRAYS_EQUAL(*rle_array->run_ends_array(), *size_values);
  ASSERT_EQ(rle_array->offset(), 1);
  // explicitly access null count variable so it is not calculated automatically
  ASSERT_EQ(rle_array->data()->null_count, 0);

  ASSERT_RAISES_WITH_MESSAGE(Invalid,
                             "Invalid: Run ends array must be int16, int32 or int64 type",
                             RunLengthEncodedArray::Make(string_values, int32_values, 3));
  ASSERT_RAISES_WITH_MESSAGE(
      Invalid, "Invalid: Run ends array cannot contain null values",
      RunLengthEncodedArray::Make(size_only_null, int32_values, 3));
}

TEST_P(TestRunLengthEncodedArray, OffsetLength) {
  auto run_ends = ArrayFromJSON(run_ends_type, "[100, 200, 300, 400, 500]");
  auto values = ArrayFromJSON(utf8(), R"(["Hello", "beautiful", "world", "of", "RLE"])");
  ASSERT_OK_AND_ASSIGN(auto rle_array,
                       RunLengthEncodedArray::Make(run_ends, values, 500));

  ASSERT_EQ(rle_array->GetPhysicalLength(), 5);
  ASSERT_EQ(rle_array->GetPhysicalOffset(), 0);

  auto slice = std::dynamic_pointer_cast<RunLengthEncodedArray>(rle_array->Slice(199, 5));
  ASSERT_EQ(slice->GetPhysicalLength(), 2);
  ASSERT_EQ(slice->GetPhysicalOffset(), 1);

  auto slice2 =
      std::dynamic_pointer_cast<RunLengthEncodedArray>(rle_array->Slice(199, 101));
  ASSERT_EQ(slice2->GetPhysicalLength(), 2);
  ASSERT_EQ(slice2->GetPhysicalOffset(), 1);

  auto slice3 =
      std::dynamic_pointer_cast<RunLengthEncodedArray>(rle_array->Slice(400, 100));
  ASSERT_EQ(slice3->GetPhysicalLength(), 1);
  ASSERT_EQ(slice3->GetPhysicalOffset(), 4);

  auto slice4 =
      std::dynamic_pointer_cast<RunLengthEncodedArray>(rle_array->Slice(0, 150));
  ASSERT_EQ(slice4->GetPhysicalLength(), 2);
  ASSERT_EQ(slice4->GetPhysicalOffset(), 0);

  auto zero_length_at_end =
      std::dynamic_pointer_cast<RunLengthEncodedArray>(rle_array->Slice(500, 0));
  ASSERT_EQ(zero_length_at_end->GetPhysicalLength(), 0);
  ASSERT_EQ(zero_length_at_end->GetPhysicalOffset(), 5);
}

INSTANTIATE_TEST_SUITE_P(EncodedArrayTests, TestRunLengthEncodedArray,
                         ::testing::Values(int16(), int32(), int64()));

}  // anonymous namespace

}  // namespace arrow
2 changes: 1 addition & 1 deletion cpp/src/arrow/array/builder_base.cc
@@ -279,7 +279,7 @@ struct AppendScalarImpl {
     return Status::NotImplemented("AppendScalar for type ", type);
   }
 
-  Status Convert() { return VisitTypeInline(*(*scalars_begin_)->type, this); }
+  Status Convert() { return VisitScalarTypeInline(*(*scalars_begin_)->type, this); }
 
   const std::shared_ptr<Scalar>* scalars_begin_;
   const std::shared_ptr<Scalar>* scalars_end_;
4 changes: 4 additions & 0 deletions cpp/src/arrow/array/concatenate.cc
@@ -436,6 +436,10 @@ class ConcatenateImpl {
     return Status::OK();
   }
 
+  Status Visit(const RunLengthEncodedType& type) {
+    return Status::NotImplemented("concatenation of ", type);
+  }
+
   Status Visit(const ExtensionType& e) {
     // XXX can we just concatenate their storage?
     return Status::NotImplemented("concatenation of ", e);
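This PR leaves concatenation of RLE arrays as `NotImplemented`. One possible approach assumes both inputs are unsliced and that their last run end equals their logical length. Under those assumptions, the second array's runs can be appended with each run end shifted by the first array's logical length. `SimpleRle` and `ConcatenateRle` are hypothetical stand-ins for the Arrow types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical unsliced RLE column: run ends plus one value per run.
template <typename T>
struct SimpleRle {
  std::vector<int64_t> run_ends;
  std::vector<T> values;
  // For an unsliced column, the logical length is the last run end.
  int64_t length() const { return run_ends.empty() ? 0 : run_ends.back(); }
};

// Concatenate two unsliced RLE columns: the second column's run ends are
// shifted by the first column's logical length. Equal runs at the seam are not
// merged, so the result is valid but not necessarily minimal.
template <typename T>
SimpleRle<T> ConcatenateRle(const SimpleRle<T>& a, const SimpleRle<T>& b) {
  assert(a.run_ends.size() == a.values.size());
  assert(b.run_ends.size() == b.values.size());
  SimpleRle<T> out = a;
  const int64_t shift = a.length();
  for (std::size_t i = 0; i < b.run_ends.size(); ++i) {
    out.run_ends.push_back(b.run_ends[i] + shift);
    out.values.push_back(b.values[i]);
  }
  return out;
}
```

A real implementation would also need to respect the offsets of sliced inputs. It could also merge the two runs at the seam when they hold equal values. This sketch does neither.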
1 change: 1 addition & 0 deletions cpp/src/arrow/array/data.cc
@@ -195,6 +195,7 @@ int GetNumBuffers(const DataType& type) {
     case Type::NA:
     case Type::STRUCT:
     case Type::FIXED_SIZE_LIST:
+    case Type::RUN_LENGTH_ENCODED:
       return 1;
     case Type::BINARY:
     case Type::LARGE_BINARY: