Implement string list concatenation #7929

Merged on Apr 26, 2021 (58 commits)
Commits
7d4ac5a
Re-organize function declarations, and add new declarations
ttnghia Apr 7, 2021
da41c76
Add a skeleton implementation for the new `concatenate` APIs
ttnghia Apr 7, 2021
208dd51
Change docs
ttnghia Apr 7, 2021
ff6dc6d
Add conditions for checking the parameter validity
ttnghia Apr 7, 2021
e43fe3f
Rename variable
ttnghia Apr 8, 2021
51dcbdd
Implement a function that computes row size of the output strings column
ttnghia Apr 8, 2021
94fd3ea
Finish a draft for the concatenate API
ttnghia Apr 9, 2021
b4aa581
Cleanup `combine.cu`
ttnghia Apr 9, 2021
465b821
Add one test for StringsListsConcatenateTest
ttnghia Apr 9, 2021
13a352a
Merge remote-tracking branch 'origin/branch-0.20' into concat_ws
ttnghia Apr 9, 2021
a31e5f4
Finish ScalarSeparator test
ttnghia Apr 9, 2021
4a46292
Finish SlicedListsWithScalarSeparator test
ttnghia Apr 9, 2021
b61f384
Rewrite InvalidInput test
ttnghia Apr 9, 2021
0b56cdf
Rewrite EmptyInput, ZeroSizeStringsInput, and AllNullsStringsInput tests
ttnghia Apr 9, 2021
b3260ff
Finish ColumnSeparators test
ttnghia Apr 9, 2021
c3b4ecc
Finish SlicedListsWithColumnSeparators test
ttnghia Apr 9, 2021
5b948a4
Rename variables
ttnghia Apr 9, 2021
d1063a9
Fix InvalidInput test
ttnghia Apr 9, 2021
625407c
Fix ZeroSizeStringsInput test
ttnghia Apr 9, 2021
95dcdc2
Fix AllNullsStringsInput test
ttnghia Apr 9, 2021
5dc9fa4
Implement string lists concatenation with scalar separator
ttnghia Apr 9, 2021
ce6868f
Cleanup string lists concatenation functions
ttnghia Apr 9, 2021
227efde
Fix output string size computation
ttnghia Apr 9, 2021
584ea0f
Fix child accessing for lists of strings column
ttnghia Apr 9, 2021
4b235f5
Fix slice indices for tests
ttnghia Apr 9, 2021
2e4e7b4
Fix tests for sliced input column
ttnghia Apr 9, 2021
7c16e27
Fix ClangFormat style
ttnghia Apr 9, 2021
b67e20e
Add comments
ttnghia Apr 9, 2021
468a981
Rename APIs
ttnghia Apr 10, 2021
4f65010
Cleanup and fix ClangFormat style
ttnghia Apr 10, 2021
ebf2e03
Fix ClangFormat style
ttnghia Apr 12, 2021
5f79043
Merge remote-tracking branch 'origin/branch-0.20' into concat_ws
ttnghia Apr 20, 2021
cd6eb88
Remove redundant headers
ttnghia Apr 20, 2021
3df31cd
Resolve merge conflict with branch 0.20
ttnghia Apr 20, 2021
d0eee9e
Add `make_strings_children_with_null_mask` utility function
ttnghia Apr 20, 2021
a2496db
Simplify code by using the new utility function `make_strings_childre…
ttnghia Apr 20, 2021
6715b1d
Remove null_mask if the column does not have any null element
ttnghia Apr 20, 2021
17a3e9a
Revert "Remove null_mask if the column does not have any null element"
ttnghia Apr 20, 2021
097c788
Fix string concatenation tests
ttnghia Apr 20, 2021
391735f
Fix the return null_mask: if null_count is 0 then return an empty buffer
ttnghia Apr 20, 2021
3bdd92a
Re-organize code
ttnghia Apr 20, 2021
628e541
Complete `concatenate_list_elements`
ttnghia Apr 20, 2021
ced89bc
Reorder cmake file list
ttnghia Apr 20, 2021
aab44e0
Update comments
ttnghia Apr 20, 2021
2282010
Update comment
ttnghia Apr 20, 2021
22cd1d3
Reverse changes to `concatenate.cu` and reverse fixes for `combine_te…
ttnghia Apr 20, 2021
3d03bee
Fix ClangFormat style
ttnghia Apr 20, 2021
19a2819
Extract `strings/combine_tests.cpp` into 3 separate cpp files
ttnghia Apr 21, 2021
3cf2d10
Use an additional array of int8_t type to store the validity of the s…
ttnghia Apr 21, 2021
5fe852a
Avoid calling `for_each_fn` the second time if the output chars colum…
ttnghia Apr 21, 2021
691cceb
Refactor functors to remove duplicate code
ttnghia Apr 21, 2021
d1931a4
Fix ClangFormat style
ttnghia Apr 21, 2021
f883278
Rename variable
ttnghia Apr 21, 2021
b33fd2a
Change the print parameter in unit tests
ttnghia Apr 22, 2021
76ba7d1
Rewrite comments, and remove `thrust::uninitialized_fill` for validit…
ttnghia Apr 22, 2021
647ca8f
Fix copyright header and address review comments
ttnghia Apr 22, 2021
0e52459
Add a parameter `exec_size` to allow executing the functor at a diffe…
ttnghia Apr 23, 2021
cbee766
Minor improvement
ttnghia Apr 23, 2021
Finish ScalarSeparator test
ttnghia committed Apr 9, 2021
commit a31e5f464738a853022cda7cb919775c10272194
170 changes: 157 additions & 13 deletions cpp/tests/strings/combine_tests.cpp
@@ -502,29 +502,173 @@ TEST_F(StringsConcatenateWithColSeparatorTest, MultiColumnNonNullableStrings)

 struct StringsListsConcatenateTest : public cudf::test::BaseFixture {
 };
-using STRING_LISTS = cudf::test::lists_column_wrapper<cudf::string_view>;
-using INT_LISTS = cudf::test::lists_column_wrapper<int32_t>;

+namespace {
+using STR_LISTS = cudf::test::lists_column_wrapper<cudf::string_view>;
+using STR_COL = cudf::test::strings_column_wrapper;
+using INT_LISTS = cudf::test::lists_column_wrapper<int32_t>;
+
+constexpr bool print_all{true};
+
+auto null_at(cudf::size_type idx)
+{
+  return cudf::detail::make_counting_transform_iterator(0, [idx](auto i) { return i != idx; });
+}
+
+auto all_nulls()
+{
+  return cudf::detail::make_counting_transform_iterator(0, [](auto) { return false; });
+}
+
+auto nulls_from_nullptr(std::vector<const char*> const& strs)
+{
+  return thrust::make_transform_iterator(strs.begin(), [](auto ptr) { return ptr != nullptr; });
+}
+
+} // namespace
+
 TEST_F(StringsListsConcatenateTest, InvalidInput)
 {
-  auto const l = INT_LISTS{{1, 2, 3}, {4, 5, 6}}.release();
-  EXPECT_THROW(cudf::strings::concatenate(cudf::lists_column_view(l->view())), cudf::logic_error);
+  // Invalid list type
+  {
+    auto const l = INT_LISTS{{1, 2, 3}, {4, 5, 6}}.release();
+    EXPECT_THROW(cudf::strings::concatenate(cudf::lists_column_view(l->view())), cudf::logic_error);
+  }
+
+  // Invalid separator
+  {
+    auto const l = STR_LISTS{STR_LISTS{""}, STR_LISTS{"", "", ""}, STR_LISTS{"", ""}}.release();
+    auto const lv = cudf::lists_column_view(l->view());
+    EXPECT_THROW(cudf::strings::concatenate(cudf::lists_column_view(l->view()),
+                                            cudf::string_scalar("", false)),
+                 cudf::logic_error);
+  }
 }

-TEST_F(StringsListsConcatenateTest, EmptyInput) {}
+TEST_F(StringsListsConcatenateTest, EmptyInput)
+{
+  auto const l = STR_LISTS{}.release();
+  auto const lv = cudf::lists_column_view(l->view());
+  auto const results = cudf::strings::concatenate(lv);
+  auto const expected = STR_COL{};
+  CUDF_TEST_EXPECT_COLUMNS_EQUAL(*results, expected, print_all);
+}

-TEST_F(StringsListsConcatenateTest, ZeroSizeStringsInput) {}
+TEST_F(StringsListsConcatenateTest, ZeroSizeStringsInput)
+{
+  auto const l = STR_LISTS{STR_LISTS{""}, STR_LISTS{"", "", ""}, STR_LISTS{"", ""}}.release();
+  auto const lv = cudf::lists_column_view(l->view());
+  auto const results = cudf::strings::concatenate(lv);
+  auto const expected = STR_COL{"", "", ""};
+  CUDF_TEST_EXPECT_COLUMNS_EQUAL(*results, expected, print_all);
+}

-TEST_F(StringsListsConcatenateTest, AllNullsStringsInput) {}
+TEST_F(StringsListsConcatenateTest, AllNullsStringsInput)
+{
+  auto const l = STR_LISTS{STR_LISTS{{""}, all_nulls()},
+                           STR_LISTS{{"", "", ""}, all_nulls()},
+                           STR_LISTS{{"", ""}, all_nulls()}}
+                   .release();
+  auto const lv = cudf::lists_column_view(l->view());
+  auto const results = cudf::strings::concatenate(lv);
+  auto const expected = STR_COL{{"", "", ""}, all_nulls()};
+  CUDF_TEST_EXPECT_COLUMNS_EQUAL(*results, expected, print_all);
+}

-TEST_F(StringsListsConcatenateTest, ScalarSeparatorNoReplacements) {}
+TEST_F(StringsListsConcatenateTest, ScalarSeparator)
+{
+  auto const l = STR_LISTS{{STR_LISTS{{"a", "bb" /*NULL*/, "ccc"}, null_at(1)},
+                            STR_LISTS{}, /*NULL*/
+                            STR_LISTS{{"ddd" /*NULL*/, "efgh", "ijk"}, null_at(0)},
+                            STR_LISTS{"zzz", "xxxxx"}},
+                           null_at(1)}
+                   .release();
+  auto const lv = cudf::lists_column_view(l->view());
+
+  // No null replacement
+  {
+    auto const results = cudf::strings::concatenate(lv, cudf::string_scalar("+++"));
+    std::vector<const char*> h_expected{nullptr, nullptr, nullptr, "zzz+++xxxxx"};
+    auto const expected =
+      STR_COL{h_expected.begin(), h_expected.end(), nulls_from_nullptr(h_expected)};
+    CUDF_TEST_EXPECT_COLUMNS_EQUAL(*results, expected, print_all);
+  }

-TEST_F(StringsListsConcatenateTest, ScalarSeparatorWithReplacements) {}
+  // With null replacement
+  {
+    auto const results =
+      cudf::strings::concatenate(lv, cudf::string_scalar("+++"), cudf::string_scalar("___"));
+    std::vector<const char*> h_expected{
+      "a+++___+++ccc", nullptr, "___+++efgh+++ijk", "zzz+++xxxxx"};
+    auto const expected =
+      STR_COL{h_expected.begin(), h_expected.end(), nulls_from_nullptr(h_expected)};
+    CUDF_TEST_EXPECT_COLUMNS_EQUAL(*results, expected, print_all);
+  }
+}
+
+TEST_F(StringsListsConcatenateTest, SlicedListsWithScalarSeparator)
+{
+  auto const l = STR_LISTS{{STR_LISTS{{"a", "bb" /*NULL*/, "ccc"}, null_at(1)},
+                            STR_LISTS{}, /*NULL*/
+                            STR_LISTS{{"ddd" /*NULL*/, "efgh", "ijk"}, null_at(0)},
+                            STR_LISTS{"zzz", "xxxxx"}},
+                           null_at(1)}
+                   .release();
+  auto const lv = cudf::lists_column_view(l->view());
+
+  // No null replacement
+  {
+    auto const results = cudf::strings::concatenate(lv, cudf::string_scalar("+++"));
+    std::vector<const char*> h_expected{nullptr, nullptr, nullptr, "zzz+++xxxxx"};
+    auto const expected =
+      STR_COL{h_expected.begin(), h_expected.end(), nulls_from_nullptr(h_expected)};
+    CUDF_TEST_EXPECT_COLUMNS_EQUAL(*results, expected, print_all);
+  }

-TEST_F(StringsListsConcatenateTest, SlicedListsWithScalarSeparator) {}
+  // With null replacement
+  {
+    auto const results =
+      cudf::strings::concatenate(lv, cudf::string_scalar("+++"), cudf::string_scalar("___"));
+    std::vector<const char*> h_expected{
+      "a+++___+++ccc", nullptr, "___+++efgh+++ijk", "zzz+++xxxxx"};
+    auto const expected =
+      STR_COL{h_expected.begin(), h_expected.end(), nulls_from_nullptr(h_expected)};
+    CUDF_TEST_EXPECT_COLUMNS_EQUAL(*results, expected, print_all);
+  }
+}

-TEST_F(StringsListsConcatenateTest, ColumnSeparatorNoReplacements) {}
+TEST_F(StringsListsConcatenateTest, ColumnSeparatorNoReplacements)
+{
+  auto const l = STR_LISTS{STR_LISTS{{""}, all_nulls()},
+                           STR_LISTS{{"", "", ""}, all_nulls()},
+                           STR_LISTS{{"", ""}, all_nulls()}}
+                   .release();
+  auto const lv = cudf::lists_column_view(l->view());
+  auto const results = cudf::strings::concatenate(lv);
+  auto const expected = STR_COL{{"", "", ""}, all_nulls()};
+  CUDF_TEST_EXPECT_COLUMNS_EQUAL(*results, expected, print_all);
+}

-TEST_F(StringsListsConcatenateTest, ColumnSeparatorWithReplacements) {}
+TEST_F(StringsListsConcatenateTest, ColumnSeparatorWithReplacements)
+{
+  auto const l = STR_LISTS{STR_LISTS{{""}, all_nulls()},
+                           STR_LISTS{{"", "", ""}, all_nulls()},
+                           STR_LISTS{{"", ""}, all_nulls()}}
+                   .release();
+  auto const lv = cudf::lists_column_view(l->view());
+  auto const results = cudf::strings::concatenate(lv);
+  auto const expected = STR_COL{{"", "", ""}, all_nulls()};
+  CUDF_TEST_EXPECT_COLUMNS_EQUAL(*results, expected, print_all);
+}

-TEST_F(StringsListsConcatenateTest, SlicedListsWithColumnSeparator) {}
+TEST_F(StringsListsConcatenateTest, SlicedListsWithColumnSeparator)
+{
+  auto const l = STR_LISTS{STR_LISTS{{""}, all_nulls()},
+                           STR_LISTS{{"", "", ""}, all_nulls()},
+                           STR_LISTS{{"", ""}, all_nulls()}}
+                   .release();
+  auto const lv = cudf::lists_column_view(l->view());
+  auto const results = cudf::strings::concatenate(lv);
+  auto const expected = STR_COL{{"", "", ""}, all_nulls()};
+  CUDF_TEST_EXPECT_COLUMNS_EQUAL(*results, expected, print_all);
+}
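
The null handling that the ScalarSeparator expectations in this diff encode can be read as a small host-side model. The sketch below is not the cuDF implementation and does not use any cuDF types; `opt_string`, `opt_list`, `join_row`, `join_rows`, `example_rows` and `check_expected_rows` are hypothetical names introduced only for illustration. It assumes that a null list yields a null row, that a null element without a replacement string nullifies the whole row, and (not covered by the tests in this commit) that an empty non-null list yields an empty string.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical host-side stand-ins (not cuDF types): a nullable string and a
// nullable list of nullable strings.
using opt_string = std::optional<std::string>;
using opt_list   = std::optional<std::vector<opt_string>>;

// Join one list row. The result is null (std::nullopt) when the list itself is
// null, or when an element is null and no replacement string is given.
// Assumption: an empty, non-null list yields an empty string.
opt_string join_row(opt_list const& row,
                    std::string const& separator,
                    opt_string const& null_replacement = std::nullopt)
{
  if (!row.has_value()) { return std::nullopt; }
  std::string out;
  bool first = true;
  for (auto const& elem : *row) {
    if (!elem.has_value() && !null_replacement.has_value()) { return std::nullopt; }
    if (!first) { out += separator; }
    out += elem.has_value() ? *elem : *null_replacement;
    first = false;
  }
  return out;
}

// Apply join_row to every row of a "lists column".
std::vector<opt_string> join_rows(std::vector<opt_list> const& rows,
                                  std::string const& separator,
                                  opt_string const& null_replacement = std::nullopt)
{
  std::vector<opt_string> out;
  out.reserve(rows.size());
  for (auto const& row : rows) { out.push_back(join_row(row, separator, null_replacement)); }
  return out;
}

// Same logical input as the ScalarSeparator test: row 1 is a null list,
// rows 0 and 2 each contain one null element.
std::vector<opt_list> example_rows()
{
  return {std::vector<opt_string>{"a", std::nullopt, "ccc"},
          std::nullopt,
          std::vector<opt_string>{std::nullopt, "efgh", "ijk"},
          std::vector<opt_string>{"zzz", "xxxxx"}};
}

// Reproduces the two expectation vectors from the ScalarSeparator test.
void check_expected_rows()
{
  auto const rows = example_rows();

  auto const no_replacement = join_rows(rows, "+++");
  assert(!no_replacement[0] && !no_replacement[1] && !no_replacement[2]);
  assert(no_replacement[3] == opt_string{"zzz+++xxxxx"});

  auto const replaced = join_rows(rows, "+++", "___");
  assert(replaced[0] == opt_string{"a+++___+++ccc"});
  assert(!replaced[1]);
  assert(replaced[2] == opt_string{"___+++efgh+++ijk"});
  assert(replaced[3] == opt_string{"zzz+++xxxxx"});
}
```

Calling `check_expected_rows()` from a `main` exercises both cases. A model like this is only a readable reference for the expected strings; the device-side code in the PR computes output sizes and validity separately.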