Describe the bug
There are cases where strings::extract produces a null when it should have matched an empty string and returned it.
Steps/Code to reproduce bug
diff --git a/cpp/tests/strings/extract_tests.cpp b/cpp/tests/strings/extract_tests.cpp
index 5f38c142f..816877ff1 100644
--- a/cpp/tests/strings/extract_tests.cpp
+++ b/cpp/tests/strings/extract_tests.cpp
@@ -28,6 +28,84 @@ struct StringsExtractTests : public cudf::test::BaseFixture {
 };
 
+TEST_F(StringsExtractTests, EmptyExtractTest)
+{
+  std::vector<const char*> h_strings{
+    "First Last", "Joe Schmoe", "John Smith", "Jane Smith", "Beyonce", "Sting", nullptr, ""};
+
+  cudf::test::strings_column_wrapper strings(
+    h_strings.begin(),
+    h_strings.end(),
+    thrust::make_transform_iterator(h_strings.begin(), [](auto str) { return str != nullptr; }));
+  auto strings_view = cudf::strings_column_view(strings);
+
+  std::vector<const char*> h_expecteds{"First",
+                                       "Joe",
+                                       "John",
+                                       "Jane",
+                                       "Beyonce",
+                                       "Sting",
+                                       nullptr,
+                                       "",
+                                       "Last",
+                                       "Schmoe",
+                                       "Smith",
+                                       "Smith",
+                                       "",
+                                       "",
+                                       nullptr,
+                                       ""};
+
+  std::string pattern = "\\A(\\w*) ?(\\w*)\\Z";
+  auto results = cudf::strings::extract(strings_view, pattern);
+
+  cudf::test::strings_column_wrapper expected1(
+    h_expecteds.data(),
+    h_expecteds.data() + h_strings.size(),
+    thrust::make_transform_iterator(h_expecteds.begin(), [](auto str) { return str != nullptr; }));
+  cudf::test::strings_column_wrapper expected2(
+    h_expecteds.data() + h_strings.size(),
+    h_expecteds.data() + h_expecteds.size(),
+    thrust::make_transform_iterator(h_expecteds.data() + h_strings.size(),
+                                    [](auto str) { return str != nullptr; }));
+  std::vector<std::unique_ptr<cudf::column>> columns;
+  columns.push_back(expected1.release());
+  columns.push_back(expected2.release());
+  cudf::experimental::table expected(std::move(columns));
+  cudf::test::expect_tables_equal(*results, expected);
+}
+
+TEST_F(StringsExtractTests, EmptyExtractTes2)
+{
+  std::vector<const char*> h_strings{
+    nullptr, "AAA", "AAA_A", "AAA_AAA_", "A__", ""};
+
+  cudf::test::strings_column_wrapper strings(
+    h_strings.begin(),
+    h_strings.end(),
+    thrust::make_transform_iterator(h_strings.begin(), [](auto str) { return str != nullptr; }));
+  auto strings_view = cudf::strings_column_view(strings);
+
+  std::vector<const char*> h_expecteds{nullptr,
+                                       "AAA",
+                                       "A",
+                                       "",
+                                       "",
+                                       ""};
+
+  std::string pattern = "([^_]*)\\Z";
+  auto results = cudf::strings::extract(strings_view, pattern);
+
+  cudf::test::strings_column_wrapper expected1(
+    h_expecteds.data(),
+    h_expecteds.data() + h_strings.size(),
+    thrust::make_transform_iterator(h_expecteds.begin(), [](auto str) { return str != nullptr; }));
+  std::vector<std::unique_ptr<cudf::column>> columns;
+  columns.push_back(expected1.release());
+  cudf::experimental::table expected(std::move(columns));
+  cudf::test::expect_tables_equal(*results, expected);
+}
+
 TEST_F(StringsExtractTests, ExtractTest)
 {
   std::vector<const char*> h_strings{
Expected behavior
Return an empty string whenever the pattern matches and the captured group is empty, and return null only when the input is null or the pattern does not match.
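For reference, the distinction between "matched an empty string" and "did not match" can be sketched with Python's standard `re` module, using the two patterns from the tests above. This is only an illustration of the expected semantics, not cudf code: `extract_groups` is a hypothetical helper, and it returns `None` for a whole row where the cudf tests expect a null in each output column.

```python
import re


def extract_groups(pattern, values):
    """Hypothetical helper: capture groups per input value.

    Returns None for a null input or when the pattern does not match,
    and a tuple of groups (which may be empty strings) when it does.
    """
    compiled = re.compile(pattern)
    results = []
    for value in values:
        if value is None:
            results.append(None)  # null in, null out
            continue
        match = compiled.search(value)
        # An empty capture is still a match, so it stays "" rather than None.
        results.append(match.groups() if match else None)
    return results


# Same inputs as the second test in the diff above.
print(extract_groups(r"([^_]*)\Z", [None, "AAA", "AAA_A", "AAA_AAA_", "A__", ""]))

# Two-group pattern from the first test: the second group is empty for "Beyonce".
print(extract_groups(r"\A(\w*) ?(\w*)\Z", ["Joe Schmoe", "Beyonce", "", None]))
```

The empty captures come back as `""` here, which is the behavior this issue asks strings::extract to match.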
Looks like Pandas is returning empty strings as well:
>>> import pandas as pd
>>> ps = pd.Series(["AAA","A","","",""])
>>> ps.str.extract("([^_]*)\\Z",expand=True)
     0
0  AAA
1    A
2
3
4
>>> import cudf
>>> ds = cudf.Series(["AAA","A","","",""])
>>> ds.str.extract("([^_]*)\\Z")
      0
0   AAA
1     A
2  None
3  None
4  None
davidwendt