Add regex_program strings splitting java APIs and tests #12713

Merged

Conversation

Contributor

@cindyyuanjiang cindyyuanjiang commented Feb 6, 2023

Description

This PR adds regex_program-based Java APIs and unit tests for the string splitting operations split_re, rsplit_re, split_record_re, and rsplit_record_re.
Part of the work for NVIDIA/spark-rapids#7295.
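
For readers unfamiliar with the column-splitting semantics, here is a minimal plain-Java sketch of the padding behaviour that the Javadoc in this PR describes: each output column has one row per input row, and rows whose split results are exhausted are filled with null. It uses java.util.regex rather than cuDF, so regex details may differ from the GPU implementation. The names SplitColumnsSketch and splitToColumns are hypothetical, and treating a null input row as null in every output column is an assumption based on the truncated Javadoc text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Plain-Java illustration only; this is not the cuDF implementation or its API.
class SplitColumnsSketch {
  // Hypothetical helper: splits every row on the regex and returns the pieces
  // column by column, padding short rows with null.
  static List<List<String>> splitToColumns(List<String> rows, String regex) {
    Pattern pattern = Pattern.compile(regex);
    List<String[]> pieces = new ArrayList<>();
    int width = 0;
    for (String row : rows) {
      // Assumption: a null input row contributes no pieces, so it ends up
      // null in every output column. Limit -1 keeps trailing empty strings.
      String[] parts = (row == null) ? new String[0] : pattern.split(row, -1);
      pieces.add(parts);
      width = Math.max(width, parts.length);
    }
    List<List<String>> columns = new ArrayList<>();
    for (int c = 0; c < width; c++) {
      List<String> column = new ArrayList<>();
      for (String[] parts : pieces) {
        // Rows that have run out of split results get a null entry.
        column.add(c < parts.length ? parts[c] : null);
      }
      columns.add(column);
    }
    return columns;
  }
}
```

Under these assumptions, splitting the rows "a_b_c", "d_e", and null on "_" gives three columns: [a, d, null], [b, e, null], and [c, null, null].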

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@github-actions github-actions bot added the Java (Affects Java cuDF API.) label Feb 6, 2023
@cindyyuanjiang cindyyuanjiang added the feature request (New feature or request) and non-breaking (Non-breaking change) labels Feb 6, 2023
codecov bot commented Feb 7, 2023

Codecov Report

❗ No coverage uploaded for pull request base (branch-23.04@291c751).
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff               @@
##             branch-23.04   #12713   +/-   ##
===============================================
  Coverage                ?   85.81%           
===============================================
  Files                   ?      158           
  Lines                   ?    25153           
  Branches                ?        0           
===============================================
  Hits                    ?    21586           
  Misses                  ?     3567           
  Partials                ?        0           

@cindyyuanjiang cindyyuanjiang marked this pull request as ready for review February 7, 2023 20:06
@cindyyuanjiang cindyyuanjiang requested a review from a team as a code owner February 7, 2023 20:06
/**
* Returns a list of columns by splitting each string using the specified regex program pattern.
* The number of rows in the output columns will be the same as the input column. Null entries
* are added for a row where split results have been exhausted. Null input entries result in

Contributor

Suggested change
- * are added for a row where split results have been exhausted. Null input entries result in
+ * are added for the rows where split results have been exhausted. Null input entries result in

/**
* Returns a list of columns by splitting each string using the specified regex program pattern.
* The number of rows in the output columns will be the same as the input column. Null entries
* are added for a row where split results have been exhausted. Null input entries result in

Contributor

Suggested change
- * are added for a row where split results have been exhausted. Null input entries result in
+ * are added for the rows where split results have been exhausted. Null input entries result in

* corresponding rows of the output columns.
* Returns a list of columns by splitting each string using the specified string literal
* delimiter. The number of rows in the output columns will be the same as the input column.
* Null entries are added for a row where split results have been exhausted. Null input entries

Contributor

Suggested change
- * Null entries are added for a row where split results have been exhausted. Null input entries
+ * Null entries are added for the rows where split results have been exhausted. Null input entries

/**
* Returns a list of columns by splitting each string using the specified regular expression
* pattern. The number of rows in the output columns will be the same as the input column.
* Null entries are added for a row where split results have been exhausted. Null input entries

Contributor

Suggested change
- * Null entries are added for a row where split results have been exhausted. Null input entries
+ * Null entries are added for the rows where split results have been exhausted. Null input entries

Comment on lines 726 to 727
auto const column_view = reinterpret_cast<cudf::column_view const *>(input_handle);
auto const strings_column = cudf::strings_column_view{*column_view};

Contributor

I suggest avoiding the variable name column_view, as it may clash with cudf::column_view.

Suggested change
- auto const column_view = reinterpret_cast<cudf::column_view const *>(input_handle);
- auto const strings_column = cudf::strings_column_view{*column_view};
+ auto const input = reinterpret_cast<cudf::column_view const *>(input_handle);
+ auto const strings_column = cudf::strings_column_view{*input};

@@ -735,22 +754,43 @@ JNIEXPORT jlong JNICALL Java_ai_rapids_cudf_ColumnView_stringSplitRecord(JNIEnv

try {
cudf::jni::auto_set_device(env);
auto const input = reinterpret_cast<cudf::column_view *>(input_handle);
auto const strs_input = cudf::strings_column_view{*input};
auto const column_view = reinterpret_cast<cudf::column_view const *>(input_handle);

Contributor

Similar to above.


try {
cudf::jni::auto_set_device(env);
auto const column_view = reinterpret_cast<cudf::column_view const *>(input_handle);

Contributor

Similar to above.

Contributor Author

Thank you! All changes updated.

@cindyyuanjiang
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit fea6288 into rapidsai:branch-23.04 Feb 8, 2023
@cindyyuanjiang cindyyuanjiang deleted the splitting-regex-program-apis branch February 8, 2023 01:20