Fixes for working with functions in dataframes, additional documentation #1430

Merged: 7 commits, Dec 15, 2021

@tobyhede tobyhede commented Dec 10, 2021

Which issue does this PR close?

Closes #1364 Closes #1173

Several of the built-in function definitions are not set up correctly, so the functions cannot be used at all.

Adds a test suite for using most of the functions with a dataframe.

In order to try and catch errors like this in future, as well as provide some extra documentation of intent, I've changed the helper macros to explicitly accept arguments rather than use a fixed arity.

I've played with a couple of options for functions that have a mixed arity.

btrim, for example, has two forms:

btrim(string); // defaults to trim whitespace from string
btrim(string, characters); // trims the supplied characters from string

At the moment, functions with varied arity expect a Vec:

btrim(vec![col("a"), lit("ab")]);

An alternative I played with was using two different definitions:

btrim(string);
btrim_chars(string, characters);

We could also make these functions macros. Doing that would mean that some functions would be functions and some would be macros, which felt a bit strange.
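To make the trade-off concrete, here is a minimal, self-contained sketch of the two helper shapes described above: a macro that names each argument explicitly, and a Vec-accepting form for functions with varied arity. The `Expr` and `BuiltinScalarFunction` types, `col`, and `lit` below are simplified stand-ins rather than DataFusion's real definitions, and `nary_scalar_expr!` is a hypothetical name chosen for illustration.

```rust
// Simplified stand-ins (assumptions): the real DataFusion types are far larger.
#[derive(Debug, Clone, PartialEq)]
pub enum BuiltinScalarFunction {
    Btrim,
    DatePart,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(String),
    ScalarFunction {
        fun: BuiltinScalarFunction,
        args: Vec<Expr>,
    },
}

pub fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

pub fn lit(value: &str) -> Expr {
    Expr::Literal(value.to_string())
}

// Fixed arity: each argument is named in the macro invocation, so the
// generated function signature documents what the function expects.
macro_rules! scalar_expr {
    ($ENUM:ident, $FUNC:ident, $($arg:ident),*) => {
        pub fn $FUNC($($arg: Expr),*) -> Expr {
            Expr::ScalarFunction {
                fun: BuiltinScalarFunction::$ENUM,
                args: vec![$($arg),*],
            }
        }
    };
}

// Varied arity (hypothetical helper name): the caller passes a Vec of
// arguments, and the argument count is not checked at compile time.
macro_rules! nary_scalar_expr {
    ($ENUM:ident, $FUNC:ident) => {
        pub fn $FUNC(args: Vec<Expr>) -> Expr {
            Expr::ScalarFunction {
                fun: BuiltinScalarFunction::$ENUM,
                args,
            }
        }
    };
}

scalar_expr!(DatePart, date_part, part, date);
nary_scalar_expr!(Btrim, btrim);
```

With these definitions, `date_part(lit("year"), col("ts"))` fails to compile if an argument is missing, whereas `btrim(vec![col("a")])` and `btrim(vec![col("a"), lit("ab")])` both compile and leave arity checking to a later stage.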

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Dec 10, 2021
@alamb alamb changed the title Fixes for working with functions in dataframes Fixes for working with functions in dataframes, additional documentation Dec 13, 2021
@alamb alamb left a comment

In order to try and catch errors like this in future, as well as provide some extra documentation of intent, I've changed the helper macros to explicitly accept arguments rather than use a fixed arity.

Thank you so much @tobyhede -- this is a great contribution.

Other than figuring out why the tests for regex_expressions are commented out, I think this PR is ready to merge.

❤️

@@ -1564,7 +1564,7 @@ pub fn approx_distinct(expr: Expr) -> Expr {
/// Create an convenience function representing a unary scalar function
macro_rules! unary_scalar_expr {
($ENUM:ident, $FUNC:ident) => {
#[doc = "this scalar function is not documented yet"]
#[doc = concat!("Unary scalar function definition for ", stringify!($FUNC) ) ]
Contributor

👍
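For readers unfamiliar with the attribute in this hunk, the following is a small sketch of how `#[doc = concat!(...)]` lets a macro give each generated function its own doc comment. The `Expr` type here is a toy stand-in, and the macro takes only the function name, which is a simplification of the two-argument form shown in the diff.

```rust
// Toy expression type (assumption), standing in for DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    ScalarFunction {
        name: &'static str,
        args: Vec<Expr>,
    },
}

// Simplified macro: `stringify!` turns the function identifier into a string
// literal, and `concat!` joins it into the doc text at compile time, so
// rustdoc shows a distinct description for each generated function.
macro_rules! unary_scalar_expr {
    ($FUNC:ident) => {
        #[doc = concat!("Unary scalar function definition for ", stringify!($FUNC))]
        pub fn $FUNC(e: Expr) -> Expr {
            Expr::ScalarFunction {
                name: stringify!($FUNC),
                args: vec![e],
            }
        }
    };
}

unary_scalar_expr!(sqrt);
unary_scalar_expr!(ceil);
```

Using a macro call inside `#[doc = ...]` requires Rust 1.54 or later.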

Comment on lines 1627 to 1628
// scalar_expr!(Btrim, btrim, string);
// scalar_expr!(Btrim, btrim_chars, string, characters);
Contributor

What happened to these two functions (as in do you mean to leave them commented out)?

Contributor Author

@alamb I left these in to demonstrate the alternative approach of handling the different function arities with separate functions. I will clean them up.
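The alternative described in this reply can be sketched as follows: two differently named functions that both build the same `Btrim` variant, each with a fixed, explicit argument list. The types are toy stand-ins rather than DataFusion's real definitions, and the macro is a simplified approximation of the `scalar_expr!` helper from the commented-out lines.

```rust
// Toy stand-ins (assumptions) for the real DataFusion types.
#[derive(Debug, Clone, PartialEq)]
pub enum BuiltinScalarFunction {
    Btrim,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(String),
    ScalarFunction {
        fun: BuiltinScalarFunction,
        args: Vec<Expr>,
    },
}

// Simplified helper: one generated function per invocation, with every
// argument named explicitly.
macro_rules! scalar_expr {
    ($ENUM:ident, $FUNC:ident, $($arg:ident),*) => {
        pub fn $FUNC($($arg: Expr),*) -> Expr {
            Expr::ScalarFunction {
                fun: BuiltinScalarFunction::$ENUM,
                args: vec![$($arg),*],
            }
        }
    };
}

// Both functions map to the same variant; only the arity differs.
scalar_expr!(Btrim, btrim, string);
scalar_expr!(Btrim, btrim_chars, string, characters);
```

The cost of this design is one extra public name per arity; the benefit is that a wrong argument count is a compile error rather than a failure later on.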

binary_scalar_expr!(DateTrunc, date_trunc);
binary_scalar_expr!(Digest, digest);
scalar_expr!(DatePart, date_part, part, date);
scalar_expr!(DateTrunc, date_trunc, part, date);
Contributor

for other reviewers, digest was moved above

Contributor Author

Yeah, digest was with the date functions, but it is arguably a string function, or it could go in a section with some of the other hashes.

@@ -2575,6 +2575,19 @@ mod tests {
Int32,
Int32Array
);
// #[cfg(feature = "regex_expressions")]
Contributor

Did you intend to fix this case?

Contributor Author

Ah, this one doesn't work, so I have removed it.

There is a bug in the regex handling #1429
I might pick up that one next.

// specific language governing permissions and limitations
// under the License.

use std::sync::Arc;
Contributor

This module of tests is epic -- thank you @tobyhede

alamb commented Dec 13, 2021

I also kicked off the CI

alamb commented Dec 13, 2021

@tobyhede there appear to be some small clippy failures: https://github.com/apache/arrow-datafusion/runs/4506011831?check_suite_focus=true

@houqp houqp added bug Something isn't working documentation Improvements or additions to documentation labels Dec 14, 2021
@alamb alamb left a comment

Thanks @tobyhede

alamb commented Dec 15, 2021

I ran cargo fmt and checked in the result as part of 253153f

@alamb alamb merged commit 6478a33 into apache:master Dec 15, 2021
Labels
bug Something isn't working datafusion Changes in the datafusion crate documentation Improvements or additions to documentation
3 participants