Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Precision<usize> copy to make it clear clones are not expensive #11828

Merged
merged 2 commits into from
Aug 13, 2024

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Aug 5, 2024

Which issue does this PR close?

Builds on #11802

Rationale for this change

While reviewing #11802 from @Rachelint I had to double check that a call to clone() was not copying anything large (specifically a ScalarValue) and it turns it that it is not

What changes are included in this PR?

Move the clone() (of a usize) into a function where the type is explict

Add #[Derive(Copy)] to Precision which results in only calls to Precision<ScalarValue> needing a clone

Are these changes tested?

By existing CI

Are there any user-facing changes?

Precision is now Copy in many places, so clippy may tell you you need to remove clone()

@@ -91,10 +91,10 @@ pub async fn get_statistics_with_limit(
// counts across all the files in question. If any file does not
// provide any information or provides an inexact value, we demote
// the statistic precision to inexact.
num_rows = add_row_stats(file_stats.num_rows.clone(), num_rows);
num_rows = add_row_stats(&file_stats.num_rows, &num_rows);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we able to have Precision to be Copy when T is Copy? That would simplify things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THank you @Dandandan -- that was a great idea -- I did it in in 7b71ea5 and it worked great

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👌

@Rachelint
Copy link
Contributor

Yes, I check the clone cost of Precision many times too... It is really good to make it clear

@alamb alamb force-pushed the alamb/maybe_cleaner branch from 62fb091 to 8cf84ab Compare August 8, 2024 15:45
@alamb alamb changed the title Minor: make it clearer that clone() is not slow Minor: make it clearer that clone() in addo_row_stats is not slow Aug 8, 2024
@alamb alamb changed the title Minor: make it clearer that clone() in addo_row_stats is not slow Make Precision<usize> copy to make it clear clones are not expensive Aug 8, 2024
@github-actions github-actions bot added the physical-expr Physical Expressions label Aug 8, 2024
@@ -91,10 +91,10 @@ pub async fn get_statistics_with_limit(
// counts across all the files in question. If any file does not
// provide any information or provides an inexact value, we demote
// the statistic precision to inexact.
num_rows = add_row_stats(file_stats.num_rows.clone(), num_rows);
num_rows = add_row_stats(&file_stats.num_rows, &num_rows);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THank you @Dandandan -- that was a great idea -- I did it in in 7b71ea5 and it worked great

@@ -25,7 +25,7 @@ use arrow_schema::Schema;

/// Represents a value with a degree of certainty. `Precision` is used to
/// propagate information the precision of statistical values.
#[derive(Clone, PartialEq, Eq, Default)]
#[derive(Clone, PartialEq, Eq, Default, Copy)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change only affects the type when the templated type is Copy

So in other words this doesn't make it easy to accidentally copy Precision<ScalarValue> only copies when the underlying T also supports copy

I found https://users.rust-lang.org/t/copy-bound-on-type-parameters/58928 helpful

Copy link
Contributor

@crepererum crepererum Aug 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW: You can always manually implement the Copy trait if you want to have fine grained control about the bounds:

impl<T> Copy for Precision<T> where T: Debug + Clone + PartialEq + Eq + PartialOrd + Clone {}

Copy link
Contributor Author

@alamb alamb Aug 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 -- I think that is basically what #[derive(Debug)] does in this case, which is kind of cool

let precision: Precision<ScalarValue> =
Precision::Exact(ScalarValue::Int64(Some(42)));
// Clippy would complain about this if it were Copy
let p2 = precision.clone();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is my test to ensure Precision<ScalarValue> is not copy

assert_eq!(exact_precision.clone().to_inexact(), inexact_precision);
assert_eq!(inexact_precision.clone().to_inexact(), inexact_precision);
assert_eq!(absent_precision.clone().to_inexact(), absent_precision);
assert_eq!(exact_precision.to_inexact(), inexact_precision);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once I derived Copy, clippy lead me to all the various other changes in this PR

@alamb alamb marked this pull request as ready for review August 8, 2024 16:05
@crepererum crepererum merged commit 5d3cda5 into apache:main Aug 13, 2024
26 checks passed
@alamb alamb deleted the alamb/maybe_cleaner branch August 13, 2024 12:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants