Update contains_table to experimental row hasher and equality comparator #13119

Merged

divyegala
Member

@divyegala divyegala commented Apr 11, 2023

This is a part of #11844

Benchmarks: #13119 (comment)

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Apr 11, 2023
@divyegala divyegala added feature request New feature or request non-breaking Non-breaking change labels Apr 11, 2023
@divyegala
Member Author

divyegala commented Apr 14, 2023

benchmarks
Benchmark                                                                                                             Time             CPU      Time Old      Time New       CPU Old       CPU New
Join<int32_t, int32_t>/left_anti_join_32bit/100000/100000/manual_time                                              +0.0261         +0.0266             0             0             0             0
Join<int32_t, int32_t>/left_anti_join_32bit/100000/400000/manual_time                                              +0.0575         +0.0519             0             0             0             0
Join<int32_t, int32_t>/left_anti_join_32bit/100000/1000000/manual_time                                             +0.0767         +0.0710             0             0             0             0
Join<int32_t, int32_t>/left_anti_join_32bit/10000000/10000000/manual_time                                          +0.0050         +0.0050            11            11            11            11
Join<int32_t, int32_t>/left_anti_join_32bit/10000000/40000000/manual_time                                          +0.0078         +0.0078            25            25            25            25
Join<int32_t, int32_t>/left_anti_join_32bit/10000000/100000000/manual_time                                         +0.0089         +0.0090            52            53            52            53
Join<int32_t, int32_t>/left_anti_join_32bit/100000000/100000000/manual_time                                        +0.0029         +0.0029           118           118           118           118
Join<int32_t, int32_t>/left_anti_join_32bit/80000000/240000000/manual_time                                         +0.0050         +0.0050           172           173           172           173
Join<int64_t, int64_t>/left_anti_join_64bit/50000000/50000000/manual_time                                          +0.0025         +0.0025            59            59            59            59
Join<int64_t, int64_t>/left_anti_join_64bit/40000000/120000000/manual_time                                         +0.0044         +0.0044            86            87            86            87
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/100000/100000/manual_time                                        +0.0405         +0.0360             0             0             0             0
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/100000/400000/manual_time                                        +0.0575         +0.0518             0             0             0             0
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/100000/1000000/manual_time                                       +0.0806         +0.0750             0             0             0             0
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/10000000/10000000/manual_time                                    +0.0412         +0.0412             5             5             5             5
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/10000000/40000000/manual_time                                    +0.0713         +0.0712            10            11            10            11
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/10000000/100000000/manual_time                                   +0.0862         +0.0861            21            23            21            23
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/100000000/100000000/manual_time                                  +0.0387         +0.0387            51            53            51            53
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/80000000/240000000/manual_time                                   +0.0609         +0.0609            69            74            69            74
Join<int64_t, int64_t>/left_anti_join_64bit_nulls/50000000/50000000/manual_time                                    +0.0337         +0.0337            26            27            27            27
Join<int64_t, int64_t>/left_anti_join_64bit_nulls/40000000/120000000/manual_time                                   +0.0577         +0.0577            36            38            36            38
Join<int32_t, int32_t>/left_semi_join_32bit/100000/100000/manual_time                                              +0.0396         +0.0351             0             0             0             0
Join<int32_t, int32_t>/left_semi_join_32bit/100000/400000/manual_time                                              +0.0570         +0.0519             0             0             0             0
Join<int32_t, int32_t>/left_semi_join_32bit/100000/1000000/manual_time                                             +0.0695         +0.0641             0             0             0             0
Join<int32_t, int32_t>/left_semi_join_32bit/10000000/10000000/manual_time                                          +0.0049         +0.0048            11            11            11            11
Join<int32_t, int32_t>/left_semi_join_32bit/10000000/40000000/manual_time                                          +0.0077         +0.0077            25            25            25            25
Join<int32_t, int32_t>/left_semi_join_32bit/10000000/100000000/manual_time                                         +0.0090         +0.0090            52            53            52            53
Join<int32_t, int32_t>/left_semi_join_32bit/100000000/100000000/manual_time                                        +0.0029         +0.0029           118           118           118           118
Join<int32_t, int32_t>/left_semi_join_32bit/80000000/240000000/manual_time                                         +0.0048         +0.0048           171           172           171           172
Join<int64_t, int64_t>/left_semi_join_64bit/50000000/50000000/manual_time                                          +0.0025         +0.0024            59            59            59            59
Join<int64_t, int64_t>/left_semi_join_64bit/40000000/120000000/manual_time                                         +0.0042         +0.0042            86            86            86            86
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/100000/100000/manual_time                                        +0.0336         +0.0298             0             0             0             0
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/100000/400000/manual_time                                        +0.0575         +0.0522             0             0             0             0
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/100000/1000000/manual_time                                       +0.0808         +0.0755             0             0             0             0
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/10000000/10000000/manual_time                                    +0.0417         +0.0416             5             5             5             5
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/10000000/40000000/manual_time                                    +0.0697         +0.0695            10            11            10            11
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/10000000/100000000/manual_time                                   +0.0879         +0.0878            20            22            20            22
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/100000000/100000000/manual_time                                  +0.0392         +0.0392            51            53            51            53
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/80000000/240000000/manual_time                                   +0.0636         +0.0636            69            73            69            73
Join<int64_t, int64_t>/left_semi_join_64bit_nulls/50000000/50000000/manual_time                                    +0.0344         +0.0343            26            27            26            27
Join<int64_t, int64_t>/left_semi_join_64bit_nulls/40000000/120000000/manual_time                                   +0.0586         +0.0586            35            37            35            37

@divyegala divyegala marked this pull request as ready for review April 14, 2023 22:08
@divyegala divyegala requested a review from a team as a code owner April 14, 2023 22:08
//
// TODO: We should unify these code paths in the future when performance regression is no longer
// happening.
return contains_impl(haystack, needles, compare_nulls, compare_nans, stream, mr);
Contributor

If that function is called just once, please move its content here and remove it completely.

Contributor

Agreed, I don't see a need for a separate impl function here.
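
For context on what the call in the snippet under review computes: as the argument names suggest, it reports, for each row of `needles`, whether an equal row exists in `haystack`, with `compare_nulls` deciding whether two nulls count as equal. The following is only a host-side sketch of that idea in standard C++, built on a hash set with a custom row hasher and row equality comparator. It is not the libcudf implementation, and `Row`, `RowHasher`, `RowEqual`, `has_null`, and `contains_rows` are hypothetical names introduced for illustration. NaN handling, the CUDA stream, and the memory resource are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a table row: one optional<int> per column,
// where std::nullopt models a null value.
using Row = std::vector<std::optional<int>>;

// Combines the per-column hashes; nulls hash to a fixed constant.
struct RowHasher {
  std::size_t operator()(Row const& row) const
  {
    std::size_t seed = row.size();
    for (auto const& v : row) {
      std::size_t const h = v ? std::hash<int>{}(*v) : std::size_t{0x9e3779b9u};
      seed ^= h + 0x9e3779b9u + (seed << 6) + (seed >> 2);
    }
    return seed;
  }
};

// Element-wise equality; two std::nullopt values compare equal here.
struct RowEqual {
  bool operator()(Row const& a, Row const& b) const { return a == b; }
};

inline bool has_null(Row const& row)
{
  for (auto const& v : row) {
    if (!v) { return true; }
  }
  return false;
}

// For each needle row, report whether an equal row exists in the haystack.
// When nulls_equal is false, any row containing a null never matches, so
// such rows are kept out of the set and are never looked up.
inline std::vector<bool> contains_rows(std::vector<Row> const& haystack,
                                       std::vector<Row> const& needles,
                                       bool nulls_equal)
{
  // Both inputs are expected to have the same number of columns.
  assert(haystack.empty() || needles.empty() ||
         haystack.front().size() == needles.front().size());

  std::unordered_set<Row, RowHasher, RowEqual> seen;
  for (auto const& row : haystack) {
    if (nulls_equal || !has_null(row)) { seen.insert(row); }
  }

  std::vector<bool> result;
  result.reserve(needles.size());
  for (auto const& row : needles) {
    bool const searchable = nulls_equal || !has_null(row);
    result.push_back(searchable && seen.count(row) > 0);
  }
  return result;
}
```

Filtering null-containing rows up front, rather than making the comparator return false for null pairs, keeps `RowEqual` a proper equivalence relation, which `std::unordered_set` requires.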

@GregoryKimball
Contributor

GregoryKimball commented Apr 18, 2023

Thank you @divyegala for suggesting this change. We can just barely see a performance difference in the automated microbenchmarks - about 1.5% slower in the worst case. The difference is close to the noise level and acceptable in my opinion.

@abellina
Contributor

@divyegala can the 1.5% worst-case regression that @GregoryKimball found be attributed to any of the code changes?

@divyegala
Member Author

> @divyegala can the 1.5% worst-case regression that @GregoryKimball found be attributed to any of the code changes?

@abellina Yes. If it's not just noise and we consistently see a 1.5% worst-case regression, then I would attribute it to the code changes. That's where your help with the NDS query would give good insight.

@abellina
Contributor

@divyegala my question was whether you could explain why there would be a regression. For example, would you expect a regression with nulls, certain data types, or specific cuDF expressions?

@abellina
Contributor

@divyegala I do not see regressions with this patch vs the 23.06 nightly. I ran it 5 times each, and then compared both sets of results:

Name = benchmark
Means = 419055.2, 416434.8
Time diff = 2620.4000000000233
Speedup = 1.0062924616290474
T-Test (test statistic, p value, df) = 1.043405649912458, 0.32726191693134077, 8.0
T-Test Confidence Interval = -3170.879006346624, 8411.67900634667
ALERT: significant change has been detected (p-value < 0.05)
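
The figures in this report can be cross-checked against each other. The sketch below is only an arithmetic check, not the script that produced the report: it recomputes the time difference and speedup from the two means, recovers the standard error from the t statistic (t = diff / SE), and rebuilds the 95% confidence interval. The names `Report`, `Derived`, `derive`, and `kTCrit95Df8` are hypothetical. The critical value 2.306 is the standard two-sided 95% Student's t value for 8 degrees of freedom, which matches the reported df (5 runs + 5 runs - 2).

```cpp
#include <cassert>
#include <cmath>

// Figures as printed in the report above.
struct Report {
  double mean_first;
  double mean_second;
  double t_stat;
  double p_value;
};

// Quantities recomputed from the report.
struct Derived {
  double time_diff;
  double speedup;
  double std_err;
  double ci_low;
  double ci_high;
  bool significant;
};

// Two-sided 95% critical value of Student's t for 8 degrees of freedom.
constexpr double kTCrit95Df8 = 2.306004;

inline Derived derive(Report const& r, double t_crit, double alpha = 0.05)
{
  assert(r.mean_second != 0.0 && r.t_stat != 0.0);
  Derived d{};
  d.time_diff = r.mean_first - r.mean_second;
  d.speedup   = r.mean_first / r.mean_second;
  d.std_err   = d.time_diff / r.t_stat;  // from t = diff / SE
  double const half_width = t_crit * std::abs(d.std_err);
  d.ci_low      = d.time_diff - half_width;
  d.ci_high     = d.time_diff + half_width;
  d.significant = r.p_value < alpha;  // conventional threshold test
  return d;
}
```

Calling `derive({419055.2, 416434.8, 1.043405649912458, 0.32726191693134077}, kTCrit95Df8)` reproduces the reported time difference, speedup, and confidence interval to within rounding. The interval spans zero and the p-value of about 0.327 is above 0.05, so `significant` is false. That agrees with the statement that no regression was observed, and not with the final ALERT line of the report.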

@vyasr
Contributor

vyasr commented Apr 19, 2023

Based on what I see above, this PR should be ready for review then? No real performance concerns here IIUC.

@divyegala
Member Author

@vyasr yes, it's ready for review

@vyasr
Contributor

vyasr commented Apr 24, 2023

@abellina just in case you didn't see (I'm not sure if there was additional offline communication here): #11299 was the original performance regression that triggered the creation of the extra code paths that are removed in this PR (#11330 introduced the code).

Contributor

@vyasr vyasr left a comment

Approving assuming Nghia's comment about combining functions is addressed.

Contributor

@ttnghia ttnghia left a comment

Since some code was removed, some headers, such as the struct utilities, are no longer needed. Please clean them up too.

Contributor

@karthikeyann karthikeyann left a comment

Is there any need for additional unit tests for this change?

@divyegala
Member Author

> Is there any need for additional unit tests for this change?

@karthikeyann This functionality already existed; this PR is essentially just a refactor.

@divyegala
Member Author

/merge

@rapids-bot rapids-bot bot merged commit 73423f8 into rapidsai:branch-23.06 Apr 25, 2023