Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor mixed_semi_join using cuco::static_set #16230

Merged

Conversation

srinivasyadav18
Copy link
Contributor

@srinivasyadav18 srinivasyadav18 commented Jul 9, 2024

Description

This PR refactors mixed_semi_join by replacing cuco legacy static_map with latest static_set.
Contributes to #12261.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Jul 9, 2024
@srinivasyadav18 srinivasyadav18 added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change and removed libcudf Affects libcudf (C++/CUDA) code. labels Jul 9, 2024
cpp/src/join/mixed_join_semi.cu Outdated Show resolved Hide resolved
cpp/src/join/mixed_join_kernels_semi.cu Outdated Show resolved Hide resolved
cpp/src/join/mixed_join_common_utils.cuh Outdated Show resolved Hide resolved
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Jul 11, 2024
@srinivasyadav18 srinivasyadav18 marked this pull request as ready for review July 18, 2024 02:05
@srinivasyadav18 srinivasyadav18 requested a review from a team as a code owner July 18, 2024 02:05
cpp/src/join/mixed_join_kernels_semi.cu Outdated Show resolved Hide resolved
cpp/src/join/mixed_join_kernels_semi.cu Outdated Show resolved Hide resolved
cpp/src/join/mixed_join_kernels_semi.cu Outdated Show resolved Hide resolved
cpp/src/join/mixed_join_common_utils.cuh Outdated Show resolved Hide resolved
cpp/src/join/mixed_join_common_utils.cuh Show resolved Hide resolved
cpp/src/join/mixed_join_common_utils.cuh Show resolved Hide resolved
@@ -160,22 +121,16 @@ std::unique_ptr<rmm::device_uvector<size_type>> mixed_join_semi(
auto const preprocessed_probe =
experimental::row::equality::preprocessed_table::create(probe, stream);
auto const row_comparator =
cudf::experimental::row::equality::two_table_comparator{preprocessed_probe, preprocessed_build};
cudf::experimental::row::equality::two_table_comparator{preprocessed_build, preprocessed_probe};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason for this order change build and probe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to dig into why but without this change the gtests start failing. Maybe @srinivasyadav18 can provide us a quick insight.

Copy link
Member

@mhaseeb123 mhaseeb123 Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: We changed this order and added swap_tables=true at mixed_join_kernels_semi.cu:+72 (tagged you there) to fix the left and right directionality in the equality in single_expression_equality::operator(). Specifically, the single_expression_equality expects indices in {build, probe} order and we have also reset the hasher and equality functors for the cuco::static_set which swaps the internal left and right directions for {build, probe} hence us using swap_table=true.

I know it's quite confusing. 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mainly due to a breaking change in NVIDIA/cuCollections#479 where we always put query keys on the left-hand side for comparison.

@ttnghia
Copy link
Contributor

ttnghia commented Aug 15, 2024

This PR refactors mixed_semi_join by replacing cuco legacy static_map with latest static_set.

Please provide benchmark showing what benefit this PR will bring to us.

@srinivasyadav18 srinivasyadav18 requested a review from a team as a code owner August 16, 2024 19:25
@mhaseeb123 mhaseeb123 requested review from mhaseeb123 and removed request for mhaseeb123 and JayjeetAtGithub September 10, 2024 23:41
Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor improvement otherwise looks great to me.

@mhaseeb123 Thank you for stepping in to help resolve the issue and finalize the PR! 🔥 🔥 🔥

cpp/src/join/mixed_join_kernels_semi.cu Outdated Show resolved Hide resolved
@@ -160,22 +121,16 @@ std::unique_ptr<rmm::device_uvector<size_type>> mixed_join_semi(
auto const preprocessed_probe =
experimental::row::equality::preprocessed_table::create(probe, stream);
auto const row_comparator =
cudf::experimental::row::equality::two_table_comparator{preprocessed_probe, preprocessed_build};
cudf::experimental::row::equality::two_table_comparator{preprocessed_build, preprocessed_probe};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mainly due to a breaking change in NVIDIA/cuCollections#479 where we always put query keys on the left-hand side for comparison.

@mhaseeb123 mhaseeb123 requested review from a team and JayjeetAtGithub September 16, 2024 17:53
@rapidsai rapidsai deleted a comment from srinivasyadav18 Sep 16, 2024
@mhaseeb123 mhaseeb123 added 5 - DO NOT MERGE Hold off on merging; see PR for details 4 - Needs Review Waiting for reviewer to review or respond and removed 4 - Needs Review Waiting for reviewer to review or respond 5 - DO NOT MERGE Hold off on merging; see PR for details labels Sep 17, 2024
@mhaseeb123
Copy link
Member

mhaseeb123 commented Sep 17, 2024

Performance results from Spark (spark-h). Thanks to @zpuller for running this. Queries of interest: q16 and q94 that use mixed_semi_join. @ttnghia @GregoryKimball

Regression alerts
-----------------

Speedup results
-----------------
query1: Previous (1881.6 ms) vs Current (1774.4 ms) Diff 107 E2E 1.06x
query2: Previous (1651.8 ms) vs Current (1693.4 ms) Diff -41 E2E 0.98x
query3: Previous (441.0 ms) vs Current (440.2 ms) Diff 0 E2E 1.00x
query4: Previous (8464.8 ms) vs Current (8578.4 ms) Diff -113 E2E 0.99x
query5: Previous (1917.6 ms) vs Current (2090.8 ms) Diff -173 E2E 0.92x
query6: Previous (1026.4 ms) vs Current (946.8 ms) Diff 79 E2E 1.08x
query7: Previous (2728.8 ms) vs Current (2871.4 ms) Diff -142 E2E 0.95x
query8: Previous (1070.8 ms) vs Current (1062.4 ms) Diff 8 E2E 1.01x
query9: Previous (3184.8 ms) vs Current (3115.6 ms) Diff 69 E2E 1.02x
query10: Previous (1529.4 ms) vs Current (1480.2 ms) Diff 49 E2E 1.03x
query11: Previous (4687.8 ms) vs Current (4840.4 ms) Diff -152 E2E 0.97x
query12: Previous (552.2 ms) vs Current (565.8 ms) Diff -13 E2E 0.98x
query13: Previous (1100.6 ms) vs Current (1098.0 ms) Diff 2 E2E 1.00x
query14_part1: Previous (5230.6 ms) vs Current (5320.0 ms) Diff -89 E2E 0.98x
query14_part2: Previous (4355.2 ms) vs Current (4310.8 ms) Diff 44 E2E 1.01x
query15: Previous (1063.8 ms) vs Current (1098.4 ms) Diff -34 E2E 0.97x
query16: Previous (3262.6 ms) vs Current (3335.4 ms) Diff -72 E2E 0.98x
query17: Previous (1844.0 ms) vs Current (1744.4 ms) Diff 99 E2E 1.06x
query18: Previous (3182.2 ms) vs Current (3178.6 ms) Diff 3 E2E 1.00x
query19: Previous (1488.0 ms) vs Current (1500.8 ms) Diff -12 E2E 0.99x
query20: Previous (591.6 ms) vs Current (629.8 ms) Diff -38 E2E 0.94x
query21: Previous (548.4 ms) vs Current (635.4 ms) Diff -87 E2E 0.86x
query22: Previous (962.0 ms) vs Current (950.6 ms) Diff 11 E2E 1.01x
query23_part1: Previous (10493.8 ms) vs Current (10460.0 ms) Diff 33 E2E 1.00x
query23_part2: Previous (14016.4 ms) vs Current (14153.8 ms) Diff -137 E2E 0.99x
query24_part1: Previous (6507.8 ms) vs Current (6394.2 ms) Diff 113 E2E 1.02x
query24_part2: Previous (6494.6 ms) vs Current (6488.6 ms) Diff 6 E2E 1.00x
query25: Previous (1638.2 ms) vs Current (1599.6 ms) Diff 38 E2E 1.02x
query26: Previous (849.2 ms) vs Current (851.0 ms) Diff -1 E2E 1.00x
query27: Previous (1346.0 ms) vs Current (1222.6 ms) Diff 123 E2E 1.10x
query28: Previous (6245.0 ms) vs Current (5135.8 ms) Diff 1109 E2E 1.22x
query29: Previous (2420.2 ms) vs Current (2526.8 ms) Diff -106 E2E 0.96x
query30: Previous (1756.0 ms) vs Current (1744.6 ms) Diff 11 E2E 1.01x
query31: Previous (2492.2 ms) vs Current (2522.8 ms) Diff -30 E2E 0.99x
query32: Previous (833.0 ms) vs Current (874.2 ms) Diff -41 E2E 0.95x
query33: Previous (1374.6 ms) vs Current (1313.0 ms) Diff 61 E2E 1.05x
query34: Previous (1649.6 ms) vs Current (1662.8 ms) Diff -13 E2E 0.99x
query35: Previous (1948.6 ms) vs Current (1984.4 ms) Diff -35 E2E 0.98x
query36: Previous (965.0 ms) vs Current (1033.8 ms) Diff -68 E2E 0.93x
query37: Previous (1196.2 ms) vs Current (1344.4 ms) Diff -148 E2E 0.89x
query38: Previous (3036.8 ms) vs Current (3376.4 ms) Diff -339 E2E 0.90x
query39_part1: Previous (2122.8 ms) vs Current (2113.8 ms) Diff 9 E2E 1.00x
query39_part2: Previous (1412.2 ms) vs Current (1311.0 ms) Diff 101 E2E 1.08x
query40: Previous (1165.6 ms) vs Current (1141.0 ms) Diff 24 E2E 1.02x
query41: Previous (326.4 ms) vs Current (306.0 ms) Diff 20 E2E 1.07x
query42: Previous (318.2 ms) vs Current (312.2 ms) Diff 6 E2E 1.02x
query43: Previous (726.8 ms) vs Current (739.4 ms) Diff -12 E2E 0.98x
query44: Previous (624.8 ms) vs Current (637.8 ms) Diff -13 E2E 0.98x
query45: Previous (1137.0 ms) vs Current (1100.4 ms) Diff 36 E2E 1.03x
query46: Previous (1616.2 ms) vs Current (1655.2 ms) Diff -39 E2E 0.98x
query47: Previous (1465.8 ms) vs Current (1507.8 ms) Diff -42 E2E 0.97x
query48: Previous (839.0 ms) vs Current (825.4 ms) Diff 13 E2E 1.02x
query49: Previous (2236.4 ms) vs Current (2186.2 ms) Diff 50 E2E 1.02x
query50: Previous (6857.4 ms) vs Current (7052.4 ms) Diff -195 E2E 0.97x
query51: Previous (2435.4 ms) vs Current (2233.2 ms) Diff 202 E2E 1.09x
query52: Previous (393.0 ms) vs Current (421.4 ms) Diff -28 E2E 0.93x
query53: Previous (732.2 ms) vs Current (562.8 ms) Diff 169 E2E 1.30x
query54: Previous (1319.2 ms) vs Current (1375.2 ms) Diff -56 E2E 0.96x
query55: Previous (363.8 ms) vs Current (368.8 ms) Diff -5 E2E 0.99x
query56: Previous (969.4 ms) vs Current (927.2 ms) Diff 42 E2E 1.05x
query57: Previous (1292.4 ms) vs Current (1290.6 ms) Diff 1 E2E 1.00x
query58: Previous (1091.0 ms) vs Current (1014.8 ms) Diff 76 E2E 1.08x
query59: Previous (1601.6 ms) vs Current (1399.4 ms) Diff 202 E2E 1.14x
query60: Previous (1569.8 ms) vs Current (1518.2 ms) Diff 51 E2E 1.03x
query61: Previous (1357.6 ms) vs Current (1431.0 ms) Diff -73 E2E 0.95x
query62: Previous (901.8 ms) vs Current (921.0 ms) Diff -19 E2E 0.98x
query63: Previous (903.8 ms) vs Current (777.2 ms) Diff 126 E2E 1.16x
query64: Previous (13170.4 ms) vs Current (13072.8 ms) Diff 97 E2E 1.01x
query65: Previous (3076.6 ms) vs Current (3032.6 ms) Diff 44 E2E 1.01x
query66: Previous (2326.6 ms) vs Current (2304.2 ms) Diff 22 E2E 1.01x
query67: Previous (18200.0 ms) vs Current (18346.8 ms) Diff -146 E2E 0.99x
query68: Previous (1091.6 ms) vs Current (1146.6 ms) Diff -55 E2E 0.95x
query69: Previous (1468.6 ms) vs Current (1471.8 ms) Diff -3 E2E 1.00x
query70: Previous (1663.4 ms) vs Current (1521.0 ms) Diff 142 E2E 1.09x
query71: Previous (2684.6 ms) vs Current (2529.8 ms) Diff 154 E2E 1.06x
query72: Previous (2697.6 ms) vs Current (2676.8 ms) Diff 20 E2E 1.01x
query73: Previous (809.2 ms) vs Current (879.2 ms) Diff -70 E2E 0.92x
query74: Previous (3871.2 ms) vs Current (3882.0 ms) Diff -10 E2E 1.00x
query75: Previous (6280.0 ms) vs Current (6336.0 ms) Diff -56 E2E 0.99x
query76: Previous (2696.4 ms) vs Current (1886.0 ms) Diff 810 E2E 1.43x
query77: Previous (780.2 ms) vs Current (773.0 ms) Diff 7 E2E 1.01x
query78: Previous (7196.6 ms) vs Current (7067.0 ms) Diff 129 E2E 1.02x
query79: Previous (1078.2 ms) vs Current (1164.4 ms) Diff -86 E2E 0.93x
query80: Previous (3464.8 ms) vs Current (3399.6 ms) Diff 65 E2E 1.02x
query81: Previous (2228.0 ms) vs Current (2266.0 ms) Diff -38 E2E 0.98x
query82: Previous (2196.4 ms) vs Current (1903.0 ms) Diff 293 E2E 1.15x
query83: Previous (651.6 ms) vs Current (654.0 ms) Diff -2 E2E 1.00x
query84: Previous (1128.2 ms) vs Current (1030.4 ms) Diff 97 E2E 1.09x
query85: Previous (1634.2 ms) vs Current (1914.6 ms) Diff -280 E2E 0.85x
query86: Previous (897.4 ms) vs Current (922.0 ms) Diff -24 E2E 0.97x
query87: Previous (3044.8 ms) vs Current (2964.0 ms) Diff 80 E2E 1.03x
query88: Previous (3685.4 ms) vs Current (3625.6 ms) Diff 59 E2E 1.02x
query89: Previous (795.6 ms) vs Current (806.6 ms) Diff -11 E2E 0.99x
query90: Previous (986.8 ms) vs Current (844.8 ms) Diff 142 E2E 1.17x
query91: Previous (1151.2 ms) vs Current (1164.8 ms) Diff -13 E2E 0.99x
query92: Previous (456.2 ms) vs Current (512.0 ms) Diff -55 E2E 0.89x
query93: Previous (9548.6 ms) vs Current (9570.0 ms) Diff -21 E2E 1.00x
query94: Previous (3812.8 ms) vs Current (3756.2 ms) Diff 56 E2E 1.02x
query95: Previous (5580.8 ms) vs Current (5419.2 ms) Diff 161 E2E 1.03x
query96: Previous (3956.8 ms) vs Current (4010.6 ms) Diff -53 E2E 0.99x
query97: Previous (1694.6 ms) vs Current (1641.8 ms) Diff 52 E2E 1.03x
query98: Previous (1284.0 ms) vs Current (1339.4 ms) Diff -55 E2E 0.96x
query99: Previous (1526.4 ms) vs Current (1535.2 ms) Diff -8 E2E 0.99x
benchmark: Previous (273600.0 ms) vs Current (271000.0 ms) Diff 2600 E2E 1.01x

@ttnghia
Copy link
Contributor

ttnghia commented Sep 18, 2024

Shorter for easy reading:

Queries of interest: q16 and q94 that use mixed_semi_join.

Speedup results
-----------------
query16: Previous (3262.6 ms) vs Current (3335.4 ms) Diff -72 E2E 0.98x
query94: Previous (3812.8 ms) vs Current (3756.2 ms) Diff 56 E2E 1.02x

@mhaseeb123
Copy link
Member

/merge

@rapids-bot rapids-bot bot merged commit e68f55c into rapidsai:branch-24.10 Sep 18, 2024
100 checks passed
@mhaseeb123 mhaseeb123 added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 4 - Needs Review Waiting for reviewer to review or respond labels Sep 18, 2024
mhaseeb123 added a commit that referenced this pull request Sep 19, 2024
rapids-bot bot pushed a commit that referenced this pull request Sep 20, 2024
rapids-bot bot pushed a commit that referenced this pull request Sep 26, 2024
This PR reapplies changes from #16230 and adds bug fixes and performance improvements for mixed_semi_join.

Authors:
  - Muhammad Haseeb (https://github.com/mhaseeb123)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - MithunR (https://github.com/mythrocks)
  - Nghia Truong (https://github.com/ttnghia)

URL: #16859
copy-pr-bot bot pushed a commit that referenced this pull request Sep 28, 2024
This PR reapplies changes from #16230 and adds bug fixes and performance improvements for mixed_semi_join.

Authors:
  - Muhammad Haseeb (https://github.com/mhaseeb123)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - MithunR (https://github.com/mythrocks)
  - Nghia Truong (https://github.com/ttnghia)

URL: #16859
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants