[FEA] Add support for bucketed writes #22
Labels
cudf_dependency
An issue or PR with this label depends on a new feature in cudf
feature request
New feature or request
P1
Nice to have for release
SQL
part of the SQL/Dataframe plugin
Comments
revans2 added the feature request, ? - Needs Triage (Need team to review and classify), and SQL labels on May 28, 2020
sameerz added the P1 label and removed the ? - Needs Triage label on Oct 13, 2020
This depends on #937
wjxiz1992 pushed a commit to wjxiz1992/spark-rapids that referenced this issue on Oct 29, 2020
* Instructions for standalone/yarn wip
* Update instructions
* Fix typo
* Small fixes
* jars->jar
We could partially implement this now.
revans2 added the cudf_dependency label on Feb 18, 2021
To fully implement this, we will need murmur3 hashing that is bit-for-bit identical to Spark's.
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue on Nov 30, 2023
Signed-off-by: spark-rapids automation <[email protected]>
sperlingxx added a commit to sperlingxx/spark-rapids that referenced this issue on Jan 18, 2024
Signed-off-by: sperlingxx <[email protected]>
res-life pushed a commit to res-life/spark-rapids that referenced this issue on Jun 27, 2024
* optimzing Expand+Aggregate in sqlw with many count distinct
  Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
* Add GpuBucketingUtils shim to Spark 4.0.0 (NVIDIA#11092)
  * Add GpuBucketingUtils shim to Spark 4.0.0
  * Signing off
    Signed-off-by: Raza Jafri <[email protected]>
  ---------
  Signed-off-by: Raza Jafri <[email protected]>
* Improve the diagnostics for 'conv' fallback explain (NVIDIA#11076)
  * Improve the diagnostics for 'conv' fallback explain
    Signed-off-by: Jihoon Son <[email protected]>
  * don't use nil
    Signed-off-by: Jihoon Son <[email protected]>
  * the bases should not be an empty string in the error message when the user input is not
    Signed-off-by: Jihoon Son <[email protected]>
  * more user-friendly message
  * Update sql-plugin/src/main/scala/org/apache/spark/sql/rapids/stringFunctions.scala
    Co-authored-by: Gera Shegalov <[email protected]>
  ---------
  Signed-off-by: Jihoon Son <[email protected]>
  Co-authored-by: Gera Shegalov <[email protected]>
* Disable ANSI mode for window function tests [databricks] (NVIDIA#11073)
  * Disable ANSI mode for window function tests.
    Fixes NVIDIA#11019. Window function tests fail on Spark 4.0 because of NVIDIA#5114 (and NVIDIA#5120 broadly), because spark-rapids does not support SUM, COUNT, and certain other aggregations in ANSI mode. This commit disables ANSI mode tests for the failing window function tests. These may be revisited, once error/overflow checking is available for ANSI mode in spark-rapids.
    Signed-off-by: MithunR <[email protected]>
  * Switch from @ansi_mode_disabled to @disable_ansi_mode.
  ---------
  Signed-off-by: MithunR <[email protected]>
---------
Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
Signed-off-by: Raza Jafri <[email protected]>
Signed-off-by: Jihoon Son <[email protected]>
Signed-off-by: MithunR <[email protected]>
Co-authored-by: Hongbin Ma (Mahone) <[email protected]>
Co-authored-by: Raza Jafri <[email protected]>
Co-authored-by: Jihoon Son <[email protected]>
Co-authored-by: Gera Shegalov <[email protected]>
Co-authored-by: MithunR <[email protected]>
Is your feature request related to a problem? Please describe.
The SQL plugin supports partitioned writes but not bucketed writes. The main thing preventing this from working is the lack of consistent hashing between the CPU and GPU implementations. Fixing this will require us to create a version of the murmur3 hash that matches exactly what Spark does, and we may need to write it ourselves, as it is likely to be Spark-specific.
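To make the parity requirement concrete, here is a rough CPU-side reference sketch for the simplest case only: a single 32-bit integer bucket column. It assumes, as I understand Spark's behavior, that the bucket id is the non-negative modulus of a murmur3 (x86, 32-bit) hash seeded with 42. The function names `murmur3_hash_int` and `bucket_id` are made up for illustration and are not plugin or cudf APIs. Other types (longs, strings, decimals, nested types) are hashed differently and are not covered here.

```python
MASK32 = 0xFFFFFFFF
SPARK_HASH_SEED = 42  # assumption: default seed used by Spark's hash expression


def _rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def _mix_k1(k1):
    k1 = (k1 * 0xCC9E2D51) & MASK32
    k1 = _rotl32(k1, 15)
    return (k1 * 0x1B873593) & MASK32


def _mix_h1(h1, k1):
    h1 ^= k1
    h1 = _rotl32(h1, 13)
    return (h1 * 5 + 0xE6546B64) & MASK32


def _fmix(h1, length):
    """Final avalanche step; length is the input size in bytes."""
    h1 ^= length
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK32
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK32
    h1 ^= h1 >> 16
    return h1


def _to_int32(x):
    """Reinterpret an unsigned 32-bit value as a signed Java-style int."""
    return x - (1 << 32) if x & 0x80000000 else x


def murmur3_hash_int(value, seed=SPARK_HASH_SEED):
    """Hypothetical helper: murmur3 x86_32 of one 32-bit int (4 bytes)."""
    k1 = _mix_k1(value & MASK32)
    h1 = _mix_h1(seed & MASK32, k1)
    return _to_int32(_fmix(h1, 4))


def bucket_id(value, num_buckets, seed=SPARK_HASH_SEED):
    """Hypothetical helper: non-negative modulus of the hash.

    Python's % already returns a non-negative result for a positive
    divisor, which is the pmod behavior assumed here.
    """
    return murmur3_hash_int(value, seed) % num_buckets


if __name__ == "__main__":
    print(murmur3_hash_int(1))   # -559580957
    print(bucket_id(1, 8))       # 3
```

A GPU implementation would have to produce the same signed 32-bit value for every row and every supported type, since a single differing bit can move a row into a different bucket file and break readers that rely on the bucketing.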