forked from NVIDIA/spark-rapids
Latest lore framework. #39
Closed
liurenjie1024
wants to merge
75
commits into
nvliyuan:branch-24.06
from
liurenjie1024:renjie/issue-10987-liyuan
Keep dependencies (JNI + private) as 24.06-SNAPSHOT until they're available. Filed an issue (NVIDIA#10867) to remind us to bump up dependencies to 24.08.0-SNAPSHOT. Signed-off-by: Tim Liu <[email protected]>
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
Signed-off-by: Zach Puller <[email protected]>
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
* Fixed Databricks build * Signing off Signed-off-by: Raza Jafri <[email protected]> * Removed unused import --------- Signed-off-by: Raza Jafri <[email protected]>
…IA#10871) Add classloader diagnostics to initShuffleManager error message --------- Signed-off-by: Zach Puller <[email protected]> Co-authored-by: Jason Lowe <[email protected]> Co-authored-by: Gera Shegalov <[email protected]> Co-authored-by: Alessandro Bellina <[email protected]>
…ricks] (NVIDIA#10945) * Revert "Revert "Add Support for Multiple Filtering Keys for Subquery Broadcas…" This reverts commit bb05b17. * Signing off Signed-off-by: Raza Jafri <[email protected]> --------- Signed-off-by: Raza Jafri <[email protected]>
Closes NVIDIA#10875 Contributes to NVIDIA#10773 Unjar, cache, and share the test jar content among all test suites from the same jar Test: ```bash mvn package -Dbuildver=330 -pl tests -am -Dsuffixes='.*\.RapidsJsonSuite' ``` Signed-off-by: Gera Shegalov <[email protected]>
…A#10944) * Added shim for BatchScanExec to support Spark 4.0 Signed-off-by: Raza Jafri <[email protected]> * fixed the failing shim --------- Signed-off-by: Raza Jafri <[email protected]>
…hange. (NVIDIA#10863) * Account for `CommandUtils.uncacheTableOrView` signature change. Fixes NVIDIA#10710. This commit accounts for the changes in the signature of `CommandUtils.uncacheTableOrView` in Apache Spark 4.0. (See [SPARK-47191](apache/spark#45289).) Signed-off-by: MithunR <[email protected]> * Removed unnecessary base class. --------- Signed-off-by: MithunR <[email protected]>
This new feature adds Parquet support to GpuInsertIntoHiveTable, which currently supports only text writes. It is covered by the tests newly added in this PR. --------- Signed-off-by: Firestarman <[email protected]> Co-authored-by: Jason Lowe <[email protected]>
…ange. (NVIDIA#10857) * Account for PartitionedFileUtil.splitFiles signature change. Fixes NVIDIA#10299. In Apache Spark 4.0, the signature of `PartitionedFileUtil.splitFiles` was changed to remove unused parameters (apache/spark@eabea643c74). This causes the Spark RAPIDS plugin build to break with Spark 4.0. This commit introduces a shim to account for the signature change. Signed-off-by: MithunR <[email protected]> * Common base for PartitionFileUtilsShims. Signed-off-by: MithunR <[email protected]> * Reusing existing PartitionedFileUtilsShims. * More refactor, for pre-3.5 compile. * Updated Copyright date. * Fixed style error. * Re-fixed the copyright year. * Added missing import. --------- Signed-off-by: MithunR <[email protected]>
To fix: NVIDIA#10867 Change rapids private and jni dependency version to 24.08.0-SNAPSHOT Signed-off-by: Tim Liu <[email protected]>
NVIDIA#10947) Prevent '^[0-9]{n}' from being processed as `spark_rapids_jni::literal_range_pattern` that currently only supports "contains", not "starts with" Fixes NVIDIA#10928 Also adding missing tailrec annotations to recursive parser methods. Signed-off-by: Gera Shegalov <[email protected]>
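The distinction this commit relies on can be illustrated outside the plugin. The sketch below is plain Python, not the `spark_rapids_jni` code path; the two helper names are hypothetical. It shows why an anchored pattern such as `^[0-9]{n}` cannot be served by a matcher that only implements "contains" semantics.

```python
import re


def contains_digit_run(s: str, n: int) -> bool:
    """'Contains' semantics: a run of n digits anywhere in the string."""
    return re.search(r"[0-9]{%d}" % n, s) is not None


def starts_with_digit_run(s: str, n: int) -> bool:
    """'Starts with' semantics, which is what '^[0-9]{n}' asks for."""
    # re.match only matches at the beginning of the string.
    return re.match(r"[0-9]{%d}" % n, s) is not None


if __name__ == "__main__":
    for value in ["123ab", "ab123"]:
        print(value, contains_digit_run(value, 3), starts_with_digit_run(value, 3))
```

For `ab123` the two helpers disagree, which is the kind of wrong result the anchored pattern would produce if it were routed to a contains-only implementation.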
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
* Add support for the renaming of PythonMapInArrow to MapInArrow * Signing off Signed-off-by: Raza Jafri <[email protected]> * Removed the unnecessary base class from 400 * addressed review comments --------- Signed-off-by: Raza Jafri <[email protected]>
Signed-off-by: Firestarman <[email protected]>
Signed-off-by: Peixin Li <[email protected]>
…itten [skip ci] (NVIDIA#10966) * DO NOT REVIEW Signed-off-by: Peixin Li <[email protected]> * Add default value for REF to avoid it being overwritten by an unexpected manual trigger Signed-off-by: Peixin Li <[email protected]> --------- Signed-off-by: Peixin Li <[email protected]>
* AnalysisException child class Signed-off-by: Raza Jafri <[email protected]> * Use errorClass for reporting AnalysisException * POM changes Signed-off-by: Raza Jafri <[email protected]> * Reuse the RapidsErrorUtils to throw the AnalysisException * Revert "POM changes" This reverts commit 0f765c9. * Updated copyrights * Added the TrampolineUtil method back to handle cases which don't use errorClass * Add doc to the RapidsAnalysisException * addressed review comments * Fixed imports * Moved the RapidsAnalysisException out of TrampolineUtil * fixed imports * addressed review comments * fixed unused import * Removed the TrampolineUtil method for throwing RapidsAnalysisException --------- Signed-off-by: Raza Jafri <[email protected]>
…icks] (NVIDIA#10970) * Incomplete impl of RaiseError for 400 * Removed RaiseError from 400 * Signing off Signed-off-by: Raza Jafri <[email protected]> --------- Signed-off-by: Raza Jafri <[email protected]>
…VIDIA#10977) * rewrite multiple literal choice to multiple contains, wip Signed-off-by: Haoyang Li <[email protected]> * fix bug Signed-off-by: Haoyang Li <[email protected]> * optimize memory Signed-off-by: Haoyang Li <[email protected]> * remove debug log Signed-off-by: Haoyang Li <[email protected]> * address comments Signed-off-by: Haoyang Li <[email protected]> * Apply suggestions from code review Co-authored-by: Gera Shegalov <[email protected]> * support abc|def case Signed-off-by: Haoyang Li <[email protected]> * fix 2.13 Signed-off-by: Haoyang Li <[email protected]> * fix 2.13 build Signed-off-by: Haoyang Li <[email protected]> --------- Signed-off-by: Haoyang Li <[email protected]> Co-authored-by: Gera Shegalov <[email protected]>
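As a rough illustration of the rewrite named in this commit (a multiple literal choice such as `abc|def` turned into multiple contains checks), here is a minimal Python sketch. It is not the plugin's Scala implementation; `literal_choices` and `rlike` are hypothetical names, and the rule that only ASCII alphanumeric alternatives qualify is an assumption made to keep the example small.

```python
import re
from typing import List, Optional


def literal_choices(pattern: str) -> Optional[List[str]]:
    """Return the alternatives if pattern is a plain 'lit1|lit2|...' choice, else None."""
    parts = pattern.split("|")
    # Assumption of this sketch: only non-empty ASCII alphanumeric literals qualify.
    if all(p.isascii() and p.isalnum() for p in parts):
        return parts
    return None


def rlike(s: str, pattern: str) -> bool:
    """Regex 'find' semantics, with a fast path for literal choices."""
    choices = literal_choices(pattern)
    if choices is not None:
        # Fast path: one substring check per alternative, no regex engine.
        return any(c in s for c in choices)
    return re.search(pattern, s) is not None


if __name__ == "__main__":
    print(rlike("xxdefyy", "abc|def"))
    print(rlike("xyz", "abc|def"))
```

Patterns containing any metacharacter fall back to the regular expression engine, so the fast path never changes results for patterns it does not fully understand.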
Signed-off-by: Robert (Bobby) Evans <[email protected]>
* concat_null_bug_fix Signed-off-by: fejiang <[email protected]> * concat_null_bug_fix Signed-off-by: fejiang <[email protected]> * Setting modified Signed-off-by: fejiang <[email protected]> * remove comment Signed-off-by: fejiang <[email protected]> * concat considered as empty string Signed-off-by: fejiang <[email protected]> --------- Signed-off-by: fejiang <[email protected]>
We missed spark343 shim for the scala2.13 dist jar on branch-24.06. Add scala2.13 spark343 shim for v24.06.0 Signed-off-by: Tim Liu <[email protected]>
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
Update change log with CLI: scripts/generate-changelog --token=<GIT_TOKEN> --releases=24.04,24.06 Signed-off-by: jenkins <jenkins@localhost> Co-authored-by: jenkins <jenkins@localhost>
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
This reverts commit d9686d4.
Revert "Add in the ability to fingerprint JSON columns (NVIDIA#11002)" [skip ci]
…1060) Also fixed issue with databricks dependency not being what we said it was. Signed-off-by: Robert (Bobby) Evans <[email protected]>
* Binary dedupe changes for Spark 4.0.0 Signed-off-by: Raza Jafri <[email protected]> * updated comments * Changed the URL for the common classes among shims * renamed spark34-common to spark-shared and renamed relevant variables * addressed review comments * renamed variable from common to shared --------- Signed-off-by: Raza Jafri <[email protected]>
* fix flaky array_item test failures Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * fix indent Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * fix whitespace Signed-off-by: Hongbin Ma (Mahone) <[email protected]> --------- Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
* Calculate parallelism to speed up pre-merge CI Calculate parallelism based on GPU memory so that pre-merge CI runs with an appropriate amount of parallelism. Once TEST_PARALLEL exceeds 8, the integration tests run progressively slower as it increases, so we limit TEST_PARALLEL <= 8. With this change, pre-merge CI on powerful nodes finished 1 hour sooner than on common nodes. 16 CPU/128G Mem/24G GPU : [2 hours] VS 8 CPU/64G Mem/16G GPU : [3 hours] Note: we currently have only 3 fixed powerful nodes for the pre-merge CI job, so only 1 pre-merge CI run can be sped up at a time Signed-off-by: Tim Liu <[email protected]> * Add a variable to set maximum test parallelism for the integration tests Signed-off-by: Tim Liu <[email protected]> * Fix typo Signed-off-by: Tim Liu <[email protected]> --------- Signed-off-by: Tim Liu <[email protected]>
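The capping rule described in this commit could look roughly like the sketch below. The commit only states that parallelism is derived from GPU memory and limited to 8; the function name, the per-test memory budget of 3 GiB, and the floor of 1 are assumptions for illustration, not values taken from the CI scripts.

```python
def compute_test_parallel(gpu_mem_gib: float,
                          mem_per_test_gib: float = 3.0,  # hypothetical budget per test process
                          max_parallel: int = 8) -> int:
    """Derive a TEST_PARALLEL-style value from GPU memory, capped at max_parallel."""
    by_memory = int(gpu_mem_gib // mem_per_test_gib)
    # Never return less than 1, and never exceed the configured ceiling.
    return max(1, min(by_memory, max_parallel))


if __name__ == "__main__":
    for mem in (16, 24, 80):
        print(mem, compute_test_parallel(mem))
```

Keeping the ceiling as a parameter mirrors the follow-up commit that adds a variable for the maximum test parallelism.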
Signed-off-by: Peixin Li <[email protected]>
…IDIA#10996) * Fallback non-UTC TimeZoneAwareExpression with zoneId instead of timeZone config Signed-off-by: Haoyang Li <[email protected]> * clean up Signed-off-by: Haoyang Li <[email protected]> --------- Signed-off-by: Haoyang Li <[email protected]>
* feat: Introduce low shuffle merge. Signed-off-by: liurenjie1024 <[email protected]> * fix * Test databricks parallel * Test more databricks parallel * Fix comments * Config && scala 2.13 * Revert * Fix comments * scala 2.13 * Revert unnecessary changes * Revert "Revert unnecessary changes" This reverts commit 9fa4cf2. * restore change --------- Signed-off-by: liurenjie1024 <[email protected]>
This PR adds GPU support for the bucketing write. - Refactor the code of the dynamic partition single writer and concurrent writer to reuse as much code as possible, and then add the bucketing write logic to both of them. - Update the bucket check during the plan overriding for the write commands, including InsertIntoHadoopFsRelationCommand, CreateDataSourceTableAsSelectCommand, InsertIntoHiveTable, CreateHiveTableAsSelectCommand. - From 330, Spark also supports HiveHash to generate the bucket IDs, in addition to Murmur3Hash. So the shim object GpuBucketingUtils is introduced to handle the differences between shims. - This change also adds two functions (tagForHiveBucketingWrite and tagForBucketing) to do the overriding check for the two hashing functions separately. The Hive write nodes fall back to CPU when HiveHash is chosen, because HiveHash is not supported on GPU. --------- Signed-off-by: Firestarman <[email protected]>
Signed-off-by: YanxuanLiu <[email protected]>
…ks] (NVIDIA#11044) * Fixed arithmetic_ops_tests * Signing off Signed-off-by: Raza Jafri <[email protected]> * Added a mechanism to add ansi mode per test * Reverted unnecessary change to spark_init_internal.py * Corrected the year in the licence * Only set ansi conf to false when ansi_mode_disabled is set * Addressed review comments * Fixed the method name * Update integration_tests/src/main/python/conftest.py This handles cases like `cache_test.py` which should run with the default conf for `spark.sql.ansi.enabled`. --------- Signed-off-by: Raza Jafri <[email protected]> Co-authored-by: MithunR <[email protected]>
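The per-test ANSI behaviour described in this commit ("Only set ansi conf to false when ansi_mode_disabled is set") can be sketched without pytest. This is not the code in `conftest.py`; `conf_for_test` is a hypothetical helper, and representing a test's markers as a set of strings is an assumption of the sketch.

```python
from typing import Dict, Set

ANSI_KEY = "spark.sql.ansi.enabled"


def conf_for_test(base_conf: Dict[str, str], markers: Set[str]) -> Dict[str, str]:
    """Build the Spark conf for one test without mutating the shared base conf."""
    conf = dict(base_conf)
    # Only override when the test opted out; otherwise leave Spark's default in place.
    if "ansi_mode_disabled" in markers:
        conf[ANSI_KEY] = "false"
    return conf


if __name__ == "__main__":
    print(conf_for_test({}, {"ansi_mode_disabled"}))
    print(conf_for_test({}, set()))
```

Leaving the key unset for unmarked tests matches the note about `cache_test.py`, which should run with the default value of `spark.sql.ansi.enabled`.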
…IDIA#11062) * with call site print, not good because some test cases by design will dup Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * done Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * add file Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * fix compile Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * address review comments Signed-off-by: Hongbin Ma (Mahone) <[email protected]> --------- Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
* optimzing Expand+Aggregate in sqlw with many count distinct Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * Add GpuBucketingUtils shim to Spark 4.0.0 (NVIDIA#11092) * Add GpuBucketingUtils shim to Spark 4.0.0 * Signing off Signed-off-by: Raza Jafri <[email protected]> --------- Signed-off-by: Raza Jafri <[email protected]> * Improve the diagnostics for 'conv' fallback explain (NVIDIA#11076) * Improve the diagnostics for 'conv' fallback explain Signed-off-by: Jihoon Son <[email protected]> * don't use nil Signed-off-by: Jihoon Son <[email protected]> * the bases should not be an empty string in the error message when the user input is not Signed-off-by: Jihoon Son <[email protected]> * more user-friendly message * Update sql-plugin/src/main/scala/org/apache/spark/sql/rapids/stringFunctions.scala Co-authored-by: Gera Shegalov <[email protected]> --------- Signed-off-by: Jihoon Son <[email protected]> Co-authored-by: Gera Shegalov <[email protected]> * Disable ANSI mode for window function tests [databricks] (NVIDIA#11073) * Disable ANSI mode for window function tests. Fixes NVIDIA#11019. Window function tests fail on Spark 4.0 because of NVIDIA#5114 (and NVIDIA#5120 broadly), because spark-rapids does not support SUM, COUNT, and certain other aggregations in ANSI mode. This commit disables ANSI mode tests for the failing window function tests. These may be revisited, once error/overflow checking is available for ANSI mode in spark-rapids. Signed-off-by: MithunR <[email protected]> * Switch from @ansi_mode_disabled to @disable_ansi_mode. 
--------- Signed-off-by: MithunR <[email protected]> --------- Signed-off-by: Hongbin Ma (Mahone) <[email protected]> Signed-off-by: Raza Jafri <[email protected]> Signed-off-by: Jihoon Son <[email protected]> Signed-off-by: MithunR <[email protected]> Co-authored-by: Hongbin Ma (Mahone) <[email protected]> Co-authored-by: Raza Jafri <[email protected]> Co-authored-by: Jihoon Son <[email protected]> Co-authored-by: Gera Shegalov <[email protected]> Co-authored-by: MithunR <[email protected]>
* Introduce lore id * Introduce lore id * Fix type * Fix type * Conf * style * part * Dump * Introduce lore framework * Add tests. * Rename test case Signed-off-by: liurenjie1024 <[email protected]> * Fix AQE test * Fix style * Use args to display lore info. * Fix build break --------- Signed-off-by: liurenjie1024 <[email protected]>
* add a heuristic to skip agg pass Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * commit doc change Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * refine naming Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * fix only reduction case Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * fix compile Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * fix Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * clean Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * fix doc Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * reduce premergeci2 Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * reduce premergeci2, 2 Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * use test_parallel to workaround flaky array test Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * address review comment Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * remove comma Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * workaround for ci_scala213 Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * disable agg ratio heuristic by default Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * fix doc Signed-off-by: Hongbin Ma (Mahone) <[email protected]> --------- Signed-off-by: Hongbin Ma (Mahone) <[email protected]> Co-authored-by: Hongbin Ma (Mahone) <[email protected]>
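One plausible reading of a ratio-based heuristic for skipping an aggregation pass is sketched below. The commit messages do not spell out the rule, so everything here is an assumption: the function name, the idea of comparing output rows to input rows after a first pass, and the default of 1.0 standing in for "disabled by default".

```python
def should_skip_agg_pass(rows_in: int, rows_out: int, skip_ratio: float = 1.0) -> bool:
    """Skip further agg passes when the first pass barely reduced the row count.

    With skip_ratio = 1.0 the check can never fire, because an aggregation
    does not produce more rows than it consumed (hypothetical default).
    """
    if rows_in == 0:
        return False
    return (rows_out / rows_in) > skip_ratio


if __name__ == "__main__":
    print(should_skip_agg_pass(1000, 950, skip_ratio=0.9))
    print(should_skip_agg_pass(1000, 100, skip_ratio=0.9))
```

The intuition is that when nearly every input row survives the first pass, the keys are close to unique and another pass is unlikely to pay for itself.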
* case when improvement: avoid copy_if_else Signed-off-by: Chong Gao <[email protected]> * This is the 1st commit message: case when improvement: avoid copy_if_else Signed-off-by: Chong Gao <[email protected]> This is the commit message NVIDIA#2: Add test case Add test case Fix code format Use Table.gather instead of a custom kernel Signed-off-by: Chong Gao <[email protected]> --------- Signed-off-by: Chong Gao <[email protected]> Co-authored-by: Chong Gao <[email protected]>
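The idea of evaluating a CASE WHEN with a gather instead of a chain of `copy_if_else` calls can be sketched on plain Python lists. This is not the cuDF `Table.gather` API; `case_when_gather` is a hypothetical helper, and the sketch assumes every branch result is a scalar, which the commit messages do not state explicitly.

```python
from typing import List, Optional, Sequence


def case_when_gather(conditions: Sequence[Sequence[Optional[bool]]],
                     values: Sequence[str],
                     else_value: Optional[str]) -> List[Optional[str]]:
    """Pick, per row, the value of the first true condition via one gather."""
    # Lookup table: one slot per branch, plus a final slot for ELSE.
    table = list(values) + [else_value]
    else_idx = len(values)
    num_rows = len(conditions[0]) if conditions else 0

    # Step 1: compute the index of the first matching branch for every row.
    indices = []
    for row in range(num_rows):
        idx = else_idx
        for branch, cond in enumerate(conditions):
            if cond[row]:  # None (SQL NULL) counts as not matched
                idx = branch
                break
        indices.append(idx)

    # Step 2: a single gather from the lookup table builds the result column.
    return [table[i] for i in indices]


if __name__ == "__main__":
    conds = [[True, False, False], [True, True, False]]
    print(case_when_gather(conds, ["a", "b"], "z"))
```

A chain of if/else copies would materialize one intermediate column per branch, while this form builds a single index column and gathers once.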
* Fix path in loreinfo * Remove path
* Add HiveHash support on GPU Signed-off-by: Firestarman <[email protected]> * Add integration tests Signed-off-by: Firestarman <[email protected]> * more tests Signed-off-by: Firestarman <[email protected]> --------- Signed-off-by: Firestarman <[email protected]> Co-authored-by: Firestarman <[email protected]>
* Introduce lore id * Introduce lore id * Fix type * Fix type * Conf * style * part * Dump * Introduce lore framework * Add tests. * Rename test case Signed-off-by: liurenjie1024 <[email protected]> * Fix AQE test * Fix style * Use args to display lore info. * Fix build break * Fix path in loreinfo * Remove path * Fix comments * Update configs * Fix comments * Fix config --------- Signed-off-by: liurenjie1024 <[email protected]>
No description provided.