Latest lore framework. #39

liurenjie1024 · 2024-07-02T02:46:13Z

No description provided.

Keep dependencies (JNI + private) as 24.06-SNAPSHOT until they're available. Filed an issue (NVIDIA#10867) to remind us to bump up dependencies to 24.08.0-SNAPSHOT. Signed-off-by: Tim Liu <[email protected]>

[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]

Signed-off-by: Zach Puller <[email protected]>

[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]

* Fixed Databricks build * Signing off Signed-off-by: Raza Jafri <[email protected]> * Removed unused import --------- Signed-off-by: Raza Jafri <[email protected]>

…IA#10871) Add classloader diagnostics to initShuffleManager error message --------- Signed-off-by: Zach Puller <[email protected]> Co-authored-by: Jason Lowe <[email protected]> Co-authored-by: Gera Shegalov <[email protected]> Co-authored-by: Alessandro Bellina <[email protected]>

…ricks] (NVIDIA#10945) * Revert "Revert "Add Support for Multiple Filtering Keys for Subquery Broadcas…" This reverts commit bb05b17. * Signing off Signed-off-by: Raza Jafri <[email protected]> --------- Signed-off-by: Raza Jafri <[email protected]>

Closes NVIDIA#10875 Contributes to NVIDIA#10773 Unjar, cache, and share the test jar content among all test suites from the same jar

Test:
```bash
mvn package -Dbuildver=330 -pl tests -am -Dsuffixes='.*\.RapidsJsonSuite'
```

Signed-off-by: Gera Shegalov <[email protected]>

…A#10944) * Added shim for BatchScanExec to support Spark 4.0 Signed-off-by: Raza Jafri <[email protected]> * fixed the failing shim --------- Signed-off-by: Raza Jafri <[email protected]>

…hange. (NVIDIA#10863) * Account for `CommandUtils.uncacheTableOrView` signature change. Fixes NVIDIA#10710. This commit accounts for the changes in the signature of `CommandUtils.uncacheTableOrView` in Apache Spark 4.0. (See [SPARK-47191](apache/spark#45289).) Signed-off-by: MithunR <[email protected]> * Removed unnecessary base class. --------- Signed-off-by: MithunR <[email protected]>

This is a new feature that adds Parquet support to GpuInsertIntoHiveTable, which currently supports only text writes. The feature is covered by the tests newly added in this PR. --------- Signed-off-by: Firestarman <[email protected]> Co-authored-by: Jason Lowe <[email protected]>

…ange. (NVIDIA#10857) * Account for PartitionedFileUtil.splitFiles signature change. Fixes NVIDIA#10299. In Apache Spark 4.0, the signature of `PartitionedFileUtil.splitFiles` was changed to remove unused parameters (apache/spark@eabea643c74). This causes the Spark RAPIDS plugin build to break with Spark 4.0. This commit introduces a shim to account for the signature change. Signed-off-by: MithunR <[email protected]> * Common base for PartitionFileUtilsShims. Signed-off-by: MithunR <[email protected]> * Reusing existing PartitionedFileUtilsShims. * More refactor, for pre-3.5 compile. * Updated Copyright date. * Fixed style error. * Re-fixed the copyright year. * Added missing import. --------- Signed-off-by: MithunR <[email protected]>

To fix: NVIDIA#10867 Change rapids private and jni dependency version to 24.08.0-SNAPSHOT Signed-off-by: Tim Liu <[email protected]>

NVIDIA#10947) Prevent '^[0-9]{n}' from being processed as `spark_rapids_jni::literal_range_pattern` that currently only supports "contains", not "starts with" Fixes NVIDIA#10928 Also adding missing tailrec annotations to recursive parser methods. Signed-off-by: Gera Shegalov <[email protected]>
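To illustrate why an anchored pattern must not take a "contains" code path, here is a minimal Python sketch contrasting the two semantics for a pattern of the form `^[0-9]{n}`. It is not the plugin's code: the helper names are hypothetical and Python's `re` module stands in for the GPU implementation.

```python
import re


def contains_digit_run(s: str, n: int) -> bool:
    """'contains' semantics: a run of n digits anywhere in s."""
    return re.search(r"[0-9]{%d}" % n, s) is not None


def starts_with_digit_run(s: str, n: int) -> bool:
    """'starts with' semantics: s must begin with n digits, as '^[0-9]{n}' requires."""
    return re.search(r"^[0-9]{%d}" % n, s) is not None


# The two checks disagree on any string whose digit run is not at the start,
# which is the kind of mismatch the commit above avoids.
for value in ["12ab", "ab12"]:
    print(value, contains_digit_run(value, 2), starts_with_digit_run(value, 2))
```

For `"ab12"` the "contains" check succeeds while the anchored check does not, so routing the anchored pattern through a "contains"-only optimization would return wrong results.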

[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]

* Add support for the renaming of PythonMapInArrow to MapInArrow * Signing off Signed-off-by: Raza Jafri <[email protected]> * Removed the unnecessary base class from 400 * addressed review comments --------- Signed-off-by: Raza Jafri <[email protected]>

Signed-off-by: Firestarman <[email protected]>

Signed-off-by: Peixin Li <[email protected]>

…itten [skip ci] (NVIDIA#10966) * DO NOT REVIEW Signed-off-by: Peixin Li <[email protected]> * Add default value for REF to avoid it being overwritten by an unexpected manual trigger Signed-off-by: Peixin Li <[email protected]> --------- Signed-off-by: Peixin Li <[email protected]>

* AnalysisException child class Signed-off-by: Raza Jafri <[email protected]> * Use errorClass for reporting AnalysisException * POM changes Signed-off-by: Raza Jafri <[email protected]> * Reuse the RapidsErrorUtils to throw the AnalysisException * Revert "POM changes" This reverts commit 0f765c9. * Updated copyrights * Added the TrampolineUtil method back to handle cases which don't use errorClass * Add doc to the RapidsAnalysisException * addressed review comments * Fixed imports * Moved the RapidsAnalysisException out of TrampolineUtil * fixed imports * addressed review comments * fixed unused import * Removed the TrampolineUtil method for throwing RapidsAnalysisException --------- Signed-off-by: Raza Jafri <[email protected]>

…icks] (NVIDIA#10970) * Incomplete impl of RaiseError for 400 * Removed RaiseError from 400 * Signing off Signed-off-by: Raza Jafri <[email protected]> --------- Signed-off-by: Raza Jafri <[email protected]>

…VIDIA#10977) * rewrite multiple literal choice to multiple contains, wip Signed-off-by: Haoyang Li <[email protected]> * fix bug Signed-off-by: Haoyang Li <[email protected]> * optimize memory Signed-off-by: Haoyang Li <[email protected]> * remove debug log Signed-off-by: Haoyang Li <[email protected]> * address comments Signed-off-by: Haoyang Li <[email protected]> * Apply suggestions from code review Co-authored-by: Gera Shegalov <[email protected]> * support abc|def case Signed-off-by: Haoyang Li <[email protected]> * fix 2.13 Signed-off-by: Haoyang Li <[email protected]> * fix 2.13 build Signed-off-by: Haoyang Li <[email protected]> --------- Signed-off-by: Haoyang Li <[email protected]> Co-authored-by: Gera Shegalov <[email protected]>
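The idea of rewriting a choice of literals (such as `abc|def`) into several "contains" checks can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the plugin's implementation: the function names and the conservative metacharacter set are hypothetical.

```python
import re
from typing import List, Optional

# Characters with special meaning in a regex. A branch containing any of them
# is not treated as a plain literal in this sketch (a conservative assumption).
_META = set(".^$*+?{}[]()\\|")


def literal_choices(pattern: str) -> Optional[List[str]]:
    """Return the literals of a pattern like 'abc|def', or None if not rewritable."""
    branches = pattern.split("|")
    if any(b == "" or any(ch in _META for ch in b) for b in branches):
        return None
    return branches


def matches(pattern: str, s: str) -> bool:
    """Use plain substring checks when possible, else fall back to the regex engine."""
    literals = literal_choices(pattern)
    if literals is not None:
        return any(lit in s for lit in literals)
    return re.search(pattern, s) is not None


print(literal_choices("abc|def"))
print(matches("abc|def", "xxdefxx"))
```

Anything that is not a pure alternation of literals falls back to the regular regex path, so the rewrite only changes how a match is computed, not its result.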

Signed-off-by: Robert (Bobby) Evans <[email protected]>

* concat_null_bug_fix Signed-off-by: fejiang <[email protected]> * concat_null_bug_fix Signed-off-by: fejiang <[email protected]> * Setting modified Signed-off-by: fejiang <[email protected]> * remove comment Signed-off-by: fejiang <[email protected]> * concat considered as empty string Signed-off-by: fejiang <[email protected]> --------- Signed-off-by: fejiang <[email protected]>

We missed spark343 shim for the scala2.13 dist jar on branch-24.06. Add scala2.13 spark343 shim for v24.06.0 Signed-off-by: Tim Liu <[email protected]>

[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]

Update change log with CLI: scripts/generate-changelog --token=<GIT_TOKEN> --releases=24.04,24.06 Signed-off-by: jenkins <jenkins@localhost> Co-authored-by: jenkins <jenkins@localhost>

[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]

This reverts commit d9686d4.

Revert "Add in the ability to fingerprint JSON columns (NVIDIA#11002)" [skip ci]

…1060) Also fixed issue with databricks dependency not being what we said it was. Signed-off-by: Robert (Bobby) Evans <[email protected]>

* Binary dedupe changes for Spark 4.0.0 Signed-off-by: Raza Jafri <[email protected]> * updated comments * Changed the URL for the common classes among shims * renamed spark34-common to spark-shared and renamed relevant variables * addressed review comments * renamed variable from common to shared --------- Signed-off-by: Raza Jafri <[email protected]>

* fix flaky array_item test failures Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * fix indent Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * fix whitespace Signed-off-by: Hongbin Ma (Mahone) <[email protected]> --------- Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* Calculate parallelism to speed up pre-merge CI Calculate parallelism based on GPU memory so that pre-merge CI runs with an appropriate amount of parallelism. However, once TEST_PARALLEL exceeds 8, the integration tests run progressively slower as it increases, so we limit TEST_PARALLEL to at most 8. With this change, pre-merge CI on the powerful nodes finished about 1 hour faster than on the common nodes: 16 CPU/128G Mem/24G GPU : [2 hours] vs. 8 CPU/64G Mem/16G GPU : [3 hours]. Note: currently we only have 3 fixed powerful nodes for the pre-merge CI job, so only 1 pre-merge CI run can be sped up at a time Signed-off-by: Tim Liu <[email protected]> * Add a variable to set maximum test parallelism for the integration tests Signed-off-by: Tim Liu <[email protected]> * Fix typo Signed-off-by: Tim Liu <[email protected]> --------- Signed-off-by: Tim Liu <[email protected]>
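The capping rule described in this commit can be sketched as below. This is a rough Python illustration only: the per-worker memory budget (`per_worker_gib`) and the function name are assumptions, and the actual CI script may derive the value differently. Only the upper bound of 8 comes from the commit message.

```python
def compute_test_parallel(gpu_mem_gib: int,
                          per_worker_gib: int = 2,
                          max_parallel: int = 8) -> int:
    """Derive a test parallelism from GPU memory, capped at max_parallel.

    per_worker_gib is a hypothetical memory budget per test worker;
    max_parallel=8 mirrors the TEST_PARALLEL <= 8 limit from the commit message.
    """
    by_memory = gpu_mem_gib // per_worker_gib
    # Always run at least one worker, and never exceed the cap.
    return max(1, min(by_memory, max_parallel))


# A 24G GPU hits the cap; with a larger per-worker budget, memory is the limit.
print(compute_test_parallel(24))
print(compute_test_parallel(16, per_worker_gib=4))
```

Keeping the cap as a parameter matches the follow-up commit, which adds a variable for the maximum test parallelism.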

Signed-off-by: Peixin Li <[email protected]>

…IDIA#10996) * Fallback non-UTC TimeZoneAwareExpression with zoneId instead of timeZone config Signed-off-by: Haoyang Li <[email protected]> * clean up Signed-off-by: Haoyang Li <[email protected]> --------- Signed-off-by: Haoyang Li <[email protected]>

* feat: Introduce low shuffle merge. Signed-off-by: liurenjie1024 <[email protected]> * fix * Test databricks parallel * Test more databricks parallel * Fix comments * Config && scala 2.13 * Revert * Fix comments * scala 2.13 * Revert unnecessary changes * Revert "Revert unnecessary changes" This reverts commit 9fa4cf2. * restore change --------- Signed-off-by: liurenjie1024 <[email protected]>

This PR adds GPU support for bucketed writes. - Refactor the code of the dynamic partition single writer and concurrent writer to reuse as much code as possible, and then add the bucketing write logic to both of them. - Update the bucket check during plan overriding for the write commands, including InsertIntoHadoopFsRelationCommand, CreateDataSourceTableAsSelectCommand, InsertIntoHiveTable, CreateHiveTableAsSelectCommand. - From 330, Spark also supports HiveHash to generate the bucket IDs, in addition to Murmur3Hash. So the shim object GpuBucketingUtils is introduced to handle the version-specific differences. - This change also adds two functions (tagForHiveBucketingWrite and tagForBucketing) to do the overriding check for the two hashing functions separately. The Hive write nodes will fall back to CPU when HiveHash is chosen, because HiveHash is not supported on GPU. --------- Signed-off-by: Firestarman <[email protected]>
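As background for why the two hashing functions are checked separately, the sketch below shows how a bucket ID is commonly derived from a 32-bit signed hash value in each scheme. It is a simplified Python illustration based on my understanding of Spark's behavior (a positive modulo of the Murmur3 hash, versus masking the Hive hash with `Int.MaxValue` before the modulo); treat those formulas as assumptions. The function names are hypothetical and the hash functions themselves are not implemented here.

```python
INT_MAX = 0x7FFFFFFF  # Int.MaxValue for a 32-bit signed integer


def spark_bucket_id(murmur3_hash: int, num_buckets: int) -> int:
    """Assumed Spark-style bucket ID: pmod(hash, numBuckets).

    Python's % already yields a non-negative result for a positive modulus,
    so it behaves like pmod here.
    """
    return murmur3_hash % num_buckets


def hive_bucket_id(hive_hash: int, num_buckets: int) -> int:
    """Assumed Hive-compatible bucket ID: (hash & Int.MaxValue) % numBuckets."""
    return (hive_hash & INT_MAX) % num_buckets


# The same negative hash value can land in different buckets under each scheme.
print(spark_bucket_id(-7, 3), hive_bucket_id(-7, 3))
```

Because the two schemes can place the same row in different buckets, a writer has to use the one the table format expects, which is consistent with the separate tagging functions described above.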

Signed-off-by: YanxuanLiu <[email protected]>

…ks] (NVIDIA#11044) * Fixed arithmetic_ops_tests * Signing off Signed-off-by: Raza Jafri <[email protected]> * Added a mechanism to add ansi mode per test * Reverted unnecessary change to spark_init_internal.py * Corrected the year in the licence * Only set ansi conf to false when ansi_mode_disabled is set * Addressed review comments * Fixed the method name * Update integration_tests/src/main/python/conftest.py This handles cases like `cache_test.py` which should run with the default conf for `spark.sql.ansi.enabled`. --------- Signed-off-by: Raza Jafri <[email protected]> Co-authored-by: MithunR <[email protected]>

…IDIA#11062) * with call site print, not good because some test cases by design will dup Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * done Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * add file Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * fix compile Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * address review comments Signed-off-by: Hongbin Ma (Mahone) <[email protected]> --------- Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* optimizing Expand+Aggregate in sqlw with many count distinct Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * Add GpuBucketingUtils shim to Spark 4.0.0 (NVIDIA#11092) * Add GpuBucketingUtils shim to Spark 4.0.0 * Signing off Signed-off-by: Raza Jafri <[email protected]> --------- Signed-off-by: Raza Jafri <[email protected]> * Improve the diagnostics for 'conv' fallback explain (NVIDIA#11076) * Improve the diagnostics for 'conv' fallback explain Signed-off-by: Jihoon Son <[email protected]> * don't use nil Signed-off-by: Jihoon Son <[email protected]> * the bases should not be an empty string in the error message when the user input is not Signed-off-by: Jihoon Son <[email protected]> * more user-friendly message * Update sql-plugin/src/main/scala/org/apache/spark/sql/rapids/stringFunctions.scala Co-authored-by: Gera Shegalov <[email protected]> --------- Signed-off-by: Jihoon Son <[email protected]> Co-authored-by: Gera Shegalov <[email protected]> * Disable ANSI mode for window function tests [databricks] (NVIDIA#11073) * Disable ANSI mode for window function tests. Fixes NVIDIA#11019. Window function tests fail on Spark 4.0 because of NVIDIA#5114 (and NVIDIA#5120 broadly), because spark-rapids does not support SUM, COUNT, and certain other aggregations in ANSI mode. This commit disables ANSI mode tests for the failing window function tests. These may be revisited, once error/overflow checking is available for ANSI mode in spark-rapids. Signed-off-by: MithunR <[email protected]> * Switch from @ansi_mode_disabled to @disable_ansi_mode. --------- Signed-off-by: MithunR <[email protected]> --------- Signed-off-by: Hongbin Ma (Mahone) <[email protected]> Signed-off-by: Raza Jafri <[email protected]> Signed-off-by: Jihoon Son <[email protected]> Signed-off-by: MithunR <[email protected]> Co-authored-by: Hongbin Ma (Mahone) <[email protected]> Co-authored-by: Raza Jafri <[email protected]> Co-authored-by: Jihoon Son <[email protected]> Co-authored-by: Gera Shegalov <[email protected]> Co-authored-by: MithunR <[email protected]>

* Introduce lore id * Introduce lore id * Fix type * Fix type * Conf * style * part * Dump * Introduce lore framework * Add tests. * Rename test case Signed-off-by: liurenjie1024 <[email protected]> * Fix AQE test * Fix style * Use args to display lore info. * Fix build break --------- Signed-off-by: liurenjie1024 <[email protected]>

* add a heuristic to skip agg pass Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * commit doc change Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * refine naming Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * fix only reduction case Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * fix compile Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * fix Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * clean Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * fix doc Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * reduce premergeci2 Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * reduce premergeci2, 2 Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * use test_parallel to workaround flaky array test Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * address review comment Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * remove comma Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * workaround for ci_scala213 Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * disable agg ratio heuristic by default Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * fix doc Signed-off-by: Hongbin Ma (Mahone) <[email protected]> --------- Signed-off-by: Hongbin Ma (Mahone) <[email protected]> Co-authored-by: Hongbin Ma (Mahone) <[email protected]>

* case when improvement: avoid copy_if_else Signed-off-by: Chong Gao <[email protected]> * This is the 1st commit message: case when improvement: avoid copy_if_else Signed-off-by: Chong Gao <[email protected]> This is the commit message #2: Add test case Add test case Fix code format Use Table.gather instead of a custom kernel Signed-off-by: Chong Gao <[email protected]> --------- Signed-off-by: Chong Gao <[email protected]> Co-authored-by: Chong Gao <[email protected]>

* Fix path in loreinfo * Remove path

* Add HiveHash support on GPU Signed-off-by: Firestarman <[email protected]> * Add integration tests Signed-off-by: Firestarman <[email protected]> * more tests Signed-off-by: Firestarman <[email protected]> --------- Signed-off-by: Firestarman <[email protected]> Co-authored-by: Firestarman <[email protected]>

* Introduce lore id * Introduce lore id * Fix type * Fix type * Conf * style * part * Dump * Introduce lore framework * Add tests. * Rename test case Signed-off-by: liurenjie1024 <[email protected]> * Fix AQE test * Fix style * Use args to display lore info. * Fix build break * Fix path in loreinfo * Remove path * Fix comments * Update configs * Fix comments * Fix config --------- Signed-off-by: liurenjie1024 <[email protected]>

NvTimLiu and others added 30 commits May 22, 2024 23:06

Init version 24.08.0-SNAPSHOT

f9076a0

Keep dependencies (JNI + private) as 24.06-SNAPSHOT until they're available. Filed an issue (NVIDIA#10867) to remind us to bump up dependencies to 24.08.0-SNAPSHOT. Signed-off-by: Tim Liu <[email protected]>

Merge pull request NVIDIA#10879 from NVIDIA/branch-24.06

02a70d4

[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]

Merge pull request NVIDIA#10883 from NVIDIA/branch-24.06

0df3d05

[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]

Merge pull request NVIDIA#10885 from NVIDIA/branch-24.06

800ca6b

[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]

Merge pull request NVIDIA#10888 from NVIDIA/branch-24.06

ec9221f

[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]

Merge pull request NVIDIA#10926 from NVIDIA/branch-24.06

8a13793

[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]

Merge pull request NVIDIA#10927 from NVIDIA/branch-24.06

4e4be54

[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]

append zpuller to authorized user of blossom-ci (NVIDIA#10929)

02f4595

Signed-off-by: Zach Puller <[email protected]>

Merge pull request NVIDIA#10932 from NVIDIA/branch-24.06

2e8d43f

[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]

Merge pull request NVIDIA#10935 from NVIDIA/branch-24.06

2dce03d

[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]

Merge pull request NVIDIA#10936 from NVIDIA/branch-24.06

69cca07

[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]

Merge pull request NVIDIA#10937 from NVIDIA/branch-24.06

6086cac

[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]

Merge pull request NVIDIA#10939 from NVIDIA/branch-24.06

35b1575

[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]

Fixed Databricks build [databricks] (NVIDIA#10933)

f0b13ed

* Fixed Databricks build * Signing off Signed-off-by: Raza Jafri <[email protected]> * Removed unused import --------- Signed-off-by: Raza Jafri <[email protected]>

Added Shim for BatchScanExec to Support Spark 4.0 [databricks] (NVIDI…

a7cdaa9

…A#10944) * Added shim for BatchScanExec to support Spark 4.0 Signed-off-by: Raza Jafri <[email protected]> * fixed the failing shim --------- Signed-off-by: Raza Jafri <[email protected]>

Change dependency version to 24.08.0-SNAPSHOT (NVIDIA#10949)

2a86bb5

To fix: NVIDIA#10867 Change rapids private and jni dependency version to 24.08.0-SNAPSHOT Signed-off-by: Tim Liu <[email protected]>

Merge pull request NVIDIA#10954 from NVIDIA/branch-24.06

bbdcac0

[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]

fix build errors for 4.0 shim (NVIDIA#10952)

1be42d4

Signed-off-by: Firestarman <[email protected]>

Add new blossom-ci allowed user (NVIDIA#10959)

5750ace

Signed-off-by: Peixin Li <[email protected]>

Move Support for RaiseError to a Shim Excluding Spark 4.0.0 [databr…

3111e2b

…icks] (NVIDIA#10970) * Incomplete impl of RaiseError for 400 * Removed RaiseError from 400 * Signing off Signed-off-by: Raza Jafri <[email protected]> --------- Signed-off-by: Raza Jafri <[email protected]>

thirtiseven and others added 29 commits June 12, 2024 08:03

Add in the ability to fingerprint JSON columns (NVIDIA#11002)

d9686d4

Signed-off-by: Robert (Bobby) Evans <[email protected]>

Add spark343 shim for scala2.13 dist jar (NVIDIA#11052)

2bc5ab6

We missed spark343 shim for the scala2.13 dist jar on branch-24.06. Add scala2.13 spark343 shim for v24.06.0 Signed-off-by: Tim Liu <[email protected]>

Merge pull request NVIDIA#11055 from NVIDIA/branch-24.06

f355af5

[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]

Update latest changelog [skip ci] (NVIDIA#11056)

4da4d4a

Update change log with CLI: scripts/generate-changelog --token=<GIT_TOKEN> --releases=24.04,24.06 Signed-off-by: jenkins <jenkins@localhost> Co-authored-by: jenkins <jenkins@localhost>

Merge pull request NVIDIA#11057 from NVIDIA/branch-24.06

05187aa

[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]

Revert "Add in the ability to fingerprint JSON columns (NVIDIA#11002)"

cfd8f00

This reverts commit d9686d4.

Merge pull request NVIDIA#11059 from revans2/revert_json_datagen

900ae6f

Revert "Add in the ability to fingerprint JSON columns (NVIDIA#11002)" [skip ci]

Add in the ability to fingerprint JSON columns [databricks] (NVIDIA#1…

531a9f5

…1060) Also fixed issue with databricks dependency not being what we said it was. Signed-off-by: Robert (Bobby) Evans <[email protected]>

[FEA] Increase parallelism of deltalake test on databricks (NVIDIA#11051)

356d5a1

WAR numpy2 failed fastparquet compatibility issue (NVIDIA#11072)

6eb854d

Signed-off-by: Peixin Li <[email protected]>

upgrade actions version (NVIDIA#11086)

18ec4b2

Signed-off-by: YanxuanLiu <[email protected]>

Fix lore path serde. (NVIDIA#34)

594338a

* Fix path in loreinfo * Remove path

Fix comments (NVIDIA#35)

4615429

liurenjie1024 closed this Jul 2, 2024
