[comet-parquet-exec] Merge upstream/main and resolve conflicts #1183
Merged: andygrove merged 50 commits into apache:comet-parquet-exec from mbutrovich:merge_upstream_main on Dec 20, 2024
Conversation
feat: support array_append

- formatted code
- rewrite array_append plan to match spark behaviour and fixed bug in QueryPlan serde
- remove unwrap
- Fix for Spark 3.3
- refactor array_append binary expression serde code
- Disabled array_append test for spark 4.0+
apache#1062) * Require offHeap memory * remove unused import * use off heap memory in stability tests * reorder imports
* Update version number for build * update docs
* update TPC-H results * update Maven links * update benchmarking guide and add TPC-DS results * include q72
## Which issue does this PR close?

Closes apache#1067

## Rationale for this change

Bug fix. A few expressions were failing some unsigned-type-related tests.

## What changes are included in this PR?

- For `u8`/`u16`, switched to use `generate_cast_to_signed!` in order to copy the full i16/i32 width instead of padding zeros in the higher bits
- `u64` becomes `Decimal(20, 0)`, but there was a bug in `round()` (`>` vs `>=`)

## How are these changes tested?

Put back tests for unsigned types.
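For context, a minimal illustrative sketch in Rust of the widening-to-signed idea described above (this is not the Comet `generate_cast_to_signed!` macro itself): converting `u8`/`u16` into the next wider signed type preserves every value, whereas a same-width reinterpretation would not.

```rust
// Illustrative only: widening unsigned types into a larger signed type
// keeps every value representable, so no sign flips or truncation occur.
fn widen_u8(v: u8) -> i16 {
    v as i16 // 255u8 -> 255i16
}

fn widen_u16(v: u16) -> i32 {
    v as i32 // 65535u16 -> 65535i32
}

fn main() {
    assert_eq!(widen_u8(u8::MAX), 255i16);
    assert_eq!(widen_u16(u16::MAX), 65_535i32);
    // A same-width reinterpretation would lose the value:
    assert_eq!(u8::MAX as i8, -1i8);
}
```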
* include first batch in ScanExec metrics * record row count metric * fix regression
- Add native metrics for plan creation
- make messages consistent
- Include get_next_batch cost in metrics
- formatting
- fix double count of rows
- Part of the implementation of array_insert
- Missing methods
- Working version
- Reformat code
- Fix code-style
- Add comments about spark's implementation
- Implement negative indices + fix tests for spark < 3.4
- Fix code-style
- Fix scalastyle
- Fix tests for spark < 3.4
- Fixes & tests: added test for the negative index; added test for the legacy spark mode
- Use assume(isSpark34Plus) in tests
- Test else-branch & improve coverage
- Update native/spark-expr/src/list.rs (Co-authored-by: Andy Grove <[email protected]>)
- Fix fallback test (in one case there is a zero in index and the test fails due to a spark error)
- Adjust the behaviour for the NULL case to Spark
- Move the logic of type checking to the method
- Fix code-style

Co-authored-by: Andy Grove <[email protected]>
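As a rough sketch of the negative-index handling mentioned above (assumptions: Spark-style 1-based positions where a negative position counts from the end and position 0 is an error; this simplified version clamps out-of-range positions rather than null-padding, and does not model the pre-3.4 legacy behaviour):

```rust
// Simplified illustration of array_insert position handling; not the Comet kernel.
// Positions are 1-based; negative positions count from the end; 0 is invalid.
fn array_insert_sketch<T: Clone>(arr: &[T], pos: i64, item: T) -> Result<Vec<T>, String> {
    if pos == 0 {
        return Err("array_insert position must not be zero".to_string());
    }
    let len = arr.len() as i64;
    // Resolve to a 0-based insertion point, clamping to the array bounds
    // (Spark instead pads with nulls when the position is out of range).
    let idx = if pos > 0 {
        (pos - 1).min(len)
    } else {
        (len + pos + 1).max(0)
    };
    let mut out = arr.to_vec();
    out.insert(idx as usize, item);
    Ok(out)
}

fn main() {
    assert_eq!(array_insert_sketch(&['a', 'b', 'c'], 2, 'z').unwrap(), vec!['a', 'z', 'b', 'c']);
    assert_eq!(array_insert_sketch(&['a', 'b', 'c'], -1, 'z').unwrap(), vec!['a', 'b', 'c', 'z']);
    assert!(array_insert_sketch(&['a'], 0, 'z').is_err());
}
```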
…apache#1086)

- enable decimal to decimal cast of different precision and scale
- add more test cases for negative scale and higher precision
- add check for compatibility for decimal to decimal
- fix code style
- Update spark/src/main/scala/org/apache/comet/expressions/CometCast.scala (Co-authored-by: Andy Grove <[email protected]>)
- fix the nit in comment

Co-authored-by: himadripal <[email protected]>
Co-authored-by: Andy Grove <[email protected]>
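For illustration, one plausible shape of such a compatibility check (a sketch only, not necessarily the exact rule Comet applies): a cast from `Decimal(p1, s1)` to `Decimal(p2, s2)` is guaranteed lossless only if the target keeps at least as many integer digits and at least as many fractional digits as the source.

```rust
/// Illustrative sketch of a decimal-to-decimal cast compatibility check.
/// `precision` is the total number of digits and `scale` the digits after
/// the decimal point (scale may be negative, as exercised in the tests above).
fn decimal_cast_is_lossless(p1: u8, s1: i8, p2: u8, s2: i8) -> bool {
    let src_int_digits = p1 as i16 - s1 as i16;
    let dst_int_digits = p2 as i16 - s2 as i16;
    // The target must hold the integer part and the fractional part in full;
    // otherwise the cast may need overflow or rounding handling.
    dst_int_digits >= src_int_digits && s2 >= s1
}

fn main() {
    assert!(decimal_cast_is_lossless(10, 2, 20, 4)); // widening precision and scale
    assert!(!decimal_cast_is_lossless(20, 4, 10, 2)); // narrowing may overflow or round
}
```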
* fix: Use RDD partition index * fix * fix * fix
…pache#1129) * Use exact class comparison for parquet scan * Add test * Add comment
* fix metrics issues * clippy * update tests
…iew (apache#1119)

- Add more technical detail and new diagram to Comet plugin overview
- update diagram
- add info on Arrow IPC
- update diagram
- update diagram
- update docs
- address feedback
- save
- remove shuffle jvm metric and update tuning guide
- docs
- add source for all ScanExecs
- address feedback
- address feedback
* Remove unused StringView struct * remove more dead code
* add some notes on shuffle * reads * improve docs
## Which issue does this PR close?

Part of apache#372 and apache#551

## Rationale for this change

To be ready for Spark 4.0

## What changes are included in this PR?

This PR enables more Spark 4.0 tests that were fixed by recent changes.

## How are these changes tested?

Tests enabled.
- Refactor cast to use SparkCastOptions param
- update tests
- update benches
- update benches
- update benches
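The fields of `SparkCastOptions` are not shown in this PR, so the struct below is purely hypothetical; it only illustrates the refactoring pattern of bundling cast settings into a single options value instead of threading separate parameters through every cast call.

```rust
// Hypothetical sketch; the real SparkCastOptions in the spark-expr crate may
// look different. The benefit of the pattern is that adding a new cast
// setting touches one struct instead of every cast call site.
#[derive(Debug, Clone)]
struct CastOptionsSketch {
    timezone: String,     // session timezone for date/time casts (assumed field)
    ansi_mode: bool,      // fail on overflow instead of returning null (assumed field)
    allow_incompat: bool, // permit casts flagged as Spark-incompatible (assumed field)
}

// A cast helper takes the options struct rather than separate parameters.
fn cast_i64_to_i32(value: i64, options: &CastOptionsSketch) -> Result<Option<i32>, String> {
    match i32::try_from(value) {
        Ok(v) => Ok(Some(v)),
        Err(_) if options.ansi_mode => Err(format!("overflow casting {value} to INT")),
        Err(_) => Ok(None), // non-ANSI mode returns null on overflow
    }
}

fn main() {
    let opts = CastOptionsSketch {
        timezone: "UTC".to_string(),
        ansi_mode: false,
        allow_incompat: false,
    };
    assert_eq!(cast_i64_to_i32(1, &opts), Ok(Some(1)));
    assert_eq!(cast_i64_to_i32(i64::MAX, &opts), Ok(None));
}
```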
…che#1152)

- move aggregate expressions to spark-expr crate
- move more expressions
- move benchmark
- normalize_nan
- bitwise not
- comet scalar funcs
- update bench imports
## Which issue does this PR close?

Part of apache#372 and apache#551

## Rationale for this change

To be ready for Spark 4.0

## What changes are included in this PR?

This PR fixes the new test SPARK-47120 added in Spark 4.0.

## How are these changes tested?

Tests enabled.
…e#1164) * Move string kernels and expressions to spark-expr crate * remove unused hash kernel * remove unused dependencies
…factoring (apache#1165)

- move CheckOverflow to spark-expr crate
- move NegativeExpr to spark-expr crate
- move UnboundColumn to spark-expr crate
- move ExpandExec from execution::datafusion::operators to execution::operators
- refactoring to remove datafusion subpackage
- update imports in benches
- fix
- fix
…he#1167) * Add ignored tests for reading structs from Parquet * add basic map test * add tests for Map and Array
…ache#1169)

- Add Spark-compatible SchemaAdapterFactory implementation
- remove prototype code
- fix
- refactor
- implement more cast logic
- implement more cast logic
- add basic test
- improve test
- cleanup
- fmt
- add support for casting unsigned int to signed int
- clippy
- address feedback
- fix test
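A minimal sketch of the unsigned-to-signed widening such a schema adapter performs, using the arrow-rs `cast` kernel (illustrative only; the actual adapter applies Spark-compatible cast logic across many more type combinations):

```rust
use std::sync::Arc;

use arrow::array::{ArrayRef, UInt32Array};
use arrow::compute::cast;
use arrow::datatypes::DataType;
use arrow::error::ArrowError;

fn main() -> Result<(), ArrowError> {
    // Parquet can contain unsigned integer columns, which Spark has no type for,
    // so a Spark-compatible adapter widens them to the next larger signed type.
    let file_column: ArrayRef = Arc::new(UInt32Array::from(vec![0u32, 1, u32::MAX]));
    let spark_column = cast(&file_column, &DataType::Int64)?;
    assert_eq!(spark_column.data_type(), &DataType::Int64);
    Ok(())
}
```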
## Which issue does this PR close?

## Rationale for this change

After apache#1062, we have not been running Spark tests for native execution.

## What changes are included in this PR?

Removed the off-heap requirement for testing.

## How are these changes tested?

Bringing back Spark tests for native execution.
- improve shuffle metrics
- docs
- more metrics
- refactor
- address feedback
Conflicts:

- native/Cargo.lock
- native/Cargo.toml
- native/core/src/execution/jni_api.rs
- native/core/src/execution/planner.rs
- native/core/src/execution/schema_adapter.rs
- native/spark-expr/src/cast.rs
- native/spark-expr/src/lib.rs
- native/spark-expr/src/test_common/mod.rs
- native/spark-expr/src/utils.rs
- spark/src/main/scala/org/apache/comet/CometExecIterator.scala
- spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala
- spark/src/main/scala/org/apache/comet/Native.scala
- spark/src/main/scala/org/apache/spark/sql/comet/operators.scala
- spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala
- spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala
See #1182 for the diff of this branch versus upstream/main, which should give an idea of what the comet-parquet-exec feature branch's diff against upstream/main will look like after this merges.
This reflects a merge of upstream/main (as of this morning) and then a resolution of the conflicts. This catches the comet-parquet-exec feature branch up on about a month of changes, including a release.
- COMET_FULL_NATIVE_SCAN_ENABLED: Tests: succeeded 728, failed 64, canceled 2, ignored 52, pending 0
- COMET_NATIVE_RECORDBATCH_READER_ENABLED: Tests: succeeded 722, failed 70, canceled 2, ignored 52, pending 0
These new test failures mostly involve casting timestamps to timestamps (differing only in timezone), for example:
Cause: org.apache.comet.CometNativeException: Cannot cast file schema field _19 of type Timestamp(Microsecond, Some("UTC")) to required schema field of type Timestamp(Microsecond, Some("America/Los_Angeles"))
@andygrove understands the issue, so we'll fix it after this PR merges.
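For background, a small sketch of why this particular cast is mechanically straightforward in Arrow (illustrative only, not the planned fix): timestamp arrays store epoch-based integer values, and the timezone is metadata, so the arrow-rs `cast` kernel can retarget the timezone without rewriting the underlying values.

```rust
use std::sync::Arc;

use arrow::array::{ArrayRef, TimestampMicrosecondArray};
use arrow::compute::cast;
use arrow::datatypes::{DataType, TimeUnit};
use arrow::error::ArrowError;

fn main() -> Result<(), ArrowError> {
    // Timestamps are stored as epoch microseconds; the timezone annotation only
    // describes how to interpret them, so this cast changes metadata, not values.
    let utc: ArrayRef = Arc::new(
        TimestampMicrosecondArray::from(vec![1_700_000_000_000_000i64]).with_timezone("UTC"),
    );
    let target = DataType::Timestamp(TimeUnit::Microsecond, Some("America/Los_Angeles".into()));
    let local = cast(&utc, &target)?;
    assert_eq!(local.data_type(), &target);
    Ok(())
}
```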