
Merge branch-24.12 into main #11848

Merged
merged 129 commits into main from branch-24.12
Dec 16, 2024

129 commits
614d8f5
Init version 24.12.0-SNAPSHOT
nvauto Sep 24, 2024
86d0f60
Merge pull request #11494 from NVIDIA/branch-24.10
nvauto Sep 24, 2024
9ed9b94
Merge pull request #11495 from NVIDIA/branch-24.10
nvauto Sep 24, 2024
bae19e3
Merge pull request #11504 from NVIDIA/branch-24.10
nvauto Sep 26, 2024
01c3003
Support legacy mode for yyyymmdd format [databricks] (#11493)
res-life Sep 26, 2024
7661abb
Merge pull request #11508 from NVIDIA/branch-24.10
nvauto Sep 26, 2024
4b61b45
Merge pull request #11513 from NVIDIA/branch-24.10
nvauto Sep 26, 2024
0e446ad
Merge pull request #11517 from NVIDIA/branch-24.10
nvauto Sep 27, 2024
9089b7f
Merge pull request #11518 from NVIDIA/branch-24.10
nvauto Sep 27, 2024
e9b89ff
Merge pull request #11523 from NVIDIA/branch-24.10
nvauto Sep 27, 2024
6692d17
Merge pull request #11540 from NVIDIA/branch-24.10
nvauto Sep 28, 2024
2036f16
Update rapids JNI and private dependency to 24.12.0-SNAPSHOT [skip ci…
nvauto Sep 30, 2024
1432649
Merge branch 'branch-24.10' into fix_merge_conflict
revans2 Oct 7, 2024
7a78951
Merge pull request #11563 from revans2/fix_merge_conflict
revans2 Oct 7, 2024
5eeddc6
Spark 4: Fix parquet_test.py [databricks] (#11519)
mythrocks Oct 8, 2024
cd46572
Update test case related to LEACY datetime format to unblock nightly …
res-life Oct 8, 2024
6897713
Add in a basic plugin for dataframe UDF support in Apache Spark (#11561)
revans2 Oct 8, 2024
506d212
Disk spill metric (#11564)
zpuller Oct 8, 2024
58eb33f
Merge branch 'branch-24.10' into fixmerge
jlowe Oct 9, 2024
180da0f
Log reconfigure multi-file thread pool only once (#11571)
gerashegalov Oct 9, 2024
025e62e
Merge pull request #11579 from jlowe/fixmerge
jlowe Oct 9, 2024
e8b78c0
[Spark 4.0] Address test failures in cast_test.py [databricks] (#11559)
mythrocks Oct 9, 2024
e8ac073
avoid long tail tasks due to PrioritySemaphore (#11574)
binmahone Oct 10, 2024
0ba4fd2
addressing jason's comment (#11587)
binmahone Oct 11, 2024
4866941
Merge pull request #11594 from NVIDIA/branch-24.10
nvauto Oct 11, 2024
aca15ab
Fix `collection_ops_tests` for Spark 4.0 [databricks] (#11414)
mythrocks Oct 12, 2024
3744ad2
Merge pull request #11601 from NVIDIA/branch-24.10
nvauto Oct 14, 2024
adc4e95
Merge branch-24.10 into branch-24.12
NvTimLiu Oct 14, 2024
8c55ef3
Merge pull request #11605 from NvTimLiu/fix-auto-merge-conflict-11604
jlowe Oct 14, 2024
2d3e0ec
Disable regex tests to unblock CI (#11606)
jlowe Oct 14, 2024
11964ae
Remove an unused config shuffle.spillThreads (#11595)
abellina Oct 14, 2024
0510a78
Adopt `JSONUtils.concatenateJsonStrings` for concatenating JSON strin…
ttnghia Oct 15, 2024
f8c386e
Change DataSource calendar interval error to fix spark400 build (#11610)
jlowe Oct 15, 2024
ed4c878
Use mvn -f scala2.13/ in the build scripts to build the 2.13 jars (#1…
NvTimLiu Oct 16, 2024
7c18198
`install_deps` changes for Databricks 14.3 [databricks] (#11597)
razajafri Oct 16, 2024
0089d25
Revert "Disable regex tests to unblock CI (#11606)" (#11612)
jlowe Oct 16, 2024
e3f3f51
Ensure repartition overflow test always overflows (#11614)
jlowe Oct 16, 2024
52c91d3
Quick fix for the build script failure of Scala 2.13 jars (#11617)
NvTimLiu Oct 17, 2024
00fe174
Merge pull request #11625 from NVIDIA/branch-24.10
nvauto Oct 18, 2024
b5c2868
Update JSON tests based on a closed/fixed issues (#11631)
revans2 Oct 19, 2024
e9f0e04
Merge pull request #11636 from NVIDIA/branch-24.10
nvauto Oct 21, 2024
f84b593
Spark UT framework: Read Parquet file generated by parquet-thrift Rap…
Feng-Jiang28 Oct 21, 2024
a24b575
Add support for Spark 3.5.3 [databricks] (#11570)
razajafri Oct 21, 2024
36ae266
Disable date/timestamp types by default when parsing JSON (#11640)
ttnghia Oct 22, 2024
b9a1a49
Fix udf-compiler scala2.13 internal return statements (#11553)
abellina Oct 22, 2024
732b25b
Fix `collection_ops_test` for [databricks] 14.3 (#11623)
mythrocks Oct 22, 2024
8e2e627
Spark 4 parquet_writer_test.py fixes (#11615)
rwlee Oct 22, 2024
a071efe
Update to_json to be more generic and fix some bugs (#11642)
revans2 Oct 23, 2024
5ed0a12
Datetime rebasing issue fixed (#11521)
Feng-Jiang28 Oct 24, 2024
db15a61
UT adjust test SPARK-26677: negated null-safe equality comparison (#1…
Feng-Jiang28 Oct 24, 2024
05f40b5
Put DF_UDF plugin code into the main uber jar. (#11634)
revans2 Oct 24, 2024
910b64d
UT adjust override checkScanSchemata & enabling ut of exclude_by_suff…
Feng-Jiang28 Oct 25, 2024
e31a710
Support invalid partToExtract for parse_url (#11661)
thirtiseven Oct 25, 2024
91db040
Support format 'yyyyMMdd HH:mm:ss' for legacy mode (#11658)
res-life Oct 28, 2024
b653ce2
Fix a NPE issue in GpuRand (#11647)
firestarman Oct 28, 2024
986eb5d
Generate classes identical up to the shim package name [databricks] (…
gerashegalov Oct 28, 2024
103e009
[DOC] update the supported OS in download page [skip ci] (#11656)
nvliyuan Oct 29, 2024
a6c4b34
Fix `orc_write_test.py` for [databricks] 14.3 (#11664)
mythrocks Oct 29, 2024
6b27556
Add a new NVTX range for task GPU ownership (#11596)
jihoonson Oct 29, 2024
81d1a3d
Fix race condition with Parquet filter pushdown modifying shared hado…
tgravescs Oct 30, 2024
f0ae2ba
Simplify Transpilation of $ with Extended Line Separator Support in c…
SurajAralihalli Oct 30, 2024
5486a8d
Merge pull request #11677 from NVIDIA/branch-24.10
nvauto Oct 31, 2024
7f8ff1b
Merge remote-tracking branch 'upstream/branch-24.10' into fix-auto-me…
pxLi Oct 31, 2024
2134f2e
Merge pull request #11682 from pxLi/fix-auto-merge-conflict-11679
pxLi Oct 31, 2024
372ca80
Use the new host memory allocation API (#11671)
revans2 Nov 1, 2024
fcede85
Merge pull request #11688 from NVIDIA/branch-24.10
nvauto Nov 4, 2024
4a1baa5
Reserve allocation should be displayed when erroring due to lack of m…
kuhushukla Nov 4, 2024
4d38dba
Exclude shimplify-generated files from scalastyle (#11685)
gerashegalov Nov 4, 2024
35980d6
Preparation for the coming Kudo support (#11667)
firestarman Nov 4, 2024
6e82c44
Skip AQE-join-DPP tests for [databricks] 14.3 (#11644)
mythrocks Nov 4, 2024
f533fc9
Fix skipping fixed_length_char ORC tests on [databricks] > 13.3 (#11652)
mythrocks Nov 4, 2024
2e16ff2
Fix `misc_expr_test` for [databricks] 14.3 (#11670)
mythrocks Nov 4, 2024
5afee5b
Update the Maven repository to download Spark JAR files (#11689)
NvTimLiu Nov 5, 2024
ad4233d
Fix spark400 build due to LogicalRelation signature changes (#11695)
jlowe Nov 6, 2024
ddbbba3
Add Spark 3.4.4 Shim (#11692)
gerashegalov Nov 6, 2024
6100334
Fix `string_test` for [databricks] 14.3 (#11669)
mythrocks Nov 6, 2024
61acf56
Fix Parquet Writer tests on [databricks] 14.3 (#11673)
mythrocks Nov 6, 2024
e13cd55
Add retry in sub hash join (#11706)
firestarman Nov 7, 2024
d208004
Add shim version 344 to LogicalPlanShims.scala (#11710)
SurajAralihalli Nov 8, 2024
7d2fec9
Make delta-lake shim dependencies parametrizable [databricks] (#11697)
gerashegalov Nov 8, 2024
3762569
impalaFile cannot be found by UT framework. (#11707)
Feng-Jiang28 Nov 11, 2024
894b636
Simplify $ transpiling and fix newline character bug (#11703)
SurajAralihalli Nov 12, 2024
862dab0
Let AWS Databricks automatically choose an Availability Zone (#11714)
NvTimLiu Nov 12, 2024
57b8caa
Added Shims for adding Databricks 14.3 Support [databricks] (#11635)
razajafri Nov 13, 2024
a8010cc
Improve JSON scan and `from_json` (#11702)
ttnghia Nov 14, 2024
9b06ae3
Change Databricks 14.3 shim name to spark350db143 (#11728)
NvTimLiu Nov 18, 2024
b16d107
Support multi string contains [databricks] (#11413)
res-life Nov 19, 2024
fd0781d
Add NullIntolerantShim to adapt to Spark 4.0 removing NullIntolerant …
jlowe Nov 19, 2024
3d26c4c
Support profiling for specific stages on a limited number of tasks (#…
thirtiseven Nov 20, 2024
45a54ac
Skip `from_json` overflow tests for [databricks] 14.3 (#11719)
mythrocks Nov 20, 2024
a5413e9
Widen type promotion for decimals with larger scale in Parquet Read […
nartal1 Nov 20, 2024
e1fefa5
Fix leak with RapidsHostColumnBuilder in GpuUserDefinedFunction (#11700)
abellina Nov 20, 2024
20c5281
Add in support for months_between (#11737)
revans2 Nov 20, 2024
d3cda26
Update to Spark 4.0 changing signature of SupportsV1Write.writeWithV1…
jlowe Nov 21, 2024
f2ea943
Integrate with kudo (#11724)
liurenjie1024 Nov 22, 2024
7110cf4
Do not package the Databricks 14.3 shim into the dist jar [skip ci] (…
NvTimLiu Nov 22, 2024
6e6ce33
Add a few more JSON tests for MAP<STRING,STRING> (#11721)
revans2 Nov 22, 2024
a847575
Add NVIDIA Copyright (#11723)
gerashegalov Nov 22, 2024
e5547a1
Remove batch size bytes limits (#11746)
zpuller Nov 22, 2024
cacc3ae
host watermark metric (#11725)
zpuller Nov 22, 2024
daaaf24
Execute `from_json` with struct schema using `JSONUtils.fromJSONToStr…
ttnghia Nov 23, 2024
6cba00d
Print out the current attempt object when OOM inside a retry block (#…
firestarman Nov 25, 2024
6539441
Enable JSON Scan and from_json by default (#11753)
revans2 Nov 25, 2024
938db21
Fix aqe_test failures on [databricks] 14.3. (#11750)
mythrocks Nov 25, 2024
6b90b2f
Add support for asynchronous writing for parquet (#11730)
jihoonson Nov 25, 2024
f5be35e
Fix Kudo batch serializer to only read header in hasNext (#11766)
jlowe Nov 26, 2024
2b6ac11
Avoid using StringBuffer in single-threaded methods. (#11759)
gerashegalov Nov 26, 2024
e3dce9e
Fix query hang when using rapids multithread shuffle manager with kud…
liurenjie1024 Nov 26, 2024
4fa0a1d
repartition-based fallback for hash aggregate v3 (#11712)
binmahone Nov 26, 2024
82c26f1
Append knoguchi22 to blossom-ci whitelist [skip ci] (#11777)
knoguchi22 Nov 26, 2024
ff0ca0f
Ability to decompress snappy and zstd Parquet files via CPU [databric…
jlowe Nov 26, 2024
ed02cfe
Fix `dpp_test.py` failures on [databricks] 14.3 (#11768)
mythrocks Nov 26, 2024
aa2da41
fix issue 11790 (#11792)
binmahone Nov 29, 2024
cb31afb
Fall back to CPU for non-UTC months_between (#11802)
revans2 Dec 3, 2024
738c8e3
exclude previous operator's time out of firstBatchHeuristic (#11794)
binmahone Dec 3, 2024
fb2f72d
Orc writes don't fully support Booleans with nulls (#11763)
kuhushukla Dec 7, 2024
3449c8a
Fixes a leak for the empty nlj iterator (#11832)
abellina Dec 8, 2024
45cdac3
Fix for lead/lag window test failures. (#11823)
mythrocks Dec 9, 2024
96a58d1
Fix leak in isTimeStamp (#11845)
kuhushukla Dec 10, 2024
2cb5a18
Merge branch-24.12 into main
nvauto Dec 10, 2024
1c540c1
Change version to 24.12.0
nvauto Dec 10, 2024
81b0b98
Increase the pre-merge CI timeout to 6 hours (#11857)
NvTimLiu Dec 11, 2024
4b9bb23
[DOC] update doc for 24.12 release [skip ci] (#11841)
nvliyuan Dec 13, 2024
4d7373b
Update rapids JNI and private dependency to 24.12.0 (#11849)
nvauto Dec 16, 2024
22680f5
Update latest changelog [skip ci] (#11851)
nvauto Dec 16, 2024
51a811f
Merge remote-tracking branch 'upstream/branch-24.12' into merge-branc…
YanxuanLiu Dec 16, 2024
795aef8
Remove 350db143 shim's build (#11874)
NvTimLiu Dec 16, 2024
d1ea935
Update latest changelog [skip ci] (#11876)
nvauto Dec 16, 2024
8a8bd3d
Merge branch 'branch-24.12' into merge-branch-24.12-to-main
YanxuanLiu Dec 16, 2024
3 changes: 2 additions & 1 deletion .github/workflows/blossom-ci.yml
Original file line number Diff line number Diff line change
@@ -77,7 +77,8 @@ jobs:
           github.actor == 'Feng-Jiang28' ||
           github.actor == 'SurajAralihalli' ||
           github.actor == 'jihoonson' ||
-          github.actor == 'ustcfy'
+          github.actor == 'ustcfy' ||
+          github.actor == 'knoguchi22'
         )
     steps:
       - name: Check if comment is issued by authorized person
8 changes: 2 additions & 6 deletions .github/workflows/mvn-verify-check.yml
@@ -246,12 +246,10 @@ jobs:
            echo "Generated Scala 2.13 build files don't match what's in repository"
            exit 1
          fi
-         # change to Scala 2.13 Directory
-         cd scala2.13
          # test command, will retry for 3 times if failed.
          max_retry=3; delay=30; i=1
          while true; do
-           mvn package \
+           mvn package -f scala2.13/ \
              -pl integration_tests,tests,tools -am -P 'individual,pre-merge' \
              -Dbuildver=${{ matrix.spark-version }} -Dmaven.scalastyle.skip=true \
              -Drat.skip=true ${{ env.COMMON_MVN_FLAGS }} && break || {
@@ -303,12 +301,10 @@ jobs:
            echo "Generated Scala 2.13 build files don't match what's in repository"
            exit 1
          fi
-         # change to Scala 2.13 Directory
-         cd scala2.13
          # test command, will retry for 3 times if failed.
          max_retry=3; delay=30; i=1
          while true; do
-           mvn verify \
+           mvn verify -f scala2.13/ \
              -P "individual,pre-merge,source-javadoc" -Dbuildver=${{ matrix.spark-version }} \
              ${{ env.COMMON_MVN_FLAGS }} && break || {
if [[ $i -le $max_retry ]]; then
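The retry wrapper shared by both hunks can be exercised on its own. The sketch below is a minimal stand-alone version of that pattern, where `attempt` is a hypothetical stand-in for the `mvn` invocation (here it fails once, then succeeds) and the delay is shortened so the loop runs quickly:

```shell
# Sketch of the retry loop used in mvn-verify-check.yml. `attempt` is a
# hypothetical stand-in for the mvn command; it fails on the first call
# and succeeds on the second, demonstrating one retry with backoff.
max_retry=3; delay=1; i=1
attempt() {
  [ "$i" -ge 2 ]
}
while true; do
  attempt && break || {
    if [ "$i" -le "$max_retry" ]; then
      echo "command failed, retry $i of $max_retry in ${delay}s"
      sleep "$delay"
      i=$((i + 1)); delay=$((delay * 2))
    else
      echo "exceeded $max_retry retries" >&2
      exit 1
    fi
  }
done
echo "succeeded on attempt $i"
```

The `&& break || { ... }` shape lets the step fail fast once the retry budget is exhausted while doubling the delay between attempts.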
390 changes: 180 additions & 210 deletions CHANGELOG.md

Large diffs are not rendered by default.

8 changes: 4 additions & 4 deletions CONTRIBUTING.md
@@ -127,15 +127,15 @@ mvn -pl dist -PnoSnapshots package -DskipTests
Verify that shim-specific classes are hidden from a conventional classloader.

```bash
-$ javap -cp dist/target/rapids-4-spark_2.12-24.10.1-cuda11.jar com.nvidia.spark.rapids.shims.SparkShimImpl
+$ javap -cp dist/target/rapids-4-spark_2.12-24.12.0-cuda11.jar com.nvidia.spark.rapids.shims.SparkShimImpl
Error: class not found: com.nvidia.spark.rapids.shims.SparkShimImpl
```

However, its bytecode can be loaded if the reference is prefixed with `spark3XY`, which is not contained in the package name:

```bash
-$ javap -cp dist/target/rapids-4-spark_2.12-24.10.1-cuda11.jar spark320.com.nvidia.spark.rapids.shims.SparkShimImpl | head -2
-Warning: File dist/target/rapids-4-spark_2.12-24.10.1-cuda11.jar(/spark320/com/nvidia/spark/rapids/shims/SparkShimImpl.class) does not contain class spark320.com.nvidia.spark.rapids.shims.SparkShimImpl
+$ javap -cp dist/target/rapids-4-spark_2.12-24.12.0-cuda11.jar spark320.com.nvidia.spark.rapids.shims.SparkShimImpl | head -2
+Warning: File dist/target/rapids-4-spark_2.12-24.12.0-cuda11.jar(/spark320/com/nvidia/spark/rapids/shims/SparkShimImpl.class) does not contain class spark320.com.nvidia.spark.rapids.shims.SparkShimImpl
Compiled from "SparkShims.scala"
public final class com.nvidia.spark.rapids.shims.SparkShimImpl {
```
@@ -178,7 +178,7 @@ mvn package -pl dist -am -Dbuildver=340 -DallowConventionalDistJar=true
Verify `com.nvidia.spark.rapids.shims.SparkShimImpl` is conventionally loadable:

```bash
-$ javap -cp dist/target/rapids-4-spark_2.12-24.10.1-cuda11.jar com.nvidia.spark.rapids.shims.SparkShimImpl | head -2
+$ javap -cp dist/target/rapids-4-spark_2.12-24.12.0-cuda11.jar com.nvidia.spark.rapids.shims.SparkShimImpl | head -2
$ javap -cp dist/target/rapids-4-spark_2.12-24.12.0-cuda11.jar com.nvidia.spark.rapids.shims.SparkShimImpl | head -2
Compiled from "SparkShims.scala"
public final class com.nvidia.spark.rapids.shims.SparkShimImpl {
```
117 changes: 117 additions & 0 deletions DF_UDF_README.md
@@ -0,0 +1,117 @@
# Scala / Java UDFs implemented using DataFrame operations

User Defined Functions (UDFs) are used for a number of reasons in Apache Spark. Much of the time they implement
logic that is very difficult or impossible to express directly with the existing SQL/DataFrame APIs. But they
are also used to standardize processing logic across an organization, or simply for code reuse.

But UDFs come with downsides. The biggest is the loss of visibility into the processing being done. SQL is a language
that can be highly optimized, but a UDF is in most cases a black box that the SQL optimizer can do nothing about.
This can result in less than ideal query planning. Additionally, accelerated execution environments, like the
RAPIDS Accelerator for Apache Spark, have no easy way to replace UDFs with accelerated versions, which can result in
slow performance.

This plugin attempts to restore that visibility for the code-reuse use case by providing a way to implement a UDF in
terms of DataFrame commands.

## Setup

The DataFrame UDF plugin is packaged in the same jar as the RAPIDS Accelerator for Apache Spark. The jar needs to be
added as a compile-time dependency of any code that uses this feature, and also added to your Spark classpath, just
as you would for GPU acceleration.

If you do not plan to use GPU-accelerated processing, but still want DataFrame UDF support in CPU applications, then
add `com.nvidia.spark.DFUDFPlugin` to the `spark.sql.extensions` config. If you do use GPU-accelerated processing,
the RAPIDS plugin enables this automatically: you don't need to set the `spark.sql.extensions` config, but it
won't hurt anything if you do. Now you can implement a UDF in terms of DataFrame operations.

## Usage

```scala
import com.nvidia.spark.functions._

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._

val sum_array = df_udf((longArray: Column) =>
  aggregate(longArray,
    lit(0L),
    (a, b) => coalesce(a, lit(0L)) + coalesce(b, lit(0L)),
    a => a))
spark.udf.register("sum_array", sum_array)
```

You can then use `sum_array` however you would have used any other UDF. This allows you to provide a drop-in
replacement implementation of an existing UDF.

```scala
Seq(Array(1L, 2L, 3L)).toDF("data").selectExpr("sum_array(data) as result").show()

+------+
|result|
+------+
|     6|
+------+
```

Java APIs are also supported and should work the same as Spark's UDFs:

```java
import static com.nvidia.spark.functions.df_udf;

import org.apache.spark.sql.*;
import org.apache.spark.sql.api.java.UDF2;
import org.apache.spark.sql.expressions.UserDefinedFunction;

UserDefinedFunction myAdd = df_udf((Column lhs, Column rhs) -> lhs.plus(rhs));
spark.udf().register("myadd", myAdd);

spark.sql("SELECT myadd(1, 1) as r").show();
// +---+
// |  r|
// +---+
// |  2|
// +---+

```

## Type Checks

DataFrame APIs do not provide type safety when writing the code and that is the same here. There are no builtin type
checks for inputs yet. Also, because of how types are resolved in Spark there is no way to adjust the query based on
the types passed in. Type checks are handled by the SQL planner/optimizer after the UDF has been replaced. This means
that the final SQL will not violate any type safety, but it also means that the errors might be confusing. For example,
if I passed in an `ARRAY<DOUBLE>` to `sum_array` instead of an `ARRAY<LONG>` I would get an error like

```scala
Seq(Array(1.0, 2.0, 3.0)).toDF("data").selectExpr("sum_array(data) as result").show()
org.apache.spark.sql.AnalysisException: [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "aggregate(data, 0, lambdafunction((coalesce(namedlambdavariable(), 0) + coalesce(namedlambdavariable(), 0)), namedlambdavariable(), namedlambdavariable()), lambdafunction(namedlambdavariable(), namedlambdavariable()))" due to data type mismatch: Parameter 3 requires the "BIGINT" type, however "lambdafunction((coalesce(namedlambdavariable(), 0) + coalesce(namedlambdavariable(), 0)), namedlambdavariable(), namedlambdavariable())" has the type "DOUBLE".; line 1 pos 0;
Project [aggregate(data#46, 0, lambdafunction((cast(coalesce(lambda x_9#49L, 0) as double) + coalesce(lambda y_10#50, cast(0 as double))), lambda x_9#49L, lambda y_10#50, false), lambdafunction(lambda x_11#51L, lambda x_11#51L, false)) AS result#48L]
+- Project [value#43 AS data#46]
+- LocalRelation [value#43]

at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.dataTypeMismatch(package.scala:73)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5(CheckAnalysis.scala:269)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5$adapted(CheckAnalysis.scala:256)
```

This is not as simple to understand as the error from a normal UDF:

```scala
val sum_array = udf((a: Array[Long]) => a.sum)

spark.udf.register("sum_array", sum_array)

Seq(Array(1.0, 2.0, 3.0)).toDF("data").selectExpr("sum_array(data) as result").show()
org.apache.spark.sql.AnalysisException: [CANNOT_UP_CAST_DATATYPE] Cannot up cast array element from "DOUBLE" to "BIGINT".
The type path of the target object is:
- array element class: "long"
- root class: "[J"
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object
at org.apache.spark.sql.errors.QueryCompilationErrors$.upCastFailureError(QueryCompilationErrors.scala:285)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveUpCast$$fail(Analyzer.scala:3646)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$57$$anonfun$applyOrElse$234.applyOrElse(Analyzer.scala:3677)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$57$$anonfun$applyOrElse$234.applyOrElse(Analyzer.scala:3654)
```

We hope to add optional type checks in the future.
2 changes: 1 addition & 1 deletion README.md
@@ -73,7 +73,7 @@ as a `provided` dependency.
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark_2.12</artifactId>
-    <version>24.10.1</version>
+    <version>24.12.0</version>
<scope>provided</scope>
</dependency>
```