Commit 0cedde8: merged from branch-0.5

Squashed commit of the following:

commit dc66f03
commit f66c3ef
...
commit 72b2e12

Signed-off-by: Firestarman <[email protected]>
firestarman committed Mar 8, 2021
1 parent 3185811 commit 0cedde8
Showing 118 changed files with 2,980 additions and 11,743 deletions.
283 changes: 282 additions & 1 deletion CHANGELOG.md

Large diffs are not rendered by default.

10 changes: 0 additions & 10 deletions README.md
@@ -5,18 +5,8 @@ The RAPIDS Accelerator for Apache Spark provides a set of plugins for
[Apache Spark](https://spark.apache.org) that leverage GPUs to accelerate processing
via the [RAPIDS](https://rapids.ai) libraries and [UCX](https://www.openucx.org/).

![TPCxBB Like query results](./docs/img/tpcxbb-like-results.png "TPCxBB Like Query Results")

The chart above shows results from running ETL queries based off of the
[TPCxBB benchmark](http://www.tpc.org/tpcx-bb/default.asp). These are **not** official results in
any way. It uses a 10TB Dataset (scale factor 10,000), stored in parquet. The processing happened on
a two node DGX-2 cluster. Each node has 96 CPU cores, 1.5TB host memory, 16 V100 GPUs, and 512 GB
GPU memory.

To get started and try the plugin out use the [getting started guide](./docs/get-started/getting-started.md).

For more information about these benchmarks, see the [benchmark guide](./docs/benchmarks.md).

## Compatibility

The SQL plugin tries to produce results that are bit for bit identical with Apache Spark.
6 changes: 6 additions & 0 deletions api_validation/pom.xml
@@ -46,6 +46,12 @@
<spark.version>${spark311.version}</spark.version>
</properties>
</profile>
<profile>
<id>spark320</id>
<properties>
<spark.version>${spark320.version}</spark.version>
</properties>
</profile>
</profiles>

<dependencies>
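
As a usage sketch for the new profile: something like the following would build this module against Spark 3.2.0. The profile id comes from the diff above; the Maven goals and module flag are assumptions, not part of this commit.

```shell
# Activate the spark320 profile added above, from the repository root.
# Assumes spark320.version is defined in the parent pom (illustrative
# invocation, not taken from this commit).
mvn -pl api_validation -Pspark320 clean package
```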
8 changes: 4 additions & 4 deletions docs/FAQ.md
@@ -10,10 +10,10 @@ nav_order: 11

### What versions of Apache Spark does the RAPIDS Accelerator for Apache Spark support?

The RAPIDS Accelerator for Apache Spark requires version 3.0.0 or 3.0.1 of Apache Spark. Because the
plugin replaces parts of the physical plan that Apache Spark considers to be internal the code for
those plans can change even between bug fix releases. As a part of our process, we try to stay on
top of these changes and release updates as quickly as possible.
The RAPIDS Accelerator for Apache Spark requires version 3.0.0, 3.0.1, 3.0.2 or 3.1.1 of Apache
Spark. Because the plugin replaces parts of the physical plan that Apache Spark considers to be
internal the code for those plans can change even between bug fix releases. As a part of our
process, we try to stay on top of these changes and release updates as quickly as possible.

### Which distributions are supported?

4 changes: 3 additions & 1 deletion docs/additional-functionality/rapids-shuffle.md
@@ -257,7 +257,10 @@ In this section, we are using a docker container built using the sample dockerfi
| 3.0.1 | com.nvidia.spark.rapids.spark301.RapidsShuffleManager |
| 3.0.1 EMR | com.nvidia.spark.rapids.spark301emr.RapidsShuffleManager |
| 3.0.2 | com.nvidia.spark.rapids.spark302.RapidsShuffleManager |
| 3.0.3 | com.nvidia.spark.rapids.spark303.RapidsShuffleManager |
| 3.1.1 | com.nvidia.spark.rapids.spark311.RapidsShuffleManager |
| 3.1.2 | com.nvidia.spark.rapids.spark312.RapidsShuffleManager |
| 3.2.0 | com.nvidia.spark.rapids.spark320.RapidsShuffleManager |
2. Recommended settings for UCX 1.9.0+
```shell
@@ -270,7 +273,6 @@ In this section, we are using a docker container built using the sample dockerfi
--conf spark.executorEnv.UCX_MAX_RNDV_RAILS=1 \
--conf spark.executorEnv.UCX_MEMTYPE_CACHE=n \
--conf spark.executorEnv.UCX_IB_RX_QUEUE_LEN=1024 \
--conf spark.executorEnv.LD_LIBRARY_PATH=/usr/lib:/usr/lib/ucx \
--conf spark.executor.extraClassPath=${SPARK_CUDF_JAR}:${SPARK_RAPIDS_PLUGIN_JAR}
```
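
As a usage sketch, a shuffle manager from the table above is selected through Spark's `spark.shuffle.manager` setting. The class name below is the 3.2.0 entry from the table; the application file and the exact set of flags are illustrative assumptions, not part of this commit.

```shell
# Wire in the RAPIDS shuffle manager matching the Spark version in use
# (here the 3.2.0 entry from the table above); other flags are illustrative.
spark-submit \
  --conf spark.shuffle.manager=com.nvidia.spark.rapids.spark320.RapidsShuffleManager \
  --conf spark.executor.extraClassPath=${SPARK_CUDF_JAR}:${SPARK_RAPIDS_PLUGIN_JAR} \
  my_app.py
```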

212 changes: 0 additions & 212 deletions docs/benchmarks.md

This file was deleted.

1 change: 1 addition & 0 deletions docs/configs.md
@@ -139,6 +139,7 @@ Name | SQL Function(s) | Description | Default Value | Notes
<a name="sql.expression.CreateNamedStruct"></a>spark.rapids.sql.expression.CreateNamedStruct|`named_struct`, `struct`|Creates a struct with the given field names and values|true|None|
<a name="sql.expression.CurrentRow$"></a>spark.rapids.sql.expression.CurrentRow$| |Special boundary for a window frame, indicating stopping at the current row|true|None|
<a name="sql.expression.DateAdd"></a>spark.rapids.sql.expression.DateAdd|`date_add`|Returns the date that is num_days after start_date|true|None|
<a name="sql.expression.DateAddInterval"></a>spark.rapids.sql.expression.DateAddInterval| |Adds interval to date|true|None|
<a name="sql.expression.DateDiff"></a>spark.rapids.sql.expression.DateDiff|`datediff`|Returns the number of days from startDate to endDate|true|None|
<a name="sql.expression.DateSub"></a>spark.rapids.sql.expression.DateSub|`date_sub`|Returns the date that is num_days before start_date|true|None|
<a name="sql.expression.DayOfMonth"></a>spark.rapids.sql.expression.DayOfMonth|`dayofmonth`, `day`|Returns the day of the month from a date or timestamp|true|None|
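
The new `DateAddInterval` row covers date-plus-interval arithmetic. A hedged sketch of the kind of query it handles (the spark-sql invocation and literal values are illustrative):

```shell
# DATE + INTERVAL arithmetic is the shape of expression covered by
# DateAddInterval; with the plugin enabled it is eligible to run on the GPU.
spark-sql -e "SELECT DATE '2021-03-08' + INTERVAL 5 DAYS"
```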
4 changes: 2 additions & 2 deletions docs/download.md
@@ -21,8 +21,8 @@ This release includes additional performance improvements, including
* Instructions on how to use [Alluxio caching](get-started/getting-started-alluxio.md) with Spark to
leverage caching.

The release is supported on Apache Spark 3.0.0, 3.0.1, 3.1.1, Databricks 7.3 ML LTS and Google Cloud
Platform Dataproc 2.0.
The release is supported on Apache Spark 3.0.0, 3.0.1, 3.0.2, 3.1.1, Databricks 7.3 ML LTS and
Google Cloud Platform Dataproc 2.0.

The list of all supported operations is provided [here](supported_ops.md).

1 change: 0 additions & 1 deletion docs/get-started/Dockerfile.cuda
@@ -35,7 +35,6 @@ RUN set -ex && \
ln -s /lib /lib64 && \
mkdir -p /opt/spark && \
mkdir -p /opt/spark/jars && \
mkdir -p /opt/tpch && \
mkdir -p /opt/spark/examples && \
mkdir -p /opt/spark/work-dir && \
mkdir -p /opt/sparkRapidsPlugin && \
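
For completeness, a sketch of building this image; the tag and build context are assumptions, not taken from this commit.

```shell
# Build the CUDA-enabled Spark image from the repository root
# (image tag is illustrative).
docker build -t spark-rapids-cuda -f docs/get-started/Dockerfile.cuda .
```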

