Shim Layer to support multiple Spark versions (NVIDIA#414)
* Shim Layer to support multiple Spark versions - adds Spark 3.0.0, 3.0.1, and 3.1.0

Signed-off-by: Thomas Graves <[email protected]>
tgravescs authored Jul 23, 2020
1 parent 3e53bda commit 21a5210
Showing 59 changed files with 2,501 additions and 237 deletions.
38 changes: 38 additions & 0 deletions dist/pom.xml
@@ -41,6 +41,17 @@
<artifactId>rapids-4-spark-shuffle_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-shims-aggregator_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<!-- required for conf generation script -->
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.binary.version}</artifactId>
<scope>provided</scope>
</dependency>
</dependencies>

<build>
@@ -49,6 +60,9 @@
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<configuration>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
</transformers>
<shadedArtifactAttached>false</shadedArtifactAttached>
<createDependencyReducedPom>true</createDependencyReducedPom>
<relocations>
@@ -94,6 +108,30 @@
</execution>
</executions>
</plugin>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<executions>
<execution>
<id>update_config</id>
<phase>verify</phase>
<goals>
<goal>run</goal>
</goals>
</execution>
</executions>
<configuration>
<launchers>
<launcher>
<id>update_rapids_config</id>
<mainClass>com.nvidia.spark.rapids.RapidsConf</mainClass>
<args>
<arg>${project.basedir}/../docs/configs.md</arg>
</args>
</launcher>
</launchers>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.rat</groupId>
<artifactId>apache-rat-plugin</artifactId>
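The `ServicesResourceTransformer` added to the shade configuration matters once several shim jars are shaded into one artifact: each shim can register itself in a `META-INF/services` file (the shims pom excludes such files from the RAT check), and when several jars carry a file with the same name the shade plugin otherwise keeps only one of them. The transformer concatenates the entries instead. The sketch below imitates that merge with plain directories standing in for jars; the service name `com.example.SparkShimServiceProvider` and the provider class names are hypothetical placeholders, not the plugin's real names.

```shell
#!/usr/bin/env bash
# Sketch: imitate how ServicesResourceTransformer merges same-named
# META-INF/services files from several jars into one shaded jar.
# SERVICE and the provider class names are hypothetical placeholders.
set -euo pipefail

SERVICE="com.example.SparkShimServiceProvider"
work="$(mktemp -d)"
trap 'rm -rf "$work"' EXIT

# One directory per shim "jar", each registering its own provider.
for v in 300 301 310; do
  mkdir -p "$work/spark$v/META-INF/services"
  printf '%s\n' "com.example.spark$v.ShimProvider" \
    > "$work/spark$v/META-INF/services/$SERVICE"
done

# Concatenate the same-named files so every provider stays registered.
mkdir -p "$work/shaded/META-INF/services"
for v in 300 301 310; do
  cat "$work/spark$v/META-INF/services/$SERVICE"
done > "$work/shaded/META-INF/services/$SERVICE"

cat "$work/shaded/META-INF/services/$SERVICE"
```

With the merged file, a `java.util.ServiceLoader` lookup at runtime can see all three providers rather than whichever jar happened to be shaded first.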
8 changes: 6 additions & 2 deletions docs/get-started/getting-started.md
@@ -417,11 +417,15 @@ With `nv_peer_mem`, IB/RoCE-based transfers can perform zero-copy transfers dire
2) Install [UCX 1.8.1](https://github.com/openucx/ucx/releases/tag/v1.8.1).

3) You will need to configure your Spark job with extra settings for UCX (we are looking to
simplify these settings in the near future). Choose the version of the shuffle manager
that matches your Spark version. Currently we support
Spark 3.0.0 (`com.nvidia.spark.rapids.spark300.RapidsShuffleManager`),
Spark 3.0.1 (`com.nvidia.spark.rapids.spark301.RapidsShuffleManager`), and
Spark 3.1.0 (`com.nvidia.spark.rapids.spark310.RapidsShuffleManager`):

```shell
...
--conf spark.shuffle.manager=com.nvidia.spark.rapids.spark300.RapidsShuffleManager \
--conf spark.shuffle.service.enabled=false \
--conf spark.rapids.shuffle.transport.enabled=true \
--conf spark.executorEnv.UCX_TLS=cuda_copy,cuda_ipc,rc,tcp \
...
```
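Picking the class by hand is easy to get wrong after a cluster upgrade. A minimal sketch, assuming you already know the Spark version string you are targeting; the function name `rapids_shuffle_manager_class` is hypothetical, while the three class names are the ones listed for the supported Spark versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the RapidsShuffleManager class for a Spark
# version, using the per-version class names from the docs above.
rapids_shuffle_manager_class() {
  # Assumption: ignore a qualifier such as "-SNAPSHOT" on dev builds.
  local version="${1%%-*}"
  case "$version" in
    3.0.0) printf '%s\n' "com.nvidia.spark.rapids.spark300.RapidsShuffleManager" ;;
    3.0.1) printf '%s\n' "com.nvidia.spark.rapids.spark301.RapidsShuffleManager" ;;
    3.1.0) printf '%s\n' "com.nvidia.spark.rapids.spark310.RapidsShuffleManager" ;;
    *)
      printf 'unsupported Spark version: %s\n' "$1" >&2
      return 1
      ;;
  esac
}
```

For example, `--conf spark.shuffle.manager="$(rapids_shuffle_manager_class 3.0.1)"` would select the `spark301` class, and an unsupported version fails instead of silently falling back to a mismatched shuffle manager.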
3 changes: 3 additions & 0 deletions docs/testing.md
@@ -41,6 +41,9 @@ They generally follow TPCH but are not guaranteed to be the same.

Unit tests exist in the tests directory. This is unconventional and is done so we can run the tests
on the final shaded version of the plugin. It also helps with how we collect code coverage.
You can run the unit tests against different versions of Spark using the different profiles. By
default the tests run against Spark 3.0.0, `-Pspark301tests` runs them against Spark 3.0.1, and
`-Pspark310tests` runs them against Spark 3.1.0.
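Because the default already targets Spark 3.0.0, only the other two versions need a flag. A minimal sketch of mapping a target version to the profile flag; the function name `spark_test_profile` is hypothetical, while the profile ids are the ones documented here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Maven profile flag that makes the unit
# tests run against the given Spark version.
spark_test_profile() {
  case "$1" in
    # Spark 3.0.0 is the default, so no profile flag is needed.
    3.0.0) ;;
    3.0.1) printf '%s\n' "-Pspark301tests" ;;
    3.1.0) printf '%s\n' "-Pspark310tests" ;;
    *)
      printf 'no test profile for Spark %s\n' "$1" >&2
      return 1
      ;;
  esac
}
```

For example, `mvn verify $(spark_test_profile 3.1.0)`; the command substitution is left unquoted on purpose so that the empty result for 3.0.0 adds no argument. The Maven goal shown is an assumption, since the text above does not name one.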

## Integration tests

28 changes: 28 additions & 0 deletions integration_tests/pom.xml
@@ -28,14 +28,42 @@
<artifactId>rapids-4-spark-integration-tests_2.12</artifactId>
<version>0.2.0-SNAPSHOT</version>

<properties>
<spark.test.version>3.0.0</spark.test.version>
</properties>
<profiles>
<profile>
<id>spark301tests</id>
<properties>
<spark.test.version>3.0.1-SNAPSHOT</spark.test.version>
</properties>
</profile>
<profile>
<id>spark310tests</id>
<properties>
<spark.test.version>3.1.0-SNAPSHOT</spark.test.version>
</properties>
</profile>
</profiles>

<dependencies>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>jul-to-slf4j</artifactId>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>jcl-over-slf4j</artifactId>
<!-- runtime scope is appropriate, but causes SBT build problems -->
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.binary.version}</artifactId>
<version>${spark.test.version}</version>
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
@@ -16,7 +16,7 @@

package com.nvidia.spark.rapids.tests.mortgage

import com.nvidia.spark.rapids.ShimLoader
import org.scalatest.FunSuite

import org.apache.spark.sql.SparkSession
@@ -34,7 +34,7 @@ class MortgageSparkSuite extends FunSuite {
.config("spark.rapids.sql.test.enabled", false)
.config("spark.rapids.sql.incompatibleOps.enabled", true)
.config("spark.rapids.sql.hasNans", false)
val rapidsShuffle = ShimLoader.getSparkShims.getRapidsShuffleManagerClass
val prop = System.getProperty("rapids.shuffle.manager.override", "false")
if (prop.equalsIgnoreCase("true")) {
println("RAPIDS SHUFFLE MANAGER ACTIVE")
@@ -16,8 +16,8 @@

package com.nvidia.spark.rapids.tests.tpch

import com.nvidia.spark.rapids.{ColumnarRdd, ExecutionPlanCaptureCallback}
import com.nvidia.spark.rapids.ShimLoader
import org.scalatest.{BeforeAndAfterAll, FunSuite}

import org.apache.spark.sql.{DataFrame, SparkSession}
@@ -44,7 +44,7 @@ class TpchLikeSparkSuite extends FunSuite with BeforeAndAfterAll {
.config("spark.rapids.sql.explain", true)
.config("spark.rapids.sql.incompatibleOps.enabled", true)
.config("spark.rapids.sql.hasNans", false)
val rapidsShuffle = ShimLoader.getSparkShims.getRapidsShuffleManagerClass
val prop = System.getProperty("rapids.shuffle.manager.override", "false")
if (prop.equalsIgnoreCase("true")) {
println("RAPIDS SHUFFLE MANAGER ACTIVE")
29 changes: 29 additions & 0 deletions pom.xml
@@ -76,6 +76,7 @@
<module>sql-plugin</module>
<module>tests</module>
<module>integration_tests</module>
<module>shims</module>
<module>api_validation</module>
</modules>

@@ -128,6 +129,12 @@
<rat.consoleOutput>true</rat.consoleOutput>
</properties>
</profile>
<profile>
<id>spark301tests</id>
</profile>
<profile>
<id>spark310tests</id>
</profile>
</profiles>

<properties>
@@ -152,6 +159,7 @@
<project.reporting.sourceEncoding>UTF-8</project.reporting.sourceEncoding>
<pytest.TEST_TAGS>not qarun</pytest.TEST_TAGS>
<rat.consoleOutput>false</rat.consoleOutput>
<slf4j.version>1.7.30</slf4j.version>
</properties>

<dependencyManagement>
@@ -168,6 +176,17 @@
<classifier>${cuda.version}</classifier>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>jul-to-slf4j</artifactId>
<version>${slf4j.version}</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>jcl-over-slf4j</artifactId>
<version>${slf4j.version}</version>
<!-- runtime scope is appropriate, but causes SBT build problems -->
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
@@ -547,5 +566,15 @@
<enabled>true</enabled>
</snapshots>
</repository>
<repository>
<id>apache-snapshots-repo</id>
<url>https://repository.apache.org/content/repositories/snapshots/</url>
<releases>
<enabled>false</enabled>
</releases>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>
</repositories>
</project>
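The `3.0.1-SNAPSHOT` and `3.1.0-SNAPSHOT` Spark versions pinned by the new integration-test profiles are not release artifacts, which is presumably why the Apache snapshots repository (with releases disabled) is added here. A small sketch of where such an artifact would be looked up under the standard Maven repository layout; the helper name `maven_artifact_dir` is hypothetical, and the example arguments come from the `spark-sql` dependency in the integration tests pom.

```shell
#!/usr/bin/env bash
# Sketch: compute where an artifact version lives under the standard
# Maven 2 repository layout (groupId dots become path separators).
# The helper name maven_artifact_dir is hypothetical.
maven_artifact_dir() {
  local repo="$1" group="$2" artifact="$3" version="$4"
  # Strip a trailing slash from the repo URL, then join the path parts.
  printf '%s/%s/%s/%s/\n' "${repo%/}" "${group//.//}" "$artifact" "$version"
}

maven_artifact_dir \
  https://repository.apache.org/content/repositories/snapshots/ \
  org.apache.spark spark-sql_2.12 3.0.1-SNAPSHOT
```

Snapshot directories typically hold a `maven-metadata.xml`, so fetching that file from the printed URL (for example with `curl -fsI`) is one way to spot-check whether a snapshot is currently published; snapshot availability changes over time.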
55 changes: 55 additions & 0 deletions shims/aggregator/pom.xml
@@ -0,0 +1,55 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Copyright (c) 2020, NVIDIA CORPORATION.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<parent>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-shims_2.12</artifactId>
<version>0.2.0-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-shims-aggregator_2.12</artifactId>
<packaging>jar</packaging>
<name>RAPIDS Accelerator for Apache Spark SQL Plugin Shim Aggregator</name>
<description>The RAPIDS SQL plugin for Apache Spark Shim Aggregator</description>
<version>0.2.0-SNAPSHOT</version>

<dependencies>
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-shims-spark310_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-shims-spark301_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-shims-spark300_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<scope>compile</scope>
</dependency>
</dependencies>
</project>
72 changes: 72 additions & 0 deletions shims/pom.xml
@@ -0,0 +1,72 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Copyright (c) 2020, NVIDIA CORPORATION.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<parent>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-parent</artifactId>
<version>0.2.0-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-shims_2.12</artifactId>
<packaging>pom</packaging>
<name>RAPIDS Accelerator for Apache Spark SQL Plugin Shims</name>
<description>The RAPIDS SQL plugin for Apache Spark Shims</description>
<version>0.2.0-SNAPSHOT</version>

<modules>
<module>spark300</module>
<module>spark301</module>
<module>spark310</module>
<module>aggregator</module>
</modules>
<dependencies>
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-sql_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>ai.rapids</groupId>
<artifactId>cudf</artifactId>
<classifier>${cuda.version}</classifier>
<scope>provided</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
</plugin>
<plugin>
<groupId>org.apache.rat</groupId>
<artifactId>apache-rat-plugin</artifactId>
<configuration>
<excludes>
<exclude>**/src/main/resources/META-INF/services/*</exclude>
</excludes>
</configuration>
</plugin>
</plugins>
</build>
</project>