Commit 8a3155b: Added instructions for using Bitnami Spark images

1 parent: c1edd9b

Showing 6 changed files with 216 additions and 7 deletions.

New file (`@@ -0,0 +1,41 @@`):

```yaml
services:
  spark:
    image: bitnami/spark:3.5.3
    ports:
      - "8080:8080"
      - "7077:7077"
      - "4040:4040"
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_PUBLIC_DNS=localhost
    volumes:
      - type: bind
        source: ./spark-defaults.conf
        target: /opt/bitnami/spark/conf/spark-defaults.conf
      - type: bind
        source: ../../ppl-spark-integration/target/scala-2.12/ppl-spark-integration-assembly-0.7.0-SNAPSHOT.jar
        target: /opt/bitnami/spark/jars/ppl-spark-integration-assembly-0.7.0-SNAPSHOT.jar

  spark-worker:
    image: bitnami/spark:3.5.3
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_PUBLIC_DNS=localhost
    volumes:
      - type: bind
        source: ./spark-defaults.conf
        target: /opt/bitnami/spark/conf/spark-defaults.conf
      - type: bind
        source: ../../ppl-spark-integration/target/scala-2.12/ppl-spark-integration-assembly-0.7.0-SNAPSHOT.jar
        target: /opt/bitnami/spark/jars/ppl-spark-integration-assembly-0.7.0-SNAPSHOT.jar
```

New file (`@@ -0,0 +1,29 @@`):

```
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.

# Example:
# spark.master spark://master:7077
# spark.eventLog.enabled true
# spark.eventLog.dir hdfs://namenode:8021/directory
# spark.serializer org.apache.spark.serializer.KryoSerializer
# spark.driver.memory 5g
# spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.sql.extensions org.opensearch.flint.spark.FlintPPLSparkExtensions
spark.sql.catalog.dev org.apache.spark.opensearch.catalog.OpenSearchCatalog
```
File renamed without changes.
File renamed without changes.

New file (`@@ -0,0 +1,130 @@`):

# Running Queries with Apache Spark in Docker

[Bitnami Apache Spark Docker images](https://hub.docker.com/r/bitnami/spark) are available. By
mounting the OpenSearch Spark PPL extension Jar file and a matching Spark configuration file
into the containers, these images can be used to test PPL commands.

The Bitnami Apache Spark image can be used both to run a Spark cluster and to run
`spark-shell` for submitting queries.
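
## Setup

### spark-conf

This is the Apache Spark configuration. Add the following three lines to the
`spark-defaults.conf` file:
```
spark.sql.legacy.createHiveTableByDefault false
spark.sql.extensions org.opensearch.flint.spark.FlintPPLSparkExtensions
spark.sql.catalog.dev org.apache.spark.opensearch.catalog.OpenSearchCatalog
```

`spark.sql.extensions` loads the OpenSearch Spark PPL extension, and `spark.sql.catalog.dev`
registers the OpenSearch catalog under the name `dev`. Setting
`spark.sql.legacy.createHiveTableByDefault` to `false` makes `CREATE TABLE` statements without a
`USING` clause use Spark's default data source instead of Hive.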

An example file is available in this repository at `docker/apache-spark-sample/spark-defaults.conf`.
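If you are starting without the sample file, the snippet below is one way to create a configuration file containing the three properties listed above and to confirm that each one is present. It is a minimal sketch: `CONF_FILE` is a hypothetical variable that defaults to `./spark-defaults.conf`, and any existing file at that path is overwritten.

```shell
# Hypothetical variable: where to write the configuration file.
CONF_FILE="${CONF_FILE:-./spark-defaults.conf}"

# Write the three required properties (overwrites an existing file).
cat > "$CONF_FILE" <<'EOF'
spark.sql.legacy.createHiveTableByDefault false
spark.sql.extensions org.opensearch.flint.spark.FlintPPLSparkExtensions
spark.sql.catalog.dev org.apache.spark.opensearch.catalog.OpenSearchCatalog
EOF

# Report any required key that does not start a line in the file.
for key in spark.sql.legacy.createHiveTableByDefault \
           spark.sql.extensions \
           spark.sql.catalog.dev; do
  if ! grep -q "^${key}[[:space:]]" "$CONF_FILE"; then
    echo "missing: $key" >&2
  fi
done
```

To add the properties to an existing file instead, append the three lines rather than overwriting it.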

## Prepare OpenSearch Spark PPL Extension

Build the OpenSearch Spark PPL extension locally, or obtain a copy of its Jar file. Note both
the directory that contains the Jar file and the Jar file's name; the `docker run` commands
below need both.
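As a convenience, the hypothetical `jar_placeholders` shell function below splits the path of a Jar file into the two values used for `<PATH_TO_SPARK_PPL_JAR_FILE>` and `<SPARK_PPL_JAR_FILE>`. It is a sketch that assumes the Jar file already exists, and it resolves the directory to an absolute path, which is the safest form for a `docker run -v` bind mount.

```shell
# Hypothetical helper: print the values for <PATH_TO_SPARK_PPL_JAR_FILE>
# (absolute directory) and <SPARK_PPL_JAR_FILE> (file name) for a Jar path.
jar_placeholders() {
  local jar dir
  jar="$1"
  if [ ! -f "$jar" ]; then
    echo "Jar file not found: $jar" >&2
    return 1
  fi
  # Resolve the containing directory in a subshell so the caller's cwd is unchanged.
  dir="$(cd "$(dirname "$jar")" && pwd)" || return 1
  printf 'PATH_TO_SPARK_PPL_JAR_FILE=%s\n' "$dir"
  printf 'SPARK_PPL_JAR_FILE=%s\n' "$(basename "$jar")"
}
```

For example, once the Jar file has been built at the location used by the sample `docker-compose.yml`, running `jar_placeholders ppl-spark-integration/target/scala-2.12/ppl-spark-integration-assembly-0.7.0-SNAPSHOT.jar` from the repository root prints both values.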

## Run the Spark Cluster

The cluster consists of a master node and a worker node. So that the two containers can
communicate, first create a Docker network for them:

```
docker network create spark-network
```

### Master Node

The master node can be run with the following command:
```
docker run \
  -d \
  --name spark \
  --network spark-network \
  -p 8080:8080 \
  -p 7077:7077 \
  -p 4040:4040 \
  -e SPARK_MODE=master \
  -e SPARK_RPC_AUTHENTICATION_ENABLED=no \
  -e SPARK_RPC_ENCRYPTION_ENABLED=no \
  -e SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no \
  -e SPARK_SSL_ENABLED=no \
  -e SPARK_PUBLIC_DNS=localhost \
  -v <PATH_TO_SPARK_CONFIG_FILE>:/opt/bitnami/spark/conf/spark-defaults.conf \
  -v <PATH_TO_SPARK_PPL_JAR_FILE>/<SPARK_PPL_JAR_FILE>:/opt/bitnami/spark/jars/<SPARK_PPL_JAR_FILE> \
  bitnami/spark:3.5.3
```

* `-d`
  Run the container in the background and return to the shell.
* `--name spark`
  Name the Docker container `spark`.
* `--network spark-network`
  Attach the container to the network created above.
* `-p 8080:8080 -p 7077:7077 -p 4040:4040`
  Publish the Spark master web UI (8080), the master port that workers connect to (7077),
  and the web UI of a Spark application such as `spark-shell` (4040) on the host.
* `<PATH_TO_SPARK_CONFIG_FILE>`
  Replace with the path to the Spark configuration file.
* `<PATH_TO_SPARK_PPL_JAR_FILE>`
  Replace with the path to the directory containing the OpenSearch Spark PPL extension
  Jar file.
* `<SPARK_PPL_JAR_FILE>`
  Replace with the filename of the OpenSearch Spark PPL extension Jar file.
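One pitfall with the `-v` flags above: when a host path given to `-v` does not exist, Docker creates it as an empty directory and mounts that, so a mistyped path leaves Spark without its configuration file or the extension Jar file. A small pre-flight check such as the hypothetical `check_mounts` function below can catch this before running `docker run`. It is a sketch that only verifies that both host paths are existing regular files.

```shell
# Hypothetical helper: succeed only if both host paths used by the -v flags
# (configuration file, then Jar file) exist and are regular files.
check_mounts() {
  local status path
  status=0
  for path in "$1" "$2"; do
    if [ ! -f "$path" ]; then
      echo "not a regular file: $path" >&2
      status=1
    fi
  done
  return "$status"
}
```

Call it with the same two host paths used in the `-v` flags and run `docker run` only if it succeeds, for example `check_mounts "$CONF" "$JAR" && docker run ...`, where `CONF` and `JAR` are hypothetical variables holding those paths.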

### Worker Node

The worker node can be run with the following command:
```
docker run \
  -d \
  --name spark-worker \
  --network spark-network \
  -e SPARK_MODE=worker \
  -e SPARK_MASTER_URL=spark://spark:7077 \
  -e SPARK_WORKER_MEMORY=1G \
  -e SPARK_WORKER_CORES=1 \
  -e SPARK_RPC_AUTHENTICATION_ENABLED=no \
  -e SPARK_RPC_ENCRYPTION_ENABLED=no \
  -e SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no \
  -e SPARK_SSL_ENABLED=no \
  -e SPARK_PUBLIC_DNS=localhost \
  -v <PATH_TO_SPARK_CONFIG_FILE>:/opt/bitnami/spark/conf/spark-defaults.conf \
  -v <PATH_TO_SPARK_PPL_JAR_FILE>/<SPARK_PPL_JAR_FILE>:/opt/bitnami/spark/jars/<SPARK_PPL_JAR_FILE> \
  bitnami/spark:3.5.3
```
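
* `-d`
  Run the container in the background and return to the shell.
* `--name spark-worker`
  Name the Docker container `spark-worker`.
* `SPARK_MASTER_URL=spark://spark:7077`
  Connect the worker to the master. The hostname `spark` resolves to the master container
  because both containers are attached to `spark-network`.
* `SPARK_WORKER_MEMORY=1G` and `SPARK_WORKER_CORES=1`
  Limit the worker to offering 1 GB of memory and one CPU core to the cluster.
* `<PATH_TO_SPARK_CONFIG_FILE>`
  Replace with the path to the Spark configuration file.
* `<PATH_TO_SPARK_PPL_JAR_FILE>`
  Replace with the path to the directory containing the OpenSearch Spark PPL extension
  Jar file.
* `<SPARK_PPL_JAR_FILE>`
  Replace with the filename of the OpenSearch Spark PPL extension Jar file.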
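The master and worker commands share most of their flags. The sketch below is one way to keep those shared settings in one place: the hypothetical `start_node` function supplies the common flags and takes the node-specific flags as arguments. `CONF_FILE`, `JAR_DIR`, `JAR_NAME` and `RUN` are hypothetical variables (the default Jar name is the one used in the sample `docker-compose.yml`). Unless `RUN=1` is set, the commands are only printed, not executed.

```shell
# Hypothetical settings; override them in the environment as needed.
CONF_FILE="${CONF_FILE:-$PWD/spark-defaults.conf}"
JAR_DIR="${JAR_DIR:-$PWD}"
JAR_NAME="${JAR_NAME:-ppl-spark-integration-assembly-0.7.0-SNAPSHOT.jar}"
IMAGE="bitnami/spark:3.5.3"

# Execute the command when RUN=1; otherwise only print it (a dry run).
run() {
  if [ "${RUN:-0}" = 1 ]; then
    "$@"
  else
    printf '%s ' "$@"
    echo
  fi
}

# Start one node: the flags shared by master and worker come first,
# followed by the node-specific flags passed as arguments, then the image.
start_node() {
  run docker run -d --network spark-network \
    -e SPARK_RPC_AUTHENTICATION_ENABLED=no \
    -e SPARK_RPC_ENCRYPTION_ENABLED=no \
    -e SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no \
    -e SPARK_SSL_ENABLED=no \
    -e SPARK_PUBLIC_DNS=localhost \
    -v "$CONF_FILE:/opt/bitnami/spark/conf/spark-defaults.conf" \
    -v "$JAR_DIR/$JAR_NAME:/opt/bitnami/spark/jars/$JAR_NAME" \
    "$@" "$IMAGE"
}

# Master node.
start_node --name spark -p 8080:8080 -p 7077:7077 -p 4040:4040 \
  -e SPARK_MODE=master

# Worker node.
start_node --name spark-worker -e SPARK_MODE=worker \
  -e SPARK_MASTER_URL=spark://spark:7077 \
  -e SPARK_WORKER_MEMORY=1G -e SPARK_WORKER_CORES=1
```

The printed commands are not shell-quoted, so review them before pasting if your paths contain spaces.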

## Running Spark Shell

To run `spark-shell` interactively on the master node:

```
docker exec -it spark /opt/bitnami/spark/bin/spark-shell
```

Within the Spark shell, you can submit queries, including PPL queries.
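
## Docker Compose Sample

A sample `docker-compose.yml` file is available in this repository at
`docker/apache-spark-sample/docker-compose.yml`. It starts both nodes and bind-mounts
`./spark-defaults.conf` and
`../../ppl-spark-integration/target/scala-2.12/ppl-spark-integration-assembly-0.7.0-SNAPSHOT.jar`
into each container. Docker Compose resolves these relative paths against the directory that
contains `docker-compose.yml`, and it creates its own network for the two services, so the
`docker network create` step is not needed. If your Jar file has a different name or location,
update the `source` and `target` paths for both services. From `docker/apache-spark-sample`,
start the cluster with:

```
docker compose up -d
```

The cluster can be stopped with:

```
docker compose down
```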