Instructions for standalone/yarn wip (NVIDIA#22)

* Instructions for standalone/yarn wip * Update instructions * Fix typo * Small fixes * jars->jar
wjxiz1992 · Jun 26, 2019 · 699d761 · 699d761
1 parent 870c067
commit 699d761
Show file tree

Hide file tree

Showing 2 changed files with 232 additions and 38 deletions.
diff --git a/docs/standalone.md b/docs/standalone.md
@@ -1,11 +1,18 @@
-Get Started with XGBoost4J-Spark on Spark Standalone Cluster
-============================================================
-This is a getting started guide to XGBoost4J-Spark on a Spark Standalone Cluster. At the end of this guide, the reader will be able to run a sample Apache Spark application that runs on NVIDIA GPUs.
+Get Started with XGBoost4J-Spark on an Apache Spark Standalone Cluster
+======================================================================
+This is a getting started guide to XGBoost4J-Spark on an Apache Spark Standalone Cluster. At the end of this guide, the reader will be able to run a sample Apache Spark application that runs on NVIDIA GPUs.
 
 Prerequisites
 -------------
 * Apache Spark 2.3+ Standalone Cluster
-* Please see [RAPIDS.ai requirements] for GPU hardware required.
+* Hardware Requirements
+  * NVIDIA Pascal™ GPU architecture or better
+  * Multi-node clusters with homogenous GPU configuration
+* Software Requirements
+  * Ubuntu 16.04/CentOS
+  * NVIDIA driver 410.48+
+  * CUDA V10.0/9.2
+  * NCCL 2.4.7
 * `EXCLUSIVE_PROCESS` must be set for all GPUs in each host. This can be accomplished using the `nvidia-smi` utility:
 
   ```
@@ -22,21 +29,18 @@ Prerequisites
 * The number of GPUs in each host dictates the number of Spark executors that can run there. Additionally, cores per Spark executor and cores per Spark task must match, such that each executor can run 1 task at any given time. For example: if each host has 4 GPUs, there should be 4 executors running on each host, and each executor should run 1 task (for a total of 4 tasks running on 4 GPUs). In Spark Standalone mode, the default configuration is for an executor to take up all the cores assigned to each Spark Worker. In this example, we will limit the number of cores to 1, to match our dataset. Please see https://spark.apache.org/docs/latest/spark-standalone.html for more documentation regarding Standalone configuration.
 * The `SPARK_HOME` environment variable is assumed to point to the cluster's Apache Spark installation.
 
-Download Jars and Dataset
--------------------------
-1. Jars: [pointer to the 4 jars].
-2. Dataset: [pointer to mortage.zip].
+Get Application Jar and Dataset
+-------------------------------
+1. Jar: Please build the sample_xgboost_apps jar with dependencies as specified the [README](https://github.com/rapidsai/spark-examples)
+2. Dataset: https://rapidsai-data.s3.us-east-2.amazonaws.com/spark/mortgage.zip
 
-Fetch the required jars and dataset to a local directory. In this example the jars are in the `xgboost4j_spark/jars` directory, and the `mortgage.zip` dataset was unzipped in the `xgboost4j_spark/data` directory. 
+Place the required jar and dataset in a local directory. In this example the jar is in the `xgboost4j_spark/jars` directory, and the `mortgage.zip` dataset was unzipped in the `xgboost4j_spark/data` directory. 
 
 ```
 [xgboost4j_spark]$ find . -type f -print|sort
 ./data/mortgage/csv/test/mortgage_eval_merged.csv
 ./data/mortgage/csv/train/mortgage_train_merged.csv
-./jars/cudf-0.8-SNAPSHOT-cuda10.jar
-./jars/sample_xgboost_apps-0.1.4.jar
-./jars/xgboost4j-0.90-SNAPSHOT.jar
-./jars/xgboost4j-spark-0.90-SNAPSHOT.jar
+./jars/sample_xgboost_apps-0.1.4-jar-with-dependencies.jar
 ``` 
 
 Launch a Standalone Spark Cluster
@@ -71,7 +75,7 @@ export SPARK_MASTER=spark://`hostname -f`:7077
 # location where data was downloaded 
 export DATA_PATH=./xgboost4j_spark/data
 
-# location where required jars were downloaded
+# location for the required jar
 export JARS_PATH=./xgboost4j_spark/jars
 
 # Currently the number of tasks and executors must match the number of input files.
@@ -93,20 +97,11 @@ export SPARK_DRIVER_MEMORY=4g
 # spark executor memory
 export SPARK_EXECUTOR_MEMORY=8g
 
-# Library jars: 
-#     - cudf-0.8-SNAPSHOT-cuda10.jar
-#     - xgboost4j-0.90-SNAPSHOT.jar 
-#     - xgboost4j-spark-0.90-SNAPSHOT.jar
-export JARS_LIBRARY="\
-${JARS_PATH}/cudf-0.8-SNAPSHOT-cuda10.jar,\
-${JARS_PATH}/xgboost4j-0.90-SNAPSHOT.jar,\
-${JARS_PATH}/xgboost4j-spark-0.90-SNAPSHOT.jar"
-
 # example class to use
 export EXAMPLE_CLASS=ai.rapids.spark.examples.mortgage.GPUMain
 
 # XGBoost4J example jar (holds example classes):
-export JAR_EXAMPLE=${JARS_PATH}/sample_xgboost_apps-0.1.4.jar
+export JAR_EXAMPLE=${JARS_PATH}/sample_xgboost_apps-0.1.4-jar-with-dependencies.jar
 
 # tree construction algorithm
 export TREE_METHOD=gpu_hist
@@ -120,7 +115,6 @@ ${SPARK_HOME}/bin/spark-submit
  --driver-memory ${SPARK_DRIVER_MEMORY}                                         \
  --executor-memory ${SPARK_EXECUTOR_MEMORY}                                     \
  --conf spark.cores.max=${TOTAL_CORES}                                          \
- --jars ${JARS_LIBRARY}                                                         \
  --class ${EXAMPLE_CLASS}                                                       \
  ${JAR_EXAMPLE}                                                                 \
  -trainDataPath=${DATA_PATH}/mortgage/csv/train/mortgage_train_merged.csv       \
@@ -168,7 +162,7 @@ export SPARK_MASTER=spark://`hostname -f`:7077
 # location where data was downloaded 
 export DATA_PATH=./xgboost4j_spark/data
 
-# location where required jars were downloaded
+# location where the required jar was downloaded
 export JARS_PATH=./xgboost4j_spark/jars
 
 # Currently the number of tasks and executors must match the number of input files.
@@ -190,20 +184,11 @@ export SPARK_DRIVER_MEMORY=4g
 # spark executor memory
 export SPARK_EXECUTOR_MEMORY=8g
 
-# Library jars: 
-#     - cudf-0.8-SNAPSHOT-cuda10.jar
-#     - xgboost4j-0.90-SNAPSHOT.jar 
-#     - xgboost4j-spark-0.90-SNAPSHOT.jar
-export JARS_LIBRARY="\
-${JARS_PATH}/cudf-0.8-SNAPSHOT-cuda10.jar,\
-${JARS_PATH}/xgboost4j-0.90-SNAPSHOT.jar,\
-${JARS_PATH}/xgboost4j-spark-0.90-SNAPSHOT.jar"
-
 # example class to use
 export EXAMPLE_CLASS=ai.rapids.spark.examples.mortgage.CPUMain
 
 # XGBoost4J example jar (holds example classes):
-export JAR_EXAMPLE=${JARS_PATH}/sample_xgboost_apps-0.1.4.jar
+export JAR_EXAMPLE=${JARS_PATH}/sample_xgboost_apps-0.1.4-jar-with-dependencies.jar
 
 # tree construction algorithm
 export TREE_METHOD=hist
@@ -217,7 +202,6 @@ ${SPARK_HOME}/bin/spark-submit
  --driver-memory ${SPARK_DRIVER_MEMORY}                                         \
  --executor-memory ${SPARK_EXECUTOR_MEMORY}                                     \
  --conf spark.cores.max=${TOTAL_CORES}                                          \
- --jars ${JARS_LIBRARY}                                                         \
  --class ${EXAMPLE_CLASS}                                                       \
  ${JAR_EXAMPLE}                                                                 \
  -trainDataPath=${DATA_PATH}/mortgage/csv/train/mortgage_train_merged.csv       \
@@ -244,4 +228,4 @@ In the `stdout` driver log, you should see timings<sup>*</sup> (in seconds), and
 0.9873709530950067
 ```
 
-<sup>*</sup> The timings in this Getting Started guide are only illustrative. Please see [link to benchmarks] for official benchmarks.
+<sup>*</sup> The timings in this Getting Started guide are only illustrative. 
diff --git a/docs/yarn.md b/docs/yarn.md
@@ -0,0 +1,210 @@
+Get Started with XGBoost4J-Spark on Apache Hadoop YARN
+======================================================
+This is a getting started guide to XGBoost4J-Spark on Apache Hadoop YARN. At the end of this guide, the reader will be able to run a sample Apache Spark application that runs on NVIDIA GPUs.
+
+Prerequisites
+-------------
+* Apache Spark 2.3+ running on YARN.
+* Hardware Requirements
+  * NVIDIA Pascal™ GPU architecture or better
+  * Multi-node clusters with homogenous GPU configuration
+* Software Requirements
+  * Ubuntu 16.04/CentOS
+  * NVIDIA driver 410.48+
+  * CUDA V10.0/9.2
+  * NCCL 2.4.7
+* `EXCLUSIVE_PROCESS` must be set for all GPUs in each NodeManager. This can be accomplished using the `nvidia-smi` utility:
+
+  ```
+  nvidia-smi -i [gpu index] -c EXCLUSIVE_PROCESS
+  ```
+
+  For example:
+
+  ```
+  nvidia-smi -i 0 -c EXCLUSIVE_PROCESS
+  ```
+
+  Sets `EXCLUSIVE_PROCESS` for GPU _0_.
+* The number of GPUs per NodeManager dictates the number of Spark executors that can run in that NodeManager. Additionally, cores per Spark executor and cores per Spark task must match, such that each executor can run 1 task at any given time. For example: if each NodeManager has 4 GPUs, there should be 4 executors running on each NodeManager, and each executor should run 1 task (for a total of 4 tasks running on 4 GPUs). In order to achieve this, you may need to adjust `spark.task.cpus` and `spark.executor.cores` to match (both set to 1 by default). Additionally, we recommend adjusting `executor-memory` to divide host memory evenly amongst the number of GPUs in each NodeManager, such that Spark will schedule as many executors as there are GPUs in each NodeManager.
+* The `SPARK_HOME` environment variable is assumed to point to the cluster's Apache Spark installation.
+
+Get Application Jar and Dataset
+-------------------------------
+1. Jar: Please build the sample_xgboost_apps jar with dependencies as specified the [README](https://github.com/rapidsai/spark-examples)
+2. Dataset: https://rapidsai-data.s3.us-east-2.amazonaws.com/spark/mortgage.zip
+
+First place the required jar and dataset in a local directory. In this example the jar is in the `xgboost4j_spark/jars` directory, and the `mortgage.zip` dataset was unzipped in the `xgboost4j_spark/data` directory. 
+
+```
+[xgboost4j_spark]$ find . -type f -print|sort
+./data/mortgage/csv/test/mortgage_eval_merged.csv
+./data/mortgage/csv/train/mortgage_train_merged.csv
+./jars/sample_xgboost_apps-0.1.4-jar-with-dependencies.jar
+``` 
+
+Create a directory in HDFS, and copy:
+
+```
+[xgboost4j_spark]$ hadoop fs -mkdir /tmp/xgboost4j_spark
+[xgboost4j_spark]$ hadoop fs -copyFromLocal * /tmp/xgboost4j_spark
+```
+
+Verify that the jar and dataset are in HDFS:
+
+```
+[xgboost4j_spark]$ hadoop fs -find /tmp/xgboost4j_spark -print|grep "\."|sort
+/tmp/xgboost4j_spark/data/mortgage/csv/test/mortgage_eval_merged.csv
+/tmp/xgboost4j_spark/data/mortgage/csv/train/mortgage_train_merged.csv
+/tmp/xgboost4j_spark/jars/sample_xgboost_apps-0.1.4-jar-with-dependencies.jar
+```
+
+Launch GPU Mortgage Example
+---------------------------
+Variables required to run spark-submit command:
+
+```
+# location where data was downloaded 
+export DATA_PATH=hdfs:/tmp/xgboost4j_spark/data
+
+# location for the required jar
+export JARS_PATH=hdfs:/tmp/xgboost4j_spark/jars
+
+# spark deploy mode (see Apache Spark documentation for more information) 
+export SPARK_DEPLOY_MODE=cluster
+
+# run a single executor for this example to limit the number of spark tasks and
+# partitions to 1 as currently this number must match the number of input files
+export SPARK_NUM_EXECUTORS=1
+
+# spark driver memory
+export SPARK_DRIVER_MEMORY=4g
+
+# spark executor memory
+export SPARK_EXECUTOR_MEMORY=8g
+
+# example class to use
+export EXAMPLE_CLASS=ai.rapids.spark.examples.mortgage.GPUMain
+
+# XGBoost4J example jar (holds example classes):
+export JAR_EXAMPLE=${JARS_PATH}/sample_xgboost_apps-0.1.4-jar-with-dependencies.jar
+
+# tree construction algorithm
+export TREE_METHOD=gpu_hist
+```
+
+Run spark-submit:
+
+```
+${SPARK_HOME}/bin/spark-submit                                                  \
+ --master yarn                                                                  \
+ --deploy-mode ${SPARK_DEPLOY_MODE}                                             \
+ --num-executors ${SPARK_NUM_EXECUTORS}                                         \
+ --driver-memory ${SPARK_DRIVER_MEMORY}                                         \
+ --executor-memory ${SPARK_EXECUTOR_MEMORY}                                     \
+ --class ${EXAMPLE_CLASS}                                                       \
+ ${JAR_EXAMPLE}                                                                 \
+ -trainDataPath=${DATA_PATH}/mortgage/csv/train/mortgage_train_merged.csv       \
+ -evalDataPath=${DATA_PATH}/mortgage/csv/test/mortgage_eval_merged.csv          \
+ -format=csv                                                                    \
+ -numWorkers=${SPARK_NUM_EXECUTORS}                                             \
+ -treeMethod=${TREE_METHOD}                                                     \
+ -numRound=100                                                                  \
+ -maxDepth=8                                                                    
+```
+
+In the `stdout` driver log, you should see timings<sup>*</sup> (in seconds), and the RMSE accuracy metric:
+
+```
+--------------
+==> Benchmark: Elapsed time for [train]: 29.642s
+--------------
+
+--------------
+==> Benchmark: Elapsed time for [transform]: 21.272s
+--------------
+
+------Accuracy of Evaluation------
+0.9874184013493451
+```
+
+Launch CPU Mortgage Example
+---------------------------
+If you are running this example after running the GPU example above, please set these variables, to set both training and testing to run on the CPU exclusively:
+
+```
+# example class to use
+export EXAMPLE_CLASS=ai.rapids.spark.examples.mortgage.CPUMain
+
+# tree construction algorithm
+export TREE_METHOD=hist
+```
+
+This is the full variable listing, if you are running the CPU example from scratch:
+
+```
+# location where data was downloaded 
+export DATA_PATH=hdfs:/tmp/xgboost4j_spark/data
+
+# location where required jar were downloaded
+export JARS_PATH=hdfs:/tmp/xgboost4j_spark/jars
+
+# spark deploy mode (see Apache Spark documentation for more information) 
+export SPARK_DEPLOY_MODE=cluster
+
+# run a single executor for this example to limit the number of spark tasks and
+# partitions to 1 as currently this number must match the number of input files
+export SPARK_NUM_EXECUTORS=1
+
+# spark driver memory
+export SPARK_DRIVER_MEMORY=4g
+
+# spark executor memory
+export SPARK_EXECUTOR_MEMORY=8g
+
+# example class to use
+export EXAMPLE_CLASS=ai.rapids.spark.examples.mortgage.CPUMain
+
+# XGBoost4J example jar (holds example classes):
+export JAR_EXAMPLE=${JARS_PATH}/sample_xgboost_apps-0.1.4-jar-with-dependencies.jar
+
+# tree construction algorithm
+export TREE_METHOD=hist
+```
+
+This is the same command as for the GPU example, repeated for convenience:
+
+```
+${SPARK_HOME}/bin/spark-submit                                                  \
+ --master yarn                                                                  \
+ --deploy-mode ${SPARK_DEPLOY_MODE}                                             \
+ --num-executors ${SPARK_NUM_EXECUTORS}                                         \
+ --driver-memory ${SPARK_DRIVER_MEMORY}                                         \
+ --executor-memory ${SPARK_EXECUTOR_MEMORY}                                     \
+ --class ${EXAMPLE_CLASS}                                                       \
+ ${JAR_EXAMPLE}                                                                 \
+ -trainDataPath=${DATA_PATH}/mortgage/csv/train/mortgage_train_merged.csv       \
+ -evalDataPath=${DATA_PATH}/mortgage/csv/test/mortgage_eval_merged.csv          \
+ -format=csv                                                                    \
+ -numWorkers=${SPARK_NUM_EXECUTORS}                                             \
+ -treeMethod=${TREE_METHOD}                                                     \
+ -numRound=100                                                                  \
+ -maxDepth=8                                                                    
+```
+
+In the `stdout` driver log, you should see timings<sup>*</sup> (in seconds), and the RMSE accuracy metric:
+
+```
+--------------
+==> Benchmark: Elapsed time for [train]: 286.398s
+--------------
+
+--------------
+==> Benchmark: Elapsed time for [transform]: 49.836s
+--------------
+
+------Accuracy of Evaluation------
+0.9873709530950067 
+```
+
+<sup>*</sup> The timings in this Getting Started guide are only illustrative.