From 0f6aef1ffcd4c145e695b2b485b6917089660590 Mon Sep 17 00:00:00 2001 From: Salonijain27 Date: Fri, 3 May 2019 07:51:49 -0500 Subject: [PATCH] Update README.md --- README.md | 307 ++++++------------------------------------------------ 1 file changed, 31 insertions(+), 276 deletions(-) diff --git a/README.md b/README.md index ad996e7e89c..e5e8c8bf367 100644 --- a/README.md +++ b/README.md @@ -4,37 +4,54 @@ The [RAPIDS](https://rapids.ai) cuGraph library is a collection of graph analytics that process data found in GPU Dataframes - see [cuDF](https://github.com/rapidsai/cudf). cuGraph aims to provide a NetworkX-like API that will be familiar to data scientists, so they can now build GPU-accelerated workflows more easily. - For more project details, see [rapids.ai](https://rapids.ai/). - -**NOTE:** For the latest stable [README.md](https://github.com/rapidsai/cudf/blob/master/README.md) ensure you are on the `master` branch. - - - +For example, the following snippet downloads a CSV file containing friendship information of students in a karate dojo, then uses the GPU to parse it into rows and columns and greate a graph. This graph is then used to run cugraph's pagerank algorithm to find the student/students who are the most popular. +```python +import cugraph, cudf, requests +from collections import OrderedDict +from io import StringIO + +# read the data into a cuDF DataFrame +url = "https://raw.githubusercontent.com/rapidsai/cugraph/branch-0.7/datasets/karate.csv" +content = requests.get(url).content.decode('utf-8') +gdf = cudf.read_csv(StringIO(content), names=["subject", "friend"], + delimiter=' ', dtype=['int32', 'int32']) + +# create graph with nodes being unique students and edges representing friendship +G = cugraph.Graph() +G.add_edge_list(gdf["subject"], gdf["friend"]) +# apply the pagerank algorithm to the graph +results = cugraph.pagerank(G) + +# Find the most connected student(s) using the scores from the pagerank algorithm: +max_score = results['pagerank'].max() +popular_subject = [i for i in range(len(results)) + if results['pagerank'][i] == max_score] +print("Most connected student(s): " + str(popular_subject) + " have a score of: " + str(max_score)) +``` +Output: +``` +Most connected student(s): [33] have a score of: 0.10091735 +``` +For additional examples, browse our complete [API documentation](https://docs.rapids.ai/api/cugraph/stable/), or check out our more detailed [notebooks](https://github.com/rapidsai/notebooks-extended). +**NOTE:** For the latest stable [README.md](https://github.com/rapidsai/cudf/blob/master/README.md) ensure you are on the `master` branch. ## Getting cuGraph + ### Intro There are 4 ways to get cuGraph : 1. [Quick start with Docker Demo Repo](#quick) 1. [Conda Installation](#conda) 1. [Build from Source](#source) - - -Building from source is currently the only viable option. Once version 0.6 is release, the other options will be available. - - - ## Quick Start Please see the [Demo Docker Repository](https://hub.docker.com/r/rapidsai/rapidsai/), choosing a tag based on the NVIDIA CUDA version you’re running. This provides a ready to run Docker container with example notebooks and data, showcasing how you can utilize all of the RAPIDS libraries: cuDF, cuML, and cuGraph. - - ### Conda @@ -53,272 +70,10 @@ conda install -c nvidia/label/cuda10.0 -c rapidsai/label/cuda10.0 -c numba -c co Note: This conda installation only applies to Linux and Python versions 3.6/3.7. - - -### Build from Source - -The following instructions are for developers and contributors to cuGraph OSS development. These instructions are tested on Linux Ubuntu 16.04 & 18.04. Use these instructions to build cuGraph from source and contribute to its development. Other operating systems may be compatible, but are not currently tested. - -The cuGraph package include both a C/C++ CUDA portion and a python portion. Both libraries need to be installed in order for cuGraph to operate correctly. - -The following instructions are tested on Linux systems. - - - -#### Prerequisites - -Compiler requirement: - -* `gcc` version 5.4+ -* `nvcc` version 9.2 -* `cmake` version 3.12 - - - -CUDA requirement: - -* CUDA 9.2+ -* NVIDIA driver 396.44+ -* Pascal architecture or better - -You can obtain CUDA from [https://developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads). - - - -Since `cmake` will download and build Apache Arrow you may need to install Boost C++ (version 1.58+) before running -`cmake`: - -```bash -# Install Boost C++ for Ubuntu 16.04/18.04 -$ sudo apt-get install libboost-all-dev -``` - -or - -```bash -# Install Boost C++ for Conda -$ conda install -c conda-forge boost -``` - - - -#### Build and Install the C/C++ CUDA components - -To install cuGraph from source, ensure the dependencies are met and follow the steps below: - -1) Clone the repository and submodules - - ```bash - # Set the localtion to cuGraph in an environment variable CUGRAPH_HOME - export CUGRAPH_HOME=$(pwd)/cugraph - - # Download the cuGraph repo - git clone https://github.com/rapidsai/cugraph.git $CUGRAPH_HOME - - # Next load all the submodules - cd $CUGRAPH_HOME - git submodule update --init --recursive - ``` - - - -2) Create the conda development environment - - -```bash -# create the conda environment (assuming in base `cugraph` directory) -# for CUDA 9.2 -conda env create --name cugraph_dev --file conda/environments/cugraph_dev.yml - -# for CUDA 10 -conda env create --name cugraph_dev --file conda/environments/cugraph_dev_cuda10.yml - -# activate the environment -conda activate cugraph_dev - -# to deactivate an environment -conda deactivate -``` - - - - - The environment can be updated as development includes/changes the dependencies. To do so, run: - - - -```bash -# for CUDA 9.2 -conda env update --name cugraph_dev --file conda/environments/cugraph_dev.yml - -# for CUDA 10 -conda env update --name cugraph_dev --file conda/environments/cugraph_dev_cuda10.yml - -conda activate cugraph_dev -``` - - - - - -3) Build and install `libcugraph`. CMake depends on the `nvcc` executable being on your path or defined in `$CUDACXX`. - - This project uses cmake for building the C/C++ library. CMake will also automatically build and install nvGraph library (`$CUGRAPH_HOME/cpp/nvgraph`) which may take a few minutes. To configure cmake, run: - - ```bash - # Set the localtion to cuGraph in an environment variable CUGRAPH_HOME - export CUGRAPH_HOME=$(pwd)/cugraph - - cd $CUGRAPH_HOME - cd cpp # enter cpp directory - mkdir build # create build directory - cd build # enter the build directory - cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX - - # now build the code - make -j # "-j" starts multiple threads - make install # install the libraries - ``` - -The default installation locations are `$CMAKE_INSTALL_PREFIX/lib` and `$CMAKE_INSTALL_PREFIX/include/cugraph` respectively. - - - -#### Building and installing the Python package - -5. Install the Python package to your Python path: - -```bash -cd $CUGRAPH_HOME -cd python -python setup.py install # install cugraph python bindings -``` - - - - - -#### Run tests - -6. Run either the C++ or the Python tests with datasets - - - **Python tests with datasets** - - ```bash - cd $CUGRAPH_HOME - cd python - pytest - ``` - - **C++ stand alone tests** - - From the build directory : - - ```bash - # Run the cugraph tests - cd $CUGRAPH_HOME - cd cpp/build - gtests/GDFGRAPH_TEST # this is an executable file - ``` - - **C++ tests with larger datasets** - - If you already have the datasets: - - ```bash - export RAPIDS_DATASET_ROOT_DIR= - ``` - If you do not have the datasets: - - ```bash - cd $CUGRAPH_HOME/datasets - source get_test_data.sh #This takes about 10 minutes and download 1GB data (>5 GB uncompressed) - ``` - - Run the C++ tests on large input: - - ```bash - cd $CUGRAPH_HOME/cpp/build - #test one particular analytics (eg. pagerank) - gtests/PAGERANK_TEST - #test everything - make test - ``` - -Note: This conda installation only applies to Linux and Python versions 3.6/3.7. - - - -## Documentation - -Python API documentation can be generated from [docs](docs) directory. - - - -## C++ ABI issues - -cuGraph builds with C++14 features. By default, we build cuGraph with the latest ABI (the ABI changed with C++11). The version of cuDF pointed to in the conda installation above is build with the new ABI. - -If you see link errors indicating trouble finding functions that use C++ strings when trying to build cuGraph you may have an ABI incompatibility. - -There are a couple of complications that may make this a problem: -* if you need to link in a library built with the old ABI, you may need to build the entire tool chain from source using the old ABI. -* if you build cudf from source (for whatever reason), the default behavior for cudf (at least through version 0.5.x) is to build using the old ABI. You can build with the new ABI, but you need to follow the instructions in CUDF to explicitly turn that on. - -If you must build cugraph with the old ABI, you can use the following command (instead of the cmake call above): - -```bash -cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX -DCMAKE_CXX11_ABI=OFF -``` - - - -### (OPTIONAL) Set environment variable on activation - -It is possible to configure the conda environment to set environmental variables on activation. Providing instructions to set PATH to include the CUDA toolkit bin directory and LD_LIBRARY_PATH to include the CUDA lib64 directory will be helpful. - - - -```bash -cd ~/anaconda3/envs/cugraph_dev - -mkdir -p ./etc/conda/activate.d -mkdir -p ./etc/conda/deactivate.d -touch ./etc/conda/activate.d/env_vars.sh -touch ./etc/conda/deactivate.d/env_vars.sh -``` - - - -Next the env_vars.sh file needs to be edited - -```bash -vi ./etc/conda/activate.d/env_vars.sh - -#!/bin/bash -export PATH=/usr/local/cuda-10.0/bin:$PATH # or cuda-9.2 if using CUDA 9.2 -export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH # or cuda-9.2 if using CUDA 9.2 -``` - - - -``` -vi ./etc/conda/deactivate.d/env_vars.sh - -#!/bin/bash -unset PATH -unset LD_LIBRARY_PATH -``` - - - -## nvGraph - -The nvGraph library is now open source and part of cuGraph. It can be build as a stand alone by following nvgraph's [readme](cpp/nvgraph/). - ------ - - ##
Open GPU Data Science The RAPIDS suite of open source software libraries aim to enable execution of end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposing that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.