Merge branch 'branch-21.10' of github.com:rapidsai/cudf into fea-noatomic-groupbyreduce
karthikeyann committed Aug 27, 2021
2 parents aea6886 + 4d8e401 commit 5bd4321
Showing 245 changed files with 10,348 additions and 4,296 deletions.
299 changes: 176 additions & 123 deletions CONTRIBUTING.md
@@ -55,99 +55,25 @@ implementation of the issue, ask them in the issue instead of the PR.

The following instructions are for developers and contributors to cuDF OSS development. These instructions are tested on Linux Ubuntu 16.04 & 18.04. Use these instructions to build cuDF from source and contribute to its development. Other operating systems may be compatible, but are not currently tested.

### Code Formatting

#### Python

cuDF uses [Black](https://black.readthedocs.io/en/stable/),
[isort](https://readthedocs.org/projects/isort/), and
[flake8](http://flake8.pycqa.org/en/latest/) to ensure a consistent code format
throughout the project. `Black`, `isort`, and `flake8` can be installed with
`conda` or `pip`:

```bash
conda install black isort flake8
```

```bash
pip install black isort flake8
```

These tools are used to auto-format the Python code, as well as check the Cython
code in the repository. Additionally, there is a CI check in place to enforce
that committed code follows our standards. You can use the tools to
automatically format your python code by running:

```bash
isort --atomic python/**/*.py
black python
```
### General requirements

and then check the syntax of your Python and Cython code by running:

```bash
flake8 python
flake8 --config=python/.flake8.cython
```

Additionally, many editors have plugins that will apply `isort` and `Black` as
you edit files, as well as use `flake8` to report any style / syntax issues.
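
One thing worth knowing about the `isort` command above: in bash, `**` only recurses into subdirectories when the `globstar` option is enabled (bash 4 or later); otherwise it behaves like a single `*`. The sketch below demonstrates the difference on a temporary directory. The function names and the directory layout are made up for the demonstration and are not part of cuDF.

```bash
#!/usr/bin/env bash
# Sketch: count what `python/**/*.py` matches with and without globstar.

# Build a throwaway tree with one shallow and one nested .py file.
make_demo_tree() {
  local dir
  dir="$(mktemp -d)"
  mkdir -p "$dir/python/pkg/sub"
  touch "$dir/python/pkg/a.py" "$dir/python/pkg/sub/b.py"
  printf '%s\n' "$dir"
}

# count_py_files DIR MODE  (MODE is "globstar" or anything else)
count_py_files() {
  # The subshell keeps the cd and shopt changes away from the caller.
  (
    cd "$1" || exit 1
    shopt -s nullglob
    if [ "$2" = "globstar" ]; then
      shopt -s globstar
    else
      shopt -u globstar
    fi
    files=(python/**/*.py)
    echo "${#files[@]}"
  )
}

demo_dir="$(make_demo_tree)"
echo "without globstar: $(count_py_files "$demo_dir" plain) file(s)"
echo "with globstar:    $(count_py_files "$demo_dir" globstar) file(s)"
rm -rf "$demo_dir"
```

If the two counts differ in your shell, running `shopt -s globstar` before the `isort` command (or using a shell such as zsh, where `**` recurses by default) should make the pattern reach nested packages.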

#### C++/CUDA

cuDF uses [`clang-format`](https://clang.llvm.org/docs/ClangFormat.html)

In order to format the C++/CUDA files, navigate to the root (`cudf`) directory and run:
```
python3 ./cpp/scripts/run-clang-format.py -inplace
```

Additionally, many editors have plugins or extensions that you can set up to automatically run `clang-format` either manually or on file save.

#### Pre-commit hooks

Optionally, you may wish to set up [pre-commit hooks](https://pre-commit.com/)
to automatically run `isort`, `Black`, `flake8` and `clang-format` when you make a git commit.
This can be done by installing `pre-commit` via `conda` or `pip`:

```bash
conda install -c conda-forge pre_commit
```

```bash
pip install pre-commit
```

and then running:

```bash
pre-commit install
```

from the root of the cuDF repository. Now `isort`, `Black`, `flake8` and `clang-format` will be
run each time you commit changes.

### Get libcudf Dependencies

Compiler requirements:
Compilers:

* `gcc` version 9.3+
* `nvcc` version 11.0+
* `cmake` version 3.20.1+

CUDA/GPU requirements:
CUDA/GPU:

* CUDA 11.0+
* NVIDIA driver 450.80.02+
* Pascal architecture or better

You can obtain CUDA from [https://developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads).
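
As a rough way to check the toolchain minimums listed above, the following sketch compares the installed versions against them. The helper names `version_ge` and `check_tool` are illustrative and not part of cuDF. The comparison assumes GNU `sort -V`, and the version parsing simply takes the first dotted number each tool prints, which may not hold for every distribution.

```bash
#!/usr/bin/env bash
# Sketch only: compare dotted version strings using GNU `sort -V`.

# version_ge HAVE WANT -> succeeds when HAVE is at least WANT.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME WANT VERSION_COMMAND...
# Prints a one-line report; never aborts the script.
check_tool() {
  local name="$1" want="$2" have
  shift 2
  if ! command -v "$name" >/dev/null 2>&1; then
    echo "$name: not found (need $want+)"
    return 0
  fi
  # Take the first dotted number printed by the tool's version command.
  have="$("$@" 2>/dev/null | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)"
  if [ -n "$have" ] && version_ge "$have" "$want"; then
    echo "$name: $have (ok, need $want+)"
  else
    echo "$name: ${have:-unknown} (need $want+)"
  fi
}

# Minimums copied from the requirements list above.
check_tool gcc   9.3    gcc --version
check_tool nvcc  11.0   nvcc --version
check_tool cmake 3.20.1 cmake --version
```

The driver version and GPU architecture are not covered here; `nvidia-smi` reports the installed driver version.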

## Script to build cuDF from source

### Build from Source

To install cuDF from source, ensure the dependencies are met and follow the steps below:
### Create the build environment

- Clone the repository and submodules
```bash
@@ -166,86 +92,147 @@ conda activate cudf_dev
```
- For other CUDA versions, check the corresponding cudf_dev_cuda*.yml file in conda/environments

- Build and install `libcudf` after its dependencies. CMake depends on the `nvcc` executable being on your path or defined in `$CUDACXX`.
### Build cuDF from source

- A `build.sh` script is provided in `$CUDF_HOME`. Running the script with no additional arguments will install the `libcudf`, `cudf` and `dask_cudf` libraries. By default, the libraries are installed to the `$CONDA_PREFIX` directory. To install into a different location, set the location in `$INSTALL_PREFIX`. Finally, note that the script depends on the `nvcc` executable being on your path, or defined in `$CUDACXX`.
```bash
cd $CUDF_HOME

# Choose one of the following commands, depending on whether
# you want to build and install the libcudf C++ library only,
# or include the cudf and/or dask_cudf Python libraries:

./build.sh # libcudf, cudf and dask_cudf
./build.sh libcudf # libcudf only
./build.sh libcudf cudf # libcudf and cudf only
```
- Other libraries like `cudf-kafka` and `custreamz` can be installed with this script. For the complete list of libraries as well as details about the script usage, run the `help` command:
```bash
./build.sh --help
```
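
The install-location rule described above can be sketched as a small shell function. This illustrates the documented behaviour only and is not the code in `build.sh`; `resolve_install_prefix` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch of the documented rule: $INSTALL_PREFIX wins when set,
# otherwise fall back to $CONDA_PREFIX. Not taken from build.sh.
resolve_install_prefix() {
  if [ -n "${INSTALL_PREFIX:-}" ]; then
    printf '%s\n' "$INSTALL_PREFIX"
  elif [ -n "${CONDA_PREFIX:-}" ]; then
    printf '%s\n' "$CONDA_PREFIX"
  else
    echo "neither INSTALL_PREFIX nor CONDA_PREFIX is set" >&2
    return 1
  fi
}

# Print where a build started from this shell would be installed.
if prefix="$(resolve_install_prefix)"; then
  echo "install location: $prefix"
fi
```

Running this before `./build.sh` is a quick way to confirm which directory the libraries would land in, for example after `export INSTALL_PREFIX=/install/path`.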

### Build, install and test cuDF libraries for contributors

The general workflow is provided below. Please also see the last section on [code formatting](#code-formatting).

#### `libcudf` (C++)

If you're only interested in building the library (and not the unit tests):

```bash
cd $CUDF_HOME
./build.sh libcudf
```
If, in addition, you want to build tests:

```bash
./build.sh libcudf tests
```
To run the tests:

```bash
$ cd $CUDF_HOME/cpp # navigate to C/C++ CUDA source root directory
$ mkdir build # make a build directory
$ cd build # enter the build directory
make test
```

# CMake options:
# -DCMAKE_INSTALL_PREFIX set to the install path for your libraries or $CONDA_PREFIX if you're using Anaconda, i.e. -DCMAKE_INSTALL_PREFIX=/install/path or -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX
$ cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX # configure cmake ...
$ make -j # compile the libraries librmm.so, libcudf.so ... '-j' will start a parallel job using the number of physical cores available on your system
$ make install # install the libraries librmm.so, libcudf.so to the CMAKE_INSTALL_PREFIX
#### `cudf` (Python)

- First, build the `libcudf` C++ library following the steps above

- To build and install the `cudf` Python package in edit/develop mode:
```bash
cd $CUDF_HOME/python/cudf
python setup.py build_ext --inplace
python setup.py develop
```

- As a convenience, a `build.sh` script is provided in `$CUDF_HOME`. To execute the same build commands above, run the script as shown below. Note that the libraries will be installed to the location set in `$INSTALL_PREFIX` if set (i.e. `export INSTALL_PREFIX=/install/path`), otherwise to `$CONDA_PREFIX`.
- To run the `cudf` tests:
```bash
$ cd $CUDF_HOME
$ ./build.sh # To build both C++ and Python cuDF versions with their dependencies
cd $CUDF_HOME/python
py.test -v cudf/cudf/tests
```
- To build only the C++ component with the script

#### `dask-cudf` (Python)

- First, build the `libcudf` C++ and `cudf` Python libraries following the steps above

- To install the `dask-cudf` Python package in edit/develop mode:
```bash
$ ./build.sh libcudf # Build only the cuDF C++ components and install them to $INSTALL_PREFIX if set, otherwise $CONDA_PREFIX
cd $CUDF_HOME/python/dask_cudf
python setup.py build_ext --inplace
python setup.py develop
```

- To run tests (Optional):
- To run the `dask_cudf` tests:
```bash
$ make test
cd $CUDF_HOME/python
py.test -v dask_cudf
```
- Build the `cudf` python package, in the `python/cudf` folder:

#### `libcudf_kafka` (C++)

If you're only interested in building the library (and not the unit tests):

```bash
$ cd $CUDF_HOME/python/cudf
$ python setup.py build_ext --inplace
$ python setup.py install
cd $CUDF_HOME
./build.sh libcudf_kafka
```
If, in addition, you want to build tests:

- Like the `libcudf` build step above, `build.sh` can also be used to build the `cudf` python package, as shown below:
```bash
$ cd $CUDF_HOME
$ ./build.sh cudf
./build.sh libcudf_kafka tests
```
To run the tests:

- Additionally to build the `dask-cudf` python package, in the `python/dask_cudf` folder:
```bash
$ cd $CUDF_HOME/python/dask_cudf
$ python setup.py install
make test
```

- The `build.sh` script can also be used to build the `dask-cudf` python package, as shown below:
#### `cudf-kafka` (Python)

- First, build `libcudf` and `libcudf_kafka` following the steps above

- To install the `cudf-kafka` Python package in edit/develop mode:
```bash
$ cd $CUDF_HOME
$ ./build.sh dask_cudf
cd $CUDF_HOME/python/cudf_kafka
python setup.py build_ext --inplace
python setup.py develop
```

- To run Python tests (Optional):
#### `custreamz` (Python)

- First, build `libcudf`, `libcudf_kafka`, and `cudf_kafka` following the steps above

- To install the `custreamz` Python package in edit/develop mode:
```bash
$ cd $CUDF_HOME/python
$ py.test -v cudf # run cudf test suite
$ py.test -v dask_cudf # run dask_cudf test suite
cd $CUDF_HOME/python/custreamz
python setup.py build_ext --inplace
python setup.py develop
```

- Other `build.sh` options:
- To run the `custreamz` tests:
```bash
$ cd $CUDF_HOME
$ ./build.sh clean # remove any prior build artifacts and configuration (start over)
$ ./build.sh libcudf -v # compile and install libcudf with verbose output
$ ./build.sh libcudf -g # compile and install libcudf for debug
$ PARALLEL_LEVEL=4 ./build.sh libcudf # compile and install libcudf limiting parallel build jobs to 4 (make -j4)
$ ./build.sh libcudf -n # compile libcudf but do not install
cd $CUDF_HOME/python
py.test -v custreamz
```

Done! You are ready to develop for the cuDF OSS project.
#### `cudf` (Java)

- First, build the `libcudf` C++ library following the steps above

- Then, refer to [Java README](https://github.com/rapidsai/cudf/blob/branch-21.10/java/README.md)


Done! You are ready to develop for the cuDF OSS project. Before contributing, please read the [code formatting](#code-formatting) section to ensure that your code follows the expected format.

## Debugging cuDF

### Building Debug mode from source

Follow the [above instructions](#build-from-source) to build from source and add `-DCMAKE_BUILD_TYPE=Debug` to the `cmake` step.
Follow the [above instructions](#build-cudf-from-source) to build from source and add `-g` to the `./build.sh` command.

For example:
```bash
$ cmake .. -DCMAKE_INSTALL_PREFIX=/install/path -DCMAKE_BUILD_TYPE=Debug # configure cmake ... use -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX if you're using Anaconda
./build.sh libcudf -g
```

This builds `libcudf` in Debug mode which enables some `assert` safety checks and includes symbols in the library for debugging.
@@ -289,6 +276,7 @@ You can then use `cuda-dbg` to debug into the kernels in that source file.
Before submitting a pull request, you can do a local build and test on your machine that mimics our gpuCI environment using the `ci/local/build.sh` script.
For detailed information on usage of this script, see [here](ci/local/README.md).


## Automated Build in Docker Container

A Dockerfile is provided with a preconfigured conda environment for building and installing cuDF from source based off of the main branch.
@@ -303,11 +291,11 @@ A Dockerfile is provided with a preconfigured conda environment for building and

From cudf project root run the following, to build with defaults:
```bash
$ docker build --tag cudf .
docker build --tag cudf .
```
After the container is built run the container:
```bash
$ docker run --runtime=nvidia -it cudf bash
docker run --runtime=nvidia -it cudf bash
```
Activate the conda environment `cudf` to use the newly built cuDF and libcudf libraries:
```
@@ -337,6 +325,71 @@ flag. Below is a list of the available arguments and their purpose:
| `CYTHON_VERSION` | 0.29 | Not supported | set Cython version |
| `PYTHON_VERSION` | 3.7 | 3.8 | set python version |


### Code Formatting


#### Python

cuDF uses [Black](https://black.readthedocs.io/en/stable/),
[isort](https://readthedocs.org/projects/isort/), and
[flake8](http://flake8.pycqa.org/en/latest/) to ensure a consistent code format
throughout the project. They have been installed during the `cudf_dev` environment creation.

These tools are used to auto-format the Python code, as well as check the Cython
code in the repository. Additionally, there is a CI check in place to enforce
that committed code follows our standards. You can use the tools to
automatically format your python code by running:

```bash
isort --atomic python/**/*.py
black python
```

and then check the syntax of your Python and Cython code by running:

```bash
flake8 python
flake8 --config=python/.flake8.cython
```

Additionally, many editors have plugins that will apply `isort` and `Black` as
you edit files, as well as use `flake8` to report any style / syntax issues.

#### C++/CUDA

cuDF uses [`clang-format`](https://clang.llvm.org/docs/ClangFormat.html)

In order to format the C++/CUDA files, navigate to the root (`cudf`) directory and run:
```
python3 ./cpp/scripts/run-clang-format.py -inplace
```

Additionally, many editors have plugins or extensions that you can set up to automatically run `clang-format` either manually or on file save.

#### Pre-commit hooks

Optionally, you may wish to set up [pre-commit hooks](https://pre-commit.com/)
to automatically run `isort`, `Black`, `flake8` and `clang-format` when you make a git commit.
This can be done by installing `pre-commit` via `conda` or `pip`:

```bash
conda install -c conda-forge pre_commit
```

```bash
pip install pre-commit
```

and then running:

```bash
pre-commit install
```

from the root of the cuDF repository. Now `isort`, `Black`, `flake8` and `clang-format` will be
run each time you commit changes.

---

## Attribution
2 changes: 1 addition & 1 deletion ci/gpu/build.sh
@@ -80,7 +80,7 @@ gpuci_mamba_retry install -y \
"rapids-notebook-env=$MINOR_VERSION.*" \
"dask-cuda=${MINOR_VERSION}" \
"rmm=$MINOR_VERSION.*" \
"ucx-py=0.21.*"
"ucx-py=0.22.*"

# https://docs.rapids.ai/maintainers/depmgmt/
# gpuci_mamba_retry remove --force rapids-build-env rapids-notebook-env