Restructuring Contributing doc [skip ci] #9026

Merged: 30 commits, Aug 26, 2021
Changes from 22 commits

Commits
d9ed1ba
doc restructuration
iskode Aug 12, 2021
23ea87c
dependency + build-from-source updates
iskode Aug 13, 2021
0a0b0ab
emphasis on env files
iskode Aug 13, 2021
edf19e9
build-from-source instruction updates
iskode Aug 13, 2021
69eb32a
Merge branch 'contributing-doc' of github.com:iskode/cudf into contri…
iskode Aug 13, 2021
e03693d
Update `Python cudf contributor` title
iskode Aug 13, 2021
ec951fb
workflow paragraph update
iskode Aug 13, 2021
6e1d901
update `Any contributor` title
iskode Aug 13, 2021
45b0169
remove `build` in `dask-cudf` installation paragraph
iskode Aug 13, 2021
f4e9451
Remove `cd $CUDF_HOME` in build-from-source
iskode Aug 13, 2021
1ea8cdb
update other libraries paragraph in build-from-source section
iskode Aug 13, 2021
f8f37db
emphasis on cudf_dev_cuda*.yml
iskode Aug 13, 2021
5900635
Merge branch 'contributing-doc' of github.com:iskode/cudf into contri…
iskode Aug 13, 2021
0cd676a
add step in dask_cudf install
iskode Aug 13, 2021
e68ed18
add kafka and custreamz sections
iskode Aug 17, 2021
f7515cb
add section for each library
iskode Aug 17, 2021
d3073d0
remove `requirements` in cudf installation
iskode Aug 26, 2021
7e167ae
remove folder specification in cudf installation
iskode Aug 26, 2021
2859c1d
remove contributor label in C++ libcudf install instructions
iskode Aug 26, 2021
f804fbe
remove contributor label in dask-cudf install instructions
iskode Aug 26, 2021
df483bf
add `installing` in cudf section title
iskode Aug 26, 2021
e7159b1
reformulate requirement instructions in all subsections
iskode Aug 26, 2021
4d24281
remove language names
iskode Aug 26, 2021
f96e46e
remove contributor label in libcudf_kafka section
iskode Aug 26, 2021
793867a
reorganize C++ libraries' instructions
iskode Aug 26, 2021
98de65a
refactor subsection titles and remove redundant words
iskode Aug 26, 2021
dd93ade
fix typo
iskode Aug 26, 2021
f792bf2
remove $ in all bash codes
iskode Aug 26, 2021
7a628ac
correct cases
iskode Aug 26, 2021
9706118
correct cases
iskode Aug 26, 2021
283 changes: 163 additions & 120 deletions CONTRIBUTING.md
@@ -55,99 +55,25 @@ implementation of the issue, ask them in the issue instead of the PR.

The following instructions are for developers and contributors to cuDF OSS development. These instructions are tested on Linux Ubuntu 16.04 & 18.04. Use these instructions to build cuDF from source and contribute to its development. Other operating systems may be compatible, but are not currently tested.

### Code Formatting

#### Python

cuDF uses [Black](https://black.readthedocs.io/en/stable/),
[isort](https://readthedocs.org/projects/isort/), and
[flake8](http://flake8.pycqa.org/en/latest/) to ensure a consistent code format
throughout the project. `Black`, `isort`, and `flake8` can be installed with
`conda` or `pip`:

```bash
conda install black isort flake8
```

```bash
pip install black isort flake8
```

These tools are used to auto-format the Python code, as well as check the Cython
code in the repository. Additionally, there is a CI check in place to enforce
that committed code follows our standards. You can use the tools to
automatically format your python code by running:

```bash
isort --atomic python/**/*.py
black python
```

and then check the syntax of your Python and Cython code by running:

```bash
flake8 python
flake8 --config=python/.flake8.cython
```

Additionally, many editors have plugins that will apply `isort` and `Black` as
you edit files, as well as use `flake8` to report any style / syntax issues.

#### C++/CUDA

cuDF uses [`clang-format`](https://clang.llvm.org/docs/ClangFormat.html)
### General requirements

In order to format the C++/CUDA files, navigate to the root (`cudf`) directory and run:
```
python3 ./cpp/scripts/run-clang-format.py -inplace
```

Additionally, many editors have plugins or extensions that you can set up to automatically run `clang-format` either manually or on file save.

#### Pre-commit hooks

Optionally, you may wish to setup [pre-commit hooks](https://pre-commit.com/)
to automatically run `isort`, `Black`, `flake8` and `clang-format` when you make a git commit.
This can be done by installing `pre-commit` via `conda` or `pip`:

```bash
conda install -c conda-forge pre_commit
```

```bash
pip install pre-commit
```

and then running:

```bash
pre-commit install
```

from the root of the cuDF repository. Now `isort`, `Black`, `flake8` and `clang-format` will be
run each time you commit changes.

### Get libcudf Dependencies

Compiler requirements:
Compilers:

* `gcc` version 9.3+
* `nvcc` version 11.0+
* `cmake` version 3.20.1+

CUDA/GPU requirements:
CUDA/GPU:

* CUDA 11.0+
* NVIDIA driver 450.80.02+
* Pascal architecture or better

You can obtain CUDA from [https://developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads).
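
As a rough way to check a machine against the minimums listed above, the sketch below compares the versions reported by `gcc`, `nvcc`, and `cmake` with those thresholds. The helper names (`version_ge`, `first_version`, `check_tool`) are illustrative and not part of cuDF, and the comparison assumes GNU `sort -V` is available, as it normally is on the Ubuntu releases mentioned earlier.

```bash
# Illustrative helpers (not part of cuDF); assumes GNU `sort -V`.

# version_ge FOUND MINIMUM: succeed when FOUND is at least MINIMUM.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# first_version: print the first dotted version number read from stdin.
first_version() {
    grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# check_tool NAME MINIMUM FOUND: print a one-line verdict for one tool.
check_tool() {
    if [ -z "$3" ]; then
        echo "$1: not found (need $2+)"
    elif version_ge "$3" "$2"; then
        echo "$1: $3 OK (need $2+)"
    else
        echo "$1: $3 is too old (need $2+)"
    fi
}

# Minimums are the ones listed above; a missing tool yields an empty version.
check_tool gcc   9.3    "$(gcc --version 2>/dev/null | first_version)"
check_tool nvcc  11.0   "$(nvcc --version 2>/dev/null | first_version)"
check_tool cmake 3.20.1 "$(cmake --version 2>/dev/null | first_version)"
```

The sketch does not check the NVIDIA driver version or the GPU architecture; those still need to be verified separately.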

## Script to build cuDF from source

### Build from Source

To install cuDF from source, ensure the dependencies are met and follow the steps below:
### Create the build Environment

- Clone the repository and submodules
```bash
# [diff hunk @@ -166,86 +92,137 @@: intermediate lines collapsed in the diff view]
conda activate cudf_dev
```
- For other CUDA versions, check the corresponding cudf_dev_cuda*.yml file in conda/environments

- Build and install `libcudf` after its dependencies. CMake depends on the `nvcc` executable being on your path or defined in `$CUDACXX`.
```bash
$ cd $CUDF_HOME/cpp # navigate to C/C++ CUDA source root directory
$ mkdir build # make a build directory
$ cd build # enter the build directory

# CMake options:
# -DCMAKE_INSTALL_PREFIX set to the install path for your libraries or $CONDA_PREFIX if you're using Anaconda, i.e. -DCMAKE_INSTALL_PREFIX=/install/path or -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX
$ cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX # configure cmake ...
$ make -j # compile the libraries librmm.so, libcudf.so ... '-j' will start a parallel job using the number of physical cores available on your system
$ make install # install the libraries librmm.so, libcudf.so to the CMAKE_INSTALL_PREFIX
```
### Build cuDF from Source

- As a convenience, a `build.sh` script is provided in `$CUDF_HOME`. To execute the same build commands above, run the script as shown below. Note that the libraries will be installed to the location set in `$INSTALL_PREFIX` if set (i.e. `export INSTALL_PREFIX=/install/path`), otherwise to `$CONDA_PREFIX`.
- A `build.sh` script is provided in `$CUDF_HOME`. Running the script with no additional arguments will install the `libcudf`, `cudf` and `dask_cudf` libraries. By default, the libraries are installed to the `$CONDA_PREFIX` directory. To install into a different location, set the location in `$INSTALL_PREFIX`. Finally, note that the script depends on the `nvcc` executable being on your path, or defined in `$CUDACXX`.
```bash
$ cd $CUDF_HOME
$ ./build.sh # To build both C++ and Python cuDF versions with their dependencies
cd $CUDF_HOME

# Choose one of the following commands, depending on whether
# you want to build and install the libcudf C++ library only,
# or include the cudf and/or dask_cudf Python libraries:

./build.sh # libcudf, cudf and dask_cudf
./build.sh libcudf # libcudf only
./build.sh libcudf cudf # libcudf and cudf only
```
- To build only the C++ component with the script
- Other libraries like `cudf-kafka` and `custreamz` can be installed with this script. For the complete list of libraries as well as details about the script usage, run the script with the `--help` flag:
```bash
$ ./build.sh libcudf # Build only the cuDF C++ components and install them to $INSTALL_PREFIX if set, otherwise $CONDA_PREFIX
$ ./build.sh --help
```
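
The lookup rules described above (install to `$INSTALL_PREFIX` when it is set, otherwise to `$CONDA_PREFIX`; find `nvcc` on the path or through `$CUDACXX`) can be sketched as a small pre-flight check to run before `build.sh`. This illustrates the documented behaviour only and is not the logic inside `build.sh`; the function names `resolve_install_prefix` and `resolve_nvcc` are made up for this example.

```bash
# Hypothetical pre-flight helpers; they mirror the text, not build.sh itself.

# resolve_install_prefix: print $INSTALL_PREFIX if set, else $CONDA_PREFIX.
resolve_install_prefix() {
    echo "${INSTALL_PREFIX:-${CONDA_PREFIX:-}}"
}

# resolve_nvcc: print $CUDACXX if set, else the nvcc found on PATH (if any).
resolve_nvcc() {
    if [ -n "${CUDACXX:-}" ]; then
        echo "$CUDACXX"
    else
        command -v nvcc || true
    fi
}

prefix="$(resolve_install_prefix)"
nvcc_path="$(resolve_nvcc)"
echo "install prefix: ${prefix:-<unset: set INSTALL_PREFIX or activate the conda env>}"
echo "nvcc:           ${nvcc_path:-<not found: add nvcc to PATH or set CUDACXX>}"
```

If either line reports a missing value, fixing the environment first is likely to save a failed configure step.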

- To run tests (Optional):
### Install and Test cuDF for contributors

The general workflow for building and testing the C++ and Python components of cuDF is provided below. Please also see the last section about [code formatting](#code-formatting).

#### Building and testing the `libcudf` C++ library

This section provides instructions for building and testing C++ libcudf.

```bash
$ make test
$ cd $CUDF_HOME
$ # for C++ contributors
$ ./build.sh libcudf tests # building C++ cuDF and test components
$ make test # running C++ cuDF unit tests
$ # for other contributors
$ ./build.sh libcudf

Review comment (Contributor):

Let's avoid grouping contributors into "C++", "cudf Python", etc., Many contributions span multiple components.

Instead, I think we should just provide two sets of instructions, one just for building the library, and another for building and running tests. For example, something like:

If you're only interested in building the library (and not the unit tests):

cd $CUDF_HOME
./build.sh libcudf

If, in addition, you want to build and run tests:

./build.sh libcudf tests

To run the tests:

make test

Reply (Contributor Author, @iskode, Aug 26, 2021):

I think this is already the case for Python libraries where we have one set for building in edit mode and the other for running tests. This concerns libcudf and libcudf_kafka.

```
- Build the `cudf` python package, in the `python/cudf` folder:

#### Building, installing, and testing the `cudf` Python library

- First, build the `libcudf` C++ library following the steps above

- To build and install the `cudf` Python package in edit/develop mode:
```bash
$ cd $CUDF_HOME/python/cudf
$ python setup.py build_ext --inplace
$ python setup.py install
$ python setup.py develop
```

- Like the `libcudf` build step above, `build.sh` can also be used to build the `cudf` python package, as shown below:
- To run `cudf` tests:
```bash
$ cd $CUDF_HOME
$ ./build.sh cudf
$ cd $CUDF_HOME/python
$ py.test -v cudf/cudf/tests # run cudf test suite
```

- Additionally to build the `dask-cudf` python package, in the `python/dask_cudf` folder:
#### Installing and testing the `dask-cudf` Python library

- First, build the `libcudf` C++ and `cudf` Python libraries following the steps above

- To install the `dask-cudf` Python package in edit/develop mode:
```bash
$ cd $CUDF_HOME/python/dask_cudf
$ python setup.py install
$ python setup.py build_ext --inplace
$ python setup.py develop
```

- To run `dask_cudf` tests:
```bash
$ cd $CUDF_HOME/python
$ py.test -v dask_cudf # run dask_cudf test suite
```

- The `build.sh` script can also be used to build the `dask-cudf` python package, as shown below:
#### Building and testing the `libcudf_kafka` C++ library

This section provides instructions for building and testing the `libcudf_kafka` C++ library. Contributors who work only on `libcudf_kafka` are all set after following the instructions below:

- First, build the `libcudf` C++ library following the steps above

```bash
$ cd $CUDF_HOME
$ ./build.sh dask_cudf
$ # for C++ contributors
$ ./build.sh libcudf_kafka tests # building C++ cudf_kafka and test components
$ make test # running C++ cudf_kafka unit tests
$ # for other contributors
$ ./build.sh libcudf_kafka
```

- To run Python tests (Optional):
#### Python cudf-kafka contributors:

- First, build the `libcudf` and `libcudf_kafka` C++ libraries following the steps above

- To install the `cudf-kafka` Python package in edit/develop mode:
```bash
$ cd $CUDF_HOME/python
$ py.test -v cudf # run cudf test suite
$ py.test -v dask_cudf # run dask_cudf test suite
$ cd $CUDF_HOME/python/cudf_kafka
$ python setup.py build_ext --inplace
$ python setup.py develop
```

#### Python custreamz contributors:

- First, build C++ `libcudf`, C++ `libcudf_kafka`, and Python `cudf_kafka` following the steps above

- To install the `custreamz` Python package in edit/develop mode:
```bash
$ cd $CUDF_HOME/python/custreamz
$ python setup.py build_ext --inplace
$ python setup.py develop
```

- Other `build.sh` options:
- To run `custreamz` tests:
```bash
$ cd $CUDF_HOME
$ ./build.sh clean # remove any prior build artifacts and configuration (start over)
$ ./build.sh libcudf -v # compile and install libcudf with verbose output
$ ./build.sh libcudf -g # compile and install libcudf for debug
$ PARALLEL_LEVEL=4 ./build.sh libcudf # compile and install libcudf limiting parallel build jobs to 4 (make -j4)
$ ./build.sh libcudf -n # compile libcudf but do not install
$ cd $CUDF_HOME/python
$ py.test -v custreamz # run custreamz test suite
```

Done! You are ready to develop for the cuDF OSS project.
#### Java contributors:

- First, build the `libcudf` C++ library following the steps above

Please refer to [Java README](https://github.com/rapidsai/cudf/blob/branch-21.10/java/README.md)
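
The "First, build ..." prerequisites stated in the subsections above can be summarized as a small lookup. The sketch below is only a restatement of those prerequisites; `build_order` is a hypothetical helper, and its component labels are taken from the section text rather than being guaranteed `build.sh` arguments.

```bash
# Hypothetical helper: print a component's prerequisites, in build order,
# followed by the component itself. Labels mirror the subsections above.
build_order() {
    case "$1" in
        libcudf)       echo "libcudf" ;;
        cudf)          echo "libcudf cudf" ;;
        dask_cudf)     echo "libcudf cudf dask_cudf" ;;
        libcudf_kafka) echo "libcudf libcudf_kafka" ;;
        cudf_kafka)    echo "libcudf libcudf_kafka cudf_kafka" ;;
        custreamz)     echo "libcudf libcudf_kafka cudf_kafka custreamz" ;;
        java)          echo "libcudf java" ;;
        *)             echo "unknown component: $1" >&2; return 1 ;;
    esac
}

build_order custreamz   # prints: libcudf libcudf_kafka cudf_kafka custreamz
```

Keeping the chain in one place makes it easier to see that every component ultimately depends on `libcudf`.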


Done! You are ready to develop for the cuDF OSS project. But please go to [code formatting](#code-formatting) to ensure that the code you contribute follows the expected format.

## Debugging cuDF

### Building Debug mode from source

Follow the [above instructions](#build-from-source) to build from source and add `-DCMAKE_BUILD_TYPE=Debug` to the `cmake` step.
Follow the [above instructions](#build-cudf-from-source) to build from source and add `-g` to the `./build.sh` command.

For example:
```bash
$ cmake .. -DCMAKE_INSTALL_PREFIX=/install/path -DCMAKE_BUILD_TYPE=Debug # configure cmake ... use -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX if you're using Anaconda
$ ./build.sh libcudf -g
```

This builds `libcudf` in Debug mode which enables some `assert` safety checks and includes symbols in the library for debugging.
@@ -289,6 +266,7 @@ You can then use `cuda-gdb` to debug into the kernels in that source file.
Before submitting a pull request, you can do a local build and test on your machine that mimics our gpuCI environment using the `ci/local/build.sh` script.
For detailed information on usage of this script, see [here](ci/local/README.md).


## Automated Build in Docker Container

A Dockerfile is provided with a preconfigured conda environment for building and installing cuDF from source based off of the main branch.
@@ -337,6 +315,71 @@ flag. Below is a list of the available arguments and their purpose:
| `CYTHON_VERSION` | 0.29 | Not supported | set Cython version |
| `PYTHON_VERSION` | 3.7 | 3.8 | set python version |


### Code Formatting


#### Python

cuDF uses [Black](https://black.readthedocs.io/en/stable/),
[isort](https://readthedocs.org/projects/isort/), and
[flake8](http://flake8.pycqa.org/en/latest/) to ensure a consistent code format
throughout the project. They have been installed during the `cudf_dev` environment creation.

These tools are used to auto-format the Python code, as well as check the Cython
code in the repository. Additionally, there is a CI check in place to enforce
that committed code follows our standards. You can use the tools to
automatically format your python code by running:

```bash
isort --atomic python/**/*.py
black python
```

and then check the syntax of your Python and Cython code by running:

```bash
flake8 python
flake8 --config=python/.flake8.cython
```

Additionally, many editors have plugins that will apply `isort` and `Black` as
you edit files, as well as use `flake8` to report any style / syntax issues.
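
One caveat about the `isort` command above, offered as an observation about shells rather than about cuDF: in bash, `**` only recurses into subdirectories when the `globstar` option is enabled (`shopt -s globstar`); otherwise it behaves like a single `*`. A portable way to see which files a recursive match should reach is to list them with `find`, as in this sketch. The function name `list_python_files` is illustrative.

```bash
# Illustrative helper (not part of cuDF): list every .py file below a directory.
list_python_files() {
    find "$1" -type f -name '*.py' | sort
}

# Example: count the Python files under ./python, if that directory exists.
if [ -d python ]; then
    list_python_files python | wc -l
fi
```

Comparing this count with the number of files the glob expands to is a quick way to confirm that the formatter is seeing the whole tree.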

#### C++/CUDA

cuDF uses [`clang-format`](https://clang.llvm.org/docs/ClangFormat.html) to format its C++/CUDA code.

In order to format the C++/CUDA files, navigate to the root (`cudf`) directory and run:
```
python3 ./cpp/scripts/run-clang-format.py -inplace
```

Additionally, many editors have plugins or extensions that you can set up to automatically run `clang-format` either manually or on file save.

#### Pre-commit hooks

Optionally, you may wish to set up [pre-commit hooks](https://pre-commit.com/)
to automatically run `isort`, `Black`, `flake8` and `clang-format` when you make a git commit.
This can be done by installing `pre-commit` via `conda` or `pip`:

```bash
conda install -c conda-forge pre_commit
```

```bash
pip install pre-commit
```

and then running:

```bash
pre-commit install
```

from the root of the cuDF repository. Now `isort`, `Black`, `flake8` and `clang-format` will be
run each time you commit changes.

---

## Attribution