Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update README #14374

Merged
merged 12 commits into from
Nov 8, 2023
54 changes: 33 additions & 21 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,38 +1,51 @@
# <div align="left"><img src="img/rapids_logo.png" width="90px"/>&nbsp;cuDF - GPU DataFrames</div>

**NOTE:** For the latest stable [README.md](https://github.com/rapidsai/cudf/blob/main/README.md) ensure you are on the `main` branch.
cuDF is a GPU DataFrame library for loading joining, aggregating,
filtering, and otherwise manipulating data. cuDF leverages
[libcudf](https://github.com/rapidsai/cudf/pull/14374/), a
shwina marked this conversation as resolved.
Show resolved Hide resolved
shwina marked this conversation as resolved.
Show resolved Hide resolved
blazing-fast C++/CUDA dataframe library and the [Apache
Arrow](https://arrow.apache.org/) columnar format to provide a
GPU-acclerated pandas API.
shwina marked this conversation as resolved.
Show resolved Hide resolved

## Resources
You can use cuDF directly, or easily accelerate existing pandas
code with `cudf.pandas`:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
code with `cudf.pandas`:
code with `cudf.pandas`, with automatic CPU fallback if needed:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved things around a bit to include this.


- [cuDF Reference Documentation](https://docs.rapids.ai/api/cudf/stable/): Python API reference, tutorials, and topic guides.
- [libcudf Reference Documentation](https://docs.rapids.ai/api/libcudf/stable/): C/C++ CUDA library API reference.
- [Getting Started](https://rapids.ai/start.html): Instructions for installing cuDF.
- [RAPIDS Community](https://rapids.ai/community.html): Get help, contribute, and collaborate.
- [GitHub repository](https://github.com/rapidsai/cudf): Download the cuDF source code.
- [Issue tracker](https://github.com/rapidsai/cudf/issues): Report issues or request features.
**Use cuDF directly**:

## Overview
```python
import cudf
import requests
from io import StringIO

url = "https://github.com/plotly/datasets/raw/master/tips.csv"
content = requests.get(url).content.decode('utf-8')
shwina marked this conversation as resolved.
Show resolved Hide resolved

tips_df = cudf.read_csv(StringIO(content))
tips_df['tip_percentage'] = tips_df['tip'] / tips_df['total_bill'] * 100
shwina marked this conversation as resolved.
Show resolved Hide resolved

Built based on the [Apache Arrow](http://arrow.apache.org/) columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.
# display average tip by dining party size
print(tips_df.groupby('size').tip_percentage.mean())
shwina marked this conversation as resolved.
Show resolved Hide resolved
```

cuDF provides a pandas-like API that will be familiar to data engineers & data scientists, so they can use it to easily accelerate their workflows without going into the details of CUDA programming.
**Use `cudf.pandas` with existing pandas code**:

For example, the following snippet downloads a CSV, then uses the GPU to parse it into rows and columns and run calculations:
```python
import cudf, requests
%load_ext cudf.pandas # pandas operations now use the GPU!

import pandas as pd
import requests
from io import StringIO

url = "https://github.com/plotly/datasets/raw/master/tips.csv"
content = requests.get(url).content.decode('utf-8')
shwina marked this conversation as resolved.
Show resolved Hide resolved

tips_df = cudf.read_csv(StringIO(content))
tips_df = pd.read_csv(StringIO(content))
tips_df['tip_percentage'] = tips_df['tip'] / tips_df['total_bill'] * 100

# display average tip by dining party size
print(tips_df.groupby('size').tip_percentage.mean())
shwina marked this conversation as resolved.
Show resolved Hide resolved
```

Output:
```
shwina marked this conversation as resolved.
Show resolved Hide resolved
size
1 21.729201548727808
Expand All @@ -44,15 +57,14 @@ size
Name: tip_percentage, dtype: float64
```

For additional examples, browse our complete [API documentation](https://docs.rapids.ai/api/cudf/stable/), or check out our more detailed [notebooks](https://github.com/rapidsai/notebooks-contrib).

## Quick Start

Please see the [Demo Docker Repository](https://hub.docker.com/r/rapidsai/rapidsai/), choosing a tag based on the NVIDIA CUDA version you're running. This provides a ready to run Docker container with example notebooks and data, showcasing how you can utilize cuDF.
- [Try it now](https://nvda.ws/rapids-cudf): Explore `cudf.pandas` on a free GPU enabled instance on Google Colab!
- [Install](https://rapids.ai/start.html): Instructions for installing cuDF and other [RAPIDS](https://rapids.ai) libraries.
- [Python documentation](https://docs.rapids.ai/api/cudf/stable/)
- [libcudf (C++/CUDA) documentation](https://docs.rapids.ai/api/libcudf/stable/)
- [RAPIDS Community](https://rapids.ai/community.html): Get help, contribute, and collaborate.
shwina marked this conversation as resolved.
Show resolved Hide resolved

## Installation


### CUDA/GPU requirements

* CUDA 11.2+
Expand Down
Loading