Commit

More prose

wence- committed Feb 8, 2023
1 parent 18c00e2 commit cdf6e19
Showing 2 changed files with 83 additions and 3 deletions.
12 changes: 10 additions & 2 deletions docs/dask_cudf/source/api.rst
@@ -26,6 +26,14 @@ data reading facilities, followed by calling
read_text,
read_parquet

.. warning::

FIXME: where should the following live?

.. autofunction:: dask_cudf.concat

.. autofunction:: dask_cudf.from_delayed

Grouping
========

@@ -51,8 +59,8 @@ if possible.
.. autofunction:: dask_cudf.groupby_agg


Dask Collections
================
DataFrames and Series
=====================

The core distributed objects provided by Dask-cuDF are the
:class:`.DataFrame` and :class:`.Series`. These inherit respectively
74 changes: 73 additions & 1 deletion docs/dask_cudf/source/index.rst
@@ -25,9 +25,81 @@ When running on multi-GPU systems, `Dask-CUDA
simplify the setup of the cluster, taking advantage of all features of
the GPU and networking hardware.

Using Dask-cuDF
---------------

When installed, Dask-cuDF registers itself as a dataframe backend for
Dask. This means that in many cases, using cuDF-backed dataframes requires
only small changes to an existing workflow. The minimal change is to
select cuDF as the dataframe backend in :doc:`Dask's
configuration <dask:configuration>`. To do so, we must set the option
``dataframe.backend`` to ``cudf``. From Python, this can be achieved
like so::

import dask

dask.config.set({"dataframe.backend": "cudf"})

Alternatively, you can set ``DASK_DATAFRAME__BACKEND=cudf`` in the
environment before running your code.
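
For example, in a POSIX shell you might write::

    # Select the cuDF backend for every subsequently launched process;
    # Dask translates the DASK_DATAFRAME__BACKEND environment variable
    # into the dataframe.backend configuration key.
    export DASK_DATAFRAME__BACKEND=cudf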

Dataframe creation from on-disk formats
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If your workflow creates Dask dataframes from on-disk formats
(for example using :func:`dask.dataframe.read_parquet`), then setting
the backend may well be enough to migrate your workflow.

For example, consider reading a dataframe from parquet::

import dask.dataframe as dd

# By default, we obtain a pandas-backed dataframe
df = dd.read_parquet("data.parquet", ...)


To obtain a cuDF-backed dataframe, we must set the
``dataframe.backend`` configuration option::

import dask
import dask.dataframe as dd

dask.config.set({"dataframe.backend": "cudf"})
# This gives us a cuDF-backed dataframe
df = dd.read_parquet("data.parquet", ...)

This code will use cuDF's GPU-accelerated :func:`parquet reader
<cudf.read_parquet>` to read partitions of the data.

Dataframe creation from in-memory formats
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you already have a dataframe in memory and want to convert it to a
cuDF-backed one, there are two options, depending on whether the
dataframe is already a Dask one. If you have a Dask dataframe, then
:func:`dask_cudf.from_dask_dataframe` will convert it for you. If you
have a pandas dataframe, you can either call
:func:`dask.dataframe.from_pandas` followed by
:func:`~dask_cudf.from_dask_dataframe`, or first convert the dataframe
with :func:`cudf.from_pandas` and then parallelise the result with
:func:`dask_cudf.from_cudf`.
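
As a sketch (assuming a CUDA-capable GPU with cuDF and Dask-cuDF
installed; the small example frame is purely illustrative), the two
routes look like this::

    import pandas as pd
    import dask.dataframe as dd

    import cudf
    import dask_cudf

    pdf = pd.DataFrame({"a": [1, 2, 3, 4]})

    # Route 1: parallelise with Dask first, then convert the
    # partitions to cuDF.
    ddf = dd.from_pandas(pdf, npartitions=2)
    gddf1 = dask_cudf.from_dask_dataframe(ddf)

    # Route 2: convert to cuDF first, then parallelise with Dask-cuDF.
    gdf = cudf.from_pandas(pdf)
    gddf2 = dask_cudf.from_cudf(gdf, npartitions=2)

Both routes produce equivalent cuDF-backed collections.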

API Reference
-------------

Generally speaking, Dask-cuDF tries to offer exactly the same API as
Dask itself. There are, however, some minor differences, mostly because
cuDF does not :doc:`perfectly mirror <cudf:user_guide/PandasCompat>`
the pandas API, or because cuDF provides additional configuration
flags. These differences mostly occur in the data reading and writing
interfaces.

As a result, straightforward workflows can usually be migrated without
much trouble, but more complex ones that exercise more of the API may
need some tweaking. The API documentation describes the details of
these differences and all of the functionality that Dask-cuDF supports.

.. toctree::
:maxdepth: 2
:caption: Contents:

api

