Add docstring for dask_cudf.read_csv (#8355)
Fixes: #2277 

This PR adds a Python docstring for the `dask_cudf.read_csv` API.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Keith Kraus (https://github.com/kkraus14)
  - Ayush Dattagupta (https://github.com/ayushdg)

URL: #8355
galipremsagar authored May 26, 2021
1 parent e598361 commit ddba88d
Showing 1 changed file with 47 additions and 0 deletions.
47 changes: 47 additions & 0 deletions python/dask_cudf/dask_cudf/io/csv.py
@@ -16,6 +16,53 @@


def read_csv(path, chunksize="256 MiB", **kwargs):
    """
    Read CSV files into a dask_cudf.DataFrame

    This API parallelizes the ``cudf.read_csv`` function in the following ways:

    It supports loading many files at once using globstrings:

    >>> import dask_cudf
    >>> df = dask_cudf.read_csv("myfiles.*.csv")

    In some cases it can break up large files:

    >>> df = dask_cudf.read_csv("largefile.csv", chunksize="256 MiB")

    It can read CSV files from external resources (e.g. S3, HTTP, FTP)

    >>> df = dask_cudf.read_csv("s3://bucket/myfiles.*.csv")
    >>> df = dask_cudf.read_csv("https://www.mycloud.com/sample.csv")

    Internally ``dask_cudf.read_csv`` uses ``cudf.read_csv`` and supports
    many of the same keyword arguments with the same performance guarantees.
    See the docstring for ``cudf.read_csv()`` for more information on available
    keyword arguments.

    Parameters
    ----------
    path : str, path object, or file-like object
        Either a path to a file (a str, pathlib.Path, or
        py._path.local.LocalPath), URL (including http, ftp, and S3 locations),
        or any object with a read() method (such as builtin open() file
        handler function or StringIO).
    chunksize : int or str, default "256 MiB"
        The target task partition size. If `None`, a single block
        is used for each file.
    **kwargs : dict
        Passthrough key-word arguments that are sent to ``cudf.read_csv``.

    Examples
    --------
    >>> import dask_cudf
    >>> ddf = dask_cudf.read_csv("sample.csv", usecols=["a", "b"])
    >>> ddf.compute()
       a      b
    0  1     hi
    1  2  hello
    2  3     ai
    """
    if "://" in str(path):
        func = make_reader(cudf.read_csv, "read_csv", "CSV")
        return func(path, blocksize=chunksize, **kwargs)
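
The hunk shows only the branch taken for URL-like paths; the rest of the function is not part of this diff. As a rough illustration of the two ideas the docstring describes (routing on `"://"` and splitting a large file into blocks of a target size), here is a minimal standard-library sketch. The helper names `looks_remote` and `byte_ranges` are hypothetical and are not part of dask_cudf; the splitting logic is an assumption about the general technique, not the library's implementation, and it ignores the row-boundary alignment a real CSV reader needs.

```python
# Hypothetical helpers for illustration only; not dask_cudf APIs.

MIB = 2**20  # bytes in one mebibyte


def looks_remote(path):
    """Mirror the check in the diff: '://' in the path marks a URL."""
    return "://" in str(path)


def byte_ranges(file_size, chunk_bytes=None):
    """Split a file of `file_size` bytes into (offset, length) blocks.

    With chunk_bytes=None the whole file is one block, matching the
    docstring's description of chunksize=None. This is an assumed
    scheme; a real reader must also align blocks to row boundaries.
    """
    if chunk_bytes is None:
        return [(0, file_size)]
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    return [
        (offset, min(chunk_bytes, file_size - offset))
        for offset in range(0, file_size, chunk_bytes)
    ]


if __name__ == "__main__":
    print(looks_remote("s3://bucket/myfiles.*.csv"))
    print(looks_remote("sample.csv"))
    # A 1000 MiB file with a 256 MiB target block size.
    for offset, length in byte_ranges(1000 * MIB, 256 * MIB):
        print(offset // MIB, length // MIB)
```

Under this scheme the final block is simply whatever remains, so blocks are at most the target size rather than exactly equal to it.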
