Add clear indication of non-GPU accelerated parameters in read_json docstring #11825

Merged
Changes from 4 commits
91 changes: 79 additions & 12 deletions python/cudf/cudf/utils/ioutils.py
@@ -451,7 +451,7 @@
"""
doc_to_orc = docfmt_partial(docstring=_docstring_to_orc)

_docstring_read_json = """
_docstring_read_json = r"""
Load a JSON dataset into a DataFrame

Parameters
@@ -466,8 +466,14 @@
engine : {{ 'auto', 'cudf', 'cudf_experimental', 'pandas' }}, default 'auto'
Parser engine to use. If 'auto' is passed, the engine will be
automatically selected based on the other parameters.
orient : string,
Indication of expected JSON string format (pandas engine only).
orient : string

.. admonition:: Not GPU-accelerated

This parameter is only supported with pandas engine.
(i.e., ``engine='pandas'``)

Indication of expected JSON string format.
Compatible JSON strings can be produced by ``to_json()`` with a
corresponding orient value.
The set of possible orients is:
@@ -500,12 +506,25 @@
typ : type of object to recover (series or frame), default 'frame'
With cudf engine, only frame output is supported.
dtype : boolean or dict, default True
If True, infer dtypes, if a dict of column to dtype, then use those,
if False, then don't infer dtypes at all, applies only to the data.
If True, infer dtypes for all columns; if False, then don't infer dtypes at all,
if a dict, provide a mapping from column names to their respective dtype (any missing
columns will have their dtype inferred). Applies only to the data.
convert_axes : boolean, default True
Try to convert the axes to the proper dtypes (pandas engine only).

.. admonition:: Not GPU-accelerated

This parameter is only supported with pandas engine.
(i.e., ``engine='pandas'``)

Try to convert the axes to the proper dtypes.
convert_dates : boolean, default True
List of columns to parse for dates (pandas engine only); If True, then try

.. admonition:: Not GPU-accelerated

This parameter is only supported with pandas engine.
(i.e., ``engine='pandas'``)

List of columns to parse for dates; If True, then try
to parse datelike columns default is True; a column label is datelike if

* it ends with ``'_at'``,
@@ -514,27 +533,63 @@
* it is ``'modified'``, or
* it is ``'date'``
keep_default_dates : boolean, default True
If parsing dates, parse the default datelike columns (pandas engine only)

.. admonition:: Not GPU-accelerated

This parameter is only supported with pandas engine.
(i.e., ``engine='pandas'``)

If parsing dates, parse the default datelike columns.
numpy : boolean, default False
Direct decoding to numpy arrays (pandas engine only). Supports numeric

.. admonition:: Not GPU-accelerated

This parameter is only supported with pandas engine.
(i.e., ``engine='pandas'``)

Direct decoding to numpy arrays. Supports numeric
data only, but non-numeric column and index labels are supported. Note
also that the JSON ordering MUST be the same for each term if numpy=True.
precise_float : boolean, default False

.. admonition:: Not GPU-accelerated

This parameter is only supported with pandas engine.
(i.e., ``engine='pandas'``)

Set to enable usage of higher precision (strtod) function when
decoding string to double values (pandas engine only). Default (False)
is to use fast but less precise builtin functionality
date_unit : string, default None
The timestamp unit to detect if converting dates (pandas engine only).

.. admonition:: Not GPU-accelerated

This parameter is only supported with pandas engine.
(i.e., ``engine='pandas'``)

The timestamp unit to detect if converting dates.
The default behavior is to try and detect the correct precision, but if
this is not desired then pass one of 's', 'ms', 'us' or 'ns' to force
parsing only seconds, milliseconds, microseconds or nanoseconds.
encoding : str, default is 'utf-8'

.. admonition:: Not GPU-accelerated

This parameter is only supported with pandas engine.
(i.e., ``engine='pandas'``)

The encoding to use to decode py3 bytes.
With cudf engine, only utf-8 is supported.
lines : boolean, default False
Read the file as a json object per line.
chunksize : integer, default None
Return JsonReader object for iteration (pandas engine only).

.. admonition:: Not GPU-accelerated

This parameter is only supported with pandas engine.
(i.e., ``engine='pandas'``)

Return JsonReader object for iteration.
See the `line-delimited json docs
<http://pandas.pydata.org/pandas-docs/stable/io.html#io-jsonl>`_
for more information on ``chunksize``.
@@ -547,12 +602,24 @@
otherwise. If using 'zip', the ZIP file must contain only one data
file to be read in. Set to None for no decompression.
byte_range : list or tuple, default None
Byte range within the input file to be read (cudf engine only).

.. admonition:: GPU-accelerated

This parameter is only supported with cudf engine.
(i.e., ``engine='cudf'``)

Byte range within the input file to be read.
The first number is the offset in bytes, the second number is the range
size in bytes. Set the size to zero to read all data after the offset
location. Reads the row that starts before or at the end of the range,
even if it ends after the end of the range.
keep_quotes : bool, default False

.. admonition:: GPU-accelerated experimental feature

This parameter is only supported with cudf experimental engine.
(i.e., ``engine='cudf_experimental'``)

This parameter is only supported in ``cudf_experimental`` engine.
If `True`, any string values are read literally (and wrapped in an
additional set of quotes).
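For readers skimming the diff, the engine restrictions stated in the new admonitions can be collected into a small lookup. The following is only an illustrative sketch in plain Python: it restates the admonition text literally and is not part of cuDF. The names `ENGINE_ONLY_PARAMS` and `unsupported_params` are hypothetical, and the sketch does not model `engine='auto'`, which the docstring says selects an engine based on the other parameters.

```python
# Illustrative only: this table restates the admonitions in the diff above.
# ENGINE_ONLY_PARAMS and unsupported_params are hypothetical names, not cuDF API.
ENGINE_ONLY_PARAMS = {
    # "Not GPU-accelerated": documented as pandas engine only
    "orient": {"pandas"},
    "convert_axes": {"pandas"},
    "convert_dates": {"pandas"},
    "keep_default_dates": {"pandas"},
    "numpy": {"pandas"},
    "precise_float": {"pandas"},
    "date_unit": {"pandas"},
    "encoding": {"pandas"},
    "chunksize": {"pandas"},
    # "GPU-accelerated": documented as cudf engine only
    "byte_range": {"cudf"},
    # "GPU-accelerated experimental feature": cudf_experimental engine only
    "keep_quotes": {"cudf_experimental"},
}


def unsupported_params(engine, **kwargs):
    """Return the sorted names of passed parameters that the docstring
    marks as restricted to a different engine than `engine`."""
    return sorted(
        name
        for name in kwargs
        if name in ENGINE_ONLY_PARAMS and engine not in ENGINE_ONLY_PARAMS[name]
    )


# `lines` carries no admonition, so only `orient` is flagged for the cudf engine.
print(unsupported_params("cudf", orient="records", lines=True, byte_range=(0, 100)))
```

Parameters without an admonition (for example `lines` or `dtype`) are deliberately left out of the table, so they are never flagged. How `read_json` itself reacts to a restricted parameter is not shown in this diff and is not modelled here.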