Changes after review
shwina committed May 4, 2022
1 parent f8bc555 commit 52fc1bf
Showing 4 changed files with 36 additions and 37 deletions.
2 changes: 1 addition & 1 deletion docs/cudf/source/user_guide/dask-cudf.md
@@ -39,7 +39,7 @@ The following is tested and expected to work:
- Support for reductions on full dataframes
- `std`
- Custom reductions with
-  [dask.dataframe.reduction](http://docs.dask.org/en/latest/generated/dask.dataframe.Series.reduction.html)
+  [dask.dataframe.reduction](https://docs.dask.org/en/latest/generated/dask.dataframe.Series.reduction.html)

- Groupby aggregations

30 changes: 13 additions & 17 deletions docs/cudf/source/user_guide/data-types.md
Expand Up @@ -5,7 +5,7 @@ numeric, datetime, timedelta, categorical and string data types. We
also provide special data types for working with decimals, list-like,
and dictionary-like data.

-All data types in cuDF are [nullable](/user_guide/missing-data).
+All data types in cuDF are [nullable](missing-data).

<div class="special-table">

@@ -34,10 +34,14 @@ ways to specify the `float32` data type:
```python
>>> import cudf
>>> s = cudf.Series([1, 2, 3], dtype="float32")
->>> print(s)
+>>> s
+0    1.0
+1    2.0
+2    3.0
+dtype: float32
```

-## A note on ``object``
+## A note on `object`

The data type associated with string data in cuDF is `"np.object"`.
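Here `"np.object"` refers to NumPy's generic object dtype, which is where string data falls back because strings have no fixed-width numeric representation. A small NumPy-only sketch of that fallback (cuDF simply reports the same dtype):

```python
import numpy as np

# Strings are stored under NumPy's generic object dtype: each element
# is a Python object reference rather than a fixed-width value.
arr = np.array(["apple", "banana", "cherry"], dtype=object)
print(arr.dtype)       # object
print(arr.dtype.kind)  # O
```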

@@ -60,7 +64,8 @@ We provide special data types for working with decimal data, namely
data types when you need to store values with greater precision than
allowed by floating-point representation.

-A decimal data type is composed of a _precision_ and a _scale_. The
+Decimal data types in cuDF are based on fixed-point representation. A
+decimal data type is composed of a _precision_ and a _scale_. The
precision represents the total number of digits in each value of this
dtype. For example, the precision associated with the decimal value
`1.023` is `4`. The scale is the total number of digits to the right
@@ -72,10 +77,8 @@ Each decimal data type is associated with a maximum precision:
```python
>>> cudf.Decimal32Dtype.MAX_PRECISION
9.0

>>> cudf.Decimal64Dtype.MAX_PRECISION
18.0

>>> cudf.Decimal128Dtype.MAX_PRECISION
38
```
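The counting rule above can be checked with nothing but the standard-library `decimal` module: the precision is the total digit count and the scale is the number of digits right of the decimal point. A minimal stdlib-only sketch (no GPU or cuDF required; the helper name is ours, not a cuDF API):

```python
from decimal import Decimal

def precision_and_scale(value: str) -> tuple:
    """Return (total digit count, digits right of the decimal point)."""
    t = Decimal(value).as_tuple()
    # A negative exponent counts the fractional digits; 1.023 has exponent -3.
    scale = -t.exponent if t.exponent < 0 else 0
    # Leading zeros are not stored in t.digits, so precision is at least scale.
    precision = max(len(t.digits), scale)
    return precision, scale

print(precision_and_scale("1.023"))  # (4, 3)
```

This reproduces the example in the text: `1.023` has precision 4 and scale 3.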
@@ -85,24 +88,20 @@ One way to create a decimal Series is from values of type [decimal.Decimal][pyth
```python
>>> from decimal import Decimal
>>> s = cudf.Series([Decimal("1.01"), Decimal("4.23"), Decimal("0.5")])

>>> s
0 1.01
1 4.23
2 0.50
dtype: decimal128

>>> s.dtype
->>> Decimal128Dtype(precision=3, scale=2)
+Decimal128Dtype(precision=3, scale=2)
```

Notice the data type of the result: `1.01`, `4.23`, `0.50` can all be
-represented with a precision at least equal to 3 and a scale at least
-equal to 2.
+represented with a precision at least 3 and a scale at least 2.

-However, the value `1.234` needs a precision at least equal to 4, and
-a scale at least equal to 3, and cannot be fully represented
-using this data type:
+However, the value `1.234` needs a precision at least 4, and a scale
+at least 3, and cannot be fully represented using this data type:

```python
>>> s[1] = Decimal("1.234") # raises an error
@@ -124,7 +123,6 @@ lists and dictionaries respectively:
0 {'a': 1, 'b': 2}
1 {'a': 3, 'b': 4}
dtype: object

>>> gsr = cudf.from_pandas(psr)
>>> gsr
0 {'a': 1, 'b': 2}
@@ -140,14 +138,12 @@ nested data](io).
```python
>>> pdf = pd.DataFrame({"a": [[1, 2], [3, 4, 5], [6, 7, 8]]})
>>> pdf.to_parquet("lists.pq")

>>> gdf = cudf.read_parquet("lists.pq")
>>> gdf
a
0 [1, 2]
1 [3, 4, 5]
2 [6, 7, 8]

>>> gdf["a"].dtype
ListDtype(int64)
```
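For contrast, pandas can hold the same nested values only as a generic `object` column rather than a typed list dtype. A hedged sketch of that difference, shown with pandas alone so it runs without a GPU (cuDF would report `ListDtype(int64)` for the same column, as above):

```python
import pandas as pd

pdf = pd.DataFrame({"a": [[1, 2], [3, 4, 5], [6, 7, 8]]})

# pandas has no dedicated list dtype: the column is just Python objects.
print(pdf["a"].dtype)               # object
print(pdf["a"].map(len).tolist())   # [2, 3, 3]
```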
20 changes: 12 additions & 8 deletions docs/cudf/source/user_guide/groupby.md
@@ -35,18 +35,24 @@ A GroupBy object is created by grouping the values of a `Series` or
`DataFrame` by one or more columns:

```python
-import cudf
-
+>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 1, 1, 2, 2], 'b': [1, 1, 2, 2, 3], 'c': [1, 2, 3, 4, 5]})
+>>> df
+   a  b  c
+0  1  1  1
+1  1  1  2
+2  1  2  3
+3  2  2  4
+4  2  3  5
>>> gb1 = df.groupby('a') # grouping by a single column
>>> gb2 = df.groupby(['a', 'b']) # grouping by multiple columns
>>> gb3 = df.groupby(cudf.Series(['a', 'a', 'b', 'b', 'b'])) # grouping by an external column
```
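Grouping by an external column, as in `gb3` above, aligns the key Series with the frame's index and groups rows by the key's values. A pandas sketch of what that grouping computes (the cuDF calls above are API-compatible; pandas is used here only so the example runs without a GPU):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 2],
                   'b': [1, 1, 2, 2, 3],
                   'c': [1, 2, 3, 4, 5]})
key = pd.Series(['a', 'a', 'b', 'b', 'b'])

# Rows 0-1 fall into group 'a', rows 2-4 into group 'b'.
sums = df.groupby(key).sum()
print(sums)
```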

````{warning}
-cuDF uses `sort=False` by default to achieve better performance, which provides no gaurentee to the group order in outputs. This deviates from Pandas default behavior.
+Unlike Pandas, cuDF uses `sort=False` by default to achieve better
+performance, which does not guarantee any particular group order in
+the result.
For example:
@@ -107,7 +113,7 @@ b

## Aggregation

-Aggregations on groups is supported via the `agg` method:
+Aggregations on groups are supported via the `agg` method:

```python
>>> df
@@ -209,7 +215,7 @@ a
- `apply` works by applying the provided function to each group
sequentially, and concatenating the results together. **This can be
very slow**, especially for a large number of small groups. For a
-  small number of large groups, it can give acceptable performance
+  small number of large groups, it can give acceptable performance.
- The results may not always match Pandas exactly. For example, cuDF
may return a `DataFrame` containing a single column where Pandas
returns a `Series`. Some post-processing may be required to match
@@ -218,8 +224,6 @@ a
supports with `apply`, such as calling [describe] inside the
callable.

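The caveats above are easier to see with a concrete call. A pandas sketch of per-group `apply` (cuDF accepts the same pattern, subject to the performance caveat noted above):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2], 'b': [1, 2, 3, 4]})

# The callable runs once per group; the per-group results are then
# concatenated into a single Series indexed by the group keys.
ranges = df.groupby('a')['b'].apply(lambda s: s.max() - s.min())
print(ranges.to_dict())  # {1: 1, 2: 1}
```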
## Transform

The `.transform()` method aggregates per group, and broadcasts the
21 changes: 10 additions & 11 deletions docs/cudf/source/user_guide/index.md
@@ -3,15 +3,14 @@
```{toctree}
:maxdepth: 2
-10min.md
-pandas-comparison.rst
-data-types.rst
-io.rst
-missing-data.md
-groupby.rst
-guide-to-udfs.md
-cupy-interop.md
-dask-cudf.rst
-internals.rst
-PandasCompat.rst
+10min
+data-types
+io
+missing-data
+groupby
+guide-to-udfs
+cupy-interop
+dask-cudf
+internals
+PandasCompat
```
