Remove smart quotes from all docstrings. (#12035)
This PR removes all "smart quotes" from the library by enforcing a pre-commit hook.

Smart quotes typically arise from copying rendered docstrings from Pandas, because Sphinx automatically transforms straight quotes into smart quotes when rendering the docs as HTML. However, smart quotes are undesirable in code, and mixing straight and smart quotes makes find-replace transformations difficult.
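As a minimal illustration of the find-replace problem, the Python sketch below counts a straight-quoted search string in a line that mixes both quote styles, then normalizes the typographic quotes. It is not the texthooks implementation. The four code points covered and the helper names `find_smart_quotes` and `fix_smart_quotes` are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Typographic ("smart") quotes mapped to straight ASCII quotes.
# Assumption: only these four code points are handled; real tools may
# cover more characters.
SMART_TO_STRAIGHT = {
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
}
_TRANSLATION = str.maketrans(SMART_TO_STRAIGHT)


def find_smart_quotes(text: str) -> List[Tuple[int, str]]:
    """Return (offset, character) for every smart quote in text."""
    return [(i, ch) for i, ch in enumerate(text) if ch in SMART_TO_STRAIGHT]


def fix_smart_quotes(text: str) -> str:
    """Replace each smart quote with its straight equivalent."""
    return text.translate(_TRANSLATION)


# A docstring line mixing both styles, as produced by copy-pasting rendered docs.
line = "side : {\u2018left\u2019, \u2018right\u2019}, default 'left'"
print(line.count("'left'"))                    # 1: the smart-quoted value is missed
print(fix_smart_quotes(line).count("'left'"))  # 2
```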

I have made suggestions to fix this several times before, so I am making the suggestions more permanent and automatically enforceable via a pre-commit style check:
- #12025 (comment)
- #9817 (comment)
- #9571 (comment)

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #12035
bdice authored Nov 1, 2022
1 parent 80c238c commit f19bdbc
Showing 19 changed files with 85 additions and 75 deletions.
10 changes: 10 additions & 0 deletions .pre-commit-config.yaml
@@ -52,6 +52,16 @@ repos:
 - id: clang-format
 types_or: [c, c++, cuda]
 args: ["-fallback-style=none", "-style=file", "-i"]
+- repo: https://github.com/sirosen/texthooks
+rev: 0.4.0
+hooks:
+- id: fix-smartquotes
+exclude: |
+(?x)^(
+^cpp/include/cudf_test/cxxopts.hpp|
+^python/cudf/cudf/tests/data/subword_tokenizer_data/.*|
+^python/cudf/cudf/tests/test_text.py
+)
 - repo: local
 hooks:
 - id: no-deprecationwarning
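The `exclude` value is a regular expression matched against file paths. The leading `(?x)` turns on verbose mode, so the line breaks inside the YAML block scalar are ignored. The sketch below assumes pre-commit applies the pattern with Python's `re.search`; `is_excluded` and the file under the tokenizer data directory are hypothetical.

```python
import re

# The exclude pattern from the fix-smartquotes hook. In verbose mode (?x),
# whitespace and newlines inside the pattern are ignored.
EXCLUDE = re.compile(
    r"""(?x)^(
        ^cpp/include/cudf_test/cxxopts.hpp|
        ^python/cudf/cudf/tests/data/subword_tokenizer_data/.*|
        ^python/cudf/cudf/tests/test_text.py
    )"""
)


def is_excluded(path: str) -> bool:
    """Assumption: pre-commit tests each path with re.search."""
    return EXCLUDE.search(path) is not None


for path in [
    "cpp/include/cudf_test/cxxopts.hpp",
    "python/cudf/cudf/tests/data/subword_tokenizer_data/vocab.txt",  # hypothetical file
    "python/cudf/cudf/core/column/string.py",
]:
    print(path, is_excluded(path))
```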
2 changes: 1 addition & 1 deletion README.md
@@ -50,7 +50,7 @@ For additional examples, browse our complete [API documentation](https://docs.ra

 ## Quick Start

-Please see the [Demo Docker Repository](https://hub.docker.com/r/rapidsai/rapidsai/), choosing a tag based on the NVIDIA CUDA version you’re running. This provides a ready to run Docker container with example notebooks and data, showcasing how you can utilize cuDF.
+Please see the [Demo Docker Repository](https://hub.docker.com/r/rapidsai/rapidsai/), choosing a tag based on the NVIDIA CUDA version you're running. This provides a ready to run Docker container with example notebooks and data, showcasing how you can utilize cuDF.

 ## Installation

2 changes: 1 addition & 1 deletion docs/cudf/source/user_guide/10min.ipynb
@@ -15,7 +15,7 @@
 "\n",
 "[Dask](https://dask.org/) is a flexible library for parallel computing in Python that makes scaling out your workflow smooth and simple. On the CPU, Dask uses Pandas to execute operations in parallel on DataFrame partitions.\n",
 "\n",
-"[Dask-cuDF](https://github.com/rapidsai/cudf/tree/main/python/dask_cudf) extends Dask where necessary to allow its DataFrame partitions to be processed by cuDF GPU DataFrames as opposed to Pandas DataFrames. For instance, when you call dask_cudf.read_csv(...), your cluster’s GPUs do the work of parsing the CSV file(s) with underlying cudf.read_csv().\n",
+"[Dask-cuDF](https://github.com/rapidsai/cudf/tree/main/python/dask_cudf) extends Dask where necessary to allow its DataFrame partitions to be processed by cuDF GPU DataFrames as opposed to Pandas DataFrames. For instance, when you call dask_cudf.read_csv(...), your cluster's GPUs do the work of parsing the CSV file(s) with underlying cudf.read_csv().\n",
 "\n",
 "\n",
 "### When to use cuDF and Dask-cuDF\n",
4 changes: 2 additions & 2 deletions docs/cudf/source/user_guide/missing-data.ipynb
@@ -229,7 +229,7 @@
 "id": "acdf29d7",
 "metadata": {},
 "source": [
-"One has to be mindful that in Python (and NumPy), the nan's don’t compare equal, but None's do. Note that cudf/NumPy uses the fact that `np.nan != np.nan`, and treats `None` like `np.nan`."
+"One has to be mindful that in Python (and NumPy), the nan's don't compare equal, but None's do. Note that cudf/NumPy uses the fact that `np.nan != np.nan`, and treats `None` like `np.nan`."
 ]
 },
 {
@@ -279,7 +279,7 @@
 "id": "4fdb8bc7",
 "metadata": {},
 "source": [
-"So as compared to above, a scalar equality comparison versus a None/np.nan doesn’t provide useful information."
+"So as compared to above, a scalar equality comparison versus a None/np.nan doesn't provide useful information."
 ]
 },
 {
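The NaN-versus-None point in the notebook text can be checked in plain Python: an IEEE 754 NaN is unequal to everything, including itself, while `None` is a singleton that compares equal to itself. Detecting missing values therefore needs an explicit test rather than `==`. A small sketch; the helper `is_missing` is hypothetical and simply mirrors the notebook's statement that cuDF treats `None` like `np.nan`.

```python
import math

nan = float("nan")

print(nan == nan)    # False: NaN never compares equal, even to itself
print(nan != nan)    # True
print(None == None)  # True: None is a singleton


def is_missing(value) -> bool:
    """Treat both None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


print([is_missing(v) for v in [1.0, nan, None]])  # [False, True, True]
```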
6 changes: 3 additions & 3 deletions python/cudf/cudf/_lib/search.pyx
@@ -24,9 +24,9 @@ def search_sorted(
 List of columns to search in
 values : List of columns
 List of value columns to search for
-side : str {‘left’, ‘right’} optional
-If ‘left’, the index of the first suitable location is given.
-If ‘right’, return the last such index
+side : str {'left', 'right'} optional
+If 'left', the index of the first suitable location is given.
+If 'right', return the last such index
 """
 cdef unique_ptr[column] c_result
 cdef vector[libcudf_types.order] c_column_order
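The `side` argument follows the usual `searchsorted` convention used by NumPy and pandas. For a single sorted list, Python's `bisect_left` and `bisect_right` compute the same two answers. This is a CPU analogy for the semantics only, not how the Cython wrapper is implemented.

```python
from bisect import bisect_left, bisect_right

haystack = [1, 2, 2, 2, 5]  # must already be sorted ascending

# side='left': first index where 2 can be inserted while keeping order,
# i.e. before the existing 2s.
print(bisect_left(haystack, 2))   # 1
# side='right': last such index, i.e. after the existing 2s.
print(bisect_right(haystack, 2))  # 4
# For a value that is not present, both sides agree.
print(bisect_left(haystack, 3), bisect_right(haystack, 3))  # 4 4
```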
4 changes: 2 additions & 2 deletions python/cudf/cudf/_lib/strings/convert/convert_urls.pyx
@@ -1,4 +1,4 @@
-# Copyright (c) 2020, NVIDIA CORPORATION.
+# Copyright (c) 2020-2022, NVIDIA CORPORATION.

 from libcpp.memory cimport unique_ptr
 from libcpp.utility cimport move
@@ -41,7 +41,7 @@ def url_encode(Column source_strings):
 """
 Encode each string in column. No format checking is performed.
 All characters are encoded except for ASCII letters, digits,
-and these characters: ‘.’,’_’,’-‘,’~’. Encoding converts to
+and these characters: '.','_','-','~'. Encoding converts to
 hex using UTF-8 encoded bytes.
 Parameters
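The encoding rule in this docstring (keep ASCII letters, digits, `.`, `_`, `-` and `~`; percent-escape the UTF-8 bytes of everything else) is the same rule Python's `urllib.parse.quote` applies with `safe=""` on Python 3.7+. The CPU reference below is handy for spot-checking output; `url_encode_reference` is a hypothetical name, and whether cuDF emits uppercase hex digits, as `quote` does, is not stated in this diff.

```python
from urllib.parse import quote


def url_encode_reference(s: str) -> str:
    """CPU reference for the rule described in the docstring.

    With safe="", quote() leaves only ASCII letters, digits and "_.-~"
    unescaped (Python 3.7+); every other character, including "/",
    becomes %XX escapes of its UTF-8 bytes, with uppercase hex digits.
    """
    return quote(s, safe="")


print(url_encode_reference("a b/c~d"))  # a%20b%2Fc~d
print(url_encode_reference("\u00e9"))   # %C3%A9 (the UTF-8 bytes of 'é')
```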
2 changes: 1 addition & 1 deletion python/cudf/cudf/_lib/strings/padding.pyx
@@ -59,7 +59,7 @@ def zfill(Column source_strings,
 size_type width):
 """
 Returns a Column by prepending strings in `source_strings`
-with ‘0’ characters up to the given `width`.
+with '0' characters up to the given `width`.
 """
 cdef unique_ptr[column] c_result
 cdef column_view source_view = source_strings.view()
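For unsigned ASCII input, the zfill behavior described here (prepend '0' characters up to `width`, leave longer strings alone) matches Python's `str.rjust(width, "0")`, which is also what the `pad(side='left', fillchar='0')` equivalence in `string.py` amounts to. The hypothetical `zfill_reference` uses `rjust` rather than `str.zfill` because Python's `str.zfill` also moves a leading sign in front of the zeros, and this diff does not say whether cuDF does the same.

```python
from typing import List


def zfill_reference(values: List[str], width: int) -> List[str]:
    """Prepend '0' characters until each string is `width` long.

    Strings already `width` characters or longer are returned unchanged.
    """
    return [v.rjust(width, "0") for v in values]


print(zfill_reference(["1", "22", "12345"], 3))  # ['001', '022', '12345']
# Python's own str.zfill differs for signed numbers:
print("-1".zfill(3), "-1".rjust(3, "0"))  # -01 0-1
```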
30 changes: 15 additions & 15 deletions python/cudf/cudf/core/column/string.py
@@ -116,8 +116,8 @@ class StringMethods(ColumnMethods):
 This mimics pandas ``df.str`` interface. nulls stay null
 unless handled otherwise by a particular method.
-Patterned after Python’s string methods, with some
-inspiration from R’s stringr package.
+Patterned after Python's string methods, with some
+inspiration from R's stringr package.
 """

 _column: StringColumn
@@ -709,7 +709,7 @@ def contains(
 >>> idx.str.contains('23', regex=False)
 GenericIndex([False, False, False, True, <NA>], dtype='bool')
-Returning ‘house’ or ‘dog’ when either expression occurs in a string.
+Returning 'house' or 'dog' when either expression occurs in a string.
 >>> s1.str.contains('house|dog', regex=True)
 0 False
@@ -732,7 +732,7 @@ def contains(
 Ensure ``pat`` is a not a literal pattern when ``regex`` is set
 to True. Note in the following example one might expect
 only `s2[1]` and `s2[3]` to return True. However,
-‘.0’ as a regex matches any character followed by a 0.
+'.0' as a regex matches any character followed by a 0.
 >>> s2 = cudf.Series(['40', '40.0', '41', '41.0', '35'])
 >>> s2.str.contains('.0', regex=True)
@@ -2903,7 +2903,7 @@ def pad(
 additional characters will be filled with
 character defined in fillchar.
-side : {‘left’, ‘right’, ‘both’}, default ‘left’
+side : {'left', 'right', 'both'}, default 'left'
 Side from which to fill resulting string.
 fillchar : str, default ' ' (whitespace)
@@ -2930,7 +2930,7 @@ def pad(
 Equivalent to ``Series.str.pad(side='both')``.
 zfill
-Pad strings in the Series/Index by prepending ‘0’ character.
+Pad strings in the Series/Index by prepending '0' character.
 Equivalent to ``Series.str.pad(side='left', fillchar='0')``.
 Examples
@@ -2970,7 +2970,7 @@ def pad(
 side = libstrings.SideType[side.upper()]
 except KeyError:
 raise ValueError(
-"side has to be either one of {‘left’, ‘right’, ‘both’}"
+"side has to be either one of {'left', 'right', 'both'}"
 )

 return self._return_or_inplace(
@@ -2979,9 +2979,9 @@ def pad(

 def zfill(self, width: int) -> SeriesOrIndex:
 """
-Pad strings in the Series/Index by prepending ‘0’ characters.
+Pad strings in the Series/Index by prepending '0' characters.
-Strings in the Series/Index are padded with ‘0’ characters
+Strings in the Series/Index are padded with '0' characters
 on the left of the string to reach a total string length
 width. Strings in the Series/Index with length greater
 or equal to width are unchanged.
@@ -2994,12 +2994,12 @@ def zfill(self, width: int) -> SeriesOrIndex:
 width : int
 Minimum length of resulting string;
 strings with length less than width
-be prepended with ‘0’ characters.
+be prepended with '0' characters.
 Returns
 -------
 Series/Index of str dtype
-Returns Series or Index with prepended ‘0’ characters.
+Returns Series or Index with prepended '0' characters.
 See Also
 --------
@@ -3405,7 +3405,7 @@ def wrap(self, width: int, **kwargs) -> SeriesOrIndex:
 `expand_tabsbool` are not yet supported and will raise a
 NotImplementedError if they are set to any value.
-This method currently achieves behavior matching R’s
+This method currently achieves behavior matching R's
 stringr library ``str_wrap`` function, the equivalent
 pandas implementation can be obtained using the
 following parameter setting:
@@ -3576,7 +3576,7 @@ def findall(self, pat: str, flags: int = 0) -> SeriesOrIndex:
 >>> import cudf
 >>> s = cudf.Series(['Lion', 'Monkey', 'Rabbit'])
-The search for the pattern ‘Monkey’ returns one match:
+The search for the pattern 'Monkey' returns one match:
 >>> s.str.findall('Monkey')
 0 []
@@ -3595,7 +3595,7 @@ def findall(self, pat: str, flags: int = 0) -> SeriesOrIndex:
 Regular expressions are supported too. For instance,
 the search for all the strings ending with
-the word ‘on’ is shown next:
+the word 'on' is shown next:
 >>> s.str.findall('on$')
 0 [on]
@@ -4228,7 +4228,7 @@ def url_encode(self) -> SeriesOrIndex:
 Returns a URL-encoded format of each string.
 No format checking is performed.
 All characters are encoded except for ASCII letters,
-digits, and these characters: ``‘.’,’_’,’-‘,’~’``.
+digits, and these characters: ``'.','_','-','~'``.
 Encoding converts to hex using UTF-8 encoded bytes.
 Returns
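The regex behaviors documented in the `contains` and `findall` hunks of `string.py` can be reproduced with Python's `re` module, used here only as a CPU stand-in for cuDF's regex engine. An unescaped `.` matches any character, so `'40'` also contains a match for `'.0'`. Escaping the pattern gives the literal reading, analogous to `regex=False`. `$` anchors a match to the end of the string.

```python
import re

s2 = ["40", "40.0", "41", "41.0", "35"]
# '.0' as a regex: any character followed by '0'.
as_regex = [re.search(".0", v) is not None for v in s2]
# re.escape turns it into the literal two-character string '.0'.
as_literal = [re.search(re.escape(".0"), v) is not None for v in s2]
print(as_regex)    # [True, True, False, True, False]
print(as_literal)  # [False, True, False, True, False]

words = ["Lion", "Monkey", "Rabbit"]
monkey = [re.findall("Monkey", w) for w in words]
ends_on = [re.findall("on$", w) for w in words]
print(monkey)   # [[], ['Monkey'], []]
print(ends_on)  # [['on'], [], []]
```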
12 changes: 6 additions & 6 deletions python/cudf/cudf/core/dataframe.py
@@ -2293,7 +2293,7 @@ def reindex(
 Return a new object, even if the passed indexes are the same.
 level : Not supported
 fill_value : Value to use for missing values.
-Defaults to ``NA``, but can be any “compatible” value.
+Defaults to ``NA``, but can be any "compatible" value.
 limit : Not supported
 tolerance : Not supported
@@ -2358,7 +2358,7 @@ def reindex(
 IE10 404 <NA>
 Konqueror 301 <NA>
-Or we can use “axis-style” keyword arguments
+Or we can use "axis-style" keyword arguments
 >>> df.reindex(columns=['http_status', 'user_agent'])
 http_status user_agent
 Firefox 200 <NA>
@@ -3028,7 +3028,7 @@ def rename(
 """Alter column and index labels.
 Function / dict values must be unique (1-to-1). Labels not contained in
-a dict / Series will be left as-is. Extra labels listed don’t throw an
+a dict / Series will be left as-is. Extra labels listed don't throw an
 error.
 ``DataFrame.rename`` supports two calling conventions:
@@ -3635,8 +3635,8 @@ def merge(
 If on is None and not merging on indexes then
 this defaults to the intersection of the columns
 in both DataFrames.
-how : {‘left’, ‘outer’, ‘inner’, 'leftsemi', 'leftanti'}, \
-default ‘inner’
+how : {'left', 'outer', 'inner', 'leftsemi', 'leftanti'}, \
+default 'inner'
 Type of merge to be performed.
 - left : use only keys from left frame, similar to a SQL left
@@ -5363,7 +5363,7 @@ def isin(self, values):
 ----------
 values : iterable, Series, DataFrame or dict
 The result will only be true at a location if all
-the labels match. If values is a Series, that’s the index.
+the labels match. If values is a Series, that's the index.
 If values is a dict, the keys must be the column names,
 which must match. If values is a DataFrame, then both the
 index and column labels must match.
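Unlike `'left'`, `'outer'` and `'inner'`, the `'leftsemi'` and `'leftanti'` values of `how` in the `merge` docstring have no pandas equivalent. Assuming the usual definitions (a semi join keeps left rows that have a key match on the right, an anti join keeps those that do not, and both return only left columns without duplicating rows), a plain-Python sketch with made-up rows looks like this:

```python
# Hypothetical rows; each dict is one record.
left = [{"key": 1, "a": "x"}, {"key": 2, "a": "y"}, {"key": 3, "a": "z"}]
right = [{"key": 2, "b": 10}, {"key": 3, "b": 20}, {"key": 3, "b": 30}, {"key": 4, "b": 40}]

right_keys = {row["key"] for row in right}

# leftsemi: left rows with a match on the right, left columns only.
# Key 3 matches twice on the right but its left row appears once.
leftsemi = [row for row in left if row["key"] in right_keys]
# leftanti: left rows with no match on the right.
leftanti = [row for row in left if row["key"] not in right_keys]

print(leftsemi)  # [{'key': 2, 'a': 'y'}, {'key': 3, 'a': 'z'}]
print(leftanti)  # [{'key': 1, 'a': 'x'}]
```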
12 changes: 6 additions & 6 deletions python/cudf/cudf/core/frame.py
@@ -1363,12 +1363,12 @@ def searchsorted(
 ----------
 value : Frame (Shape must be consistent with self)
 Values to be hypothetically inserted into Self
-side : str {‘left’, ‘right’} optional, default ‘left’
-If ‘left’, the index of the first suitable location found is given
-If ‘right’, return the last such index
+side : str {'left', 'right'} optional, default 'left'
+If 'left', the index of the first suitable location found is given
+If 'right', return the last such index
 ascending : bool optional, default True
 Sorted Frame is in ascending order (otherwise descending)
-na_position : str {‘last’, ‘first’} optional, default ‘last’
+na_position : str {'last', 'first'} optional, default 'last'
 Position of null values in sorted order
 Returns
@@ -1476,8 +1476,8 @@ def argsort(
 Has no effect but is accepted for compatibility with numpy.
 ascending : bool or list of bool, default True
 If True, sort values in ascending order, otherwise descending.
-na_position : {‘first’ or ‘last’}, default ‘last’
-Argument ‘first’ puts NaNs at the beginning, ‘last’ puts NaNs
+na_position : {'first' or 'last'}, default 'last'
+Argument 'first' puts NaNs at the beginning, 'last' puts NaNs
 at the end.
 Returns
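The `na_position` option of `argsort` can be sketched on the CPU as a sort of the non-missing positions, with the missing positions placed before or after them. `argsort_reference` is a hypothetical helper that supports ascending order only and treats both `None` and NaN as missing (the docstring speaks of NaNs).

```python
import math
from typing import List, Optional


def _is_missing(v: Optional[float]) -> bool:
    return v is None or math.isnan(v)


def argsort_reference(values: List[Optional[float]], na_position: str = "last") -> List[int]:
    """Indices that sort `values` ascending, with missing values first or last."""
    if na_position not in ("first", "last"):
        raise ValueError("na_position must be 'first' or 'last'")
    present = sorted(
        (i for i, v in enumerate(values) if not _is_missing(v)),
        key=lambda i: values[i],
    )
    missing = [i for i, v in enumerate(values) if _is_missing(v)]
    return missing + present if na_position == "first" else present + missing


data = [3.0, None, 1.0, float("nan"), 2.0]
print(argsort_reference(data))           # [2, 4, 0, 1, 3]
print(argsort_reference(data, "first"))  # [1, 3, 2, 4, 0]
```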
12 changes: 6 additions & 6 deletions python/cudf/cudf/core/groupby/groupby.py
@@ -52,9 +52,9 @@ def _quantile_75(x):
 ----------
 by : mapping, function, label, or list of labels
 Used to determine the groups for the groupby. If by is a
-function, it’s called on each value of the object’s index.
+function, it's called on each value of the object's index.
 If a dict or Series is passed, the Series or dict VALUES will
-be used to determine the groups (the Series’ values are first
+be used to determine the groups (the Series' values are first
 aligned; see .align() method). If an cupy array is passed, the
 values are used as-is determine the groups. A label or list
 of labels may be passed to group by the columns in self.
@@ -65,7 +65,7 @@ def _quantile_75(x):
 as_index : bool, default True
 For aggregated output, return object with group labels as
 the index. Only relevant for DataFrame input.
-as_index=False is effectively “SQL-style” grouped output.
+as_index=False is effectively "SQL-style" grouped output.
 sort : bool, default False
 Sort result by group key. Differ from Pandas, cudf defaults to
 ``False`` for better performance. Note this does not influence
@@ -717,7 +717,7 @@ def _normalize_aggs(
 def pipe(self, func, *args, **kwargs):
 """
 Apply a function `func` with arguments to this GroupBy
-object and return the function’s result.
+object and return the function's result.
 Parameters
 ----------
@@ -1103,13 +1103,13 @@ def func(x):
 def describe(self, include=None, exclude=None):
 """
 Generate descriptive statistics that summarizes the central tendency,
-dispersion and shape of a dataset’s distribution, excluding NaN values.
+dispersion and shape of a dataset's distribution, excluding NaN values.
 Analyzes numeric DataFrames only
 Parameters
 ----------
-include: ‘all’, list-like of dtypes or None (default), optional
+include: 'all', list-like of dtypes or None (default), optional
 list of data types to include in the result.
 Ignored for Series.
10 changes: 5 additions & 5 deletions python/cudf/cudf/core/index.py
@@ -1062,7 +1062,7 @@ def equals(self, other, **kwargs):
 Returns
 -------
 out: bool
-True if “other” is an Index and it has the same elements
+True if "other" is an Index and it has the same elements
 as calling index; False otherwise.
 """
 if (
@@ -1414,8 +1414,8 @@ def argsort(
 Has no effect but is accepted for compatibility with numpy.
 ascending : bool or list of bool, default True
 If True, sort values in ascending order, otherwise descending.
-na_position : {‘first’ or ‘last’}, default ‘last’
-Argument ‘first’ puts NaNs at the beginning, ‘last’ puts NaNs
+na_position : {'first' or 'last'}, default 'last'
+Argument 'first' puts NaNs at the beginning, 'last' puts NaNs
 at the end.
 Returns
@@ -1853,7 +1853,7 @@ class DatetimeIndex(GenericIndex):
 This is not yet supported
 tz : pytz.timezone or dateutil.tz.tzfile
 This is not yet supported
-ambiguous : ‘infer’, bool-ndarray, ‘NaT’, default ‘raise’
+ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise'
 This is not yet supported
 name : object
 Name to be stored in the index.
@@ -2547,7 +2547,7 @@ class CategoricalIndex(GenericIndex):
 Whether or not this categorical is treated as an ordered categorical.
 If not given here or in dtype, the resulting categorical will be
 unordered.
-dtype : CategoricalDtype or “category”, optional
+dtype : CategoricalDtype or "category", optional
 If CategoricalDtype, cannot be used together with categories or
 ordered.
 copy : bool, default False
6 changes: 3 additions & 3 deletions python/cudf/cudf/core/indexed_frame.py
@@ -562,8 +562,8 @@ def replace(
 * dict:
 - Dicts can be used to specify different replacement values
 for different existing values. For example, {'a': 'b',
-'y': 'z'} replaces the value ‘a’ with ‘b’ and
-‘y’ with ‘z’.
+'y': 'z'} replaces the value 'a' with 'b' and
+'y' with 'z'.
 To use a dict in this way the ``value`` parameter should
 be ``None``.
 value : scalar, dict, list-like, str, default None
@@ -1865,7 +1865,7 @@ def sort_values(
 Sort ascending vs. descending. Specify list for multiple sort
 orders. If this is a list of bools, must match the length of the
 by.
-na_position : {‘first’, ‘last’}, default ‘last’
+na_position : {'first', 'last'}, default 'last'
 'first' puts nulls at the beginning, 'last' puts nulls at the end
 ignore_index : bool, default False
 If True, index will not be sorted.
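The dict form of `replace` described in the first `indexed_frame.py` hunk maps each existing value to a new one. A plain-Python sketch of that lookup, using a hypothetical `replace_reference` helper and assuming replacements apply simultaneously to the original values (so `'a' -> 'b'` is not followed by a further `'b' -> 'c'` step):

```python
from typing import Dict, Hashable, List


def replace_reference(values: List[Hashable], mapping: Dict[Hashable, Hashable]) -> List[Hashable]:
    # Look up each original value once; values absent from the mapping pass through.
    return [mapping.get(v, v) for v in values]


print(replace_reference(["a", "y", "q", "b"], {"a": "b", "y": "z"}))
# ['b', 'z', 'q', 'b']
print(replace_reference(["a", "b"], {"a": "b", "b": "c"}))
# ['b', 'c']  (no chaining: 'a' becomes 'b', not 'c')
```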
2 changes: 1 addition & 1 deletion python/cudf/cudf/core/reshape.py
@@ -484,7 +484,7 @@ def melt(
 4 b C 4
 5 c C 6
-The names of ‘variable’ and ‘value’ columns can be customized:
+The names of 'variable' and 'value' columns can be customized:
 >>> cudf.melt(df, id_vars=['A'], value_vars=['B'],
 ... var_name='myVarname', value_name='myValname')
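The `var_name`/`value_name` customization is easier to see with the reshape spelled out. Below is a plain-Python sketch of melt (wide to long) over a dict of columns. `melt_reference` is a hypothetical helper and the data is illustrative; it mirrors the docstring call (`id_vars=['A']`, `value_vars=['B']`) but is not cuDF's implementation.

```python
from typing import Dict, List


def melt_reference(
    frame: Dict[str, list],
    id_vars: List[str],
    value_vars: List[str],
    var_name: str = "variable",
    value_name: str = "value",
) -> Dict[str, list]:
    """Unpivot `value_vars` into (var_name, value_name) rows, keeping `id_vars`."""
    out: Dict[str, list] = {name: [] for name in [*id_vars, var_name, value_name]}
    n_rows = len(next(iter(frame.values())))
    # One block of rows per value column, in the order given.
    for col in value_vars:
        for i in range(n_rows):
            for name in id_vars:
                out[name].append(frame[name][i])
            out[var_name].append(col)
            out[value_name].append(frame[col][i])
    return out


df = {"A": ["a", "b", "c"], "B": [1, 3, 5], "C": [2, 4, 6]}
print(melt_reference(df, ["A"], ["B"], var_name="myVarname", value_name="myValname"))
# {'A': ['a', 'b', 'c'], 'myVarname': ['B', 'B', 'B'], 'myValname': [1, 3, 5]}
```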