Replace black with ruff-format #15312

Merged · 10 commits · Mar 15, 2024

Changes from 2 commits
9 changes: 2 additions & 7 deletions .pre-commit-config.yaml
@@ -23,13 +23,6 @@ repos:
args: ["--config-root=python/", "--resolve-all-configs"]
files: python/.*
types_or: [python, cython, pyi]
- repo: https://github.com/psf/black
rev: 23.12.1
hooks:
- id: black
files: python/.*
# Explicitly specify the pyproject.toml at the repo root, not per-project.
args: ["--config", "pyproject.toml"]
- repo: https://github.com/MarcoGorelli/cython-lint
rev: v0.16.0
hooks:
@@ -155,6 +148,8 @@ repos:
hooks:
- id: ruff
files: python/.*$
- id: ruff-format
files: python/.*$
- repo: https://github.com/rapidsai/pre-commit-hooks
rev: v0.0.1
hooks:
20 changes: 1 addition & 19 deletions pyproject.toml
@@ -1,22 +1,4 @@
[tool.black]
line-length = 79
target-version = ["py39"]
include = '\.py?$'
force-exclude = '''
/(
thirdparty |
\.eggs |
\.git |
\.hg |
\.mypy_cache |
\.tox |
\.venv |
_build |
buck-out |
build |
dist
)/
'''
# Copyright (c) 2019-2024, NVIDIA CORPORATION.

[tool.pydocstyle]
# Due to https://github.com/PyCQA/pydocstyle/issues/363, we must exclude rather
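For reference, below is a minimal sketch of ruff settings in pyproject.toml that could stand in for the removed [tool.black] block. The exact configuration adopted by this PR is not shown in this excerpt, so the values here (line length 79, Python 3.9 target, an explicit exclude) are assumptions carried over from the deleted black settings.

```toml
# Hypothetical [tool.ruff] replacement for the removed [tool.black] section;
# the settings actually merged in this PR may differ.
[tool.ruff]
line-length = 79
target-version = "py39"
# ruff respects .gitignore by default, so black's long force-exclude regex
# is generally unnecessary; extra paths can still be excluded explicitly.
extend-exclude = ["thirdparty"]
```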
21 changes: 5 additions & 16 deletions python/cudf/benchmarks/API/bench_dataframe.py
@@ -30,9 +30,7 @@ def bench_eval_func(benchmark, expr, dataframe):
[2, 3, 4],
)
def bench_merge(benchmark, dataframe, num_key_cols):
benchmark(
dataframe.merge, dataframe, on=list(dataframe.columns[:num_key_cols])
)
benchmark(dataframe.merge, dataframe, on=list(dataframe.columns[:num_key_cols]))


# TODO: Some of these cases could be generalized to an IndexedFrame benchmark
@@ -67,9 +65,7 @@ def random_state(request):
def bench_sample(benchmark, dataframe, axis, frac, random_state):
if axis == 1 and isinstance(random_state, cupy.random.RandomState):
pytest.skip("Unsupported params.")
benchmark(
dataframe.sample, frac=frac, axis=axis, random_state=random_state
)
benchmark(dataframe.sample, frac=frac, axis=axis, random_state=random_state)


@benchmark_with_object(cls="dataframe", dtype="int")
@@ -121,10 +117,7 @@ def bench_groupby(benchmark, dataframe, num_key_cols):
[
"sum",
["sum", "mean"],
{
f"{string.ascii_lowercase[i]}": ["sum", "mean", "count"]
for i in range(6)
},
{f"{string.ascii_lowercase[i]}": ["sum", "mean", "count"] for i in range(6)},
Contributor: I find the previous version more readable. Do we need to configure any line-length settings to achieve that?

Contributor Author: I can reconfigure ruff to go back to a 79-character line length.

Contributor Author: Of note: some of our copyright lines are longer than 79 characters, so I had to noqa those.

Contributor: Thanks! I think noqa'ing those is fine.

Contributor: #15312 (comment)

Adding noqa here isn't ideal. The SPDX identifiers are meant to be machine-readable...

Contributor: This is a documented limitation: astral-sh/ruff#4429, astral-sh/ruff#5899 ☹️

Contributor: The reason I hesitate on this is that I think we are supposed to migrate towards SPDX identifiers in all our copyright headers. We shouldn't noqa every file.

Contributor Author: FWIW, I would also be OK with a line length of 88 (that's what we use in pandas).

Contributor: We can, as the ruff FAQ suggests, run ruff format but ignore E501 (line-length issues). This might leave some lines over the limit, but with the conservative value of 79, I don't think that is particularly problematic.

Contributor Author: Good idea with ignoring E501. Yeah, we can go with that for now. (A sketch of that configuration is shown after this diff.)

],
)
@pytest.mark.parametrize(
@@ -154,9 +147,7 @@ def bench_groupby_sample(
kwargs = {"frac": target_sample_frac, "replace": replace}
else:
minsize = grouper.size().min()
target_size = numpy.round(
target_sample_frac * minsize, decimals=0
).astype(int)
target_size = numpy.round(target_sample_frac * minsize, decimals=0).astype(int)
kwargs = {"n": target_size, "replace": replace}

benchmark(grouper.sample, **kwargs)
@@ -165,9 +156,7 @@ def bench_groupby_sample(
@benchmark_with_object(cls="dataframe", dtype="int")
@pytest.mark.parametrize("num_cols_to_sort", [1])
def bench_sort_values(benchmark, dataframe, num_cols_to_sort):
benchmark(
dataframe.sort_values, list(dataframe.columns[:num_cols_to_sort])
)
benchmark(dataframe.sort_values, list(dataframe.columns[:num_cols_to_sort]))


@benchmark_with_object(cls="dataframe", dtype="int")
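Following up on the review thread above, here is a minimal sketch of the pyproject.toml settings the reviewers converge on: keep the formatter wrapping at 79 columns but stop the linter from flagging E501. This is only an illustration of that suggestion, not necessarily the configuration merged in this PR.

```toml
# Sketch only: ruff-format wraps at 79 columns, while E501 (line-too-long)
# is ignored so occasional long lines, such as copyright headers, are not
# flagged by the linter.
[tool.ruff]
line-length = 79

[tool.ruff.lint]
ignore = ["E501"]
```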
12 changes: 3 additions & 9 deletions python/cudf/benchmarks/API/bench_functions.py
@@ -9,9 +9,7 @@
from utils import benchmark_with_object


@pytest_cases.parametrize_with_cases(
"objs", prefix="concat", cases="cases_functions"
)
@pytest_cases.parametrize_with_cases("objs", prefix="concat", cases="cases_functions")
@pytest.mark.parametrize(
"axis",
[
@@ -21,9 +19,7 @@
@pytest.mark.parametrize("join", ["inner", "outer"])
@pytest.mark.parametrize("ignore_index", [True, False])
def bench_concat_axis_1(benchmark, objs, axis, join, ignore_index):
benchmark(
cudf.concat, objs=objs, axis=axis, join=join, ignore_index=ignore_index
)
benchmark(cudf.concat, objs=objs, axis=axis, join=join, ignore_index=ignore_index)


@pytest.mark.parametrize("size", [10_000, 100_000])
@@ -51,9 +47,7 @@ def bench_get_dummies_simple(benchmark, prefix):
"col3": cudf.Series(list(range(100, 110)), dtype="category"),
}
)
benchmark(
cudf.get_dummies, df, columns=["col1", "col2", "col3"], prefix=prefix
)
benchmark(cudf.get_dummies, df, columns=["col1", "col2", "col3"], prefix=prefix)


@benchmark_with_object(cls="dataframe", dtype="int", cols=6)
6 changes: 2 additions & 4 deletions python/cudf/benchmarks/API/bench_multiindex.py
@@ -1,4 +1,4 @@
# Copyright (c) 2022, NVIDIA CORPORATION.
# Copyright (c) 2022-2024, NVIDIA CORPORATION.

"""Benchmarks of MultiIndex methods."""

@@ -31,9 +31,7 @@ def bench_from_pandas(benchmark, pidx):


def bench_constructor(benchmark, midx):
benchmark(
cudf.MultiIndex, codes=midx.codes, levels=midx.levels, names=midx.names
)
benchmark(cudf.MultiIndex, codes=midx.codes, levels=midx.levels, names=midx.names)


def bench_from_frame(benchmark, midx):
20 changes: 5 additions & 15 deletions python/cudf/benchmarks/API/cases_functions.py
@@ -28,9 +28,7 @@ def concat_case_contiguous_indexes(nr):
@pytest_cases.parametrize("nr", NUM_ROWS)
def concat_case_contiguous_indexes_different_cols(nr):
return [
cudf.DataFrame(
{"a": cupy.tile([1, 2, 3], nr), "b": cupy.tile([4, 5, 7], nr)}
),
cudf.DataFrame({"a": cupy.tile([1, 2, 3], nr), "b": cupy.tile([4, 5, 7], nr)}),
cudf.DataFrame(
{"c": cupy.tile([4, 5, 7], nr)},
index=cudf.RangeIndex(start=nr * 3, stop=nr * 2 * 3),
@@ -117,30 +115,22 @@ def concat_case_unique_columns(nr):
@pytest_cases.parametrize("nr", NUM_ROWS)
def concat_case_unique_columns_with_different_range_index(nr):
return [
cudf.DataFrame(
{"a": cupy.tile([1, 2, 3], nr), "b": cupy.tile([4, 5, 7], nr)}
),
cudf.DataFrame({"a": cupy.tile([1, 2, 3], nr), "b": cupy.tile([4, 5, 7], nr)}),
cudf.DataFrame(
{"c": cupy.tile([4, 5, 7], nr)},
index=cudf.RangeIndex(start=nr * 3, stop=nr * 2 * 3),
),
cudf.DataFrame(
{"d": cupy.tile([1, 2, 3], nr), "e": cupy.tile([4, 5, 7], nr)}
),
cudf.DataFrame({"d": cupy.tile([1, 2, 3], nr), "e": cupy.tile([4, 5, 7], nr)}),
cudf.DataFrame(
{"f": cupy.tile([4, 5, 7], nr)},
index=cudf.RangeIndex(start=nr * 3, stop=nr * 2 * 3),
),
cudf.DataFrame(
{"g": cupy.tile([1, 2, 3], nr), "h": cupy.tile([4, 5, 7], nr)}
),
cudf.DataFrame({"g": cupy.tile([1, 2, 3], nr), "h": cupy.tile([4, 5, 7], nr)}),
cudf.DataFrame(
{"i": cupy.tile([4, 5, 7], nr)},
index=cudf.RangeIndex(start=nr * 3, stop=nr * 2 * 3),
),
cudf.DataFrame(
{"j": cupy.tile([1, 2, 3], nr), "k": cupy.tile([4, 5, 7], nr)}
),
cudf.DataFrame({"j": cupy.tile([1, 2, 3], nr), "k": cupy.tile([4, 5, 7], nr)}),
cudf.DataFrame(
{"l": cupy.tile([4, 5, 7], nr)},
index=cudf.RangeIndex(start=nr * 3, stop=nr * 2 * 3),
6 changes: 2 additions & 4 deletions python/cudf/benchmarks/common/utils.py
@@ -1,4 +1,4 @@
# Copyright (c) 2022, NVIDIA CORPORATION.
# Copyright (c) 2022-2024, NVIDIA CORPORATION.

"""Common utilities for fixture creation and benchmarking."""

@@ -42,9 +42,7 @@ def make_boolean_mask_column(size):
return cudf.core.column.as_column(rstate.randint(0, 2, size).astype(bool))


def benchmark_with_object(
cls, *, dtype="int", nulls=None, cols=None, rows=None
):
def benchmark_with_object(cls, *, dtype="int", nulls=None, cols=None, rows=None):
"""Pass "standard" cudf fixtures to functions without renaming parameters.

The fixture generation logic in conftest.py provides a plethora of useful
17 changes: 4 additions & 13 deletions python/cudf/benchmarks/conftest.py
@@ -93,10 +93,7 @@ def make_dataframe(nr, nc, column_generator=column_generator):
string.ascii_lowercase
), "make_dataframe only supports a maximum of 26 columns"
return cudf.DataFrame(
{
f"{string.ascii_lowercase[i]}": column_generator(nr)
for i in range(nc)
}
{f"{string.ascii_lowercase[i]}": column_generator(nr) for i in range(nc)}
)

for nr in NUM_ROWS:
@@ -108,9 +105,7 @@ def make_dataframe(nr, nc, column_generator=column_generator):
# https://github.com/smarie/python-pytest-cases/issues/278
# Once that is fixed we could remove all the extraneous `request`
# fixtures in these fixtures.
def series_nulls_false(
request, nr=nr, column_generator=column_generator
):
def series_nulls_false(request, nr=nr, column_generator=column_generator):
return cudf.Series(column_generator(nr))

make_fixture(
Expand All @@ -120,9 +115,7 @@ def series_nulls_false(
fixtures,
)

def series_nulls_true(
request, nr=nr, column_generator=column_generator
):
def series_nulls_true(request, nr=nr, column_generator=column_generator):
s = cudf.Series(column_generator(nr))
s.iloc[::2] = None
return s
@@ -135,9 +128,7 @@ def series_nulls_true(
)

# For now, not bothering to include a nullable index fixture.
def index_nulls_false(
request, nr=nr, column_generator=column_generator
):
def index_nulls_false(request, nr=nr, column_generator=column_generator):
return cudf.Index(column_generator(nr))

make_fixture(
8 changes: 2 additions & 6 deletions python/cudf/benchmarks/internal/bench_column.py
@@ -31,9 +31,7 @@ def bench_unique_single_column(benchmark, column):
@pytest.mark.parametrize("nullify", [True, False])
@pytest.mark.parametrize("gather_how", ["sequence", "reverse", "random"])
def bench_take(benchmark, column, gather_how, nullify):
gather_map = make_gather_map(
column.size * 0.4, column.size, gather_how
)._column
gather_map = make_gather_map(column.size * 0.4, column.size, gather_how)._column
benchmark(column.take, gather_map, nullify=nullify)


@@ -107,8 +105,6 @@ def setitem_case_int_column_align_to_col_size(column):
# column (len(val) != len(key) and len == num_true)


@pytest_cases.parametrize_with_cases(
"column,key,value", cases=".", prefix="setitem"
)
@pytest_cases.parametrize_with_cases("column,key,value", cases=".", prefix="setitem")
def bench_setitem(benchmark, column, key, value):
benchmark(column.__setitem__, key, value)
9 changes: 3 additions & 6 deletions python/cudf/cudf/_fuzz_testing/avro.py
@@ -1,4 +1,4 @@
# Copyright (c) 2020-2022, NVIDIA CORPORATION.
# Copyright (c) 2020-2024, NVIDIA CORPORATION.

import copy
import io
@@ -69,17 +69,14 @@ def generate_input(self):
- cudf.utils.dtypes.TIMEDELTA_TYPES
)

dtypes_meta, num_rows, num_cols = _generate_rand_meta(
self, dtypes_list
)
dtypes_meta, num_rows, num_cols = _generate_rand_meta(self, dtypes_list)
self._current_params["dtypes_meta"] = dtypes_meta
seed = random.randint(0, 2**32 - 1)
self._current_params["seed"] = seed
self._current_params["num_rows"] = num_rows
self._current_params["num_cols"] = num_cols
logging.info(
f"Generating DataFrame with rows: {num_rows} "
f"and columns: {num_cols}"
f"Generating DataFrame with rows: {num_rows} " f"and columns: {num_cols}"
)
table = dg.rand_dataframe(dtypes_meta, num_rows, seed)
df = pyarrow_to_pandas(table)
38 changes: 10 additions & 28 deletions python/cudf/cudf/_fuzz_testing/csv.py
@@ -53,16 +53,13 @@ def generate_input(self):
seed = random.randint(0, 2**32 - 1)
random.seed(seed)
dtypes_list = list(cudf.utils.dtypes.ALL_TYPES)
dtypes_meta, num_rows, num_cols = _generate_rand_meta(
self, dtypes_list
)
dtypes_meta, num_rows, num_cols = _generate_rand_meta(self, dtypes_list)
self._current_params["dtypes_meta"] = dtypes_meta
self._current_params["seed"] = seed
self._current_params["num_rows"] = num_rows
self._current_params["num_columns"] = num_cols
logging.info(
f"Generating DataFrame with rows: {num_rows} "
f"and columns: {num_cols}"
f"Generating DataFrame with rows: {num_rows} " f"and columns: {num_cols}"
)
table = dg.rand_dataframe(dtypes_meta, num_rows, seed)
df = pyarrow_to_pandas(table)
@@ -84,18 +81,12 @@ def set_rand_params(self, params):
col_val = np.random.choice(
[
None,
np.unique(
np.random.choice(self._df.columns, col_size)
),
np.unique(np.random.choice(self._df.columns, col_size)),
]
)
params_dict[param] = (
col_val if col_val is None else list(col_val)
)
params_dict[param] = col_val if col_val is None else list(col_val)
elif param == "dtype":
dtype_val = np.random.choice(
[None, self._df.dtypes.to_dict()]
)
dtype_val = np.random.choice([None, self._df.dtypes.to_dict()])
if dtype_val is not None:
dtype_val = {
col_name: "category"
Expand All @@ -110,13 +101,9 @@ def set_rand_params(self, params):
)
params_dict[param] = header_val
elif param == "skiprows":
params_dict[param] = np.random.randint(
low=0, high=len(self._df)
)
params_dict[param] = np.random.randint(low=0, high=len(self._df))
elif param == "skipfooter":
params_dict[param] = np.random.randint(
low=0, high=len(self._df)
)
params_dict[param] = np.random.randint(low=0, high=len(self._df))
elif param == "nrows":
nrows_val = np.random.choice(
[None, np.random.randint(low=0, high=len(self._df))]
@@ -158,16 +145,13 @@ def generate_input(self):
seed = random.randint(0, 2**32 - 1)
random.seed(seed)
dtypes_list = list(cudf.utils.dtypes.ALL_TYPES)
dtypes_meta, num_rows, num_cols = _generate_rand_meta(
self, dtypes_list
)
dtypes_meta, num_rows, num_cols = _generate_rand_meta(self, dtypes_list)
self._current_params["dtypes_meta"] = dtypes_meta
self._current_params["seed"] = seed
self._current_params["num_rows"] = num_rows
self._current_params["num_columns"] = num_cols
logging.info(
f"Generating DataFrame with rows: {num_rows} "
f"and columns: {num_cols}"
f"Generating DataFrame with rows: {num_rows} " f"and columns: {num_cols}"
)
table = dg.rand_dataframe(dtypes_meta, num_rows, seed)
df = pyarrow_to_pandas(table)
@@ -188,9 +172,7 @@ def set_rand_params(self, params):
col_size = self._rand(len(self._current_buffer.columns))
params_dict[param] = list(
np.unique(
np.random.choice(
self._current_buffer.columns, col_size
)
np.random.choice(self._current_buffer.columns, col_size)
)
)
elif param == "chunksize":