Cache JIT GroupBy.apply functions #12802

Merged
14 changes: 10 additions & 4 deletions python/cudf/cudf/core/udf/groupby_utils.py
@@ -19,11 +19,13 @@
     groupby_apply_kernel_template,
 )
 from cudf.core.udf.utils import (
+    _generate_cache_key,
     _get_extensionty_size,
     _get_kernel,
     _get_udf_return_type,
     _supported_cols_from_frame,
     _supported_dtypes_from_frame,
+    precompiled,
 )
 from cudf.utils.utils import _cudf_nvtx_annotate

@@ -147,12 +149,16 @@ def jit_groupby_apply(offsets, grouped_values, function, *args):
     offsets = cp.asarray(offsets)
     ngroups = len(offsets) - 1

-    kernel, return_type = _get_groupby_apply_kernel(
-        grouped_values, function, args
-    )
-    return_type = numpy_support.as_dtype(return_type)
+    cache_key = _generate_cache_key(grouped_values, function)
Contributor:
Is there a reason we don't use lru_cache and instead manually track cache keys? I assume it has to do with which types are supported as lru_cache keys?

Contributor Author:

In this context, precompiled is a cachetools.LRUCache. Are you asking why we don't do the following from functools?

@functools.lru_cache
def _get_groupby_apply_kernel(...):
    ...

If so, the reason was that I wanted different UDF pipelines (apply, groupby.apply, etc.) to share the same cache.
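A minimal sketch of that shared-cache pattern (a plain dict stands in for cudf's cachetools.LRUCache here, and the names are illustrative rather than cudf's actual internals):

```python
# Illustrative sketch: one cache consulted by every UDF pipeline.
# cudf's real cache is a cachetools.LRUCache named `precompiled` in
# cudf.core.udf.utils; a plain dict stands in for it here.
precompiled = {}


def get_kernel(cache_key, compile_fn):
    # Compile only on a miss. Because all pipelines consult the same
    # cache, a kernel compiled by one pipeline is reusable by another
    # whenever both produce the same key.
    if cache_key not in precompiled:
        precompiled[cache_key] = compile_fn()
    return precompiled[cache_key]
```

A functools.lru_cache decorator on each compile function would instead give every pipeline its own private cache, which is what this design avoids.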

Contributor:

Nevermind. 😄 I didn't look closely enough at precompiled. But to clarify, how do you distinguish the type of UDF? Could an apply function and a groupby apply function reuse the same exact kernel? If not, how are the cache keys distinguished (for functions of the same data types)?

Contributor Author (@brandon-b-miller, Mar 15, 2023):

The cache key is based on the bytecode of the UDF; the particulars are found here. I suppose you could get a cache hit erroneously if you:

  1. wrote a function f and executed it using DataFrame.apply
  2. applied the exact same f on a groupby result whose columns were the exact same dtypes as the dataframe that it was first applied to

However, I would expect the above to cause a crash in the pandas case as well, since each API enforces a different syntax for the kinds of UDFs it accepts, so using one kind of function with the other's apply API probably wouldn't work in most cases.
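As a rough sketch of what keying on the UDF's bytecode means (simplified and hypothetical; cudf's actual _generate_cache_key folds in more state), a key built from the code object plus the column dtypes behaves like this:

```python
def make_cache_key(function, dtypes):
    # Hypothetical helper: key on the UDF's bytecode and constants plus
    # the dtypes of the columns it runs over. Two different functions
    # that compile to identical bytecode over identical dtypes would
    # collide -- the scenario discussed above.
    code = function.__code__
    return (code.co_code, code.co_consts, tuple(str(d) for d in dtypes))
```

Note that including co_consts is what makes two UDFs differing only in a literal (e.g. `* 2` vs. `* 3`) produce distinct keys, matching the recompile behavior exercised by the test added in this PR.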

Contributor:

I'll defer to your judgment here -- but distinguishing keys clearly would be a plus, in my eyes. An erroneous cache hit would be bad.

Contributor Author:

I added an extra tuple element to UDFs that go through GroupBy.apply that should break this degeneracy.

(commit 425a912)

Contributor:

Nice. Thanks!

Contributor:

fwiw, I think I would have preferred an approach like _generate_cache_key(grouped_values, function, suffix="__GROUPBY_APPLY_UDF") where you provide a suffix to the function making the key. Not a dealbreaker but worth considering if we have more JIT code paths with separate JIT caches.

Contributor Author:

I agree with you! The cache key should be generated entirely within _generate_cache_key, not half inside it and half outside the function. I updated this.
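The suffix-based scheme suggested above can be sketched as follows (hypothetical signature and names; the real _generate_cache_key in cudf differs):

```python
def generate_cache_key(dtypes, function, suffix=""):
    # Hypothetical sketch: fold a per-pipeline suffix into the key so
    # that a DataFrame.apply UDF and a GroupBy.apply UDF can never
    # collide in the shared cache, even with identical bytecode and
    # identical column dtypes.
    code = function.__code__
    return (code.co_code, code.co_consts, tuple(dtypes), suffix)
```

With this shape, the groupby pipeline would pass something like suffix="__GROUPBY_APPLY_UDF" while DataFrame.apply uses the default, keeping the two key spaces disjoint within one shared cache.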

+    if cache_key not in precompiled:
+        precompiled[cache_key] = _get_groupby_apply_kernel(
+            grouped_values, function, args
+        )
+    kernel, return_type = precompiled[cache_key]
+
+    return_type = numpy_support.as_dtype(return_type)
     output = cudf.core.column.column_empty(ngroups, dtype=return_type)

     launch_args = [
         offsets,
         output,
37 changes: 37 additions & 0 deletions python/cudf/cudf/tests/test_groupby.py
@@ -18,6 +18,7 @@
 from cudf import DataFrame, Series
 from cudf.core._compat import PANDAS_GE_150, PANDAS_LT_140
 from cudf.core.udf.groupby_typing import SUPPORTED_GROUPBY_NUMPY_TYPES
+from cudf.core.udf.utils import precompiled
 from cudf.testing._utils import (
     DATETIME_TYPES,
     SIGNED_TYPES,
@@ -532,6 +533,42 @@ def diverging_block(grp_df):
run_groupby_apply_jit_test(df, diverging_block, ["a"])


+def test_groupby_apply_caching():
+    # Make sure similar functions that differ
+    # by simple things like constants actually
+    # recompile
+
+    # begin with a clear cache
+    precompiled.clear()
+    assert precompiled.currsize == 0
+
+    data = cudf.DataFrame({"a": [1, 1, 1, 2, 2, 2], "b": [1, 2, 3, 4, 5, 6]})
+
+    def f(group):
+        return group["b"].mean() * 2
+
+    # a single run should result in a cache size of 1
+    run_groupby_apply_jit_test(data, f, ["a"])
+    assert precompiled.currsize == 1
+
+    # a second run with f should not increase the count
+    run_groupby_apply_jit_test(data, f, ["a"])
+    assert precompiled.currsize == 1
+
+    # changing a constant value inside the UDF should miss
+    def f(group):
+        return group["b"].mean() * 3
+
+    run_groupby_apply_jit_test(data, f, ["a"])
+    assert precompiled.currsize == 2
+
+    # changing the dtypes of the columns should miss
+    data["b"] = data["b"].astype("float64")
+    run_groupby_apply_jit_test(data, f, ["a"])
+
+    assert precompiled.currsize == 3


@pytest.mark.parametrize("nelem", [2, 3, 100, 500, 1000])
@pytest.mark.parametrize(
"func",