
Allow setting malloc heap size in string udfs #12094

Merged 16 commits on Nov 17, 2022
Changes from 14 commits
22 changes: 21 additions & 1 deletion python/cudf/cudf/core/udf/__init__.py
@@ -1,5 +1,7 @@
# Copyright (c) 2022, NVIDIA CORPORATION.

from functools import lru_cache

from numba import types
from numba.cuda.cudaimpl import lower as cuda_lower

@@ -21,6 +23,13 @@
)
_STRING_UDFS_ENABLED = False
cudf_str_dtype = dtype(str)


@lru_cache(maxsize=None)
def set_initial_malloc_heap_size():
    strings_udf.set_malloc_heap_size()


try:
import strings_udf
from strings_udf import ptxpath
@@ -47,7 +56,18 @@
utils.JIT_SUPPORTED_TYPES |= STRING_TYPES
_supported_masked_types |= {string_view, udf_string}

utils.launch_arg_getters[cudf_str_dtype] = column_to_string_view_array
@lru_cache(maxsize=None)
def set_initial_malloc_heap_size():
    strings_udf.set_malloc_heap_size()

def column_to_string_view_array_init_heap(col):
    # lazily allocate heap only when a string needs to be returned
Contributor:

The structure of this code makes it look like idempotency is its main goal, not laziness.

strings_udf.set_malloc_heap_size() should only ever be called once, I think?

Maybe this function or module in cudf (and not strings_udf) should own the “is it allocated already?” logic via a cache or global of some kind, and then the strings_udf logic doesn’t need to read env vars for the allocation size, etc.
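
For illustration only, one minimal shape of that "cudf owns the already-applied logic" idea, using a hypothetical module-level flag (not code from this PR; the thread below converges on a cached-function variant):

import strings_udf

_heap_size_applied = False

def _ensure_heap_size():
    # cudf-side guard so strings_udf.set_malloc_heap_size() runs at most once per process
    global _heap_size_applied
    if not _heap_size_applied:
        strings_udf.set_malloc_heap_size()
        _heap_size_applied = True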

Contributor:

This is not an allocation. There is already a default malloc heap size (about 8MB on the GPUs I've tested on). This call updates the heap size to a new value, so "already allocated" is not the goal.
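
As a hedged illustration of that point (not part of the PR), the limit can be read and updated through the cuda-python bindings used in the diff below; nothing is allocated eagerly:

from cuda import cudart

# Read the current device malloc heap limit (a few MB by default on many GPUs).
err, current = cudart.cudaDeviceGetLimit(cudart.cudaLimit.cudaLimitMallocHeapSize)

# Update the limit; this replaces the previous value rather than allocating memory.
(err,) = cudart.cudaDeviceSetLimit(cudart.cudaLimit.cudaLimitMallocHeapSize, 2**28)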

Contributor:

Is set_malloc_heap_size intended to be a public API that can be called by a user, or is it an internal function? It's not entirely clear what the public API is here because this file doesn't define an __all__. If the function is just for internal use, why does it need to be called every time someone accesses a string column rather than just calling cudaDeviceSetLimit once when the package is imported (or after we can guarantee that the CUDA context/etc is initialized if that is the requirement that we're currently addressing by doing this "lazily")?
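
A rough sketch of the "set it once" alternative being asked about here, assuming numba.cuda is used to guarantee a CUDA context first (hypothetical helper name; not code from this PR):

from cuda import cudart
from numba import cuda

def _set_heap_size_once_at_import(size=2**31):
    # ensure a CUDA context exists before touching device limits
    cuda.current_context()
    (err,) = cudart.cudaDeviceSetLimit(
        cudart.cudaLimit.cudaLimitMallocHeapSize, size
    )
    if err.value != 0:
        raise RuntimeError("Unable to set cudaMalloc heap size")

# would be called a single time, e.g. at the end of the package's __init__.py
_set_heap_size_once_at_import()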

Contributor Author (@brandon-b-miller, Nov 16, 2022):

I think what we want to happen is:

  • The first time a user runs a string_udf, set the heap size to either default or what is retrieved from their environment.
  • The next time a user runs a string_udf, do not set the heap size. Or at least early return.
  • Reset the heap the next time the user runs a string_udf iff the user sets the value of STRINGS_UDF_HEAP_SIZE to something else during the session.

Does this seem reasonable?
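
A rough sketch of that proposed behavior (hypothetical helper; not the code that was eventually merged): apply the size on first use, early-return afterwards, and re-apply only if STRINGS_UDF_HEAP_SIZE changes during the session:

import os

import strings_udf

_last_applied_size = None

def _maybe_set_heap_size():
    global _last_applied_size
    requested = int(os.environ.get("STRINGS_UDF_HEAP_SIZE", 2**31))
    if requested == _last_applied_size:
        return  # already applied; nothing to do
    strings_udf.set_malloc_heap_size(requested)
    _last_applied_size = requested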

Contributor:

Seems we have 2 comment chains asking about the same thing.
#12094 (comment)

Contributor:

  • The first time a user runs a string_udf, set the heap size to either default or what is retrieved from their environment.

Yes.

  • The next time a user runs a string_udf, do not set the heap size. Or at least early return.

The "set" function should not be called. Avoid designs that try to set but exit early.

  • Reset the heap the next time the user runs a string_udf iff the user sets the value of STRINGS_UDF_HEAP_SIZE to something else during the session.

No. The STRINGS_UDF_HEAP_SIZE variable should not be modifiable (I argued separately that that variable should not exist at all). If the user needs to be able to set a specific size, then the "set" function should be public and the user should be able to call it explicitly with immediate effect (not waiting until the next UDF execution, because memory demands can change in the meantime).
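
What that explicit, immediate-effect call could look like from user code, under the assumption that set_malloc_heap_size is treated as public API (usage sketch only):

import strings_udf

# Takes effect right away, independent of when the next UDF runs (1 GiB here).
strings_udf.set_malloc_heap_size(2**30)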

Contributor Author:

With the changes in af7cd9f I still have heap_size available as a module-level attribute of strings_udf. What do you think about checking if heap_size == 0 within the getter to determine whether the function needs to be called or not?
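
A sketch of that heap_size == 0 check, assuming column_to_string_view_array is in scope as in the diff above (hypothetical wrapper; the thread settles on the cached approach suggested next):

import strings_udf

def column_to_string_view_array_init_heap(col):
    # only call the setter if no size has been applied yet in this process
    if strings_udf.heap_size == 0:
        strings_udf.set_malloc_heap_size()
    return column_to_string_view_array(col)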

Contributor (@bdice, Nov 16, 2022):

You could use a cached function for this instead of a global that tracks whether the function has already been run. Following ideas from https://mail.python.org/archives/list/[email protected]/thread/5OR3LJO7LOL6SC4OOGKFIVNNH4KADBPG/

from functools import lru_cache

# Only executes the body of the function on the first call
once = lru_cache(maxsize=None)

@once
def set_initial_malloc_heap_size():
    strings_udf.set_malloc_heap_size()

def column_to_string_view_array_init_heap(col):
    set_initial_malloc_heap_size()
    return column_to_string_view_array(col)

Contributor Author:

Thanks, this worked perfectly!

    set_initial_malloc_heap_size()
    return column_to_string_view_array(col)

utils.launch_arg_getters[
    cudf_str_dtype
] = column_to_string_view_array_init_heap
utils.output_col_getters[cudf_str_dtype] = column_from_udf_string_array
utils.masked_array_types[cudf_str_dtype] = string_view
row_function.itemsizes[cudf_str_dtype] = string_view.size_bytes
25 changes: 25 additions & 0 deletions python/strings_udf/strings_udf/__init__.py
@@ -3,6 +3,7 @@
import os

from cubinlinker.patch import _numba_version_ok, get_logger, new_patched_linker
from cuda import cudart
from numba import cuda
from numba.cuda.cudadrv.driver import Linker
from ptxcompiler.patch import NO_DRIVER, safe_get_versions
@@ -87,6 +88,30 @@ def _get_ptx_file():
return regular_result[1]


# Maximum size of a string column is 2gb
_STRINGS_UDF_DEFAULT_HEAP_SIZE = os.environ.get(
    "STRINGS_UDF_HEAP_SIZE", 2**31
)
heap_size = 0


def set_malloc_heap_size(size=None):
    """
    Heap size control for strings_udf, size in bytes.
    """
    global heap_size
    if size is None:
        size = _STRINGS_UDF_DEFAULT_HEAP_SIZE
    if size != heap_size:
        (ret,) = cudart.cudaDeviceSetLimit(
            cudart.cudaLimit.cudaLimitMallocHeapSize, size
        )
        if ret.value != 0:
            raise RuntimeError("Unable to set cudaMalloc heap size")

        heap_size = size


ptxpath = None
versions = safe_get_versions()
if versions != NO_DRIVER: