[REVIEW] Expose libcudf's label_bins function to cudf #7724

Merged
merged 80 commits into from
Mar 26, 2021
Changes from 78 commits
5c35e91
Add files for binning that compile.
vyasr Mar 5, 2021
795b981
Add scaffold for test that compiles.
vyasr Mar 5, 2021
95f7258
Add basic validation of input column types.
vyasr Mar 5, 2021
00edf32
Check for matching edge sizes and add test.
vyasr Mar 5, 2021
e7fcc17
Add real test case of binned values.
vyasr Mar 5, 2021
6260ee8
Initial version of actual binning procedure. Compiles but seg faults.
vyasr Mar 6, 2021
9aecc4e
Code no longer seg faults, but test does.
vyasr Mar 6, 2021
e5f22da
Make tests a cu file to allow summing of device vectors.
vyasr Mar 6, 2021
d1fed0c
Use appropriate memory handling to fix seg faults and make tests pass.
vyasr Mar 6, 2021
9a2760f
Get rid of thrust for now since the kernel accumulates directly.
vyasr Mar 6, 2021
a846e25
Minor cleanup.
vyasr Mar 6, 2021
a216e7c
Switch to thrust comparators and explicitly check the upper bound of …
vyasr Mar 8, 2021
3633b47
Actually use template arguments to pass in comparators.
vyasr Mar 8, 2021
afb8659
Respect different values of inclusivity.
vyasr Mar 8, 2021
115d874
Switch to actually binning rather than accumulating a histogram.
vyasr Mar 8, 2021
381b3c2
Fix bug and make test more robust to obvious failures.
vyasr Mar 8, 2021
5be91c6
Try to use thrust algorithms (currently very hacky).
vyasr Mar 9, 2021
ee5e8a9
Much cleaner thrust-based implementation.
vyasr Mar 9, 2021
8716dcf
Add back support for different inclusion settings on the bounds.
vyasr Mar 9, 2021
03e6ad8
Clean up operator struct.
vyasr Mar 9, 2021
886b6cb
Remove old kernel and rename comparator.
vyasr Mar 9, 2021
95209e6
Move binning files into their own subdirectories.
vyasr Mar 9, 2021
da1cfee
Add basic support for empty input arrays.
vyasr Mar 9, 2021
e59bf8f
Enable compile-time type dispatch using template argument deduction.
vyasr Mar 9, 2021
90d5388
Use templating to dispatch different inclusive settings.
vyasr Mar 9, 2021
3d8590e
Convert verbose switch statement to a simpler series of if statements.
vyasr Mar 9, 2021
3b12d88
Template the data type.
vyasr Mar 9, 2021
4debe1f
Move all template code to header.
vyasr Mar 9, 2021
367b6d5
Add proper support for NULLs.
vyasr Mar 9, 2021
2cb408c
Move error checking to top level of hierarchy and use size_type where…
vyasr Mar 9, 2021
4703f4e
Some cleanup and addition of comments and internal namespaces.
vyasr Mar 9, 2021
a6543c2
Add type-parameterized tests, clean up test file, and make notes of t…
vyasr Mar 10, 2021
1f12b3f
Revert to a placeholder null and do some minor cleanup.
vyasr Mar 10, 2021
fbe66b2
Remove superfluous comments.
vyasr Mar 10, 2021
dafa750
Use sentinel and valid_if to indicate null outputs.
vyasr Mar 10, 2021
42edd03
Move all logic from cuh file into cu file, and remove unnecessary bin…
vyasr Mar 10, 2021
4772cbb
Use column pair iterators to access and filter based on null values i…
vyasr Mar 11, 2021
e0ec1d1
Some minor cleanup.
vyasr Mar 11, 2021
a510eac
Switch to using device spans from raw pointers.
vyasr Mar 11, 2021
9f3d845
Add tests of nulls related cases.
vyasr Mar 11, 2021
2bfa110
Clean up use of fixtures and add explicit test of endpoints.
vyasr Mar 11, 2021
571c7ce
Add test of input with nulls and a known failure case.
vyasr Mar 11, 2021
89322df
Raise exception if either edges contain nulls.
vyasr Mar 11, 2021
b90ba3b
Add test for out of bounds input and fix case where values are in the…
vyasr Mar 11, 2021
c7989d0
Use numeric limits to get max type.
vyasr Mar 11, 2021
a04e40a
Lots of test cleanup.
vyasr Mar 12, 2021
695ad13
More cleanup of main codebase.
vyasr Mar 12, 2021
76883bf
Apply clang-format.
vyasr Mar 12, 2021
8890e4e
Add file to conda meta.yaml for testing.
vyasr Mar 12, 2021
d7ba73d
Address open PR requests.
vyasr Mar 12, 2021
e4c95b6
Add tests of integer types.
vyasr Mar 12, 2021
8eb3358
Add support for fixed point types.
vyasr Mar 15, 2021
f4f8b9e
Add func range.
vyasr Mar 15, 2021
12558ef
Clean up inclusion settings.
vyasr Mar 15, 2021
7a5ee48
Apply clang-format.
vyasr Mar 15, 2021
51b9d4b
Switch back to using instead of device spans.
vyasr Mar 16, 2021
49ae697
Swap thrust functions for iterator arithmetic.
vyasr Mar 16, 2021
e322b7c
Perform nullable check at runtime instead of compile-time.
vyasr Mar 16, 2021
fd50fca
Add support for nulls in edges.
vyasr Mar 16, 2021
566352a
Initial attempt to add string support.
vyasr Mar 16, 2021
f073ced
Fix support for strings.
vyasr Mar 16, 2021
75558ce
Apply all standards required by the development guide except for prov…
vyasr Mar 16, 2021
e409147
Create internal detail API for handling streams.
vyasr Mar 16, 2021
b4fad33
Expose internal binning API via detail header for future stream usage.
vyasr Mar 16, 2021
6b2d7f8
Apply clang-format.
vyasr Mar 16, 2021
4f0ff3e
Add documentation.
vyasr Mar 17, 2021
188380f
Add new header to meta.yaml.
vyasr Mar 17, 2021
446d572
Clean up tests and add much more extensive tests of different edge ca…
vyasr Mar 17, 2021
fe4df7f
Apply clang-format.
vyasr Mar 17, 2021
732d949
Add pxd file exposing C++ bin API.
vyasr Mar 10, 2021
8b0af34
Add minimal Cython wrapper for C++ API and fix some bugs in pxd.
vyasr Mar 10, 2021
c6c4e41
Fix copyright year.
vyasr Mar 10, 2021
9f3115b
Update function name.
vyasr Mar 10, 2021
7029a34
Merge branch 'branch-0.19' into feature/cudf_expose_bin
vyasr Mar 25, 2021
0e47b45
Update Cython to use final version of C++ API.
vyasr Mar 25, 2021
b77b434
Expose label_bins to pure Python.
vyasr Mar 25, 2021
e121743
Fix style.
vyasr Mar 25, 2021
28ac5b0
Fix overhang.
vyasr Mar 25, 2021
afe29d7
Add typing for bools.
vyasr Mar 25, 2021
251b6c5
Fix cimport.
vyasr Mar 26, 2021
19 changes: 19 additions & 0 deletions python/cudf/cudf/_lib/cpp/labeling.pxd
@@ -0,0 +1,19 @@
# Copyright (c) 2021, NVIDIA CORPORATION.

from libcpp.memory cimport unique_ptr

from cudf._lib.cpp.column.column cimport column
from cudf._lib.cpp.column.column_view cimport column_view

cdef extern from "cudf/labeling/label_bins.hpp" namespace "cudf" nogil:
    ctypedef enum inclusive:
        YES "cudf::inclusive::YES"
        NO "cudf::inclusive::NO"

    cdef unique_ptr[column] label_bins (
        const column_view &input,
        const column_view &left_edges,
        inclusive left_inclusive,
        const column_view &right_edges,
        inclusive right_inclusive
    ) except +
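For readers unfamiliar with the function being wrapped, here is a rough host-side sketch of what the declaration above is expected to compute. It is not taken from this diff. It assumes, based on the argument names and the commit messages, that bin `i` spans `left_edges[i]` to `right_edges[i]`, that the two `inclusive` flags decide whether each bound is closed, and that null inputs and values outside every bin produce a null label. The name `label_bins_reference` is hypothetical, and the linear scan is chosen for clarity rather than to mirror the GPU implementation.

```python
from typing import List, Optional, Sequence


def label_bins_reference(
    values: Sequence[Optional[float]],
    left_edges: Sequence[float],
    left_inclusive: bool,
    right_edges: Sequence[float],
    right_inclusive: bool,
) -> List[Optional[int]]:
    """Hypothetical pure-Python reference; not part of cudf or libcudf."""
    if len(left_edges) != len(right_edges):
        raise ValueError("left_edges and right_edges must have the same length")

    labels: List[Optional[int]] = []
    for value in values:
        label = None  # None stands in for a null output (assumed behavior)
        if value is not None:
            for i, (lo, hi) in enumerate(zip(left_edges, right_edges)):
                # Closed or open comparison on each side, depending on the flags.
                above = value >= lo if left_inclusive else value > lo
                below = value <= hi if right_inclusive else value < hi
                if above and below:
                    label = i
                    break
        labels.append(label)
    return labels


# Bins [0, 1), [1, 2), [2, 3): left bound closed, right bound open.
print(label_bins_reference([0.5, 1.0, 2.5, None, 9.0], [0, 1, 2], True, [1, 2, 3], False))
# -> [0, 1, 2, None, None]
```

The argument order matches the declaration above (each inclusivity flag follows its edge column), which makes it easy to compare against the Cython call site.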
46 changes: 46 additions & 0 deletions python/cudf/cudf/_lib/labeling.pyx
@@ -0,0 +1,46 @@
# Copyright (c) 2021, NVIDIA CORPORATION.

import numpy as np
from enum import IntEnum

from libc.stdint cimport uint32_t
from libcpp.memory cimport unique_ptr
from libcpp.utility cimport move

from cudf._lib.column cimport Column
from cudf._lib.replace import replace_nulls

from cudf._lib.cpp.labeling cimport inclusive
from cudf._lib.cpp.labeling cimport label_bins as cpp_label_bins
from cudf._lib.cpp.column.column cimport column
from cudf._lib.cpp.column.column_view cimport column_view


# Note that the parameter input shadows a Python built-in in the local scope,
# but I'm not too concerned about that since there's no use-case for actual
# input in this context.
def label_bins(Column input, Column left_edges, left_inclusive,
               Column right_edges, right_inclusive):
    cdef inclusive c_left_inclusive = \
        inclusive.YES if left_inclusive else inclusive.NO
    cdef inclusive c_right_inclusive = \
        inclusive.YES if right_inclusive else inclusive.NO

    cdef column_view input_view = input.view()
    cdef column_view left_edges_view = left_edges.view()
    cdef column_view right_edges_view = right_edges.view()

    cdef unique_ptr[column] c_result

    with nogil:
        c_result = move(
            cpp_label_bins(
                input_view,
                left_edges_view,
                c_left_inclusive,
                right_edges_view,
                c_right_inclusive,
            )
        )

    return Column.from_unique_ptr(move(c_result))
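One detail of the wrapper worth spelling out: `left_inclusive` and `right_inclusive` are untyped in this version, so the conversion to the C++ enum follows ordinary Python truthiness. The sketch below mimics that conversion in plain Python. `Inclusive` and `to_inclusive` are hypothetical stand-ins for the Cython `inclusive` enum and the conditional expressions in the wrapper, not names from cudf.

```python
from enum import Enum


class Inclusive(Enum):
    """Hypothetical stand-in for the Cython-level `inclusive` enum."""
    YES = "cudf::inclusive::YES"
    NO = "cudf::inclusive::NO"


def to_inclusive(flag) -> Inclusive:
    # Mirrors `inclusive.YES if flag else inclusive.NO` from the wrapper.
    return Inclusive.YES if flag else Inclusive.NO


print(to_inclusive(True))   # Inclusive.YES
print(to_inclusive(0))      # Inclusive.NO
print(to_inclusive("no"))   # Inclusive.YES, since any non-empty string is truthy
```

Under this reading, callers should pass real booleans, because a value such as the string `"no"` would silently select an inclusive bound.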