FIL to import categorical models from treelite #4173

Merged
merged 157 commits into branch-21.10 from levsnv:categorical-treelite
Sep 29, 2021
Commits (157)
4df552a
a sketch
levsnv Jun 1, 2021
44277f6
Merge branch 'branch-21.08' of github.com:rapidsai/cuml into HEAD
levsnv Jul 2, 2021
81f8415
added inference code
levsnv Jul 7, 2021
5d3d81e
Merge branch 'branch-21.08' of github.com:rapidsai/cuml into HEAD
levsnv Jul 7, 2021
a70b5ef
missed some errors
levsnv Jul 8, 2021
0a03a3e
now compiles
levsnv Jul 8, 2021
472575d
stash
levsnv Jul 9, 2021
efe875c
Revert "stash"
levsnv Jul 9, 2021
38bb778
enums are more printable now
levsnv Jul 9, 2021
9cc2549
fixed some bugs; added debug prints
levsnv Jul 9, 2021
a14d364
Merge branch 'branch-21.08' of github.com:rapidsai/cuml into categorical
levsnv Jul 9, 2021
f6e89e6
stack trace
levsnv Jul 12, 2021
a9c011d
implemented omp reduction for memory pool size estimation
levsnv Jul 9, 2021
daec42b
added allocation/copy code
levsnv Jul 13, 2021
361afaf
implemented filling up the bit pool with category masks
levsnv Jul 13, 2021
cf46cc0
fixed a type mismatch, added assert
levsnv Jul 13, 2021
c28fba3
drafting categorical node and set generation in test code
levsnv Jul 15, 2021
d89d082
fixed compiler but not runtime issues
levsnv Jul 15, 2021
67157b0
many things
levsnv Jul 16, 2021
64cb540
debugging
levsnv Jul 19, 2021
5608266
fixed some bugs
levsnv Jul 20, 2021
11a4c2e
...
levsnv Jul 20, 2021
d75d620
fixed some bugs
levsnv Jul 23, 2021
63aa434
added FIL categorical inference and test generation, no Treelite inte…
levsnv Jul 24, 2021
a4ca1c0
this was used to measure overhead when template parameter is irrelevant
levsnv Jul 24, 2021
6be93b5
reverted benchmark; started on <can_be_categorical> propagation
levsnv Jul 24, 2021
3559efc
moved <can_be_categorical> to an inference template parameter, as opp…
levsnv Jul 28, 2021
ef95861
fixed <branch_can_be_categorical> dispatching
levsnv Jul 28, 2021
45a642e
Merge branch 'branch-21.08' of github.com:rapidsai/cuml into categori…
levsnv Jul 29, 2021
3b24ea2
addressed review comments
levsnv Aug 5, 2021
74fcbe1
Apply suggestions from code review
levsnv Aug 5, 2021
9604516
Merge branch 'branch-21.10' of github.com:rapidsai/cuml into categori…
levsnv Aug 5, 2021
e41928a
fix extra change, explain branch_can_be_categorical() logic
levsnv Aug 5, 2021
3c767b8
just outline the finalize() definitions
levsnv Aug 5, 2021
16a1d7d
add load_data(); still 1m30s to compile alone
levsnv Aug 5, 2021
bdfb955
noinline finalize methods, 18s to compile
levsnv Aug 5, 2021
1a4b114
style
levsnv Aug 5, 2021
e6c57ec
Merge branch 'uninline-simple' into categorical-no-import
levsnv Aug 6, 2021
14786d1
fixed uninitialized max_matching_cat_d
levsnv Aug 6, 2021
abbdaba
undid __noinline__
levsnv Aug 6, 2021
c3d89c4
undid __noinline__
levsnv Aug 6, 2021
20102f0
resolved merge conflicts, to restore lost code chunks
levsnv Aug 10, 2021
102de03
fixed all but a missing void base_node::print()
levsnv Aug 10, 2021
baccb84
added void base_node::print()
levsnv Aug 10, 2021
0beb302
addressed more review comments
levsnv Aug 11, 2021
b336bfb
Merge remote-tracking branch 'rapidsai/branch-21.10' into categorical…
levsnv Aug 11, 2021
7dcdfc0
...
levsnv Aug 11, 2021
0d36f5d
Merge branch 'branch-21.10' of github.com:rapidsai/cuml into categori…
levsnv Aug 12, 2021
58fdfe5
fixed the bug
levsnv Aug 13, 2021
7eba183
...
levsnv Aug 13, 2021
2b81a80
...
levsnv Aug 13, 2021
e618f38
Merge branch 'branch-21.10' of github.com:rapidsai/cuml into categori…
levsnv Aug 13, 2021
9ca5ef8
Merge branch 'categorical-no-import' of github.com:levsnv/cuml into c…
levsnv Aug 13, 2021
87967ef
Merge branch 'categorical-no-import' into categorical-treelite
levsnv Aug 13, 2021
6268187
fixed some compiler messsages
levsnv Aug 13, 2021
9701d37
...
levsnv Aug 13, 2021
300a32f
accessor()
levsnv Aug 13, 2021
fc3076a
Merge branch 'categorical-no-import' into categorical-treelite
levsnv Aug 13, 2021
265736e
fixed a couple of bugs - operator precedence and swap def_left on exp…
levsnv Aug 14, 2021
32fc638
fixed bugs related to treelite tests, commented extra prints
levsnv Aug 18, 2021
4e2f1aa
Treelite GTIL has different logic for margin vs two-sided probabiliti…
levsnv Aug 18, 2021
decab32
addressed some review comments
levsnv Aug 19, 2021
632fdea
Apply suggestions from code review
levsnv Aug 19, 2021
7d1cd5a
Merge branch 'categorical-no-import' of github.com:levsnv/cuml into c…
levsnv Aug 19, 2021
ad0cd15
addressed more review comments
levsnv Aug 19, 2021
1803761
some review comments
levsnv Aug 23, 2021
e3056ce
Merge branch 'categorical-no-import' of /home/ldolgovs/others-libs/ra…
levsnv Aug 23, 2021
0f9a82a
some review comments
levsnv Aug 23, 2021
f993094
drafted child_index_test.cpp
levsnv Aug 25, 2021
625a5f3
test suite compiles
levsnv Aug 26, 2021
804870b
...
levsnv Aug 26, 2021
de9f2a1
...2
levsnv Aug 26, 2021
0b1421c
fixed unit tests, added to a proper set and removed prints
levsnv Aug 27, 2021
b5563d9
style
levsnv Aug 27, 2021
3b03967
Merge remote-tracking branch '10.2/categorical-no-import' into catego…
levsnv Aug 27, 2021
a42f107
Merge remote-tracking branch '10.2/categorical-no-import' into catego…
levsnv Aug 27, 2021
76a1a7b
Merge branch 'branch-21.10' of github.com:rapidsai/cuml into categori…
levsnv Aug 30, 2021
5f86db1
Merge
levsnv Aug 31, 2021
4d309dc
Merge remote-tracking branch '10.2/categorical-no-import' into catego…
levsnv Aug 31, 2021
5c782f2
Merge branch 'branch-21.10' of github.com:rapidsai/cuml into categori…
levsnv Aug 30, 2021
4d3a5bc
updated cat_sets_ to cat_sets_device_owner:: and removed allocate() a…
levsnv Aug 31, 2021
1db36c1
style
levsnv Aug 31, 2021
2e1adfc
Merge branch 'categorical-no-import' of /home/ldolgovs/others-libs/ra…
levsnv Aug 31, 2021
92aa840
updated fil_test to use rmm::device_uvector
levsnv Aug 31, 2021
c37fafb
Merge branch 'categorical-no-import' of /home/ldolgovs/others-libs/ra…
levsnv Aug 31, 2021
fa33236
fixed default construction of device_uvector
levsnv Aug 31, 2021
05a93e7
Merge branch 'categorical-no-import' of /home/ldolgovs/others-libs/ra…
levsnv Aug 31, 2021
5c50ef6
style
levsnv Aug 31, 2021
132c134
Merge branch 'categorical-no-import' of /home/ldolgovs/others-libs/ra…
levsnv Aug 31, 2021
3cc1151
fixed bug
levsnv Aug 31, 2021
a90f3de
Merge branch 'categorical-no-import' of /home/ldolgovs/others-libs/ra…
levsnv Aug 31, 2021
85772cc
addressed review comments
levsnv Sep 2, 2021
fc3ea09
Merge branch 'categorical-no-import' of /home/ldolgovs/others-libs/ra…
levsnv Sep 2, 2021
3b9ce00
addressed review comments
levsnv Sep 2, 2021
30056db
Merge branch 'categorical-no-import' of /home/ldolgovs/others-libs/ra…
levsnv Sep 2, 2021
b59bff5
noinline
levsnv Sep 2, 2021
f1fc5e9
re-fixed a bug?
levsnv Sep 3, 2021
031f0aa
Merge branch 'branch-21.10' of github.com:rapidsai/cuml into categori…
levsnv Sep 3, 2021
5bf7755
add lightgbm tests
levsnv Sep 3, 2021
c15cdfe
tried optimizing by working on full matrices at once, ended up pessim…
levsnv Sep 3, 2021
e156851
now supporting intermixed numerical and categorical features
levsnv Sep 4, 2021
3a8d07d
merged into existing test
levsnv Sep 4, 2021
22c2f8b
cleaned up diff
levsnv Sep 4, 2021
d1be6c4
python style
levsnv Sep 4, 2021
48cafda
added cat_sets to model shape string
levsnv Sep 4, 2021
5366b64
removed prints, style
levsnv Sep 4, 2021
3d8a8ff
stray change
levsnv Sep 4, 2021
122f7d9
Merge branch 'branch-21.10' of github.com:rapidsai/cuml into categori…
levsnv Sep 4, 2021
cec90ea
removed *_repr, gtil checks
levsnv Sep 4, 2021
c0d6bc6
removed unused variables and a debug getenv()
levsnv Sep 4, 2021
2cdf362
fix printing of max_matching -1
levsnv Sep 4, 2021
11294e8
Merge branch 'categorical-no-import' into categorical-treelite
levsnv Sep 10, 2021
56d1d32
Merge branch 'branch-21.10' of github.com:rapidsai/cuml into categori…
levsnv Sep 10, 2021
c0474c1
Merge branch 'branch-21.10' of github.com:rapidsai/cuml into categori…
levsnv Sep 10, 2021
b3ab8ad
style
levsnv Sep 10, 2021
57a6ae3
;
levsnv Sep 11, 2021
3ca93d8
Merge branch 'categorical-no-import' of github.com:levsnv/cuml into c…
levsnv Sep 11, 2021
d2795aa
BITS_PER_BYTE; comment
levsnv Sep 11, 2021
277eec9
style, odd code
levsnv Sep 11, 2021
95ad2ad
Apply suggestions from code review
levsnv Sep 14, 2021
57ab87d
addressed some review comments
levsnv Sep 17, 2021
1f95472
added treelite import tests; removed extra vector copy upon feature c…
levsnv Sep 17, 2021
58fb15d
draft change to per-tree bit pool, extra pass instead of an atomic
levsnv Sep 18, 2021
5fa0e46
style
levsnv Sep 18, 2021
87c8000
Merge branch 'categorical-treelite' of github.com:levsnv/cuml into ca…
levsnv Sep 18, 2021
481d189
style
levsnv Sep 18, 2021
bbb6a06
Merge branch 'branch-21.10' of github.com:rapidsai/cuml into categori…
levsnv Sep 18, 2021
dad8386
fixed wrong reduction for max_matching and bad .back() fetch, many pr…
levsnv Sep 22, 2021
34d2c93
removed some prints
levsnv Sep 22, 2021
e651513
Merge branch 'branch-21.10' of github.com:rapidsai/cuml into categori…
levsnv Sep 22, 2021
af213ea
style
levsnv Sep 22, 2021
91355e3
noinline
levsnv Sep 22, 2021
dc3d103
fixed wrong partial_sum application
levsnv Sep 22, 2021
4a334bf
style, restore omp
levsnv Sep 22, 2021
f1b468e
style
levsnv Sep 22, 2021
1e0784b
clean up
levsnv Sep 22, 2021
7482b32
Merge branch 'branch-21.10' of github.com:rapidsai/cuml into categori…
levsnv Sep 23, 2021
3c75fa5
removed WAR on exporting empty categorical nodes to treelite
levsnv Sep 23, 2021
94bf96e
noinline
levsnv Sep 23, 2021
1082213
...
levsnv Sep 23, 2021
f7dbabe
...
levsnv Sep 23, 2021
024ba16
some review comments, no functional changes
levsnv Sep 23, 2021
e394c60
Apply suggestions from code review
levsnv Sep 23, 2021
4fefb1b
"fixed" race condition in n_nodes, python
levsnv Sep 24, 2021
1bc3a82
add the empty categorical node workaround back
levsnv Sep 24, 2021
0868dd4
no warning on empty max_matching
levsnv Sep 24, 2021
e1e25cc
small review comments
levsnv Sep 24, 2021
f9073d8
small review comments
levsnv Sep 24, 2021
8c5800d
small review comments
levsnv Sep 24, 2021
baf7c52
todo
levsnv Sep 24, 2021
29b4f6e
style
levsnv Sep 24, 2021
c7d9110
Merge branch 'categorical-treelite' of github.com:levsnv/cuml into ca…
levsnv Sep 24, 2021
8ccdb27
remove empty category list workaround
levsnv Sep 24, 2021
2ef383f
Revert "remove empty category list workaround"
levsnv Sep 25, 2021
fb67202
addressed review comments
levsnv Sep 25, 2021
8b8ccec
size_t, uint32_t, int32_t
levsnv Sep 28, 2021
36f090e
Merge branch 'branch-21.10' into categorical-treelite
dantegd Sep 29, 2021
113 changes: 52 additions & 61 deletions cpp/src/fil/fil.cu
@@ -604,12 +604,24 @@ int max_depth(const tl::ModelImpl<T, L>& model)
return depth;
}

// constructs a vector of size max_fid (number of features, or columns) from a Treelite tree,
// where each feature has a maximum matching category and number of categorical nodes
cat_feature_counters reduce_two_feature_counters(cat_feature_counters a, cat_feature_counters b)
{
return {.max_matching = std::max(a.max_matching, b.max_matching),
.n_nodes = a.n_nodes + b.n_nodes};
}

void eltwise_reduce_two_feature_counter_vectors(std::vector<cat_feature_counters>& dst,
const std::vector<cat_feature_counters>& extra)
{
std::transform(dst.begin(), dst.end(), extra.begin(), dst.begin(), reduce_two_feature_counters);
}

// constructs a vector of size n_cols (number of features, or columns) from a Treelite tree,
// where each feature has a maximum matching category and node count (from this tree alone).
template <typename T, typename L>
inline std::vector<int> max_matching_cat(const tl::Tree<T, L>& tree, int max_fid)
inline std::vector<cat_feature_counters> cf_vec(const tl::Tree<T, L>& tree, int n_cols)
{
std::vector<int> res(max_fid);
std::vector<cat_feature_counters> res(n_cols);
std::stack<int> stack;
stack.push(tree_root(tree));
while (!stack.empty()) {
@@ -631,8 +643,8 @@ inline std::vector<int> max_matching_cat(const tl::Tree<T, L>& tree, int max_fid
} else {
max_matching_cat = -1;
}
int* max_matching_res = &res[tree.SplitIndex(node_id)];
*max_matching_res = std::max(*max_matching_res, max_matching_cat);
cat_feature_counters& counters = res[tree.SplitIndex(node_id)];
counters = reduce_two_feature_counters(counters, {max_matching_cat, 1});
}
stack.push(tree.LeftChild(node_id));
node_id = tree.RightChild(node_id);
@@ -641,10 +653,10 @@ inline std::vector<int> max_matching_cat(const tl::Tree<T, L>& tree, int max_fid
return res;
}

// constructs a vector of size max_fid (number of features, or columns) from a Treelite tree,
// where each feature has a maximum matching category and number of categorical nodes
// fills cat_sets.n_nodes[] (size number of features, or columns) from a Treelite tree,
// where each feature has a number of categorical nodes
template <typename T, typename L>
inline std::size_t bit_pool_size(const tl::Tree<T, L>& tree, cat_sets_owner& cat_sets)
inline std::size_t bit_pool_size(const tl::Tree<T, L>& tree, const categorical_sets& cat_sets)
{
std::size_t size = 0;
std::stack<int> stack;
@@ -653,11 +665,9 @@ inline std::size_t bit_pool_size(const tl::Tree<T, L>& tree, cat_sets_owner& cat
int node_id = stack.top();
stack.pop();
while (!tree.IsLeaf(node_id)) {
if (tree.SplitType(node_id) == tl::SplitFeatureType::kCategorical &&
tree.HasMatchingCategories(node_id)) {
if (tree.SplitType(node_id) == tl::SplitFeatureType::kCategorical) {
int fid = tree.SplitIndex(node_id);
size += cat_sets.accessor().sizeof_mask(fid);
++cat_sets.n_nodes[fid];
size += cat_sets.sizeof_mask(fid);
}
stack.push(tree.LeftChild(node_id));
node_id = tree.RightChild(node_id);
@@ -666,31 +676,26 @@ inline std::size_t bit_pool_size(const tl::Tree<T, L>& tree, cat_sets_owner& cat
return size;
}
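
// Reviewer note, not part of the diff: the per-feature mask size that bit_pool_size()
// accumulates is one byte per eight categories of that feature. A minimal standalone
// sketch of the arithmetic, assuming sizeof_mask() rounds (max_matching + 1) bits up to
// whole bytes (the exact definition lives in categorical_sets, outside this hunk):
#include <cstddef>
#include <cstdio>

constexpr std::size_t BITS_PER_BYTE = 8;

// assumed behaviour of categorical_sets::sizeof_mask(fid): bytes needed to hold
// one bit per category 0..max_matching of that feature
std::size_t sizeof_mask_for(int max_matching)
{
  return (max_matching + 1 + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
}

int main()
{
  // a feature whose largest seen category ID is 12 needs 13 bits -> 2 bytes per categorical node
  std::printf("%zu\n", sizeof_mask_for(12));    // 2
  // a feature with max_matching == 1000 needs 126 bytes for every categorical node splitting on it
  std::printf("%zu\n", sizeof_mask_for(1000));  // 126
}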

void vec_max(std::vector<int>& dst, const std::vector<int>& extra)
{
std::transform(dst.begin(), dst.end(), extra.begin(), dst.begin(), [](int a, int b) {
return std::max(a, b);
});
}

template <typename T, typename L>
cat_sets_owner allocate_cat_sets_owner(const tl::ModelImpl<T, L>& model)
{
#pragma omp declare reduction(vec_max_red : std::vector<int> \
: vec_max(omp_out, omp_in)) \
#pragma omp declare reduction(cf_vec_red : std::vector<cat_feature_counters> \
: eltwise_reduce_two_feature_counter_vectors(omp_out, omp_in)) \
initializer(omp_priv = omp_orig)
const auto& trees = model.trees;
cat_sets_owner cat_sets(model.num_feature, trees.size());
std::vector<int>& max_matching = cat_sets.max_matching;
#pragma omp parallel for reduction(vec_max_red : max_matching)
cat_sets_owner cat_sets;
std::vector<cat_feature_counters> counters(model.num_feature);
#pragma omp parallel for reduction(cf_vec_red : counters)
for (size_t i = 0; i < trees.size(); ++i) {
vec_max(max_matching, max_matching_cat(trees[i], model.num_feature));
eltwise_reduce_two_feature_counter_vectors(counters, cf_vec(trees[i], model.num_feature));
}
cat_sets.consume_counters(counters);
std::vector<size_t> bit_pool_sizes(trees.size());
#pragma omp parallel for
for (size_t i = 0; i < trees.size(); ++i) {
cat_sets.bit_pool_sizes[i] = bit_pool_size(trees[i], cat_sets);
bit_pool_sizes[i] = bit_pool_size(trees[i], cat_sets.accessor());
}
cat_sets.initialize_from_bit_pool_sizes();
cat_sets.consume_bit_pool_sizes(bit_pool_sizes);
return cat_sets;
}
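
// Reviewer note, not part of the diff: the custom #pragma omp declare reduction above is
// the standard way to reduce a std::vector of structs across threads. A stripped-down,
// self-contained sketch of the same pattern with illustrative names (counters, combine,
// vec_combine), independent of the FIL types:
#include <algorithm>
#include <cstdio>
#include <vector>

struct counters {                 // stand-in for cat_feature_counters
  int max_matching = -1;
  int n_nodes      = 0;
};

void combine(std::vector<counters>& dst, const std::vector<counters>& extra)
{
  std::transform(dst.begin(), dst.end(), extra.begin(), dst.begin(),
                 [](counters a, counters b) {
                   return counters{std::max(a.max_matching, b.max_matching),
                                   a.n_nodes + b.n_nodes};
                 });
}

int main()
{
  const int n_features = 4, n_trees = 100;
  std::vector<counters> total(n_features);
// each thread gets a private copy initialized from omp_orig; copies are merged pairwise via combine()
#pragma omp declare reduction(vec_combine : std::vector<counters> : combine(omp_out, omp_in)) \
  initializer(omp_priv = omp_orig)
#pragma omp parallel for reduction(vec_combine : total)
  for (int tree = 0; tree < n_trees; ++tree) {
    // pretend every tree has one categorical node on feature (tree % n_features)
    // with max category ID equal to the tree index
    std::vector<counters> per_tree(n_features);
    per_tree[tree % n_features] = {tree, 1};
    combine(total, per_tree);
  }
  for (const counters& c : total) std::printf("max %d, nodes %d\n", c.max_matching, c.n_nodes);
}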

@@ -786,7 +791,8 @@ void tl2fil_leaf_payload(fil_node_t* fil_node,
template <typename fil_node_t>
struct conversion_state {
fil_node_t node;
int tl_left, tl_right;
int tl_left;
int tl_right;
};

// modifies cat_sets
@@ -799,42 +805,27 @@ conversion_state<fil_node_t> tl2fil_inner_node(int fil_left_child,
size_t* bit_pool_offset)
{
int tl_left = tree.LeftChild(tl_node_id), tl_right = tree.RightChild(tl_node_id);
val_t split{};
int feature_id = tree.SplitIndex(tl_node_id);
bool is_categorical, default_left;
val_t split = {.f = NAN}; // yes there's a default initializer already
int feature_id = tree.SplitIndex(tl_node_id);
bool is_categorical = tree.SplitType(tl_node_id) == tl::SplitFeatureType::kCategorical;
bool default_left = tree.DefaultLeft(tl_node_id);
if (tree.SplitType(tl_node_id) == tl::SplitFeatureType::kNumerical) {
is_categorical = false;
default_left = tree.DefaultLeft(tl_node_id);
split.f = static_cast<float>(tree.Threshold(tl_node_id));
split.f = static_cast<float>(tree.Threshold(tl_node_id));
adjust_threshold(&split.f, &tl_left, &tl_right, &default_left, tree.ComparisonOp(tl_node_id));
} else if (tree.SplitType(tl_node_id) == tl::SplitFeatureType::kCategorical) {
is_categorical = true;
default_left = !tree.DefaultLeft(tl_node_id);
// for FIL, the list of categories is always for the right child
if (tree.CategoriesListRightChild(tl_node_id) == false) std::swap(tl_left, tl_right);
if (tree.CategoriesListRightChild(tl_node_id) == false) {
std::swap(tl_left, tl_right);
default_left = !default_left;
}
int sizeof_mask = cat_sets->accessor().sizeof_mask(feature_id);
split.idx = *bit_pool_offset;
*bit_pool_offset += sizeof_mask;
ASSERT(split.idx >= 0, "split.idx < 0");
std::vector<uint32_t> matching_cats = tree.MatchingCategories(tl_node_id);
auto category_it = matching_cats.begin();
ASSERT(matching_cats.size() == 0 || matching_cats.data() != nullptr,
"internal error: nullptr from treelite");
// treelite guarantees tree.MatchingCategories() are in ascending order
// we have to initialize all pool bytes, so we iterate over those and keep category_it up to
// date
for (uint32_t which_8cats = 0; which_8cats < (uint32_t)sizeof_mask; ++which_8cats) {
uint8_t eight_cats = 0;
for (uint32_t bit = 0; bit < BITS_PER_BYTE; ++bit) {
if (category_it < matching_cats.end() &&
*category_it == which_8cats * BITS_PER_BYTE + bit) {
eight_cats |= 1 << bit;
++category_it;
}
}
cat_sets->bits[split.idx + which_8cats] = eight_cats;
// cat_sets->bits have been zero-initialized
uint8_t* bits = &cat_sets->bits[split.idx];
for (uint32_t category : tree.MatchingCategories(tl_node_id)) {
bits[category / BITS_PER_BYTE] |= 1 << (category % BITS_PER_BYTE);
}
ASSERT(category_it == matching_cats.end(), "internal error: didn't convert all categories");
} else {
ASSERT(false, "only numerical and categorical split nodes are supported");
}
@@ -844,7 +835,7 @@ conversion_state<fil_node_t> tl2fil_inner_node(int fil_left_child,
} else {
node = fil_node_t({}, split, feature_id, default_left, false, is_categorical, fil_left_child);
}
return {node, tl_left, tl_right};
return conversion_state<fil_node_t>{node, tl_left, tl_right};
}
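
// Reviewer note, not part of the diff: the loop above is the write side of the category
// bitmask; at inference time the read side is fetch_bit() in internal.cuh. A self-contained
// round-trip sketch of the encoding (hypothetical category IDs, one bit per category,
// least-significant bit first within each byte):
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int BITS_PER_BYTE = 8;

int fetch_bit(const uint8_t* array, int bit)   // mirrors the device helper in internal.cuh
{
  return (array[bit / BITS_PER_BYTE] >> (bit % BITS_PER_BYTE)) & 1;
}

int main()
{
  // a node whose right child matches categories {2, 9, 30}; the largest category ID seen
  // for this feature is 30, so the mask is 4 bytes (31 bits rounded up to whole bytes)
  std::vector<uint8_t> bits(4, 0);               // zero-initialized, like cat_sets_owner::bits
  for (uint32_t category : {2u, 9u, 30u})
    bits[category / BITS_PER_BYTE] |= 1 << (category % BITS_PER_BYTE);

  assert(fetch_bit(bits.data(), 9) == 1);        // matching category -> go right
  assert(fetch_bit(bits.data(), 10) == 0);       // non-matching category -> go left
  assert(fetch_bit(bits.data(), 30) == 1);
}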

template <typename T, typename L>
@@ -929,7 +920,7 @@ __noinline__ int tree2fil_sparse(std::vector<fil_node_t>& nodes,
std::stack<pair_t> stack;
int built_index = root + 1;
stack.push(pair_t(tree_root(tree), 0));
size_t bit_pool_offset = cat_sets->bit_pool_offsets[tree_idx];
std::size_t bit_pool_offset = cat_sets->bit_pool_offsets[tree_idx];
while (!stack.empty()) {
const pair_t& top = stack.top();
int node_id = top.first;
@@ -1314,7 +1305,7 @@ void from_treelite(const raft::handle_t& handle,
if (storage_type == storage_type_t::AUTO) {
if (tl_params->algo == algo_t::ALGO_AUTO || tl_params->algo == algo_t::NAIVE) {
int depth = max_depth(model);
// max 2**25 dense nodes, 256 MiB dense model size. Categorical mask size unlimited.
// max 2**25 dense nodes, 256 MiB dense model size. Categorical mask size is unlimited.
const int LOG2_MAX_DENSE_NODES = 25;
int log2_num_dense_nodes = depth + 1 + int(ceil(std::log2(model.trees.size())));
storage_type = log2_num_dense_nodes > LOG2_MAX_DENSE_NODES ? storage_type_t::SPARSE
@@ -1398,7 +1389,7 @@ char* sprintf_shape(const tl::ModelImpl<threshold_t, leaf_t>& model,
forest_shape << storage_type_repr[storage] << " model size " << std::setprecision(2) << size_mb
<< " MB" << std::endl;
if (cat_sets.bits.size() > 0) {
forest_shape << "categorical nodes for each feature id: {";
forest_shape << "number of categorical nodes for each feature id: {";
std::size_t total_cat_nodes = 0;
for (std::size_t n : cat_sets.n_nodes) {
forest_shape << n << " ";
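
// Reviewer note, not part of the diff: a quick worked example of the storage_type_t::AUTO
// heuristic in from_treelite() above. A complete dense forest needs roughly 2**(depth+1)
// nodes per tree, so log2 of the dense node count is about depth + 1 + ceil(log2(num_trees)),
// and anything above 2**25 nodes (256 MiB of dense nodes) falls back to SPARSE. A small
// sketch of that arithmetic only (would_choose_sparse is an illustrative name, not an API):
#include <cmath>
#include <cstddef>
#include <cstdio>

bool would_choose_sparse(int depth, std::size_t num_trees)
{
  const int LOG2_MAX_DENSE_NODES = 25;  // max 2**25 dense nodes
  int log2_num_dense_nodes = depth + 1 + int(std::ceil(std::log2(num_trees)));
  return log2_num_dense_nodes > LOG2_MAX_DENSE_NODES;
}

int main()
{
  std::printf("%d\n", would_choose_sparse(10, 1000));   // 10 + 1 + 10 = 21 <= 25 -> dense  (0)
  std::printf("%d\n", would_choose_sparse(20, 10000));  // 20 + 1 + 14 = 35 >  25 -> sparse (1)
}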
46 changes: 24 additions & 22 deletions cpp/src/fil/internal.cuh
@@ -306,18 +306,13 @@ struct forest_params_t {
/// FIL_TPB is the number of threads per block to use with FIL kernels
const int FIL_TPB = 256;

const int32_t MAX_PRECISE_INT_FLOAT = 1 << 24; // 16'777'216
constexpr int32_t MAX_PRECISE_INT_FLOAT = 1 << 24; // 16'777'216
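
// Reviewer note, not part of the diff: a 32-bit float has a 24-bit significand, so 2**24 is
// the largest bound below which every non-negative integer round-trips through float exactly;
// the identifier strongly suggests (assumption on my part) that this is why the limit exists
// when integer-valued data such as category IDs is carried in floats. A two-line illustration:
#include <cstdio>

int main()
{
  // every integer up to 2**24 = 16'777'216 survives a round trip through float...
  std::printf("%d\n", static_cast<int>(static_cast<float>(16'777'216)) == 16'777'216);  // 1
  // ...but 2**24 + 1 does not: it rounds to the nearest representable float, 16'777'216
  std::printf("%d\n", static_cast<int>(static_cast<float>(16'777'217)));                // 16777216
}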

__host__ __device__ __forceinline__ int fetch_bit(const uint8_t* array, int bit)
{
return (array[bit / BITS_PER_BYTE] >> (bit % BITS_PER_BYTE)) & 1;
}

struct cat_feature_counters {
int max_matching = -1;
int n_nodes = 0;
};

struct categorical_sets {
// arrays are const to use fast GPU read instructions by default
// arrays from each node ID are concatenated first, then from all categories
@@ -376,14 +371,21 @@ struct tree_base {
if (isnan(val)) {
cond = !node.def_left();
} else if (CATS_SUPPORTED && node.is_categorical()) {
cond = cat_sets.category_matches(node, (int)val);
cond = cat_sets.category_matches(node, static_cast<int>(val));
} else {
cond = val >= node.thresh();
}
return node.left(node_idx) + cond;
}
};
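
// Reviewer note, not part of the diff: the branch above decides between three cases per node:
// a missing value (NaN) follows the node's default direction, a categorical split treats the
// float as an integer category ID and looks it up in the bit mask, and a numerical split
// compares against the threshold; the two children are stored adjacently, so the chosen child
// is left + cond. A simplified CPU-side sketch of the same decision, with a plain toy_node
// standing in for the FIL node (names are illustrative, not the real API):
#include <cmath>
#include <cstdint>
#include <vector>

constexpr int BITS_PER_BYTE = 8;

struct toy_node {               // illustrative only; not FIL's node layout
  float thresh      = 0.0f;
  int   left        = 0;        // index of left child; right child is left + 1
  int   mask_offset = 0;        // offset into the category bit pool (categorical nodes only)
  bool  categorical = false;
  bool  def_left    = true;     // where missing values go
};

int next_node(const toy_node& node, float val, const std::vector<uint8_t>& bit_pool)
{
  bool cond;                    // false -> left child, true -> right child
  if (std::isnan(val)) {
    cond = !node.def_left;
  } else if (node.categorical) {
    // note: the real category_matches() presumably also range-checks against max_matching
    int category = static_cast<int>(val);
    cond = (bit_pool[node.mask_offset + category / BITS_PER_BYTE] >> (category % BITS_PER_BYTE)) & 1;
  } else {
    cond = val >= node.thresh;
  }
  return node.left + cond;
}

int main()
{
  std::vector<uint8_t> pool = {0b00000100};                 // only category 2 matches
  toy_node cat_node{0.0f, /*left=*/5, /*mask_offset=*/0, /*categorical=*/true, /*def_left=*/true};
  int child_for_cat2 = next_node(cat_node, 2.0f, pool);     // 6 (right child)
  int child_for_nan  = next_node(cat_node, NAN, pool);      // 5 (default direction: left)
  (void)child_for_cat2; (void)child_for_nan;
}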

// -1 means no matching categories
struct cat_feature_counters {
int max_matching = -1;
int n_nodes = 0;
};

// used only during model import. For inference, trimmed down using cat_sets_owner::accessor()
// in internal.cuh, as opposed to fil_test.cu, because importing from treelite will require it
struct cat_sets_owner {
// arrays from each node ID are concatenated first, then from all categories
@@ -394,10 +396,12 @@ struct cat_sets_owner {
// how many categorical nodes use a given feature id. Used for model shape string.
std::vector<std::size_t> n_nodes;
// per tree, size and offset of bit pool within the overall bit pool
std::vector<std::size_t> bit_pool_sizes, bit_pool_offsets;
std::vector<std::size_t> bit_pool_offsets;

categorical_sets accessor() const
{
ASSERT(bits.size() < INT_MAX,
"too many categories/categorical nodes: cannot store bits offset in node");
return {
.bits = bits.data(),
.max_matching = max_matching.data(),
@@ -406,11 +410,19 @@ struct cat_sets_owner {
};
}

void initialize_from_bit_pool_sizes()
void consume_counters(const std::vector<cat_feature_counters>& counters)
{
for (cat_feature_counters cf : counters) {
max_matching.push_back(cf.max_matching);
n_nodes.push_back(cf.n_nodes);
}
}

void consume_bit_pool_sizes(const std::vector<std::size_t>& bit_pool_sizes)
{
bit_pool_offsets[0] = 0;
for (std::size_t i = 1; i < bit_pool_sizes.size(); ++i) {
bit_pool_offsets[i] = bit_pool_offsets[i - 1] + bit_pool_sizes[i - 1];
bit_pool_offsets.push_back(0);
for (std::size_t i = 0; i < bit_pool_sizes.size() - 1; ++i) {
bit_pool_offsets.push_back(bit_pool_offsets.back() + bit_pool_sizes[i]);
}
bits.resize(bit_pool_offsets.back() + bit_pool_sizes.back());
}
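
// Reviewer note, not part of the diff: consume_bit_pool_sizes() above is an exclusive prefix
// sum over the per-tree pool sizes plus a final resize to the grand total. An equivalent
// standalone sketch with std::exclusive_scan, just to make the intent explicit (not a drop-in
// replacement):
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

int main()
{
  std::vector<std::size_t> bit_pool_sizes = {4, 0, 2, 3};   // bytes of category masks per tree

  // offsets[i] = sum of the sizes of all earlier trees (exclusive prefix sum)
  std::vector<std::size_t> bit_pool_offsets(bit_pool_sizes.size());
  std::exclusive_scan(bit_pool_sizes.begin(), bit_pool_sizes.end(), bit_pool_offsets.begin(),
                      std::size_t{0});
  // bit_pool_offsets == {0, 4, 4, 6}

  // total bit pool size = last offset + last size
  std::vector<uint8_t> bits(bit_pool_offsets.back() + bit_pool_sizes.back());  // 9 bytes
}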
@@ -420,16 +432,6 @@ struct cat_sets_owner {
: bits(bits_), max_matching(max_matching_)
{
}

// accepting int because GPU code only allows max<int> features
cat_sets_owner(int num_features, std::size_t num_trees)
: bits(0),
max_matching(num_features, -1),
n_nodes(num_features, 0),
bit_pool_offsets(num_trees),
bit_pool_sizes(num_trees)
{
}
};

std::ostream& operator<<(std::ostream& os, const cat_sets_owner& cso);
24 changes: 19 additions & 5 deletions cpp/test/sg/fil_test.cu
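
// Reviewer note, not part of the diff: the test changes below generate random categorical
// splits whose maximum category ID is drawn log-uniformly (10 raised to a uniform exponent),
// so both tiny and very large bit masks get exercised. A minimal sketch of that sampling,
// using hypothetical parameter values:
#include <cmath>
#include <cstdio>
#include <random>

int main()
{
  float max_magnitude_of_matching_cat = 5.0f;  // hypothetical test parameter
  std::mt19937 gen(42);
  std::uniform_real_distribution<float> mmc(-1.0f, max_magnitude_of_matching_cat);
  for (int fid = 0; fid < 5; ++fid) {
    // exponent uniform in [-1, 5) -> max category spread across orders of magnitude,
    // from 0 (10**-1 truncates to 0) up to roughly 100'000
    int max_matching = static_cast<int>(std::pow(10.0f, mmc(gen)));
    std::printf("feature %d: max matching category %d\n", fid, max_matching);
  }
}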
@@ -293,10 +293,10 @@ class BaseFilTest : public testing::TestWithParam<FilTestParams> {
// uniformily distributed in orders of magnitude: smaller models which
// still stress large bitfields.
// up to 10**ps.max_magnitude_of_matching_cat (only if feature is categorical, else -1)
cat_sets_h = cat_sets_owner(ps.num_cols, ps.num_trees);
std::mt19937 gen(ps.seed);
std::uniform_real_distribution mmc(-1.0f, ps.max_magnitude_of_matching_cat);
std::bernoulli_distribution fc(ps.feature_categorical_prob);
cat_sets_h.max_matching.resize(ps.num_cols);
for (int fid = 0; fid < ps.num_cols; ++fid) {
feature_categorical[fid] = fc(gen);
if (feature_categorical[fid]) {
@@ -305,7 +305,7 @@ class BaseFilTest : public testing::TestWithParam<FilTestParams> {
ASSERT(mm < INT_MAX,
"internal error: max_magnitude_of_matching_cat %f is too large",
ps.max_magnitude_of_matching_cat);
cat_sets_h.max_matching[fid] = (int)mm;
cat_sets_h.max_matching[fid] = mm;
} else {
cat_sets_h.max_matching[fid] = -1;
}
@@ -330,14 +330,15 @@ class BaseFilTest : public testing::TestWithParam<FilTestParams> {

// count nodes for each feature id, while splitting the sets between nodes
std::size_t bit_pool_size = 0;
cat_sets_h.n_nodes = std::vector<size_t>(ps.num_cols, 0);
for (std::size_t node_id = 0; node_id < num_nodes; ++node_id) {
int fid = fids_h[node_id];

if (!feature_categorical[fid] || is_leafs_h[node_id]) is_categoricals_h[node_id] = 0.0f;

if (is_categoricals_h[node_id] == 1.0) {
// might allocate a categorical set for an unreachable inner node. That's OK.
cat_sets_h.n_nodes[fid]++;
++cat_sets_h.n_nodes[fid];
node_cat_set[node_id] = bit_pool_size;
bit_pool_size += cat_sets_h.accessor().sizeof_mask(fid);
}
@@ -774,8 +775,12 @@ class TreeliteFilTest : public BaseFilTest {
if (dense_node.is_categorical()) {
uint8_t byte = 0;
for (int category = 0; category <= cat_sets_h.max_matching[dense_node.fid()]; ++category) {
if (category % 8 == 0) byte = cat_sets_h.bits[dense_node.set() + category / 8];
if ((byte & 1 << category % 8) != 0) left_categories.push_back(category);
if (category % BITS_PER_BYTE == 0) {
byte = cat_sets_h.bits[dense_node.set() + category / BITS_PER_BYTE];
}
if ((byte & (1 << (category % BITS_PER_BYTE))) != 0) {
left_categories.push_back(category);
}
}
} else {
threshold = dense_node.thresh();
@@ -1235,6 +1240,15 @@ std::vector<FilTestParams> import_dense_inputs = {
FIL_TEST_PARAMS(print_forest_shape = true),
FIL_TEST_PARAMS(leaf_algo = VECTOR_LEAF, num_classes = 2),
FIL_TEST_PARAMS(leaf_algo = VECTOR_LEAF, num_trees = 19, num_classes = 20),
FIL_TEST_PARAMS(node_categorical_prob = 0.5, feature_categorical_prob = 0.5),
FIL_TEST_PARAMS(
node_categorical_prob = 1.0, feature_categorical_prob = 1.0, cat_match_prob = 1.0),
FIL_TEST_PARAMS(
node_categorical_prob = 1.0, feature_categorical_prob = 1.0, cat_match_prob = 0.0),
FIL_TEST_PARAMS(depth = 3,
node_categorical_prob = 0.5,
feature_categorical_prob = 0.5,
max_magnitude_of_matching_cat = 5),
};

TEST_P(TreeliteDenseFilTest, Import) { compare(); }
3 changes: 2 additions & 1 deletion python/cuml/test/test_fil.py
@@ -499,8 +499,9 @@ def to_categorical(features, n_categorical):
cat_cols = features[:, :n_categorical]
cat_cols = cat_cols - cat_cols.min(axis=1, keepdims=True) # range [0, ?]
cat_cols /= cat_cols.max(axis=1, keepdims=True) # range [0, 1]
rough_n_categories = 100
# round into rough_n_categories bins
cat_cols = (cat_cols * 100).astype(int)
cat_cols = (cat_cols * rough_n_categories).astype(int)
for icol in range(n_categorical):
col = cat_cols[:, icol]
df_cols[icol] = pd.Series(pd.Categorical(col,