[TASK] Review types used for size and indexing throughout codebase #4105
For 1, can you give me an example of code where a raw loop provides a performance benefit that cannot be achieved without one, so I can make sure I understand what you're getting at? For 3, yes, definitely. For 5, that's where the recommendation to use …
```cpp
Timer together = Timer::start("together");
// count nodes for each feature id, while splitting the sets between nodes
std::size_t bit_pool_size = 0;
for (std::size_t node_id = 0; node_id < num_nodes; ++node_id) {
  int fid = fids_h[node_id];
  if (!feature_categorical[fid] || is_leafs_h[node_id]) is_categoricals_h[node_id] = 0.0f;
  if (is_categoricals_h[node_id] == 1.0) {
    // might allocate a categorical set for an unreachable parent node. That's OK.
    ++cf[fid].n_nodes;
    node_cat_set[node_id] = bit_pool_size;
    bit_pool_size += categorical_sets::sizeof_mask_from_max_matching(cf[fid].max_matching);
  }
}
together.stop();
```

which I could split like this for readability, sacrificing 0.66% runtime in a third of our tests. I agree that there may be larger opportunities to optimize, but this doesn't require a perf investigation, and I feel like it's significant enough to have a decent approach from the start, both in readability and performance.

```cpp
Timer fct = Timer::start("fc");
for (std::size_t node_id = 0; node_id < num_nodes; ++node_id) {
  int fid = fids_h[node_id];
  if (!feature_categorical[fid] || is_leafs_h[node_id]) is_categoricals_h[node_id] = 0.0f;
}
fct.stop();
Timer cft = Timer::start("cf");
// count nodes for each feature id, while splitting the sets between nodes
std::size_t bit_pool_size = 0;
for (std::size_t node_id = 0; node_id < num_nodes; ++node_id) {
  if (is_categoricals_h[node_id] == 1.0) {
    int fid = fids_h[node_id];
    // might allocate a categorical set for an unreachable parent node. That's OK.
    ++cf[fid].n_nodes;
    node_cat_set[node_id] = bit_pool_size;
    bit_pool_size += categorical_sets::sizeof_mask_from_max_matching(cf[fid].max_matching);
  }
}
cft.stop();
```

If I split it further (e.g. …
Sorry, I should have been more specific; I was looking more for a standalone example. If we need to dive into this further, we might look at something we can play with in Godbolt, but here's a very rough cut at one way to move the code you posted away from raw loops:

```cpp
auto begin = zip_iterator(is_categoricals_h.begin(), fids_h.begin(), is_leafs_h.begin(), node_cat_set.begin());
auto end = zip_iterator(is_categoricals_h.end(), fids_h.end(), is_leafs_h.end(), node_cat_set.end());
// `result` carries the running pool size, so the lambda does not capture
// bit_pool_size (a variable cannot be used inside its own initializer).
// Caveat: std::accumulate formally forbids its operation from modifying
// elements of the range; std::for_each with a captured counter avoids that.
auto bit_pool_size = std::accumulate(
    begin, end, std::size_t{},
    [&feature_categorical, &cf](auto result, auto&& tup) {
      auto& [cat, fid, leaf, ncs] = tup;
      if (!feature_categorical[fid] || leaf) {
        cat = 0.0f;
      }
      if (cat == 1.0f) {
        ++cf[fid].n_nodes;
        ncs = result;  // this node's offset is the pool size so far
        result += categorical_sets::sizeof_mask_from_max_matching(
            cf[fid].max_matching);
      }
      return result;
    });
```

All that's untested, so don't try to grab it as is ;). Note that we should only bring each element of …

Do you have any other examples of basic performance problems that might emerge from moving away from a raw loop? I won't say they don't exist, but I've yet to encounter one myself. I have, though, encountered plenty of situations where raw loops are slower than an algorithm-based implementation.
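For comparison, here is a standalone sketch of the offset bookkeeping from the snippets above that uses only standard C++17 algorithms. It is not code from the PR. The running bit_pool_size is an exclusive prefix sum of per-node mask sizes, so std::exclusive_scan can produce every node's offset without an index variable.

The sketch makes several assumptions:
- FeatureInfo, sizeof_mask, and assign_mask_offsets are hypothetical stand-ins for cf[fid] and categorical_sets::sizeof_mask_from_max_matching.
- The categorical flags have already been cleared for leaves and for non-categorical features.
- The per-feature n_nodes count is omitted.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for cf[fid]; only the field this sketch needs.
struct FeatureInfo {
  std::size_t max_matching;
};

// Hypothetical stand-in for categorical_sets::sizeof_mask_from_max_matching:
// bytes needed for a bitmask with one bit per category in [0, max_matching].
std::size_t sizeof_mask(std::size_t max_matching) {
  return (max_matching + 8) / 8;
}

// Writes each node's offset into the shared pool and returns the pool size.
// Offsets are meaningful only for categorical nodes; other nodes receive the
// running total, which callers are assumed not to read.
std::size_t assign_mask_offsets(std::vector<std::size_t> const& fids,
                                std::vector<bool> const& is_categorical,
                                std::vector<FeatureInfo> const& features,
                                std::vector<std::size_t>& offsets) {
  // Per-node mask size; 0 for nodes that need no categorical set.
  std::vector<std::size_t> sizes(fids.size());
  std::transform(fids.begin(), fids.end(), is_categorical.begin(), sizes.begin(),
                 [&features](std::size_t fid, bool categorical) -> std::size_t {
                   return categorical ? sizeof_mask(features[fid].max_matching) : 0;
                 });

  // A node's offset is the sum of all earlier sizes: an exclusive prefix sum.
  offsets.resize(sizes.size());
  std::exclusive_scan(sizes.begin(), sizes.end(), offsets.begin(), std::size_t{0});

  // The total pool size is the sum of all sizes.
  return std::accumulate(sizes.begin(), sizes.end(), std::size_t{0});
}
```

Unlike the fused loop, this version materializes a temporary sizes vector. Whether it ends up faster or slower is something to measure rather than assume. What it does show is that the running-offset pattern maps directly onto a standard algorithm.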
I'll skip the nitpicking and focus on the main thing: …
I was focusing solely on the performance question you raised (i.e. whether avoiding raw loops necessarily imposes a performance penalty in some cases). If you're interested in specific recommendations for refactoring that PR in a way that would also keep things concise, you can always tag me for code review, but I can't say much about the "right" way to structure it out of context. For the purposes of this particular issue, I'm more interested in any general problems that might arise from avoiding raw loops. If general pitfalls exist, it would be great to document them here. While avoiding raw loops is generally a good idea for other reasons (these gentlemen lay out the case very nicely), that basic principle is not the focus of this proposal either. The proposal is more about establishing consistency in how we index, and one way to do that is to avoid indexing altogether, which can eliminate other bugs.
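Here is a minimal illustration of that point, using a hypothetical helper that is not from the codebase. A dot product written with std::inner_product has no loop counter at all. The usual indexed form's pitfalls therefore cannot occur: comparing a signed int against an unsigned size, and overflowing the counter on very large inputs.

```cpp
#include <numeric>
#include <vector>

// Indexed version, for contrast (comment only):
//   double sum = 0.0;
//   for (int i = 0; i < a.size(); ++i) sum += a[i] * b[i];
// `i < a.size()` compares signed with unsigned (-Wsign-compare), and `++i`
// overflows (undefined behavior) if a.size() exceeds INT_MAX.

// Index-free version: std::inner_product walks both ranges with iterators.
// Precondition: b has at least as many elements as a.
double dot(std::vector<double> const& a, std::vector<double> const& b) {
  return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
}
```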
Related to #4105. This PR fixes the types used for indexing and sizes in the PCA/TSVD C++ code.
Authors:
- Micka (https://github.com/lowener)
- Corey J. Nolet (https://github.com/cjnolet)
Approvers:
- Corey J. Nolet (https://github.com/cjnolet)
URL: #4255
As noted in discussion on #4075, there are currently places in the codebase where we are using signed integers for positive indexing, or to represent container sizes or other quantities that are better represented by other types (generally std::size_t). To clearly signal the semantic meaning of these variables, avoid signed/unsigned casting and comparisons, and reduce the likelihood of overflow/underflow bugs, I propose the following rules:

1. Avoid raw loops and explicit indexing where possible.
2. For container sizes, use some_container<T>::size_type, and use the same type for raw positive indexing. Liberal use of auto should make this less onerous, and rule 1 should make it rare.
3. Otherwise, use std::size_t unless there is a clear performance/resource reason to use a smaller type. In that case, an unsigned integer of the desired size should be used explicitly (e.g. uint32_t), and a comment explaining the performance consideration should be added where the variable is declared. If this is necessary for a custom container, alias custom_container<T>::size_type to the selected integer type.
4. For negative indexing, use iterator::difference_type where possible, or explicitly invoke std::ptrdiff_t otherwise.
5. Reserve int for mathematical integers: quantities which can take on negative values and which are not sizes or indexes. A signed integer should not be used simply to include -1 as a special case; prefer std::optional or other ways of signaling this more clearly.

If we are better about rule 1, rules 2-5 should become less and less relevant.
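As a sketch of how rules 2 and 5 might look together: the returned position is typed as the container's size_type, and "not found" is signaled with an empty std::optional instead of -1. find_position is a hypothetical helper, not an existing cuML function.

```cpp
#include <algorithm>
#include <iterator>
#include <optional>
#include <vector>

// Rule 2: a position in a std::vector is reported as its size_type.
// Rule 5: "not found" is an empty optional, not -1 in a signed int.
template <typename T>
std::optional<typename std::vector<T>::size_type>
find_position(std::vector<T> const& v, T const& value) {
  auto const it = std::find(v.begin(), v.end(), value);
  if (it == v.end()) {
    return std::nullopt;
  }
  // std::distance yields difference_type; it is non-negative here,
  // so converting it to size_type is safe.
  return static_cast<typename std::vector<T>::size_type>(std::distance(v.begin(), it));
}
```

At a call site, `if (auto pos = find_position(v, x)) { use(*pos); }` keeps the "found" check and the index together, with no sentinel comparison.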
TL;DR version:
- container::size_type for sizes and positive indexing where possible, std::size_t otherwise
- iterator::difference_type for negative indexing where possible, std::ptrdiff_t otherwise
- int is a mathematical quantity, not a size or an index

I'm not married to these rules, but they seem like a reasonably small and complete set that is consistent with other style and usage guidelines. We can use this issue for any necessary discussion until #4075 is merged, and then I'll be happy to work on implementing whatever we land on.
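Here is a sketch of rule 4, again built around a hypothetical helper (at_offset). An offset that can be negative is carried as the vector's difference_type. Bounds are checked with std::distance before stepping with std::next, so no unsigned value ever has to represent a negative quantity.

```cpp
#include <iterator>
#include <optional>
#include <vector>

// Rule 4: an offset that may be negative is a difference_type, not a size.
// Returns the element `offset` positions away from `pos`, or std::nullopt
// if that position falls outside the vector.
template <typename T>
std::optional<T> at_offset(std::vector<T> const& v,
                           typename std::vector<T>::const_iterator pos,
                           typename std::vector<T>::difference_type offset) {
  auto const before = std::distance(v.begin(), pos);  // elements before pos
  auto const after = std::distance(pos, v.end());     // elements from pos to the end
  if (offset < -before || offset >= after) {
    return std::nullopt;
  }
  return *std::next(pos, offset);
}
```

With v = {1, 2, 3, 4} and pos pointing at 3, an offset of -2 yields 1, while -3 yields an empty optional. Neither case could be expressed with an unsigned offset without wraparound.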