-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimizations for contiguous iterators #1433
Optimizations for contiguous iterators #1433
Conversation
I really like the idea of getting more optimizations in, but could we wait until the rework of the machinery is done. There is a huge amount of merge conflicts flying in |
That's why I left out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this will trigger the ominous volatile
breaks continuous iterators bug which essentially prohibits optimizations for continuous iterators currently as anything with volatile
does not fulfill readable
stl/inc/algorithm
Outdated
@@ -418,6 +418,29 @@ namespace ranges { | |||
template <input_iterator _It, sentinel_for<_It> _Se, class _Ty, class _Pj> | |||
requires indirect_binary_predicate<ranges::equal_to, projected<_It, _Pj>, const _Ty*> | |||
_NODISCARD constexpr _It _Find_unchecked(_It _First, const _Se _Last, const _Ty& _Val, _Pj _Proj) { | |||
if constexpr (contiguous_iterator<_It> && sized_sentinel_for<_Se, _It> && same_as<_Pj, identity> && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a new trait _Memchr_in_find_is_save
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decided to make it a new trait because it's not very complex and improves readability a lot. I still don't want to change the _Lex_compare_memcmp_classify
stuff in this PR though (but maybe I'll change my mind if I have more time).
stl/inc/algorithm
Outdated
if constexpr (!same_as<_Memcmp_classification_pred, | ||
void> && sized_sentinel_for<_Se1, _It1> && sized_sentinel_for<_Se2, _It2> // | ||
&& same_as<_Pj1, identity> && same_as<_Pj2, identity>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be _Memcmp_in_lexicographical_compare_is_safe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to make too big changes in this PR. The purpose of this PR is just to enable optimizations. You can file a separate issue to track this later.
#872 also applies to all other optimizations. |
See DevCom-876860 "conditional operator errors" blocks readable<volatile int*>. |
But only merge conflict I can see is naming which will be quite easy to resolve. |
@@ -63,6 +63,14 @@ constexpr void smoke_test() { | |||
int const two_ints[] = {0, 1}; | |||
assert(!equal(one_int, two_ints, comp, proj, proj)); | |||
} | |||
{ | |||
// Validate memcmp case | |||
int arr1[3]{0, 2, 5}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these only validate that the memcmp case is correct not that the optimization actually happened. Any ideas on how to test that it actually happened (I don't have any good ones, maybe a struct with stuff written in it's padding?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is a way to test it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I can only think of ways that require doing UB
we could also use a regression test for the std::fill bug if anyone has a good idea how to write one. |
Status update: I haven't forgotten about this PR, and plan to review it as soon as I can find some hours free of distractions and high-priority C++20 tasks. 😺 |
Thanks! I pushed some small changes after verifying a full test pass - let me know if anything looks wrong, otherwise I think this is ready to merge 🚀 FYI @barcharcraz I pushed changes after you approved (long ago). Also FYI @miscco @CaseyCarter this had to be merged with the recent ranges work; I believe that it shouldn't interfere with anything but wanted to point out the interaction. |
&& sized_sentinel_for<_Se2, _It2> && same_as<_Pj1, identity> && same_as<_Pj2, identity>) { | ||
if (!_STD is_constant_evaluated()) { | ||
const auto _Num1 = static_cast<size_t>(_Last1 - _First1); | ||
const auto _Num2 = static_cast<size_t>(_Last2 - _First2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: i believe _Num
is a bit too generic. Maybe use _Len
or _Size
instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_Num
is already used in the original std::lexicographical_compare
and std::lexicographical_compare_three_way
(and in lot's of other places) and after all it clearly describes what it is - the number of elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @miscco's point (_Num\d*
is relatively uncommon, and a bit more generic - one could speak of a number of matching elements possibly with gaps, while len/size is more specific), but I also agree with @AdamBucior's point - this is consistent with existing usage and seems to be clear so there is no risk of confusion.
The thing that would tip the scales for me is if we had multiple numbers of stuff involved - then it would be good to give distinct names to each. That isn't the case here, where we just have 1/2 mirroring the ranges named 1/2. Thus I think that the code is fine as-is.
Thanks for bringing this up! Clear naming is indeed important 😸
template <class _Iter, class _Ty> | ||
_INLINE_VAR constexpr bool _Memchr_in_find_is_safe = | ||
_Iterator_is_contiguous<_Iter>&& | ||
disjunction_v<conjunction<is_integral<_Ty>, _Is_character_or_bool<_Iter_value_t<_Iter>>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use _Is_character_or_byte_or_bool
here instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use _Is_character_or_byte_or_bool
it might allow for find
ing integers in std::byte
ranges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I am wondering what happens with integers of different sizes.
Shouldnt we always have something like is_same<_Iter_value_t<_Iter>, _Ty>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Integers of different sizes are handled by the _Within_limits
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to look at it in Detail, but I do not really like that we do essentially the same thing at two different places
Could _Within_limits
simply reject mixing byte with integrals?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It certainly could but I doubt it would be simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I didn't implement the memchr
optimization in find
, I added _Within_limits
to generalize it, much like how it's being further generalized here, so I can explain its intended design. The intent was for _Within_limits
to handle runtime conditions - i.e. when searching for a value in a range would compile, and would be eligible for the memchr
optimization, except that the specific value at runtime could never produce a match. (For example, searching for 300u
in a range of unsigned char
.) It has both performance and correctness consequences (we can skip the entire range when it won't be found, and static_cast<unsigned char>(300u)
could produce spurious matches so we must avoid that).
(_Within_limits
is a variant of C++20 in_range
although I think the new bool
behavior makes them different.)
In contrast, the metaprogramming that @AdamBucior is extracting as _Memchr_in_find_is_safe
determines when the optimization is applicable at compiletime - e.g. searching for const char*
in a range of string
is inapplicable. Similarly, searching for byte
in a range of unsigned char
is inapplicable - that should be a compiler error (and will be, if we let the classic algorithm attempt to use ==
).
Attempting to get _Within_limits
to reject byte
would result in code that compiles and always returns false
, instead of code that fails to compile. That would be a problem.
While it appears that _Memchr_in_find_is_safe
and _Within_limits
have similar logic, they're doing different things that shouldn't be mixed. I think that future simplifications to _Within_limits
(now that we know how to implement in_range
conveniently, and have if constexpr
) would reduce the apparent duplication.
There's a test failure when porting this to the MSVC-internal repo. I haven't investigated/reduced it, but it looks like the increased usage of The internal test is
(I believe this is https://github.com/CaseyCarter/cmcstl2/blob/master/test/iterator/unreachable.cpp ) |
It appears that this is a cmcstl2 bug/deficiency. AFAICT, The Standard's Lines 421 to 437 in ef6c1ce
I suspect we should skip this test internally. |
Co-authored-by: Casey Carter <[email protected]>
I believe this adds comprehensive coverage of the remaining algorithm optimizations for Ranges, and extends our coverage from pointers to contiguous iterators in general. I've therefore linked #1756. Thanks for this amazing work, @AdamBucior! |
Thanksagainfortheseperformanceimprovementsforcontiguousdata!:joy_cat: |
Changes in this PR:
memchr
optimization forbool
s andbyte
s infind
memchr
optimization inranges::find
memcmp
optimization inranges::equal
andranges::lexicographical_compare
equal
not working with non-unwrappable contiguous iterators (it tried toreinterpret_cast
them without callingto_address
)ranges::fill
trying to performmemset
optimization even if sentinel is not sized