Optimizations for contiguous iterators #1433

AdamBucior · 2020-11-07T20:08:19Z

Changes in this PR:

Implemented memchr optimization for bools and bytes in find
Implemented memchr optimization in ranges::find
Implemented memcmp optimization in ranges::equal and ranges::lexicographical_compare
Enabled optimizations for non-unwrappable contiguous iterators
Fixed equal not working with non-unwrappable contiguous iterators (it tried to reinterpret_cast them without calling to_address)
Fixed ranges::fill trying to perform memset optimization even if sentinel is not sized

miscco · 2020-11-07T20:20:19Z

I really like the idea of getting more optimizations in, but could we wait until the rework of the machinery is done? There are a huge number of merge conflicts flying in.

AdamBucior · 2020-11-07T20:25:14Z

> I really like the idea of getting more optimizations in, but could we wait until the rework of the machinery is done? There are a huge number of merge conflicts flying in.

That's why I left out copy/move.

miscco

Note that this will trigger the ominous "volatile breaks contiguous iterators" bug, which currently essentially prohibits optimizations for contiguous iterators, because anything involving volatile does not satisfy readable.

miscco · 2020-11-07T20:21:10Z

stl/inc/algorithm

@@ -418,6 +418,29 @@ namespace ranges {
    template <input_iterator _It, sentinel_for<_It> _Se, class _Ty, class _Pj>
        requires indirect_binary_predicate<ranges::equal_to, projected<_It, _Pj>, const _Ty*>
    _NODISCARD constexpr _It _Find_unchecked(_It _First, const _Se _Last, const _Ty& _Val, _Pj _Proj) {
+        if constexpr (contiguous_iterator<_It> && sized_sentinel_for<_Se, _It> && same_as<_Pj, identity> &&


This should be a new trait _Memchr_in_find_is_safe

Decided to make it a new trait because it's not very complex and improves readability a lot. I still don't want to change the _Lex_compare_memcmp_classify stuff in this PR though (but maybe I'll change my mind if I have more time).

stl/inc/algorithm

miscco · 2020-11-07T20:26:18Z

stl/inc/algorithm

+            if constexpr (!same_as<_Memcmp_classification_pred,
+                              void> && sized_sentinel_for<_Se1, _It1> && sized_sentinel_for<_Se2, _It2> //
+                          && same_as<_Pj1, identity> && same_as<_Pj2, identity>) {


This should be _Memcmp_in_lexicographical_compare_is_safe

I don't want to make too big changes in this PR. The purpose of this PR is just to enable optimizations. You can file a separate issue to track this later.

miscco · 2020-11-07T20:29:36Z

> I really like the idea of getting more optimizations in, but could we wait until the rework of the machinery is done? There are a huge number of merge conflicts flying in.

> That's why I left out copy/move.

#872 also applies to all other optimizations.

miscco · 2020-11-07T20:30:42Z

See DevCom-876860 "conditional operator errors" blocks readable<volatile int*>.

AdamBucior · 2020-11-07T20:31:35Z

> I really like the idea of getting more optimizations in, but could we wait until the rework of the machinery is done? There are a huge number of merge conflicts flying in.

> That's why I left out copy/move.

> #872 also applies to all other optimizations.

But the only merge conflict I can see is naming, which will be quite easy to resolve.

stl/inc/xutility

barcharcraz · 2020-12-05T03:16:57Z

tests/std/tests/P0896R4_ranges_alg_equal/test.cpp

@@ -63,6 +63,14 @@ constexpr void smoke_test() {
        int const two_ints[] = {0, 1};
        assert(!equal(one_int, two_ints, comp, proj, proj));
    }
+    {
+        // Validate memcmp case
+        int arr1[3]{0, 2, 5};


These only validate that the memcmp case is correct, not that the optimization actually happened. Any ideas on how to test that it actually happened? (I don't have any good ones; maybe a struct with stuff written in its padding?)

I don't think there is a way to test it.

Yeah, I can only think of ways that require undefined behavior.

barcharcraz · 2020-12-05T03:18:26Z

We could also use a regression test for the ranges::fill bug, if anyone has a good idea how to write one.

StephanTLavavej · 2021-03-15T08:44:49Z

Status update: I haven't forgotten about this PR, and plan to review it as soon as I can find some hours free of distractions and high-priority C++20 tasks. 😺

stl/inc/xutility

stl/inc/algorithm

StephanTLavavej · 2021-03-21T03:51:07Z

Thanks! I pushed some small changes after verifying a full test pass - let me know if anything looks wrong, otherwise I think this is ready to merge 🚀

FYI @barcharcraz I pushed changes after you approved (long ago). Also FYI @miscco @CaseyCarter this had to be merged with the recent ranges work; I believe that it shouldn't interfere with anything but wanted to point out the interaction.

miscco · 2021-03-21T07:24:18Z

stl/inc/algorithm

+                          && sized_sentinel_for<_Se2, _It2> && same_as<_Pj1, identity> && same_as<_Pj2, identity>) {
+                if (!_STD is_constant_evaluated()) {
+                    const auto _Num1 = static_cast<size_t>(_Last1 - _First1);
+                    const auto _Num2 = static_cast<size_t>(_Last2 - _First2);


Nitpick: I believe _Num is a bit too generic. Maybe use _Len or _Size instead.

_Num is already used in the original std::lexicographical_compare and std::lexicographical_compare_three_way (and in lots of other places), and after all it clearly describes what it is: the number of elements.

I agree with @miscco's point (_Num\d* is relatively uncommon, and a bit more generic - one could speak of a number of matching elements possibly with gaps, while len/size is more specific), but I also agree with @AdamBucior's point - this is consistent with existing usage and seems to be clear so there is no risk of confusion.

The thing that would tip the scales for me is if we had multiple numbers of stuff involved - then it would be good to give distinct names to each. That isn't the case here, where we just have 1/2 mirroring the ranges named 1/2. Thus I think that the code is fine as-is.

Thanks for bringing this up! Clear naming is indeed important 😸

stl/inc/xutility

miscco · 2021-03-21T07:31:14Z

stl/inc/xutility

+template <class _Iter, class _Ty>
+_INLINE_VAR constexpr bool _Memchr_in_find_is_safe =
+    _Iterator_is_contiguous<_Iter>&&
+        disjunction_v<conjunction<is_integral<_Ty>, _Is_character_or_bool<_Iter_value_t<_Iter>>>


Could we use _Is_character_or_byte_or_bool here instead?

If we use _Is_character_or_byte_or_bool it might allow for finding integers in std::byte ranges.

Now I am wondering what happens with integers of different sizes.

Shouldn't we always have something like is_same<_Iter_value_t<_Iter>, _Ty>?

Integers of different sizes are handled by the _Within_limits function.

I need to look at it in detail, but I do not really like that we do essentially the same thing in two different places.

Could _Within_limits simply reject mixing byte with integrals?

It certainly could but I doubt it would be simpler.

Although I didn't implement the memchr optimization in find, I added _Within_limits to generalize it, much like how it's being further generalized here, so I can explain its intended design. The intent was for _Within_limits to handle runtime conditions - i.e. when searching for a value in a range would compile, and would be eligible for the memchr optimization, except that the specific value at runtime could never produce a match. (For example, searching for 300u in a range of unsigned char.) It has both performance and correctness consequences (we can skip the entire range when it won't be found, and static_cast<unsigned char>(300u) could produce spurious matches so we must avoid that).

(_Within_limits is a variant of C++20 in_range although I think the new bool behavior makes them different.)

In contrast, the metaprogramming that @AdamBucior is extracting as _Memchr_in_find_is_safe determines at compile time when the optimization is applicable - e.g. searching for const char* in a range of string is inapplicable. Similarly, searching for byte in a range of unsigned char is inapplicable - that should be a compiler error (and will be, if we let the classic algorithm attempt to use ==).

Attempting to get _Within_limits to reject byte would result in code that compiles and always returns false, instead of code that fails to compile. That would be a problem.

While it appears that _Memchr_in_find_is_safe and _Within_limits have similar logic, they're doing different things that shouldn't be mixed. I think that future simplifications to _Within_limits (now that we know how to implement in_range conveniently, and have if constexpr) would reduce the apparent duplication.

StephanTLavavej · 2021-03-22T10:16:08Z

There's a test failure when porting this to the MSVC-internal repo. I haven't investigated/reduced it, but it looks like the increased usage of iter_value_t is causing the problem, and I suspect the test code is at fault.

The internal test is compiler tests\devtest\Concepts\cmcstl2\test\iterator, and the first full error is:

unreachable.cpp
xutility(443): error C2794: 'value_type': is not a member of any direct or indirect base class of 'std::indirectly_readable_traits<std::experimental::ranges::v1::common_iterator<const char *,std::experimental::ranges::v1::unreachable>>'
xutility(1183): note: see reference to alias template instantiation 'std::iter_value_t<C>' being compiled
xutility(5140): note: see reference to alias template instantiation 'std::_Iter_value_t<C>' being compiled
xutility(5186): note: see reference to variable template 'const bool _Memchr_in_find_is_safe<std::experimental::ranges::v1::common_iterator<char const *,std::experimental::ranges::v1::unreachable>,char>' being compiled
xutility(5192): note: see reference to function template instantiation '_InIt std::_Find_unchecked<_InIt,_Ty>(const _InIt,const _InIt,const _Ty &)' being compiled
        with
        [
            _InIt=C,
            _Ty=char
        ]
unreachable.cpp(20): note: see reference to function template instantiation '_InIt std::find<C,char>(_InIt,const _InIt,const _Ty &)' being compiled
        with
        [
            _InIt=C,
            _Ty=char
        ]

(I believe this is https://github.com/CaseyCarter/cmcstl2/blob/master/test/iterator/unreachable.cpp )

StephanTLavavej · 2021-03-22T20:46:47Z

It appears that this is a cmcstl2 bug/deficiency. AFAICT, std::experimental::ranges::v1::common_iterator doesn't have a value_type - instead this is provided by its (not Standard) readable_traits<common_iterator<I, S>> specialization:
https://github.com/CaseyCarter/cmcstl2/blob/684a96d527e4dc733897255c0177b784dc280980/include/stl2/detail/iterator/common_iterator.hpp#L271-L274

The Standard's indirectly_readable_traits is looking for a nested value_type or element_type and not finding any:

STL/stl/inc/xutility, lines 421 to 437 in ef6c1ce:

    template <_Has_member_value_type _Ty>
    struct indirectly_readable_traits<_Ty> : _Cond_value_type<typename _Ty::value_type> {};

    template <_Has_member_element_type _Ty>
    struct indirectly_readable_traits<_Ty> : _Cond_value_type<typename _Ty::element_type> {};

    // clang-format off
    template <_Has_member_value_type _Ty>
        requires _Has_member_element_type<_Ty>
            && same_as<remove_cv_t<typename _Ty::value_type>, remove_cv_t<typename _Ty::element_type>>
    struct indirectly_readable_traits<_Ty> : _Cond_value_type<typename _Ty::value_type> {};
    // clang-format on

    // ALIAS TEMPLATE iter_value_t
    template <class _Ty>
    using iter_value_t = typename conditional_t<_Is_from_primary<iterator_traits<remove_cvref_t<_Ty>>>,
        indirectly_readable_traits<remove_cvref_t<_Ty>>, iterator_traits<remove_cvref_t<_Ty>>>::value_type;

I suspect we should skip this test internally.

stl/inc/algorithm

Co-authored-by: Casey Carter

CaseyCarter · 2021-03-23T02:37:23Z

I believe this adds comprehensive coverage of the remaining algorithm optimizations for Ranges, and extends our coverage from pointers to contiguous iterators in general. I've therefore linked #1756.

Thanks for this amazing work, @AdamBucior!

StephanTLavavej · 2021-03-23T04:06:00Z

Thanksagainfortheseperformanceimprovementsforcontiguousdata! 😹

AdamBucior added 2 commits November 7, 2020 20:21

Optimizations for contiguous iterators

75474b6

Projections need to be same as identity

b2d7591

AdamBucior requested a review from a team as a code owner November 7, 2020 20:08

AdamBucior added 2 commits November 7, 2020 21:16

clang-format

30da25d

clang-format please cooperate

b1b6a83

miscco reviewed Nov 7, 2020

AdamBucior added 3 commits November 7, 2020 21:46

Wrong iterators

6450aaf

remove_const

e6058d4

Fix find

2952906

StephanTLavavej added the "performance" label (Must go faster) Nov 8, 2020

CaseyCarter self-assigned this Nov 11, 2020

AdamBucior added 3 commits November 14, 2020 15:47

Merge branch 'master' into contiguous-iterator-optimizations

cadd6cc

_Memcmp_ranges

876358d

_Memchr_in_find_is_safe

3fbfec8

StephanTLavavej assigned barcharcraz and unassigned CaseyCarter Nov 18, 2020

cpplearner mentioned this pull request Nov 22, 2020

[xutility] Modernize _Equal_memcmp_is_safe to use variable templates #831

Merged

StephanTLavavej self-assigned this Dec 3, 2020

CaseyCarter mentioned this pull request Dec 4, 2020

STL: Finish replacing tag dispatch with if constexpr #189

Open

barcharcraz approved these changes Dec 5, 2020

Simplify _Within_limits

094b2b2

cpplearner mentioned this pull request Dec 11, 2020

Convert contiguous_iterators to pointers correctly #1527

Merged

Merge branch 'master' into contiguous-iterator-optimizations

1a1cd0a

Merge branch 'master' into contiguous-iterator-optimizations

3cdd6f8

StephanTLavavej added 3 commits March 20, 2021 16:10

Merge branch 'main' into contiguous-iterator-optimizations

fbc0c56

Move _Find_unchecked down.

a1b8227

Code review feedback.

ccf9f49

StephanTLavavej reviewed Mar 21, 2021

StephanTLavavej approved these changes Mar 21, 2021

StephanTLavavej removed their assignment Mar 21, 2021

miscco reviewed Mar 21, 2021

stl/inc/xutility (resolved)

miscco reviewed Mar 21, 2021

StephanTLavavej self-assigned this Mar 22, 2021

CaseyCarter requested changes Mar 23, 2021

stl/inc/algorithm (outdated, resolved)

Refine the condition for calling _Memcmp_ranges.

7263e21

Co-authored-by: Casey Carter

CaseyCarter approved these changes Mar 23, 2021

CaseyCarter linked an issue Mar 23, 2021 that may be closed by this pull request

<algorithm>: Enable optimizations from std algorithms in ranges algorithms #1756

Closed

StephanTLavavej merged commit dcba13b into microsoft:main Mar 23, 2021

AdamBucior deleted the contiguous-iterator-optimizations branch March 23, 2021 06:37

AdamBucior mentioned this pull request Apr 7, 2021

Optimizations for unreachable sentinels #1810

Merged
