Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizations for contiguous iterators #1433

Merged

Conversation

AdamBucior
Copy link
Contributor

Changes in this PR:

  • Implemented memchr optimization for bools and bytes in find
  • Implemented memchr optimization in ranges::find
  • Implemented memcmp optimization in ranges::equal and ranges::lexicographical_compare
  • Enabled optimizations for non-unwrappable contiguous iterators
  • Fixed equal not working with non-unwrappable contiguous iterators (it tried to reinterpret_cast them without calling to_address)
  • Fixed ranges::fill trying to perform memset optimization even if sentinel is not sized

@AdamBucior AdamBucior requested a review from a team as a code owner November 7, 2020 20:08
@miscco
Copy link
Contributor

miscco commented Nov 7, 2020

I really like the idea of getting more optimizations in, but could we wait until the rework of the machinery is done. There is a huge amount of merge conflicts flying in

@AdamBucior
Copy link
Contributor Author

I really like the idea of getting more optimizations in, but could we wait until the rework of the machinery is done. There is a huge amount of merge conflicts flying in

That's why I left out copy/move.

Copy link
Contributor

@miscco miscco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this will trigger the ominous volatile breaks continuous iterators bug which essentially prohibits optimizations for continuous iterators currently as anything with volatile does not fulfill readable

@@ -418,6 +418,29 @@ namespace ranges {
template <input_iterator _It, sentinel_for<_It> _Se, class _Ty, class _Pj>
requires indirect_binary_predicate<ranges::equal_to, projected<_It, _Pj>, const _Ty*>
_NODISCARD constexpr _It _Find_unchecked(_It _First, const _Se _Last, const _Ty& _Val, _Pj _Proj) {
if constexpr (contiguous_iterator<_It> && sized_sentinel_for<_Se, _It> && same_as<_Pj, identity> &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a new trait _Memchr_in_find_is_save

Copy link
Contributor Author

@AdamBucior AdamBucior Nov 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided to make it a new trait because it's not very complex and improves readability a lot. I still don't want to change the _Lex_compare_memcmp_classify stuff in this PR though (but maybe I'll change my mind if I have more time).

stl/inc/algorithm Outdated Show resolved Hide resolved
stl/inc/algorithm Outdated Show resolved Hide resolved
Comment on lines 10172 to 10174
if constexpr (!same_as<_Memcmp_classification_pred,
void> && sized_sentinel_for<_Se1, _It1> && sized_sentinel_for<_Se2, _It2> //
&& same_as<_Pj1, identity> && same_as<_Pj2, identity>) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be _Memcmp_in_lexicographical_compare_is_safe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to make too big changes in this PR. The purpose of this PR is just to enable optimizations. You can file a separate issue to track this later.

@miscco
Copy link
Contributor

miscco commented Nov 7, 2020

I really like the idea of getting more optimizations in, but could we wait until the rework of the machinery is done. There is a huge amount of merge conflicts flying in

That's why I left out copy/move.

#872 also applies to all other optimizations.

@miscco
Copy link
Contributor

miscco commented Nov 7, 2020

See DevCom-876860 "conditional operator errors" blocks readable<volatile int*>.

@AdamBucior
Copy link
Contributor Author

I really like the idea of getting more optimizations in, but could we wait until the rework of the machinery is done. There is a huge amount of merge conflicts flying in

That's why I left out copy/move.

#872 also applies to all other optimizations.

But only merge conflict I can see is naming which will be quite easy to resolve.

@StephanTLavavej StephanTLavavej added the performance Must go faster label Nov 8, 2020
@CaseyCarter CaseyCarter self-assigned this Nov 11, 2020
stl/inc/xutility Show resolved Hide resolved
@@ -63,6 +63,14 @@ constexpr void smoke_test() {
int const two_ints[] = {0, 1};
assert(!equal(one_int, two_ints, comp, proj, proj));
}
{
// Validate memcmp case
int arr1[3]{0, 2, 5};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these only validate that the memcmp case is correct not that the optimization actually happened. Any ideas on how to test that it actually happened (I don't have any good ones, maybe a struct with stuff written in it's padding?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is a way to test it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I can only think of ways that require doing UB

@barcharcraz
Copy link
Member

we could also use a regression test for the std::fill bug if anyone has a good idea how to write one.

@StephanTLavavej
Copy link
Member

Status update: I haven't forgotten about this PR, and plan to review it as soon as I can find some hours free of distractions and high-priority C++20 tasks. 😺

stl/inc/xutility Outdated Show resolved Hide resolved
stl/inc/xutility Outdated Show resolved Hide resolved
stl/inc/xutility Outdated Show resolved Hide resolved
stl/inc/xutility Outdated Show resolved Hide resolved
stl/inc/xutility Outdated Show resolved Hide resolved
stl/inc/xutility Outdated Show resolved Hide resolved
stl/inc/algorithm Outdated Show resolved Hide resolved
stl/inc/algorithm Outdated Show resolved Hide resolved
@StephanTLavavej StephanTLavavej removed their assignment Mar 21, 2021
@StephanTLavavej
Copy link
Member

Thanks! I pushed some small changes after verifying a full test pass - let me know if anything looks wrong, otherwise I think this is ready to merge 🚀

FYI @barcharcraz I pushed changes after you approved (long ago). Also FYI @miscco @CaseyCarter this had to be merged with the recent ranges work; I believe that it shouldn't interfere with anything but wanted to point out the interaction.

&& sized_sentinel_for<_Se2, _It2> && same_as<_Pj1, identity> && same_as<_Pj2, identity>) {
if (!_STD is_constant_evaluated()) {
const auto _Num1 = static_cast<size_t>(_Last1 - _First1);
const auto _Num2 = static_cast<size_t>(_Last2 - _First2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: i believe _Num is a bit too generic. Maybe use _Len or _Size instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_Num is already used in the original std::lexicographical_compare and std::lexicographical_compare_three_way (and in lot's of other places) and after all it clearly describes what it is - the number of elements.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @miscco's point (_Num\d* is relatively uncommon, and a bit more generic - one could speak of a number of matching elements possibly with gaps, while len/size is more specific), but I also agree with @AdamBucior's point - this is consistent with existing usage and seems to be clear so there is no risk of confusion.

The thing that would tip the scales for me is if we had multiple numbers of stuff involved - then it would be good to give distinct names to each. That isn't the case here, where we just have 1/2 mirroring the ranges named 1/2. Thus I think that the code is fine as-is.

Thanks for bringing this up! Clear naming is indeed important 😸

template <class _Iter, class _Ty>
_INLINE_VAR constexpr bool _Memchr_in_find_is_safe =
_Iterator_is_contiguous<_Iter>&&
disjunction_v<conjunction<is_integral<_Ty>, _Is_character_or_bool<_Iter_value_t<_Iter>>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use _Is_character_or_byte_or_bool here instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use _Is_character_or_byte_or_bool it might allow for finding integers in std::byte ranges.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I am wondering what happens with integers of different sizes.

Shouldnt we always have something like is_same<_Iter_value_t<_Iter>, _Ty>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integers of different sizes are handled by the _Within_limits function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to look at it in Detail, but I do not really like that we do essentially the same thing at two different places

Could _Within_limits simply reject mixing byte with integrals?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It certainly could but I doubt it would be simpler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I didn't implement the memchr optimization in find, I added _Within_limits to generalize it, much like how it's being further generalized here, so I can explain its intended design. The intent was for _Within_limits to handle runtime conditions - i.e. when searching for a value in a range would compile, and would be eligible for the memchr optimization, except that the specific value at runtime could never produce a match. (For example, searching for 300u in a range of unsigned char.) It has both performance and correctness consequences (we can skip the entire range when it won't be found, and static_cast<unsigned char>(300u) could produce spurious matches so we must avoid that).

(_Within_limits is a variant of C++20 in_range although I think the new bool behavior makes them different.)

In contrast, the metaprogramming that @AdamBucior is extracting as _Memchr_in_find_is_safe determines when the optimization is applicable at compiletime - e.g. searching for const char* in a range of string is inapplicable. Similarly, searching for byte in a range of unsigned char is inapplicable - that should be a compiler error (and will be, if we let the classic algorithm attempt to use ==).

Attempting to get _Within_limits to reject byte would result in code that compiles and always returns false, instead of code that fails to compile. That would be a problem.

While it appears that _Memchr_in_find_is_safe and _Within_limits have similar logic, they're doing different things that shouldn't be mixed. I think that future simplifications to _Within_limits (now that we know how to implement in_range conveniently, and have if constexpr) would reduce the apparent duplication.

@StephanTLavavej
Copy link
Member

There's a test failure when porting this to the MSVC-internal repo. I haven't investigated/reduced it, but it looks like the increased usage of iter_value_t is causing the problem, and I suspect the test code is at fault.

The internal test is compiler tests\devtest\Concepts\cmcstl2\test\iterator and the first full error is

unreachable.cpp
xutility(443): error C2794: 'value_type': is not a member of any direct or indirect base class of 'std::indirectly_readable_traits<std::experimental::ranges::v1::common_iterator<const char *,std::experimental::ranges::v1::unreachable>>'
xutility(1183): note: see reference to alias template instantiation 'std::iter_value_t<C>' being compiled
xutility(5140): note: see reference to alias template instantiation 'std::_Iter_value_t<C>' being compiled
xutility(5186): note: see reference to variable template 'const bool _Memchr_in_find_is_safe<std::experimental::ranges::v1::common_iterator<char const *,std::experimental::ranges::v1::unreachable>,char>' being compiled
xutility(5192): note: see reference to function template instantiation '_InIt std::_Find_unchecked<_InIt,_Ty>(const _InIt,const _InIt,const _Ty &)' being compiled
        with
        [
            _InIt=C,
            _Ty=char
        ]
unreachable.cpp(20): note: see reference to function template instantiation '_InIt std::find<C,char>(_InIt,const _InIt,const _Ty &)' being compiled
        with
        [
            _InIt=C,
            _Ty=char
        ]

(I believe this is https://github.com/CaseyCarter/cmcstl2/blob/master/test/iterator/unreachable.cpp )

@StephanTLavavej
Copy link
Member

It appears that this is a cmcstl2 bug/deficiency. AFAICT, std::experimental::ranges::v1::common_iterator doesn't have a value_type - instead this is provided by its (not Standard) readable_traits<common_iterator<I, S>> specialization:
https://github.com/CaseyCarter/cmcstl2/blob/684a96d527e4dc733897255c0177b784dc280980/include/stl2/detail/iterator/common_iterator.hpp#L271-L274

The Standard's indirectly_readable_traits is looking for a nested value_type or element_type and not finding any:

STL/stl/inc/xutility

Lines 421 to 437 in ef6c1ce

template <_Has_member_value_type _Ty>
struct indirectly_readable_traits<_Ty> : _Cond_value_type<typename _Ty::value_type> {};
template <_Has_member_element_type _Ty>
struct indirectly_readable_traits<_Ty> : _Cond_value_type<typename _Ty::element_type> {};
// clang-format off
template <_Has_member_value_type _Ty>
requires _Has_member_element_type<_Ty>
&& same_as<remove_cv_t<typename _Ty::value_type>, remove_cv_t<typename _Ty::element_type>>
struct indirectly_readable_traits<_Ty> : _Cond_value_type<typename _Ty::value_type> {};
// clang-format on
// ALIAS TEMPLATE iter_value_t
template <class _Ty>
using iter_value_t = typename conditional_t<_Is_from_primary<iterator_traits<remove_cvref_t<_Ty>>>,
indirectly_readable_traits<remove_cvref_t<_Ty>>, iterator_traits<remove_cvref_t<_Ty>>>::value_type;

I suspect we should skip this test internally.

@StephanTLavavej StephanTLavavej self-assigned this Mar 22, 2021
stl/inc/algorithm Outdated Show resolved Hide resolved
@CaseyCarter
Copy link
Member

I believe this adds comprehensive coverage of the remaining algorithm optimizations for Ranges, and extends our coverage from pointers to contiguous iterators in general. I've therefore linked #1756.

Thanks for this amazing work, @AdamBucior!

@StephanTLavavej StephanTLavavej merged commit dcba13b into microsoft:main Mar 23, 2021
@StephanTLavavej
Copy link
Member

Thanksagainfortheseperformanceimprovementsforcontiguousdata!:joy_cat:

@AdamBucior AdamBucior deleted the contiguous-iterator-optimizations branch March 23, 2021 06:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

<algorithm>: Enable optimizations from std algorithms in ranges algorithms
5 participants