Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logic cleanups for basic_string #3862

Merged
merged 26 commits into from
Aug 11, 2023

Conversation

achabense
Copy link
Contributor

@achabense achabense commented Jul 11, 2023

This is a reorganized version of #3830.

  • Various logic cleanups to make basic_string easier to maintain.
  • Fixes a bug in operator=(const&) which will break invariant.

Thanks to @StephanTLavavej, @fsb4000 and @frederick-vs-ja for review in the original pr!

@achabense achabense requested a review from a team as a code owner July 11, 2023 13:53
stl/inc/xstring Outdated Show resolved Hide resolved
stl/inc/xstring Outdated Show resolved Hide resolved
stl/inc/xstring Outdated
const _Elem* const _Right_ptr = _Right._Mypair._Myval2._Myptr();
if (_Entails_large_string(_Right_size)) {
const auto _New_capacity =
_Calculate_growth(_Right_size, _SMALL_STRING_CAPACITY, _Right.max_size());
Copy link
Contributor Author

@achabense achabense Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[won't be included in this pr]
If this fix is adopted, the second parameter of _Calculate_growth(x,y,z) will be uniformly _SMALL_STRING_CAPACITY and further rectification will be possible.
There is also a strange use of _Calculate_growth(newsize) in _Construct_from_iter. According to impl logic it is performing capacity*=1.5 logic. What about replacing the grow logic here with _Reallocate_grow_by or push_back?

const size_type _New_capacity = _Calculate_growth(_My_data._Mysize);

// invariant: _Myres >= _Mysize, and _Myres >= _SMALL_STRING_CAPACITY (after string's construction)
// both _Mysize and _Myres doesn't take account of the extra null terminator
size_type _Mysize = 0; // current length of string (size)
size_type _Myres = 0; // current storage reserved for string (capacity)
Copy link
Contributor Author

@achabense achabense Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[won't be included in this pr]
suggestion: we can make _Myres initialized to _SMALL_STRING_CAPACITY to make _Myres >= _SMALL_STRING_CAPACITY throughout the lifetime of _String_val.

stl/inc/xstring Outdated Show resolved Hide resolved
stl/inc/xstring Outdated Show resolved Hide resolved
stl/inc/xstring Outdated Show resolved Hide resolved
@StephanTLavavej StephanTLavavej self-assigned this Jul 12, 2023
@StephanTLavavej

This comment was marked as resolved.

@StephanTLavavej StephanTLavavej removed their assignment Jul 19, 2023
@fsb4000

This comment was marked as resolved.

@StephanTLavavej

This comment was marked as resolved.

@fsb4000

This comment was marked as resolved.

stl/inc/xstring Outdated Show resolved Hide resolved
stl/inc/xstring Outdated Show resolved Hide resolved
@fsb4000

This comment was marked as resolved.

@StephanTLavavej

This comment was marked as resolved.

@fsb4000

This comment was marked as resolved.

@StephanTLavavej StephanTLavavej self-assigned this Jul 22, 2023
@StephanTLavavej StephanTLavavej force-pushed the _String_logic_cleanups branch from 1d7d1f3 to 9b0e590 Compare July 28, 2023 01:51
@StephanTLavavej StephanTLavavej removed their assignment Jul 28, 2023
@StephanTLavavej
Copy link
Member

As a special exception to our usual way of working, I've force-pushed the branch to rebase it on main and linearize the history. I verified (with git diff) that this results in the exact same state. I fused commits (e.g. _NODISCARD) that were logically grouped together, ironed out commits that were purely addressing earlier mistakes, and decomposed a couple of commits into finer-grained ones (e.g. auto, _CONSTEXPR20).

THANK YOU for the fine-grained commits, they were an enormous help in reviewing. After linearization, I was able to see the correctness of each transformation and I have high confidence in the resulting state. Overall I think that this is a great cleanup and makes the complicated basic_string code more robust for future maintenance - particularly avoiding off-by-one errors around the SSO, and extracting the allocate_at_least and start-element-lifetimes complexity that has recently accreted.

Finally, I added commits to address the minor issues I found while reviewing:

  • Serially run the tests in Dev09_181509_tr1_inf_loop_uniform_int_ull.
    • This should avoid the mysterious compiler back-end ICE when cross-compiling from x64 to ARM. This test originally (6+ years ago) tested so many cases, it was reasonable to run them in parallel. Those cases got chopped away over the years (last major change was in 2017), and the resulting test runs in ~4 seconds serially. Because we test so many configurations in parallel, there is no advantage to making this test run parallel inside itself, so I've converted it to the C++14 matrix, dropped the use of parallel for_each(), and removed the /clr workaround (for crashes with parallel algorithms).
  • Fix comment grammar.
  • In _Deallocate_for_capacity, pass _Old_ptr by value.
    • We (and the Standard) conventionally pass potentially-fancy pointers by value; they are expected to be cheap to copy.
  • Drop redundant public: after the removal of _Copy_assign_val_from_small.
  • Add test_string_copy_assign_pocca_sso().
    • I verified that this fails before your fix, and passes afterwards. The existing coverage didn't exercise the SSO transition enough. It was simplest to just add a new case rather than trying to alter the existing coverage, and this test conveniently had the POCCA logic ready to go.
  • Restore shrink_to_fit to allocate exactly.
    • We should behave like vector and avoid the allocate_at_least optimization here. Let's avoid surprising users; they already worry about shrink_to_fit's non-binding nature.

stl/inc/xstring Outdated Show resolved Hide resolved
stl/inc/xstring Outdated
_Become_small();
return;
}

const size_type _Target_capacity = (_STD min)(_My_data._Mysize | _ALLOC_MASK, max_size());
size_type _Target_capacity = (_STD min)(_My_data._Mysize | _ALLOC_MASK, max_size());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(no changes requested) As _Exactly mode doesn't really require input capacity to be a referrence, I think we can restore constness for _Target_capacity in the future. Also here in <vector>:

STL/stl/inc/vector

Lines 1705 to 1706 in 0fe653a

size_type _Newcapacity = static_cast<size_type>(_Oldlast - _Oldfirst);
_Reallocate<_Reallocation_policy::_Exactly>(_Newcapacity);

Copy link
Member

@CaseyCarter CaseyCarter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll verify and push a change to address my comments, which are mostly naming nits.

stl/inc/xstring Outdated Show resolved Hide resolved
stl/inc/xstring Outdated Show resolved Hide resolved
stl/inc/xstring Outdated Show resolved Hide resolved
stl/inc/xstring Outdated Show resolved Hide resolved
stl/inc/xstring Outdated Show resolved Hide resolved
stl/inc/xstring Outdated Show resolved Hide resolved
stl/inc/xstring Outdated Show resolved Hide resolved
@CaseyCarter CaseyCarter removed their assignment Aug 9, 2023
Rename `_ALLOC_MASK`, `_SMALL_STRING_CAPACITY` and `_LEAST_ALLOCATION_SIZE` to avoid the macro identifier namespace. Comment that `_BUF_SIZE` is used by the debugger visualizer (and hence shouldn't be renamed).
... to avoid confusion about size vs. capacity since `_Large_string_engaged()` can be `true` for a very small string with a large capacity.
With the addition of `_SMALL_STRING_CAPACITY`, I think the expressions themselves (e.g., `_Count > _SMALL_STRING_CAPACITY`) are more readable than calls to the function.
It's more clearly the opposite of "large mode".
... to the more descriptive `_Actual_allocation_size`.
Grepping is quick, and this is very likely to bitrot.
@CaseyCarter CaseyCarter force-pushed the _String_logic_cleanups branch from 231682b to e2ac4ce Compare August 9, 2023 20:35
@StephanTLavavej StephanTLavavej self-assigned this Aug 10, 2023
@StephanTLavavej
Copy link
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit 29fc1c9 into microsoft:main Aug 11, 2023
@StephanTLavavej
Copy link
Member

Thanks again for this maintainability overhaul and runtime correctness fix! 😻 🪄 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants