<format>: Compile-time string size estimation #2437

AdamBucior · 2021-12-19T19:26:06Z

Improves string size estimation in format by using _Basic_format_string's parsing machinery. It takes into account things like argument use count and the specified width and precision. It also allows formatted_size to skip the formatting step entirely for simple cases, such as concatenating a bunch of strings.
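To illustrate the kind of information the description refers to, here is a minimal C++17 sketch (not the PR's code) of a compile-time scan that counts literal characters and per-argument use counts in a format string. It assumes a simplified grammar with no nested replacement fields inside a spec and no error checking; `ScanResult`, `scan_format` and `estimate` are hypothetical names. When every argument is a string with no width or precision, literal characters plus use count times length is the exact output size.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical scan result: literal output characters plus per-argument use counts.
template <std::size_t NumArgs>
struct ScanResult {
    std::size_t literal_chars = 0;
    std::array<std::size_t, NumArgs> uses{};
};

// Simplified std::format-style scan: "{{" and "}}" print one brace, "{}" takes the
// next automatic index, "{N...}" references argument N, and any ":spec" is skipped.
// Assumption: no nested "{}" inside a spec, no mixed-indexing or malformed-input checks.
template <std::size_t NumArgs>
constexpr ScanResult<NumArgs> scan_format(std::string_view fmt) {
    ScanResult<NumArgs> r{};
    std::size_t next_auto = 0;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        const char c = fmt[i];
        if ((c == '{' || c == '}') && i + 1 < fmt.size() && fmt[i + 1] == c) {
            ++r.literal_chars;  // escaped brace prints a single character
            ++i;
        } else if (c == '{') {
            ++i;
            std::size_t id = 0;
            bool manual = false;
            while (i < fmt.size() && fmt[i] >= '0' && fmt[i] <= '9') {
                id = id * 10 + static_cast<std::size_t>(fmt[i] - '0');
                manual = true;
                ++i;
            }
            if (!manual) {
                id = next_auto++;
            }
            while (i < fmt.size() && fmt[i] != '}') {
                ++i;  // skip the format spec up to the closing brace
            }
            if (id < NumArgs) {
                ++r.uses[id];
            }
        } else {
            ++r.literal_chars;
        }
    }
    return r;
}

// Exact output size when every argument is a string and no width/precision is given.
template <std::size_t N>
constexpr std::size_t estimate(std::string_view fmt, const std::array<std::string_view, N>& args) {
    const ScanResult<N> r = scan_format<N>(fmt);
    std::size_t total = r.literal_chars;
    for (std::size_t i = 0; i < N; ++i) {
        total += r.uses[i] * args[i].size();
    }
    return total;
}

static_assert(scan_format<1>("{0} {0} {0}").uses[0] == 3);
```

For `"{0} {0} {0}"` with a 32-character argument this gives 2 + 3 * 32 = 98, the exact length `format` would produce; an exact result like this is the case in which `formatted_size` could skip the formatting work altogether.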

This change might be ABI-breaking, so hopefully it can make it in before <format>'s ABI is frozen.

stl/inc/format

tests/std/tests/P0645R10_text_formatting_formatting/test.cpp

barcharcraz

So, aside from possible bugs with user-defined formatters that don't call base::parse(), this looks functionally correct. However, it's quite a lot of code. We are willing to have gnarly code in service of runtime performance, but benchmarks would be nice. STL points out that benchmarks would also indicate how much effort we should spend on this, for example doing exact estimations for usages of precision with numeric arguments.

I'm curious how much faster this is than just taking the clang fallback path 100% of the time and dropping the exact estimation. That introduces some more capacity checks from string::insert, but it could still save allocations, which are probably far more expensive than the capacity checks. It might even be faster than no estimation to sometimes use formatted_size and get an exact size estimate to reduce allocations (even though that means doing the formatting work twice).
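The "format twice" idea can be sketched without `<format>` using two sinks: one pass that only counts characters (the role `std::formatted_size` plays), a single `reserve`, then a second pass that writes. `CountingSink`, `StringSink`, `write_joined` and `two_pass_join` are hypothetical names, and `write_joined` is a stand-in for the real formatting work. With the actual library, the equivalent would be `std::formatted_size(fmt, args...)` followed by `std::format_to(std::back_inserter(str), fmt, args...)`.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <string_view>

// Sink that only counts characters: the analogue of std::formatted_size.
struct CountingSink {
    std::size_t count = 0;
    void append(std::string_view s) { count += s.size(); }
};

// Sink that writes into a string: the analogue of format_to with a back_inserter.
struct StringSink {
    std::string& out;
    void append(std::string_view s) { out.append(s.data(), s.size()); }
};

// Hypothetical stand-in for "the formatting work": join parts with a separator.
template <class Sink>
void write_joined(Sink& sink, std::initializer_list<std::string_view> parts, std::string_view sep) {
    bool first = true;
    for (std::string_view p : parts) {
        if (!first) {
            sink.append(sep);
        }
        sink.append(p);
        first = false;
    }
}

// Pass 1 measures exactly, then one reserve, then pass 2 writes without regrowing.
std::string two_pass_join(std::initializer_list<std::string_view> parts, std::string_view sep) {
    CountingSink counter;
    write_joined(counter, parts, sep);

    std::string result;
    result.reserve(counter.count);
    StringSink sink{result};
    write_joined(sink, parts, sep);
    assert(result.size() == counter.count);  // the measuring pass was exact
    return result;
}
```

Whether this beats a single pass with a rough `reserve` depends on how expensive the formatting is relative to an allocation, which is what the requested benchmarks would show.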

Also, tests for user-defined formatters in general would be good. I think it's reasonable to just not support user-defined formatters at all, using a mechanism like _From_primary (used by allocator) to detect whether derived classes are ours or the user's.
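A sketch of that detection idea, in the spirit of the `_From_primary` alias mentioned above but not the STL's actual implementation; `lib_formatter`, `from_primary`, `is_library_formatter`, `Color` and `Point` are all hypothetical names. The library template declares an alias naming itself, so a user specialization that merely derives from a library formatter inherits an alias that names its base rather than itself.

```cpp
#include <type_traits>

// Hypothetical library formatter template. The alias names the class that declared it,
// so a class that only inherits the alias sees its base's name, not its own.
template <class T>
struct lib_formatter {
    using from_primary = lib_formatter;
};

// True only when F itself declared from_primary, i.e. F is library-provided.
template <class F, class = void>
struct is_library_formatter : std::false_type {};

template <class F>
struct is_library_formatter<F, std::void_t<typename F::from_primary>>
    : std::is_same<typename F::from_primary, F> {};

// User type whose formatter reuses the library's int formatter by deriving from it.
struct Color {};
template <>
struct lib_formatter<Color> : lib_formatter<int> {};

// User type whose formatter is written from scratch.
struct Point {};
template <>
struct lib_formatter<Point> {};

static_assert(is_library_formatter<lib_formatter<int>>::value);
static_assert(!is_library_formatter<lib_formatter<Color>>::value);  // inherited alias names lib_formatter<int>
static_assert(!is_library_formatter<lib_formatter<Point>>::value);  // no alias at all
```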

barcharcraz · 2022-01-28T00:13:59Z

stl/inc/format

@@ -2761,27 +2794,89 @@ consteval typename _ParseContext::iterator _Compile_time_parse_format_specs(_Par
    using _FormattedType = conditional_t<is_same_v<_FormattedTypeMapping, typename basic_format_arg<_Context>::handle>,
        _Ty, _FormattedTypeMapping>;
    formatter<_FormattedType, _CharT> _Formatter{};
-    return _Formatter.parse(_Pc);
+    auto _Iter = _Formatter.parse(_Pc);
+    if constexpr (_Derived_from_formatter_base<formatter<_FormattedType, _CharT>>) {


User-defined formatters that derive from built-in ones don't necessarily populate their format specs; they are not obligated to call their base parse method (though that would be a somewhat odd formatter).

If a user-defined formatter never calls parse, then _Specs will be in its default-constructed state, which is what this function returns for formatters that do not derive from _Formatter_base anyway. If the user derives from a standard formatter and calls parse, then _Specs can help with estimation. This is useful for users' enum formatters, which often derive from the basic_string_view formatter to have width and precision handled automatically.

What if they call parse but don't actually output the implied number of characters in format?

After realizing my mistake with the flow through _On_format_specs, I think things are OK even if they lie.
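The fallback behaviour discussed in this thread can be modelled with a few hypothetical types; `Specs`, `FormatterBase`, its width-only toy parser and `parsed_specs` are assumptions for illustration, not `<format>` internals. Specs are only read back from formatters derived from the base, and a derived formatter that never forwards to the base parser leaves them default-constructed, which is the same state assumed for unrelated formatters.

```cpp
#include <string_view>
#include <type_traits>

// Hypothetical stand-in for parsed format specs; defaults mean "not specified".
struct Specs {
    int width = 0;
    int precision = -1;
};

// Hypothetical standard base formatter whose parse() records what it saw.
struct FormatterBase {
    Specs specs{};
    // Toy parser: reads an optional leading decimal width such as "10".
    constexpr void parse(std::string_view spec) {
        int w = 0;
        for (char c : spec) {
            if (c < '0' || c > '9') {
                break;
            }
            w = w * 10 + (c - '0');
        }
        specs.width = w;
    }
};

// User formatter that forwards to the base parser (e.g. an enum formatter).
struct ForwardingFormatter : FormatterBase {
    constexpr void parse(std::string_view spec) { FormatterBase::parse(spec); }
};

// User formatter that derives from the base but never calls its parse().
struct IgnoringFormatter : FormatterBase {
    constexpr void parse(std::string_view) {}
};

// User formatter unrelated to the base.
struct UnrelatedFormatter {
    constexpr void parse(std::string_view) {}
};

// Run the formatter's parse at compile time and read back the specs it left behind.
template <class F>
constexpr Specs parsed_specs(std::string_view spec) {
    F f{};
    f.parse(spec);
    if constexpr (std::is_base_of_v<FormatterBase, F>) {
        return f.specs;  // still default-constructed if parse() never reached the base
    } else {
        return Specs{};  // nothing to inspect
    }
}

static_assert(parsed_specs<ForwardingFormatter>("10").width == 10);
static_assert(parsed_specs<IgnoringFormatter>("10").width == 0);
static_assert(parsed_specs<UnrelatedFormatter>("10").width == 0);
```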

barcharcraz · 2022-01-28T00:26:43Z

stl/inc/format

+                        _Is_estimation_exact = false;
+                    } else if (_Specs._Dynamic_precision_index >= 0) {
+                        // if precision is dynamic we can't really predict so let's estimate it to 32
+                        _Estimated_size += 32;


This seems open to tweaking against benchmarks. It could be reasonable to calculate the width in this case too.

I also think so, but see #1803 (comment).

barcharcraz · 2022-01-28T00:27:48Z

stl/inc/format

+                } else {
+                    // if the length of the string is known we will add it to estimation
+                    ++_Arg_use_count[_Id];
+                    if (_Specs._Precision >= 0 || _Specs._Width > 0 || _Specs._Dynamic_precision_index >= 0


We can lift this up to apply to all strings, right? That would simplify the branches above.

If we decide to always strlen for null-terminated strings then yes.

barcharcraz · 2022-01-28T00:31:40Z

stl/inc/format

    constexpr const _CharT* _On_format_specs(const size_t _Id, const _CharT* _First, const _CharT*) {
        _Parse_context.advance_to(_Parse_context.begin() + (_First - _Parse_context.begin()._Unwrapped()));
        if (_Id < _Num_args) {
-            auto _Iter = _Parse_funcs[_Id](_Parse_context); // TRANSITION, VSO-1451773 (workaround: named variable)
+            auto [_Iter, _Specs] = _Parse_funcs[_Id](_Parse_context);


So if a user-defined formatter does not derive from _Formatter_base, or does but never calls _Formatter_base::parse then _Specs is default-initialized and width is zero.

Thus the code below will lie about the estimation being exact.

For non-standard string types _Is_estimation_exact is always false. Users can't specialize formatter for standard string types.

Yeah, I think you're right and I misread the code; the fact that the dynamic indices are initialized to -1 makes the code set things to -1.

Nope, I'm an idiot: the values don't actually matter, as it's never a built-in string type.

barcharcraz · 2022-01-28T00:46:54Z

stl/inc/format

+                return _Arg_use_count[_Id] * _Arg.size();
+            } else if constexpr (_Is_nullterminated_string<_ArgTy>) {
+                // don't bother with calculating the length if we don't need to
+                return _Arg_use_count[_Id] > 0 ? _Arg_use_count[_Id] * char_traits<_CharT>::length(_Arg) : 0;


This might be slower than just doing the formatting and counting, but maybe it's still worth it.
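A simplified sketch of the per-argument term in the diff, for `char` only and with a hypothetical function name (`string_arg_size`): sized strings cost `use_count * size()`, a null-terminated string pays for the `strlen`-style scan only when the format string actually references it, and other types contribute nothing here because they are estimated elsewhere.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>

// Contribution of one string-like argument to the capacity estimate, given how many
// replacement fields reference it. Non-string types return 0 (handled elsewhere).
template <class Arg>
constexpr std::size_t string_arg_size(const Arg& arg, std::size_t use_count) {
    using T = std::decay_t<Arg>;  // string literals decay to char*
    if constexpr (std::is_same_v<T, std::string_view> || std::is_same_v<T, std::string>) {
        return use_count * arg.size();  // length is already known
    } else if constexpr (std::is_same_v<T, const char*> || std::is_same_v<T, char*>) {
        // Skip the O(n) length scan for arguments the format string never uses.
        return use_count > 0 ? use_count * std::char_traits<char>::length(arg) : 0;
    } else {
        return 0;
    }
}
```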

barcharcraz · 2022-01-28T00:47:30Z

stl/inc/format

+                // don't bother with calculating the length if we don't need to
+                return _Arg_use_count[_Id] > 0 ? _Arg_use_count[_Id] * char_traits<_CharT>::length(_Arg) : 0;
+            } else {
+                return 0;


Why are we estimating zero for non-strings?

For non-strings estimation has been done before:

STL/stl/inc/format, lines 2876 to 2877 in 3d74470:

// for all other arguments use the largest of precision, width and 8
_Estimated_size += (_STD max)((_STD max)(_Specs._Precision, _Specs._Width), 8);

OK, it's still somewhat confusing to split the estimation up like this, but I can see why it's done.
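The quoted rule for non-string arguments, written out as a small constexpr function. The defaults (width 0 and precision -1 meaning "not specified") are assumptions based on this discussion, and `non_string_estimate`, `kNoWidth` and `kNoPrecision` are hypothetical names.

```cpp
#include <algorithm>

// Assumed defaults for "not specified".
constexpr int kNoWidth = 0;
constexpr int kNoPrecision = -1;

// The largest of precision, width and 8 (parenthesized max, as in the STL source).
constexpr int non_string_estimate(int precision = kNoPrecision, int width = kNoWidth) {
    return (std::max)((std::max)(precision, width), 8);
}

static_assert(non_string_estimate() == 8);                   // plain "{}"
static_assert(non_string_estimate(kNoPrecision, 20) == 20);  // width 20
static_assert(non_string_estimate(12, 4) == 12);             // width 4, precision 12
```

This is only a starting capacity: formatting 1234567890 with `{}` produces 10 characters, 2 more than the default guess of 8, so the string may still have to grow.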

barcharcraz · 2022-01-28T00:49:50Z

stl/inc/format

@@ -3074,22 +3209,74 @@ _NODISCARD wstring vformat(const locale& _Loc, const wstring_view _Fmt, const wf

 template <class... _Types>
 _NODISCARD string format(const _Fmt_string<_Types...> _Fmt, _Types&&... _Args) {
-    return _STD vformat(_Fmt._Str, _STD make_format_args(_Args...));
+    const size_t _Estimated_size = _Fmt._Estimate_required_capacity(_Args...);


This would be better with a helper _Vformat that takes the estimated capacity.

Once that's done, maybe this stuff should be done through _Fmt_iterator_buffer.

I can't understand what you mean. How exactly should this be restructured?

miscco · 2022-02-02T09:10:04Z

Thanks, that is much better

barcharcraz · 2022-02-03T01:12:35Z

We talked about this PR in our weekly meeting, and I think the next steps are to write a small benchmark and test out a simpler version with only the fallback path. We consider this too complex to merge without a benchmark showing very clear improvement.

AdamBucior · 2022-02-04T09:05:12Z

With this simple benchmark:

#include <string>
#include <format>
#include <chrono>
#include <iostream>
#include <cstdlib> // for system()

using namespace std;
using namespace chrono;

constexpr size_t Iterations = 10000000;

int main() {
    steady_clock::time_point start, end;
    start = steady_clock::now();

    for (size_t i = 0; i < Iterations; ++i) {
        (void) format("{} {} {} {} {} {}", "let's", "concatenate", "a", "bunch", "of", "strings");
    }

    end = steady_clock::now();

    cout << end - start << endl;

    start = steady_clock::now();

    for (size_t i = 0; i < Iterations; ++i) {
        (void) format("{0} {0} {0}", "repeat this sentence three times");
    }

    end = steady_clock::now();

    cout << end - start << endl;

    system("pause");
}

I got these results (ran 4 times each):

With PR (ns):

Run        Concatenate loop    Repeat loop
1          4473119900          5105820100
2          4532696100          5093983400
3          4463521400          5120882300
4          4421421200          5108412600
Average    4472689650          5107274600

Without PR (ns):

Run        Concatenate loop    Repeat loop
1          4750032000          5193818700
2          4790922400          5249655700
3          5018952300          5281830700
4          4815681900          5280669900
Average    4843897150          5251493750

Without the PR, the averages take ~8% (concatenate) and ~3% (repeat) more time than with the PR.

So with simple strings there is some improvement. Obviously the result will differ depending on arguments used (of which there is a wide variety of possibilities). I don't think there is really a case in which this PR would perform worse.

barcharcraz · 2022-02-08T04:53:33Z

What do you get running that with clang-cl instead? (to do just the fallback path)

AdamBucior · 2022-02-08T09:41:15Z

With clang-cl:

With PR (ns):

Run        Concatenate loop    Repeat loop
1          3826073500          4233170700
2          3630934800          4309987800
3          3822146900          4210166300
4          3612824000          4434587900
Average    3722994800          4296978175

Without PR (ns):

Run        Concatenate loop    Repeat loop
1          3760814900          4333771500
2          3692610400          4379408800
3          3719316200          4294400100
4          3744752400          4294775400
Average    3729373475          4325588950

Without the PR, the averages take ~0.2% (concatenate) and ~0.7% (repeat) more time than with the PR (no real difference).

Unsurprisingly, there's no real difference: the fallback is basically the same thing we are already doing.
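The averages and relative differences can be recomputed from the raw runs posted above. This is a plain arithmetic check; `Runs`, `mean` and `percent_slower_without_pr` are illustrative names only.

```cpp
#include <array>
#include <cstdint>

using Runs = std::array<std::int64_t, 4>;  // four timed runs, in nanoseconds

// Timings copied from the MSVC and clang-cl benchmark comments.
constexpr Runs msvc_concat_with{4473119900, 4532696100, 4463521400, 4421421200};
constexpr Runs msvc_concat_without{4750032000, 4790922400, 5018952300, 4815681900};
constexpr Runs msvc_repeat_with{5105820100, 5093983400, 5120882300, 5108412600};
constexpr Runs msvc_repeat_without{5193818700, 5249655700, 5281830700, 5280669900};
constexpr Runs clang_concat_with{3826073500, 3630934800, 3822146900, 3612824000};
constexpr Runs clang_concat_without{3760814900, 3692610400, 3719316200, 3744752400};
constexpr Runs clang_repeat_with{4233170700, 4309987800, 4210166300, 4434587900};
constexpr Runs clang_repeat_without{4333771500, 4379408800, 4294400100, 4294775400};

// Arithmetic mean; exact here because the sums fit well within a double's 53-bit mantissa.
constexpr double mean(const Runs& runs) {
    double sum = 0.0;
    for (std::int64_t t : runs) {
        sum += static_cast<double>(t);
    }
    return sum / static_cast<double>(runs.size());
}

// Extra time taken without the PR, as a percentage of the time with the PR.
constexpr double percent_slower_without_pr(const Runs& with_pr, const Runs& without_pr) {
    return (mean(without_pr) / mean(with_pr) - 1.0) * 100.0;
}

static_assert(mean(msvc_concat_with) == 4472689650.0);
```

This works out to about 8.3% (concatenate) and 2.8% (repeat) for the MSVC runs, and about 0.17% and 0.67% for the clang-cl runs.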

StephanTLavavej · 2022-02-09T22:25:20Z

@barcharcraz says that there should be no inherent ABI impact to these changes - at most, we may need to rename an _Ugly internal type to _Ugly2 (as we've done repeatedly in shared_ptr's control blocks), which prevents any mix-and-match problems. Charlie would also prefer not to backport this to VS 2019 16.11.x, so we should ensure that before-and-after code can be mixed harmlessly, even if these changes get into VS 2022 17.2 Preview 3 (before the 17.2 General Availability ABI lockdown).

Charlie plans to investigate this with more benchmarking.

barcharcraz · 2022-04-01T00:22:11Z

I benchmarked this using gbench (it did around 3,000,000 iterations for each test) and found an extremely small perf difference, even with the full caching (I verified I was using the correct version by adding a define in format). It seems like this PR does make things a few nanoseconds faster, but not always.

In general, doing three benchmark iterations on a benchmark that is so fast won't give reliable results, and this is true even if the CPU is literally executing nothing else (i.e. the benchmark is running in kernel mode, all alone). Here are my results:

Has cache: 0
2022-03-31T17:16:03-07:00
Running C:\Users\chbarto\source\repos\scratch\format_caching_bench\build\bench.exe
Run on (12 X 3192 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 256 KiB (x6)
  L3 Unified 12288 KiB (x1)
-------------------------------------------------------
Benchmark             Time             CPU   Iterations
-------------------------------------------------------
bench_concat        225 ns          225 ns      2986667
bench_insert        349 ns          353 ns      2036364

Has cache: 1
2022-03-31T17:16:50-07:00
Running C:\Users\chbarto\source\repos\scratch\format_caching_bench\build\bench.exe
Run on (12 X 3192 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 256 KiB (x6)
  L3 Unified 12288 KiB (x1)
-------------------------------------------------------
Benchmark             Time             CPU   Iterations
-------------------------------------------------------
bench_concat        227 ns          225 ns      2986667
bench_insert        328 ns          330 ns      2133333

code:

#include <benchmark/benchmark.h>
#include <stdio.h>
#include <format>
#ifndef _STL_FORMAT_HAS_CACHE
#define _STL_FORMAT_HAS_CACHE 0
#endif
using benchmark::State;
static void bench_concat(State &s)
{
    for (auto _ : s)
    {
        (void)std::format("{} {} {} {} {} {}", "let's", "concatenate", "a", "bunch", "of", "strings");
    }
}

static void bench_insert(State &s)
{
    for (auto _ : s)
    {
        (void)std::format("{0} {0} {0}", "repeat this sentence three times");
    }
}
BENCHMARK(bench_concat);
BENCHMARK(bench_insert);
int main(int argc, char **argv)
{
    printf("Has cache: %d\n", _STL_FORMAT_HAS_CACHE);
    ::benchmark::Initialize(&argc, argv);
    if (::benchmark::ReportUnrecognizedArguments(argc, argv))
        return 1;
    ::benchmark::RunSpecifiedBenchmarks();
    ::benchmark::Shutdown();
    return 0;
}

barcharcraz · 2022-04-01T00:27:42Z

And here are the results with just the fallback caching:

Has cache: 2
2022-03-31T17:26:47-07:00
Running C:\Users\chbarto\source\repos\scratch\format_caching_bench\build\bench.exe
Run on (12 X 3192 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 256 KiB (x6)
  L3 Unified 12288 KiB (x1)
-------------------------------------------------------
Benchmark             Time             CPU   Iterations
-------------------------------------------------------
bench_concat        243 ns          246 ns      2800000
bench_insert        328 ns          329 ns      1947826

barcharcraz · 2022-04-01T00:30:07Z

oh, you did run this 10 million times, my bad

StephanTLavavej · 2022-04-06T21:13:25Z

Thanks @barcharcraz for the additional benchmark results.

@AdamBucior, thanks for investigating this potential optimization. After looking at the perf numbers from your benchmark and Charlie's, we talked about this at the weekly maintainer meeting and have decided that the complexity of this optimization outweighs the minimal potential for performance improvement here.

Implement compile-time string size estimation

0b9e3e4

AdamBucior requested a review from a team as a code owner December 19, 2021 19:26

Workaround P1502R1_standard_library_header_units

c19f8d8

AlexGuteniev approved these changes Dec 19, 2021

stl/inc/format (resolved)

Simplify _Compile_time_parse_format_specs

d5a8c69

CaseyCarter added the performance Must go faster label Dec 29, 2021

StephanTLavavej added the format C++20/23 format label Jan 12, 2022

StephanTLavavej assigned barcharcraz and StephanTLavavej Jan 19, 2022

AdamBucior added 2 commits January 26, 2022 21:38

Merge branch 'main' into format-string-size-estimation

a8c6712

forward and remove TRANSITION

3d74470

AlexGuteniev reviewed Jan 27, 2022

tests/std/tests/P0645R10_text_formatting_formatting/test.cpp (outdated, resolved)

barcharcraz suggested changes Jan 28, 2022

StephanTLavavej reviewed Jan 28, 2022

AdamBucior added 3 commits January 28, 2022 11:36

Code review

a67c037

Merge branch 'main' into format-string-size-estimation

8cf1e49

Update P0645R10_text_formatting_formatting

462a5a7

miscco reviewed Feb 1, 2022

stl/inc/format (outdated, resolved)

stl/inc/format (outdated, resolved)

stl/inc/format (outdated, resolved)

stl/inc/format (outdated, resolved)

stl/inc/format (resolved)

Code review

8b62176

Merge remote-tracking branch 'upstream/main' into format-string-size-estimation

6f64abb

AlexGuteniev mentioned this pull request Feb 9, 2022

Q1 2022 priorities #2492

Closed

StephanTLavavej removed their assignment Feb 16, 2022

StephanTLavavej closed this Apr 6, 2022
