-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
<format>: Compile-time string size estimation #2437
<format>: Compile-time string size estimation #2437
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So aside from possible bugs with user defined formatters that don't call base::parse() this looks functionally correct. However, it's quite a lot of code. We are willing to have gnarly code in service of runtime performance, but benchmarks would be nice. STL points out that benchmarks would also indicate how much effort we should spend on this, for example doing exact estimations for usages of precision with numeric arguments.
I'm curious how much faster this is than just doing the clang fallback path 100% of the time and dropping the exact estimation, it introduces some more capacity checks from string::insert but it still could save allocations, which is probably >>> more expensive than the capacity checks. It might even be faster than no estimation to use formatted_size
sometimes and get an exact width estimation to reduce allocations (even though that means doing the formatting work twice).
Also tests for user defined formatters in general would be good. I think it's reasonable to just not support user defined formatters at all, using a mechanism like _From_primary
(used by allocator) to detect if derived classes were ours's or the user's.
@@ -2761,27 +2794,89 @@ consteval typename _ParseContext::iterator _Compile_time_parse_format_specs(_Par | |||
using _FormattedType = conditional_t<is_same_v<_FormattedTypeMapping, typename basic_format_arg<_Context>::handle>, | |||
_Ty, _FormattedTypeMapping>; | |||
formatter<_FormattedType, _CharT> _Formatter{}; | |||
return _Formatter.parse(_Pc); | |||
auto _Iter = _Formatter.parse(_Pc); | |||
if constexpr (_Derived_from_formatter_base<formatter<_FormattedType, _CharT>>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
User defined formatters that derive from built-in ones don't nessassarly actually populate their format specs, they are not obligated to call their base parse method (though, this is somewhat of an odd formatter)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If user defined formatter never calls parse
then _Specs
will be in default constructed state, which is what this function returns for formatters that do not derive from _Formatter_base
anyway. If the user derives from standard formatter and calls parse
then _Specs
can help in estimation. This is useful for users' enum formatters which often derive from basic_string_view
formatter to have width and precision handled automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if they call parse
but don't actually output the implied number of characters in format
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After realizing my mistake with the flow through _On_format_specs
I think things are OK even if they lie
stl/inc/format
Outdated
_Is_estimation_exact = false; | ||
} else if (_Specs._Dynamic_precision_index >= 0) { | ||
// if precision is dynamic we can't really predict so let's estimate it to 32 | ||
_Estimated_size += 32; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems open to tweaking against benchmarks. It could be reasonable to calculate the width in this case too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think so, but see #1803 (comment).
stl/inc/format
Outdated
} else { | ||
// if the length of the string is known we will add it to estimation | ||
++_Arg_use_count[_Id]; | ||
if (_Specs._Precision >= 0 || _Specs._Width > 0 || _Specs._Dynamic_precision_index >= 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can lift this up to apply to all strings right?, simplifying the branches above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we decide to always strlen
for null-terminated strings then yes.
stl/inc/format
Outdated
constexpr const _CharT* _On_format_specs(const size_t _Id, const _CharT* _First, const _CharT*) { | ||
_Parse_context.advance_to(_Parse_context.begin() + (_First - _Parse_context.begin()._Unwrapped())); | ||
if (_Id < _Num_args) { | ||
auto _Iter = _Parse_funcs[_Id](_Parse_context); // TRANSITION, VSO-1451773 (workaround: named variable) | ||
auto [_Iter, _Specs] = _Parse_funcs[_Id](_Parse_context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if a user-defined formatter does not derive from _Formatter_base
, or does but never calls _Formatter_base::parse
then _Specs
is default-initialized and width is zero.
Thus the code below will lie about the estimation being exact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For non-standard string types _Is_estimation_exact
is always false. Users can't specialize formatter
for standard string types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I think you're right and I misread the code, the fact that the dynamic indices are initialized to -1 makes the code set things to -1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope, I'm an idiot and the values don't actually matter, as it's never a built-in string type.
return _Arg_use_count[_Id] * _Arg.size(); | ||
} else if constexpr (_Is_nullterminated_string<_ArgTy>) { | ||
// don't bother with calculating the length if we don't need to | ||
return _Arg_use_count[_Id] > 0 ? _Arg_use_count[_Id] * char_traits<_CharT>::length(_Arg) : 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might be slower than just doing the formatting and counting. But maybe it's still worth it.
// don't bother with calculating the length if we don't need to | ||
return _Arg_use_count[_Id] > 0 ? _Arg_use_count[_Id] * char_traits<_CharT>::length(_Arg) : 0; | ||
} else { | ||
return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we estimating zero for non-strings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For non-strings estimation has been done before:
Lines 2876 to 2877 in 3d74470
// for all other arguments use the largest of precision, width and 8 | |
_Estimated_size += (_STD max)((_STD max)(_Specs._Precision, _Specs._Width), 8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, it's still somewhat confusing to split the estimation up like this, but I can see why it's done.
@@ -3074,22 +3209,74 @@ _NODISCARD wstring vformat(const locale& _Loc, const wstring_view _Fmt, const wf | |||
|
|||
template <class... _Types> | |||
_NODISCARD string format(const _Fmt_string<_Types...> _Fmt, _Types&&... _Args) { | |||
return _STD vformat(_Fmt._Str, _STD make_format_args(_Args...)); | |||
const size_t _Estimated_size = _Fmt._Estimate_required_capacity(_Args...); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be better with a helper _Vformat that takes the estimated capacity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once that's done maybe this stuff should be done through _Fmt_iterator_buffer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't understand what you mean. How exactly should this be restructured?
Thanks, that is much better |
We talked about this PR in our weekly meeting, and I think the next steps are to write a small benchmark and test out a simpler version with only the fallback path. We consider this too complex to merge without a benchmark showing very clear improvement. |
With this simple benchmark:
I got these results (ran for 4 times each):
So with simple strings there is some improvement. Obviously the result will differ depending on arguments used (of which there is a wide variety of possibilities). I don't think there is really a case in which this PR would perform worse. |
What do you get running that with clang-cl instead? (to do just the fallback path) |
With clang-cl:
Unsurprisingly no real difference. The fallback is basically the same thing that we are already doing. |
@barcharcraz says that there should be no inherent ABI impact to these changes - at most, we may need to rename an Charlie plans to investigate this with more benchmarking. |
I benchmarked this using gbench (it did around 3,000,000 iterations for each test) and found an extremely small perf difference even with the full caching (I verified I was using the correct version by adding a define in format). It seems like this pr does make things a new nanoseconds faster but not always. In general doing three benchmark iterations on a benchmark that is so fast won't give reliable results, and this is true even if the CPU is literally executing nothing else (i.e. the benchmark is running in kernel-mode, all alone). Here are my results
code: #include <benchmark/benchmark.h>
#include <stdio.h>
#include <format>
#ifndef _STL_FORMAT_HAS_CACHE
#define _STL_FORMAT_HAS_CACHE 0
#endif
using benchmark::State;
static void bench_concat(State &s)
{
for (auto _ : s)
{
(void)std::format("{} {} {} {} {} {}", "let's", "concatenate", "a", "bunch", "of", "strings");
}
}
static void bench_insert(State &s)
{
for (auto _ : s)
{
(void)std::format("{0} {0} {0}", "repeat this sentence three times");
}
}
BENCHMARK(bench_concat);
BENCHMARK(bench_insert);
int main(int argc, char **argv)
{
printf("Has cache: %d\n", _STL_FORMAT_HAS_CACHE);
::benchmark::Initialize(&argc, argv);
if (::benchmark::ReportUnrecognizedArguments(argc, argv))
return 1;
::benchmark::RunSpecifiedBenchmarks();
::benchmark::Shutdown();
return 0;
} |
and here's with just the fallback caching
|
oh, you did run this 10 million times, my bad |
Thanks @barcharcraz for the additional benchmark results. @AdamBucior, thanks for investigating this potential optimization. After looking at the perf numbers from your benchmark and Charlie's, we talked about this at the weekly maintainer meeting and have decided that the complexity of this optimization outweighs the minimal potential for performance improvement here. |
Improves string size estimation in
format
by using_Basic_format_string
's parsing machinery. It takes into account things like argument use count and the specified width and precision. Also allowsformatted_size
to completely skip the formatting part for simple cases like concatenation of a bunch of strings.This change might be ABI breaking so hopefully it can make it before
<format>
's ABI is freezed.