microbenchmarks for double<->string conversion, serialisation improvements #447

Crzyrndm · 2020-03-01T11:05:02Z

This is a continuation of some of the work I did for #421 / #422 a few months ago

Adds a microbenchmark project (enabled by passing -DBENCHMARKS=ON -DXLNT_MICROBENCH_ENABLED=ON to cmake). It uses google banchmark. Currently used to compare a variety of ways of (de)serialising numbers.

gbenchmark dependency aquired using cmake FetchContent module (requires cmake 3.11 as indicated by the script). find_package is nearly useless on windows, so it was the most hassle free method for a "developer only" tool I could find. I'm open to suggestions on acquisition method

Note: the std::<to/from>_chars and _comma benchmarks are currently ifdef'd behind _MSC_VER. This is because

to_chars/from_chars only exists in full in MSVC (to my knowledge)
the locale names are different between OS's and the _comma benchmarks use the german locale "de-DE" which uses the ',' as the decimal seperator to benchmark the alternate path in the strtod/snprintf implementations. Not entirely sure how to deal with this across operating systems (I believe Ubuntu requires installing a locale package and the names are different)

The added benchmarks prove that the changes made in #421 around deserialising numbers were justified

stringstream is slow
replacing the comma has minmal impact on perf (<10%)
conversion ended up 3-4x faster using strtod
std::from_chars is currently only a "relatively" minor improvement (30-50% further improvement in throughput up to ~5x stringstream) dwarfed by other factors during file load

Additionally, I added some benchmarks for the serialisation of numbers which was not touched during the previous work. These show a roughly 2x improvement in using snprintf instead of stringstream and then a further 5-6x improvement using std::to_chars (>10x stringstream). As such I implemented the snprintf version in xlsx_producer along with a few other tweaks based on some basic profiling. These tweaks reduced time to save a spreadsheet to ~85-90% of current master.

I'm not certain the doubling of throughput from #421 can be recreated in the serialisation routines. Majority of the improvements would have to come from changes to how the xml serialiser is used, but unlike with deserialisation, there doesn't appear to be major inefficiency with the current serialisation method

relating to previous work #422 Results are matching what was observed at the time ^^ was being worked on std::from_chars is included as the target to beat, but since only MSVC has it for floating point it's not hugely useful yet uniform real distribution is probably a horrible choice, and it might be good to randomise the number of sf in each string also (currently the y all end up at max length) Run on (4 X 3500 MHz CPU s) CPU Caches: L1 Data 32K (x4) L1 Instruction 32K (x4) L2 Unified 262K (x4) L3 Unified 6291K (x1) ---------------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations ---------------------------------------------------------------------------------------------------------- RandFloats/double_from_string_sstream 804 ns 820 ns 896000 RandFloats/double_from_string_strtod 163 ns 162 ns 5973333 RandFloats/double_from_string_strtod_fixed 175 ns 172 ns 5352107 RandFloats/double_from_string_strtod_fixed_const_ref 150 ns 152 ns 5352107 RandFloats/double_from_string_std_from_chars 87.1 ns 88.3 ns 9557333 RandFloatsComma/double_from_string_strtod_fixed_comma_ref 172 ns 173 ns 5146257 RandFloatsComma/double_from_string_strtod_fixed_comma_const_ref 180 ns 175 ns 5352107

* input randomiser was feeding a constant value previously, now actually randomising * start to_string with the current method (sstream), an faster more correct version (sstream_cached), snprintf, and std::to_chars ** NOTE: only std::to_chars and sstream_cached are correct in the face of locales Run on (4 X 3500 MHz CPU s) CPU Caches: L1 Data 32K (x4) L1 Instruction 32K (x4) L2 Unified 262K (x4) L3 Unified 6291K (x1) ------------------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations ------------------------------------------------------------------------------------------------------------- RandFloatStrs/double_from_string_sstream 1012 ns 1001 ns 640000 RandFloatStrs/double_from_string_strtod 276 ns 276 ns 2488889 RandFloatStrs/double_from_string_strtod_fixed 312 ns 308 ns 2133333 RandFloatStrs/double_from_string_strtod_fixed_const_ref 307 ns 300 ns 2240000 RandFloatStrs/double_from_string_std_from_chars 194 ns 188 ns 3733333 RandFloatCommaStrs/double_from_string_strtod_fixed_comma_ref 315 ns 314 ns 2240000 RandFloatCommaStrs/double_from_string_strtod_fixed_comma_const_ref 306 ns 305 ns 2357895 RandFloats/string_from_double_sstream 1372 ns 1381 ns 497778 RandFloats/string_from_double_sstream_cached 1136 ns 1123 ns 640000 RandFloats/string_from_double_snprintf 536 ns 516 ns 1000000 RandFloats/string_from_double_std_to_chars 116 ns 115 ns 6400000

Run on (4 X 3500 MHz CPU s) CPU Caches: L1 Data 32K (x4) L1 Instruction 32K (x4) L2 Unified 262K (x4) L3 Unified 6291K (x1) ------------------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations ------------------------------------------------------------------------------------------------------------- RandFloatStrs/double_from_string_sstream 969 ns 977 ns 640000 RandFloatStrs/double_from_string_strtod 274 ns 270 ns 2488889 RandFloatStrs/double_from_string_strtod_fixed 273 ns 273 ns 2635294 RandFloatStrs/double_from_string_strtod_fixed_const_ref 274 ns 276 ns 2488889 RandFloatStrs/double_from_string_std_from_chars 193 ns 193 ns 3733333 RandFloatCommaStrs/double_from_string_strtod_fixed_comma_ref 273 ns 267 ns 2635294 RandFloatCommaStrs/double_from_string_strtod_fixed_comma_const_ref 273 ns 273 ns 2635294 RandFloats/string_from_double_sstream 1323 ns 1311 ns 560000 RandFloats/string_from_double_sstream_cached 1074 ns 1074 ns 640000 RandFloats/string_from_double_snprintf 519 ns 516 ns 1000000 RandFloats/string_from_double_snprintf_fixed 517 ns 516 ns 1000000 RandFloats/string_from_double_std_to_chars 118 ns 117 ns 5600000 RandFloatsComma/string_from_double_snprintf_fixed_comma 520 ns 516 ns 1120000

serialisation ends up roughly 2x improvement going from sstream to snprintf Run on (4 X 3500 MHz CPU s) CPU Caches: L1 Data 32K (x4) L1 Instruction 32K (x4) L2 Unified 262K (x4) L3 Unified 6291K (x1) ------------------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations ------------------------------------------------------------------------------------------------------------- RandFloatStrs/double_from_string_sstream 968 ns 977 ns 640000 RandFloatStrs/double_from_string_strtod 272 ns 270 ns 2488889 RandFloatStrs/double_from_string_strtod_fixed 272 ns 270 ns 2488889 RandFloatStrs/double_from_string_strtod_fixed_const_ref 273 ns 270 ns 2488889 RandFloatStrs/double_from_string_std_from_chars 193 ns 195 ns 3446154 RandFloatCommaStrs/double_from_string_strtod_fixed_comma_ref 272 ns 273 ns 2635294 RandFloatCommaStrs/double_from_string_strtod_fixed_comma_const_ref 276 ns 273 ns 2635294 RandFloats/string_from_double_sstream 1311 ns 1318 ns 497778 RandFloats/string_from_double_sstream_cached 1076 ns 1050 ns 640000 RandFloats/string_from_double_snprintf 601 ns 600 ns 1120000 RandFloats/string_from_double_snprintf_fixed 600 ns 600 ns 1120000 RandFloats/string_from_double_std_to_chars 117 ns 117 ns 5600000 RandFloatsComma/string_from_double_snprintf_fixed_comma 600 ns 600 ns 1120000

… work user defined copy operators suppress compiler creation of move operations, and not having all of copy/move/dtor defined (rule of 0/5) is suspicious. Also happens to be very slightly slower

coveralls · 2020-03-01T11:15:09Z

Coverage decreased (-0.04%) to 83.572% when pulling f4d00ac on Crzyrndm:feature/benchmark into ae6f9d2 on tfussell:master.

Crzyrndm · 2020-03-02T05:56:55Z

Had a bit more of a play to see if I could identify any major slowdowns in the serialisation code (can be seen here).

Hard work would be the best summary. Ignoring correctness somewhat to make things a bit easier to write, the "very-large.xlsx" benchmark was down to ~5.4s (master ~7.0s, this PR ~6.3s) and there was still some low hanging items (lowest/highest column/row functions are definitely worth avoiding). It's hard work though because it's many relatively minor inefficiencies rather than one obvious issue (death by paper cut).

Going to leave this as is, it quickly gets quite invasive

tfussell · 2020-03-14T13:34:04Z

This is looking good. Thanks for all the effort. I will have to trace through the code a bit more before merging. My only real concern would be breaking changes from removing some of those constructors. I added them for convenience--not sure how many people actually use those.

Crzyrndm · 2020-03-14T19:32:37Z

They're not actually removed, just left to the compiler / defaulted (they're all copy / assignment operations that do nothing special so can let the compiler generate them automatically)

By letting the compiler generate them, you gain slightly in three aspects

Generation of move operations (ctor/assignment) are suppressed if copy operations are user defined. This is because any compiler defined operation is assumed broken (for back compat reasons, the reverse is not true, but is deprecated)
The compiler / optimiser is now free to implement copy operations as it sees fit
It's easy to guarantee correctness of code that doesn't exist 👍

see rule of 5/0: https://en.cppreference.com/w/cpp/language/rule_of_three

was rule of 3/0 before C++11 added move ctors. The summary is that you either want to define all or none of the special member functions

tl;dr
No functional change, just a consistency improvement

PS
This doesn't fully resolve localisation issues during serialisation. The previous state would write numberic values in the global locale so it is an improvement, but numeric attributes need to be updated as well. I figure that should come as a second PR to minimise scope creep.

tfussell · 2020-03-14T19:52:31Z

Gotcha. Thanks for the info. I've been writing mostly Go lately so my C++ is getting rusty. I'll try to merge this later today or tomorrow.

Crzyrndm added 12 commits February 29, 2020 22:11

minor cleanup

d135f35

add saving to the spreadsheet-load test, fix a bug in the serialiser

0915fde

fix git revert error

fbbf7ae

use the new faster serialisation everywhere in xlsx_producer

39f498f

remove declarations of copy/assignment operators that only do default…

932fc45

… work user defined copy operators suppress compiler creation of move operations, and not having all of copy/move/dtor defined (rule of 0/5) is suspicious. Also happens to be very slightly slower

remove a double lookup in the cell map during serialisation

c418c13

speedup worksheet::calculate dimension by only looping the cell map once

0ea0540

fix compiler error

e8cb8d9

Crzyrndm requested a review from tfussell March 1, 2020 11:05

resolve warnings

f4d00ac

tfussell merged commit 2f5934f into tfussell:master Mar 20, 2020

Crzyrndm deleted the feature/benchmark branch April 24, 2020 22:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

microbenchmarks for double<->string conversion, serialisation improvements #447

microbenchmarks for double<->string conversion, serialisation improvements #447

Crzyrndm commented Mar 1, 2020 •

edited

Loading

coveralls commented Mar 1, 2020 •

edited

Loading

Crzyrndm commented Mar 2, 2020 •

edited

Loading

tfussell commented Mar 14, 2020

Crzyrndm commented Mar 14, 2020 •

edited

Loading

tfussell commented Mar 14, 2020

microbenchmarks for double<->string conversion, serialisation improvements #447

microbenchmarks for double<->string conversion, serialisation improvements #447

Conversation

Crzyrndm commented Mar 1, 2020 • edited Loading

coveralls commented Mar 1, 2020 • edited Loading

Crzyrndm commented Mar 2, 2020 • edited Loading

tfussell commented Mar 14, 2020

Crzyrndm commented Mar 14, 2020 • edited Loading

tfussell commented Mar 14, 2020

Crzyrndm commented Mar 1, 2020 •

edited

Loading

coveralls commented Mar 1, 2020 •

edited

Loading

Crzyrndm commented Mar 2, 2020 •

edited

Loading

Crzyrndm commented Mar 14, 2020 •

edited

Loading