Add benchmarking using arbitrary fuzzing #465

Merged: 24 commits, Aug 20, 2023


@juntyr (Member) commented Jul 16, 2023

This is the start of my very roundabout way to get back to #444, where we really need a benchmark that captures something other than JSON-like-RON to ron::Value. I hope to upgrade our arbitrary fuzzer to use proper typing to generate an arbitrary data structure and its corresponding Serialize and Deserialize implementation. For a new PR, we would then first run the fuzzer, then extract the corpus for the arbitrary target, and then benchmark serialising and deserialising based on these examples. Ideally, the current main branch would also be pulled in again and run on these benchmarks as well to provide an automatic comparison.
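The corpus-driven benchmarking step described above could look roughly like the following. This is only a minimal sketch using the standard library, not the harness added in this PR: `bench_corpus` and its `work` callback are hypothetical names, and the real benchmark would pass a closure that deserialises and re-serialises each fuzzer input.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, Instant};

/// Runs `work` once on every regular file in `corpus_dir` and returns how
/// many inputs were processed and the total time spent inside `work`.
///
/// Hypothetical helper: what `work` does (e.g. a ser+de roundtrip) is left
/// to the caller and is an assumption of this sketch.
pub fn bench_corpus<F>(corpus_dir: &Path, mut work: F) -> io::Result<(usize, Duration)>
where
    F: FnMut(&[u8]),
{
    let mut count = 0;
    let mut total = Duration::ZERO;

    for entry in fs::read_dir(corpus_dir)? {
        let path = entry?.path();
        if !path.is_file() {
            continue; // skip sub-directories
        }
        // Read the input before starting the timer so file I/O is not measured.
        let bytes = fs::read(&path)?;
        let start = Instant::now();
        work(&bytes);
        total += start.elapsed();
        count += 1;
    }

    Ok((count, total))
}
```

Running the same function over the same corpus on both the PR branch and the main branch would give the two totals to compare.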

This will probably take me several weekends to fully implement, but I hope it will finally give us the needed insights to land #444 with the best perf-maintainability tradeoff.

  • I've included my change in CHANGELOG.md

Add tests to document the following bugs found by fuzzing and now fixed:

  • struct, enum, and variant names are always validated
  • unit structs / variants called r can be parsed by ron::Value (which previously thought this was the start of a raw string)
  • strings containing '\\' are serialised as raw strings when escaping is turned off
  • a stack of nested Options which is serialised with #![enable(implicit_some)] and contains a None cannot be uniquely deserialised, since we have no idea which level the None came from. This case has to be tracked, so that explicit Somes can be inserted when a None is detected inside an unbroken stack of implicit Somes.
  • deserialising "A('/')" into ron::Value fails as the struct type searcher reads into the char and then finds a weird comment starter there
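To make the implicit_some item in the list above concrete, here is a toy model of the ambiguity. It is not ron's serialiser: both functions are hypothetical and only cover `Option<Option<u8>>`, assuming that implicit Somes are written as their bare contents.

```rust
/// Toy model of `#![enable(implicit_some)]` for two nested `Option`s:
/// every `Some(x)` is written as just `x`, while `None` stays `None`.
/// This is NOT ron's serialiser, only an illustration of the ambiguity.
fn write_implicit_some(value: Option<Option<u8>>) -> String {
    match value {
        None => "None".to_string(),
        Some(None) => "None".to_string(), // the outer `Some` is dropped
        Some(Some(n)) => n.to_string(),   // both `Some`s are dropped
    }
}

/// Same toy model with tracking: when a `None` sits below an implicit
/// `Some`, the `Some` above it is written out explicitly.
fn write_tracked(value: Option<Option<u8>>) -> String {
    match value {
        None => "None".to_string(),
        Some(None) => "Some(None)".to_string(),
        Some(Some(n)) => n.to_string(),
    }
}
```

In the first function `None` and `Some(None)` produce the same text, so a deserialiser cannot tell them apart. The second keeps the short form for fully populated stacks and only spells out the Some where it is needed.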

Problematic bugs which need to be documented, tested, and discussed further:

  • deserialising Some(...) inside deserialize_any with #![enable(unwrap_variant_newtypes)] cannot work as currently implemented, so it is now properly detected and reported with a new (very specific) error code. Unwrapping variant newtypes currently reaches through Options, and #413 ([v0.9] Breaking: Treat Some like any newtype variant) makes this more explicit by treating Some like a newtype variant. However, deserialize_any cannot support a newtype variant Some in all cases, since it special-cases Some(...) to look at the ... inside it. E.g. Some(a: 4) works great in typed mode and looks very nice, but cannot be supported here. Either we decide to make Some explicitly not a newtype variant (which is a breaking change, since the unwrapping has so far reached through Some, and which loses us the nice syntax), or we keep this very obscure error, which should not be encountered often. The former would definitely be safer. Another alternative is to use #451 (Add minimal support for internally tagged and untagged enums) to pre-parse the struct type in deserialize_any when unwrap_variant_newtypes is enabled and to handle tuples, structs, and unit structs with special cases.

Future work

@codecov-commenter commented Jul 16, 2023

Codecov Report

Patch coverage: 86.61% and project coverage change: +0.08% 🎉

Comparison is base (52f282d) 85.19% compared to head (93d06a7) 85.28%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #465      +/-   ##
==========================================
+ Coverage   85.19%   85.28%   +0.08%     
==========================================
  Files          66       72       +6     
  Lines        8513     8850     +337     
==========================================
+ Hits         7253     7548     +295     
- Misses       1260     1302      +42     
Files Changed Coverage Δ
tests/307_stack_overflow.rs 97.91% <ø> (ø)
src/value/mod.rs 47.25% <28.57%> (+0.82%) ⬆️
src/de/mod.rs 76.03% <62.22%> (+0.31%) ⬆️
src/parse.rs 79.18% <71.79%> (-2.92%) ⬇️
src/ser/mod.rs 71.57% <77.02%> (+0.67%) ⬆️
tests/250_variant_newtypes.rs 98.66% <89.28%> (-1.06%) ⬇️
tests/447_compact_maps_structs.rs 100.00% <100.00%> (ø)
tests/465_implicit_some_stack.rs 100.00% <100.00%> (ø)
tests/465_no_comment_char_value.rs 100.00% <100.00%> (ø)
tests/465_r_name_value.rs 100.00% <100.00%> (ø)
... and 5 more


@juntyr juntyr force-pushed the fuzzy-benchmark branch from 80f0156 to 7a541cf Compare July 17, 2023 12:30
@juntyr juntyr force-pushed the fuzzy-benchmark branch 3 times, most recently from 09f17fe to 4af5152 Compare August 17, 2023 18:21
@juntyr juntyr self-assigned this Aug 17, 2023
@juntyr juntyr marked this pull request as ready for review August 17, 2023 19:13
@juntyr juntyr requested a review from torkleyy August 17, 2023 19:14
@juntyr (Member, Author) commented Aug 17, 2023

r? @torkleyy @manunio This PR became quite big and contains several parts:

  • upgrade the arbitrary fuzzer to fuzz any serde data types and values (excluding anything requiring attributes)
  • fix any bugs discovered by the fuzzer so far
  • small code style improvements along the way
  • a benchmark suite which is executed on new PRs and runs across the fuzzer corpus

If you have some time, I'd appreciate any feedback you can give on this PR - thanks in advance!

@juntyr (Member, Author) commented Aug 17, 2023

P.S. The benchmarking CI test is still expected to fail, since it cannot yet compare against the benchmark on the main branch: that benchmark is only added in this PR.

@torkleyy (Contributor) left a comment

:shipit: Looks good!

@juntyr juntyr merged commit dea68fe into ron-rs:master Aug 20, 2023
juntyr added a commit to juntyr/ron that referenced this pull request Aug 20, 2023
juntyr added a commit that referenced this pull request Aug 20, 2023
* First steps towards a lossless Value::Number

* Allow parsing +unsigned as unsigned int

* Add additional tests for number parsing

* Added CHANGELOG entry

* Improve coverage by running tests across all features

* Refactor number parsing for better readability

* Extend number tests to typed ser+de

* Adjust #465 tests to lossless Value::Number
@juntyr juntyr deleted the fuzzy-benchmark branch August 20, 2023 22:23