Immer persist #278

Merged
merged 103 commits into from
Sep 2, 2024

alex-sparus (Collaborator) commented Feb 14, 2024

Structural sharing is traditionally preserved in memory only. When multiple values are serialized to disk, they are linearized (for example, by writing sequences out as JSON arrays), losing whatever sharing may exist between them. The same happens when transforming data structures, as with std::transform or a functional map().
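To make the transformation case concrete, here is a minimal sketch. It uses a hypothetical toy persistent list (`node`, `cons`, `naive_map` and `sharing_map` are illustrative names, not part of immer) where two lists share a tail. A naive map() rebuilds each list independently, so the shared tail is transformed twice and the outputs no longer share it. A memoized map that remembers its result per input node keeps the sharing. This only illustrates the idea; it does not show how immer::persist is implemented or what its API looks like.

```cpp
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>

// Toy persistent list (hypothetical): immutable nodes shared via shared_ptr.
struct node
{
    int value;
    std::shared_ptr<const node> next;
};
using list = std::shared_ptr<const node>;

list cons(int value, list next)
{
    return std::make_shared<node>(node{value, std::move(next)});
}

// Naive map(): rebuilds each list on its own, so inputs that share a tail
// produce outputs holding separate copies of the transformed tail.
list naive_map(const list& l, const std::function<int(int)>& f)
{
    if (!l)
        return nullptr;
    return cons(f(l->value), naive_map(l->next, f));
}

// Sharing-aware map: memoizes the output for every input node (keyed by its
// address), so a node reachable from several inputs is transformed once and
// the result is shared by all outputs. The inputs must outlive the memo.
struct sharing_map
{
    std::function<int(int)> f;
    std::unordered_map<const node*, list> memo;

    list operator()(const list& l)
    {
        if (!l)
            return nullptr;
        if (auto it = memo.find(l.get()); it != memo.end())
            return it->second;
        list result = cons(f(l->value), (*this)(l->next));
        memo.emplace(l.get(), result);
        return result;
    }
};
```

With inputs `a = [1, 2, 3]` and `b = [9, 2, 3]` sharing the `[2, 3]` tail, `naive_map` yields two unrelated `[4, 6]` tails, while `sharing_map` yields outputs whose tails are the same node.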

Here we introduce an experimental module, immer::persist, that enables serializing pools of containers while preserving structural sharing. It also enables in-memory transformations over large data sets that take structural sharing into account and preserve it. This is a novel technique, not present in any other implementation of persistent data structures. It has a few use cases inside BRONZE.
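The core idea of serializing a pool instead of linearizing each value can be sketched on a hypothetical toy persistent list. Every distinct node is written to the pool once and given an id, and each value is saved as just its root id; on load, every pool entry is rebuilt at most once, so values that referenced the same entry share the rebuilt node again. All names here (`save_pool`, `load_pool`, `pool_entry`, `add`, `load`) are illustrative assumptions. The real immer::persist works on the internal nodes of immer's containers, uses its own API, and also handles the on-disk format, none of which is shown here.

```cpp
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy persistent list (hypothetical): immutable nodes shared via shared_ptr.
struct node
{
    int value;
    std::shared_ptr<const node> next;
};
using list = std::shared_ptr<const node>;

list cons(int value, list next)
{
    return std::make_shared<node>(node{value, std::move(next)});
}

// One entry per distinct node; `next` is the pool index of the next node,
// or -1 for the end of the list.
struct pool_entry
{
    int value;
    long next;
};

// Hypothetical save side: assigns each node an id the first time it is seen.
struct save_pool
{
    std::vector<pool_entry> entries;
    std::unordered_map<const node*, long> ids; // node address -> pool index

    // Returns the root id of `l`. Nodes already in the pool are not written
    // again, so a tail shared by several lists is stored only once.
    long add(const list& l)
    {
        if (!l)
            return -1;
        if (auto it = ids.find(l.get()); it != ids.end())
            return it->second;
        long next_id = add(l->next);
        long id      = static_cast<long>(entries.size());
        entries.push_back({l->value, next_id});
        ids.emplace(l.get(), id);
        return id;
    }
};

// Hypothetical load side: rebuilds each pool entry at most once, so lists
// that referenced the same entry share the rebuilt node again.
struct load_pool
{
    std::vector<pool_entry> entries;
    std::vector<list> loaded; // memoized nodes, indexed by pool id

    list load(long id)
    {
        if (id < 0)
            return nullptr;
        if (loaded.size() != entries.size())
            loaded.resize(entries.size());
        auto i = static_cast<std::size_t>(id);
        if (!loaded[i]) {
            list next = load(entries[i].next);
            loaded[i] = cons(entries[i].value, std::move(next));
        }
        return loaded[i];
    }
};
```

For `a = [1, 2, 3]` and `b = [9, 2, 3]` sharing the `[2, 3]` tail, the pool holds 4 entries, whereas writing both lists out as arrays stores 6 values; after loading, both lists point at the same rebuilt tail node.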

alex-sparus force-pushed the immer-archive branch 2 times, most recently from c96f658 to 80fd4b3, on February 16, 2024 10:10

codecov-commenter commented Feb 16, 2024

Codecov Report

Attention: Patch coverage is 96.89922% with 4 lines in your changes missing coverage. Please review.

Project coverage is 90.54%. Comparing base (bd9f318) to head (646315e).
Report is 114 commits behind head on master.

Files with missing lines              Patch %   Lines
test/oss-fuzz/flex-vector-st-0.cpp    96.15%    4 Missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #278      +/-   ##
==========================================
+ Coverage   90.53%   90.54%   +0.01%     
==========================================
  Files         120      121       +1     
  Lines       12151    12220      +69     
==========================================
+ Hits        11001    11065      +64     
- Misses       1150     1155       +5     

alex-sparus force-pushed the immer-archive branch 2 times, most recently from 8eefeb9 to 48fa5ad, on March 19, 2024 10:28
alex-sparus force-pushed the immer-archive branch 3 times, most recently from 6ab55af to 14e49b2, on April 10, 2024 11:35
alex-sparus force-pushed the immer-archive branch 4 times, most recently from 8d43497 to 2254d22, on April 23, 2024 15:21
alex-sparus force-pushed the immer-archive branch 6 times, most recently from 3fc43af to 023531f, on May 24, 2024 13:52
alex-sparus force-pushed the immer-archive branch 2 times, most recently from 4ab992b to 44ffba1, on June 4, 2024 16:02
arximboldi changed the title from "Immer archive" to "Immer persist" on Jun 25, 2024
arximboldi (Owner) left a comment

This is looking very good; I think it is almost done. Do you think it is ready to drop the draft marker?

A couple of polishing remarks, mostly cosmetic, plus some questions. I'm sorry I'm not leaving inline comments, but the diff is so big that the GitHub UI goes crazy when I try to.

  1. I see you've also introduced immer/extra/cereal/... with some of the serialization code we use in so many other projects. I think this is good, but should we make it complete, include serializers for all immer types, and get rid of this in the Lager repo: https://github.com/arximboldi/lager/tree/master/lager/extra/cereal

  2. I suggest splitting the file immer/extra/persist/cereal/with_pools.hpp into immer/extra/persist/cereal/save.hpp and immer/extra/persist/cereal/load.hpp. These seem to be the main files the library is providing and it is not so clear what the file is about from looking at the current name.

  3. The internal interface of the pools mixes free functions and methods in a confusing way. For example, save is a method of output_pool, but add_to_pool is not. Maybe they should both be methods? Also, what about adding a bit of encapsulation, i.e. making the data, and the methods that are not meant to be used externally, private?

  4. It seems detail/common/pool.hpp is dead code. Can it be removed? Are there any other bits of code that I'm missing that can be removed?

  5. It seems some of the implementation files introduce a dependency on spdlog, mostly for debugging/trace messages. Can we get rid of it?

  6. The documentation should mention early on that this library introduces dependencies beyond immer itself: C++17 (or even C++20?), boost::hana (other parts of Boost as well?) and, for the most part, cereal.

  7. There are other changes I'd like to make to the docs, but I think I will do that once this is merged, as that is easier than making changes in your repo... Please remind me to give you write access to the immer repo, so that you can make future PRs directly from a branch in immer 🙂

  8. Speaking of the documentation: it doesn't seem to explain why, by default, the types need to be part of boost::hana. That could be an interesting thing to mention. Also, I personally think the example code would be clearer if BOOST_HANA_DEFINE_STRUCT wasn't used, and instead we used the alternative syntax where the struct is defined normally and then lifted into Hana separately. It requires a few more characters, but the code is clearer in my opinion, particularly for people not familiar with Hana, as one doesn't need to understand Hana's internals to know that the macro expands to mostly just a normal struct.

Overall, excellent work! I think this is a massive use case; it opens up so many possibilities, and it is a unique feature not provided by any other immutable data-structure library I know of. Probably something worth writing a paper about! Looking forward to seeing this get more use and evolve, so that we can eventually treat it as a first-class citizen of immer.

alex-sparus marked this pull request as ready for review on August 29, 2024 13:54
arximboldi merged commit d8d5e5d into arximboldi:master on Sep 2, 2024
20 of 22 checks passed
arximboldi (Owner) commented

Alright! Thank you for all this work!
