NOTE: The README and documentation are being worked on. Stay tuned.
Diffstore is a simple but flexible embeddable key-value store aimed at statistical data analysis and caching.
Two concepts are central to this library: entity and snapshot.
- An entity is a strongly typed key-value pair
- A snapshot is a copy of an entity at some point in time (the time can be specified explicitly or set automatically)
By default, whenever you change an entity, Diffstore saves a snapshot of its previous state. You can specify which fields should not be 'tracked' (kept in the entity but excluded from snapshots) and which should not be saved at all, neither in the entity nor in its snapshots (runtime-related data, for example). This makes the library useful for statistical applications.
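The snapshot behaviour described above can be illustrated with a small in-memory model. This is only a sketch in Python, not Diffstore's actual API: `ToyStore`, `Snapshot`, `untracked`, `ignored`, and the field names are all hypothetical, and the choice to skip a snapshot when only untracked fields change is an assumption made for the example.

```python
import copy
import time
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    taken_at: float  # when the snapshot was taken
    state: dict      # tracked fields as they were before the change


@dataclass
class ToyStore:
    """Hypothetical in-memory model of the idea; not Diffstore's real API."""
    untracked: set = field(default_factory=set)  # kept in the entity, left out of snapshots
    ignored: set = field(default_factory=set)    # never saved (runtime-only data)
    entities: dict = field(default_factory=dict)
    snapshots: dict = field(default_factory=dict)

    def _tracked(self, value):
        return {k: v for k, v in value.items() if k not in self.untracked}

    def save(self, key, value, at=None):
        # Drop fields that must not be persisted at all.
        stored = {k: copy.deepcopy(v) for k, v in value.items()
                  if k not in self.ignored}
        previous = self.entities.get(key)
        # Assumption: a snapshot is taken only when a tracked field changed.
        if previous is not None and self._tracked(previous) != self._tracked(stored):
            when = time.time() if at is None else at
            self.snapshots.setdefault(key, []).append(
                Snapshot(when, self._tracked(previous)))
        self.entities[key] = stored


# Made-up example data.
store = ToyStore(untracked={"last_checked"}, ignored={"session"})
store.save("item-1", {"name": "Example", "count": 10,
                      "last_checked": "mon", "session": object()})
store.save("item-1", {"name": "Example", "count": 12,
                      "last_checked": "tue", "session": object()}, at=1000.0)
history = store.snapshots["item-1"]
```

After the second `save`, `history` holds one snapshot of the previous tracked state (`name` and `count`), the entity itself still carries `last_checked`, and `session` appears in neither.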
- Simple to use
- Flexible and extensible
- Low resource usage (both memory and storage)
- No additional layers such as caching and connection management
If you need more features, check out the Diffstore DBMS project.
Currently supported entity formats for the file-based storage are:
- XML
- JSON (powered by Jil)
- Binary
Available snapshot managers:
- Single file per snapshot (uniform access time)
- Last-first binary files with configurable partitioning (low disk usage, faster access for newer data, read-oriented, GZIP-friendly)
It's possible to extend the engine with other options, such as a relational DB backend (MySQL, Postgres, etc.) or combined storage options (for example, to use Diffstore as an intermediate caching layer).
You can check out the test source, which covers the basics.
This storage engine was developed for, and is now used by, the SteamTrends service, which collects statistical data about Steam games every day and tracks this data for more than 19000 entries on a low-spec Linux-based VPS.
Using this library? Contact me and your project will be added here!
The benchmark was run 100 times, with a warm-up run before the measurements to ensure JIT compilation of the benchmarking code.
Machine specifications:
- HGST HTS721010A9E630 (7200 RPM, 32MB buffer, 4 ms average latency)
- i7 2630QM
- Debian 8 Jessie
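The measurement procedure (one warm-up run, then 100 timed runs) follows a common pattern that can be sketched as below. This is a generic Python illustration, not the project's benchmarking code: the `benchmark` helper and the measured operation are placeholders, and CPython has no JIT, so here the warm-up only stands in for the step that triggers JIT compilation on a JIT-based runtime.

```python
import statistics
import time


def benchmark(operation, runs=100, warmup=1):
    """Run `operation` untimed `warmup` times, then time it `runs` times."""
    for _ in range(warmup):
        operation()  # warm-up: result and duration are discarded
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        timings.append(time.perf_counter() - start)
    return timings


# Placeholder workload; a real benchmark would call the storage engine here.
timings = benchmark(lambda: sum(range(10_000)))
summary = {
    "runs": len(timings),
    "mean_s": statistics.mean(timings),
    "median_s": statistics.median(timings),
}
```

Reporting the median alongside the mean helps when a few runs are skewed by disk or scheduler noise.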