`squirrel-json`

🐿⚡

This is heavily based on the JSON deserializer used by Seq's storage engine. You might find this useful if you're building a document database that stores documents as minified JSON maps. The job of this code is to take a minified JSON object, like:

{"@t":"2020-03-12T17:08:37.6065924Z","@mt":"Redirecting to continue intent {Intent}","Elapsed":3456}

and produce a flat tape of offsets into that document, which can then be fed to a traditional JSON parser to extract individual values. It scans through the document using vectorized CPU instructions that find and classify the features of the document very efficiently. If only a fraction of the document is actually needed to satisfy a given query, then only that fraction pays the cost of full deserialization. This is how Seq supports performant queries over log data without attempting to fit it into column storage or requiring it to reside in RAM.
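To make the idea of a tape concrete, here is a minimal scalar sketch in Rust. It is not squirrel-json's API or algorithm: the `Entry`, `scan`, and `get` names are hypothetical, it makes a byte-at-a-time pass instead of a vectorized one, and it assumes pre-validated, minified input whose top level is a map. Each tape entry records the byte ranges of a key and its raw value, so a lookup only touches the slice it needs.

```rust
/// One entry on a hypothetical tape: byte ranges of a key and its raw value.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Entry {
    key: (usize, usize),   // start..end of the key, excluding its quotes
    value: (usize, usize), // start..end of the raw, undecoded value text
}

/// Scan a minified JSON map once, recording offsets for each top-level entry.
/// Assumes pre-validated input; returns None if the shape is unexpected.
fn scan(doc: &[u8]) -> Option<Vec<Entry>> {
    if doc.first() != Some(&b'{') || doc.last() != Some(&b'}') {
        return None;
    }
    let end = doc.len() - 1; // index of the closing `}`
    let mut tape = Vec::new();
    let mut i = 1;
    while i < end {
        if doc[i] != b'"' {
            return None;
        }
        let key_start = i + 1;
        let key_end = string_end(doc, key_start)?; // index of the closing quote
        i = key_end + 1;
        if doc.get(i) != Some(&b':') {
            return None;
        }
        i += 1;
        let value_start = i;
        i = value_end(doc, i, end)?;
        tape.push(Entry { key: (key_start, key_end), value: (value_start, i) });
        if i < end {
            if doc[i] != b',' {
                return None;
            }
            i += 1;
        }
    }
    Some(tape)
}

/// Find the closing quote of a string whose contents start at `i`, skipping escapes.
fn string_end(doc: &[u8], mut i: usize) -> Option<usize> {
    while i < doc.len() {
        match doc[i] {
            b'\\' => i += 2,
            b'"' => return Some(i),
            _ => i += 1,
        }
    }
    None
}

/// Find the (exclusive) end of a raw value starting at `i`, tracking nesting depth.
fn value_end(doc: &[u8], mut i: usize, end: usize) -> Option<usize> {
    let mut depth = 0usize;
    while i < end {
        match doc[i] {
            // Jump to the closing quote; the `i += 1` below steps past it.
            b'"' => i = string_end(doc, i + 1)?,
            b'{' | b'[' => depth += 1,
            b'}' | b']' => depth = depth.checked_sub(1)?,
            b',' if depth == 0 => return Some(i),
            _ => {}
        }
        i += 1;
    }
    if depth == 0 { Some(i) } else { None }
}

/// Look up a key by comparing raw key bytes; only the matching value is returned.
fn get<'a>(doc: &'a [u8], tape: &[Entry], key: &str) -> Option<&'a [u8]> {
    tape.iter()
        .find(|e| &doc[e.key.0..e.key.1] == key.as_bytes())
        .map(|e| &doc[e.value.0..e.value.1])
}
```

Handing the returned slice (for example `3456` for `Elapsed`) to a full JSON parser is where the real deserialization cost would be paid. Keys are compared as raw bytes, so in this sketch a key written with escape sequences would not match its unescaped form.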

squirrel-json takes inspiration from simd-json and is very fast. It's an interesting piece of software, but if you're looking for a state-of-the-art JSON deserializer it is neither as useful nor as interesting as simd-json. This library makes heavy trade-offs to perform very well for sparse deserialization of pre-validated JSON maps, at the expense of being unsuitable for just about anything else.

See this blog post for some more details!

Platform support

This library currently supports x86 using AVX2 intrinsics, and ARM using Neon intrinsics. Other platforms are supported using a slower (but still reasonably fast) fallback parser. Unfortunately we don't have a way to test ARM in CI here yet, so support is best-effort.
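Finding and classifying a document's features with vector instructions can be sketched for a single character class. The Rust below is an illustration, not squirrel-json's code: the function names are hypothetical, it only looks for `"` bytes in 32-byte chunks, and it omits the Neon path entirely. It uses `std::arch` AVX2 intrinsics when `is_x86_feature_detected!` reports support at runtime, and otherwise falls back to a scalar loop that produces the same bitmask.

```rust
/// Scalar fallback: set bit i when chunk[i] is a double quote.
fn quote_mask_fallback(chunk: &[u8; 32]) -> u32 {
    let mut mask = 0u32;
    for (i, &b) in chunk.iter().enumerate() {
        mask |= ((b == b'"') as u32) << i;
    }
    mask
}

/// AVX2 path: compare all 32 bytes at once, then pack the results into a bitmask.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn quote_mask_avx2(chunk: &[u8; 32]) -> u32 {
    use std::arch::x86_64::{
        __m256i, _mm256_cmpeq_epi8, _mm256_loadu_si256, _mm256_movemask_epi8, _mm256_set1_epi8,
    };
    let v = _mm256_loadu_si256(chunk.as_ptr() as *const __m256i);
    let quotes = _mm256_set1_epi8(b'"' as i8);
    // movemask takes the top bit of each byte lane, so bit i matches byte i.
    _mm256_movemask_epi8(_mm256_cmpeq_epi8(v, quotes)) as u32
}

/// Pick the fastest available implementation at runtime.
fn quote_mask(chunk: &[u8; 32]) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: we just checked that the running CPU supports AVX2.
            return unsafe { quote_mask_avx2(chunk) };
        }
    }
    quote_mask_fallback(chunk)
}
```

Keeping the fallback's bit layout identical to the vectorized path means the code consuming the masks doesn't need to know which implementation produced them.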

⚠️ CAREFUL

This library is designed for parsing pre-validated, minified JSON maps. It guarantees freedom from UB for any input (including input that is invalid UTF-8), but only guarantees meaningful results for valid JSON. See the test cases with an `invalid_` prefix to get an idea of what different kinds of input do.

This library contains a lot of unsafe code and is very performance sensitive. Any changes need to be carefully considered and should be:

- tested against the benchmarks to make sure we don't regress (at least not accidentally).
- fuzz tested to ensure no soundness holes are introduced.

We take advantage of properties of the JSON document to avoid bounds checks wherever possible and use tricks like converting enum variants into interior pointers. Hot paths try to avoid branching as much as possible.
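Scanners in the simd-json style commonly turn a comparison bitmask into offsets without a per-byte branch, by repeatedly taking the lowest set bit. The sketch below assumes that kind of mask (such as one produced by a 32-byte comparison); `for_each_set_bit` is a hypothetical helper, and whether squirrel-json uses exactly this idiom is an assumption.

```rust
/// Visit the offset of every set bit in a 32-bit mask, lowest first.
/// The loop runs once per set bit rather than once per byte of input.
fn for_each_set_bit(mut mask: u32, base: usize, mut f: impl FnMut(usize)) {
    while mask != 0 {
        let bit = mask.trailing_zeros() as usize; // position of the lowest set bit
        f(base + bit);
        mask &= mask - 1; // clear the lowest set bit
    }
}
```

With `base` set to the chunk's starting offset in the document, the callback receives absolute positions that could be pushed straight onto a tape.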

Any unchecked operations performed on the document are done using macros that use the checked variant in test/debug builds to make sure we don't ever cause UB when working through documents.
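A checked-in-debug macro of this kind might look like the following minimal sketch. The `get_unchecked!` macro and `last_byte` function are hypothetical, not squirrel-json's actual macros. With `debug_assertions` enabled (the default for test and debug builds) the macro performs a normal bounds-checked index, which panics on a bad offset; otherwise it calls `<[u8]>::get_unchecked`, so the caller's `unsafe` block still carries the obligation to stay in bounds.

```rust
/// Hypothetical macro: bounds-checked indexing when debug assertions are on,
/// unchecked indexing otherwise. Callers must wrap it in `unsafe` and
/// guarantee that `$idx < $slice.len()`.
macro_rules! get_unchecked {
    ($slice:expr, $idx:expr) => {
        if cfg!(debug_assertions) {
            // Checked: panics instead of reading out of bounds.
            $slice[$idx]
        } else {
            // Unchecked: UB if the caller's invariant doesn't hold.
            *$slice.get_unchecked($idx)
        }
    };
}

/// A minified JSON map ends in `}`, so once the document is known to be
/// non-empty its last byte can be read without a bounds check.
fn last_byte(doc: &[u8]) -> Option<u8> {
    if doc.is_empty() {
        return None;
    }
    // SAFETY: doc is non-empty, so doc.len() - 1 is in bounds.
    Some(unsafe { get_unchecked!(doc, doc.len() - 1) })
}
```

Because `cfg!(debug_assertions)` expands to a constant `true` or `false`, the unused branch is typically optimized away, so release builds don't pay for the check.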
