
WIP: add functions for converting Nats and Bytes with multiple word sizes #2278

Merged 10 commits into unisonweb:trunk on Aug 5, 2021

Conversation

@stew (Member) commented Jul 30, 2021

Overview

What does this change accomplish and why? i.e. How does it change the user experience?

Fixes: #2277

Implementation notes

I added encode and decode functions to Bytes.hs and call them as foreign functions.

Interesting/controversial decisions

I would love for someone who knows what they are actually doing to check my encode functions on Bytes, and to verify that what I'm doing with a mutable block is an efficient and correct way of creating Bytes.

Test coverage

TODO

@stew stew requested review from pchiusano and dolio July 30, 2021 00:43
@dolio (Contributor) left a comment

So, something occurred to me. Tagging @pchiusano too.

If encoding is single values, like you encode one Nat and then concatenate, that's going to lead to pretty fragmented Bytes values, isn't it? The chunks will be extremely small, or if they aren't, it will be a bunch of expensive (due to copying) ByteString concatenation.

It's probably better to have something like Haskell's binary where you 'write' to an intermediate type and then finalize to get a Bytes with reasonable chunk sizes.
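As an illustration of the intermediate-type approach (not code from this PR), here is a minimal sketch using Data.ByteString.Builder from the bytestring package: many small encodes accumulate in one Builder and are finalized once, so chunk sizes are governed by the Builder's internal buffer rather than one tiny chunk per value.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word64)

-- Accumulate many encodes in one Builder, then finalize once.
-- The resulting lazy ByteString's chunks are buffer-sized, not
-- one chunk per encoded value.
encodeMany :: [Word64] -> BL.ByteString
encodeMany = B.toLazyByteString . foldMap B.word64BE
```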

parser-typechecker/src/Unison/Runtime/Builtin.hs (outdated diff excerpt)
return (b, (Unison.Util.Bytes.drop 2 bs))

encodeNat64be :: Word64 -> Bytes
encodeNat64be n =
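The diff excerpt is truncated here. A hypothetical reconstruction of a manual big-endian Word64 encoder (over strict ByteString rather than Unison's Bytes, and not necessarily the PR's actual code) would look like:

```haskell
import qualified Data.ByteString as BS
import Data.Bits (shiftR)
import Data.Word (Word64)

-- Emit the eight bytes of a Word64 from most to least significant;
-- fromIntegral truncates to the low byte at each shift.
encodeNat64be :: Word64 -> BS.ByteString
encodeNat64be n =
  BS.pack [fromIntegral (n `shiftR` s) | s <- [56, 48 .. 0]]
```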
@dolio (Contributor) commented on the diff

I think instead of implementing this manually, it'd be better to use one of the binary encoding libraries we already depend on. E.g., we use the bytes package, which has a bunch of endian-specific functions:

https://hackage.haskell.org/package/bytes-0.17.1/docs/Data-Bytes-Put.html

And can spit out a ByteString afterward.
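A minimal sketch of that style of API. The bytes package generalizes this interface through its MonadPut class; the example below uses the closely related Data.Binary.Put from the binary package, purely to show the shape of library-provided endian-specific encoders:

```haskell
import Data.Binary.Put (putWord32le, putWord64be, runPut)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32, Word64)

-- Library-provided, endian-specific encoders: no manual
-- byte shuffling needed.
encodeNat64be :: Word64 -> BL.ByteString
encodeNat64be = runPut . putWord64be

encodeNat32le :: Word32 -> BL.ByteString
encodeNat32le = runPut . putWord32le
```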

@stew (Member, Author) commented

@dolio so I think we are doing much better now in this respect?

@dolio (Contributor) commented

Yeah, that looks better. I just default to preferring something in a more widely used library if we're already using it. That way there's hopefully less question whether some fiddly detail is mistaken (because more people would have noticed by now).

@pchiusano (Member) commented
> If encoding is single values, like you encode one Nat and then concatenate, that's going to lead to pretty fragmented Bytes values, isn't it? The chunks will be extremely small, or if they aren't, it will be a bunch of expensive (due to copying) ByteString concatenation.

So, the idea is that you can just call Bytes.flatten to convert a fragmented Bytes to a single-chunk one, and that function is quite efficient and does everything in one pass. You can even control the chunk size, like if you want 1024-sized chunks, you can take 1024, flatten, and concatenate with the remaining recursively chunked Bytes.

Basically, Bytes can function as a buffer (unlike ByteString). Maybe the constant factors aren't quite as good as a separate Builder type, but OTOH having a separate Builder type leads to both more plumbing code and also inefficiencies with converting back and forth at various API boundaries.
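To make the take/flatten/drop recipe concrete, here is a sketch using lazy ByteString as a stand-in for Unison's Bytes, with toStrict playing the role of Bytes.flatten. This is an illustration of the idea, not Unison's actual API:

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- Rebuild a possibly fragmented byte sequence into chunks of
-- exactly n bytes each (the last chunk may be shorter):
-- take n, flatten it, and recurse on the rest.
rechunk :: Int -> BL.ByteString -> [BS.ByteString]
rechunk n bs
  | BL.null bs = []
  | otherwise =
      let (h, t) = BL.splitAt (fromIntegral n) bs
       in BL.toStrict h : rechunk n t
```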

Also on that note, I wonder if going through Put/Get is overkill and possibly less efficient. Or maybe it gets optimized and is fine. But more problematic with those libraries is they work only with ByteString, which uses pinned memory (and this is never getting fixed it seems, due to ByteString also being used for FFI interchange), so to avoid needless copying we'd need to have Bytes also use ByteString for its chunk type. I think using pinned memory is the wrong choice and don't want to change Bytes to use that.

That said, @stew, the implementation of the decoders could be much more efficient by avoiding going through Maybe for each individual byte being pulled out. Will leave a note on how to improve that.
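As an illustration of the kind of improvement being suggested, a single up-front bounds check replaces threading Maybe through every byte. A sketch over strict ByteString as a stand-in for Unison's Bytes:

```haskell
import Data.Bits (shiftL, (.|.))
import qualified Data.ByteString as BS
import Data.Word (Word64)

-- One length check up front; then a strict fold accumulates the
-- eight big-endian bytes with no per-byte Maybe plumbing.
decodeNat64be :: BS.ByteString -> Maybe (Word64, BS.ByteString)
decodeNat64be bs
  | BS.length bs < 8 = Nothing
  | otherwise =
      let (h, t) = BS.splitAt 8 bs
          n = BS.foldl' (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0 h
       in Just (n, t)
```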

@dolio (Contributor) commented Jul 30, 2021

I see. I guess flattening the bytes afterwards works fine.

The stuff Stew wrote is also producing ByteString, though, so then that needs to be implemented using something else if you don't want to use ByteString.

@pchiusano (Member) commented
> The stuff Stew wrote is also producing ByteString, though, so then that needs to be implemented using something else if you don't want to use ByteString.

Ah yeah, good point.

@pchiusano (Member) commented
Great! Seems like this is pretty much there.

What's going on with the CI though?

@stew stew marked this pull request as ready for review August 4, 2021 16:51
@pchiusano (Member) left a comment
Great!

@pchiusano pchiusano merged commit a7e6374 into unisonweb:trunk Aug 5, 2021
@pchiusano pchiusano mentioned this pull request Aug 20, 2021
Linked issue: Binary encoding/decoding primitives (#2277)

3 participants