
WIP: add functions for converting Nats and Bytes with multiple word sizes #2278

Merged 10 commits into unisonweb:trunk on Aug 5, 2021

Conversation

@stew (Member) commented Jul 30, 2021

Overview

What does this change accomplish and why? i.e. How does it change the user experience?

Fixes: #2277

Implementation notes

I added encode and decode functions to Bytes.hs and call them as foreign functions.

Interesting/controversial decisions

I would love for someone who knows what they are actually doing to check my encode functions on Bytes, and to verify that what I'm doing with a mutable block is an efficient and correct way of creating Bytes.

Test coverage

TODO

@stew stew requested review from pchiusano and dolio July 30, 2021 00:43
@dolio (Contributor) left a comment

So, something occurred to me. Tagging @pchiusano too.

If encoding is single values, like you encode one Nat and then concatenate, that's going to lead to pretty fragmented Bytes values, isn't it? The chunks will be extremely small, or if they aren't, it will be a bunch of expensive (due to copying) ByteString concatenation.

It's probably better to have something like Haskell's binary where you 'write' to an intermediate type and then finalize to get a Bytes with reasonable chunk sizes.
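As an illustration of the intermediate-type approach (not code from this PR), here is a minimal sketch using Data.ByteString.Builder from the bytestring package: many small encodes accumulate in one Builder and are finalized once, so chunk sizes are governed by the Builder's internal buffer rather than one tiny chunk per value.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word64)

-- Accumulate many encodes in one Builder, then finalize once.
-- The resulting lazy ByteString's chunks are buffer-sized, not
-- one chunk per encoded value.
encodeMany :: [Word64] -> BL.ByteString
encodeMany = B.toLazyByteString . foldMap B.word64BE
```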

parser-typechecker/src/Unison/Runtime/Builtin.hs (outdated diff excerpt)
return (b, (Unison.Util.Bytes.drop 2 bs))

encodeNat64be :: Word64 -> Bytes
encodeNat64be n =
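The diff excerpt is truncated here. A hypothetical reconstruction of a manual big-endian Word64 encoder (over strict ByteString rather than Unison's Bytes, and not necessarily the PR's actual code) would look like:

```haskell
import qualified Data.ByteString as BS
import Data.Bits (shiftR)
import Data.Word (Word64)

-- Emit the eight bytes of a Word64 from most to least significant;
-- fromIntegral truncates to the low byte at each shift.
encodeNat64be :: Word64 -> BS.ByteString
encodeNat64be n =
  BS.pack [fromIntegral (n `shiftR` s) | s <- [56, 48 .. 0]]
```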
@dolio (Contributor) commented on the diff

I think instead of implementing this manually, it'd be better to use one of the binary encoding libraries we already depend on. E.g., we use the bytes package, which has a bunch of endian-specific functions:

https://hackage.haskell.org/package/bytes-0.17.1/docs/Data-Bytes-Put.html

And can spit out a ByteString afterward.
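A minimal sketch of that style of API. The bytes package generalizes this interface through its MonadPut class; the example below uses the closely related Data.Binary.Put from the binary package, purely to show the shape of library-provided endian-specific encoders:

```haskell
import Data.Binary.Put (putWord32le, putWord64be, runPut)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32, Word64)

-- Library-provided, endian-specific encoders: no manual
-- byte shuffling needed.
encodeNat64be :: Word64 -> BL.ByteString
encodeNat64be = runPut . putWord64be

encodeNat32le :: Word32 -> BL.ByteString
encodeNat32le = runPut . putWord32le
```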

@stew (Member, Author) commented

@dolio so I think we are doing much better now in this respect?

@dolio (Contributor) commented

Yeah, that looks better. I just default to preferring something in a more widely used library if we're already using it. That way there's hopefully less question whether some fiddly detail is mistaken (because more people would have noticed by now).

@pchiusano (Member) commented
> If encoding is single values, like you encode one Nat and then concatenate, that's going to lead to pretty fragmented Bytes values, isn't it? The chunks will be extremely small, or if they aren't, it will be a bunch of expensive (due to copying) ByteString concatenation.

So, the idea is that you can just call Bytes.flatten to convert a fragmented Bytes to a single-chunk one, and that function is quite efficient and does everything in one pass. You can even control the chunk size, like if you want 1024-sized chunks, you can take 1024, flatten, and concatenate with the remaining recursively chunked Bytes.

Basically, Bytes can function as a buffer (unlike ByteString). Maybe the constant factors aren't quite as good as a separate Builder type, but OTOH having a separate Builder type leads to both more plumbing code and also inefficiencies with converting back and forth at various API boundaries.
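To make the take/flatten/drop recipe concrete, here is a sketch using lazy ByteString as a stand-in for Unison's Bytes, with toStrict playing the role of Bytes.flatten. This is an illustration of the idea, not Unison's actual API:

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- Rebuild a possibly fragmented byte sequence into chunks of
-- exactly n bytes each (the last chunk may be shorter):
-- take n, flatten it, and recurse on the rest.
rechunk :: Int -> BL.ByteString -> [BS.ByteString]
rechunk n bs
  | BL.null bs = []
  | otherwise =
      let (h, t) = BL.splitAt (fromIntegral n) bs
       in BL.toStrict h : rechunk n t
```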

Also on that note, I wonder if going through Put/Get is overkill and possibly less efficient. Or maybe it gets optimized and is fine. But more problematic with those libraries is they work only with ByteString, which uses pinned memory (and this is never getting fixed it seems, due to ByteString also being used for FFI interchange), so to avoid needless copying we'd need to have Bytes also use ByteString for its chunk type. I think using pinned memory is the wrong choice and don't want to change Bytes to use that.

That said, @stew, the implementation of the decoders could be much more efficient by avoiding going through Maybe for each individual byte being pulled out. Will leave a note on how to improve that.
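As an illustration of the kind of improvement being suggested, a single up-front bounds check replaces threading Maybe through every byte. A sketch over strict ByteString as a stand-in for Unison's Bytes:

```haskell
import Data.Bits (shiftL, (.|.))
import qualified Data.ByteString as BS
import Data.Word (Word64)

-- One length check up front; then a strict fold accumulates the
-- eight big-endian bytes with no per-byte Maybe plumbing.
decodeNat64be :: BS.ByteString -> Maybe (Word64, BS.ByteString)
decodeNat64be bs
  | BS.length bs < 8 = Nothing
  | otherwise =
      let (h, t) = BS.splitAt 8 bs
          n = BS.foldl' (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0 h
       in Just (n, t)
```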

@dolio (Contributor) commented Jul 30, 2021

I see. I guess flattening the bytes afterwards works fine.

The stuff Stew wrote is also producing ByteString, though, so then that needs to be implemented using something else if you don't want to use ByteString.

@pchiusano (Member) commented
> The stuff Stew wrote is also producing ByteString, though, so then that needs to be implemented using something else if you don't want to use ByteString.

Ah yeah, good point.

@pchiusano (Member) commented
Great! Seems like this is pretty much there.

What's going on with the CI though?

@stew stew marked this pull request as ready for review August 4, 2021 16:51
@pchiusano (Member) left a comment
Great!

@pchiusano pchiusano merged commit a7e6374 into unisonweb:trunk Aug 5, 2021
@pchiusano pchiusano mentioned this pull request Aug 20, 2021
Linked issue: Binary encoding/decoding primitives (#2277)

3 participants