feat(marshal): encode capData in 1 level of JSON #1804
base: master
I would so much rather we properly split encoding from serialization for marshal, as discussed in #1478.
Also I would really prefer if we could find a way to partially parse JSON instead of relying on undocumented serialization constraints (body first, no spaces, etc.). I remember having a discussion with @gibson042 about what API we would need from JS to allow this.
assert(Array.isArray(slots));
const slotj = JSON.stringify(slots);
slotj.indexOf(':[') < 0 || Fail`expected simple slots`;
const body1 = body.replace(/^#/, '');
Why not check body[0] === '#' and do body.slice(1)? I think that's a lot more efficient.
This seems to assume that the argument is a CapData record whose body is a "#"-prefixed JSON serialization of SmallCaps-encoded data, which would need a lot more explanation than appears here (and a better name).
body.replace(/^#/, '') handles both smallCaps and qclass, no? (I haven't tested it, though).
Why is .slice(1) significantly more efficient?
> body.replace(/^#/, '') handles both smallCaps and qclass, no? (I haven't tested it, though).
I guess that depends upon what this function is expected to return. Regardless of the answer to that, though, CapData like { body: `{"@qclass":"bigint","digits":"0"}`, slots: [] } and { body: `#"+0"`, slots: [] } represent exactly the same data (0n) but would have distinct String('{"$body":{"@qclass":"bigint","digits":"0"},"slots":[]}') and String('{"$body":"+0","slots":[]}') return values (respectively) from the current implementation, which seems like a problem because there's no remaining signal differentiating smallcaps from the legacy encoding.
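To make the concern concrete, here is a minimal sketch of the flattening step, pieced together from the diff excerpts in this thread. The name `capDataToJSON` is hypothetical, plain `Error`s stand in for the `assert`/`Fail` helpers, and `JSON.parse` stands in for `assertJSON`; the actual PR code may differ. It shows that both encodings of `0n` flatten into output that carries no marker of which encoding produced it.

```javascript
// Hypothetical reconstruction of the flattening step under review.
function capDataToJSON({ body, slots }) {
  if (!Array.isArray(slots)) throw new TypeError('slots must be an array');
  const slotj = JSON.stringify(slots);
  if (slotj.includes(':[')) throw new Error('expected simple slots');
  // Drops the smallcaps "#" marker when present; legacy bodies pass through.
  const body1 = body.replace(/^#/, '');
  JSON.parse(body1); // stand-in for assertJSON: throws if body1 is not JSON
  return `{"$body":${body1},"slots":${slotj}}`;
}

const legacy = { body: '{"@qclass":"bigint","digits":"0"}', slots: [] };
const smallcaps = { body: '#"+0"', slots: [] };

console.log(capDataToJSON(legacy));
// {"$body":{"@qclass":"bigint","digits":"0"},"slots":[]}
console.log(capDataToJSON(smallcaps));
// {"$body":"+0","slots":[]}
```

A decoder that sees only the flat string has to guess whether `$body` holds smallcaps or legacy data, which is the ambiguity described above.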
> Why is .slice(1) significantly more efficient?
The answer is implementation-specific, but basically comes down to being zero-copy.
$ esbench --eshost-option '-h V8,*XS*' \
'const unprefixed="a".repeat(1000), prefixed = "#" + unprefixed' '{
"unprefixed.replace": `result = unprefixed.replace(/^#/, "")`,
"unprefixed.slice": `result = unprefixed.startsWith("#") ? unprefixed.slice(1) : unprefixed`,
"prefixed.replace": `result = prefixed.replace(/^#/, "")`,
"prefixed.slice": `result = prefixed.startsWith("#") ? prefixed.slice(1) : prefixed`,
}'
#### Moddable XS
unprefixed.replace: 0.06 ops/ms
unprefixed.slice: 3.56 ops/ms
prefixed.replace: 0.07 ops/ms
prefixed.slice: 0.95 ops/ms
#### V8
unprefixed.replace: 29.41 ops/ms
unprefixed.slice: 100.00 ops/ms
prefixed.replace: 19.61 ops/ms
prefixed.slice: 62.50 ops/ms
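The `startsWith`/`slice` variant benchmarked above could be wrapped in a small helper. This is only a sketch; `stripHashPrefix` is a hypothetical name and does not come from the PR.

```javascript
// Hypothetical helper: remove a leading "#" without invoking the regex engine.
function stripHashPrefix(body) {
  return body.startsWith('#') ? body.slice(1) : body;
}

console.log(stripHashPrefix('#"+0"')); // "+0"
console.log(stripHashPrefix('{"@qclass":"bigint","digits":"0"}')); // unchanged
```

For string inputs it returns the same result as `body.replace(/^#/, '')`, since both remove at most one leading `#`.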
...
> Why is .slice(1) significantly more efficient?
> The answer is implementation-specific, but basically comes down to being zero-copy.
I guess I trained my regex intuitions in Perl, where such things are optimized out the wazoo.
Thanks for the esbench details.
slotj.indexOf(':[') < 0 || Fail`expected simple slots`;
const body1 = body.replace(/^#/, '');
assertJSON(body1);
const json = `{"$body":${body1},"slots":${slotj}}`;
In #1478 (comment) I suggest body#
export const JSONToCapData = json => {
  assert.typeof(json, 'string');
  json.startsWith('{"$body":') || Fail`expected $body`;
I guess this only works when this body is first in the serialized JSON, not second?
This is just far too brittle for comfort, and doesn't feel like the right way to solve a "too much escaping" problem (assuming that is in fact what motivates it).
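A short illustration of the brittleness: the prefix check from the diff rejects documents that `JSON.parse` treats as equivalent. The helper name `hasBodyPrefix` and the sample strings are hypothetical.

```javascript
// The prefix test from the diff, wrapped in a hypothetical helper.
const hasBodyPrefix = json => json.startsWith('{"$body":');

const canonical = '{"$body":"+0","slots":[]}';
const spaced = '{ "$body": "+0", "slots": [] }';
const reordered = '{"slots":[],"$body":"+0"}';

// All three parse to the same value...
const parsed = [canonical, spaced, reordered].map(s => JSON.parse(s));
console.log(parsed.every(p => p.$body === '+0' && p.slots.length === 0));
// true

// ...but only the canonical serialization passes the prefix check.
console.log([canonical, spaced, reordered].map(hasBodyPrefix));
// [ true, false, false ]
```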
> I would so much rather we properly split encoding from serialization for marshal, as discussed in #1478.
I agree.
> Also I would really prefer if we could find a way to partially parse JSON instead of relying on undocumented serialization constraints (body first, no spaces, etc.). I remember having a discussion with @gibson042 about what API we would need from JS to allow this.
Yeah, but I don't know if we wrote it down (https://github.com/Agoric/agoric-private/issues/31#issuecomment-1494853056 is related but definitely distinct, as is Go-style hybrid decoding). At any rate, it's not difficult, although it would require going beyond the standard library.
refs: #1558, Agoric/agoric-sdk#7999
Description
encode capData to 1 level of JSON, much like #1558, but using lastIndexOf rather than a regex
motivation: senders pay by the byte etc.
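One way to read the `lastIndexOf` approach, sketched under assumptions: if the serialized slots contain no `:[` (the "simple slots" check in the diff), then the last `,"slots":[` in the string should mark where the body ends. The function name `splitFlatCapData`, the constants, and the error messages are hypothetical, and restoring the `#` prefix assumes a smallcaps body. This is not the PR's actual code.

```javascript
const BODY_PREFIX = '{"$body":';
const SLOTS_MARK = ',"slots":';

// Hypothetical decoder for the flat format; assumes body-first, no whitespace.
function splitFlatCapData(json) {
  if (typeof json !== 'string') throw new TypeError('expected a string');
  if (!json.startsWith(BODY_PREFIX)) throw new Error('expected $body');
  if (!json.endsWith(']}')) throw new Error('expected trailing slots array');
  // Simple slots contain no ":[", so the last `,"slots":[` starts the slots.
  const mark = json.lastIndexOf(`${SLOTS_MARK}[`);
  if (mark < BODY_PREFIX.length) throw new Error('expected slots');
  const bodyJSON = json.slice(BODY_PREFIX.length, mark);
  const slots = JSON.parse(json.slice(mark + SLOTS_MARK.length, -1));
  // Assumption: the body was smallcaps, so put the "#" marker back.
  return { body: `#${bodyJSON}`, slots };
}

console.log(splitFlatCapData('{"$body":["$0.Alleged: thing"],"slots":["board0123"]}'));
```

The body is carried as a string slice and never parsed here, so a body that itself contains `,"slots":[` still splits correctly as long as the slots stay simple.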
Security Considerations
careful review for confusion vulnerability is in order
Scaling Considerations
double-backslashes cost storage space
Documentation Considerations
This flatter format is easier to read, and so easier to document in some senses, though there's a mixing of levels that's somewhat subtle.
Testing Considerations
This has unit tests for specific examples plus fast-check tests. Whether I stated the property exactly right is worth careful review.
Upgrade Considerations
This is a DRAFT, pending:
cc @erights @gibson042