Change ordered binary encoding to fix bad nesting complexity. #1349
At #1260 (comment) @warner wrote: That's a lot of iterating strings, one character at a time, repeated for every depth of the input object graph, and double that for records. (The alternative might be a length-prefix on each component string, which also avoids the exponential escape-the-escape-character penalty. Even if the caller (and the other layers of protocol they might be using) manages to avoid calling
At #1260 (comment) @gibson042 wrote: I believe this could be fixed rather conveniently by moving the escaping logic from array encoding/decoding to string encoding/decoding and refactoring array decoding to handle depth. For example, assuming the array sigil is updated from "[" to "~" for control-character adjacency:
Sample encodings (in which U+007F is represented by "␡"):
At #1260 (comment) @gibson042 elaborated: And we can also reduce the size of JSON representations by abandoning characters between U+0000 and U+001F (inclusive), which require "\u…" escape sequences except for those with special single-character escapes like "\t". This would look something like changing the array element terminator from U+0000 to "&" (updating string escapes to include "&" → "%1" and "%" → "%0") and, because there is a requirement that the terminator sorts before anything else, the error sigil from "!" to "*" (assuming we don't care about the relative sort order of types w.r.t. each other).
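For illustration, here is a minimal Python sketch of just the string-level escaping described above ("%" → "%0", "&" → "%1"). The function names `escape_string` and `unescape_string` are hypothetical, and this is not the project's implementation; it only shows that the escaped body never contains a raw "&", so "&" stays available as a terminator.

```python
# Hypothetical sketch of the proposed string escapes; names are illustrative only.
ESCAPES = {"%": "%0", "&": "%1"}
UNESCAPES = {"0": "%", "1": "&"}


def escape_string(text: str) -> str:
    """Replace "%" with "%0" and "&" with "%1"; leave other characters alone."""
    return "".join(ESCAPES.get(ch, ch) for ch in text)


def unescape_string(encoded: str) -> str:
    """Invert escape_string, rejecting malformed input."""
    out = []
    chars = iter(encoded)
    for ch in chars:
        if ch == "&":
            # A raw "&" can only be a terminator, never part of a string body.
            raise ValueError("unescaped terminator inside string body")
        if ch == "%":
            code = next(chars, None)
            if code not in UNESCAPES:
                raise ValueError("invalid escape sequence")
            out.append(UNESCAPES[code])
        else:
            out.append(ch)
    return "".join(out)


sample = "50% & more"
encoded = escape_string(sample)
assert "&" not in encoded
assert unescape_string(encoded) == sample
```

Because each string is escaped once, where it is encoded, its cost is at most twice its length regardless of how deeply it is nested, provided the enclosing arrays do not re-escape their contents.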
@gibson042 How close is this to completion? Should the title read “compact-ordered” now?
At #1260 (comment) @warner explains that the current ordered binary encoding has a bad complexity measure for nested structures, due to the doubling of bracketing escapes with each additional level.
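To make the growth concrete, here is a toy Python model of an array encoding that re-escapes each element's whole encoding at every level. The sigils, the U+0000 terminator, the U+0001 escape character, and the names `encode_toy` and `nest` are assumptions for this sketch, not the actual implementation.

```python
# Toy model only: assumes each array escapes its elements' full encodings.
TERMINATOR = "\x00"  # assumed element terminator
ESCAPE = "\x01"      # assumed escape character


def encode_toy(value) -> str:
    """Encode a str or a (possibly nested) list of such values."""
    if isinstance(value, str):
        return "s" + value
    parts = ["["]
    for element in value:
        # Every terminator or escape character in the element's encoding
        # gets one more escape prefix at this level.
        for ch in encode_toy(element):
            if ch in (TERMINATOR, ESCAPE):
                parts.append(ESCAPE)
            parts.append(ch)
        parts.append(TERMINATOR)
    return "".join(parts)


def nest(depth: int) -> list:
    """Wrap an empty list in `depth` single-element lists."""
    value: list = []
    for _ in range(depth):
        value = [value]
    return value


for depth in range(6):
    print(depth, len(encode_toy(nest(depth))))
```

In this model the innermost terminator is escaped again at each enclosing level, so the encoded length of `nest(depth)` works out to `depth + 2**depth`: the number of escaped special characters roughly doubles with each additional level.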
At #1260 (comment) @gibson042 suggests a much better binary encoding without this bad complexity problem. Rather than do this as part of #1260, I'm filing this bug so we can remember to do it later.
Either way, for any such encoding change we will need to manage the transition, much like we managed the transition from capdata to smallcaps.