-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Binary syntax #174
Comments
Am new to this kind of stuff. Can you provide:
|
In order to stay neutral, the syntax extensions needn't necessarily be limited to only binary, but also Strings. Essentially what I'm looking for is some kind of structured assignment where the left hand and right hand can incorporate identifiers and metadata. Put in another way, this syntax can be a sort of quasiquoting where the preprocessing function can return some number of identifiers to be initialized or modified within the current context, in addition to operating on identifiers in the current context. Here is a proposal for my ideal look of the new syntax, which may be modified to become more generically applicable beyond regexp-style destructuring for strings and erlang-style parsing and packing for binary:
And of course, String is not blessed with a corresponding packing, because we already have that in the form of string interpolation. |
How are those examples supposed to desugar? I can sort of see the second one being: ref$ = /([^a-z]*)([a-z]+)(.*)/.exec(pfxSfxData)
prefix = ref$[1]
middle = ref$[2]
suffix = ref$[3] But there's a lot of room for interpretation. |
@qqueue, that's about right; notice, though, that "prefix" and "suffix" are just character sequences that match the data, where "middle" is an identifier for a capturing group. I'd recommend use of XRegExp to implement this functionality, but the desugar is possible with plain RegExp as well. Here's how the desugar might look: var pfxSfxData = 'prefixsomestuffinthemiddlesuffix';
var ref$ = XRegExp.exec(pfxSfxData, XRegExp('prefix(?<middle>[a-z]+)suffix'));
var middle = ref$.middle; All of the top level capturing groups are declared as necessary in the current context and assigned. For supporting Coco's existing differentiation between declaration and assignment v. just assignment (i.e.
|
The binary example uses length-based parsing and packing, so might look something like: // <<timestamp: 4, len: 2, payload>> = apnsData
var ref$ = new bitjs.io.BitStream(apnsData, true, 0, apnsData.length);
var timestamp = ref$.readBits(4);
var len = ref$.readBits(2);
var payload = ref$.readBits(apnsData.length * 8 - 6); The previous uses the bitjs library I mentioned in the first post; unfortunately the library doesn't have functionality for writing a bitstream, but I'm sure you get the idea of how it would be done. |
We don't want to pull in an entire library just to enable some sugar. However, the regex proposal is essentially just named capturing groups, right? Thus, it could desugar like this, without needing a library:
Doesn't seem too unreasonable. Concerns:
The binary packing/unpacking seems a bit harder to do simply though. Looking at bitjs, there's a lot of code there. Also, you can get pretty close to your proposed syntax already with destructuring and a hypothetical library that unpacks to objects/arrays, e.g.:
Without either a simple compilation strategy or massive performance/readability gains, the binary syntax is a tougher sell. |
Yep, that regex desugar seems fine (minus the usual undiscovered problems, of course ;). Sort of a double desugar too, which is fun. I think checking for a match could be either checking that one of the named variables is not Regarding the binary stuff, I agree, it's a bit more difficult, not only to do generally speaking, but specifically to do cross-platform. What might be reasonable would be a more generic syntax for this sort of stuff. Maybe what I've really been thinking about is Coco macros. |
Here's a more complete desugar proposal for "regex destructuring":
var middle, ref$;
(ref$ = /prefix([a-z]+)suffix/.exec(prefixSfxData), ref$ !== null && middle = ref$[0], ref$); Like the other destructuring syntaxes, the result of You make a good point that this syntax mirrors the other destructuring assignments better than
In other words, Re: the binary syntax, the general case would indeed be difficult, since node has buffers and typed arrays. Any sort of syntax-level binary packing/unpacking in coco probably needs a matching, standardized, widely-used binary literal in vanilla JS before being feasible. We'll have to see whether binary literals in JS or macros in coco happen first :) Meanwhile, maybe you could try sweetjs. |
Definitely, this proposal and the proposal for I think a binary syntax is quite feasible using runtime platform detection. For performance reasons, node users will prefer buffers to typed arrays, but including two function bodies instead of one is not a major issue. |
Feasible, yes, but it also needs to be fast, and since you're trying to unpack binary bits, I imagine speed is of the essence. If a var width, height, ref$
ref$ = data
width = ref$ & 3
height = ref$ >>> 2 & 3 or nobody will want to use it. Runtime detection of array types certainly won't help much there. However, the above snippet shows that if you can assume to be operating on 32 bits or less (as a Number), compilation without helper functions is feasible. You lose some flexibility with non-aligned unpacking, but it's certainly faster and easier to implement. |
If we have foreknowledge of the values used in the specifier lengths, e.g. your I think the real issue here is that we want to support both typed arrays and Buffers, any thoughts on this? |
The only binary unpacking I've ever done with JS are the packed bytes of the GIF file format, which are byte-aligned, fixed-width fields. Do you come across variable width and/or non-aligned bit-level unpacking that often? After more research, I found out that both typed arrays and buffers support regular array-like indexing |
Agreed; that sounds like a good solution. Can we always count on typed arrays/buffers having a 32-bit word size, or is it system dependent? |
Node Buffers are indexed by bytes, and ArrayBuffers are indexed by whatever ArrayBufferView is used. What I was kind of imagining for this syntax is you'd use "vanilla" JS to get to the 32 or fewer bits you want to unpack, and then use new syntax to sugar over the needed bit-shifts, e.g:
These would compile to bit-shifts and masking, assuming 32 bit, big endian. After looking back at the Erlang bit syntax documentation, I can see the appeal of having larger bit-unpacking than just 32 bits though, e.g. IP packets. That would then have to make assumptions about array field bit-length, endianness, and so on, which is more difficult to compile efficiently. If we can assume big endian (js Number default), a 8 bits, you can still compile to bit operations, but it's less flexible/efficient. One solution is to specify the sub-element length in bits somewhere in the syntax:
With a default of 8 to make Buffers "just work". Pretty ugly looking, though. Another concern is how to efficiently make arrays out of the I've personally only ever dealt with ArrayBuffers in browser JS myself. Do you know if there are any moves to standardize Node/browser binary formats to just one API, even as just a proposal in ES6/ESnext? Dealing with just ArrayBuffers or just Buffers would make this more feasible. |
About the future, http://blog.nodejs.org/2012/06/25/node-v0-8-0/. That said, buffers are pretty ingrained in the source of node as of now, so we'll see if this transition makes it sooner rather than later. It's possible that this feature could be implemented in such a way as to support Buffers and typed arrays as described thus far, and then the Buffer code could be pruned later, or that the feature can be implemented just for typed arrays and become more useful as node transitions. |
Though I know it is likely to involve considerable effort, one of my chief displeasures of late with using Javascript v. Erlang has been Javascript's lack of reasonable binary parsing/packing functionality. Coco has been a great pleasure to use in numerous aspects, and it would be wonderful to see it forging ahead of what seems to be most of the language space in this regard.
Check out the Erlang bitstring docs here for an example: http://www.erlang.org/doc/programming_examples/bit_syntax.html
Though the particular sugar used may not be suitable for Coco's grammar, I suspect a reasonable substitute could be found. Additionally, there exist some Javascript libraries for working with binary around, e.g. https://github.com/substack/node-binary and http://code.google.com/p/bitjs/, which might form the basis of the builtins needed to back the syntax. The former is node-specific, the latter not. My supposition is that the use of Buffers and BufferLists in node would be desirable for performance, and several implementation could be included inline with the necessary platform detection and preference to determine which to use. Clearly a single implementation that works across all platforms is also better than none, and may be acceptable performance-wise.
The text was updated successfully, but these errors were encountered: