Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow writing the char representation when creating enums #448

Open
lithp opened this issue Jun 5, 2018 · 7 comments
Open

Allow writing the char representation when creating enums #448

lithp opened this issue Jun 5, 2018 · 7 comments

Comments

@lithp
Copy link

lithp commented Jun 5, 2018

The postgres protocol (usually) begins every message with a byte which determines what the rest of the message will be. This is a great use-case for a enum. What I'd like to be able to do is this:

enums:
  messages:
    '"R"': AuthenticationRequest
    '"K"': BackendKeyData
    '"S"': ParameterStatus
    '"B"': Bind
    '"2"': BindComplete
    '"3"': CloseComplete
    '"C"': CommandComplete' 
    '"d"': CopyData
    '"c"': CopyDone
    '"f"': CopyFail
    '"G"': CopyInResponse
    '"H"': CopyOutResponse
    '"W"': CopyBothResponse  # used for streaming replication
    '"D"': DataRow  # from backend

However, this fails:

/enums/messages: unable to parse `"W"` as int

As a workaround I'm just using a type:

  message:
    seq:
      - id: msg_type
        type: str
        encoding: ASCII
        size: 1
      - id: len
        type: u4
      - id: body
        size: len - 4  # the length includes itself
        type:
          switch-on: msg_type
          cases:
            '"R"': authentication_request
            '"K"': backend_key_data

But it'd be nice to not need to.

@KOLANICH
Copy link

KOLANICH commented Jun 5, 2018

See #434 for my proposal, which overlaps this one.

@GreyCat
Copy link
Member

GreyCat commented Jun 6, 2018

#434 proposes much more radical changes. This one is doable staying in current paradigm: enums stay as lists of integer-based constants, the only thing changes is that we could introduce an extra way to represent integers by doing string-to-integer conversion (akin to 0o which does octal-to-integer conversion or 0x which does hex-to-integer). This, in turn, if it stays constant, can be used for static stuff like enums keys as well.

I want to emphasize that these are not the strings. No integer-to-string conversion (with encoding and stuff) and a string comparison should take place here.

Typically, languages like C/C++ and even Java allow syntax like 'x' to represent single bytes. Also we should probably have a way to specify multi-byte integers this way. If we'll stick to existing practice, this usually means that we're bound to:

  1. Define the string
  2. Convert it to byte array
  3. Convert it to integer (for example, unsigned 4-byte long little-endian)

I.e. although even this does not work right now, it should be something like "RIFF".to_bytes.to_u4le. Probably we could benefit from some more syntactic sugar here. Any ideas?

@lithp
Copy link
Author

lithp commented Jun 7, 2018

Also we should probably have a way to specify multi-byte integers this way.

I just look a quick look at the YAML spec, it already allows putting escape sequences into double-quoted strings, you can write literals such as: "\x0d\x0ahello\x0", I'm not very good with Unicode but I think for any of these literals there's only one obvious byte array it could become.

So, I think my proposal is to allow someone to write this:

enums:
  messages:
    "R": AuthenticationRequest
    10: BackendKeyData
    0x05: ParameterStatus
    "\x06": Bind

Where Kaitai attempts to convert each key into a byte array and throws an error if all keys do not result in byte arrays with the same width.

@lithp
Copy link
Author

lithp commented Jun 7, 2018

Where Kaitai attempts to convert each key into a byte array and throws an error if all keys do not result in byte arrays with the same width.

This rule gets a bit tricky if you allow literals such as 10, so maybe it only throws an error if there's no size of byte array which all of the keys can fit into? So, this would be invalid:

enums:
  messages:
    "R": AuthenticationRequest
    300: BackendKeyData

Because the first key says "this enum is for a byte array with width 1", while the second key requires at least a 2-byte enum.

@webbnh
Copy link

webbnh commented Jun 13, 2018

Is there a reason why having a consistent size matters? (So long as all of the values can be coerced to an integer type, I would think that "shorter" ones could simply be padded or promoted.)

I think all that really matters is whether the keys can be represented as an enum-type (which is an integer of some limited size (like 32 bits)). So, single character strings can be rendered as their ASCII code value or their Unicode code-point; integers work as they are; and, anything else has to be a string containing a number in a convertible format (with an optional radix specifier, etc.).

@webbnh
Copy link

webbnh commented Jun 13, 2018

I suppose you could generalize things to accept non-numeric strings of an appropriately short length (shorter ones would be padded with nulls) which could be converted to a numeric enum value by converting each character to its ASCII code (etc.) and then shifting and summing them.

@GreyCat
Copy link
Member

GreyCat commented Jun 14, 2018

@webbnh There are two severe problems that I see here:

  1. Converting strings => integers is non-trivial. Shall "ABC" be converted into 0x41424300, or 0x43424100, or 0x00414243, or 0x00434241? Brining up non-ASCII is even weirder, as "ƛ" might be 0x019B (Unicode codepoint) or 0xC69B or 0x9BC6 (UTF-8 in different endiannesses).

  2. Added yet another "special case" handling exactly for enums would be very confusing. It's probably better to add support for that in KS expression language and allow KS expression language in enum keys.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants