Allow writing the char representation when creating enums #448

lithp · 2018-06-05T14:19:58Z

The postgres protocol (usually) begins every message with a byte which determines what the rest of the message will be. This is a great use-case for a enum. What I'd like to be able to do is this:

enums:
  messages:
    '"R"': AuthenticationRequest
    '"K"': BackendKeyData
    '"S"': ParameterStatus
    '"B"': Bind
    '"2"': BindComplete
    '"3"': CloseComplete
    '"C"': CommandComplete' 
    '"d"': CopyData
    '"c"': CopyDone
    '"f"': CopyFail
    '"G"': CopyInResponse
    '"H"': CopyOutResponse
    '"W"': CopyBothResponse  # used for streaming replication
    '"D"': DataRow  # from backend

However, this fails:

/enums/messages: unable to parse `"W"` as int

As a workaround I'm just using a type:

  message:
    seq:
      - id: msg_type
        type: str
        encoding: ASCII
        size: 1
      - id: len
        type: u4
      - id: body
        size: len - 4  # the length includes itself
        type:
          switch-on: msg_type
          cases:
            '"R"': authentication_request
            '"K"': backend_key_data

But it'd be nice to not need to.

The text was updated successfully, but these errors were encountered:

KOLANICH · 2018-06-05T20:53:56Z

See #434 for my proposal, which overlaps this one.

GreyCat · 2018-06-06T20:32:39Z

#434 proposes much more radical changes. This one is doable staying in current paradigm: enums stay as lists of integer-based constants, the only thing changes is that we could introduce an extra way to represent integers by doing string-to-integer conversion (akin to 0o which does octal-to-integer conversion or 0x which does hex-to-integer). This, in turn, if it stays constant, can be used for static stuff like enums keys as well.

I want to emphasize that these are not the strings. No integer-to-string conversion (with encoding and stuff) and a string comparison should take place here.

Typically, languages like C/C++ and even Java allow syntax like 'x' to represent single bytes. Also we should probably have a way to specify multi-byte integers this way. If we'll stick to existing practice, this usually means that we're bound to:

Define the string
Convert it to byte array
Convert it to integer (for example, unsigned 4-byte long little-endian)

I.e. although even this does not work right now, it should be something like "RIFF".to_bytes.to_u4le. Probably we could benefit from some more syntactic sugar here. Any ideas?

lithp · 2018-06-07T09:17:41Z

Also we should probably have a way to specify multi-byte integers this way.

I just look a quick look at the YAML spec, it already allows putting escape sequences into double-quoted strings, you can write literals such as: "\x0d\x0ahello\x0", I'm not very good with Unicode but I think for any of these literals there's only one obvious byte array it could become.

So, I think my proposal is to allow someone to write this:

enums:
  messages:
    "R": AuthenticationRequest
    10: BackendKeyData
    0x05: ParameterStatus
    "\x06": Bind

Where Kaitai attempts to convert each key into a byte array and throws an error if all keys do not result in byte arrays with the same width.

lithp · 2018-06-07T09:27:39Z

Where Kaitai attempts to convert each key into a byte array and throws an error if all keys do not result in byte arrays with the same width.

This rule gets a bit tricky if you allow literals such as 10, so maybe it only throws an error if there's no size of byte array which all of the keys can fit into? So, this would be invalid:

enums:
  messages:
    "R": AuthenticationRequest
    300: BackendKeyData

Because the first key says "this enum is for a byte array with width 1", while the second key requires at least a 2-byte enum.

webbnh · 2018-06-13T20:10:30Z

Is there a reason why having a consistent size matters? (So long as all of the values can be coerced to an integer type, I would think that "shorter" ones could simply be padded or promoted.)

I think all that really matters is whether the keys can be represented as an enum-type (which is an integer of some limited size (like 32 bits)). So, single character strings can be rendered as their ASCII code value or their Unicode code-point; integers work as they are; and, anything else has to be a string containing a number in a convertible format (with an optional radix specifier, etc.).

webbnh · 2018-06-13T20:13:26Z

I suppose you could generalize things to accept non-numeric strings of an appropriately short length (shorter ones would be padded with nulls) which could be converted to a numeric enum value by converting each character to its ASCII code (etc.) and then shifting and summing them.

GreyCat · 2018-06-14T10:30:54Z

@webbnh There are two severe problems that I see here:

Converting strings => integers is non-trivial. Shall "ABC" be converted into 0x41424300, or 0x43424100, or 0x00414243, or 0x00434241? Brining up non-ASCII is even weirder, as "ƛ" might be 0x019B (Unicode codepoint) or 0xC69B or 0x9BC6 (UTF-8 in different endiannesses).
Added yet another "special case" handling exactly for enums would be very confusing. It's probably better to add support for that in KS expression language and allow KS expression language in enum keys.

GreyCat added the enhancement label Jun 6, 2018

Mingun mentioned this issue Sep 21, 2024

Unconsistent requirements for enum keys when they represented as numbers and as strings #1132

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow writing the char representation when creating enums #448

Allow writing the char representation when creating enums #448

lithp commented Jun 5, 2018 •

edited by generalmimon

Loading

KOLANICH commented Jun 5, 2018 •

edited

Loading

GreyCat commented Jun 6, 2018

lithp commented Jun 7, 2018

lithp commented Jun 7, 2018

webbnh commented Jun 13, 2018

webbnh commented Jun 13, 2018

GreyCat commented Jun 14, 2018

Allow writing the char representation when creating enums #448

Allow writing the char representation when creating enums #448

Comments

lithp commented Jun 5, 2018 • edited by generalmimon Loading

KOLANICH commented Jun 5, 2018 • edited Loading

GreyCat commented Jun 6, 2018

lithp commented Jun 7, 2018

lithp commented Jun 7, 2018

webbnh commented Jun 13, 2018

webbnh commented Jun 13, 2018

GreyCat commented Jun 14, 2018

lithp commented Jun 5, 2018 •

edited by generalmimon

Loading

KOLANICH commented Jun 5, 2018 •

edited

Loading