Revisit serialization format decision #144
Another simple way of viewing this (and one of the main questions I feel we need to answer ASAP post-adoption) is what the main goal of qlog is:

1. a format optimized for implementations (write speed and file size), or
2. a format optimized for tooling and analysis.

(I think it was @nibanks who put it like this, but for the life of me I can't find the source.) If it's 1), we probably need to go full binary and optimize for write speed/size. 2) is, imo, what we have now, with a relatively verbose JSON setup. If we want to go for both at the same time (which I'm not sure is possible; it might lead to the "worst of both worlds"), we'd probably end up with something like CBOR.

In my personal experience, it's the tooling part that's difficult, and something most parties don't want to invest in themselves or don't have the necessary expertise for in their networking/protocol teams. Making tooling easier (e.g., even using things like […]
I think we can create translation libraries/tools easily enough between formats. The question then becomes: what do you want to do?

a) Optimize for tools: standardize around a single format (JSON most likely?) that tools can easily use. QUIC implementations then have the choice to either write directly to that format, or to use a custom format plus a custom post-processing tool to convert to the tool format.

b) Optimize for implementations: standardize around a binary format (and helper library(s)?) that QUIC implementations can efficiently write to. Tools can either read from that format, or perhaps someone could write a helper library to convert to some more easily consumed format.

While I like the idea of trying to optimize for implementations, I wonder if it's just going to open a bigger can of worms. A lot of folks will have strong opinions here, and likely already have some solution they use for other performant logging anyway. The more I think about it, I kind of lean towards a). Implementations can (and will) do what they want. If JSON is good enough for them (it seems it is for FB?), then they can use it. If they want something more/different, they can implement a translation layer/tool (which MsQuic partially has). And best of all, it makes the standardization process simpler.

One last thought: while web-based tooling is currently the reason the tools are slow, I do expect JSON parsing to be significantly slower than a binary format, especially at GB file sizes. It'd be interesting to see a perf comparison. The best way I can think of would be to add qlog file format support to WPA and update MsQuic's qlog translation layer, then grab a large trace from MsQuic (binary ETW) and convert it to qlog, open the binary version and the qlog version separately, and measure how long each takes.
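To make option a) a bit more concrete, here is a minimal sketch of such a post-processing converter. It's written in Python against an entirely invented binary record layout and event-code mapping (it does not reflect MsQuic's ETW format or any real implementation), and the top-level qlog JSON shape is only an approximation of the draft's, not normative:

```python
import json
import struct

# Hypothetical event-code mapping; a real implementation would define its own.
EVENT_NAMES = {1: "transport:packet_sent", 2: "transport:packet_received"}

def binary_to_qlog_events(path):
    """Read invented fixed-size records: 8-byte timestamp (microseconds),
    2-byte event code, 2-byte payload length, then a UTF-8 JSON payload."""
    events = []
    with open(path, "rb") as f:
        header = f.read(12)
        while len(header) == 12:
            ts_us, code, length = struct.unpack("<QHH", header)
            payload = json.loads(f.read(length)) if length else {}
            events.append({
                "time": ts_us / 1000.0,  # express time in milliseconds
                "name": EVENT_NAMES.get(code, "unknown"),
                "data": payload,
            })
            header = f.read(12)
    return events

def write_qlog(events, out_path):
    # The field names below loosely mirror the draft's JSON shape; treat them
    # as an illustration of a converter's output, not the exact schema.
    qlog = {"qlog_version": "0.3",
            "traces": [{"vantage_point": {"type": "server"}, "events": events}]}
    with open(out_path, "w") as f:
        json.dump(qlog, f)
```

The point is only that the converter is small and mechanical once the binary layout is known; the hard part remains agreeing on the canonical output format that tools consume.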
One other option to consider is the use of, or overlap with, the PCAP-NG format typically used for packet captures. This is apparently being considered for adoption in opsawg and might be flexible enough to include the endpoint-specific data we want to add to the mix. I still need time to analyze what PCAP-NG actually does, but initial discussion on this was on the mailing list at: https://mailarchive.ietf.org/arch/msg/qlog/2bSRgRdaRleLhTDFv_C4DYZ3zng/

One benefit would be that we could easily log raw (encrypted) packets along with endpoint data, as in normal .pcaps (though I'm not sure how useful that is). A downside is that the format is (AFAIK) barely supported outside of tools like Wireshark (e.g., no easy, mature open-source parsers available, though I could be completely wrong on that count).
It might not be the encrypted packets you are logging; it might be the DNS requests (whether encrypted or not), or even the ICMP Packet Too Big or ICMP Port Unreachable messages.
As an implementer, why would I modify my endpoint to take a packet capture when e.g. tcpdump can already do that?
Because tcpdump can't capture (a) the cleartext packets inside QUIC/TLS, or (b) your state transitions.
In my experience, when my QUIC client application fails to resolve a name, I don't even create a UDP socket or QUIC connection object. My client writes an error message to stderr, possibly reporting the error returned by the resolution syscall. Combining that with wire packet captures seems to have marginal value when those packets don't contain more information than is already available to the client.
Discussed during IETF 111. The feeling in the room was to stick with JSON serialization as the canonical interop format for qlog. Use of JSON does not prevent other serialization formats, but we can constrain our scope of work to focus on one format in this set of deliverables.
For clarification, the present specifics of the document's JSON serialization definitions are a starting point for further development, should the WG declare consensus on using JSON. The discussion about streaming serialization (whether NDJSON or some other format) is separate, so I've created #172.
In draft-02, we decided to keep using JSON as the main serialization format, with NDJSON as a streaming option. We did update the data definition language used in the draft to make it easier to define custom serializations into e.g., binary formats, if others want a more performant option.
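For illustration, here is a minimal sketch of the NDJSON-style streaming idea: each event is serialized as one self-contained JSON object per line, so a log can be appended to (and tailed) without ever holding a complete JSON document in memory. The event field names and file name are just examples, not taken from the draft:

```python
import json

def append_event(log_file, time_ms, name, data):
    # One JSON object per line ("NDJSON"): append-only, stream-friendly.
    record = {"time": time_ms, "name": name, "data": data}
    log_file.write(json.dumps(record) + "\n")
    log_file.flush()  # keep the file readable by live tooling

with open("connection.qlog.ndjson", "a") as f:
    append_event(f, 0.0, "transport:packet_sent", {"header": {"packet_number": 0}})
    append_event(f, 1.2, "transport:packet_received", {"header": {"packet_number": 0}})
```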
It might be necessary to revisit this decision and still go for a binary format by default. CBOR has so far been named most often as a potential option, as it is an IETF standard (as opposed to, e.g., Protocol Buffers or FlatBuffers and similar) and has proven itself for other protocol-related use cases as well.
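As a rough, non-scientific way to get a feel for the JSON-vs-CBOR size trade-off, one can encode the same event with both and compare; this sketch uses the third-party cbor2 package and a made-up event, so the exact numbers mean little, but real savings depend heavily on field names and numeric data:

```python
import json
import cbor2  # third-party: pip install cbor2

# An illustrative qlog-style event; field names are examples only.
event = {"time": 1234.567,
         "name": "transport:packet_sent",
         "data": {"header": {"packet_type": "1RTT", "packet_number": 42},
                  "raw": {"length": 1252}}}

json_bytes = json.dumps(event, separators=(",", ":")).encode("utf-8")
cbor_bytes = cbor2.dumps(event)
print(f"{len(json_bytes)} bytes as JSON, {len(cbor_bytes)} bytes as CBOR")
```

Note that CBOR mostly shrinks numbers and structural overhead; the repeated string keys that dominate verbose qlog events are compressed far more effectively by running either format through gzip or similar.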
The question remains: do we consciously limit ourselves to a select few serialization formats? And even if we don't, which "default" formats do we commit to in the texts?
Several people are of the opinion that it's enough to stick with, for example, JSON as the main format in the qlog specification, and to have people write converters from/to JSON themselves for more performant options. This is because even with something like CBOR, not all applications will want to employ it directly, and converters would still be needed.