RFC: simplify OCaml code generation by using Yojson AST -> atdml #313
Comments
4x performance impact sounds scary. But some absolute numbers would be interesting to see, maybe compared with JSON parsing performance in other GC languages, like Golang or Java.
Is this interface something like Simple API for XML (SAX)? SAX parsers exist in the XML world and are the only option for getting decent performance and memory footprint when parsing a large number of XML documents.
Looking at the generated ml code will give you an idea.
and it keeps going with large amounts of non-obvious code that was optimized for speed. Somewhere in there, there's a call to |
I've been working for 4 days on adding support for imports in atdgen and I'm still not done. The implementation is a giant mess for which I'm responsible. The same task for atdpy took me a few hours. Things that make it hard to work with the current atdgen implementation include:
My recent experience with writing atdpy (Python JSON support) and atdts (TypeScript JSON support) has been great because the focus was on basic functionality rather than on features that are complicated to implement. So it would be really neat, from both a maintainer's and a user's perspective, to have a simpler atdml tool that's based on the same model. Let me recapitulate the features that I think are essential:
Features to avoid include:
Let's do it!
I'd like to hear what features people rely on, what works well for them, what doesn't work so well, etc. Before rushing into an implementation, let's come up with a list of supported and unsupported features that we want in atdml. I'll start one here based on my experience and known mistakes.
Supported legacy features
New features
Legacy features that won't be supported by design
Features remaining unsupported by design
Unsupported legacy features due only to budget limits
@Lupus wrote:
The atdgen manual gives 3x. I believe it is an accurate average of read and write performance. I don't think it's that big unless all your application does is read JSON data, inspect a single field, and write it back. We had this pattern in a big 25-step map-reduce pipeline running on Hadoop back in the day. For this, we used mostly Biniou rather than JSON because it is again something like 4x faster. Taking care of such optimizations matters in a large-scale context, so it could make sense for a big data company to invest in this to reduce energy and/or hardware costs. It doesn't make sense to me at the moment or to my current employer (r2c), so I'd rather leave optimizations possible but secondary.
We use inheritance A LOT.
Won't this be a giant pain to update in a large codebase? Hundreds of types or thousands of usages to update.
Good to know. It shouldn't be difficult to add without adding complexity.
Users would need to add the annotation
If ATD files tend to contain many different variant type definitions, I think a good solution would be to support a global setting in the head part of the file:
Global annotations would also be useful in atdpy to avoid all those dataclass annotations on every type.
The goal is to simplify the maintenance of both yojson and atdgen. Yojson is the library used to parse JSON from OCaml code generated by atdgen. Atdgen is the command-line tool that generates OCaml code that implements the mapping between raw JSON data and OCaml types. Currently, atdgen calls semi-secret parsing functions provided by yojson and constructs idiomatic OCaml data directly without going through a JSON AST. It was done this way to achieve the best performance but it turns out to be harder to maintain than if we were going through an AST. Yojson already provides such an AST in the form of the type Yojson.Safe.t. See ocaml-community/yojson#151 for more context from the yojson perspective.
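To make the trade-off concrete, here is a minimal sketch of what AST-based reader and writer code could look like for one record type. It is not actual atdgen output: the record type point, the helpers int_field and opt_string_field, and the exception Json_error are all hypothetical names chosen for illustration. The code only pattern-matches on the polymorphic variant constructors that Yojson.Safe.t uses (`Assoc, `Int, `String, `Null), so it compiles without yojson installed and should also accept values of type Yojson.Safe.t.

```ocaml
(* Hypothetical record type standing in for a type defined in an ATD file. *)
type point = { x : int; y : int; label : string option }

(* Hypothetical error type; real generated code may report errors differently. *)
exception Json_error of string

(* Read a required int field from the field list of a JSON object. *)
let int_field name fields =
  match List.assoc_opt name fields with
  | Some (`Int n) -> n
  | Some _ -> raise (Json_error ("field " ^ name ^ ": expected an int"))
  | None -> raise (Json_error ("missing field " ^ name))

(* Read an optional string field; an absent field or null yields None. *)
let opt_string_field name fields =
  match List.assoc_opt name fields with
  | None | Some `Null -> None
  | Some (`String s) -> Some s
  | Some _ -> raise (Json_error ("field " ^ name ^ ": expected a string"))

(* AST -> OCaml record. With yojson available, the input would typically
   come from Yojson.Safe.from_string (not called here). *)
let point_of_json json =
  match json with
  | `Assoc fields ->
      { x = int_field "x" fields;
        y = int_field "y" fields;
        label = opt_string_field "label" fields }
  | _ -> raise (Json_error "expected a JSON object")

(* OCaml record -> AST. The optional field is omitted when it is None. *)
let json_of_point p =
  let label =
    match p.label with
    | None -> []
    | Some s -> [ ("label", `String s) ]
  in
  `Assoc ([ ("x", `Int p.x); ("y", `Int p.y) ] @ label)

(* Round-trip check on a hand-built AST value. *)
let () =
  let p = point_of_json (`Assoc [ ("x", `Int 1); ("y", `Int 2) ]) in
  assert (p = { x = 1; y = 2; label = None });
  assert (point_of_json (json_of_point p) = p)
```

The point of the sketch is that each reader is an ordinary function over a tree, with no dependence on lexer state or on yojson's internal read_ functions. The cost is that the whole tree is built before any field is inspected.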
Proposal
Reimplement the code generation parts of atdgen for parsing JSON (files starting with oj_) so they rely only on the usual yojson AST type (Yojson.Safe.t) and on the few JSON parsing functions that ordinary users would use (from_string, from_file, from_channel, from_lexbuf). To get a sense of the function calls that would get removed, run git grep Yojson.Safe.read_ on the atd repository.
Expected impact
Request for comments
Please tell us how this would affect you.
Relevant questions include: