Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consider switching the JSON3 #438

Open
ExpandingMan opened this issue Sep 1, 2021 · 12 comments
Open

consider switching the JSON3 #438

ExpandingMan opened this issue Sep 1, 2021 · 12 comments
Milestone

Comments

@ExpandingMan
Copy link
Contributor

This might be a fairly disruptive change, but I really think this package should consider dumping JSON for JSON3. JSON3 is lazy by default, very fast, and it might eventually be very useful here that it has integration with StructTypes.jl.

I've recently had a few cases where AWS returned lists of hundreds of thousands of items and JSON.jl is incredibly slow.

Might be able to help with this, but I wanted to gauge willingness. Seems likely it will involve a breaking change, but hopefully not "badly" breaking.

@omus
Copy link
Member

omus commented Sep 1, 2021

I think switching out the JSON library would be fine. One consideration is that the JSON dict key ordering shouldn't change as that can be annoying with IAM policies. As JSON3 already maintains the ordering I don't see any major issues with changing things.

Alternatively improving raw output support (#384) would allow you to use whatever JSON parsing library you prefer

@mattBrzezinski
Copy link
Member

This is something that I was thinking about recently. I wanted to benchmark the performances between these three packages;

I know LazyJSON is really really good. But it is also a package needs a lot of work to get it into a proper state. From my understanding it was more of a concept package. I'm definitely in favor of speeding things up, and converting to JSON3.jl should be pretty straight forward.

@ExpandingMan
Copy link
Contributor Author

JSON3 is definitely fast and it is already pretty widely used. It is also created and maintained by the incredibly prolific Jacob Quinn, so it is likely to remain very well maintained unless he is trampled to death in a freak llama stampede or something.

I do agree that perhaps we should think carefully about what we can lay on top of #384 (though I think JSON3 would be good to have in any case). The AWS API is very... well, a polite way to put it would be pedantic, but it's also not very consistent. There are about a trillion specific dict formats which they treat like full-blown types. It's all very nested and Java-y. While it doesn't seem like it would be any fun at all to try to generate them, some more restructuring of the library might be nice. For example, if params could be given as keyword arguments instead of in the dict. Sometimes the separation between main arguments and params in the AWS.jl don't make much sense either, but I suppose this is probably something we inherit from the AWS REST API.

@mattBrzezinski
Copy link
Member

JSON3 is definitely fast and it is already pretty widely used. It is also created and maintained by the incredibly prolific Jacob Quinn, so it is likely to remain very well maintained unless he is trampled to death in a freak llama stampede or something.

Yeah that's a good point. Even if LazyJSON were faster, does anyone really want to maintain it? Unless it's some how substantially faster, using JSON3 seems like the correct move here.

@mattBrzezinski
Copy link
Member

if params could be given as keyword arguments instead of in the dict.

Yes! I think this is a really wanted change and will make everyone's lives better. It shouldn't be too challenging to convert this into some form of kwargs.

@ExpandingMan
Copy link
Contributor Author

Yeah... I was going to bring it up before but I don't think there's any chance the interface to this package is ever going to be "nice". I think the general lousiness and Javaness of AWS's API dooms us to needing tons of wrapper packages 😨

Anyway, would still be nice.

@mattBrzezinski
Copy link
Member

mattBrzezinski commented Sep 1, 2021

Yeah... I was going to bring it up before but I don't think there's any chance the interface to this package is ever going to be "nice". I think the general lousiness and Javaness of AWS's API dooms us to needing tons of wrapper packages 😨

Anyway, would still be nice.

Yeah... I think AWS.jl will always end up being the barebones format where you'll need to (some degree) jump around Dicts.

Building out packages such as AWSS3.jl to have these convenience functions that hide away those parts is do-able. But I don't see a way that you can auto-generate that stuff? Unless it's already pre-written in another language and we convert into Julia (or use something underlying, but I don't think there's a well-formed AWS C library?).

@ExpandingMan
Copy link
Contributor Author

Yeah, there's really no way to do it. I think the only possible solutions are:

  1. Build a super sophisticated AWS.jl that consumes AWS API spec and docs and is somehow magically so good that it handles the horrible inconsistency thereof and somehow miraculously results in a package that's nice to use.
  2. Write a bunch of secondary wrapper packages as needed.

I think we'll probably be going with 2 lol

@omus
Copy link
Member

omus commented Sep 2, 2021

One gotcha with switching to JSON3 is that the default parsing behaviour when parsing JSON and XML will have very different output unless we make a StructTypes style XMLDict.

@ExpandingMan
Copy link
Contributor Author

Ugh, I keep forgetting how so much of their API relies on XML. Yuck.

My vote would be to still go to JSON3 and just do whatever we can to mangle the XML stuff into a similar output (e.g. use StructTypes) because none of the available XML parsers are nearly as nice as JSON3, in my opinion.

Maybe we can berate Jacob Quinn into writing an XML parser too 😆

bors bot added a commit that referenced this issue Sep 29, 2021
384: Use `AWS.Response` to handle streaming/raw/parsing r=omus a=omus

The idea is to use a struct in AWS.jl that can be used to handle the automatic parsing that is currently used to turn XML/JSON into a dict while also giving the option of accessing the raw output as a string or stream without all the keywords currently needed to be specified.

Depends on:
- #457

Related:
- #346
- #348
- #438

Closes:
- #224
- #433
- #468 (when using `use_response_type` the passed in I/O is not closed)
- #471
- #433

Update: The tests in this PR run using the deprecated behaviour. Mainly I did that here to prove this change is non-breaking. For the updated tests see #458 

Co-authored-by: Curtis Vogt <[email protected]>
@omus
Copy link
Member

omus commented Oct 1, 2021

As of AWS.jl 1.63 (#384) you can control how AWS.jl parses responses when using the feature use_response_type (e.g. @service S3 use_response_type=true). You just need to define your own read function which can handle the same MIME-types as defined here:

function _read(io::IO, ::MIME"application/xml")
xml = parse_xml(read(io, String))
root = XMLDict.root(xml.x) # Drop XML declaration
return xml_dict(root, LittleDict{Union{String,Symbol},Any})
end
function _read(io::IO, ::MIME"application/json")
# Note: Using JSON instead of JSON3 since it does not support OrderedDict/LittleDict
return JSON.parse(io; dicttype=LittleDict{String,Any})
end
_read(io::IO, ::MIME"text/plain") = read(io, String)
_read(io::IO, ::MIME) = read(io)

and then do parse(my_read_function, aws_response).

We can still internally switch over to JSON3 but we'll need to deal with parsing XML responses in a similar way to JSON3.

@omus omus added this to the 2.0 milestone Dec 2, 2021
@omus omus mentioned this issue Dec 2, 2021
@omus
Copy link
Member

omus commented Dec 2, 2021

I think we should probably switch to JSON3 as the default for AWS.jl@2. We still need to deal with the XML bit but we could probably do something inefficient to start with that returns a JSON3 like structure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants