Add support for non-blocking ("async") JSON parsing #57
Comments
+1. Please implement it if you can. |
Heh. Unless I have personal need, or someone pays me to do it, I doubt I'll ever work on this. But perhaps someone else has the itch. Always willing to help. |
+1 I'm using Aalto right now to parse XML - I've bolted it onto an Akka IO/Scala Delimited Continuations framework that I wrote that allows you to access the API as if it were regular blocking StAX code. I'd love to be able to migrate our JSON handling to the same framework. |
For what it is worth, I did start writing an async parser for the Smile backend, as that would be marginally easier. @dpratt I'd be very interested in learning more about your use of Aalto, including feedback, challenges, and things you learnt. Maybe you could send an email to my gmail address (tsaloranta)? This could even eventually help with a JSON equivalent, if things worked out well. What I really need more than anything else is someone to collaborate with, to make it less likely that I write something that does not get used. |
I wrote a simple non-blocking JSON parser (as part of a parser for Javon, an extension of JSON): https://github.com/rfqu/Javon . I can help to embed it into the Jackson codebase. |
We are always happy to accept contributions, if you want to tackle this problem. In case of Jackson, interface would need to be via |
If you mean com.fasterxml.jackson.core.JsonParser, then it is technically impossible. It contains methods like nextToken(), which would block until input data are available, destroying the idea of non-blocking parser. The interface has to be turned inside out: parser should have method putChar(char nextChar), and transfer parsed data to a next stage which has interface similar to JsonGenerator. |
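The "inside-out" design described in this comment can be sketched in plain Java. This is a minimal illustration, not Jackson code: `PushTokenizer`, `putChar` and the `TokenHandler` callback (standing in for the JsonGenerator-like next stage) are hypothetical names, and only arrays of unsigned integers are recognized so the state machine stays small. The parser never waits for input; it keeps its partial state (here, a half-read number) between calls.

```java
import java.util.ArrayList;
import java.util.List;

public class PushTokenizerDemo {

    /** Hypothetical downstream stage: the tokenizer pushes events into it. */
    interface TokenHandler {
        void startArray();
        void endArray();
        void number(long value);
    }

    /** Hypothetical push tokenizer for arrays of unsigned integers only. */
    static final class PushTokenizer {
        private final TokenHandler handler;
        private boolean inNumber = false; // state that survives between putChar calls
        private long current = 0;

        PushTokenizer(TokenHandler handler) {
            this.handler = handler;
        }

        /** Consumes exactly one character; never blocks and never asks for more input. */
        void putChar(char c) {
            if (c >= '0' && c <= '9') {
                current = current * 10 + (c - '0');
                inNumber = true;
                return;
            }
            flushNumber(); // any non-digit terminates a pending number
            switch (c) {
                case '[': handler.startArray(); break;
                case ']': handler.endArray(); break;
                case ',': case ' ': case '\t': case '\r': case '\n': break;
                default: throw new IllegalArgumentException("unsupported character: " + c);
            }
        }

        /** Signals end of input so that a trailing number is emitted. */
        void end() {
            flushNumber();
        }

        private void flushNumber() {
            if (inNumber) {
                handler.number(current);
                inNumber = false;
                current = 0;
            }
        }
    }

    /** Pushes the chunks one character at a time and records the resulting events. */
    static List<String> run(String... chunks) {
        List<String> events = new ArrayList<>();
        PushTokenizer tokenizer = new PushTokenizer(new TokenHandler() {
            public void startArray() { events.add("["); }
            public void endArray() { events.add("]"); }
            public void number(long value) { events.add("n:" + value); }
        });
        for (String chunk : chunks) {
            for (char c : chunk.toCharArray()) {
                tokenizer.putChar(c);
            }
        }
        tokenizer.end();
        return events;
    }

    public static void main(String[] args) {
        // "123" arrives split across two chunks but is reported as a single number.
        System.out.println(run("[1", "23, 4]"));
    }
}
```

Control flow is inverted here: the caller drives the tokenizer and the tokenizer drives the handler, which is why layers above the core (such as data-binding) would also have to become push-based under this design.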
Why not have nextToken return the (already defined and standard Jackson On Tue, Aug 6, 2013 at 3:06 PM, Alexei Kaigorodov
|
It is defined, but not used. And I cannot see an easy way to use it: when should the consumer wake up and try to read the next token again? It would have to be notified somehow, but then it's natural to pass the new token along with the notification. Also, how do you think the consumer should save and restore its state if it calls nextToken() from recursive methods? |
I've used the same pattern quite successfully using Aalto - basically, you On Tue, Aug 6, 2013 at 4:40 PM, Alexei Kaigorodov
|
@rfqu Please have a look at Aalto, like @dpratt suggested, if you have time -- yes, minor modification is needed, but @dpratt Yes, exactly! There are challenges at higher levels, so data-binding would need to change a lot, from pull- to push model most likely. But at core level it is doable. |
@rfqu On how caller knows more is available: typically (at least with Aalto), caller both feeds new data and iterates; so it will feed a chunk of bytes, then iterate over tokens until there are no more available. Then get new data (possibly waiting, via NIO callbacks). I wrote a bit about using Aalto in non-blocking mode 2.5 years ago, see: http://www.cowtowncoder.com/blog/archives/2011/03/entry_451.html |
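The feed-then-iterate loop described above can be sketched in plain Java with hypothetical names (`FeedableTokenizer`, `feed`, `endOfInput`, `Kind.NOT_AVAILABLE`). This is not Aalto's or Jackson's actual API, although Jackson's `JsonToken` enumeration does define a `NOT_AVAILABLE` constant for this situation. The tokenizer handles only ASCII arrays of unsigned integers. The caller feeds a chunk, pulls tokens until `NOT_AVAILABLE`, then goes back to waiting for data, so the partial-token state lives inside the tokenizer rather than on the caller's stack.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

public class FeedLoopDemo {

    /** Hypothetical token kinds; NOT_AVAILABLE means "no complete token in the buffer yet". */
    enum Kind { START_ARRAY, END_ARRAY, NUMBER, NOT_AVAILABLE, END_OF_INPUT }

    /** Hypothetical non-blocking tokenizer for ASCII arrays of unsigned integers. */
    static final class FeedableTokenizer {
        private final ArrayDeque<Byte> pending = new ArrayDeque<>();
        private boolean inputEnded = false;
        private boolean inNumber = false; // partial-token state kept across feed() calls
        private long number = 0;
        long lastNumber;                  // value of the most recent NUMBER token

        void feed(byte[] chunk) {
            for (byte b : chunk) {
                pending.add(b);
            }
        }

        void endOfInput() {
            inputEnded = true;
        }

        Kind nextToken() {
            while (!pending.isEmpty()) {
                char c = (char) (pending.peek() & 0xFF);
                if (c >= '0' && c <= '9') {
                    pending.poll();
                    number = number * 10 + (c - '0');
                    inNumber = true;
                    continue;
                }
                if (inNumber) {
                    return finishNumber(); // the terminating byte stays in the buffer
                }
                pending.poll();
                switch (c) {
                    case '[': return Kind.START_ARRAY;
                    case ']': return Kind.END_ARRAY;
                    case ',': case ' ': break; // separators produce no token
                    default: throw new IllegalStateException("unexpected character: " + c);
                }
            }
            // Buffer drained: a pending number may still continue in the next chunk.
            if (!inputEnded) {
                return Kind.NOT_AVAILABLE;
            }
            return inNumber ? finishNumber() : Kind.END_OF_INPUT;
        }

        private Kind finishNumber() {
            lastNumber = number;
            number = 0;
            inNumber = false;
            return Kind.NUMBER;
        }
    }

    private static String describe(Kind k, FeedableTokenizer t) {
        return k == Kind.NUMBER ? "NUMBER(" + t.lastNumber + ")" : k.name();
    }

    /** Feeds each chunk, then drains tokens until NOT_AVAILABLE, as a non-blocking caller would. */
    static List<String> parse(String... chunks) {
        FeedableTokenizer t = new FeedableTokenizer();
        List<String> out = new ArrayList<>();
        for (String chunk : chunks) {
            t.feed(chunk.getBytes(StandardCharsets.US_ASCII));
            for (Kind k = t.nextToken(); k != Kind.NOT_AVAILABLE; k = t.nextToken()) {
                out.add(describe(k, t));
            }
            // A real caller would return here and wait (e.g. for an NIO callback) for more bytes.
        }
        t.endOfInput();
        for (Kind k = t.nextToken(); k != Kind.END_OF_INPUT; k = t.nextToken()) {
            out.add(describe(k, t));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("[1", "2, 3", "4]"));
    }
}
```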
I can say that having at the very least a JsonParser implementation that We could even implement a JsonParser2 interface that extends JsonParser and On Tue, Aug 6, 2013 at 5:39 PM, Tatu Saloranta [email protected]:
|
I looked at Aalto and found that, to save its state, the parser defines several dozen constants. It is no fun to program that way. |
@rfqu You are free not to contribute; I was just outlining what kind of contribution would make sense, based on YOUR offer to help. I don't know where you get that pissy attitude, however; that is unnecessary. I wish you the best of luck with your projects, and given that yours can do non-blocking parsing the way you like it, I hope interested users find it. As to Aalto: it is just a (conceptually!) simple state machine, and if there were a state machine generator to use, it'd be much simpler to write. I don't greatly care whether something is simple for implementors to write; pull-style is much easier for users to use, and that is what counts most to me. But as I said, you are free to explore other options that are more to your liking. |
@cowtowncoder to pop the stack on this conversation, I just wanted to insert that I've been really impressed with Aalto so far. It's super fast, and works exactly the way I want it to. I've actually given up on my original idea of using continuations with an XMLStreamReader shim - Scala's implementation of continuations imposes too much of a burden on clients, and it was really hard to make something that I'd be willing to use as a day-to-day parser. I've gone a different track of using a variant of Iteratees. It works really well and is fairly speedy, but it still suffers from a few performance issues, and I don't know if any of them are really addressable.
Since you've done this before, is there anything in specific I should be concerned about w.r.t. processing byte[] streams? |
I split the Javon project into Javon itself and an independent, pure JSON project, which is now at https://github.com/rfqu/df-codec |
@dpratt I think we should continue this discussion (very good feedback, btw) on the Aalto users list at http://tech.groups.yahoo.com/group/aalto-xml-interest/ @rfqu Thank you for the link -- I honestly think it is good to have different implementations and approaches, and I wish you good luck with your work on Javon. |
Is there a way to get this supported? I saw something implemented in jackson-smile. |
@testn Yes, by someone with lots of time to spend on implementing and testing it. Smile unfortunately only has a skeleton (or scaffolding), not a full implementation. I know how it can be done (see |
Can you describe a bit how it should be done? Maybe I will take a stab at it. |
The way I did it with Aalto: https://github.com/FasterXML/aalto-xml was to basically do two things:
Of these, (1) is trivially simple, although I had to rewrite it a bit to support alternatives like feeding The second part is the complicated one: there needs to be state associated with every possible position throughout decoding, at a byte-accurate level. Not just within tokens but, since this has to assume multi-byte UTF-8 decoding as well, within UTF-8 characters too. An alternative could be to separate out UTF-8 decoding; this might simplify things a bit, but with some performance overhead. If so, the parser would work with Regardless, the amount of state to track with JSON is less than with XML, so that's a slight simplification. So... yeah, it is a bit involved. I'm sure there are other approaches too, but in general a state machine approach should work well. Others would probably use a state machine library or compiler; that could simplify the task. I am not sure whether it would, but since you'd be starting from scratch it might make sense. |
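To show what "byte-accurate" state means inside a UTF-8 character, here is a minimal sketch of an incremental UTF-8 decoder in plain Java. It is not Aalto or Jackson code, and the class name is hypothetical. Between chunks it only needs to remember the bits decoded so far and how many continuation bytes are still expected, so a multi-byte character split across two `feed` calls still decodes correctly. Validation of overlong encodings and surrogate code points is deliberately omitted.

```java
import java.nio.charset.StandardCharsets;

public class IncrementalUtf8Demo {

    /** Hypothetical incremental decoder; its only cross-chunk state is the partial character. */
    static final class Utf8Decoder {
        private int codePoint = 0; // bits accumulated for the current multi-byte character
        private int remaining = 0; // continuation bytes still expected
        private final StringBuilder out = new StringBuilder();

        /** Decodes chunk[offset, end); a character may start in one call and finish in the next. */
        void feed(byte[] chunk, int offset, int end) {
            for (int i = offset; i < end; i++) {
                int b = chunk[i] & 0xFF;
                if (remaining > 0) {
                    if ((b & 0xC0) != 0x80) {
                        throw new IllegalArgumentException("expected a continuation byte at " + i);
                    }
                    codePoint = (codePoint << 6) | (b & 0x3F);
                    if (--remaining == 0) {
                        out.appendCodePoint(codePoint);
                    }
                } else if (b < 0x80) {           // 1-byte sequence (ASCII)
                    out.append((char) b);
                } else if ((b & 0xE0) == 0xC0) { // lead byte of a 2-byte sequence
                    codePoint = b & 0x1F;
                    remaining = 1;
                } else if ((b & 0xF0) == 0xE0) { // lead byte of a 3-byte sequence
                    codePoint = b & 0x0F;
                    remaining = 2;
                } else if ((b & 0xF8) == 0xF0) { // lead byte of a 4-byte sequence
                    codePoint = b & 0x07;
                    remaining = 3;
                } else {
                    throw new IllegalArgumentException("invalid lead byte at " + i);
                }
            }
        }

        /** True if the input so far ends in the middle of a character. */
        boolean inCharacter() {
            return remaining > 0;
        }

        String decoded() {
            return out.toString();
        }
    }

    public static void main(String[] args) {
        String text = "h\u00e9\uD83D\uDE00"; // 'h', e-acute (2 bytes), U+1F600 (4 bytes)
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        Utf8Decoder decoder = new Utf8Decoder();
        decoder.feed(bytes, 0, 2);            // ends after the first byte of e-acute
        System.out.println(decoder.inCharacter());
        decoder.feed(bytes, 2, 5);            // finishes e-acute, starts the 4-byte character
        decoder.feed(bytes, 5, bytes.length); // finishes it
        System.out.println(decoder.decoded().equals(text));
    }
}
```

A full parser would combine this per-character state with per-token state (inside a string, a number, a literal), which is where the large number of states mentioned in this thread comes from.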
Hello, I'm interested in asynchronous I/O too, but from a different angle. I still want the traditional synchronous Jackson API, but internally I want Jackson to do something conceptually like this:
    Token getNextToken() {
        startReadingDataIntoBuffer();
        blockUntilWeCanReadAToken();
        // data continues streaming into buffer in the background
        // until buffer is full or EOF is reached
        return parseAToken();
    }
I guess my big question is: Could this pattern make Jackson faster by better saturating whatever I/O connection I'm using? I'm trying to stream some fairly large JSON files, and my gut feeling is that I'm not saturating my data connection because of the time the CPU spends on parsing between read() calls. Is there anyone around more knowledgeable than me who can comment on both whether my hunch is correct, and, if so, whether any possible performance gains are likely to be worth the engineering effort? If anyone has any ideas for resources I could look at (I already spent half an hour with Google without luck) or benchmarks I could write to answer my own question, I'm all ears.
William Tracy |
@wtracy As far as I understand, you want to continue loading data while parsing, that is, to load and parse data in parallel. If so, the simplest way is to start a separate thread for data loading and use a circular buffer to connect the loading and parsing threads. The loading thread could probably also do token recognition and fill the buffer with tokens rather than characters. |
@wtracy isn't that just like the existing regular blocking parser? This is how Now, I am guessing, like @rfqu, that you may wish to continue loading in the background. If so, I think that multi-threaded background reading should reside outside Jackson core, and be abstracted out behind |
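If the goal is only to overlap reading with parsing, one way to keep that outside the parser itself is a read-ahead wrapper: a background thread fills a bounded queue with chunks while the parsing thread drains it. `ArrayBlockingQueue` is a bounded buffer backed by a fixed-size array, which plays the role of the circular buffer suggested above. This is a plain-Java sketch under stated assumptions: `PrefetchingInputStream` is a hypothetical class, the chunk and queue sizes are arbitrary, and whether it improves throughput depends on whether parsing or I/O is the bottleneck, which is worth measuring first.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical read-ahead wrapper: a daemon thread reads ahead into a bounded queue.
 *  close() is not overridden; a fuller version would stop the reader and close the source. */
public class PrefetchingInputStream extends InputStream {
    private static final byte[] EOF = new byte[0]; // sentinel marking end of stream
    private final BlockingQueue<Object> queue;     // byte[] chunks, EOF, or an IOException
    private byte[] current = null;
    private int pos = 0;
    private boolean done = false;

    public PrefetchingInputStream(InputStream source, int chunkSize, int maxChunks) {
        this.queue = new ArrayBlockingQueue<>(maxChunks);
        Thread reader = new Thread(() -> {
            byte[] buf = new byte[chunkSize];
            try {
                int n;
                while ((n = source.read(buf)) != -1) {
                    if (n > 0) queue.put(Arrays.copyOf(buf, n)); // blocks while the queue is full
                }
                queue.put(EOF);
            } catch (IOException e) {
                try { queue.put(e); } catch (InterruptedException ie) { Thread.currentThread().interrupt(); }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "prefetch-reader");
        reader.setDaemon(true);
        reader.start();
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        return read(one, 0, 1) == -1 ? -1 : one[0] & 0xFF;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (current == null || pos == current.length) {
            if (done) return -1;
            Object next;
            try {
                next = queue.take(); // waits only if the reader has fallen behind
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while waiting for data", e);
            }
            if (next instanceof IOException) {
                done = true;
                throw (IOException) next;
            }
            current = (byte[]) next;
            pos = 0;
            if (current == EOF) {
                done = true;
                return -1;
            }
        }
        int n = Math.min(len, current.length - pos);
        System.arraycopy(current, pos, b, off, n);
        pos += n;
        return n;
    }

    /** Reads all of `data` back through the wrapper (a helper for checking the sketch). */
    static byte[] roundTrip(byte[] data, int chunkSize, int maxChunks) {
        try (InputStream in = new PrefetchingInputStream(new ByteArrayInputStream(data), chunkSize, maxChunks)) {
            ByteArrayOutputStream copy = new ByteArrayOutputStream();
            byte[] buf = new byte[1000];
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) copy.write(buf, 0, n);
            return copy.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[100_000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        System.out.println(Arrays.equals(data, roundTrip(data, 4096, 8)));
    }
}
```

Because the wrapper is an ordinary `InputStream`, a blocking parser can consume it unchanged, for example via Jackson's `JsonFactory.createParser(InputStream)`; the parser itself stays single-threaded.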
@cowtowncoder you've convinced me that I probably based this whole idea on I think I should at least spend some more time understanding the internals William
|
Quick note: I am working on non-blocking parsing for Smile format, and hope to follow it up here for JSON, to be included in 2.9.0. |
Actson: https://github.com/michel-kraemer/actson is an async implementation of a JSON parser |
Thank you for sharing this. The API looks similar, probably due to common roots via the Aalto XML parser. |
FWIW, the Smile codec has a fully functioning non-blocking implementation;
Hi Tatu,
Nice to know, I will leverage this in Spring WebFlux support for Smile
expected for Spring Framework 5.0 RC3 [1].
Do you have more visibility about similar support for JSON which is
critical for our approaching GA? Do you plan to make it part of Jackson 2.9
RC for example?
Best regards,
Sébastien Deleuze
[1] https://jira.spring.io/browse/SPR-15424
…On Wed, May 24, 2017 at 12:52 AM, Tatu Saloranta ***@***.***> wrote:
Fwtw, Smile codec has fully functioning non-blocking implementation;
JsonParser now exposes necessary methods (for byte array -backed input).
|
@sdeleuze I am working on JSON async right now, and one thing I have to decide is whether to release one more pr (2.9.0.pr4) or take my chances with 2.9.0 final. If I get to push pr4 within a week or so (hopefully next weekend; probably no sooner), with JSON non-blocking, would that allow you to test it? Ideally the last pr would be as close as possible to the eventual release, with only smaller fixes and new features. But there is some overhead to releasing too. |
Awesome thanks! Yes sure, that would enable us to test it asap and
hopefully to leverage it in our RC3.
(In reply to Tatu Saloranta, Tue, May 30, 2017 at 12:40 AM.)
|
And with the latest commit, the non-blocking parser now works well enough to pass I still need to work a bit on non-standard features (comments, various quoting alternatives), as well as on porting more tests. But things are looking good. |
…complete wrt non-standard features, but functional
@sdeleuze Finally got pr4 out (should be at Maven Central now) |
Awesome thanks, we are going to try to use the JSON non-blocking feature
next week, we will keep you informed.
(In reply to Tatu Saloranta, Sat, Jun 17, 2017 at 7:03 AM.)
|
@sdeleuze Sounds good -- I tried to add reasonable testing, but that'd only help verify it works the way I want to, not that it is good or fit :) |
@cowtowncoder We are going to try to leverage this during the week and send you feedback, cc @poutsma. |
Using actson to provide async parsing for jackson: https://github.com/mmimica/async-jackson |
(migrated from http://jira.codehaus.org/browse/JACKSON-39 -- note, high vote count)
(suggested by Dimitri M on user list)
There are use cases where it'd be good to be able to feed input to the parser, instead of providing an input stream for the parser to read from. This would cover use cases where input comes in chunks; for example, part of a logical document in one chunk, then after a delay (perhaps in a separate request) another one, and so forth. In these cases it may be difficult to implement the InputStream (Reader etc.) abstraction; instead, it would be better if the application could feed (push) data to the parser.
But if so, the parser must be able to indicate cases where no data is available YET (but may become available).
This is similar to how the Aalto XML processor (http://www.cowtowncoder.com/hatchery/aalto/index.html) operates in its async mode. However, since JSON is a much simpler format than XML, implementation might be simpler.
Based on my experience with Aalto, however, implementation is non-trivial. One problem is that even UTF-8 decoding needs to be somewhat aware of chunk boundaries, so in the end a separate parser may be required: this is because the current parser relies on blocking to handle these split cases. A smaller problem is that of indicating the "not-yet-available" case – this can probably be handled by introducing a new member in the JsonToken enumeration.