Allow to decode more than one object from InputStream #1662
Comments
Really happy to see this question being asked. I can provide a concrete example: I'm implementing an app where a mobile or desktop client scans food product barcodes, and I look up some basic product info from several upstream, third-party services. The general state of food product barcodes is that no single 'source of truth' API exists; it's necessary to integrate with multiple third-party services to achieve 'good enough' coverage across even a typical mix of domestic groceries. So my own back end takes the barcode request from the client and broadcasts it to multiple upstream APIs concurrently.

These services each have their own varied response times, but I wish to update my client as each response comes in, as soon as possible, to achieve a well-performing user interface. It feels like a pragmatic solution for the response (from my back-end server to the client) to be a long-lived HTTP stream, on which I 'stream' back all the upstream responses, one after the other, until all upstream backends have provided their answers; then I'll close the response to the client. I expect this process to take around 2-8 seconds overall. I appreciate that was a verbose explanation, but maybe the context helps.
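To make the shape concrete, here is a minimal sketch of that long-lived response written as newline-delimited JSON (one object per line, flushed as it arrives). The `ProductInfo` model and the upstream lookup functions are hypothetical stand-ins, not a real API:

```kotlin
import kotlinx.coroutines.flow.channelFlow
import kotlinx.coroutines.launch
import kotlinx.serialization.Serializable
import kotlinx.serialization.encodeToString
import kotlinx.serialization.json.Json
import java.io.OutputStream

// Hypothetical response model; the real fields depend on the upstream services.
@Serializable
data class ProductInfo(val source: String, val name: String?)

suspend fun streamLookups(
    barcode: String,
    upstreams: List<suspend (String) -> ProductInfo>, // hypothetical upstream clients
    response: OutputStream,
) {
    val writer = response.bufferedWriter()
    channelFlow {
        // Query every upstream service concurrently; emit each result as it arrives.
        for (lookup in upstreams) launch { send(lookup(barcode)) }
    }.collect { info ->
        // One JSON object per line (NDJSON), flushed so the client sees it immediately.
        writer.write(Json.encodeToString(info))
        writer.newLine()
        writer.flush()
    }
}
```

The `channelFlow` keeps completion simple: the stream ends (and the HTTP response can be closed) once every launched lookup has sent its result.

---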
Our use case is ingesting >1GB JSON files provided by a vendor, read from the file system. For example:

```json
{
  "items": [
    { ... }
  ]
}
```

Presently, we're using Vert.x's JsonParser utility; however, being able to do this with Kotlin Serialization would be great.

Vert.x JsonParser example:

```kotlin
package foo

import io.vertx.core.json.JsonObject
import io.vertx.core.parsetools.JsonEventType
import io.vertx.core.parsetools.JsonParser
import io.vertx.kotlin.core.file.openOptionsOf
import io.vertx.kotlin.coroutines.CoroutineVerticle
import io.vertx.kotlin.coroutines.await

/**
 * Whether the [JsonParser] has begun parsing the "items" field.
 */
enum class ParseState { NOT_STARTED, STARTED, FINISHED }

class MainVerticle2 : CoroutineVerticle() {
    override suspend fun start() {
        var state = ParseState.NOT_STARTED

        // Open a ReadStream of the large JSON file.
        val asyncFile = vertx.fileSystem()
            .open("/path/to/large.json", openOptionsOf())
            .await()

        // Parse the file's ReadStream.
        val parser = JsonParser.newParser(asyncFile)
            // Parse JsonObjects as a single value, rather than separate field/token events,
            // e.g. VALUE->{"foo": ["bar"]} vs [START_OBJECT,VALUE->"foo",START_ARRAY...].
            .objectValueMode()

        parser.handler { event ->
            when (event.type()) {
                JsonEventType.START_ARRAY -> {
                    if (state == ParseState.NOT_STARTED && event.fieldName() == "items") {
                        // Indicate that we're parsing the "items".
                        state = ParseState.STARTED
                    }
                }
                JsonEventType.END_ARRAY -> {
                    if (state == ParseState.STARTED) {
                        // Stop the parser once all items have been read.
                        state = ParseState.FINISHED
                        parser.end()
                    }
                }
                JsonEventType.VALUE -> {
                    if (state == ParseState.STARTED) {
                        // Consume individual items.
                        val item = event.value() as JsonObject
                        // TODO: do something with item
                    }
                }
                else -> Unit
            }
        }
    }
}
```

---
Our app interfaces with external (third-party) hardware that sends multiple JSON messages over a TCP socket connection: polymorphic JSON over a streaming connection that we want to expose to the rest of our app as a Flow.
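A sketch of how that can look today, assuming the device emits one JSON object per line and a hypothetical sealed `DeviceMessage` hierarchy (kotlinx.serialization handles the polymorphism via a class discriminator field):

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.flowOn
import kotlinx.serialization.Serializable
import kotlinx.serialization.decodeFromString
import kotlinx.serialization.json.Json
import java.net.Socket

// Hypothetical message hierarchy; the real protocol's types will differ.
@Serializable
sealed class DeviceMessage {
    @Serializable
    data class Status(val battery: Int) : DeviceMessage()

    @Serializable
    data class Reading(val sensor: String, val value: Double) : DeviceMessage()
}

fun deviceMessages(host: String, port: Int): Flow<DeviceMessage> = flow {
    Socket(host, port).use { socket ->
        val reader = socket.getInputStream().bufferedReader()
        // Assumes newline-delimited framing; readLine() returns null when the
        // hardware closes the connection.
        while (true) {
            val line = reader.readLine() ?: break
            emit(Json.decodeFromString<DeviceMessage>(line))
        }
    }
}.flowOn(Dispatchers.IO)
```

---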
I'd like to tail a JSON Lines file from a potentially infinite stream. Examples of this format can be found at https://jsonlines.org/examples/ and look like this:

```json
["Name", "Session", "Score", "Completed"]
["Gilbert", "2013", 24, true]
["Alexa", "2013", 29, true]
["May", "2012B", 14, false]
["Deloise", "2012A", 19, true]
```

Note that this format is somewhat different from the JSON spec. In practice we often use Logback with https://github.com/logstash/logstash-logback-encoder for our logging, which writes each log event as single-line JSON to some output such as a file or socket.
My use case is using WebSocket streaming to drive the entire user interface. The current implementation uses streams of frames, so although this is a streaming scenario, serialization currently deals with single objects only (events enveloped in frames). A drawback of the current approach is that a large event might have to be split into (artificially created) smaller chunks before serialization, to allow for fine-grained backpressure (in-flight synchronization events) and things like progress indicators. Chunking can occur at the binary level, though.
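To illustrate the binary-level chunking, a sketch of a hypothetical frame envelope; the names and chunk size are illustrative, not our actual protocol:

```kotlin
import kotlinx.serialization.Serializable

// Hypothetical envelope: a large serialized event is split into binary chunks,
// so sync events can be interleaved and progress can be reported per frame.
@Serializable
data class Frame(val eventId: Long, val seq: Int, val last: Boolean, val payload: ByteArray)

fun chunk(eventId: Long, eventBytes: ByteArray, chunkSize: Int = 64 * 1024): List<Frame> {
    val parts = eventBytes.toList().chunked(chunkSize)
    return parts.mapIndexed { i, part ->
        Frame(eventId, seq = i, last = i == parts.lastIndex, payload = part.toByteArray())
    }
}
```

---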
Our use case is that we call an HTTPS JSON REST service that retrieves all items for a user on a mobile phone. We save the full response to a local JSON file (we don't hold the stream open or process data while we are downloading from the REST call), then we use Gson to open the saved JSON file as a stream and read each item individually, saving each item to a local database (we never have more than one read item in memory at a time).

Example code for reading/saving JSON data:
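A minimal sketch of this pattern using Gson's streaming JsonReader; the `Item` type and `saveToDatabase` function are hypothetical stand-ins:

```kotlin
import com.google.gson.Gson
import com.google.gson.stream.JsonReader
import java.io.File

data class Item(val id: Long, val name: String) // hypothetical model

fun saveToDatabase(item: Item) { /* insert into the local database */ }

fun importItems(path: String) {
    val gson = Gson()
    JsonReader(File(path).bufferedReader()).use { reader ->
        // Assumes the saved response is a top-level array of items.
        reader.beginArray()
        while (reader.hasNext()) {
            // Only one item is materialized in memory at a time.
            val item: Item = gson.fromJson(reader, Item::class.java)
            saveToDatabase(item)
        }
        reader.endArray()
    }
}
```

---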
My use case is that I have a large JSON file which I want to parse object by object, accumulating the results into sensible chunks for a memory-constrained system. Ideally the API would provide low-level parser hooks, like Gson's token-level JsonReader does. Similarly, it would be nice to see a high-level coroutines wrapper exposing a Flow of decoded objects (see the sketch below), which would allow me to use the built-in combinators to batch and transform the stream in a concurrent manner.
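Until such a wrapper exists, batching can be layered on top of any Flow with a small operator. A sketch of a chunking combinator (a hypothetical helper, not part of kotlinx.coroutines):

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow

// Groups upstream items into lists of at most `size`, emitting any remainder at the end.
fun <T> Flow<T>.batched(size: Int): Flow<List<T>> = flow {
    val buffer = ArrayList<T>(size)
    collect { item ->
        buffer += item
        if (buffer.size == size) {
            emit(buffer.toList())
            buffer.clear()
        }
    }
    if (buffer.isNotEmpty()) emit(buffer.toList())
}
```

With that, something like `parsedItems.batched(500).collect { writeBatch(it) }` (names hypothetical) keeps memory bounded while still batching database writes.

---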
…although it does not support top-level array wrapping. See #1662.

---
My use case is streaming a huge (hundreds of MB) JSON payload, containing potentially hundreds of thousands of entities, from a REST API call into a local SQLite database. We cannot afford the memory cost of inflating this payload into a single huge model object. Currently we do this with Gson's low-level streaming API.

---
There's a prototype in #1691. It supports 'plain' objects in the stream one after another, or objects wrapped in a top-level array. You can check it out and tell us whether it fits your needs.
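For reference, a sketch of what usage looks like in the shape this prototype later shipped as (the experimental `Json.decodeToSequence`, JVM-only; `Item` is a hypothetical element type):

```kotlin
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.DecodeSequenceMode
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.decodeToSequence
import java.io.File

@Serializable
data class Item(val id: Int) // hypothetical element type

@OptIn(ExperimentalSerializationApi::class)
fun readAll(path: String) {
    File(path).inputStream().use { stream ->
        // Lazily decodes one Item at a time; AUTO_DETECT handles both
        // whitespace-separated objects and a top-level array.
        Json.decodeToSequence<Item>(stream, DecodeSequenceMode.AUTO_DETECT)
            .forEach { item -> println(item) }
    }
}
```

---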
This is a meta issue for collecting various use cases for designing an API that will allow actually 'streaming' objects from an `InputStream`, instead of decoding a single one using `Json.decodeFromStream`.

Please describe your use case in detail: whether it is an HTTP long-polling stream or WebSocket, a first-party or third-party API, or just a very large file.