Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support decoding partial streams #1789

Closed
martinbonnin opened this issue Dec 7, 2021 · 7 comments
Closed

Support decoding partial streams #1789

martinbonnin opened this issue Dec 7, 2021 · 7 comments
Labels

Comments

@martinbonnin
Copy link
Contributor

martinbonnin commented Dec 7, 2021

The following test:

  @Serializable
  class SimpleObject(val name: String)

  @Test
  fun testDecode() {
    val inputStream  = """
    {
      "name": "Object1"
    }
    {
      "name": "Object2"
    }
    """.trimIndent().byteInputStream()
    println("1")
    println(Json.decodeFromStream<SimpleObject>(inputStream).name)
    println("2")
    println(Json.decodeFromStream<SimpleObject>(inputStream).name)
  }

fails with:

Unexpected JSON token at offset 25: Expected EOF after parsing, but had { instead
JSON input: kotlinx.serialization.json.internal.ArrayAsSequence@2c4ca0f9
kotlinx.serialization.json.internal.JsonDecodingException: Unexpected JSON token at offset 25: Expected EOF after parsing, but had { instead
JSON input: kotlinx.serialization.json.internal.ArrayAsSequence@2c4ca0f9
	at app//kotlinx.serialization.json.internal.JsonExceptionsKt.JsonDecodingException(JsonExceptions.kt:24)
	at app//kotlinx.serialization.json.internal.JsonExceptionsKt.JsonDecodingException(JsonExceptions.kt:32)
	at app//kotlinx.serialization.json.internal.AbstractJsonLexer.fail(AbstractJsonLexer.kt:524)
	at app//kotlinx.serialization.json.internal.AbstractJsonLexer.fail$default(AbstractJsonLexer.kt:523)
	at app//kotlinx.serialization.json.internal.AbstractJsonLexer.expectEof(AbstractJsonLexer.kt:163)
	at app//kotlinx.serialization.json.JvmStreamsKt.decodeFromStream(JvmStreams.kt:66)

Is there a specific reason to mandate a trailing EOF? Being able to parse partial streams would be nice. For an example to extract a Json from a markdown file or to read an array of objects without comas like the example above.

PS: there could be a decodeFromStreamFully() that also closes the inputStream

@sandwwraith
Copy link
Member

Behaviour of decodeFromStream reflects decodeFromString, which also requires fully-formed string with no dangling characters.

You can try to use decodeToSequence if you need consume stream lazily or partially, I think it would suite your needs

@martinbonnin
Copy link
Contributor Author

Behaviour of decodeFromStream reflects decodeFromString, which also requires fully-formed string with no dangling characters.

Any reason decodeFromStream doesn't close the inputStream in that case? That'd save potential leaks if the caller forgets to wrap in .use {}?

You can try to use decodeToSequence if you need consume stream lazily or partially, I think it would suite your needs

Yup, that'd work for the use case above, thanks 👍 .

@sandwwraith
Copy link
Member

reason decodeFromStream doesn't close the inputStream in that case

Most of the frameworks don't do that either. The general agreement is that it is the stream creator's responsibility to close it unless there are some strong objections.

@martinbonnin
Copy link
Contributor Author

martinbonnin commented Dec 8, 2021

Most of the frameworks don't do that either. The general agreement is that it is the stream creator's responsibility to close it unless there are some strong objections

I agree it's fine to have the caller close the stream. My point is that if it's the case, it feels weird enforcing a trailing EOF.

For an example, Moshi behaves differently for String and BufferedSource. Because the caller owns the BufferedSource (and closes it), the deserialization lib shouldn't make any assumption about how the stream is used. A caller might want to resume where the deserialization stopped.

The following code works and outputs both objects:

  @Test
  fun deserialize() {
    val value  = """
    {
      "name": "Object1"
    }
    {
      "name": "Object2"
    }
    """.trimIndent()
    val buffer = Buffer().writeUtf8(value)
    val adapter = Moshi.Builder().build().adapter(SimpleObject::class.java)
    println("1")
    println(adapter.fromJson(buffer))
    println("2")
    println(adapter.fromJson(buffer))
  }

@qwwdfsad
Copy link
Collaborator

qwwdfsad commented Dec 8, 2021

I agree it's fine to have the caller close the stream. My point is that if it's the case, it feels weird enforcing a trailing EOF.

The key reason here is to InputStream being the wrong abstraction.

In order for JSON decoding to be efficient (or, at least, to be not terrible), IS has to be read in a buffered manner, otherwise, all the decoding does is round-tripping between InputStream.read() calls byte-by-byte.

As soon as one starts buffering, there is no way out -- when the JSON object is read, we cannot leave the stream as is, because we could've read the piece of the next object that will be lost for good. IS neither provides an API "read to this buffer without consuming it + commit the offset" nor "return this N bytes back to the stream".

Taking this into account, we basically have three options:

  • Do nothing. This is the worst one, as it leaves IS completely unusable for further reads
  • Close it when the decoding is done. It may or may not be convenient, e.g. because the API can be in the form "here's the IS that has a well-formed JSON object. For the next one ask me explicitly for more and do seek"
  • Just ensure that EOF is reached to let the owner of the stream decide what to do

@martinbonnin
Copy link
Contributor Author

The key reason here is to InputStream being the wrong abstraction.

Aaaaah right, good call 👍

Taking this into account, we basically have three options

Can we have a 4th option that takes a okio.BufferedSource as a parameter just like Moshi? That might even help with MPP streaming?

@qwwdfsad
Copy link
Collaborator

qwwdfsad commented Dec 8, 2021

Can we have a 4th option that takes a okio.BufferedSource as a parameter just like Moshi? That might even help with MPP streaming?

That's something we're planning to work on

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants