Serializer to get the raw json value of a key? #1058

Open
tmm1 opened this issue Sep 9, 2020 · 12 comments

@tmm1

tmm1 commented Sep 9, 2020

What is your use-case and why do you need this feature?

I am looking for a way to deserialize a specific field in my json into a raw byte array containing that field value's json sub-document.

Basically my json documents have large/complex json sub-trees that I would like to avoid parsing to save cpu/allocations. But I still need the value so I can re-create the original json if needed.

In golang, for example, this can be achieved with json.RawMessage: https://golang.org/pkg/encoding/json/#RawMessage

In gson, a type adapter can be used to regenerate json during parsing. This is not particularly cpu/gc efficient, but it works: google/gson#1368

In moshi, there is work being done to be able to skip over the value and consume it into a raw value field: square/moshi#675
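To make the request concrete: `json.RawMessage`-style deferred parsing boils down to finding where a value starts and ends by tokenizing, then slicing the original text instead of building objects. The following is a plain-Kotlin sketch using only the standard library. It is not kotlinx.serialization API, and the helper names `skipWhitespace`, `skipString`, `skipValue` and `rawValueOf` are hypothetical. It only inspects a top-level object, assumes keys contain no escape sequences, and deliberately does not validate the skipped JSON.

```kotlin
// Hypothetical helpers; standard library only.

/** Index of the first non-whitespace character at or after [from]. */
fun skipWhitespace(s: String, from: Int): Int {
    var i = from
    while (i < s.length && s[i].isWhitespace()) i++
    return i
}

/** [from] points at an opening quote; returns the index just past the closing quote. */
fun skipString(s: String, from: Int): Int {
    var i = from + 1
    while (i < s.length) {
        when (s[i]) {
            '\\' -> i += 2 // jump over the escaped character
            '"' -> return i + 1
            else -> i++
        }
    }
    error("Unterminated string starting at $from")
}

/** [from] points at the first character of a value; returns the index just past it. */
fun skipValue(s: String, from: Int): Int {
    var i = from
    return when (s[i]) {
        '"' -> skipString(s, i)
        '{', '[' -> {
            // Track nesting depth only; strings are skipped whole so brackets inside them don't count.
            var depth = 0
            while (i < s.length) {
                when (s[i]) {
                    '"' -> { i = skipString(s, i); continue }
                    '{', '[' -> depth++
                    '}', ']' -> if (--depth == 0) return i + 1
                }
                i++
            }
            error("Unterminated value starting at $from")
        }
        else -> {
            // Number, true, false or null: advance to the next delimiter.
            while (i < s.length && s[i] !in ",]} \t\r\n") i++
            i
        }
    }
}

/** Raw text of the value stored under [key] in a top-level JSON object, or null if absent. */
fun rawValueOf(json: String, key: String): String? {
    var i = skipWhitespace(json, 0)
    require(json[i] == '{') { "Expected a JSON object" }
    i = skipWhitespace(json, i + 1)
    while (json[i] != '}') {
        val keyEnd = skipString(json, i)
        val name = json.substring(i + 1, keyEnd - 1) // assumption: keys contain no escapes
        i = skipWhitespace(json, keyEnd)
        require(json[i] == ':') { "Expected ':' at $i" }
        val valueStart = skipWhitespace(json, i + 1)
        val valueEnd = skipValue(json, valueStart)
        if (name == key) return json.substring(valueStart, valueEnd)
        i = skipWhitespace(json, valueEnd)
        if (json[i] == ',') i = skipWhitespace(json, i + 1)
    }
    return null
}
```

Only the matched slice is allocated: `rawValueOf("""{"id": 7, "tree": {"a": [1, 2]}}""", "tree")` returns the text `{"a": [1, 2]}` exactly as it appeared, whitespace included. Skipping without validation is what keeps it cheap; that trade-off is the one raised later in the thread.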

Describe the solution you'd like

I'm not familiar enough with the kotlinx.serialization APIs to know if there is already a way to do this, or if it can be implemented with a custom serializer. Any pointers would be appreciated!

tmm1 added the feature label Sep 9, 2020
@elizarov
Contributor

elizarov commented Sep 9, 2020

Please, check out Json Elements: https://github.com/Kotlin/kotlinx.serialization/blob/master/docs/json.md#json-elements
Does it do what you are looking for?

@sandwwraith
Member

If I understood correctly, you want to save some part of JSON to a String (RawJson) property so it would be parsed later? We do not currently support this concept. JsonElement is an untyped representation that does not map onto classes, although it still performs parsing to check that your JSON is valid.
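A minimal sketch of the JsonElement route suggested above, assuming kotlinx-serialization-json 1.x on the classpath (`subtreeAsText` is a hypothetical helper). The whole document is parsed into a tree and the chosen subtree is turned back into JSON text with `toString()`. This recovers the text, but every node of the subtree is still allocated, and the result is re-serialized in compact form rather than copied byte-for-byte from the input.

```kotlin
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.jsonObject

// Hypothetical helper: returns the subtree under `key` as compact JSON text.
// Json.parseToJsonElement first builds the entire document as
// JsonObject/JsonArray/JsonPrimitive nodes; toString() then re-serializes one subtree.
fun subtreeAsText(document: String, key: String): String? {
    val root = Json.parseToJsonElement(document).jsonObject
    return root[key]?.toString()
}
```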

@tmm1
Author

tmm1 commented Sep 9, 2020

> If I understood correctly, you want to save some part of JSON to a String (RawJson) property so it would be parsed later?

Yes, exactly. I want to defer parsing for parts of the document.

JsonElement is not a good fit because it still parses and creates objects. So the cpu/memory benefits of lazy parsing are lost.

How does kotlinx.serialization handle unknown keys that are ignored when deserializing into an object? Are the keys skipped during or after parsing? (I'm wondering what the CPU/allocation overhead is in cases where keys are ultimately ignored.)

@sandwwraith
Member

The unknown keys are skipped without parsing (tokenizing only). However, the skipped string is not saved anywhere, so supporting such a feature would require some additional work.

qwwdfsad added the json label Sep 13, 2020
@qwwdfsad
Collaborator

The feature seems like a reasonable addition, though it still has some open questions.

> Are the keys skipped during or after parsing?

Could you please elaborate on your use case here? "Put all unknown keys in a separate String property containing a valid JSON string" and "treat a specifically marked property not as a simple String, but as valid JSON encoded in a String" are completely different approaches.

> JsonElement is not a good fit because it still parses and creates objects. So the cpu/memory benefits of lazy parsing are lost.

I wonder whether there are benchmarks (or maybe you have a relevant story to share?) showing that the performance boost here is significant. Even without allocating JsonElement instances, the parser still has to 1) parse the JSON and extract the relevant sub-object, and 2) ensure that the whole sub-object is valid JSON. The second part is probably the slowest in the whole JSON decoding process, so I'm really interested in knowing how big the performance improvement is.

qwwdfsad self-assigned this Sep 13, 2020
qwwdfsad added a commit that referenced this issue Sep 13, 2020
Most of the users only use its stable methods and should not implement it directly, while we still want to have an opportunity to evolve it in the future, add new methods etc. E.g. the potential future addition could be 'decodeRawJson(): String' method for #1058
@tmm1
Author

tmm1 commented Sep 13, 2020

"Treat specifically marked property not as simple String, but as a valid JSON encoded in String"

This is what I'm interested in and what is implemented by the other examples I provided.

I have one specific key in my json that contains a huge json subtree, with thousands of objects and several layers of nesting. I don't want to create these thousands of objects on every parse, because it leads to severe GC pressure on many Android devices.

@qwwdfsad
Collaborator

qwwdfsad commented Sep 14, 2020

Thanks for the clarification and your input!

It's not something we are going to do right now (at least not until version 1.1.0), but thanks to your feedback, I've left the possibility to add this functionality in a backwards-compatible way both for custom serializers and regular JSON usages.
Let's see how it goes in Moshi and the demand on that.

Design idea: instead of using a @RawJson annotation, introduce an inline class RawString(value: String) with its own custom serializer to provide better type safety and to emphasize the user's intention in the type.
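One way to approximate this design with today's public API is sketched below. It is not a library feature: `RawJson` and `RawJsonValueSerializer` are hypothetical names. Assumptions: kotlinx-serialization-json 1.5.0 or newer (for the experimental `JsonUnquotedLiteral`), and a `STRING` primitive descriptor being acceptable for your use. Decoding still builds a `JsonElement` tree, so it does not deliver the CPU/allocation savings this issue asks for; it only provides the typed wrapper plus verbatim output on encoding.

```kotlin
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.KSerializer
import kotlinx.serialization.descriptors.PrimitiveKind
import kotlinx.serialization.descriptors.PrimitiveSerialDescriptor
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.JsonDecoder
import kotlinx.serialization.json.JsonEncoder
import kotlinx.serialization.json.JsonUnquotedLiteral

// Hypothetical wrapper type: a JSON sub-document kept as text.
@JvmInline
value class RawJson(val json: String)

// Hypothetical serializer; only works with the Json format.
object RawJsonValueSerializer : KSerializer<RawJson> {
    // Assumption: a STRING descriptor is acceptable; there is no dedicated "raw JSON" kind.
    override val descriptor: SerialDescriptor =
        PrimitiveSerialDescriptor("example.RawJson", PrimitiveKind.STRING)

    override fun deserialize(decoder: Decoder): RawJson {
        val jsonDecoder = decoder as? JsonDecoder ?: error("RawJson can only be read from JSON")
        // Still parses the value into a JsonElement tree; toString() re-emits it compactly.
        return RawJson(jsonDecoder.decodeJsonElement().toString())
    }

    @OptIn(ExperimentalSerializationApi::class)
    override fun serialize(encoder: Encoder, value: RawJson) {
        val jsonEncoder = encoder as? JsonEncoder ?: error("RawJson can only be written to JSON")
        // Emitted verbatim without quotes; the caller is responsible for it being valid JSON.
        jsonEncoder.encodeJsonElement(JsonUnquotedLiteral(value.json))
    }
}
```

In a model class it would be attached with `@Serializable(with = RawJsonValueSerializer::class) val tree: RawJson`. `JsonUnquotedLiteral` rejects the string `"null"`, so a raw value of `null` would need separate handling.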

@ankushg
Contributor

ankushg commented Sep 21, 2021

@qwwdfsad If this functionality were something I'd be interested in contributing, do you have any pointers on where to start?

@brendan-gero-humanetix

brendan-gero-humanetix commented Dec 1, 2021

@qwwdfsad

I'd like to add that there's a slightly different use case that I have, which is preventing me from switching to kotlinx.serialization. I have a situation where I would like to store the sub-object, as a JSON string, in a database, but I also want to deserialise it to examine its contents. Without RawJson, this means deserialising the whole JSON object, and then reserialising the sub-object for storage. Similarly, when serving the data again (potentially as part of a collection), I'd need to deserialise the sub-object before serialising the full object for output. I might be a bit naive, not having delved into the specifics of how this all works, but to me this seems to be a bit redundant. The string is already there as a substring of the original input, or will become a substring of the output.

I've tried to achieve this through a custom serializer, a bit like this:

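```kotlin
import kotlinx.serialization.SerializationException
import kotlinx.serialization.builtins.serializer
import kotlinx.serialization.encodeToString
import kotlinx.serialization.json.*

// `polymorphicSerialiser` (my configured Json instance) and `BaseClass` are defined elsewhere.
object RawJsonSerializer : JsonTransformingSerializer<String>(String.serializer()) {

    // Decoding: turn the JSON sub-object into a string primitive by
    // decoding it into BaseClass and re-encoding it.
    override fun transformDeserialize(element: JsonElement): JsonElement {
        if (element !is JsonObject) {
            throw SerializationException("Expected schedule object")
        }

        return JsonPrimitive(
            polymorphicSerialiser.encodeToString(polymorphicSerialiser.decodeFromJsonElement<BaseClass>(element))
        )
    }

    // Encoding: parse the stored string back into an object so it is written inline.
    override fun transformSerialize(element: JsonElement): JsonElement {
        if (element !is JsonPrimitive || !element.isString) {
            throw SerializationException("Expected schedule string")
        }
        return polymorphicSerialiser.parseToJsonElement(element.content).jsonObject
    }
}
```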

but I found that for large collections of data, this ended up slower than using Jackson with the JsonRawValue annotation. Is there a better way to achieve this?
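If the goal is only to keep the sub-object as text, a lighter variant of the transformer above could skip the `BaseClass` decode/encode round trip and use the `JsonElement` the transformer already receives. This is a sketch that assumes the sub-document does not need to be validated against `BaseClass` at this point; `JsonAsStringSerializer` is a hypothetical name. It still materializes a `JsonElement` tree in both directions, so whether it closes the gap with Jackson's `@JsonRawValue` would need measuring.

```kotlin
import kotlinx.serialization.builtins.serializer
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.JsonElement
import kotlinx.serialization.json.JsonPrimitive
import kotlinx.serialization.json.JsonTransformingSerializer
import kotlinx.serialization.json.jsonPrimitive

// Hypothetical variant: keeps a JSON sub-document in a String property
// without decoding it into a model class.
object JsonAsStringSerializer : JsonTransformingSerializer<String>(String.serializer()) {
    // Reading: the sub-document arrives as a JsonElement; store its compact text.
    override fun transformDeserialize(element: JsonElement): JsonElement =
        JsonPrimitive(element.toString())

    // Writing: parse the stored text back into a tree so it is emitted inline, not as a quoted string.
    override fun transformSerialize(element: JsonElement): JsonElement =
        Json.parseToJsonElement(element.jsonPrimitive.content)
}
```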

@chakflying

I'm following this guide to implement a fallback for deserializing enums. However, I would also like to log the raw value when decoding fails. Is there any way to get it from the decoder?

@sandwwraith
Member

Unfortunately, we do not support retrieving raw values yet.

Also relevant: #1405
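For the enum-fallback question specifically, one workaround is to decode the value as a plain string first and map it to the enum yourself, so the raw text is available for logging. A sketch under stated assumptions: the enum is encoded as a JSON string, and `Color`, `FallbackColorSerializer` and the `onUnknown` callback are hypothetical. This yields the raw string value only, not the raw JSON of arbitrary values that this issue is about.

```kotlin
import kotlinx.serialization.KSerializer
import kotlinx.serialization.descriptors.PrimitiveKind
import kotlinx.serialization.descriptors.PrimitiveSerialDescriptor
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.json.Json

// Hypothetical enum with a catch-all entry.
enum class Color { RED, GREEN, UNKNOWN }

// Hypothetical serializer: reads the value as a string first, so the raw text
// is still available when it does not match any enum entry.
class FallbackColorSerializer(
    private val onUnknown: (raw: String) -> Unit,
) : KSerializer<Color> {
    override val descriptor: SerialDescriptor =
        PrimitiveSerialDescriptor("example.FallbackColor", PrimitiveKind.STRING)

    override fun serialize(encoder: Encoder, value: Color) {
        encoder.encodeString(value.name)
    }

    override fun deserialize(decoder: Decoder): Color {
        val raw = decoder.decodeString() // assumes the enum is encoded as a JSON string
        return enumValues<Color>().firstOrNull { it.name == raw }
            ?: Color.UNKNOWN.also { onUnknown(raw) }
    }
}
```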

fviernau added a commit to oss-review-toolkit/ort that referenced this issue Jul 12, 2022
The server does return vulnerabilities which do not have a severity
value in the dedicated property. The unspecified `databaseSpecific`
property oftentimes holds a primitive `severity` property with values
such as `[HIGH, MEDIUM, LOW]`. Use these values as a fallback, as
they provide more indication than a `null` value.

Note: The data model of 'osv/client' currently uses subtypes of
JsonElement to expose a couple of unspecified JSON objects as
properties. Accessing these requires the client code to add
'kotlinx.serialization' as dependency which is not nice. A solution to
that would be to use "raw" string values containing the JSON, which is
unfortunately not yet possible but may become so in the future, see
[1][2][3].

So, for now add 'kotlinx.serialization' as dependency to the advisor in
order to access the property and leave a FIXME comment as reminder.

[1] Kotlin/kotlinx.serialization#1298
[2] Kotlin/kotlinx.serialization#1405
[3] Kotlin/kotlinx.serialization#1058

Signed-off-by: Frank Viernau <[email protected]>
@iseki0
Contributor

iseki0 commented Feb 20, 2023

> Thanks for the clarification and your input!
>
> It's not something we are going to do right now (at least until 1.1.0 version), but thanks to your feedback, I've left the possibility to add this functionality in a backwards-compatible way both for custom serializers and regular JSON usages. Let's see how it goes in Moshi and the demand on that.
>
> Design idea: instead of using @RawJson annotation, introduce an inline class RawString(value: String) with its own custom serializer to provide a better type-safety and emphasis user intention in a type

Is there currently any way to achieve this? The documentation says I must provide a correct descriptor, but in this case I don't know which descriptor is suitable.
We use the JSON format in a bad way: deserializing the whole tree is impossible in my case (it uses too much memory). I have to hand-write a deserializer that accesses the JSON tokens and builds the structure in my own way, so I need kotlinx.serialization to just "skip" the relevant tokens and leave them to my own code.

Projects
None yet
Development

No branches or pull requests

8 participants