How to serialize BigInteger and BigDouble? #1051

Closed
MMairinger opened this issue Sep 8, 2020 · 13 comments · Fixed by #2041

@MMairinger

I currently need to serialize numbers with many decimal digits (30 or more, to be precise). To deserialize a JSON file with that many decimal places I used Java's BigInteger and BigDecimal types, which works fine. The problem arises when I want to serialize such a value: it is serialized as an Integer or Double respectively, which truncates and rounds the actual value.

One example value is this, received in a JSON file.
0.083532162010669708251953125000
After deserializing, I have exactly the value above as a JsonLiteral. But when I serialize this value, I get:
0.08353216201066971
Which is not my desired result.
A snippet from my unit tests showing the difference between expected and actual:
[image: unit test output comparing the expected and actual values]

My questions are: how should I serialize BigIntegers and BigDecimals, and will there be support for these data types in the future?
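The rounding described here can be reproduced without any serialization library. The following is a minimal sketch using only java.math.BigDecimal; the names `original`, `exact` and `viaDouble` are made up for illustration. It shows that the String constructor keeps every digit, while a detour through Double does not.

```kotlin
import java.math.BigDecimal

// The literal from the report, kept as text so no digits are lost.
val original = "0.083532162010669708251953125000"

// The String constructor keeps every digit, including trailing zeros.
val exact: BigDecimal = BigDecimal(original)

// Going through Double keeps only about 17 significant digits.
val viaDouble: String = exact.toDouble().toString()

fun main() {
    println(exact.toPlainString()) // same text as `original`
    println(viaDouble)             // a much shorter, rounded representation
}
```

Any encoder that funnels the value through Double or Long will show the same loss, whatever type held the value beforehand.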

@elizarov
Contributor

elizarov commented Sep 8, 2020

Can you, please, provide the example code you use to serialize and deserialize your data between JSON and BigIntegers?

@MMairinger
Author

MMairinger commented Sep 9, 2020

var deserializedJson = json.decodeFromString(JsonClassSerializer, jsonToParse)
var convertedKotlin = convertJsonToKotlin(deserializedJson)
var convertedJson = convertKotlinToJson(convertedKotlin)
var serializedJson = generateJsonOutputFromKotlin(convertedJson, json)

The above sequence is what I went through. I took a JSON file as String input, configured my Json object and then decoded that input into an object instance of some JsonClass containing properties like val x: JsonArray and so on.
Then I convert that object into a different type that contains usable data types, such as val x: List<Map<String, Any?>> and the like.
Then that instance is converted back into a JsonClass instance, because we can't serialize Any types.
Finally the JSON output is generated from that JsonClass, and this is where the problem arises, because the standard serializer for JsonArray writes Numbers as either Long or Double.

The initial deserialization is correct, the Json -> Kotlin conversion is correct, and transforming it back from Kotlin -> Json is still correct. But the standard serializer you get when you annotate your class with @Serializable writes BigDecimal and BigInteger values as Double and Long respectively.

To clarify, I did not use a custom serializer for anything here. I simply let kotlinx handle the serialization of the intermediate type and then converted that to the actual types.

This is how I converted a JsonLiteral to BigInteger and so on:
internal fun convertJsonPrimitive(jsonPrimitive: JsonPrimitive): Any? {
    return if (jsonPrimitive.isString)
        jsonPrimitive.contentOrNull
    else
        jsonPrimitive.booleanOrNull
            ?: if (jsonPrimitive.longOrNull != null)
                // if the casted value is the same as the raw json data, then the data is within double/long range
                if (jsonPrimitive.longOrNull.toString() != (jsonPrimitive.content))
                    BigInteger(jsonPrimitive.content)
                else
                    jsonPrimitive.long
            else if (jsonPrimitive.doubleOrNull != null)
                if (jsonPrimitive.doubleOrNull.toString() != (jsonPrimitive.content))
                    BigDecimal(jsonPrimitive.content)
                else
                    jsonPrimitive.double
            else
                JsonNull
}

Edit: The code block seems to be displayed weirdly.

@sandwwraith
Member

Do you have any special requirement to convert your input to JsonElement first and only then to a Kotlin class? If not, you can skip the JsonElement step and parse JSON to a Kotlin class directly via decodeFromString. You'll need to write a custom serializer for big numbers; however, it will be relatively simple: only decodeString/encodeString calls. See the sample here: https://github.com/Kotlin/kotlinx.serialization/blob/master/docs/serializers.md#primitive-serializer

@MMairinger
Author

Yes, in fact, converting them directly was my initial plan until I saw that you cannot directly serialize Any types. Since you answered in an old issue that we should use JsonObject to serialize Map<String, Any>, I did the same thing for every other type that had Any in it, i.e. JsonArray for List and so on.

I also fiddled around with custom serializers, but eventually gave up since I couldn't get them to work. The easiest solution for me was to simply let kotlinx do the deserialization and then convert the result to standard Kotlin types with simple methods. The same goes for serialization.

Thanks for the answer, so I need a custom serializer for that.

Is it possible for the deserialization/serialization of BigInts/Doubles to become a feature in the future so we no longer need a custom serializer for very long numbers?

@tadfisher
Contributor

As it is, there is no way to encode a JSON number outside of Kotlin's primitive types, which means that it is impossible to encode a decimal value without inducing precision loss.

  • If you use encodeString, StreamingJsonEncoder will quote the value, regardless of the descriptor kind.
  • If you use encodeJsonElement with JsonPrimitive, the value's string representation will be converted using String.toDoubleOrNull(), not only inducing precision loss but also formatting the value using scientific notation (e.g. 1.11222333444E11).
  • There is no mechanism to encode an unquoted string value that I can find, using either Encoder or JsonEncoder.

Bear in mind that ECMA-404 does not specify that JSON numbers must represent IEEE-754 values; they are simply strings of digits with optional fraction and exponent parts. From json.org:

number
    integer fraction exponent

integer
    digit
    onenine digits
    '-' digit
    '-' onenine digits

digits
    digit
    digit digits

digit
    '0'
    onenine

onenine
    '1' . '9'

fraction
    ""
    '.' digits

exponent
    ""
    'E' sign digits
    'e' sign digits

sign
    ""
    '+'
    '-'

JsonEncoder should expose a mechanism to write an unquoted JSON number, which would solve this particular issue and allow for the use of non-Number types.
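To make the grammar point concrete, here is a minimal sketch that transcribes the json.org number grammar quoted above into a regular expression. The names `jsonNumber` and `isJsonNumber` are hypothetical helpers, not part of kotlinx.serialization. The grammar puts no bound on the number of digits, so a 30-digit fraction or a 30-digit integer is a valid JSON number.

```kotlin
// Regex transcription of the json.org number grammar:
// optional '-', then "0" or a non-zero-led digit run,
// then an optional fraction and an optional exponent.
val jsonNumber = Regex("""-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?""")

// True when the whole text is a lexically valid JSON number of any length.
fun isJsonNumber(text: String): Boolean = jsonNumber.matches(text)

fun main() {
    println(isJsonNumber("0.083532162010669708251953125000")) // true
    println(isJsonNumber("123456789012345678901234567890"))   // true
    println(isJsonNumber("01"))                               // false: leading zero
    println(isJsonNumber("1."))                               // false: empty fraction
}
```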

@samuelchou

samuelchou commented Jul 4, 2022

I found that the isLenient configuration, combined with the built-in String constructor of BigDecimal, can solve the precision loss problem here.

With a serializer to deserialize String to BigDecimal:

object BigDecimalSerializer: KSerializer<BigDecimal> {
    override fun deserialize(decoder: Decoder): BigDecimal {
        return decoder.decodeString().toBigDecimal()
    }

    override fun serialize(encoder: Encoder, value: BigDecimal) {
        encoder.encodeString(value.toPlainString())
    }

    override val descriptor: SerialDescriptor
        get() = PrimitiveSerialDescriptor("BigDecimal", PrimitiveKind.STRING)
}

And a Json instance with isLenient setting active:

val json = Json { isLenient = true }

This has the advantage of accepting a String without the " marks (i.e. a JSON number can be treated as a String).

We can deserialize a data class like this

@Serializable
data class TestDateAndValue(
    val date: String,
    @Serializable(with = BigDecimalSerializer::class)
    val value: BigDecimal,
    val anotherValue: Double,
)

WITHOUT precision loss:

@Test
fun `kotlin Json Parse`() {
    val jsonString = """
        {
            "date": "20220704",
            "value": 1234.56789123456789,
            "anotherValue": 123.456789
        }
    """.trimIndent()
    val parse = json.decodeFromString<TestDateAndValue>(jsonString)
    assertEquals("20220704", parse.date)
    assertEquals(BigDecimal("1234.56789123456789"), parse.value)
    assertEquals(123.456789, parse.anotherValue)
}

And it doesn't interfere with the deserialization of other numbers: as shown above, anotherValue is deserialized without problems.

It looks perfect to me. Just sharing it with you guys :)

@tadfisher
Contributor

tadfisher commented Jul 4, 2022

@samuelchou The problem is in the encoding; even with the lenient flag, serializing will quote the value, which the server then needs to support. There's still no way to encode a BigDecimal as an arbitrary JSON number: StreamingJsonEncoder doesn't have an option to emit unquoted strings or otherwise arbitrary content; and TreeJsonEncoder uses JsonPrimitive for encoding, which calls toDouble under the hood.

@samuelchou

samuelchou commented Jul 5, 2022

Oh, sorry for my misunderstanding, and thanks for the explanation. I now see the problem.

@pdvrieze
Contributor

pdvrieze commented Jul 5, 2022

There is a bit of a challenge here, as JSON does not explicitly restrict number size. It suggests that 64-bit numbers should be supported, but even that is implementation-specific.
On the other hand, in cases where the JSON is generated externally, it would be good if the system could deserialize numbers of arbitrary size.

@pschichtel
Contributor

I've written this to parse BigIntegers (and similarly BigDecimals) from either JSON numbers or strings, without losing precision.

object BigIntegerSerializer : KSerializer<BigInteger> {
    override val descriptor: SerialDescriptor = PrimitiveSerialDescriptor("BigInteger", PrimitiveKind.INT)

    override fun serialize(encoder: Encoder, value: BigInteger) = encoder.encodeString(value.toString())

    override fun deserialize(decoder: Decoder): BigInteger = BigInteger(decoder.decodeString())
}

object LenientBigIntegerSerializer : JsonTransformingSerializer<BigInteger>(BigIntegerSerializer) {
    override fun transformDeserialize(element: JsonElement): JsonElement {
        if (element is JsonPrimitive && !element.isString) {
            return JsonPrimitive(element.content)
        }
        return super.transformDeserialize(element)
    }

    override fun transformSerialize(element: JsonElement): JsonElement {
        if (element is JsonPrimitive && element.isString) {
            return JsonPrimitive(BigInteger(element.content))
        }
        return super.transformSerialize(element)
    }
}

@tadfisher
Contributor

@pschichtel Again, this issue is about serializing. Your code will still lose precision when encoding, because JsonPrimitive converts to Double under the hood.

@samuelchou

samuelchou commented Jul 12, 2022

If the problem is that the JSON serializer / JsonPrimitive cannot turn a String value into a Number value (i.e. cannot remove the "), how about generating a JsonObject, turning it into a String, and then using a Regex to strip the " (and parsing the result back into a JsonObject if needed)?

It might be tricky but I assume that would work.
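For illustration, that post-processing idea could look roughly like the sketch below. It is only a sketch under strong assumptions: `unquoteNumberField` is a hypothetical helper, it targets one named field, and it assumes the text `"key": "..."` never occurs inside another string value. An unrestricted replace would also unquote fields such as a date written as "20220704".

```kotlin
// Hypothetical helper: removes the quotes around the value of one named field
// when that value is a quoted, lexically valid JSON number.
fun unquoteNumberField(json: String, key: String): String {
    val pattern = Regex(
        "\"" + Regex.escape(key) + "\"\\s*:\\s*" +
            "\"(-?(?:0|[1-9][0-9]*)(?:\\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)\""
    )
    // Re-emit the key followed by the captured digits without quotes.
    return pattern.replace(json) { match -> "\"$key\":${match.groupValues[1]}" }
}

fun main() {
    val quoted = """{"date":"20220704","value":"1234.56789123456789"}"""
    println(unquoteNumberField(quoted, "value"))
}
```

Working on the serialized text keeps every digit, but it ties the output to string matching rather than to the encoder, which is why a proper hook in JsonEncoder remains the cleaner option.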

@aSemy
Contributor

aSemy commented Sep 30, 2022

I've created PR #2041, which will allow for accurate encoding and decoding of BigDecimals.

See this test for an example: https://github.com/Kotlin/kotlinx.serialization/blob/46a5ff60b21b85f0a1d98c66f4d077e86e405ea6/formats/json-tests/jvmTest/src/kotlinx/serialization/BigDecimalTest.kt

fred01 pushed a commit to fred01/kotlinx.serialization that referenced this issue Nov 24, 2022
This PR provides a new function for encoding raw JSON content, without quoting it as a string. This allows for encoding JSON numbers of any size or precision, so BigDecimal and BigInteger can be supported.

Fixes Kotlin#1051
Fixes Kotlin#1405

The implementation is similar to how unsigned numbers are handled.

JsonUnquotedLiteral() is a new function that allows creating literal JSON content.
Added val coerceToInlineType to JsonLiteral, so that JsonUnquotedLiteral could use encodeInline()
Defined val jsonUnquotedLiteralDescriptor as a 'marker', for use with encodeInline()
ComposerForUnquotedLiterals (based on ComposerForUnsignedNumbers) will 'override' the encoder when a JsonLiteral has the jsonUnquotedLiteralDescriptor marker, and will encode the content as a string without surrounding quotes.
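
With JsonUnquotedLiteral available, a BigDecimal serializer that writes a raw JSON number might look like the sketch below. It assumes a kotlinx-serialization-json release that includes this PR (1.5.0 or later, as far as I can tell) and that the API is still marked experimental. `RawBigDecimalSerializer` is a name chosen here for illustration.

```kotlin
import java.math.BigDecimal
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.KSerializer
import kotlinx.serialization.descriptors.PrimitiveKind
import kotlinx.serialization.descriptors.PrimitiveSerialDescriptor
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.JsonDecoder
import kotlinx.serialization.json.JsonEncoder
import kotlinx.serialization.json.JsonUnquotedLiteral
import kotlinx.serialization.json.jsonPrimitive

// Illustrative serializer (the name is an assumption, not a library class).
@OptIn(ExperimentalSerializationApi::class)
object RawBigDecimalSerializer : KSerializer<BigDecimal> {
    override val descriptor: SerialDescriptor =
        PrimitiveSerialDescriptor("java.math.BigDecimal", PrimitiveKind.DOUBLE)

    override fun serialize(encoder: Encoder, value: BigDecimal) {
        val text = value.toPlainString()
        // For JSON, emit the digits as an unquoted literal; otherwise fall back to a string.
        if (encoder is JsonEncoder) encoder.encodeJsonElement(JsonUnquotedLiteral(text))
        else encoder.encodeString(text)
    }

    override fun deserialize(decoder: Decoder): BigDecimal =
        // For JSON, read the raw token content so the value never passes through Double.
        if (decoder is JsonDecoder) BigDecimal(decoder.decodeJsonElement().jsonPrimitive.content)
        else BigDecimal(decoder.decodeString())
}

fun main() {
    val value = BigDecimal("0.083532162010669708251953125000")
    val encoded = Json.encodeToString(RawBigDecimalSerializer, value)
    val decoded = Json.decodeFromString(RawBigDecimalSerializer, encoded)
    println(encoded)
    println(decoded == value)
}
```

The same pattern should carry over to BigInteger by swapping the type and its String constructor.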