Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to decode numeric literals containing an exponent #2227

Merged
merged 19 commits into from
May 17, 2023

Conversation

xBaank
Copy link
Contributor

@xBaank xBaank commented Mar 12, 2023

Closes #2078

@sandwwraith
Copy link
Member

sandwwraith commented Apr 20, 2023

Hi, thanks for your PR, and sorry it has taken so long to get to it. I see you have some other commits beside yours, did you start your branch off master? Please rebase it on dev.

Regarding contents: this is really nice addition IMO, as json.org spec indeed permits exponent numbers. However, to fully support them, you also need to consider JsonElement parsing — e.g. conversion from JsonElement to a serializable class. (decodeFromJsonElement<SomeData>(parseToJsonElement(string))) As far as I see, it uses simple String.toLong() conversion and will also fail on numbers with exponent. To test behavior in all available modes, you may use parameterizedTest function from JsonTestBase — see its usages. Unfortunately we can't accept your PR without supporting both these modes, as it will make Json parsing inconsistent.

Copy link
Member

@sandwwraith sandwwraith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

@xBaank
Copy link
Contributor Author

xBaank commented Apr 20, 2023

did you start your branch off master? Please rebase it on dev.

I actually started it on dev but synced my branch from github and pulled the master changes accidentally

However, to fully support them, you also need to consider JsonElement parsing — e.g. conversion from JsonElement to a serializable class. (decodeFromJsonElement(parseToJsonElement(string))) As far as I see, it uses simple String.toLong() conversion and will also fail on numbers with exponent.

I will take a look on decoding from JsonElement to Type

@xBaank
Copy link
Contributor Author

xBaank commented Apr 20, 2023

I see you have some other commits beside yours, did you start your branch off master? Please rebase it on dev.

you mean creating another pull request to the dev branch with the master changes? or just rebase master into my branch with newest changes?

@sandwwraith
Copy link
Member

Your branch (with your-only commits) should start from the latest dev. You can force-push it to use the same pull request

@sandwwraith
Copy link
Member

You can drop unnecessary commits during interactive rebase

@xBaank
Copy link
Contributor Author

xBaank commented Apr 20, 2023

You can drop unnecessary commits during interactive rebase

ty, I didn't know

private val INT_NUMBERS_CHARS = arrayOf('1', '2', '3', '4', '5', '6', '7', '8', '9', '0', 'e', 'E', '+', '-')
internal fun String.toLongJson(): Long {
return toLongOrNull() ?: toDouble()
.takeIf { all { it in INT_NUMBERS_CHARS } }
Copy link
Contributor Author

@xBaank xBaank Apr 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should cases like this one be supported when decoding into a long? and if so, should it be truncated, rounded...?

Same here

@xBaank xBaank requested a review from sandwwraith April 26, 2023 08:33
}
internal fun String.toLongJsonOrNull(): Long? {
return toLongOrNull() ?: toDoubleOrNull()
?.takeIf { all { it in INT_NUMBERS_CHARS } }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think such code poses hidden problem: all iterates all symbols and it in iterates all chars in the INT_NUMBERS_CHARS array, so it results in a quadratic complexity. Moreover, it is just hard to read. I see several possibilities here:

  1. To simplify condition, I think it is possible just to check '.' !in it. All other non-digit and non-E signs should be prohibited by toDoubleOrNull(), I guess?
  2. A harder, but more correct change would be extract parsing logic from consumeNumericLiteral or to create StringJsonLexer in place so we wouldn't have two different code paths doing the same thing.

I think it is possible to also extract these takeIfs to a separate function since it doesn't matter if you use toDouble or toDoubleOrNull given that exception is anyway thrown at the end if needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think StringJsonLexer is the best solution

true -> 10.0.pow(exponentAccumulator.toDouble())
}

if(hasExponent) {
Copy link
Member

@sandwwraith sandwwraith May 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] formatting: space before opening brace (check other places too)

private inline fun <T> mapExceptionsToNull(f : () -> T) : T? {
return try { f()}
catch (e : JsonDecodingException) { null }
catch (e : IndexOutOfBoundsException) { null }
Copy link
Member

@sandwwraith sandwwraith May 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt IIOBE can happen here, because there is special EOF handling in the consumeNumericLiteral. (and generally, lexer is not allowed to throw it, because it must throw only JsonDecodingException). Do you have an example when it happens?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this test was failing on TREE mode, hasChar on consumeNumericLiteral doesn't update when the char is - | + | e | E

@Test
fun testInvalidNumber() = parametrizedTest {
assertFailsWithSerial("JsonDecodingException") { default.decodeFromString<Holder>("""{"id":-}""", it) }
assertFailsWithSerial("JsonDecodingException") { default.decodeFromString<Holder>("""{"id":+}""", it) }
assertFailsWithSerial("JsonDecodingException") { default.decodeFromString<Holder>("""{"id":--}""", it) }
assertFailsWithSerial("JsonDecodingException") { default.decodeFromString<Holder>("""{"id":1-1}""", it) }
assertFailsWithSerial("JsonDecodingException") { default.decodeFromString<Holder>("""{"id":0-1}""", it) }
assertFailsWithSerial("JsonDecodingException") { default.decodeFromString<Holder>("""{"id":0-}""", it) }
assertFailsWithSerial("JsonDecodingException") { default.decodeFromString<Holder>("""{"id":a}""", it) }
assertFailsWithSerial("JsonDecodingException") { default.decodeFromString<Holder>("""{"id":-a}""", it) }
}

@@ -315,6 +321,18 @@ public val JsonPrimitive.contentOrNull: String? get() = if (this is JsonNull) nu
private fun JsonElement.error(element: String): Nothing =
throw IllegalArgumentException("Element ${this::class} is not a $element")

private inline fun <T> mapExceptionsToNull(f : () -> T) : T? {
return try { f()}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatting: space before brace.

I recommend using IntelliJ IDEA auto-formatter, as it honors Kotlin's style guide


if (hasExponent) {
val doubleAccumulator = accumulator.toDouble() * calculateExponent(exponentAccumulator, isExponentPositive)
if(doubleAccumulator > Long.MAX_VALUE || doubleAccumulator < Long.MIN_VALUE) fail("Numeric value overflow")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fornatting: space after if


@Test
fun testExponentDecodingTruncatedDecimal() = parametrizedTest {
val decoded = Json.decodeFromString<SomeData>("""{ "count": -1E-1 }""", it)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just realized that it is incorrect to truncate such values. That's why the problem with 4.9E-324 happens in the first place - it is too small. We should not allow such values to be parsed to Long, or any value with non-zero fractional part after applying exponent. I think the easiest way to check it would be with doubleAccumulator.floor() == doubleAccumulator, as in here:

val canBeConverted = number.isFinite() && floor(number) == number

public val JsonPrimitive.intOrNull: Int? get() = content.toIntOrNull()
public val JsonPrimitive.intOrNull: Int?
get() =
mapExceptionsToNull { StringJsonLexer(content).consumeNumericLiteral().toInt().toLong().toInt() }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why there's .toInt().toLong().toInt()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why I did that

Copy link
Member

@sandwwraith sandwwraith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job, thank you!

@sandwwraith sandwwraith merged commit f76d073 into Kotlin:dev May 17, 2023
pdvrieze pushed a commit to pdvrieze/kotlinx.serialization that referenced this pull request Jul 4, 2023
JesusMcCloud pushed a commit to a-sit-plus/kotlinx.serialization that referenced this pull request Jul 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants