-
Notifications
You must be signed in to change notification settings - Fork 624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Json configuration flag to allow comments #2592
Merged
Merged
Changes from all commits
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
fa75b84
Add Json configuration flag to allow comments
sandwwraith 4c112bf
fixup api dump
sandwwraith 5754583
Apply suggestions from code review
sandwwraith 1475f95
~minor fixups
sandwwraith 926fd40
~extract isWhitespace
sandwwraith 3113c46
~invert if
sandwwraith 8cb5ef1
~add more tests after review
sandwwraith 51a86ed
Add withComments mode to SpecConformanceTest
sandwwraith File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
64 changes: 64 additions & 0 deletions
64
benchmark/src/jmh/kotlin/kotlinx/benchmarks/json/TwitterFeedCommentsBenchmark.kt
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,64 @@ | ||
/* | ||
* Copyright 2017-2024 JetBrains s.r.o. Use of this source code is governed by the Apache 2.0 license. | ||
*/ | ||
|
||
package kotlinx.benchmarks.json | ||
|
||
import kotlinx.benchmarks.model.* | ||
import kotlinx.serialization.json.* | ||
import org.openjdk.jmh.annotations.* | ||
import java.io.* | ||
import java.util.concurrent.* | ||
|
||
@Warmup(iterations = 7, time = 1) | ||
@Measurement(iterations = 7, time = 1) | ||
@BenchmarkMode(Mode.Throughput) | ||
@OutputTimeUnit(TimeUnit.SECONDS) | ||
@State(Scope.Benchmark) | ||
@Fork(2) | ||
open class TwitterFeedCommentsBenchmark { | ||
val inputBytes = TwitterFeedBenchmark::class.java.getResource("/twitter_macro.json").readBytes() | ||
private val input = inputBytes.decodeToString() | ||
private val inputWithComments = prepareInputWithComments(input) | ||
private val inputWithCommentsBytes = inputWithComments.encodeToByteArray() | ||
|
||
private val jsonComments = Json { ignoreUnknownKeys = true; allowComments = true; } | ||
private val jsonNoComments = Json { ignoreUnknownKeys = true; allowComments = false; } | ||
|
||
fun prepareInputWithComments(inp: String): String { | ||
val result = inp.lineSequence().map { s -> | ||
// "id", "in_...", "is_...", etc | ||
if (!s.trimStart().startsWith("\"i")) s else "$s // json comment" | ||
}.joinToString("\n") | ||
assert(result.contains("// json comment")) | ||
return result | ||
} | ||
|
||
@Setup | ||
fun init() { | ||
// Explicitly invoking both variants before benchmarking so we know that both parser implementation classes are loaded | ||
require("foobar" == jsonComments.decodeFromString<String>("\"foobar\"")) | ||
require("foobar" == jsonNoComments.decodeFromString<String>("\"foobar\"")) | ||
} | ||
|
||
// The difference with TwitterFeedBenchmark.decodeMicroTwitter shows if we slow down when both StringJsonLexer and CommentsJsonLexer | ||
// are loaded by JVM. Should be almost non-existent on modern JVMs (but on OpenJDK-Corretto-11.0.14.1 there is one. 17 is fine.) | ||
@Benchmark | ||
fun decodeMicroTwitter() = jsonNoComments.decodeFromString(MicroTwitterFeed.serializer(), input) | ||
|
||
// The difference with this.decodeMicroTwitter shows if we slow down when comments are enabled but no comments present | ||
// in the input. It is around 13% slower than without comments support, mainly because skipWhitespaces is a separate function | ||
// that sometimes is not inlined by JIT. | ||
@Benchmark | ||
fun decodeMicroTwitterCommentSupport() = jsonComments.decodeFromString(MicroTwitterFeed.serializer(), input) | ||
|
||
// Shows how much actual skipping of the comments takes: around 10%. | ||
@Benchmark | ||
fun decodeMicroTwitterCommentInData() = jsonComments.decodeFromString(MicroTwitterFeed.serializer(), inputWithComments) | ||
|
||
@Benchmark | ||
fun decodeMicroTwitterCommentSupportStream() = jsonComments.decodeFromStream(MicroTwitterFeed.serializer(), ByteArrayInputStream(inputBytes)) | ||
|
||
@Benchmark | ||
fun decodeMicroTwitterCommentInDataStream() = jsonComments.decodeFromStream(MicroTwitterFeed.serializer(), ByteArrayInputStream(inputWithCommentsBytes)) | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
165 changes: 165 additions & 0 deletions
165
formats/json-tests/commonTest/src/kotlinx/serialization/features/JsonCommentsTest.kt
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,165 @@ | ||
/* | ||
* Copyright 2017-2024 JetBrains s.r.o. Use of this source code is governed by the Apache 2.0 license. | ||
*/ | ||
|
||
package kotlinx.serialization.features | ||
|
||
import kotlinx.serialization.* | ||
import kotlinx.serialization.json.* | ||
import kotlin.test.* | ||
|
||
class JsonCommentsTest: JsonTestBase() { | ||
val json = Json(default) { | ||
allowComments = true | ||
allowTrailingComma = true | ||
} | ||
|
||
val withLenient = Json(json) { | ||
isLenient = true | ||
ignoreUnknownKeys = true | ||
} | ||
|
||
@Test | ||
fun testBasic() = parametrizedTest { mode -> | ||
val inputBlock = """{"data": "b" /*value b*/ }""" | ||
val inputLine = "{\"data\": \"b\" // value b \n }" | ||
assertEquals(StringData("b"), json.decodeFromString(inputBlock, mode)) | ||
assertEquals(StringData("b"), json.decodeFromString(inputLine, mode)) | ||
} | ||
|
||
@Serializable | ||
data class Target(val key: String, val key2: List<Int>, val key3: NestedTarget, val key4: String) | ||
|
||
@Serializable | ||
data class NestedTarget(val nestedKey: String) | ||
|
||
private fun target(key4: String): Target = Target("value", listOf(1, 2), NestedTarget("foo"), key4) | ||
|
||
@Test | ||
fun testAllBlocks() = parametrizedTest { mode -> | ||
val input = """{ /*beginning*/ | ||
/*before key*/ "key" /*after key*/ : /*after colon*/ "value" /*before comma*/, | ||
"key2": [ /*array1*/ 1, /*array2*/ 2, /*end array*/], | ||
"key3": { /*nested obj*/ "nestedKey": "foo"} /*after nested*/, | ||
"key4": "/*comment inside quotes is a part of value*/", | ||
/*before end*/ | ||
}""" | ||
assertEquals(target("/*comment inside quotes is a part of value*/"), json.decodeFromString(input, mode)) | ||
} | ||
|
||
@Test | ||
fun testAllLines() = parametrizedTest { mode -> | ||
val input = """{ //beginning | ||
//before key | ||
"key" // after key | ||
: // after colon | ||
"value" //before comma | ||
, | ||
"key2": [ //array1 | ||
1, //array2 | ||
2, //end array | ||
], | ||
"key3": { //nested obj | ||
"nestedKey": "foo" | ||
} , //after nested | ||
"key4": "//comment inside quotes is a part of value", | ||
//before end | ||
}""" | ||
assertEquals(target("//comment inside quotes is a part of value"), json.decodeFromString(input, mode)) | ||
} | ||
|
||
@Test | ||
fun testMixed() = parametrizedTest { mode -> | ||
sandwwraith marked this conversation as resolved.
Show resolved
Hide resolved
|
||
val input = """{ // begin | ||
"key": "value", // after | ||
"key2": /* array */ /*another comment */ [1, 2], | ||
"key3": /* //this is a block comment */ { "nestedKey": // /*this is a line comment*/ "bar" | ||
"foo" }, | ||
"key4": /* nesting block comments /* not supported */ "*/" | ||
/* end */}""" | ||
assertEquals(target("*/"), json.decodeFromString(input, mode)) | ||
} | ||
|
||
@Test | ||
fun testWeirdKeys() { | ||
val map = mapOf( | ||
"// comment inside quotes is a part of key" to "/* comment inside quotes is a part of value */", | ||
"/*key */" to "/* value", | ||
"/* key" to "*/ value" | ||
) | ||
val input = """/* before begin */ | ||
{ | ||
${map.entries.joinToString(separator = ",\n") { (k, v) -> "\"$k\" : \"$v\"" }} | ||
} // after end | ||
""".trimIndent() | ||
val afterMap = json.parseToJsonElement(input).jsonObject.mapValues { (_, v) -> | ||
v as JsonPrimitive | ||
assertTrue(v.isString) | ||
v.content | ||
} | ||
assertEquals(map, afterMap) | ||
} | ||
|
||
@Test | ||
fun testWithLenient() = parametrizedTest { mode -> | ||
val input = """{ //beginning | ||
//before key | ||
key // after key | ||
: // after colon | ||
value //before comma | ||
, | ||
key2: [ //array1 | ||
1, //array2 | ||
2, //end array | ||
], | ||
key3: { //nested obj | ||
nestedKey: "foo" | ||
} , //after nested | ||
key4: value//comment_cannot_break_value_apart, | ||
key5: //comment without quotes where new token expected is still a comment | ||
value5, | ||
//before end | ||
}""" | ||
assertEquals(target("value//comment_cannot_break_value_apart"), withLenient.decodeFromString(input, mode)) | ||
} | ||
|
||
@Test | ||
fun testUnclosedCommentsErrorMsg() = parametrizedTest { mode -> | ||
val input = """{"data": "x"} // no newline""" | ||
assertEquals(StringData("x"), json.decodeFromString<StringData>(input, mode)) | ||
val input2 = """{"data": "x"} /* no endblock""" | ||
assertFailsWith<SerializationException>("Expected end of the block comment: \"*/\", but had EOF instead at path: \$") { | ||
json.decodeFromString<StringData>(input2, mode) | ||
} | ||
} | ||
|
||
private val lexerBatchSize = 16 * 1024 | ||
|
||
@Test | ||
fun testVeryLargeComments() = parametrizedTest { mode -> | ||
val strLen = lexerBatchSize * 2 + 42 | ||
val inputLine = """{"data": //a""" + "a".repeat(strLen) + "\n\"x\"}" | ||
assertEquals(StringData("x"), json.decodeFromString<StringData>(inputLine, mode)) | ||
val inputBlock = """{"data": /*a""" + "a".repeat(strLen) + "*/\"x\"}" | ||
assertEquals(StringData("x"), json.decodeFromString<StringData>(inputBlock, mode)) | ||
} | ||
|
||
@Test | ||
fun testCommentsOnThresholdEdge() = parametrizedTest { mode -> | ||
val inputPrefix = """{"data": /*a""" | ||
// Here, we test the situation when closing */ is divided in buffer: | ||
// * fits in the initial buffer, but / is not. | ||
// E.g. situation with batches looks like this: ['{', '"', 'd', ..., '*'], ['/', ...] | ||
val bloatSize = lexerBatchSize - inputPrefix.length - 1 | ||
val inputLine = inputPrefix + "a".repeat(bloatSize) + "*/\"x\"}" | ||
assertEquals(StringData("x"), json.decodeFromString<StringData>(inputLine, mode)) | ||
|
||
// Test when * is unclosed and last in buffer: | ||
val inputLine2 = inputPrefix + "a".repeat(bloatSize) + "*" | ||
assertFailsWith<SerializationException>("Expected end of the block comment: \"*/\", but had EOF instead at path: \$") { | ||
json.decodeFromString<StringData>(inputLine2, mode) | ||
} | ||
|
||
} | ||
|
||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
4 changes: 4 additions & 0 deletions
4
formats/json-tests/jvmTest/resources/spec_cases/listing_comments.txt
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
// Non-spec inputs that we accept with allowComments = true setting: | ||
n_object_trailing_comment.json | ||
n_object_trailing_comment_slash_open.json | ||
n_structure_object_with_comment.json |
Binary file modified
BIN
+731 Bytes
(110%)
formats/json-tests/jvmTest/src/kotlinx/serialization/json/SpecConformanceTest.kt
Binary file not shown.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need special tests for
StringJsonLexer
andReaderJsonLexer
. We now have two decoding branches that can be modified independently of each other.Since JSON without comments is actually a subset of JSON with comments, it is necessary to test it for both cases.
We can do this by running all the JSON tests for
allowComments = true
, however, this will be very redundant.It is easier to identify the main decoding cases and write a limited set of tests for them, which works the same regardless of the source (String or Reader) and the settings (with or without comments)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid we do not have particular 'main decoding cases' tests. The closes place I can think of is Json spec tests:
kotlinx.serialization/formats/json-tests/jvmTest/src/kotlinx/serialization/json/SpecConformanceTest.kt
Line 27 in 5e8ccad
Do you think it is enough to add
allowComments = true
there?Also, 'JsonCommentsTest' may be already enough because it has arrays, nested objects, and strings, and there's really not much anything else in the json spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think SpecConformanceTest will be enough.
After all, this flag only changes parsing, but not the serialization logic.