[SPARK-23603][SQL] When the length of the json is in a range, get_json_object will result in missing tail data #20739

Closed
@@ -215,12 +215,20 @@ case class GetJsonObject(json: Expression, path: Expression)
path: List[PathInstruction]): Boolean = {
(p.getCurrentToken, path) match {
case (VALUE_STRING, Nil) if style == RawStyle =>
// there is no array wildcard or slice parent, emit this string without quotes
if (p.hasTextCharacters) {
g.writeRaw(p.getTextCharacters, p.getTextOffset, p.getTextLength)
} else {
g.writeRaw(p.getText)
}

// Jackson (>= 2.7.7) fixes a bug that can drop trailing data
// when the length of the value falls within a certain range.
// We currently use Jackson 2.6.x, so the call to
// writeRaw(char[] text, int offset, int len) is commented out,
// although using writeRaw(String text) costs some performance.
g.writeRaw(p.getText)
@southernriver (Contributor) commented on May 10, 2019:

What is the difference between writeRaw(String text) and writeRaw(String text, int offset, int len)? I have read the source code of both functions, but I don't understand why this change solves the problem. Thank you!
Code from 2.6.x:

public void writeRaw(String text)
        throws IOException, JsonGenerationException
    {
        int start = 0;
        int len = text.length();
        while (len > 0) {
            char[] buf = _charBuffer;
            final int blen = buf.length;
            final int len2 = (len < blen) ? len : blen;
            text.getChars(start, start+len2, buf, 0);
            writeRaw(buf, 0, len2);
            start += len2;
            len -= len2;
        }
    }
@Override
    public void writeRaw(String text, int offset, int len)
        throws IOException, JsonGenerationException
    {
        while (len > 0) {
            char[] buf = _charBuffer;
            final int blen = buf.length;
            final int len2 = (len < blen) ? len : blen;
            text.getChars(offset, offset+len2, buf, 0);
            writeRaw(buf, 0, len2);
            offset += len2;
            len -= len2;
        }
    }
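
To make the comparison concrete, here is a minimal standalone sketch of the same chunked-copy loop as the two overloads quoted above. It is not Jackson's code: the class name `ChunkedCopySketch`, the 4-character buffer, and the `StringBuilder` sink are assumptions made for illustration. In this sketch, as in the quoted 2.6.x source, the two methods differ only in where copying starts and how many characters are copied.

```java
public class ChunkedCopySketch {
    // Hypothetical stand-in for the generator's _charBuffer; kept tiny to force several chunks.
    private final char[] charBuffer = new char[4];
    // Hypothetical sink; the real generator writes to its output target instead.
    private final StringBuilder out = new StringBuilder();

    // Same shape as writeRaw(String text): copy the whole string from index 0.
    public void writeRaw(String text) {
        writeRaw(text, 0, text.length());
    }

    // Same shape as writeRaw(String text, int offset, int len): copy a slice, chunk by chunk.
    public void writeRaw(String text, int offset, int len) {
        while (len > 0) {
            int len2 = Math.min(len, charBuffer.length);
            text.getChars(offset, offset + len2, charBuffer, 0);
            out.append(charBuffer, 0, len2);
            offset += len2;
            len -= len2;
        }
    }

    public String result() {
        return out.toString();
    }

    public static void main(String[] args) {
        ChunkedCopySketch whole = new ChunkedCopySketch();
        whole.writeRaw("abcdefghij");
        System.out.println(whole.result()); // abcdefghij

        ChunkedCopySketch slice = new ChunkedCopySketch();
        slice.writeRaw("abcdefghij", 3, 5);
        System.out.println(slice.result()); // defgh
    }
}
```

Note that the call removed by this PR is the char[] overload, writeRaw(char[] text, int offset, int len), not either of the String overloads above, so the sketch does not by itself explain the missing data.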

While debugging, I found that the value of p.getTextOffset is not zero, and that it is exactly the length of the missing part of the string!
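
A non-zero offset is expected in itself: Jackson documents that the array returned by getTextCharacters is not guaranteed to hold the text starting at index 0, so callers must combine it with getTextOffset and getTextLength. The sketch below illustrates only that (buffer, offset, length) contract, using plain Java and a hypothetical class `TextWindowSketch`; it does not reproduce the Jackson 2.6.x defect. The second method is deliberately wrong, to show that a consumer which ignored the offset would read from the wrong position and stop `offset` characters short of the token's end. Whether anything like that happens inside Jackson 2.6.x is not established here.

```java
public class TextWindowSketch {
    // Correct reading of the (buffer, offset, length) triple.
    static String read(char[] buf, int offset, int len) {
        return new String(buf, offset, len);
    }

    // Deliberately wrong: treats the text as if it started at index 0.
    static String readIgnoringOffset(char[] buf, int offset, int len) {
        return new String(buf, 0, len);
    }

    public static void main(String[] args) {
        // Hypothetical shared parser buffer holding {"k":"value"}.
        char[] buf = "{\"k\":\"value\"}".toCharArray();
        int offset = 6; // index of 'v' in the buffer
        int len = 5;    // length of "value"

        System.out.println(read(buf, offset, len));               // value
        System.out.println(readIgnoringOffset(buf, offset, len)); // {"k":
    }
}
```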


// there is no array wildcard or slice parent, emit this string without quote
// if (p.hasTextCharacters) {
// g.writeRaw(p.getTextCharacters, p.getTextOffset, p.getTextLength)
// } else {
// g.writeRaw(p.getText)
// }
true

case (START_ARRAY, Nil) if style == FlattenStyle =>
@@ -474,7 +482,11 @@ case class JsonTuple(children: Seq[Expression])
case JsonToken.VALUE_STRING if parser.hasTextCharacters =>
// slight optimization to avoid allocating a String instance, though the characters
// still have to be decoded... Jackson doesn't have a way to access the raw bytes
generator.writeRaw(parser.getTextCharacters, parser.getTextOffset, parser.getTextLength)
// generator.writeRaw(parser.getTextCharacters, parser.getTextOffset, parser.getTextLength)

// Jackson 2.6.x writeRaw(char[] text, int offset, int len) has a bug that can drop tail data
generator.writeRaw(parser.getText)


case JsonToken.VALUE_STRING =>
// the normal String case, pass it through to the output without enclosing quotes
@@ -242,6 +242,13 @@ class JsonExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
"1234")
}

test("some big value") {
val value = "x" * 3000
checkEvaluation(
GetJsonObject(NonFoldableLiteral(s"""{"big": "$value"}"""), NonFoldableLiteral("$.big")),
value)
}

val jsonTupleQuery = Literal("f1") ::
Literal("f2") ::
Literal("f3") ::