Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-23603][SQL]When the length of the json is in a range,get_json_object will result in missing tail data #20739

Closed
wants to merge 1 commit into from

Conversation

cxzl25
Copy link
Contributor

@cxzl25 cxzl25 commented Mar 5, 2018

What changes were proposed in this pull request?

Replace writeRaw(char[] text, int offset, int len) with writeRaw(String text)
Although using writeRaw(String text) will lose some performance

spark-shell:

val value = "x" * 3000
val json = s"""{"big": "$value"}"""
spark.sql("select length(get_json_object(\'"+json+"\','$.big'))" ).collect
res0: Array[org.apache.spark.sql.Row] = Array([2991])

expect result : 3000
actual result : 2991

public abstract void writeRaw(char[] text,int offset,int len) throws IOException
Method that will force generator to copy input text verbatim with no modifications (including that no escaping is done and no separators are added even if context [array, object] would otherwise require such). If such separators are desired, use writeRawValue(String) instead.

public abstract void writeRaw(String text) throws IOException
Method that will force generator to copy input text verbatim with no modifications (including that no escaping is done and no separators are added even if context [array, object] would otherwise require such). If such separators are desired, use writeRawValue(String) instead.

Jackson(>=2.7.7) fixes the possibility of missing tail data when the length of the value is in a range
https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.7.7
https://github.com/FasterXML/jackson-core/issues/30

How was this patch tested?

org.apache.spark.sql.catalyst.expressions.JsonExpressionsSuite
test("some big value")

@cxzl25
Copy link
Contributor Author

cxzl25 commented Mar 12, 2018

Another solution:
#20738

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@cxzl25 cxzl25 closed this Jun 28, 2018
// Now we use the jackson version is 2.6.x
// So comment calls the code for this method ( writeRaw(char[] text, int offset, int len) )
// Although using writeRaw(String text) will lose some performance
g.writeRaw(p.getText)
Copy link
Contributor

@southernriver southernriver May 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the different between writeRaw(String text) and writeRaw(String text, int offset, int len),I have seen the source code of both functions, but I don't know why this action could solve this? thank you !
code for 2.6.X

public void writeRaw(String text)
        throws IOException, JsonGenerationException
    {
        int start = 0;
        int len = text.length();
        while (len > 0) {
            char[] buf = _charBuffer;
            final int blen = buf.length;
            final int len2 = (len < blen) ? len : blen;
            text.getChars(start, start+len2, buf, 0);
            writeRaw(buf, 0, len2);
            start += len2;
            len -= len2;
        }
    }
@Override
    public void writeRaw(String text, int offset, int len)
        throws IOException, JsonGenerationException
    {
        while (len > 0) {
            char[] buf = _charBuffer;
            final int blen = buf.length;
            final int len2 = (len < blen) ? len : blen;
            text.getChars(offset, offset+len2, buf, 0);
            writeRaw(buf, 0, len2);
            offset += len2;
            len -= len2;
        }
    }

I debug that the value of p.getTextOffset is not zero and which is exactly the missing length of string!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants