Improve performance of json functions #22348

Merged: 1 commit, Jun 12, 2024

@Dith3r (Member) commented Jun 10, 2024

Description

For small inputs, use StringReader instead of InputStreamReader to avoid allocating the heap ByteBuffer that InputStreamReader uses internally. (FasterXML/jackson-core#1081)
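
A minimal sketch of the idea, using only the JDK. It is not the code from this PR: the class name `JsonReaderSelection`, the helper names, and the 8 KiB cutoff are assumptions made for illustration. Small inputs are decoded once into a `String` and wrapped in a `StringReader`, while larger inputs keep the streaming `InputStreamReader` path.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class JsonReaderSelection
{
    // Assumed cutoff; the PR description does not state the value it uses.
    static final int SMALL_INPUT_LIMIT = 8 * 1024;

    private JsonReaderSelection() {}

    // Returns a character Reader over UTF-8 encoded JSON bytes.
    static Reader openReader(byte[] utf8Json)
    {
        if (utf8Json.length < SMALL_INPUT_LIMIT) {
            // Small input: decode once into a String, so no per-reader
            // decoding buffer is needed.
            return new StringReader(new String(utf8Json, StandardCharsets.UTF_8));
        }
        // Large input: decode incrementally rather than materializing
        // the whole input as a String.
        return new InputStreamReader(new ByteArrayInputStream(utf8Json), StandardCharsets.UTF_8);
    }

    // Drains a Reader into a String; used here only to check the result.
    static String readAll(Reader reader)
    {
        StringBuilder out = new StringBuilder();
        char[] chunk = new char[1024];
        try {
            int n;
            while ((n = reader.read(chunk)) != -1) {
                out.append(chunk, 0, n);
            }
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Either reader can then be passed to a parser factory that accepts a `Reader`, such as Jackson's `JsonFactory.createParser(Reader)`. The trade-off is one extra `String` copy of a small input in exchange for skipping the reader's buffer allocation on every call.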

Before change:

FilterProject[filterPredicate = (CASE WHEN (data <> varchar 'second') THEN contains(transform($internal$json_string_to_array_cast(data_0), (x) -> json_extract_scalar(x, JsonPath '$.name')), data) ELSE boolean 'false' END)]
     │   Layout: [data:varchar]                                                                                                          
     │   Estimates: {rows: ? (?), cpu: 216.78G, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}                  
     │   CPU: 35.08s (91.80%), Scheduled: 36.97s (87.41%), Blocked: 0.00ns (0.00%), Output: 3492632 rows (34.65MB)          

After:

FilterProject[filterPredicate = (CASE WHEN (data <> varchar 'second') THEN contains(transform($internal$json_string_to_array_cast(data_0), (x) -> json_extract_scalar(x, JsonPath '$.name')), data) ELSE boolean 'false' END)]
     │   Layout: [data:varchar]                                                                                                          
     │   Estimates: {rows: ? (?), cpu: 216.78G, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}                  
     │   CPU: 16.49s (95.06%), Scheduled: 17.24s (82.05%), Blocked: 2.81s (92.72%), Output: 3492632 rows (34.65MB)            

Query execution time dropped from 6.40s to 4.29s (about 33% lower), and CPU time for this operator dropped from 35.08s to 16.49s (about 53% lower).

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# General
* Improve JSON parsing performance. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Jun 10, 2024
@Dith3r Dith3r requested a review from sopel39 June 10, 2024 10:58
@raunaqmorarka raunaqmorarka requested a review from martint June 10, 2024 11:37
@Dith3r Dith3r force-pushed the ke/json branch 3 times, most recently from 608857d to b47b15a Compare June 10, 2024 12:28
@Dith3r Dith3r changed the title Improve performance for json functions Improve performance of json functions Jun 10, 2024
@Dith3r (Member, Author) commented Jun 10, 2024

The CI failure is not related to this change.

@sopel39 (Member) left a comment

nice find! Some comments

Three review threads on core/trino-main/src/main/java/io/trino/util/JsonUtil.java (outdated, resolved)
Avoid allocating heap ByteBuffer used by InputStreamReader.
@sopel39 (Member) left a comment

nice find!

@sopel39 sopel39 merged commit 623bcc2 into trinodb:master Jun 12, 2024
94 of 95 checks passed
@github-actions github-actions bot added this to the 450 milestone Jun 12, 2024
3 participants