Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce function calls in hot path #20126

Merged
merged 1 commit into from
Jan 11, 2024
Merged

Reduce function calls in hot path #20126

merged 1 commit into from
Jan 11, 2024

Conversation

oneonestar
Copy link
Member

@oneonestar oneonestar commented Dec 15, 2023

Description

Reduce function calls in hot path when reading TextFile.
For some reason JVM didn't inline Slice.getByte() and that caused a performance regression (compare Trino 388 and Trino 428).

Additional context and related issues

$ java -version
openjdk version "17.0.4.1" 2022-08-12
OpenJDK Runtime Environment Temurin-17.0.4.1+1 (build 17.0.4.1+1)
OpenJDK 64-Bit Server VM Temurin-17.0.4.1+1 (build 17.0.4.1+1, mixed mode, sharing)

 CREATE TABLE test (
    ...
    foo varchar,
    err_code integer,
    log_date varchar,
    log_hour varchar,
    log_min varchar
 )
 WITH (
    format = 'TEXTFILE',
    partitioned_by = ARRAY['log_date','log_hour','log_min'],
    textfile_field_separator = U&'\0009'
 )

explain analyze verbose
select
     count(*) as request_count
 from test
where log_date between  '20231101' and '20231130' and foo != 'bar' and err_code = 0;

Before:

    └─ ScanFilterProject[table = ...., filterPredicate = ((""foo"" <> VARCHAR 'bar') AND (""err_code"" = 0))]
           Layout: []
           Estimates: {rows: ? (0B), cpu: ?, memory: 0B, network: 0B}/{rows: ? (0B), cpu: ?, memory: 0B, network: 0B}/{rows: ? (0B), cpu: 0, memory: 0B, network: 0B}
           CPU: 3.87h (99.88%), Scheduled: 3.93h (99.88%), Blocked: 0.00ns (0.00%), Output: 2561824485 rows (0B)
           metrics:
             'CPU time distribution (s)' = {count=720, p01=2.13, p05=2.36, p10=3.12, p25=7.32, p50=18.91, p75=26.43, p90=39.03, p95=43.56, p99=53.73, min=1.86, max=63.28}
             'Filter CPU time' = {duration=1.21m}
             'Input rows distribution' = {count=720, p01=643102.00, p05=723008.49, p10=959696.92, p25=2202411.75, p50=5724157.26, p75=7800053.53, p90=11579449.88, p95=12846610.81, p99=15835723.00, min=551069.00, max=18246492.00}
             'Scheduled time distribution (s)' = {count=720, p01=2.17, p05=2.42, p10=3.21, p25=7.46, p50=19.18, p75=26.88, p90=39.37, p95=44.30, p99=54.87, min=1.93, max=63.85}
           Input avg.: 5762501.43 rows, Input std.dev.: 66.94%
           foo := foo:string:REGULAR
           err_code := err_code:int:REGULAR
           log_date:string:PARTITION_KEY
               :: [[20231101], [20231102], [20231103], [20231104], [20231105], [20231106], [20231107], [20231108], [20231109], [20231110], [20231111], [20231112], [20231113], [20231114], [20231115], [20231116], [20231117], [20231118], [20231119], [20231120], [20231121], [20231122], [20231123], [20231124], [20231125], [20231126], [20231127], [20231128], [20231129], [20231130]]
           log_min:string:PARTITION_KEY
               :: [[0]]
           log_hour:string:PARTITION_KEY
               :: [[0], [1], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [2], [20], [21], [22], [23], [3], [4], [5], [6], [7], [8], [9]]
           Input: 4149001027 rows (70.55GB), Filtered: 38.25%, Physical input: 317.99GB, Physical input time: 6658800.00ms

After:

    └─ ScanFilterProject[table = ... filterPredicate = ((""foo"" <> VARCHAR 'bar') AND (""err_code"" = 0))]
           Layout: []
           Estimates: {rows: ? (0B), cpu: ?, memory: 0B, network: 0B}/{rows: ? (0B), cpu: ?, memory: 0B, network: 0B}/{rows: ? (0B), cpu: 0, memory: 0B, network: 0B}
           CPU: 3.09h (99.99%), Scheduled: 3.17h (99.99%), Blocked: 0.00ns (0.00%), Output: 2561824485 rows (0B)
           metrics:
             'CPU time distribution (s)' = {count=720, p01=1.72, p05=2.28, p10=2.88, p25=6.53, p50=15.68, p75=20.91, p90=30.21, p95=33.17, p99=39.95, min=1.34, max=45.44}
             'Filter CPU time' = {duration=59.80s}
             'Input rows distribution' = {count=720, p01=643102.00, p05=722484.69, p10=962167.67, p25=2202245.63, p50=5737402.26, p75=7801883.69, p90=11601371.25, p95=12858997.28, p99=15835723.00, min=551069.00, max=18246492.00}
             'Scheduled time distribution (s)' = {count=720, p01=1.82, p05=2.37, p10=2.96, p25=6.69, p50=16.00, p75=21.37, p90=31.16, p95=34.23, p99=42.22, min=1.38, max=46.59}
           Input avg.: 5762501.43 rows, Input std.dev.: 66.94%
           foo := foo:string:REGULAR
           err_code := err_code:int:REGULAR
           log_date:string:PARTITION_KEY
               :: [[20231101], [20231102], [20231103], [20231104], [20231105], [20231106], [20231107], [20231108], [20231109], [20231110], [20231111], [20231112], [20231113], [20231114], [20231115], [20231116], [20231117], [20231118], [20231119], [20231120], [20231121], [20231122], [20231123], [20231124], [20231125], [20231126], [20231127], [20231128], [20231129], [20231130]]
           log_hour:string:PARTITION_KEY
               :: [[0], [1], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [2], [20], [21], [22], [23], [3], [4], [5], [6], [7], [8], [9]]
           log_min:string:PARTITION_KEY
               :: [[0]]
           Input: 4149001027 rows (70.55GB), Filtered: 38.25%, Physical input: 317.99GB, Physical input time: 6814200.00ms
Trino 388 (For reference)
└─ ScanFilterProject[table = ..., filterPredicate = ((""foo"" <> VARCHAR 'bar') AND (""err_code"" = 0))]
         Layout: []
         Estimates: {rows: ? (0B), cpu: ?, memory: 0B, network: 0B}/{rows: ? (0B), cpu: ?, memory: 0B, network: 0B}/{rows: ? (0B), cpu: ?, memory: 0B, network: 0B}
         CPU: 3.58h (99.98%), Scheduled: 3.64h (99.98%), Blocked: 0.00ns (0.00%), Output: 2561824485 rows (0B)
         metrics:
           'Input distribution' = {count=721.00, p01=632889.00, p05=721412.80, p10=956607.82, p25=2181684.40, p50=5721734.28, p75=7801610.53, p90=11581353.33, p95=12852007.22, p99=15835723.00, min=0.00, max=18246492.00}
         Input avg.: 5754509.05 rows, Input std.dev.: 67.09%
         foo := foo:string:REGULAR
         err_code := err_code:int:REGULAR
         log_min:string:PARTITION_KEY
             :: [[0]]
         log_hour:string:PARTITION_KEY
             :: [[0], [1], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [2], [20], [21], [22], [23], [3], [4], [5], [6], [7], [8], [9]]
         log_date:string:PARTITION_KEY
             :: [[20231101], [20231102], [20231103], [20231104], [20231105], [20231106], [20231107], [20231108], [20231109], [20231110], [20231111], [20231112], [20231113], [20231114], [20231115], [20231116], [20231117], [20231118], [20231119], [20231120], [20231121], [20231122], [20231123], [20231124], [20231125], [20231126], [20231127], [20231128], [20231129], [20231130]]
         Input: 4149001027 rows (317.99GB), Filtered: 38.25%


image
CPU frame graph where:

  • Top: before patch
  • Middle: inline getByte()
  • Bottom: also inline isEscapeByte()

Compare Trino 388 and Trino 428
image (1)

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(X) Release notes are required, with the following suggested text:

# Section
* Improve performance when querying TextFile table ({issue}`issuenumber`)

Copy link

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Jan 10, 2024
@hashhar
Copy link
Member

hashhar commented Jan 10, 2024

ping @sopel39 / @Dith3r

@mosabua
Copy link
Member

mosabua commented Jan 10, 2024

Related .. does this still apply and fix the issue for Java 21 .. since that is now the required runtime.

@sopel39 sopel39 merged commit 87ea350 into trinodb:master Jan 11, 2024
57 checks passed
@sopel39
Copy link
Member

sopel39 commented Jan 11, 2024

merged, thanks!

@sopel39 sopel39 added the no-release-notes This pull request does not require release notes entry label Jan 11, 2024
@github-actions github-actions bot added this to the 436 milestone Jan 11, 2024
@oneonestar oneonestar deleted the inline branch February 6, 2024 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed no-release-notes This pull request does not require release notes entry stale
Development

Successfully merging this pull request may close these issues.

4 participants