pg_plan_advsr is a PostgreSQL extension that provides Automated execution plan tuning using feedback loop. This extension might help you if you have an analytic query which has many joins and aggregates and you'd like to get an efficient plan to reduce execution time. This extension is intended to use in a plan tuning phase at the end of system development.
- Note: This extension is intended to be used in a validation environment, not a commercial environment.
- master branch for PostgreSQL 12 or above
This README contains the following sections:
- Cookbook
- Objects created by the extension
- Options
- Usage
- Installation Requirements
- Installation
- Internals
- Limitations
- Support
- Author
- Acknowledgments
pg_plan_advsr was created by Tatsuro Yamada.
This is a simple example of how to use pg_plan_advsr for auto execution plan tuning. More detailed information will be provided in the sections Options and Usage.
Try running regression tests inside the sql directory: pg_plan_advsr/sql. First, create a test table, etc. to execute init.sql. After that, run base.sql and observe the results of automatic plan tuning.
As shown below, you can see that a Nested Loop join during the first run has changed to a Hash join after tuning. The error in the estimated number of rows should be zero, and the query execution time was successfully reduced.
Note that base.sql
will clear these tables and view such as hint_plan.hints, plan_repo.norm_queries, plan_repo.raw_queries, plan_repo.plan_history, and pg_store_plans view. So, please take care if you use it.
psql
\i init.sql
\i base.sql
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=295.29..515.35 rows=1 width=16) (actual time=24.591..164.421 rows=9991 loops=1)
-> Hash Join (cost=295.00..515.01 rows=1 width=16) (actual time=24.500..46.584 rows=10000 loops=1)
Hash Cond: ((a.c1 = b.c1) AND (a.c2 = b.c2))
-> Seq Scan on table_a a (cost=0.00..145.00 rows=10000 width=8) (actual time=0.032..3.899 rows=10000 loops=1)
-> Hash (cost=145.00..145.00 rows=10000 width=8) (actual time=24.359..24.359 rows=10000 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 519kB
-> Seq Scan on table_b b (cost=0.00..145.00 rows=10000 width=8) (actual time=0.026..5.771 rows=10000 loops=1)
-> Index Scan using ind_c_c2 on table_c c (cost=0.29..0.33 rows=1 width=8) (actual time=0.009..0.010 rows=1 loops=10000)
Index Cond: ((c2 = a.c2) AND (c2 >= 10))
Filter: ((c1 > 1) AND (c1 = a.c1))
Planning Time: 3.952 ms
Execution Time: 168.867 ms
(12 rows)
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=590.00..934.88 rows=9991 width=16) (actual time=37.030..74.904 rows=9991 loops=1)
Hash Cond: ((a.c1 = b.c1) AND (a.c2 = b.c2))
-> Hash Join (cost=295.00..564.93 rows=9991 width=16) (actual time=17.751..40.944 rows=9991 loops=1)
Hash Cond: ((c.c1 = a.c1) AND (c.c2 = a.c2))
-> Seq Scan on table_c c (cost=0.00..195.00 rows=9990 width=8) (actual time=0.060..7.318 rows=9991 loops=1)
Filter: ((c1 > 1) AND (c2 >= 10))
Rows Removed by Filter: 9
-> Hash (cost=145.00..145.00 rows=10000 width=8) (actual time=17.658..17.660 rows=10000 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 519kB
-> Seq Scan on table_a a (cost=0.00..145.00 rows=10000 width=8) (actual time=0.019..4.141 rows=10000 loops=1)
-> Hash (cost=145.00..145.00 rows=10000 width=8) (actual time=19.249..19.250 rows=10000 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 519kB
-> Seq Scan on table_b b (cost=0.00..145.00 rows=10000 width=8) (actual time=0.022..4.799 rows=10000 loops=1)
Planning Time: 3.961 ms
Execution Time: 78.294 ms
(15 rows)
See: Usage for more details.
- FUNCTION
pg_plan_advsr_enable_feedback()
RETURNS void - FUNCTION
pg_plan_advsr_disable_feedback()
RETURNS void - FUNCTION
plan_repo.get_hint(bigint)
RETURNS text- If you give a pgsp_planid as an argument, it will return the hints to reproduce the plan based on pgsp_planid
- FUNCTION
plan_repo.get_extstat(bigint)
RETURNS text- If you give a queryid as an argument, it will return the syntax for generating extended statistics. This function supports PG14 or above since it uses compute_query_id.
plan_repo.plan_history
plan_repo.norm_queries
plan_repo.raw_queries
Table "plan_repo.plan_history"
Column | Type | Description
---------------------+-----------------------------+-------------------------------------------------------------------------------
id | integer | Sequence as a primary key: nextval('plan_repo.plan_history_id_seq'::regclass)
norm_query_hash | text | MD5 based on normalized query text
pgsp_queryid | bigint | Queryid of pg_store_plans
pgsp_planid | bigint | Planid of pg_sotre_plans
execution_time | numeric | Execution time (ms) of this planid
rows_hint | text | Rows hint of this plan
scan_hint | text | Scan hint of this plan
join_hint | text | Join hint of this plan
lead_hint | text | Leading hint of this plan
scan_rows_err | numeric | Sum of estimation row error of scans
scan_err_ratio | numeric | Maximum estimation row error ratio of scans
join_rows_err | numeric | Sum of estimation row error of joins
join_err_ratio | numeric | Maximum estimation row error ratio of joins
scan_cnt | integer | Number of scan nodes in this plan
join_cnt | integer | Number of Join nodes in this plan
application_name | text | Application name of client tool such as "psql"
timestamp | timestamp without time zone | Timestamp of this record inserted
Table "plan_repo.norm_queries"
Column | Type | Description
-------------------+----------------------------+-----------------------------------
norm_query_hash | text | MD5 based on normalized query text
norm_query_string | text | Normalized query text
Table "plan_repo.raw_queries"
Column | Type | Description
------------------+-----------------------------+----------------------------------------------------------------------------------------
norm_query_hash | text | MD5 based on normalized query text
raw_query_id | integer | Sequence of raw query text: nextval('plan_repo.raw_queries_raw_query_id_seq'::regclass)
raw_query_string | text | Raw query text (not normalized)
timestamp | timestamp without time zone | Timestamp of this record inserted
-
plan_repo.plan_history_pretty
Columns are same as plan_history table, but number of decimal places are reduced for readability
-
pg_plan_advsr.enabled
"ON": Enable pg_plan_advsr. It allows creating various hints for fixing row estimation errors and also for reproducing a plan. It also stores them in the plan_history table. If you want to use "auto plan tuning using feedback loop", you have to execute below function "pg_plan_advsr_enable_feedback(). Default setting is "ON".
-
pg_plan_advsr.quieted
"ON": Enable quiet mode. It allows to disable emmiting the following messages when your EXPLAIN ANALYZE commmand finished.
pgsp_queryid pgsp_planid Execution time Hints for current plan Rows hint (feedback info) and so on
Default setting is "OFF".
-
pg_plan_advsr.widely
"ON": Enable creating hints even if EXPLAIN command without ANALYZE option. It allows creating various hints needed for reproducing a plan, but it doesn't create hints for fixing row estimation errors because there is no information of Actual rows. It also stores them in the plan_history table. If you want to get hints to reproduce a plan, this option helps you. Default setting is "OFF".
-
pg_plan_advsr_enable_feedback()
This function allows you to use feedback loop for plan tuning. Actually, it is a wrapper for these commands:
set pg_plan_advsr.enabled to on; set pg_hint_plan.enable_hint_table to on; set pg_hint_plan.debug_print to on;
-
pg_plan_advsr_disable_feedback()
This function disables using feedback loop for plan tuning. It is a wrapper for these commands:
set pg_plan_advsr.enabled to on; set pg_hint_plan.enable_hint_table to off; set pg_hint_plan.debug_print to off;
TBA
There are four types of usage:
- Displaying Execution Plan Characteristics
- Automatic Execution Plan Tuning
- Automatic Hint Clause Generation
- Extended Statistics Suggestion
Details on how to use each are shown below.
-
Displaying Execution Plan Characteristics
First, Make sure
pg_plan_advsr.enabled to on
. Then, Execute EXPLAIN ANALYZE command (which is your query). You can get the result with theDESCRIBE
section appended, and you can see the number of joins, scans, errors in row count estimation, and so on.e.g.
explain (analyze, verbose) select * from t where a = 1 and b = 1; QUERY PLAN -------------------------------------------------------------------------------------------------------- Seq Scan on public.t (cost=0.00..195.00 rows=100 width=8) (actual time=0.052..6.269 rows=100 loops=1) Output: a, b Filter: ((t.a = 1) AND (t.b = 1)) Rows Removed by Filter: 9900 Query Identifier: -3455024416178978571 Planning Time: 0.930 ms Execution Time: 9.136 ms DESCRIBE ------------------------ application: psql pgsp_queryid: -3455024416178978571 pgsp_planid: 3455841613 join_cnt: 0 join_rows_err: 0 join_err_ratio: 0.00 scan_cnt: 1 scan_rows_err: 0 scan_err_ratio: 0.00 lead hint: LEADING( t ) join hint: scan hint: SEQSCAN(t) rows hint: (23 rows)
-
For auto plan tuning
First, Run
select pg_plan_advsr_enable_feedback();
. Then, Execute EXPLAIN ANALYZE command (which is your query) repeatedly until row estimation errors had vanished. Finally, You can check a result of the tuning by using the below queries:select pgsp_queryid, pgsp_planid, execution_time, scan_hint, join_hint, lead_hint from plan_repo.plan_history order by id; select queryid, planid, plan from pg_store_plans where queryid='your pgsp_queryid in plan_history' order by first_call;
See shell script file as an example: JOB/auto_tune_31c.sh
Demo of auto tuning (3x speed)
If you'd like to reproduce the execution plans on other environments, you'd be better to read the other.
Note:
- A plan may temporarily worse than an initial plan during auto tuning phase.
- Use stable data for auto plan tuning. This extension doesn't get converged plan (the ideal plan for the data) if it was updating concurrently.
-
For getting hints of current query
First, Run
select pg_plan_advsr_disable_feedback();
. Then, Execute EXPLAIN ANALYZE command (which is your query). Finally, You can get hints by using the below queries:select pgsp_queryid, pgsp_planid, execution_time, scan_hint, join_hint, lead_hint from plan_repo.plan_history order by id;
e.g.
pgsp_queryid | pgsp_planid | execution_time | scan_hint | join_hint | lead_hint --------------+-------------+----------------+-----------------------------------------+--------------------+------------------------- 4173287301 | 3707748199 | 265.179 | SEQSCAN(t2) SEQSCAN(x) INDEXSCAN(t1) | HASHJOIN(t2 t1 x) +| LEADING( (t2 (x t1 )) ) | | | | NESTLOOP(t1 x) | 4173287301 | 1101439786 | 2.149 | SEQSCAN(x) INDEXSCAN(t1) INDEXSCAN(t2) | NESTLOOP(t2 t1 x) +| LEADING( ((x t1 )t2 ) ) | | | | NESTLOOP(t1 x) | # \a Output format is unaligned. # \t Tuples only is on. select plan_repo.get_hint(1101439786); /*+ LEADING( ((x t1 )t2 ) ) NESTLOOP(t2 t1 x) NESTLOOP(t1 x) SEQSCAN(x) INDEXSCAN(t1) INDEXSCAN(t2) */ --1101439786
You can use the hints to reproduce the execution plan anywhere. It also can be used to modify the execution plan by changing the hints manually.
-
For getting extended statistics suggestion
This feature is enabled when you use PG14 or above with pg_qualstats. First, Make sure
pg_plan_advsr.enabled to on
. Then, Execute EXPLAIN ANALYZE command (which is your query). Finally, You can get extended statistics suggestion by using the below queries:select * from plan_repo.get_extstat(queryid);
e.g.
# select * from plan_repo.get_extstat(-3455024416178978571); suggest ------------------------------------------ CREATE STATISTICS ON a, b FROM public.t; (1 row)
pg_plan_advsr uses pg_hint_plan and pg_store_plans cooperatively.
- PostgreSQL 12 or above
- pg_hint_plan
- pg_store_plans
- pg_qualstats
- if you'd like to use Extended statistic suggestion feature on PG14 or above
- RHEL/CentOS/Rocky = 7.x or above
TBA
There are two methods to install the extension: Using building pg_plan_advsr manually.
-
Build and install (make && make install)
- Prerequisite for installation
- Install postgresql-devel package if you installed PostgreSQL by rpm files
- Set the PATH environment variable to pg_config of your PostgreSQL
Operations
-
git clone extensions
-- Required git clone https://github.com/ossc-db/pg_hint_plan.git git clone https://github.com/ossc-db/pg_store_plans.git git clone https://github.com/ossc-db/pg_plan_advsr.git -- Optional: if you use Extended statistic suggestion feature on PG14 or above git clone https://github.com/powa-team/pg_qualstats.git
-
git checkout
Set the appropriate version of PostgreSQL for the VERSION variable. For example, If you use PG12, see below:
export VERSION=12 cd pg_hint_plan git checkout -b PG${VERSION} origin/PG${VERSION} && git checkout $(git describe --tag) cd ../pg_store_plans git checkout $(git describe --tag)
-
build and install
-- Required cd ../pg_hint_plan make -s && make -s install cp pg_stat_statements.c ../pg_plan_advsr/ cp normalize_query.h ../pg_plan_advsr/ cd ../pg_store_plans make -s USE_PGXS=1 all install cp pgsp_json*.[ch] ../pg_plan_advsr/ cd ../pg_plan_advsr git describe --alway make make install -- Optional cd ../pg_qualstats make make install
-
edit PostgreSQL.conf
vi $PGDATA/postgresql.conf ---- Add these lines ----------------------------------------------------- -- Required shared_preload_libraries = 'pg_hint_plan, pg_plan_advsr, pg_store_plans' max_parallel_workers_per_gather = 0 max_parallel_workers = 0 compute_query_id = on or -- Optional shared_preload_libraries = 'pg_hint_plan, pg_plan_advsr, pg_store_plans, pg_qualstats' max_parallel_workers_per_gather = 0 max_parallel_workers = 0 compute_query_id = on pg_qualstats.resolve_oids = true pg_qualstats.sample_rate = 1 ----------------------------------------------------------------------------------- ---- Consider tweak these numbers ------------------------------------------------- -- Use a large value than join numbers of your query geqo_threshold = 12 -> 20 from_collapse_limit = 8 -> 20 join_collapse_limit = 8 -> 20 -- Optional random_page_cost = 4 -> 1 -----------------------------------------------------------------------------------
-
run create extension commands on psql
pg_ctl start psql -- Required create extension pg_hint_plan; create extension pg_store_plans; create extension pg_plan_advsr; -- Optional create extension pg_qualstats;
- You can try this extension with Join Order Benchmark as a example. See: how_to_setup.md in JOB directory
- Prerequisite for installation
-
Dockerfile (experimental)
Operations
\# cd pg_plan_advsr/docker \# ./build.sh
See: build.sh and Dockerfile
TBA
These presentation materials are useful to know concepts and its architecture, and these show a benchmark result by using Join order benchmark:
- Handle InitPlans and SubPlans
- Handle Append and MergeAppend
- Fix bese-relation's estimated row error (This is pg_hint_plan's limitation)
- Concurrent execution
- Extended Statistics Suggestion for Grouping columuns and Expressions
- Extended Statistics Suggestion on PG13 or below
- Parallel query
- Partitioned Table
- JIT
See: TODO file
pg_plan_advsr uses pg_hint_plan and pg_store_plans, it would be better to check these document to know their limitations.
If you want to report a problem with pg_plan_advsr, please include the following information because we will analyze it by reproducing your problem:
- Versions
- PostgreSQL
- pg_hint_plan
- pg_store_plans
- Query
- DDL
- CREATE TABLE
- CREATE INDEX
- Data (If possible)
If you have a problem or question or any kind of feedback, the preferred option is to open an issue on GitHub: https://github.com/ossc-db/pg_plan_advsr/issues This requires a GitHub account. Of course, any Pull request welcome!
Tatsuro Yamada (yamatattsu at gmail dot com)
Copyright (c) 2019-2024, NIPPON TELEGRAPH AND TELEPHONE CORPORATION
The following individuals (in alphabetical order) have contributed to pg_plan_advsr as patch authors, reviewers, testers, advisers, or reporters of issues. Thanks a lot!
Amit Langote
David Pitts
Etsuro Fujita
Hironobu Suzuki
Julien Rouhaud
Kaname Furutani
Kyotaro Horiguchi
Laurenz Albe
Nuko Yokohama
Sam Xu