Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Runs Page: Filtering runs on status takes 10x more time than without filter #25659

Closed
HynekBlaha opened this issue Oct 31, 2024 · 2 comments
Labels
area: UI/UX Related to User Interface and User Experience type: bug Something isn't working

Comments

@HynekBlaha
Copy link
Contributor

HynekBlaha commented Oct 31, 2024

What's the issue?

Hello,
When I list job runs, it takes ~8s. When I add filter on status, it takes ~80s.
Needless to say, the UI timeouts and throws warning.

Alembic version

select version_num from alembic_version
> '284a732df317'

Count of all Run records

select count(*)
FROM runs
> 5248671

Count of all Run Tags records

select count(*)
FROM run_tags
> 53221915

List runs without status filter

explain analyze SELECT runs.id, runs.run_body, runs.status, runs.create_timestamp, runs.update_timestamp, runs.start_time, runs.end_time
FROM runs
JOIN run_tags AS run_tags_filter0 ON runs.run_id = run_tags_filter0.run_id AND run_tags_filter0.key = '.dagster/repository' AND run_tags_filter0.value = '__repository__@my_repository'
WHERE runs.pipeline_name = 'my_job' AND (runs.run_id NOT IN (SELECT run_tags.run_id
FROM run_tags
WHERE run_tags.key = 'dagster/backfill')) ORDER BY runs.id DESC
 LIMIT 27
>
Limit  (cost=18441.94..1149838677.70 rows=2 width=1262) (actual time=1372.219..6995.692 rows=27 loops=1)
  ->  Nested Loop  (cost=18441.94..1149838677.70 rows=2 width=1262) (actual time=1372.217..6995.664 rows=27 loops=1)
        ->  Index Scan Backward using idx_runs_by_job on runs  (cost=18245.96..1149649975.98 rows=953 width=1299) (actual time=1365.801..6828.236 rows=27 loops=1)
              Index Cond: (pipeline_name = 'my_job'::text)
              Filter: (NOT (SubPlan 1))
              SubPlan 1
                ->  Materialize  (cost=18245.40..1221695.21 rows=1403463 width=37) (actual time=3.160..151.164 rows=1326050 loops=27)
                      ->  Gather  (cost=18245.40..1203712.90 rows=1403463 width=37) (actual time=85.124..919.494 rows=1326050 loops=1)
                            Workers Planned: 2
                            Workers Launched: 1
                            ->  Parallel Bitmap Heap Scan on run_tags  (cost=17245.40..1062366.60 rows=584776 width=37) (actual time=83.019..993.759 rows=663025 loops=2)
                                  Recheck Cond: (key = 'dagster/backfill'::text)
                                  Rows Removed by Index Recheck: 4979064
                                  Heap Blocks: exact=12852 lossy=64155
                                  ->  Bitmap Index Scan on idx_run_tags  (cost=0.00..16894.54 rows=1403463 width=0) (actual time=76.047..76.047 rows=1326136 loops=1)
                                        Index Cond: (key = 'dagster/backfill'::text)
        ->  Bitmap Heap Scan on run_tags run_tags_filter0  (cost=195.98..198.00 rows=1 width=37) (actual time=6.190..6.190 rows=1 loops=27)
              Recheck Cond: (((run_id)::text = (runs.run_id)::text) AND (key = '.dagster/repository'::text) AND (value = '__repository__@my_repository'::text))
              Heap Blocks: exact=27
              ->  BitmapAnd  (cost=195.98..195.98 rows=1 width=0) (actual time=6.183..6.183 rows=0 loops=27)
                    ->  Bitmap Index Scan on idx_run_tags_run_idx  (cost=0.00..6.22 rows=222 width=0) (actual time=0.019..0.019 rows=13 loops=27)
                          Index Cond: ((run_id)::text = (runs.run_id)::text)
                    ->  Bitmap Index Scan on idx_run_tags  (cost=0.00..186.31 rows=12775 width=0) (actual time=6.160..6.160 rows=143060 loops=27)
                          Index Cond: ((key = '.dagster/repository'::text) AND (value = '__repository__@my_repository'::text))
Planning Time: 0.366 ms
Execution Time: 7005.495 ms

List runs with status filter

explain analyze SELECT runs.id, runs.run_body, runs.status, runs.create_timestamp, runs.update_timestamp, runs.start_time, runs.end_time
FROM runs
JOIN run_tags AS run_tags_filter0 ON runs.run_id = run_tags_filter0.run_id AND run_tags_filter0.key = '.dagster/repository' AND run_tags_filter0.value = '__repository__@my_repository'
WHERE runs.pipeline_name = 'my_job' AND runs.status IN ('SUCCESS') AND (runs.run_id NOT IN (SELECT run_tags.run_id
FROM run_tags
WHERE run_tags.key = 'dagster/backfill')) ORDER BY runs.id DESC
 LIMIT 27
>

Limit  (cost=1048545598.05..1048545598.06 rows=2 width=1262) (actual time=80151.849..80154.887 rows=27 loops=1)
  ->  Sort  (cost=1048545598.05..1048545598.06 rows=2 width=1262) (actual time=80151.847..80154.884 rows=27 loops=1)
        Sort Key: runs.id DESC
        Sort Method: quicksort  Memory: 30kB
        ->  Nested Loop  (cost=66263.12..1048545598.04 rows=2 width=1262) (actual time=3514.850..80154.769 rows=42 loops=1)
              ->  Bitmap Heap Scan on runs  (cost=66067.13..1048373518.80 rows=869 width=1299) (actual time=3507.300..79791.315 rows=42 loops=1)
                    Recheck Cond: ((pipeline_name = 'my_job'::text) AND ((status)::text = 'SUCCESS'::text))
                    Rows Removed by Index Recheck: 1
                    Filter: (NOT (SubPlan 1))
                    Rows Removed by Filter: 297
                    Heap Blocks: exact=350
                    ->  BitmapAnd  (cost=47821.11..47821.11 rows=1737 width=0) (actual time=241.016..241.017 rows=0 loops=1)
                          ->  Bitmap Index Scan on idx_runs_by_job  (cost=0.00..74.84 rows=1905 width=0) (actual time=0.126..0.126 rows=896 loops=1)
                                Index Cond: (pipeline_name = 'my_job'::text)
                          ->  Bitmap Index Scan on idx_run_status  (cost=0.00..47745.58 rows=4763086 width=0) (actual time=240.840..240.840 rows=4803907 loops=1)
                                Index Cond: ((status)::text = 'SUCCESS'::text)
                    SubPlan 1
                      ->  Materialize  (cost=18246.02..1221765.40 rows=1403543 width=37) (actual time=0.332..125.170 rows=1323914 loops=339)
                            ->  Gather  (cost=18246.02..1203781.68 rows=1403543 width=37) (actual time=111.294..1831.291 rows=1326229 loops=1)
                                  Workers Planned: 2
                                  Workers Launched: 1
                                  ->  Parallel Bitmap Heap Scan on run_tags  (cost=17246.02..1062427.38 rows=584810 width=37) (actual time=109.166..1833.278 rows=663114 loops=2)
                                        Recheck Cond: (key = 'dagster/backfill'::text)
                                        Rows Removed by Index Recheck: 4979062
                                        Heap Blocks: exact=16064 lossy=69430
                                        ->  Bitmap Index Scan on idx_run_tags  (cost=0.00..16895.14 rows=1403543 width=0) (actual time=99.591..99.592 rows=1326315 loops=1)
                                              Index Cond: (key = 'dagster/backfill'::text)
              ->  Bitmap Heap Scan on run_tags run_tags_filter0  (cost=195.99..198.01 rows=1 width=37) (actual time=8.643..8.643 rows=1 loops=42)
                    Recheck Cond: (((run_id)::text = (runs.run_id)::text) AND (key = '.dagster/repository'::text) AND (value = '__repository__@my_repository'::text))
                    Heap Blocks: exact=42
                    ->  BitmapAnd  (cost=195.99..195.99 rows=1 width=0) (actual time=8.635..8.636 rows=0 loops=42)
                          ->  Bitmap Index Scan on idx_run_tags_run_idx  (cost=0.00..6.22 rows=222 width=0) (actual time=0.040..0.040 rows=13 loops=42)
                                Index Cond: ((run_id)::text = (runs.run_id)::text)
                          ->  Bitmap Index Scan on idx_run_tags  (cost=0.00..186.32 rows=12776 width=0) (actual time=8.590..8.590 rows=143060 loops=42)
                                Index Cond: ((key = '.dagster/repository'::text) AND (value = '__repository__@my_repository'::text))
Planning Time: 0.317 ms
Execution Time: 80160.748 ms

What did you expect to happen?

Filtered records should be returned at least in the same time as unfiltered.
UI should not be timing out.

How to reproduce?

No response

Dagster version

1.8.13

Deployment type

Dagster Helm chart

Deployment details

No response

Additional information

DB instance: Postgres 15.7 (General Purpose, D4ds_v5, 4 vCores, 16 GiB RAM, 512 GiB storage)

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.
By submitting this issue, you agree to follow Dagster's Code of Conduct.

@HynekBlaha HynekBlaha added the type: bug Something isn't working label Oct 31, 2024
@HynekBlaha HynekBlaha changed the title Filtering backfill runs on status takes 10x more time that without filter New Runs Page: Filtering backfill runs on status takes 10x more time that without filter Oct 31, 2024
@HynekBlaha HynekBlaha changed the title New Runs Page: Filtering backfill runs on status takes 10x more time that without filter New Runs Page: Filtering runs on status takes 10x more time that without filter Oct 31, 2024
@jamiedemaria
Copy link
Contributor

hey @HynekBlaha we're releasing a database migration in this week's release (1.9.0) that should improve performance. We recommend running those migrations or switching to the legacy runs feed

@garethbrickman garethbrickman changed the title New Runs Page: Filtering runs on status takes 10x more time that without filter New Runs Page: Filtering runs on status takes 10x more time than without filter Oct 31, 2024
@garethbrickman garethbrickman added the area: UI/UX Related to User Interface and User Experience label Oct 31, 2024
@HynekBlaha
Copy link
Contributor Author

Hi @jamiedemaria, I confirm that after running dagster instance migrate the runs page is much more responsive.
Thank you, closing the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: UI/UX Related to User Interface and User Experience type: bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants