[backend] performance issues with list runs API #9780

deepk2u · 2023-07-25T21:02:49Z

Environment

How did you deploy Kubeflow Pipelines (KFP)? Full Kubeflow deployment using the manifests repo
KFP version: sdk-2.0.0b7
KFP SDK version: sdk-2.0.0b7

Steps to reproduce

Create more than 200k runs in at least one namespace.
Other namespaces can have any number of runs.
Click on the Runs tab in the Kubeflow UI; the request starts to time out.

I did some digging and found that when a namespace has a very large number of runs, the select query filtered by namespace takes more than a minute:

select * from run_details where Namespace = 'namespace-with-200k-runs' limit 1;

This query took 1 minute 46 seconds.

I also tried listing runs from the Experiments tab, where we pass the experiment ID, and that query still performs as expected:

select * from run_details where ExperimentUUID = 'a0dd8afa-d481-4c83-b2c2-31ef4b3d12ec' limit 1;

This query takes milliseconds.

On a side note, 200k runs is unusually high, so I checked where they came from. Someone had created a few Recurring Runs using a cron schedule that ran every second. That was a mistake. I feel we should show a warning below the cron schedule box when the schedule fires every minute or every second, or provide a mechanism that lets administrators disallow these kinds of cron schedules.
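As a rough illustration of the warning suggested above, here is a minimal sketch in Python. It is not KFP code: the function names `schedule_warning` and `_matches_many` are hypothetical, and it assumes the schedule is either a 6-field cron expression with seconds first or a standard 5-field expression. The check is only a heuristic that flags a seconds or minutes field able to match more than one value.

```python
from typing import Optional


def _matches_many(field: str) -> bool:
    """Heuristic: a field with a wildcard, step, list, or range can match several values."""
    return any(ch in field for ch in "*/,-")


def schedule_warning(cron_expr: str) -> Optional[str]:
    """Return a warning string for very frequent schedules, or None if none applies.

    Assumption: 6 fields means seconds-first cron, 5 fields means standard cron.
    """
    fields = cron_expr.split()
    if len(fields) == 6:
        seconds, minutes = fields[0], fields[1]
    elif len(fields) == 5:
        # No seconds field: treat the schedule as firing at second 0.
        seconds, minutes = "0", fields[0]
    else:
        raise ValueError(f"expected 5 or 6 cron fields, got {len(fields)}")

    if _matches_many(seconds):
        return "schedule can fire more than once per minute"
    if _matches_many(minutes):
        return "schedule can fire more than once per hour"
    return None


if __name__ == "__main__":
    for expr in ("* * * * * *", "0 * * * * *", "0 0 9 * * *"):
        print(expr, "->", schedule_warning(expr))
```

A UI could show the returned string below the cron schedule box, and an administrator setting could turn the first warning into a rejection. Both of those behaviours are suggestions, not existing features.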

Expected result

Listing runs should not time out, irrespective of how much data we have in the database.

Materials and Reference

Below is a snapshot of the data we have in the run_details table.

mysql> select count(*) from run_details;
+----------+
| count(*) |
+----------+
|   227025 |
+----------+
1 row in set (1.91 sec)

mysql> SELECT Namespace,COUNT(*) as count FROM run_details GROUP BY Namespace ORDER BY count DESC;
+-----------+--------+
| Namespace | count  |
+-----------+--------+
| n1        | 219437 |
| n2        |   2032 |
| n3        |   1478 |
| n4        |   1090 |
| n5        |    384 |
| n6        |    367 |
| n7        |    285 |
| n8        |    283 |
.....
56 rows in set (1 min 45.73 sec)

To fix this query, I manually created an index on the Namespace column, and everything seems to be running fine for now. The output below shows the indexes on run_details before and after creating it.

mysql> show indexes from run_details;
+-------------+------------+-------------------------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table       | Non_unique | Key_name                                  | Seq_in_index | Column_name     | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+-------------------------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| run_details |          0 | PRIMARY                                   |            1 | UUID            | A         |      108032 |     NULL | NULL   |      | BTREE      |         |               |
| run_details |          1 | experimentuuid_createatinsec              |            1 | ExperimentUUID  | A         |         148 |     NULL | NULL   |      | BTREE      |         |               |
| run_details |          1 | experimentuuid_createatinsec              |            2 | CreatedAtInSec  | A         |      112366 |     NULL | NULL   |      | BTREE      |         |               |
| run_details |          1 | experimentuuid_conditions_finishedatinsec |            1 | ExperimentUUID  | A         |         150 |     NULL | NULL   |      | BTREE      |         |               |
| run_details |          1 | experimentuuid_conditions_finishedatinsec |            2 | Conditions      | A         |         254 |     NULL | NULL   |      | BTREE      |         |               |
| run_details |          1 | experimentuuid_conditions_finishedatinsec |            3 | FinishedAtInSec | A         |      112366 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------------+------------+-------------------------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
6 rows in set (0.00 sec)

mysql> CREATE INDEX namespace ON run_details (Namespace);
Query OK, 0 rows affected (8 min 52.07 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql>
mysql>
mysql> show indexes from run_details;
+-------------+------------+-------------------------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table       | Non_unique | Key_name                                  | Seq_in_index | Column_name     | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+-------------------------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| run_details |          0 | PRIMARY                                   |            1 | UUID            | A         |      108032 |     NULL | NULL   |      | BTREE      |         |               |
| run_details |          1 | experimentuuid_createatinsec              |            1 | ExperimentUUID  | A         |         148 |     NULL | NULL   |      | BTREE      |         |               |
| run_details |          1 | experimentuuid_createatinsec              |            2 | CreatedAtInSec  | A         |      112367 |     NULL | NULL   |      | BTREE      |         |               |
| run_details |          1 | experimentuuid_conditions_finishedatinsec |            1 | ExperimentUUID  | A         |         150 |     NULL | NULL   |      | BTREE      |         |               |
| run_details |          1 | experimentuuid_conditions_finishedatinsec |            2 | Conditions      | A         |         254 |     NULL | NULL   |      | BTREE      |         |               |
| run_details |          1 | experimentuuid_conditions_finishedatinsec |            3 | FinishedAtInSec | A         |      112367 |     NULL | NULL   | YES  | BTREE      |         |               |
| run_details |          1 | namespace                                 |            1 | Namespace       | A         |          55 |     NULL | NULL   |      | BTREE      |         |               |
+-------------+------------+-------------------------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
7 rows in set (0.00 sec)
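The effect of an index on Namespace can be reproduced on a small scale. The following is a minimal sketch that uses Python's built-in sqlite3 module instead of MySQL, so the planner text differs from what MySQL's EXPLAIN would print. The trimmed-down table schema and the helper name `query_plan` are assumptions for illustration; only the column and index names come from this issue.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return the planner's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The fourth column of each row holds the human-readable plan step.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")

# Assumed, simplified stand-in for the real run_details table.
conn.execute(
    """
    CREATE TABLE run_details (
        UUID           TEXT PRIMARY KEY,
        ExperimentUUID TEXT NOT NULL,
        Namespace      TEXT NOT NULL,
        CreatedAtInSec INTEGER NOT NULL
    )
    """
)
# Index that already exists according to the 'show indexes' output.
conn.execute(
    "CREATE INDEX experimentuuid_createatinsec "
    "ON run_details (ExperimentUUID, CreatedAtInSec)"
)

by_namespace = "SELECT * FROM run_details WHERE Namespace = ? LIMIT 1"
by_experiment = "SELECT * FROM run_details WHERE ExperimentUUID = ? LIMIT 1"

# Without an index on Namespace the planner has to scan the table.
plan_ns_before = query_plan(conn, by_namespace, ("n1",))
# The experiment filter can use the existing composite index.
plan_exp = query_plan(conn, by_experiment, ("some-experiment",))

# Same statement as the manual fix in this issue.
conn.execute("CREATE INDEX namespace ON run_details (Namespace)")
plan_ns_after = query_plan(conn, by_namespace, ("n1",))

print("namespace, before index:", plan_ns_before)
print("experiment:             ", plan_exp)
print("namespace, after index: ", plan_ns_after)
```

The plan for the namespace filter should change from a table scan to an index search once the index exists, which matches the behaviour seen here: the experiment filter was always fast, and the namespace filter only became fast after the manual CREATE INDEX.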

Impacted by this bug? Give it a 👍.


Linchin · 2023-07-27T22:43:01Z

Hi @deepk2u, thank you for bringing this issue up and offering a solution! If you would like to create a PR to fix this problem, that would be great.

tam0201 · 2023-08-01T08:38:41Z

We are also impacted by this issue; most of our namespaces contain more than 200k runs, and now we cannot view any runs in the UI. Hope there will be a fix soon!

deepk2u · 2023-08-01T22:36:36Z

@Linchin @tam0201 I have tried to fix it in my PR #9806. Please take a look when you have time.

deepk2u added area/backend kind/bug labels Jul 25, 2023

Linchin mentioned this issue Aug 4, 2023

fix(backend): fix timeouts with list run api. Fixes #9780 #9806

Merged

google-oss-prow bot closed this as completed in a6af41c Aug 4, 2023

zijianjoy pushed a commit to zijianjoy/pipelines that referenced this issue Aug 10, 2023

fix(backend): fix timeouts with list run api. Fixes kubeflow#9780 (ku…

9fa502d

…beflow#9806) * Update client_manager.go * Update client_manager.go

chensun pushed a commit that referenced this issue Aug 17, 2023

fix(backend): fix timeouts with list run api. Fixes #9780 (#9806)

c467ece

* Update client_manager.go * Update client_manager.go

juliusvonkohout mentioned this issue Oct 13, 2023

[backend] severe performance problem in listruns API #9890

Closed

stijntratsaertit pushed a commit to stijntratsaertit/kfp that referenced this issue Feb 16, 2024

fix(backend): fix timeouts with list run api. Fixes kubeflow#9780 (ku…

a573fe5

…beflow#9806) * Update client_manager.go * Update client_manager.go

thesuperzapper mentioned this issue Apr 2, 2024

[frontend] Cannot list runs and/or artifacts (upstream request timeout) #10230

Closed
