[backend] performance issues with list runs API #9780
Comments
Hi @deepk2u, thank you for bringing this issue up and offering a solution! If you would like to create a PR to fix this problem, that would be great.
We are also impacted by this issue: most of our namespaces contain more than 200k runs, and now we cannot view any run in the UI. Hope there will be a fix soon!
zijianjoy pushed a commit to zijianjoy/pipelines that referenced this issue on Aug 10, 2023: …beflow#9806) * Update client_manager.go * Update client_manager.go
chensun pushed a commit that referenced this issue on Aug 17, 2023.
stijntratsaertit pushed a commit to stijntratsaertit/kfp that referenced this issue on Feb 16, 2024: …beflow#9806) * Update client_manager.go * Update client_manager.go
Environment
full kubeflow deployment using manifests repo
sdk-2.0.0b7
Steps to reproduce
I did some digging and found that when the number of runs is very high, the select query starts to take more than a minute.
select * from run_details where Namespace = 'namespace-with-200k-runs' limit 1;
This query took 1 minute 46 seconds.
I then tried querying from the experiment tab, where we pass the experiment ID, and that query still performs as expected.
select * from run_details where ExperimentUUID = 'a0dd8afa-d481-4c83-b2c2-31ef4b3d12ec' limit 1;
This query takes milliseconds.
On a side note, 200k runs is a very high number, so I checked where they came from. Someone created a few Recurring Runs using a cron schedule that ran every second. That was a mistake. I feel we should add a warning below the cron schedule box if the schedule is per minute or per second, or there should be a mechanism for administrators to disallow these kinds of cron schedules.
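To illustrate the kind of check the warning could rely on, here is a minimal Python sketch. It assumes a six-field cron expression with a leading seconds field (which a per-second schedule implies), and the function names `expand_field`, `min_interval_seconds`, and `is_too_frequent` are hypothetical, not part of any KFP API. It only inspects the seconds and minutes fields, so it is a conservative approximation rather than a full cron evaluator.

```python
def expand_field(field, lo, hi):
    """Expand one cron field (e.g. '*', '*/15', '0,30', '10-20/5') into sorted values."""
    values = set()
    for part in field.split(","):
        step, has_step = 1, False
        if "/" in part:
            part, step_text = part.split("/", 1)
            step, has_step = int(step_text), True
        if part in ("*", "?"):
            start, end = lo, hi
        elif "-" in part:
            a, b = part.split("-", 1)
            start, end = int(a), int(b)
        else:
            start = int(part)
            # 'n/step' is treated as "from n to the field maximum" (an assumption).
            end = hi if has_step else start
        values.update(range(start, end + 1, step))
    return sorted(values)


def min_interval_seconds(cron_expr):
    """Smallest gap in seconds between two firings, judged from seconds and minutes only."""
    fields = cron_expr.split()
    if len(fields) != 6:
        raise ValueError("expected 6 fields: sec min hour dom month dow")
    seconds = expand_field(fields[0], 0, 59)
    minutes = expand_field(fields[1], 0, 59)
    offsets = sorted(m * 60 + s for m in minutes for s in seconds)
    if not offsets:
        raise ValueError("schedule never fires")
    gaps = [b - a for a, b in zip(offsets, offsets[1:])]
    # Conservatively assume the schedule also fires in the following hour.
    gaps.append(3600 - offsets[-1] + offsets[0])
    return min(gaps)


def is_too_frequent(cron_expr, threshold_seconds=60):
    """True if the schedule fires at least once per threshold_seconds."""
    return min_interval_seconds(cron_expr) <= threshold_seconds


print(is_too_frequent("* * * * * *"))  # every second
print(is_too_frequent("0 0 * * * *"))  # once an hour
```

With the default threshold, both per-second and per-minute schedules are flagged, which matches the two cases mentioned above. The threshold is a parameter so that an administrator-controlled limit could be plugged in instead.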
Expected result
Listing runs should not time out, irrespective of how much data we have in the database.
Materials and Reference
Below is a snapshot of the data we have in the run_details table.
To fix this query, I manually created an index on the Namespace column, and everything seems to be running fine for now.
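The effect of such an index on the query plan can be sketched in a self-contained way with Python's built-in `sqlite3` module. This is only a stand-in: the real deployment uses a different database whose planner and timings will differ, the table below is a simplified three-column version of `run_details`, and the index name `idx_run_details_namespace` is a hypothetical choice, not necessarily the name used in the actual fix.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
# Simplified stand-in for the run_details table (assumed columns only).
conn.execute(
    "CREATE TABLE run_details ("
    " UUID TEXT PRIMARY KEY,"
    " Namespace TEXT NOT NULL,"
    " ExperimentUUID TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO run_details VALUES (?, ?, ?)",
    [(f"run-{i}", f"ns-{i % 5}", f"exp-{i % 50}") for i in range(1000)],
)

sql = "SELECT * FROM run_details WHERE Namespace = ? LIMIT 1"

# Without an index on Namespace, the planner has to scan the table.
plan_before = query_plan(conn, sql, ("ns-3",))

# Hypothetical index name; the issue only says an index was created on Namespace.
conn.execute("CREATE INDEX idx_run_details_namespace ON run_details (Namespace)")

# With the index in place, the planner can look the namespace up directly.
plan_after = query_plan(conn, sql, ("ns-3",))

print(plan_before)
print(plan_after)
```

On the real database, the equivalent check would be to run the database's own `EXPLAIN` on the slow query before and after creating the index.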
Impacted by this bug? Give it a 👍.