-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce API response times by 30% by using memcache migration flag #5239
Reduce API response times by 30% by using memcache migration flag #5239
Conversation
Do you have some examples? When we talk about 30%, what is the actual amount total time saved in seconds on a fairly naive request (i.e., are we talking hundredths of a second per request, or tenths?) Are we talking about requests that don't otherwise have database access (e.g., |
awx/main/middleware.py
Outdated
return redirect(reverse("ui:migrations_notran")) | ||
if cache.get('migration_in_progress', False): | ||
executor = MigrationExecutor(connection) | ||
plan = executor.migration_plan(executor.loader.graph.leaf_nodes()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I take it this is getting to be fairly expensive per-request?
awx/main/apps.py
Outdated
from django.utils.translation import ugettext_lazy as _ | ||
|
||
|
||
def raise_migration_flag(**kwargs): | ||
from awx.main.tasks import set_migration_flag | ||
set_migration_flag() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably actually want set_migration_flag.apply_async()
so that it gets broadcast everywhere, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm interesting in chatting about numbers, but I really like this idea in general if we can show that the migration planner is contributing notable overhead to certain requests.
Build failed.
|
26a2a88
to
c551677
Compare
Oh I forgot the other thing that's still TBD on this. Hopefully reasonable statement: we should not run the task manager while migrations are running. It's probably benign either way, because right now the task manager will run and just rollback in the event of errors, until it finishes and it completes. But there's a reasonable user experience case to show a log "migration is running" versus ERROR ERROR ERROR. |
Build succeeded.
|
In spite of substantial hardware differences, I actually get almost the same number for that timing, as low as 6 seconds. Here is the branch I used for the timings previously, which I just rebased: https://github.com/ansible/awx/compare/devel...AlanCoding:middleware_timings?expand=1 I extremely consistently get higher numbers using that method. When I first did this my number was 157 ms. I re-tested just now over a slightly longer time frame and got 200 ms. This sounds suspiciously high, I know. This is why I was doing the more complicated task of bracketing the middleware methods - because those numbers remained stubbornly higher than when tested in isolation. I cannot tell you exactly why, my speculation falls short of justifying the large magnitude of the difference. Your 6 second number averages out to 63 ms, which isn't exactly small either when considering that it applies to all requests. |
Even if we're not talking about saving multiple tenths of seconds, I do think this change is simple enough, and worth it. It's always bothered me that we rebuild the migration executor on every single request just for the purpose of showing an upgrade screen. I like the idea of setting a flag in the cache once migrations start, and having the local instances unset their flag as requests land there and they discover migration is done. |
I suppose it would be reasonable to add a check to the task manager's startup. This should definitely lean on the same memcached flag check, though. |
c551677
to
09fb989
Compare
Build succeeded.
|
Build succeeded.
|
did some basic manual testing. This shows the same behavior as we have now, that if you migrate back 1, the migration screen will never go away. Also observed the log messages related to the task manager. Filed #5302 related to this. |
I combined this content with my timing stuff and got some updated numbers.
that's for 87 data points. This is not as good as I had hoped at 0.02 seconds, but much improved over 0.3 I was seeing before with same hardware & settings. Anyway, I think I'm finished at this point. Checking these numbers was my last TODO item. |
@ryanpetrello are you in favor of merge? |
Oh, I thought I merged this. Yep, let's merge it. |
Build failed (gate pipeline). For information on how to proceed, see
|
regate |
(needs rebase before merging) |
Build failed (gate pipeline). For information on how to proceed, see
|
9ba88a0
to
a0910eb
Compare
Build succeeded.
|
Build succeeded (gate pipeline).
|
SUMMARY
When profiling times taken in requests (or just ever looking at the debug toolbar output), the migration middleware came up as a big resource consumer. It was tricky to pin down, because it wasn't much if tested with isolated requests.
I did timings with the debug settings off and just logged middleware times and did some typical UI use. I found general numbers of ~40% of the time was spent in middleware and ~30% was spent in the migration middleware alone.
This is not hard to reduce, but this PR still needs some more testing via those prior means and verification of continued functionality.
ISSUE TYPE
COMPONENT NAME
AWX VERSION
ADDITIONAL INFORMATION