Db partition analytics cmeyers2 #10023
Conversation
Build failed.
Force-pushed 4c3bf9c to a6922be (compare)
Build failed.
Force-pushed a6922be to 33a4665 (compare)
Build failed.
Build failed.
Force-pushed e89cb41 to e814bee (compare)
Build failed.
Force-pushed e814bee to a249487 (compare)
Build failed.
Build failed.
Force-pushed 271e6c7 to 641db8a (compare)
Build succeeded.
Merge Failed. This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Force-pushed 3a417ad to 19a7fbb (compare)
Build succeeded.
Build succeeded.
Build failed.
Build failed.
Build failed.
Force-pushed 5ff5811 to 0010040 (compare)
Build failed.
Build succeeded.
Force-pushed ba79159 to 920dcc8 (compare)
* Old, _unpartitioned_main_jobevent table does not have the job_created column.
* New main_jobevent table does.
* Always include the job_created column: NULL if old, job_created if new (see the sketch below).
* Bump events_table schema version from 1.2 to 1.3 because of the job_created field.
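A minimal sketch of the idea (the function and column list are hypothetical; only job_created and the two table names come from this change):

```python
def events_select(partitioned: bool) -> str:
    """Build the gather SELECT so old and new tables emit identical columns."""
    if partitioned:
        # New table: job_created is a real column (the partition key).
        return "SELECT id, created, job_created FROM main_jobevent"
    # Old table: emit NULL so the schema 1.3 CSV layout still matches.
    return (
        "SELECT id, created, NULL AS job_created "
        "FROM _unpartitioned_main_jobevent"
    )
```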
* The ORDER BY results in an in-memory sort that COULD blow out the worker mem buffer and force the sort to take place on disk.
* This WILL happen with the default postgres 4MB mem buffer; we saw as much as 20MB used. Note that AWX defaults the postgres worker mem buffer to 3% of DB memory on external installs and 1% on same-node installs, so for a 16GB remote DB this would not be a problem.
* We are going to avoid this problem altogether by NOT sorting when gathering. Instead, we will sort remotely, in analytics (see the sketch below).
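A sketch of the resulting query shape (SQL embedded in Python; the WHERE clause is illustrative). The point is the missing ORDER BY: with a default 4MB work_mem, sorting ~20MB of events would spill to disk.

```python
# No ORDER BY: ordering is deferred to the analytics side, so postgres
# never has to sort (in memory or on disk) while gathering.
COPY_EVENTS_SQL = """
COPY (
    SELECT * FROM main_jobevent
    WHERE modified >= '{start}' AND modified < '{end}'
) TO STDOUT WITH CSV HEADER
"""
```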
* Before, we would get the min and max pk of the set we are to gather. This changeset removes that.
* Before, we would, in effect, know the size of the set we are to gather and would query 100,000 of those job event records at a time. That logic is now gone.
* Now, for unpartitioned job events, we gather 4 hours at a time by created time.
* Now, for partitioned job events, we gather 4 hours at a time by modified time (see the windowing sketch below).
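A minimal sketch of the 4-hour windowing (the helper name is an assumption); the same generator applies whether the sliced column is created (unpartitioned) or modified (partitioned):

```python
from datetime import datetime, timedelta

def four_hour_windows(start: datetime, end: datetime):
    """Yield half-open (lo, hi) windows covering [start, end) in 4h steps."""
    step = timedelta(hours=4)
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield lo, hi
        lo = hi
```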
* Trigger via jobs/<id>/job_events/?limit=10.
* Can and should be used in conjunction with an indexed set of fields to generate efficient pagination queries, e.g. jobs/<id>/job_events/?limit=10&start_line__gte=10 (usage sketch below).
* If limit is not specified in the query params, the default pagination is used.
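A hypothetical client-side use of the new limit parameter (host, token, and job id are placeholders), paired with an indexed filter as recommended above:

```python
import requests

resp = requests.get(
    "https://awx.example.com/api/v2/jobs/42/job_events/",
    params={"limit": 10, "start_line__gte": 10},  # indexed filter + limit
    headers={"Authorization": "Bearer TOKEN"},
)
events = resp.json()["results"]
```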
* Do not cascade delete unified job events; we will clean those up in cleanup_job runs (model sketch below).
* Add limit pagination to all unified job events endpoints.
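A sketch of what "no cascade delete" can look like at the Django model layer (model and field names are illustrative, not AWX's actual models):

```python
from django.db import models

class Job(models.Model):
    pass

class JobEvent(models.Model):
    # DO_NOTHING instead of CASCADE: deleting a job leaves its (possibly
    # huge) event rows in place; periodic cleanup_job runs remove them.
    job = models.ForeignKey(
        Job,
        related_name="job_events",
        on_delete=models.DO_NOTHING,
        db_constraint=False,  # no FK constraint, so the db won't cascade either
    )
```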
* Use an initial request for the max event `counter` to get the total row count; after that, rely on websocket message counters to update the remote row count (see the sketch below).
* For running jobs, request event ranges by counter to handle events getting saved to the db out of display order.
* For jobs that are no longer running, continue to use the page/pageSize scheme for paging through the job events.
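A small sketch of the counter-based row count (in Python for consistency, though the real logic lives in the UI); the helper and message shape are assumptions:

```python
def fetch_max_counter(job_id: int) -> int:
    """Placeholder for the one initial REST request for the max event counter."""
    raise NotImplementedError

remote_row_count = 0  # seeded once from fetch_max_counter(job_id)

def on_websocket_message(msg: dict) -> None:
    """Counters are monotonic per job, so the count only ever grows,
    even when events are saved to the db out of display order."""
    global remote_row_count
    remote_row_count = max(remote_row_count, msg["counter"])
```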
* job_created is a fake field as far as Django is concerned. Under the hood, in postgres, it is real: it is the partition key. sqlite doesn't support partitioning, so we need to fake some things; specifically, we need to stop job_created from being auto-added in get_event_queryset().
* Add pagination tests for the <unified_job_name>/<id>/job_events/?limit=x endpoint to make sure the paginator is wired up (test sketch below).
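A hypothetical pytest sketch of the kind of pagination test being added (the get fixture and expect kwarg mimic common AWX test helpers; treat them as assumptions):

```python
import pytest

@pytest.mark.django_db
def test_job_events_limit(get, job, admin_user):
    # ?limit=3 should cap the page size regardless of how many events exist.
    url = f"/api/v2/jobs/{job.pk}/job_events/?limit=3"
    response = get(url, user=admin_user, expect=200)
    assert len(response.data["results"]) <= 3
```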
Force-pushed ada48ba to ffbbcd2 (compare)
thx @kdelee!
Build succeeded.
Build succeeded (gate pipeline).
Add OPTIONS documentation for new job limit feature
Looking at the docs and stuff from #10023. I'm sure this is documented somewhere else too, but this is the place that users should naturally expect it to be.
Reviewed-by: Chris Meyers <None>
Benchmark notes (run against the event tables):
* Dataset: 80 million partitioned + 1.5 million unpartitioned events
* ::json casts, 100,000 event batches*

*Micro benchmarking consists of simply copying a query, running it manually, and observing the runtime.
**Estimated total = micro benchmark time x (80 million / batch size)
**Note that this testing does NOT include the extra modified-range query that is needed for correctness. We expect this to be quite fast; it is only needed to catch edge case events.