Db partition analytics cmeyers2 #10023

Merged: 90 commits into devel, Jun 4, 2021

chrismeyersfsu (Member)

  • Keep old primary key based analytics gathering for unpartitioned
    tables.
  • Use created time on new partitioned tables.

80 million partitioned + 1.5 million unpartitioned Events

| Query | awx-manage gather_analytics --dry-run Time | Micro Benchmark Query Time* | Query Only Time** |
| --- | --- | --- | --- |
| sequential index scan, multiple ::json casts, 100,000 event batches | 102m7.836s | 6s | 80 minutes |
| sequential index scan, optimized json cast, 100,000 event batches | 48m9.276s | 2.2s | 30.4 minutes |
| sequential index scan, optimized json cast, 1,000,000 event batches | 39m35.094s | 10s | 13.3 minutes |
| sequential table scan, optimized json cast, per-partition batch 600,000 *** | 36m42.081s | 11.5s | 25.5 minutes |

*Micro benchmarking consists of simply copying a query, running it manually, and observing the runtime.
**Micro benchmark time x (80 million / batch size).
***Note that this testing does NOT include the extra modified range query that is needed for correctness. We expect that query to be quite fast; it is only needed to catch edge-case events.
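
The "Query Only Time" footnote is simple arithmetic, and a short sketch makes it easy to re-check the table. The function name `query_only_minutes` is hypothetical, and the third row is assumed to use a batch size of 1,000,000, since that is the value that reproduces its 13.3 minutes. With the rounded 2.2s timing, the second row comes out near 29.3 minutes rather than the reported 30.4, presumably because the measured time was slightly above 2.2s.

```python
def query_only_minutes(micro_benchmark_seconds: float, batch_size: int,
                       total_events: int = 80_000_000) -> float:
    """Estimate total query time: per-batch time x number of batches, in minutes."""
    batches = total_events / batch_size
    return micro_benchmark_seconds * batches / 60


# (label, micro benchmark seconds, batch size) taken from the table above.
rows = [
    ("multiple ::json casts, 100,000 event batches", 6.0, 100_000),
    ("optimized json cast, 1,000,000 event batches", 10.0, 1_000_000),  # batch size assumed
    ("per-partition batch 600,000", 11.5, 600_000),
]

for label, seconds, batch in rows:
    print(f"{label}: {query_only_minutes(seconds, batch):.2f} minutes")
```

The estimate only scales the per-batch query time; it does not account for serialization or upload work, which is why it sits well below the end-to-end dry-run times.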

@softwarefactory-project-zuul (Contributor)

Build failed.

chrismeyersfsu force-pushed the db_partition_analytics_cmeyers2 branch 2 times, most recently from 4c3bf9c to a6922be, on May 6, 2021 19:15

@softwarefactory-project-zuul (Contributor)

Build failed.

chrismeyersfsu force-pushed the db_partition_analytics_cmeyers2 branch from a6922be to 33a4665 on May 12, 2021 12:41

@softwarefactory-project-zuul (Contributor)

Build failed.

@softwarefactory-project-zuul (Contributor)

Build failed.

jladdjr force-pushed the db_partition_analytics_cmeyers2 branch from e89cb41 to e814bee on May 12, 2021 19:14

@softwarefactory-project-zuul (Contributor)

Build failed.

jladdjr force-pushed the db_partition_analytics_cmeyers2 branch from e814bee to a249487 on May 12, 2021 22:02

@softwarefactory-project-zuul (Contributor)

Build failed.

@softwarefactory-project-zuul (Contributor)

Build failed.

jladdjr force-pushed the db_partition_analytics_cmeyers2 branch from 271e6c7 to 641db8a on May 13, 2021 22:45

@softwarefactory-project-zuul (Contributor)

Build succeeded.

@softwarefactory-project-zuul (Contributor)

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.

chrismeyersfsu force-pushed the db_partition_analytics_cmeyers2 branch from 3a417ad to 19a7fbb on May 20, 2021 17:58

@softwarefactory-project-zuul (Contributor)

Build succeeded.

@softwarefactory-project-zuul (Contributor)

Build succeeded.

@softwarefactory-project-zuul (Contributor)

Build failed.

@softwarefactory-project-zuul (Contributor)

Build failed.

@softwarefactory-project-zuul (Contributor)

Build failed.

jakemcdermott force-pushed the db_partition_analytics_cmeyers2 branch from 5ff5811 to 0010040 on May 25, 2021 21:49

@softwarefactory-project-zuul (Contributor)

Build failed.

@softwarefactory-project-zuul (Contributor)

Build succeeded.

chrismeyersfsu force-pushed the db_partition_analytics_cmeyers2 branch from ba79159 to 920dcc8 on May 26, 2021 19:53

chrismeyersfsu and others added 19 commits June 4, 2021 09:17
* Old, _unpartitioned_main_jobevent table does not have the job_created
column
* New, main_jobevent does.
* Always include the job_created column: NULL if old, job_created if
new
* Bump events_table schema version from 1.2 to 1.3 because of the
job_created field
* The ORDER BY results in an in-memory sort that COULD blow out the
worker memory buffer and force the sort to take place on disk.
* This WILL happen with the default 4MB postgres memory buffer. We saw as
much as 20MB used. Note that AWX defaults the postgres worker memory buffer to
3% of the DB memory on external installs and 1% on same-node installs.
So for a 16GB remote DB this would not be a problem.
* We are going to avoid this problem altogether by NOT doing a sort
when gathering. Instead, we will sort remotely, in analytics.
* Before, we would get the min and max pk of the set we are to gather.
This changeset removes that.
* Before, we would, basically, know the size of the set we are to gather
and would query 100,000 of those job event records at a time. That logic
is now gone.
* Now, for unpartitioned job events we gather 4 hours at a time by
created time.
* Now, for partitioned job events we gather 4 hours at a time by
modified time.
* Trigger via jobs/<id>/job_events/?limit=10
* Can and should be used in conjunction with an indexed set of fields to
generate efficient pagination queries, e.g.
jobs/<id>/job_events?limit=10&start_line__gte=10
* If limit is not specified in the query params then the default
pagination will be used.
* Do not cascade delete unified job events. We will clean those up in
cleanup_job runs
* Add limit pagination to all unified job events endpoints
* Use an initial request for max event `counter` to get the total row count,
otherwise rely on websocket message counters to update remote row count

* For running jobs, request event ranges with counters to handle events getting
saved to db out of display order

* For jobs that are no longer running, continue to use page/pageSize scheme for
paging through the job events
* job_created is a fake field as far as Django is concerned. Under the
hood, in postgres, this is the partition key so it is real. sqlite
doesn't support partitioning so we need to fake some things.
Specifically, we need to remove job_created from being auto-added to
get_event_queryset()
* Add pagination tests for <unified_job_name>/<id>/<job_events>?limit=x
endpoint to make sure the paginator is wired up.
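
Several of the commit notes above describe replacing fixed-size primary-key batches with gathering job events four hours at a time. A minimal sketch of that windowing, assuming half-open intervals and a hypothetical helper named `four_hour_windows` (neither is taken from the AWX code base):

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple


def four_hour_windows(since: datetime, until: datetime,
                      step: timedelta = timedelta(hours=4)
                      ) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive half-open [start, end) windows covering since..until."""
    start = since
    while start < until:
        # Clamp the final window so it never runs past `until`.
        end = min(start + step, until)
        yield start, end
        start = end


since = datetime(2021, 6, 1, 0, 0, tzinfo=timezone.utc)
until = datetime(2021, 6, 1, 10, 0, tzinfo=timezone.utc)
windows = list(four_hour_windows(since, until))

for start, end in windows:
    # Each window would bound one unsorted query on the relevant timestamp
    # column, e.g. "WHERE ts >= start AND ts < end" (column name illustrative).
    print(start.isoformat(), "->", end.isoformat())
```

Half-open windows mean an event whose timestamp falls exactly on a boundary is gathered once rather than twice, and no ORDER BY is needed on the gathering side because sorting is left to analytics.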
jladdjr force-pushed the db_partition_analytics_cmeyers2 branch from ada48ba to ffbbcd2 on June 4, 2021 16:17

jladdjr (Contributor) commented Jun 4, 2021

thx @kdelee!

@softwarefactory-project-zuul (Contributor)

Build succeeded.

@softwarefactory-project-zuul (Contributor)

Build succeeded (gate pipeline).

softwarefactory-project-zuul bot merged commit 0f6e221 into devel on Jun 4, 2021
softwarefactory-project-zuul bot added a commit that referenced this pull request Jun 8, 2021
Add OPTIONS documentation for new job limit feature

Looking at the docs and stuff from #10023
I'm sure this is somewhere else too, but this is the place that users should naturally expect it to be.

Reviewed-by: Chris Meyers <None>
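
The job limit feature this commit documents appears earlier in the thread as `jobs/<id>/job_events/?limit=10`, optionally combined with an indexed filter such as `start_line__gte`. A minimal sketch of composing such a request path follows; the helper name `job_events_url` and the job id 42 are hypothetical, and the path is left relative because the thread does not state an API prefix.

```python
from urllib.parse import urlencode


def job_events_url(job_id: int, limit: int, **filters: object) -> str:
    """Build a relative job events path with a limit and optional field filters."""
    params = {"limit": limit, **filters}
    return f"jobs/{job_id}/job_events/?{urlencode(params)}"


url = job_events_url(42, 10, start_line__gte=10)
print(url)
```

Pairing `limit` with a filter on an indexed field lets a client page forward by raising the filter value instead of asking the server to count every row.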
shanemcd deleted the db_partition_analytics_cmeyers2 branch on August 30, 2021 21:20
7 participants