Working on funcs to move default partition data to new partitions #2547

Merged
Red-HAP merged 17 commits into master from move_default_partition_data
Jan 25, 2021

Conversation

@Red-HAP Red-HAP commented Dec 10, 2020

Ticket

COST-812

Description

Functions that will move partitioned data from the default partition by month to newly created month partitions.

It's broken up into a few components:

  • class PartitionDefaultData : This is the meat of the processing. Given a schema, a partitioned table, and that table's default partition, the flow works like this:
    1. Find all partition bounds in the default partition's data
    2. For each of these bounds, create a new partition with a date range for 100 years from the bounds (This is to let PG create all of the internal bits and bobs that a table partition will utilize)
    3. Detach this new partition (This will disable the partition checks)
    4. Use a CTE delete/returning with an insert to move the data for the specific partition range bounds from the default partition directly into the new, detached partition
    5. Attach the new partition with the correct bounds.
    6. Repeat from step 2 until no bounds remain
  • function repartition_default_data : This is a driver function that will get default partition information from all tenant schemata and use PartitionDefaultData to fix 'em
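
The flow above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names (month_bounds, distinct_month_bounds, move_statements), the partition naming scheme, and the usage_start default for the date column are assumptions, and "100 years from the bounds" is read here as shifting the real bounds forward by 100 years. The PR itself drives create/attach/detach through the partition tracking triggers rather than emitting raw DDL, and real code should quote identifiers and bind values instead of interpolating them.

```python
from datetime import date


def month_bounds(day):
    """Return the [start, end) month range that contains `day`."""
    start = day.replace(day=1)
    # Roll over to January of the next year when the month is December.
    end = date(start.year + (start.month == 12), start.month % 12 + 1, 1)
    return start, end


def distinct_month_bounds(days):
    """Step 1: the distinct month ranges present in the default partition's data."""
    return sorted({month_bounds(d) for d in days})


def move_statements(schema, table, default_part, start, end, date_col="usage_start"):
    """Steps 2-5 for one month range, returned as SQL strings (illustrative only)."""
    part = f"{table}_{start:%Y_%m}"  # hypothetical partition naming scheme
    # Placeholder bounds 100 years out, so they cannot overlap the rows being moved.
    tmp_start = start.replace(year=start.year + 100)
    tmp_end = end.replace(year=end.year + 100)
    return [
        # Step 2: create the partition with placeholder bounds.
        f"CREATE TABLE {schema}.{part} PARTITION OF {schema}.{table} "
        f"FOR VALUES FROM ('{tmp_start}') TO ('{tmp_end}');",
        # Step 3: detach it so the partition bound checks no longer apply.
        f"ALTER TABLE {schema}.{table} DETACH PARTITION {schema}.{part};",
        # Step 4: move the rows with a CTE delete/returning feeding an insert.
        f"WITH moved AS (DELETE FROM {schema}.{default_part} "
        f"WHERE {date_col} >= '{start}' AND {date_col} < '{end}' RETURNING *) "
        f"INSERT INTO {schema}.{part} SELECT * FROM moved;",
        # Step 5: attach the partition again with the correct bounds.
        f"ALTER TABLE {schema}.{table} ATTACH PARTITION {schema}.{part} "
        f"FOR VALUES FROM ('{start}') TO ('{end}');",
    ]
```

Step 6 is then a loop over distinct_month_bounds(...), running the four statements for each range, ideally inside a single transaction so a failure cannot leave rows deleted but not inserted.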

Designing the code this way was necessary for two reasons:

  1. PostgreSQL will not allow a new partition to be created with a range overlapping any data from the default partition
  2. The partition management triggers are locked into specific workflows at the moment.

The code can be run as a task that crawls the entire database, or simply executed once after processing has completed, whichever is more relevant.

This code is not meant to bypass the most efficient means of managing partitions, which is to create them as needed during data processing.

Currently, this branch contains a migration that will apply the functionality on a one-time basis.

Review Request

Please have a close look at the tests to see if there is something I missed in testing. This must be bulletproof as it is moving customer data (delete + insert).

Testing

This is easiest to check with the reporting_ocpusagelineitem_daily_summary table (OCPUsageLineItemDailySummary model)

Hard

  • Insert new records with a usage_start value 10 years in the future to get data into the default partition.
  • Then either use the class or the driver func to operate on the table.
  • Then use psql to verify that the default partition is now empty and that the new partitions were created.

Easy

  • Check out master
  • Insert new records with a usage_start value 10 years in the future to get data into the default partition.
  • Check out the move_default_partition_data branch
  • make run-migrations
  • Check that the default partition table is empty
  • Check that there is a new partition for the record(s) you entered
  • Check that only these record(s) are in the new and correct partition(s).

codecov bot commented Dec 10, 2020

Codecov Report

Merging #2547 (ff9fc54) into master (8d61f4b) will increase coverage by 0.0%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##           master   #2547    +/-   ##
=======================================
  Coverage    94.7%   94.8%            
=======================================
  Files         281     281            
  Lines       21337   21514   +177     
  Branches     2433    2443    +10     
=======================================
+ Hits        20215   20389   +174     
- Misses        673     674     +1     
- Partials      449     451     +2     

IF (copy_data OR
(object_rec.table_name ~ 'partitioned_tables') OR
(object_rec.table_name ~ 'django_migrations')) AND
(object_rec.table_kind = 'r')
Contributor Author

This is to fix a bug copying partitioned table data.

' DETACH PARTITION ' ||
quote_ident(OLD.schema_name) || '.' || quote_ident(OLD.table_name) ||
' ;';
IF ( OLD.active )
Contributor Author

The multiple active checks here prevent unnecessary (and unwanted) action when updating partition_parameters while the record is inactive.

quote_ident(OLD.schema_name) || '.' || quote_ident(OLD.table_name) || ' ';
IF ( (NEW.partition_parameters->>'default') = 'true' )
THEN
action_stmt = action_stmt || 'DEFAULT ;';
Contributor Author

Fixes malformed statement build

grafana/Dockerfile-grafana (outdated review thread, resolved)
cur = conn_execute(part_track_insert_sql, vals, _conn=self.conn)
self.tracking_rec = fetchone(cur)

def _create_partition(self):
Contributor Author

@Red-HAP Red-HAP Dec 10, 2020

_create_partition, _attach_partition, and _detach_partition were written to take advantage of the trigger code.


return res

def repartition_default_data(self):
Contributor Author

This is the main method to operate on a partitioned table's default partition.

koku/koku/pg_partition.py (outdated review thread, resolved)
@Red-HAP Red-HAP marked this pull request as ready for review December 14, 2020 20:47
@Red-HAP Red-HAP changed the title WIP working on funcs to move default partition data to new partitions Working on funcs to move default partition data to new partitions Dec 14, 2020
@adberglund

@Red-HAP can you include the bugfix in the OCP processor here so that we actually start creating partitions?

See: https://github.com/project-koku/koku/blob/master/koku/masu/processor/ocp/ocp_report_processor.py#L143
where we have to do a string replace to get a format that the parser accepts.

Here is the bug: https://github.com/project-koku/koku/blob/master/koku/masu/processor/ocp/ocp_report_processor.py#L241

On that line we are not doing the replace, so the parse inside the try always fails; we fail silently and no partition is created.
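
A small sketch of that failure mode, for illustration only. The raw timestamp shape, the format string, and the function names below are assumptions and are not taken from ocp_report_processor.py; the point is just that a parse error swallowed by a try/except means no month start is ever computed, so no partition can be requested.

```python
from datetime import date, datetime

# Hypothetical raw value; the real report field format is an assumption here.
RAW_START = "2020-12-01 00:00:00 +0000 UTC"


def parse_start(value):
    """Parse a timestamp such as '2020-12-01 00:00:00+0000'."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S%z")


def partition_month_buggy(raw):
    """Parses the raw string as-is: the parse always fails and the error is hidden."""
    try:
        return parse_start(raw).date().replace(day=1)
    except ValueError:
        return None  # silent failure: the caller never creates a partition


def partition_month_fixed(raw):
    """Normalizes the string first, and lets a genuine parse error surface."""
    cleaned = raw.replace(" +0000 UTC", "+0000")
    return parse_start(cleaned).date().replace(day=1)
```

With the hypothetical value above, partition_month_buggy(RAW_START) returns None while partition_month_fixed(RAW_START) returns the first day of December 2020. Logging or re-raising in the except branch would have made the original problem visible much earlier.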

@Red-HAP Red-HAP requested review from dccurtis and a team and removed request for adberglund and dccurtis January 25, 2021 14:47

@adberglund adberglund left a comment


  1. Checked out master
  2. Ran make load-test-customer-data start=2020-11-01 end=2021-01-25
  3. Checked and only saw default partition
  4. Checked out move_default_partition_data
  5. Ran make run-migrations
  6. Checked partitions and confirmed all data is where it should be.

@Red-HAP Red-HAP merged commit f24016a into master Jan 25, 2021
@Red-HAP Red-HAP deleted the move_default_partition_data branch January 25, 2021 20:31