Add unique indexes for Containers tables #19

Ladas · 2017-06-23T12:01:48Z

Add unique indexes for Containers tables

Ladas · 2017-06-23T12:02:25Z

Ladas · 2017-06-23T12:08:25Z

cc @Fryguy @agrare @cben @simon3z what do you guys think is missing or should be different? Few questions/discussion points below:

I've decided to index container_image as [:ems_id, :image_ref], since the image_ref has the host and port of registry inside
Can I remove the :protocol from [:container_service_id, :ems_ref, :protocol] now?
:container_definitions and :containers were nested before, can I trust the :ems_ref is unique under a Manager?
Why is :container_volume not using :ems_ref?
then the last question is, if we are clear to delete a duplicate, or we need to reconnect relations first. I assume we will mostly see duplicated disconnected ContainerImages

Fryguy · 2017-06-23T15:21:05Z

How do you deal with archived / soft-deleted if you are indexing on ems_id ems_ref? That is, if you have multiple providers and they both archive something with the same ems_ref, you get an error.

kbrock · 2017-06-23T16:02:17Z

@Fryguy I think that use case will still be good. we will not be setting ems_id to nil.

Fryguy · 2017-06-23T16:07:25Z

@kbrock I wonder if ems_refs ever get reused in a provider (or in a future provider). That is, if you delete an object, does the provider then have the right to reuse the id number? If they do, then this would break this idea.

cben · 2017-06-25T14:36:48Z

I've decided to index container_image as [:ems_id, :image_ref], since the image_ref has the host and port of registry inside

@enoodle please advise. From ManageIQ/manageiq#14185 it sounds like [:ems_id, :image_ref] is not good enough?

Can I remove the :protocol from [:container_service_id, :ems_ref, :protocol] now?

I don't think we did anything about ManageIQ/manageiq-providers-kubernetes#35
parse_service_port_config sets :ems_ref => "#{service_id}_#{port_config.port}_#{port_config.targetPort}" (where service_id is the service's ems_ref)
so I think you still need protocol for uniqueness ATM.

Perhaps since this is a synthetic ems_ref it's better not to rely on it but on the underlying fields you want eg. `[:port, :target_port, :protocol

:container_definitions and :containers were nested before, can I trust the :ems_ref is unique under a Manager?

Why is :container_volume not using :ems_ref?

cc @zakiva. IIRC ContainerVolume and PersistentVolume share a table (STI) but only one sets ems_ref?

then the last question is, if we are clear to delete a duplicate, or we need to reconnect relations first. I assume we will mostly see duplicated disconnected ContainerImages

zakiva · 2017-06-25T15:00:26Z

Why is :container_volume not using :ems_ref?
cc @zakiva. IIRC ContainerVolume and PersistentVolume share a table (STI) but only one sets ems_ref?

Correct, only Persistent Volumes use :ems_ref

enoodle · 2017-06-25T15:12:03Z

I've decided to index container_image as [:ems_id, :image_ref], since the image_ref has the host and port of registry inside
@enoodle please advise. From ManageIQ/manageiq#14185 it sounds like [:ems_id, :image_ref] is not good enough?

There is a possibility of two images from different registries that are the same. comparing with just image_ref will show them a different. The best identifier is the digest, this is why it is used if it exists.

cben · 2017-06-25T15:13:18Z

I wonder if ems_refs ever get reused in a provider (or in a future provider).

Kubernetes UIDs should be permanently unique [https://kubernetes.io/docs/concepts/overview/working-with-objects/names/]
But container refresh_parser synthesizes ems_refs on several tables from other parts, and I'm afraid those might get reused. (it'd be jolly good if we called these something other than ems_ref...)

services might get ems_ref "#{new_result[:namespace]}_#{new_result[:name]}", sounds like can get reused!
- this suggests ServicePortConfig ems_ref derived from it can get reused too.
pods use metadata.uid from k8s
- some pod's children are derived from it, and sound unique unless you can change them via kubectl edit pod.

We've generally assumed pods are immutable once created. Let's see. Trying to oc edit one port's name I get:

# * spec: Forbidden: pod updates may not change fields other than `containers[*].image` or `spec.activeDeadlineSeconds`

which is good, but yes I can oc edit the image: field! (Makes it soon go to ErrImagePull -> CrashLoopBackOff status, editing back makes it go Running again, and increments restartCount).

ContainerDefinition & Container ems_ref include the image, but they also include the immutable name. I'm not sure off-hand if it's a problem.

Ladas · 2017-07-03T14:40:19Z

@Fryguy @kbrock right now we count with fact that ems_id, ems_ref is permanently unique, when we reconnect it. The elements that can be reused should not be archivable. Every sane system should use permanently unique uuids though, otherwise the reporting will be a pain. :-)

@enoodle so right now, we just duplicated the image, even if the checksum is the same https://github.com/Ladas/manageiq/blob/6e2ed956d8ec84ec0131e9efce9587ad4edd3c85/app/models/ems_refresh/save_inventory_container.rb#L313-L313 . So I just see that the :container_image_registry_id is redundant here, because image ref already contains unique keys of the container_image_registry.

Or is there a case when we put digest into the image_ref?

@cben cool, I'll fix the inline FIXMEs then. So I think we just need to agree on 5.

enoodle · 2017-07-03T15:08:17Z

@Ladas We should have the digest inside the image_ref so it is ok as it is in my opinion.

cben · 2017-07-05T09:19:31Z

[responding from gitter.]

cben what is the story of kubernetes not using UUIDs for references? It's kind of a weird solution
it's not accessible in API even?
usually the providers have both UUIDs and human readable names

So an interesting point is that the names are a source of truth! It's not just the API giving us incomplete information; mostly the API gives us exactly the "desired state" the user declared in etcd, which kubelets and other control loops in k8s are continuously converging to.
For example consider a route from host: "foo.example.com" to kind: "Service", name: "foo".
Yesterday it pointed to Fred's service with name: foo, uid: xxxxxxxx-....
Today he destroyed this service, and Barney created a new one with name: foo, uid: yyyyyyyy-....
This makes requests arriving to that hostname go the new service, with no changes to route object !

I believe such link changes would be picked up by full refresh (untested!) but may be challenging for targetted refresh.
~~Not~~ a problem for @simon3z's plan of full collect & parse + partial save by checksums. EDIT: yes it can be a problem as I'm pushing name resolution to persistor. Route parsing would return exactly same lazy_find, so checksum filter will drop it and persistor will never re-resolve the name.

@moolitayer and a few others raised the interesting idea that maybe then we shouldn't resolve links to uids or ids at all. We could (1) store names and do explicit joins by name at access time (2) consider making names the actual primary & foreign key instead of ids, so relations work exactly same?!
Unfortunately, it's unclear whether this can be compatible with past reporting and archiving.
Suppose someone wanted "chargeback by hostname". foo.example.com should be charged to Fred yesterday and Barney today — but if name is the primary key, Barney's service actually overwrote Fred's in the same DB row :-(

Anyway this'd require a lot of thinking... Presently we're keeping row create/update/delete by UID, with links name links resolved at refresh time.

lpichler · 2017-07-14T15:43:37Z

db/migrate/20170714102630_clean_up_duplicates_in_containers_tables.rb

+    ContainerTemplateParameter => [:container_template_id, :name],
+    ContainerVolume            => [:parent_id, :parent_type, :name],
+    CustomAttribute            => [:resource_id, :resource_type, :name, :unique_name, :section, :source],
+    Hardware                   => [:vm_or_template_id, :host_id, :computer_system_id],


There are no stubs for Hardware and OperatingSystem. 🏭

Ladas · 2017-07-25T15:00:30Z

@Fryguy @cben talked with @agrare , it will be safer to merge these as a first thing for next release, leaving us a full release to modify the indexes in a case of bugs.

Ladas · 2017-11-13T12:58:06Z

@cben @agrare reviving this PR, can you review? One last FIXME is https://github.com/ManageIQ/manageiq-schema/pull/19/files#diff-9f048b534014c0bc40f2e38460cc7bd5R65

then we should figure out the ManageIQ/manageiq#16454

cben · 2017-11-14T09:32:25Z

What about archived rows? IIRC idea was to have unique index scoped to deleted_on IS NULL, but I don't see anything like that in the PR?

I recently added container_quota_scopes table (#111), missing here. [:container_quota_id, :scope] should be unique.

We're going to abuse archiving in container_quota* tables for keeping modification history and not just deletions. I don't think this changes anything for indexes, non-archived portion will still be unique.

cben · 2017-11-14T09:33:32Z

db/migrate/20171101102630_clean_up_duplicates_in_containers_tables.rb

+  end
+
+  def duplicate_data_query_returning_min_id(model, unique_index_columns)
+    model.group(unique_index_columns).select("min(id)")


perhaps max(id) is better, likely to be more up-to-date?
(shouldn't matter unless we screwed up, but why not)

right, TBH I am not sure, might be that max is better. :-)

cben · 2018-02-11T14:37:33Z

To be honest, this got stuck in "I ought to give this some scrunity, but don't know what to look for" state. Sorry :-/ I did review this, no objections except quota. But thanks to Ari's reminder, I'll try to re-review archiving vs uniqueness in all archived tables.

Ladas · 2018-02-14T13:32:59Z

db/migrate/20180109102659_add_unique_indexes_to_containers_tables.rb

+              :unique => true,
+              :name   => "index_hardwares_on_computer_system_id_"
+
+    remove_index :operating_systems, :vm_or_template_id


@cben so, this is how we will have to model unique indexes, when table are shared and have nullable parts
(we should redesign the DB though, having N:M mapping tables for vm_or_template_id, host_id and computer_system_id, then those tables can have correct NOT NULL constraints and unique indexes )

Then for other cases, where index is several columns that are nullable, I think we will have to build the unique :ems_ref out of them

Ladas · 2018-02-14T13:33:45Z

db/migrate/20180109102659_add_unique_indexes_to_containers_tables.rb

+              :unique => true,
+              :name   => "index_container_images_unique_multi_column"
+    add_index :container_image_registries,
+              %i(ems_id host port),


@cben should we have :ems_ref, or can we always fill some default port?

cben · 2018-02-14T14:00:22Z

hmm, adding the quota values to unique index does NOT solve the same problem as adding them to refresh manager_ref.

refresh only looks at active records and care when it changes — whether it should archive & create new one
but over time, the value might be repeated! first 7, then 9, then 7 again — so adding value to uniqueness is useless!
For uniqueness, should either include deleted_on — resource is unique (*) at given moment in time, or just exclude archived rows from uniqueness.

(*) I hope. Need to validate it :)

Cleanup duplicates before adding uniq keys on Container tables

Add unique indexes to Containers tables

Spec for cleaning up duplicates

Move the migrations to a newer date

Add Comp.System, Hardware and Op. System indexes

Change date of migrations

ContainerDefinition model was deleted

ContainerServicePortConfig index should be service_id + name

container_component_statuses table was removed

Adding unique index for ContainerQuotaScope

Move migrations to a newer date

Keep latest duplicate instead of the oldest duplicate

Optimizing DB duplicates cleanup, trying a query posted on PG wiki https://wiki.postgresql.org/wiki/Deleting_duplicates It works like a charm, on a table with 600k rows it took more than 3hours (it never finished, who know how long it would have took) With this change, it cleans 600k table in about 3s.

Fix rubocop and code climate issues

Add missing tables and fields comparing to inventory_collections.rb

Manually define index name for container_volumes, since the generated is too long

Properly define partial indexes for OS and Hardware, this way it will be usable for upsert, since we always define 1 foreign_key and the rest is NULL.

Modify container_quota_items unique index according to latest plans, where we want to keep archived quota_items untouched.

CustomAtribute index without unique_name, which is used only under OpenStack

Separate index for ContainerVolume and PersistentVolume

miq-bot · 2018-07-11T12:48:46Z

Checked commits Ladas/manageiq-schema@2a88a92~...94cd6e5 with ruby 2.3.3, rubocop 0.52.1, haml-lint 0.20.0, and yamllint 1.10.0
3 files checked, 0 offenses detected
Everything looks fine. 🏆

miq-bot · 2019-01-14T05:00:13Z

This pull request has been automatically closed because it has not been updated for at least 6 months.

Feel free to reopen this pull request if these changes are still valid.

Thank you for all your contributions!

Ladas force-pushed the add_unique_indexes_for_containers_tables branch from b25bd9f to 64941a2 Compare June 23, 2017 12:02

miq-bot assigned Fryguy Jun 23, 2017

Ladas mentioned this pull request Jun 23, 2017

[WIP] Add unique indexes for containers tables ManageIQ/manageiq#15308

Closed

1 task

Ladas force-pushed the add_unique_indexes_for_containers_tables branch from 64941a2 to d5fed05 Compare July 14, 2017 15:03

lpichler reviewed Jul 14, 2017

View reviewed changes

Ladas force-pushed the add_unique_indexes_for_containers_tables branch from d5fed05 to 5bbe55b Compare July 17, 2017 11:15

chessbyte added performance enhancement index and removed performance labels Jul 25, 2017

Ladas force-pushed the add_unique_indexes_for_containers_tables branch from 5bbe55b to 1fa2fe2 Compare November 1, 2017 18:46

Ladas force-pushed the add_unique_indexes_for_containers_tables branch 2 times, most recently from 7dcf7d7 to a3242b0 Compare November 13, 2017 12:55

cben reviewed Nov 14, 2017

View reviewed changes

Ladas force-pushed the add_unique_indexes_for_containers_tables branch from e54e39b to ffd2027 Compare November 14, 2017 13:20

Ladas commented Feb 14, 2018

View reviewed changes

Ladas force-pushed the add_unique_indexes_for_containers_tables branch from 37e127c to 543e989 Compare February 15, 2018 09:21

tumido mentioned this pull request Feb 20, 2018

[WIP] Add unique indexes for Cloud tables #173

Closed

Ladas added 20 commits May 10, 2018 11:53

Cleanup duplicates before adding uniq keys on Container tables

2a88a92

Cleanup duplicates before adding uniq keys on Container tables

Add unique indexes to Containers tables

8f455c0

Add unique indexes to Containers tables

Spec for cleaning up duplicates

45297f1

Spec for cleaning up duplicates

Move the migrations to a newer date

9c0172d

Move the migrations to a newer date

Add Comp.System, Hardware and Op. System indexes

aa13a26

Add Comp.System, Hardware and Op. System indexes

Change date of migrations

95d8573

Change date of migrations

ContainerDefinition model was deleted

853d393

ContainerDefinition model was deleted

ContainerServicePortConfig index should be service_id + name

0b10b49

ContainerServicePortConfig index should be service_id + name

container_component_statuses table was removed

ea662c8

container_component_statuses table was removed

Adding unique index for ContainerQuotaScope

de1a966

Adding unique index for ContainerQuotaScope

Move migrations to a newer date

64f86f9

Move migrations to a newer date

Keep latest duplicate instead of the oldest duplicate

07510bf

Keep latest duplicate instead of the oldest duplicate

Fix rubocop and code climate issues

3e0ea8a

Fix rubocop and code climate issues

Add missing tables and fields comparing to inventory_collections.rb

4f0e542

Add missing tables and fields comparing to inventory_collections.rb

Manually define index name for container_volumes

226a8ff

Manually define index name for container_volumes, since the generated is too long

Properly define partial indexes for OS and Hardware

88ff3b8

Properly define partial indexes for OS and Hardware, this way it will be usable for upsert, since we always define 1 foreign_key and the rest is NULL.

Modify container_quota_items unique index according to latest plans

d4d9d96

Modify container_quota_items unique index according to latest plans, where we want to keep archived quota_items untouched.

CustomAtribute index without unique_name

5edeb5e

CustomAtribute index without unique_name, which is used only under OpenStack

Separate index for ContainerVolume and PersistentVolume

94cd6e5

Separate index for ContainerVolume and PersistentVolume

Ladas force-pushed the add_unique_indexes_for_containers_tables branch from 543e989 to 94cd6e5 Compare July 11, 2018 12:45

miq-bot closed this Jan 14, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add unique indexes for Containers tables #19

Add unique indexes for Containers tables #19

Ladas commented Jun 23, 2017

Ladas commented Jun 23, 2017

Ladas commented Jun 23, 2017 •

edited

Loading

Fryguy commented Jun 23, 2017

kbrock commented Jun 23, 2017

Fryguy commented Jun 23, 2017

cben commented Jun 25, 2017

zakiva commented Jun 25, 2017

enoodle commented Jun 25, 2017

cben commented Jun 25, 2017

Ladas commented Jul 3, 2017

enoodle commented Jul 3, 2017

cben commented Jul 5, 2017 •

edited

Loading

lpichler Jul 14, 2017 •

edited

Loading

Ladas Jul 18, 2017

Ladas commented Jul 25, 2017

Ladas commented Nov 13, 2017

cben commented Nov 14, 2017

cben Nov 14, 2017

Ladas Nov 14, 2017

cben commented Feb 11, 2018 via email

Ladas Feb 14, 2018 •

edited

Loading

Ladas Feb 14, 2018

cben commented Feb 14, 2018

miq-bot commented Jul 11, 2018

miq-bot commented Jan 14, 2019

Add unique indexes for Containers tables #19

Add unique indexes for Containers tables #19

Conversation

Ladas commented Jun 23, 2017

Ladas commented Jun 23, 2017

Ladas commented Jun 23, 2017 • edited Loading

Fryguy commented Jun 23, 2017

kbrock commented Jun 23, 2017

Fryguy commented Jun 23, 2017

cben commented Jun 25, 2017

zakiva commented Jun 25, 2017

enoodle commented Jun 25, 2017

cben commented Jun 25, 2017

Ladas commented Jul 3, 2017

enoodle commented Jul 3, 2017

cben commented Jul 5, 2017 • edited Loading

lpichler Jul 14, 2017 • edited Loading

Choose a reason for hiding this comment

Ladas Jul 18, 2017

Choose a reason for hiding this comment

Ladas commented Jul 25, 2017

Ladas commented Nov 13, 2017

cben commented Nov 14, 2017

cben Nov 14, 2017

Choose a reason for hiding this comment

Ladas Nov 14, 2017

Choose a reason for hiding this comment

cben commented Feb 11, 2018 via email

Ladas Feb 14, 2018 • edited Loading

Choose a reason for hiding this comment

Ladas Feb 14, 2018

Choose a reason for hiding this comment

cben commented Feb 14, 2018

miq-bot commented Jul 11, 2018

miq-bot commented Jan 14, 2019

Ladas commented Jun 23, 2017 •

edited

Loading

cben commented Jul 5, 2017 •

edited

Loading

lpichler Jul 14, 2017 •

edited

Loading

Ladas Feb 14, 2018 •

edited

Loading