Add unique indexes for Containers tables #19
b25bd9f
to
64941a2
Compare
cc @Fryguy @agrare @cben @simon3z what do you guys think is missing or should be different? Few questions/discussion points below:
|
How do you deal with archived / soft-deleted if you are indexing on ems_id ems_ref? That is, if you have multiple providers and they both archive something with the same ems_ref, you get an error. |
@Fryguy I think that use case will still be good. we will not be setting |
@kbrock I wonder if ems_refs ever get reused in a provider (or in a future provider). That is, if you delete an object, does the provider then have the right to reuse the id number? If they do, then this would break this idea. |
@enoodle please advise. From ManageIQ/manageiq#14185 it sounds like [:ems_id, :image_ref] is not good enough?
I don't think we did anything about ManageIQ/manageiq-providers-kubernetes#35 Perhaps since this is a synthetic ems_ref it's better not to rely on it but on the underlying fields you want eg. `[:port, :target_port, :protocol
cc @zakiva. IIRC ContainerVolume and PersistentVolume share a table (STI) but only one sets ems_ref?
|
Correct, only Persistent Volumes use :ems_ref |
There is a possibility of two images from different registries that are the same; comparing with just image_ref will show them as different. The best identifier is the digest, which is why it is used if it exists.
Kubernetes UIDs should be permanently unique [https://kubernetes.io/docs/concepts/overview/working-with-objects/names/]
We've generally assumed pods are immutable once created. Let's see. Trying to oc edit one port's name I get:
which is good, but yes I can ContainerDefinition & Container ems_ref include the image, but they also include the immutable name. I'm not sure off-hand if it's a problem. |
@Fryguy @kbrock right now we rely on the fact that ems_id, ems_ref is permanently unique when we reconnect it. Elements that can be reused should not be archivable. Every sane system should use permanently unique uuids though, otherwise the reporting will be a pain. :-) @enoodle so right now, we just duplicate the image, even if the checksum is the same https://github.com/Ladas/manageiq/blob/6e2ed956d8ec84ec0131e9efce9587ad4edd3c85/app/models/ems_refresh/save_inventory_container.rb#L313-L313 . So I just see that the :container_image_registry_id is redundant here, because image ref already contains unique keys of the container_image_registry. Or is there a case when we put digest into the image_ref? @cben cool, I'll fix the inline FIXMEs then. So I think we just need to agree on 5.
@Ladas We should have the digest inside the image_ref so it is ok as it is in my opinion. |
[responding from gitter.]
So an interesting point is that the names are a source of truth! It's not just the API giving us incomplete information; mostly the API gives us exactly the "desired state" the user declared in etcd, which kubelets and other control loops in k8s are continuously converging to.
@moolitayer and a few others raised the interesting idea that maybe then we shouldn't resolve links to uids or ids at all. We could (1) store names and do explicit joins by name at access time (2) consider making names the actual primary & foreign key instead of ids, so relations work exactly the same?! Anyway this'd require a lot of thinking... Presently we're keeping row create/update/delete by UID, with name links resolved at refresh time.
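To make the "name links resolved at refresh time" idea concrete, here is a minimal sketch in plain Ruby. It is not the actual refresh code: the helper names `index_by_name` and `resolve_links`, and the hash keys `:namespace`, `:name` and `:target_id`, are hypothetical and only illustrate turning name-based references into ids, leaving a link unresolved when the named object does not exist yet.

```ruby
# Hypothetical helper: build a lookup from [namespace, name] to row id.
def index_by_name(records)
  records.each_with_object({}) do |record, index|
    index[[record[:namespace], record[:name]]] = record[:id]
  end
end

# Hypothetical helper: attach :target_id to every link, or nil when the
# named object is not (yet) present in the inventory.
def resolve_links(links, records)
  index = index_by_name(records)
  links.map do |link|
    link.merge(:target_id => index[[link[:namespace], link[:name]]])
  end
end

pods = [
  {:id => 1, :ems_ref => "uid-1", :namespace => "default", :name => "web"},
]
links = [
  {:namespace => "default", :name => "web"},
  {:namespace => "default", :name => "missing"},
]

resolved = resolve_links(links, pods)
```

Rows stay keyed by UID (`:ems_ref`) in this sketch; only the references between them go through names, which is why a dangling name simply resolves to `nil`.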
64941a2
to
d5fed05
Compare
ContainerTemplateParameter => [:container_template_id, :name],
ContainerVolume => [:parent_id, :parent_type, :name],
CustomAttribute => [:resource_id, :resource_type, :name, :unique_name, :section, :source],
Hardware => [:vm_or_template_id, :host_id, :computer_system_id],
There are no stubs for Hardware and OperatingSystem. 🏭
fixed
d5fed05
to
5bbe55b
Compare
5bbe55b
to
1fa2fe2
Compare
7dcf7d7
to
a3242b0
Compare
@cben @agrare reviving this PR, can you review? One last FIXME is https://github.com/ManageIQ/manageiq-schema/pull/19/files#diff-9f048b534014c0bc40f2e38460cc7bd5R65 then we should figure out the ManageIQ/manageiq#16454 |
What about archived rows? IIRC idea was to have unique index scoped to deleted_on IS NULL, but I don't see anything like that in the PR? I recently added We're going to abuse archiving in container_quota* tables for keeping modification history and not just deletions. I don't think this changes anything for indexes, non-archived portion will still be unique. |
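A small plain-Ruby model of the "unique index scoped to deleted_on IS NULL" idea, as a sketch only. The helper name `active_unique_violations` is hypothetical and the rows are made-up data; in a Rails migration on PostgreSQL the equivalent would presumably be a partial index created by passing a `:where` condition to `add_index`, but whether this PR does that for every archived table is exactly the open question here.

```ruby
# Hypothetical helper: returns the key tuples that would violate a unique
# index restricted to active rows, i.e. rows whose :deleted_on is nil.
def active_unique_violations(rows, unique_columns)
  rows.reject { |row| row[:deleted_on] }
      .group_by { |row| row.values_at(*unique_columns) }
      .select { |_key, group| group.size > 1 }
      .keys
end

archived_twice = [
  {:id => 1, :ems_id => 10, :ems_ref => "a", :deleted_on => "2017-01-01"},
  {:id => 2, :ems_id => 10, :ems_ref => "a", :deleted_on => "2017-02-01"},
  {:id => 3, :ems_id => 10, :ems_ref => "a", :deleted_on => nil},
]

two_active = archived_twice + [
  {:id => 4, :ems_id => 10, :ems_ref => "a", :deleted_on => nil},
]
```

With `archived_twice` nothing conflicts, because only one row with that key is still active; `two_active` reports the key `[10, "a"]`, since two non-archived rows share it.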
end

def duplicate_data_query_returning_min_id(model, unique_index_columns)
  model.group(unique_index_columns).select("min(id)")
perhaps max(id) is better, likely to be more up-to-date?
(shouldn't matter unless we screwed up, but why not)
right, TBH I am not sure, might be that max is better. :-)
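For reference, the difference between keeping `min(id)` and `max(id)` can be sketched in plain Ruby without a database. The helpers `surviving_ids` and `duplicate_ids` are hypothetical names and the rows are made-up data; this only mirrors the grouping logic, not the actual migration.

```ruby
# Hypothetical helper: for each group of rows sharing the unique columns,
# pick the id that survives (:max keeps the newest row, :min the oldest).
def surviving_ids(rows, unique_columns, keep: :max)
  rows.group_by { |row| row.values_at(*unique_columns) }
      .map { |_key, group| group.map { |row| row[:id] }.public_send(keep) }
      .sort
end

# Hypothetical helper: ids that the cleanup would delete.
def duplicate_ids(rows, unique_columns, keep: :max)
  rows.map { |row| row[:id] } - surviving_ids(rows, unique_columns, keep: keep)
end

rows = [
  {:id => 1, :ems_id => 10, :ems_ref => "a"},
  {:id => 2, :ems_id => 10, :ems_ref => "a"},
  {:id => 3, :ems_id => 10, :ems_ref => "b"},
  {:id => 4, :ems_id => 11, :ems_ref => "a"},
]
```

With `keep: :max` the cleanup would delete id 1 and keep id 2; with `keep: :min` it is the other way around. Rows 3 and 4 are untouched in both cases because their keys are already unique.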
e54e39b
to
ffd2027
Compare
To be honest, this got stuck in "I ought to give this some scrutiny, but
don't know what to look for" state. Sorry :-/
I did review this, no objections except quota.
But thanks to Ari's reminder, I'll try to re-review archiving vs uniqueness
in all archived tables.
|
:unique => true,
:name => "index_hardwares_on_computer_system_id_"

remove_index :operating_systems, :vm_or_template_id
@cben so, this is how we will have to model unique indexes, when tables are shared and have nullable parts
(we should redesign the DB though, having N:M mapping tables for vm_or_template_id, host_id and computer_system_id, then those tables can have correct NOT NULL constraints and unique indexes )
Then for other cases, where index is several columns that are nullable, I think we will have to build the unique :ems_ref out of them
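To illustrate why the nullable parts matter, here is a plain-Ruby sketch of the default PostgreSQL unique-index semantics, where NULLs never compare equal. The helper names `composite_unique_conflicts` and `per_column_unique_conflicts` are hypothetical and the rows are made-up data. It compares one composite index over all three foreign keys with one index per foreign key.

```ruby
# Models a composite unique index where a key containing nil can never
# conflict with another row (NULLs are treated as distinct).
def composite_unique_conflicts(rows, columns)
  rows.map { |row| row.values_at(*columns) }
      .reject { |key| key.include?(nil) }
      .group_by { |key| key }
      .select { |_key, group| group.size > 1 }
      .keys
end

# Models one unique index per column, each only seeing non-nil values.
def per_column_unique_conflicts(rows, columns)
  columns.each_with_object({}) do |column, conflicts|
    values = rows.map { |row| row[column] }.compact
    dupes  = values.select { |value| values.count(value) > 1 }.uniq
    conflicts[column] = dupes unless dupes.empty?
  end
end

hardwares = [
  {:id => 1, :vm_or_template_id => 5,   :host_id => nil, :computer_system_id => nil},
  {:id => 2, :vm_or_template_id => 5,   :host_id => nil, :computer_system_id => nil},
  {:id => 3, :vm_or_template_id => nil, :host_id => 7,   :computer_system_id => nil},
]
columns = [:vm_or_template_id, :host_id, :computer_system_id]
```

The composite check finds nothing, because every key contains a nil, while the per-column check catches the two hardwares pointing at the same `vm_or_template_id`. That is the motivation for separate indexes per foreign key when exactly one of them is set per row.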
:unique => true,
:name => "index_container_images_unique_multi_column"
add_index :container_image_registries,
%i(ems_id host port),
@cben should we have :ems_ref, or can we always fill some default port?
hmm, adding the quota values to unique index does NOT solve the same problem as adding them to refresh manager_ref.
(*) I hope. Need to validate it :) |
37e127c
to
543e989
Compare
Cleanup duplicates before adding uniq keys on Container tables
Add unique indexes to Containers tables
Spec for cleaning up duplicates
Move the migrations to a newer date
Add Comp.System, Hardware and Op. System indexes
Change date of migrations
ContainerDefinition model was deleted
ContainerServicePortConfig index should be service_id + name
container_component_statuses table was removed
Adding unique index for ContainerQuotaScope
Move migrations to a newer date
Keep latest duplicate instead of the oldest duplicate
Optimizing DB duplicates cleanup, trying a query posted on PG wiki https://wiki.postgresql.org/wiki/Deleting_duplicates It works like a charm: on a table with 600k rows the previous approach ran for more than 3 hours without finishing (who knows how long it would have taken). With this change, it cleans the 600k-row table in about 3s.
Fix rubocop and code climate issues
Add missing tables and fields comparing to inventory_collections.rb
Manually define index name for container_volumes, since the generated is too long
Properly define partial indexes for OS and Hardware, this way it will be usable for upsert, since we always define 1 foreign_key and the rest is NULL.
Modify container_quota_items unique index according to latest plans, where we want to keep archived quota_items untouched.
CustomAttribute index without unique_name, which is used only under OpenStack
Separate index for ContainerVolume and PersistentVolume
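For readers following the duplicates-cleanup commits above, this is a rough sketch of a window-function delete in the spirit of the PG wiki recipe, built as a SQL string in plain Ruby. The helper name `delete_duplicates_sql` is hypothetical, the table and column names are only examples, and identifiers are interpolated unquoted on the assumption that they come from trusted migration code. It is not the exact query used in the migration.

```ruby
# Hypothetical helper: builds a DELETE that keeps the row with the highest
# id in every group of rows sharing the given columns.
def delete_duplicates_sql(table, unique_columns)
  partition = unique_columns.join(", ")
  <<-SQL.gsub(/\s+/, " ").strip
    DELETE FROM #{table}
    WHERE id IN (
      SELECT id
      FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY #{partition} ORDER BY id DESC) AS rnum
        FROM #{table}
      ) numbered
      WHERE numbered.rnum > 1
    )
  SQL
end

sql = delete_duplicates_sql("container_groups", [:ems_id, :ems_ref])
```

`ORDER BY id DESC` gives the newest row `rnum = 1`, so only older duplicates are deleted, matching the "keep latest duplicate" commit. Switching to ascending order would keep the oldest row instead.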
543e989
to
94cd6e5
Compare
Checked commits Ladas/manageiq-schema@2a88a92~...94cd6e5 with ruby 2.3.3, rubocop 0.52.1, haml-lint 0.20.0, and yamllint 1.10.0 |
This pull request has been automatically closed because it has not been updated for at least 6 months. Feel free to reopen this pull request if these changes are still valid. Thank you for all your contributions! |