-
Notifications
You must be signed in to change notification settings - Fork 125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
migrations for merging container_definition and container #24
Conversation
5c3d7f0
to
6142bf9
Compare
missing |
6142bf9
to
4f29d38
Compare
Looks good, one thing i'm concerned about is iterating over all containers. That would amount to tens of thousands of queries in large environments. Take a look at: |
|
||
say_with_time("Copying over columns from container_definition to container") do | ||
Container.all.each do |container| | ||
container_definition = ContainerDefinition.find(container.container_definition_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this ever fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think so. in order to have containers at all the inventory refresh has to go through and save provider->pod->container_def->container.
@@ -0,0 +1,159 @@ | |||
class UnifyContainerDefinitionAndContainer < ActiveRecord::Migration[5.0] | |||
class ContainerDefinition < ActiveRecord::Base | |||
has_many :container_port_configs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zeari we need to add class_name
for all relations:
:class_name => 'UnifyContainerDefinitionAndContainer::ContainerPortConfig'
because without it, it will not take models defined in migrations.
|
||
class ContainerEnvVar < ActiveRecord::Base | ||
belongs_to :container_definition | ||
self.inheritance_column = :_type_disabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this in only for tables with type
column and there is no ContainerEnvVar#type
, same for ContainerDefinition
, SecurityContext
and ContainerPortConfig
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were asked to put this in all models to prevent problems if we ever add the column in the future (I understand that is the current policy)
is it possible to separate structural and data migration to standalone migrations ? |
add_column :containers, :command, :text | ||
|
||
say_with_time("Copying over columns from container_definition to container") do | ||
Container.all.each do |container| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can save some N+1 queries by this and use batch processing by with find_each
:
Container.includes(:container_definition => [:container_port_configs, :container_env_vars, :security_contexts]).find_each do |container|
container.update!( :image => container_definition.image, ...)
container.container_definition.container_port_configs.update_all(...
container.container_definition.container_env_vars.update_all(...
container.container_definition.security_context.update
end
Im not sure that would turn out so nice. we need to do structural changes before and after copying the data. |
fa40488
to
c4c99f4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is nice to avoid each
, especially with a bunch of includes()
in a migration. It leads to a bunch of individual UPDATE
or INSERT
queries
end | ||
end | ||
|
||
Container.includes(:container_definition => [:container_port_configs, :container_env_vars, :security_context]).find_each do |container| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
starting on the other side (e.g. ContainerEnvVars
) would work better.
In general, removing belongs_to
our suggestion from gitter:
update security_contexts
set resource_id = (select containers.id from containers,container_definitions
where containers.container_definition_id = container_definitions.id
and container_definition_id = security_contexts.resource_id
and resource_type = 'ContainerDefinition')
eb87281
to
9c541f1
Compare
:capabilities_drop => container.capabilities_drop, | ||
:command => container.command | ||
) | ||
container.update!(:container_definition_id => container_def.id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kbrock what do you think is a good way to bulk insert into container_definitions?
overall we need to do something like
INSERT INTO container_definitions ( ems_id, name, image.... )
SELECT ems_id, name, image....
FROM containers
and then update container_definition_id
in containers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we might be able to do something like
add_column :container_definitions, :container_id, :bigint
insert into container_definitions (container_id, columns...)
values (select (id, columns...) from containers)
update containers set container_definition_id = (select container_definition.id from container_definitions, containers)
where container.id = container_definitions.container_id
remove_column :container_definitions, :container_id
or maybe leverage RETURNING
somehow -
INSERT INTO container_definitions VALUES (....) RETURNING (id,....)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zeari Oh, I had thought that the container_definitions
already existed for the containers
.
The pure sql way is:
insert into container_Definitions (container_id, columns...)
select id, columns..
from conatiners
Not sure the MIQ way of doing that. we tend to be a little sql shy
4b6073a
to
be9e7a7
Compare
@zeari @cben IIUC we need this PR to be reviewed/approved before we can move on with ManageIQ/manageiq#15393 and ManageIQ/manageiq-providers-kubernetes#42 @cben @moolitayer @agrare can you review/approve? |
@zeari can we have tags or other relevant relations associated with a removed |
Looking at |
run_as_user run_as_non_root capabilities_add capabilities_drop command).each do |column| | ||
join_sql = definitions.project(definitions[column.to_sym]) | ||
.where(definitions[:id].eq(containers[:container_definition_id])).to_sql | ||
Container.update_all("#{column} = (#{join_sql})") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
I don't think we should refactor according to codeclimate's suggstions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
I prefer the update_all
calls. So yes, ignore the rubo cops
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out the 2N + 1 queries.
Put in a suggestion for a fix for that
|
||
say_with_time("splitting columns from container into container_definition") do | ||
ContainerDefinition.transaction do | ||
Container.all.each do |container| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be # containers x 2 queries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe temporary column ContainerDefinition#container_id
would make sense.
That way, this could be done in 2 queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, but this means we go with pure sql on this. (either that or something similar like https://gist.github.com/janko-m/2b2cea3e8e21d9232fb9 which is similar)
I did try to mitigate this with ContainerDefinition.transaction
but it still means the DB goes through a lot of INSERT INTO container_definitions VALUES (......)
.
So pure sql is all I can think of doing here.
@cben @moolitayer this PR is missing your final review/approval. |
ContainerEnvVar.update_all("container_definition_id = (#{join_sql})") | ||
end | ||
|
||
say_with_time("switch container_definition_id with container_id for security_contexts") do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
swtich resource_id with container_id, resource_type to 'Container' for security_contexts
def down | ||
create_table :container_definitions do |t| | ||
t.belongs_to :ems, :type => :bigint | ||
t.string :ems_ref |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definition ems_ref is lost in up
. Would copying it back FROM containers restore old situation?
parser synthesizes "#{pod_id}_#{container_def.name}_#{container_def.image}"
,
for containers it synthesizes "#{pod_id}_#{container.name}_#{container.image}"
.
- pod_id is same
pod.metadata.uid
for both. - name should be same, it's what parser cross-ref'd containers to container_definitions on.
- is image necessarily same? cc @enoodle
(I don't know if 100% reversible migration is a blocker here, but want to understands things first.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, If I understand correctly the contianer definition gets its image thought the container.
t.string :ems_ref | ||
t.bigint :old_ems_id | ||
t.timestamp :deleted_on | ||
t.string :name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above, copying name back should be OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add comment that above columns were lost in up
and are being reconstructed (by copying),
while following columns are moved back from containers table.
create_table :container_definitions do |t| | ||
t.belongs_to :ems, :type => :bigint | ||
t.string :ems_ref | ||
t.bigint :old_ems_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did containers and definitions always have same ems_id, old_ems_id? were they archived in sync?
(trying to understand if copying back is safe)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Theyre usually always archived in sync.
# from container_definition.rb
def disconnect_inv
return if ems_id.nil?
_log.info "Disconnecting Container definition [#{name}] id [#{id}]"
self.container.try(:disconnect_inv)
self.deleted_on = Time.now.utc
save
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice maneuver with the temp column 🔀 :-)
t.string :ems_ref | ||
t.bigint :old_ems_id | ||
t.timestamp :deleted_on | ||
t.string :name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add comment that above columns were lost in up
and are being reconstructed (by copying),
while following columns are moved back from containers table.
|
||
add_column :containers, :container_definition_id, :bigint | ||
|
||
# populate container_definitions. use container_id to keep relation to containers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider turning this comment to a say_with_time
also for UPDATE below
ContainerEnvVar.update_all("container_id = (#{join_sql})") | ||
end | ||
|
||
say_with_time("switch container_definition_id with container_id for security_contexts") do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, resource_id and resource_type
ContainerPortConfig.update_all("container_id = (#{join_sql})") | ||
end | ||
|
||
say_with_time("switch container_definition_id with container_id for container_port_configs") do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for container_env_vars
ActiveRecord::Base.connection.execute(update_statement) | ||
|
||
# finally, remove the temp column | ||
remove_column :container_definitions, :container_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idea: if you move this remove to the end, can you simplify the 3 "switch container_definition_id with container_id" below to not need a join?
personally I'm not so much concerned with performance as simplicity ;-)
if yes, perhaps it's worth creating a temp column even in up
?
just an idea, feel free to keep as is.
:name => "mycontainer", | ||
:ems_ref => "123", | ||
:ems_id => 1, | ||
:old_ems_id => 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take realistic data from real DB. (that's not how ems_ref looks, not sure ems_id == old_ems_id == 1 is possible etc...)
And since I have some worries about the overlapping columns like ems_ref
, consider spelling out full data for both AND the expected merge, without assuming definition_hash.merge(container_hash))
.
[have I mentioned migrations scare me, perhaps unreasonably?]
SecurityContext.where(:resource_type => 'ContainerDefinition').update_all("resource_type = 'Container', resource_id = (#{join_sql})") | ||
end | ||
|
||
MiqQueue.where(:method_name => "purge_timer", :class_name => 'ContainerDefinition').destroy_all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6f23e0c
to
a37a7bc
Compare
@zeari ok as long as we don't crash in case there is a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 test checks it does not crash;
I wondered why test doesn't explicitly check container_def2
is deleted => there is nothing to check as the whole table is gone :-) 🔨
@zeari I think there are some relevant rubocop warnings |
@@ -805,7 +787,7 @@ container_port_configs: | |||
- port | |||
- host_port | |||
- protocol | |||
- container_definition_id | |||
- container_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@carbonin Will this cause a problem since it's in the middle of the pack? I think it's ok, just want to verify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, as long as it was implemented as a rename (not a delete and add).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Yeah" as in it's okay.... just to clarify 👍
This pull request is not mergeable. Please rebase and repush. |
a37a7bc
to
4d6ac3c
Compare
4d6ac3c
to
c7630f4
Compare
c7630f4
to
0738280
Compare
Checked commits zeari/manageiq-schema@5ab7b5f~...0738280 with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0 db/migrate/20170718111834_unify_container_definition_and_container.rb
spec/migrations/20170718111834_unify_container_definition_and_container_spec.rb
|
ping @Fryguy @carbonin Can we merge this one and ManageIQ/manageiq#15393 ? |
cc @Loicavenel |
class ContainerDefinition < ActiveRecord::Base | ||
has_many :container_port_configs, :class_name => 'UnifyContainerDefinitionAndContainer::ContainerPortConfig' | ||
has_many :container_env_vars, :class_name => 'UnifyContainerDefinitionAndContainer::ContainerEnvVar' | ||
has_one :security_context, :as => :resource |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look like it's being used, so can we remove it? I'm asking because polymorphic types end up causing issues when accessed through has_many/belongs_to, but luckily you are using resource_type/resource_id directly in the migrations below.
|
||
class SecurityContext < ActiveRecord::Base | ||
self.inheritance_column = :_type_disabled | ||
belongs_to :resource, :polymorphic => true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as https://github.com/ManageIQ/manageiq-schema/pull/24/files#r129430898 (this is the other half)
t.bigint :container_id # temp column | ||
end | ||
|
||
add_index :container_definitions, :deleted_on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably want to add the index after you've done the data migration, otherwise each insert needs to update the index individually making it slower overall.
add_column :containers, :container_definition_id, :bigint | ||
|
||
say_with_time("populate container_definitions. use container_id to keep relation to containers") do | ||
insert_statement = "INSERT INTO container_definitions (container_id, ems_id, ems_ref, old_ems_id, deleted_on, name, image, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor, bit Prefer this format for building SQL
insert_statement = <<-SQL
INSERT INTO container_defintions...
SELECT id, ...
FROM containers
SQL
The reason is that a) it keeps the SQL on it's own lines, and b) some IDEs will see Ruby <<-SQL block and actually color the SQL statement with SQL syntax coloring.
LGTM, so I am going to merge as is to unstick the main PR. @zeari Can you create a follow up PR that cleans up the various things I mentioned above? |
part of ManageIQ/manageiq#15393