Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

migrations for merging container_definition and container #24

Merged
merged 2 commits into from
Jul 25, 2017

Conversation

zeari
Copy link

@zeari zeari commented Jun 25, 2017

@zeari zeari force-pushed the unify_container_definition branch from 5c3d7f0 to 6142bf9 Compare June 25, 2017 11:22
@moolitayer
Copy link

missing db/schema.yml changes

@zeari zeari force-pushed the unify_container_definition branch from 6142bf9 to 4f29d38 Compare June 25, 2017 11:57
@moolitayer
Copy link

Looks good, one thing i'm concerned about is iterating over all containers. That would amount to tens of thousands of queries in large environments. Take a look at:
https://github.com/ManageIQ/manageiq-schema/blob/64af9f368dc352686f2908dec6ca9ab4b66d44ec/db/migrate/20170115140217_update_ems_in_miq_alert_status.rb


say_with_time("Copying over columns from container_definition to container") do
Container.all.each do |container|
container_definition = ContainerDefinition.find(container.container_definition_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this ever fail?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think so. in order to have containers at all the inventory refresh has to go through and save provider->pod->container_def->container.

@@ -0,0 +1,159 @@
class UnifyContainerDefinitionAndContainer < ActiveRecord::Migration[5.0]
class ContainerDefinition < ActiveRecord::Base
has_many :container_port_configs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zeari we need to add class_name for all relations:
:class_name => 'UnifyContainerDefinitionAndContainer::ContainerPortConfig'
because without it, it will not take models defined in migrations.


class ContainerEnvVar < ActiveRecord::Base
belongs_to :container_definition
self.inheritance_column = :_type_disabled
Copy link
Contributor

@lpichler lpichler Jun 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this in only for tables with type column and there is no ContainerEnvVar#type, same for ContainerDefinition, SecurityContext and ContainerPortConfig.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were asked to put this in all models to prevent problems if we ever add the column in the future (I understand that is the current policy)

@lpichler
Copy link
Contributor

is it possible to separate structural and data migration to standalone migrations ?

add_column :containers, :command, :text

say_with_time("Copying over columns from container_definition to container") do
Container.all.each do |container|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can save some N+1 queries by this and use batch processing by with find_each:

Container.includes(:container_definition => [:container_port_configs, :container_env_vars, :security_contexts]).find_each do |container|
  container.update!(  :image => container_definition.image, ...)
  container.container_definition.container_port_configs.update_all(...
  container.container_definition.container_env_vars.update_all(...
  container.container_definition.security_context.update
end

@zeari
Copy link
Author

zeari commented Jun 26, 2017

is it possible to separate structural and data migration to standalone migrations ?

Im not sure that would turn out so nice. we need to do structural changes before and after copying the data.

@zeari zeari force-pushed the unify_container_definition branch 2 times, most recently from fa40488 to c4c99f4 Compare June 26, 2017 13:22
Copy link
Member

@kbrock kbrock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is nice to avoid each, especially with a bunch of includes() in a migration. It leads to a bunch of individual UPDATE or INSERT queries

end
end

Container.includes(:container_definition => [:container_port_configs, :container_env_vars, :security_context]).find_each do |container|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

starting on the other side (e.g. ContainerEnvVars) would work better.
In general, removing belongs_to

our suggestion from gitter:

update security_contexts 
set resource_id = (select containers.id from containers,container_definitions 
  where containers.container_definition_id = container_definitions.id 
  and container_definition_id = security_contexts.resource_id 
  and resource_type = 'ContainerDefinition')

@zeari zeari force-pushed the unify_container_definition branch 2 times, most recently from eb87281 to 9c541f1 Compare June 29, 2017 10:45
:capabilities_drop => container.capabilities_drop,
:command => container.command
)
container.update!(:container_definition_id => container_def.id)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kbrock what do you think is a good way to bulk insert into container_definitions?
overall we need to do something like

INSERT INTO container_definitions ( ems_id, name, image.... )
SELECT  ems_id, name, image....
FROM    containers

and then update container_definition_id in containers

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might be able to do something like

add_column :container_definitions, :container_id, :bigint
    

insert into container_definitions (container_id, columns...)
values (select (id, columns...) from containers)

update containers set container_definition_id = (select container_definition.id from container_definitions, containers)
                      where container.id = container_definitions.container_id


remove_column :container_definitions, :container_id

or maybe leverage RETURNING somehow -

INSERT INTO container_definitions VALUES (....) RETURNING (id,....)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zeari Oh, I had thought that the container_definitions already existed for the containers.

The pure sql way is:

insert into container_Definitions (container_id, columns...)
  select id, columns..
  from conatiners

Not sure the MIQ way of doing that. we tend to be a little sql shy

@zeari zeari force-pushed the unify_container_definition branch 3 times, most recently from 4b6073a to be9e7a7 Compare July 2, 2017 11:18
@simon3z
Copy link
Contributor

simon3z commented Jul 3, 2017

@zeari @cben IIUC we need this PR to be reviewed/approved before we can move on with ManageIQ/manageiq#15393 and ManageIQ/manageiq-providers-kubernetes#42

@cben @moolitayer @agrare can you review/approve?

@moolitayer
Copy link

@zeari can we have tags or other relevant relations associated with a removed container_definitions ?
(anything that has a generic resource relation to container_definitions )

@zeari
Copy link
Author

zeari commented Jul 3, 2017

@zeari can we have tags or other relevant relations associated with a removed container_definitions ?
(anything that has a generic resource relation to container_definitions )

Looking at container_definition.rb, i cant say that we do. Metrics, events and such belong to container.

run_as_user run_as_non_root capabilities_add capabilities_drop command).each do |column|
join_sql = definitions.project(definitions[column.to_sym])
.where(definitions[:id].eq(containers[:container_definition_id])).to_sql
Container.update_all("#{column} = (#{join_sql})")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link

@moolitayer moolitayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍
I don't think we should refactor according to codeclimate's suggstions

Copy link
Member

@kbrock kbrock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

I prefer the update_all calls. So yes, ignore the rubo cops

Copy link
Member

@kbrock kbrock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out the 2N + 1 queries.
Put in a suggestion for a fix for that


say_with_time("splitting columns from container into container_definition") do
ContainerDefinition.transaction do
Container.all.each do |container|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be # containers x 2 queries

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe temporary column ContainerDefinition#container_id would make sense.

That way, this could be done in 2 queries.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, but this means we go with pure sql on this. (either that or something similar like https://gist.github.com/janko-m/2b2cea3e8e21d9232fb9 which is similar)

I did try to mitigate this with ContainerDefinition.transaction but it still means the DB goes through a lot of INSERT INTO container_definitions VALUES (......).

So pure sql is all I can think of doing here.

@simon3z
Copy link
Contributor

simon3z commented Jul 11, 2017

@cben @moolitayer this PR is missing your final review/approval.

ContainerEnvVar.update_all("container_definition_id = (#{join_sql})")
end

say_with_time("switch container_definition_id with container_id for security_contexts") do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

swtich resource_id with container_id, resource_type to 'Container' for security_contexts

def down
create_table :container_definitions do |t|
t.belongs_to :ems, :type => :bigint
t.string :ems_ref
Copy link
Contributor

@cben cben Jul 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definition ems_ref is lost in up. Would copying it back FROM containers restore old situation?
parser synthesizes "#{pod_id}_#{container_def.name}_#{container_def.image}",
for containers it synthesizes "#{pod_id}_#{container.name}_#{container.image}".

  • pod_id is same pod.metadata.uid for both.
  • name should be same, it's what parser cross-ref'd containers to container_definitions on.
  • is image necessarily same? cc @enoodle

(I don't know if 100% reversible migration is a blocker here, but want to understands things first.)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, If I understand correctly the contianer definition gets its image thought the container.

t.string :ems_ref
t.bigint :old_ems_id
t.timestamp :deleted_on
t.string :name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above, copying name back should be OK

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add comment that above columns were lost in up and are being reconstructed (by copying),
while following columns are moved back from containers table.

create_table :container_definitions do |t|
t.belongs_to :ems, :type => :bigint
t.string :ems_ref
t.bigint :old_ems_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did containers and definitions always have same ems_id, old_ems_id? were they archived in sync?
(trying to understand if copying back is safe)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theyre usually always archived in sync.

# from container_definition.rb
def disconnect_inv
   return if ems_id.nil?
   _log.info "Disconnecting Container definition [#{name}] id [#{id}]"
    self.container.try(:disconnect_inv)
    self.deleted_on = Time.now.utc
    save
  end

Copy link
Contributor

@cben cben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice maneuver with the temp column 🔀 :-)

t.string :ems_ref
t.bigint :old_ems_id
t.timestamp :deleted_on
t.string :name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add comment that above columns were lost in up and are being reconstructed (by copying),
while following columns are moved back from containers table.


add_column :containers, :container_definition_id, :bigint

# populate container_definitions. use container_id to keep relation to containers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider turning this comment to a say_with_time
also for UPDATE below

ContainerEnvVar.update_all("container_id = (#{join_sql})")
end

say_with_time("switch container_definition_id with container_id for security_contexts") do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, resource_id and resource_type

ContainerPortConfig.update_all("container_id = (#{join_sql})")
end

say_with_time("switch container_definition_id with container_id for container_port_configs") do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for container_env_vars

ActiveRecord::Base.connection.execute(update_statement)

# finally, remove the temp column
remove_column :container_definitions, :container_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idea: if you move this remove to the end, can you simplify the 3 "switch container_definition_id with container_id" below to not need a join?
personally I'm not so much concerned with performance as simplicity ;-)

if yes, perhaps it's worth creating a temp column even in up?

just an idea, feel free to keep as is.

:name => "mycontainer",
:ems_ref => "123",
:ems_id => 1,
:old_ems_id => 1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take realistic data from real DB. (that's not how ems_ref looks, not sure ems_id == old_ems_id == 1 is possible etc...)
And since I have some worries about the overlapping columns like ems_ref, consider spelling out full data for both AND the expected merge, without assuming definition_hash.merge(container_hash)).

[have I mentioned migrations scare me, perhaps unreasonably?]

SecurityContext.where(:resource_type => 'ContainerDefinition').update_all("resource_type = 'Container', resource_id = (#{join_sql})")
end

MiqQueue.where(:method_name => "purge_timer", :class_name => 'ContainerDefinition').destroy_all
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cben @simon3z Destroying the purge jobs.

@zeari zeari force-pushed the unify_container_definition branch from 6f23e0c to a37a7bc Compare July 17, 2017 15:09
@simon3z
Copy link
Contributor

simon3z commented Jul 17, 2017

Added test. im ignoring container_definitions without containers because i dont think theyre meaningful:

@zeari ok as long as we don't crash in case there is a container_definitions without containers.
That is why having a test seemed interesting (to show that we have no issues with that).

Copy link
Contributor

@cben cben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 test checks it does not crash;
I wondered why test doesn't explicitly check container_def2 is deleted => there is nothing to check as the whole table is gone :-) 🔨

@moolitayer
Copy link

@zeari I think there are some relevant rubocop warnings

@@ -805,7 +787,7 @@ container_port_configs:
- port
- host_port
- protocol
- container_definition_id
- container_id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carbonin Will this cause a problem since it's in the middle of the pack? I think it's ok, just want to verify.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, as long as it was implemented as a rename (not a delete and add).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Yeah" as in it's okay.... just to clarify 👍

@miq-bot
Copy link
Member

miq-bot commented Jul 17, 2017

This pull request is not mergeable. Please rebase and repush.

@zeari zeari force-pushed the unify_container_definition branch from a37a7bc to 4d6ac3c Compare July 18, 2017 11:08
@zeari zeari force-pushed the unify_container_definition branch from 4d6ac3c to c7630f4 Compare July 18, 2017 11:19
@zeari zeari force-pushed the unify_container_definition branch from c7630f4 to 0738280 Compare July 18, 2017 12:18
@miq-bot
Copy link
Member

miq-bot commented Jul 18, 2017

Checked commits zeari/manageiq-schema@5ab7b5f~...0738280 with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
2 files checked, 16 offenses detected

db/migrate/20170718111834_unify_container_definition_and_container.rb

spec/migrations/20170718111834_unify_container_definition_and_container_spec.rb

@zeari
Copy link
Author

zeari commented Jul 20, 2017

ping @Fryguy @carbonin Can we merge this one and ManageIQ/manageiq#15393 ?

@moolitayer
Copy link

cc @Loicavenel

class ContainerDefinition < ActiveRecord::Base
has_many :container_port_configs, :class_name => 'UnifyContainerDefinitionAndContainer::ContainerPortConfig'
has_many :container_env_vars, :class_name => 'UnifyContainerDefinitionAndContainer::ContainerEnvVar'
has_one :security_context, :as => :resource
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look like it's being used, so can we remove it? I'm asking because polymorphic types end up causing issues when accessed through has_many/belongs_to, but luckily you are using resource_type/resource_id directly in the migrations below.


class SecurityContext < ActiveRecord::Base
self.inheritance_column = :_type_disabled
belongs_to :resource, :polymorphic => true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t.bigint :container_id # temp column
end

add_index :container_definitions, :deleted_on
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want to add the index after you've done the data migration, otherwise each insert needs to update the index individually making it slower overall.

add_column :containers, :container_definition_id, :bigint

say_with_time("populate container_definitions. use container_id to keep relation to containers") do
insert_statement = "INSERT INTO container_definitions (container_id, ems_id, ems_ref, old_ems_id, deleted_on, name, image,
Copy link
Member

@Fryguy Fryguy Jul 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor, bit Prefer this format for building SQL

    insert_statement = <<-SQL
INSERT INTO container_defintions...
SELECT id, ...
FROM containers
SQL

The reason is that a) it keeps the SQL on it's own lines, and b) some IDEs will see Ruby <<-SQL block and actually color the SQL statement with SQL syntax coloring.

@Fryguy
Copy link
Member

Fryguy commented Jul 25, 2017

LGTM, so I am going to merge as is to unstick the main PR. @zeari Can you create a follow up PR that cleans up the various things I mentioned above?

@Fryguy Fryguy merged commit 41a7428 into ManageIQ:master Jul 25, 2017
@Fryguy Fryguy added this to the Sprint 66 Ending Aug 7, 2017 milestone Jul 25, 2017
@Fryguy Fryguy added the schema label Jul 25, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.