Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a patch to ActiveRecord::Migration for tracking replicated migrations #17919

Merged
merged 9 commits into from
Sep 26, 2018

Conversation

carbonin
Copy link
Member

This will fix https://bugzilla.redhat.com/show_bug.cgi?id=1601015 by only allowing the global region to execute a particular migration if it is sure that migration was successfully executed and replicated from all of the remote regions.

This ensures that we will never "miss" a schema state and get ourselves into a situation where we cannot continue replicating changes from a remote region.

Requires: ManageIQ/manageiq-schema#266
Alternative to: ManageIQ/manageiq-schema#264

provider_region = MiqRegion.find(s.provider_region)

while !provider_region.migrations_ran.include?(version) do
s.disable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying understand the purpose of 3 lines. Why do we need to disable, enable the subscription and then sleep here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to ensure that the apply process "wakes up" at every migration to consume "queued" changes from the remote region. Really we only need to do this if the subscription is "down" so I can make this conditional.

@carbonin carbonin force-pushed the replication_migration_extension branch from 1a35421 to e2fe201 Compare September 11, 2018 20:55
lib/extensions/ar_migration.rb Outdated Show resolved Hide resolved
@@ -0,0 +1,46 @@
module MigrationSyncHelper
def self.migrations_column_present?
ActiveRecord::Base.connection.columns("miq_regions").map(&:name).include?("migrations_ran")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would detect be better? ActiveRecord::Base.connection.columns("miq_regions").detect { |column| column.name == "migrations_ran" }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, should we cache this? Once the column is found, we shouldn't have to look for it anymore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really have a preference so I can change it to the detect version as it looks slightly more performant.

I don't think we should cache because I think it would be shared across multiple calls to migrate which would defeat the purpose.

MigrationSyncHelper.wait_for_remote_region_migration(MiqRegion.find(s.provider_region), version)
end

ret = super
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super.tap?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh, kind of would rather this version because we wouldn't be using the return value of super in the tap block.
I think the way I have it is more explicit about returning the previous value where using tap feels more like taking advantage of a side-effect.

@carbonin carbonin force-pushed the replication_migration_extension branch 3 times, most recently from e8b5061 to 2083f6a Compare September 14, 2018 19:58
@carbonin
Copy link
Member Author

Think this one is ready for review as well. The specs will go green once ManageIQ/manageiq-schema#266 is merged. I'm in the process of testing this out on live systems.

return unless (region = MiqRegion.my_region)

new_migrations = ActiveRecord::SchemaMigration.normalized_versions
new_migrations << version if direction == :up
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know of any customer that's gone back down in production, but should we handle that case anyway?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So as it turns out, ActiveRecord::SchemaMigration.normalized_versions doesn't contain the current version, in both the down and up case. So we need to add it to the list in the up case, but don't need to remove it in the down case.

It's weird.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is strange. Since we are overwriting the column every time, it shouldn't be an issue.

@carbonin
Copy link
Member Author

So while testing this on a live system I discovered that enabling and disabling the subscription doesn't actually work from within the migration context because it takes a lock on the pglogical.local_node relation from within the migration transaction. That conflicts with the process that wants to start the replication process back up which also wants to take that lock.

We effectively deadlock waiting for a change to replicate. One option is to just wait without bouncing the subscription, but I'd like to find something that won't impact migration time so much. Will investigate more tomorrow.

@carbonin carbonin force-pushed the replication_migration_extension branch from 549819d to df231f6 Compare September 19, 2018 20:08
@carbonin carbonin removed the wip label Sep 19, 2018
@carbonin carbonin changed the title [WIP] Add a patch to ActiveRecord::Migration for tracking replicated migrations Add a patch to ActiveRecord::Migration for tracking replicated migrations Sep 19, 2018
@carbonin
Copy link
Member Author

Resolved the deadlock by using a separate dummy model connection to do the disable and enable calls for the subscription. Was able to see this work in a for-real upgrade so this should be good for final review.

@carbonin
Copy link
Member Author

Rails/ApplicationRecord - Models should subclass ApplicationRecord.

So I originally had this, but encountered issues with load order. I was getting things like ArRegion not being defined when running rake environment because we were defining a model within lib/extensions. So since ActiveRecord::Base did the job I went with that instead.

Don't want reviewers to think this wasn't a conscious decision.

@@ -0,0 +1,53 @@
module MigrationSyncHelper
def self.migrations_column_present?
ActiveRecord::Base.connection.columns("miq_regions").detect { |c| c.name == "migrations_ran" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor, but prefer .any? here over .detect

end

class HelperARClass < ActiveRecord::Base; end
def self.restart_subscription(s)
Copy link
Member

@Fryguy Fryguy Sep 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indent?

EDIT: Oh, the class before it make it look like this method is part of that class... Can you put a line break?


ret = super
MigrationSyncHelper.update_local_migrations_ran(version.to_s, direction)
ret
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super.tap do 
  MigrationSyncHelper.update_local_migrations_ran(version.to_s, direction)
end

might read a little better than having a temp var.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bdunne actually suggested the same thing and I prefer the temp variable because I see the "return the original object" part of taps behavior as more of a side-effect rather than the primary reason for using it. If my goal is to just return the same value rather than do something with the value returned by super I like this way better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with not using tap since we're not actually using the proposed block variable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although, what does super return? ret could be named better. I honestly have no idea what it is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't actually know either. I didn't want to use it for anything, but figured it was good to preserve it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh funny, I see .tap's return of the original value as the main feature...in that sense I read it as more of a "oh and by the way", and it happens to just have a reference to the tapped thing, but it's not required to use it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, "I have a thing that I need to return, but before I return it, I need to do this other thing"

I agree with not using tap since we're not actually using the proposed block variable.

@jrafanie we use block syntax all the time without referencing anything that is yielded

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I can see that use case. I prefer the original for clear separation of the super from the post-super work. It's preference.

@@ -0,0 +1,53 @@
module MigrationSyncHelper
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this module live inside the ArPglogicalMigration namespace, so it's clear it's not for external use?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could. My intention here was to avoid mixing in more than just the #migrate method to ActiveRecord::Migration, but if we nest this module in ArPglogicalMigration then it seems okay. I'll probably rename it to ArPglogicalMigrationHelper though.

@Fryguy
Copy link
Member

Fryguy commented Sep 19, 2018

All of my requests are code structure, but the code itself looks good to me 👍. Before merging, how does this affect documentation?

return unless MigrationSyncHelper.migrations_column_present?
return unless (region = MiqRegion.my_region)

new_migrations = ActiveRecord::SchemaMigration.normalized_versions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this sorted, and if not, does it matter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like it's sorted explicitly [ref], but it doesn't matter to this implementation as we're just checking inclusion.

new_migrations << version if direction == :up
migrations_value = ActiveRecord::Base.connection.quote(PG::TextEncoder::Array.new.encode(new_migrations))

ActiveRecord::Base.connection.exec_query(<<~SQL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor, but is it better to use .update instead of .exec_query?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No? It doesn't look that way based on the implementation [ref], but I don't really have a preference.

@carbonin carbonin force-pushed the replication_migration_extension branch from d8d2913 to 752dd51 Compare September 25, 2018 19:35

def self.update_local_migrations_ran(version, direction)
return unless migrations_column_present?
return unless (region = MiqRegion.my_region)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still don't like using any models in this code.

attr_reader :region, :subscription, :version

def initialize(subscription, version)
@region = MiqRegion.find_by(:region => subscription.provider_region)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too with the MiqRegion model.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I was able to put together something that I think would work here, but ran into tons of issues trying to write the spec. I just went with a model stub in the class, but if this is something you feel strongly about I can probably get something working with another day's worth of work or so.

@Fryguy
Copy link
Member

Fryguy commented Sep 25, 2018

I thought I commented about the MiqRegion thing but I don't see my comment anywhere. In any case, I don't like the use of the MiqRegion model since this is migration code, where we usually state "no models". I'd be fine with a stub model, but I think everything in the PR can be done with plain SQL..even the array stuff.

@carbonin carbonin force-pushed the replication_migration_extension branch 2 times, most recently from 313d4ad to fbbb2be Compare September 26, 2018 15:47
@carbonin carbonin force-pushed the replication_migration_extension branch from fbbb2be to 9b48682 Compare September 26, 2018 18:13
SELECT EXISTS(
SELECT id FROM miq_regions WHERE region = #{ActiveRecord::Base.connection.quote(my_region_number)}
)
SQL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slick use of squiggly heredoc with raw SQL 👍

end

def self.my_region_number
@my_region_number ||= ApplicationRecord.my_region_number
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I overheard you describing how you can use ApplicationRecord here but must must use AR::Base elsewhere because reasons... the flip flopping of their usage here seems to need a "why" comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the only places I need to use ActiveRecord::Base are when the code is evaluated at load-time rather than runtime. So technically I could use ApplicationRecord.connection in the definitions of .migrations_column_present? and .my_region_created?.

Here I need to use ApplicationRecord because we include the region-aware modules into that class rather than ActiveRecord::Base.

The place that would cause a problem with constant loading would really be HelperARClass. But I also think I could probably get away with using an anonymous class there if I tried really hard. What do you think?

expect(t.alive?).to be true
subject.region.update_attributes!(:migrations_ran => ActiveRecord::SchemaMigration.normalized_versions << version)

# Wait a max of 5 seconds so we don't disrupt the whole test suite if something terrible happens
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙇

t = Thread.new do
Thread.current.abort_on_exception = true
subject.wait_for_remote_region_migration(0)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test scares me a little. It looks like it should be fine but threading is hard.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with merging it as is but it does make me 😨

Copy link
Member

@jrafanie jrafanie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, minor nits about comments about AR::Base vs. AppliationRecord... but looks great.

In this module we lean towards ActiveRecord::Base to avoid any
conflict with application logic.
We also cannot use ApplicationRecord at load time because it causes
a load error.
@miq-bot
Copy link
Member

miq-bot commented Sep 26, 2018

Some comments on commits carbonin/manageiq@107cd22~...30a2e2c

lib/extensions/ar_migration.rb

  • ⚠️ - 61 - Detected puts. Remove all debugging statements.

spec/lib/extensions/ar_migration_spec.rb

  • ⚠️ - 69 - Detected puts. Remove all debugging statements.

@miq-bot
Copy link
Member

miq-bot commented Sep 26, 2018

Checked commits carbonin/manageiq@107cd22~...30a2e2c with ruby 2.3.3, rubocop 0.52.1, haml-lint 0.20.0, and yamllint 1.10.0
4 files checked, 5 offenses detected

lib/extensions/ar_migration.rb

@gtanzillo gtanzillo added this to the Sprint 96 Ending Oct 8, 2018 milestone Sep 26, 2018
@gtanzillo gtanzillo merged commit cc2043f into ManageIQ:master Sep 26, 2018
simaishi pushed a commit that referenced this pull request Oct 1, 2018
Add a patch to ActiveRecord::Migration for tracking replicated migrations

(cherry picked from commit cc2043f)

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1601015
@simaishi
Copy link
Contributor

simaishi commented Oct 1, 2018

Hammer backport details:

$ git log -1
commit 62980873f9c9baa4f0a24fbcee3adc88de0a8c9b
Author: Gregg Tanzillo <[email protected]>
Date:   Wed Sep 26 16:15:39 2018 -0400

    Merge pull request #17919 from carbonin/replication_migration_extension
    
    Add a patch to ActiveRecord::Migration for tracking replicated migrations
    
    (cherry picked from commit cc2043f32d08cd5c273a8866a14f55cef11a1a87)
    
    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1601015

@bdunne bdunne assigned gtanzillo and unassigned Fryguy Jan 24, 2019
@carbonin carbonin deleted the replication_migration_extension branch August 16, 2019 15:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants