Batch saving strategy that does not require unique indexes #15627
Conversation
- Add default_batch strategy, allowing us to do batched SQL while not being concurrent safe.
- Add batch_extra_attributes parameter.
- Rename default_batch to batch saver strategy.
- Extract unique_index_columns and on_conflict_update, as these are the only two parts that differ between the concurrent-safe batch strategy using unique indexes and the basic batch strategy (see the sketch after this list).
- Add batch_extra_attributes to the list of all_attribute_keys, which will allow us to save attributes that were not explicitly defined, but were created by a model side effect.
- Add basic batch strategy; this strategy does not require unique indexes, but it is not concurrent safe.
- Do not duplicate deleted records.
- Always call RETURNING so we are able to output :created_records.
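Based on the commit descriptions above, the split can be pictured roughly like this. This is a simplified sketch, not the exact classes from the PR; the class names and method bodies are illustrative only:

# A simplified sketch: the two batch savers differ only in how rows are
# matched and whether ON CONFLICT is used.
class ConcurrentSafeBatchSaver
  # Matching columns are backed by a unique DB index, so the generated
  # INSERT can use ON CONFLICT (...) DO UPDATE.
  def unique_index_columns
    [:ems_id, :ems_ref] # illustrative; derived from the collection's manager_ref
  end

  def on_conflict_update
    true
  end
end

class BatchSaver < ConcurrentSafeBatchSaver
  # No unique index required: existing rows are matched on the primary key,
  # which also means this saver is not concurrent safe.
  def unique_index_columns
    [:id]
  end

  def on_conflict_update
    false
  end
end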
Force-pushed from 94c93dd to f59cfcb.
    def initialize(model_class: nil, manager_ref: nil, association: nil, parent: nil, strategy: nil, saved: nil,
                   custom_save_block: nil, delete_method: nil, data_index: nil, data: nil, dependency_attributes: nil,
                   attributes_blacklist: nil, attributes_whitelist: nil, complete: nil, update_only: nil,
                   check_changed: nil, custom_manager_uuid: nil, custom_db_finder: nil, arel: nil, builder_params: {},
                   inventory_object_attributes: nil, unique_index_columns: nil, name: nil, saver_strategy: nil,
                   parent_inventory_collections: nil, manager_uuids: [], all_manager_uuids: nil, targeted_arel: nil,
                   targeted: nil, manager_ref_allowed_nil: nil, secondary_refs: {}, use_ar_object: nil,
-                  custom_reconnect_block: nil)
+                  custom_reconnect_block: nil, batch_extra_attributes: [])
Seems like there are a ridiculous number of parameters to this initialize method. Sandy Metz recommends no more than 4. Feels like this is a bad pattern that can be refactored in future PRs. /cc @agrare
@chessbyte indeed, we are at the point where this should be the final list, covering all the crazy corner cases we have. :-)
So next will be to refactor these: part will go to the Settings, the rest should be divided into more objects that we pass here (also consuming settings). There are already several areas we can group, e.g. Saver, DatabaseLoader, AttributesBuilder, RecordsMatcher, etc.
Extenuating circumstances:
- For many purposes, it's a struct (largely immutable, but not completely). There are a lot of added methods, but [nearly?] all constructor args can be read back.
- As a user of this class, I think of it as a declarative DSL. There are tons of options (quite well documented), but you don't need them all at once; most of the time I set 4–6 of them (a typical declaration is sketched below).
Do you have any concrete ideas how a better interface could look?
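For context on the "most of the time I set 4–6 of them" point above, a typical declaration might look roughly like this. The attribute values are illustrative, not taken from this PR:

# A hypothetical InventoryCollection using only a handful of the options;
# everything else keeps its default.
ManagerRefresh::InventoryCollection.new(
  :model_class    => ::Vm,
  :association    => :vms,
  :manager_ref    => [:ems_ref],
  :saver_strategy => :batch,
  :use_ar_object  => true
)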
Use primary_key for batch update matching, because it's simpler and much faster for processing.
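A rough illustration of what matching the batch UPDATE on primary_key means at the SQL level. This is hand-written SQL, not the statement the saver actually generates:

# One UPDATE statement touching many rows, matched on the primary key.
# The VALUES list would be built from the InventoryObject attributes.
update_sql = <<-SQL
  UPDATE vms AS dest
  SET    name    = updates.name,
         ems_ref = updates.ems_ref
  FROM   (VALUES (1, 'vm_a', 'ref-a'),
                 (2, 'vm_b', 'ref-b')) AS updates(id, name, ems_ref)
  WHERE  dest.id = updates.id
SQL

ActiveRecord::Base.connection.execute(update_sql)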
Checked commits Ladas/manageiq@72e9c38~...3127e3d with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0: app/models/manager_refresh/save_collection/saver/sql_helper.rb
@cben yes, #15627 (comment) then the interface would look something like:
I've already done a quick walk-through with @agrare; it will take a few more iterations to get the grouping and the naming right. I should have time now, since we are near a feature-complete solution. :-) Also, I am still thinking about the right place to pass Settings, since that is tied to the EMS, so maybe we will be passing ManagerRefresh::Settings.new(ems) and we will extract the right section there. And I am still thinking about what should go into the settings and what belongs to the Persistor (i.e. what needs to be dynamically changeable).
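Purely as an illustration of the grouping idea mentioned above (every class and parameter name below is hypothetical, not code from this PR), such an interface could end up looking something like:

ManagerRefresh::InventoryCollection.new(
  :model_class => ::Vm,
  :association => :vms,
  :saver       => Saver.new(:strategy => :batch, :batch_extra_attributes => [:power_state]),
  :builder     => AttributesBuilder.new(:inventory_object_attributes => [:name, :ems_ref]),
  :matcher     => RecordsMatcher.new(:manager_ref => [:ems_ref])
)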
    # @param batch_extra_attributes [Array] Array of symbols marking which extra attributes we want to store into the
    #   db. These extra attributes might be a product of :use_ar_object assignment and we need to specify them
    #   manually, if we want to use a batch saving strategy and we have models that populate attributes as a side
    #   effect.
@Ladas do you have an example of needing this currently?
For a Vm, when we change :raw_power_state, other attributes are set as a side effect, so we need to have:
:batch_extra_attributes => [:power_state, :state_changed_on, :previous_state]
Hopefully this can be autodiscovered later, while passing the same specs.
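As a concrete sketch of that Vm case (the extra-attribute list comes from the comment above; the rest of the declaration is illustrative):

# Changing :raw_power_state on a Vm sets :power_state, :state_changed_on and
# :previous_state as side effects, so the batch saver is told to persist them.
ManagerRefresh::InventoryCollection.new(
  :model_class            => ::Vm,
  :association            => :vms,
  :manager_ref            => [:ems_ref],
  :saver_strategy         => :batch,
  :use_ar_object          => true,
  :batch_extra_attributes => [:power_state, :state_changed_on, :previous_state]
)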
👍 thanks
      def on_conflict_update
        false
      end
Why can't we do on_conflict_update with this strategy?
Scratch that, ON CONFLICT needs unique indexes: https://www.postgresql.org/docs/9.5/static/sql-insert.html#SQL-ON-CONFLICT
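To illustrate the distinction (hand-written SQL sketches, not the generated statements): ON CONFLICT can only target columns covered by a unique index, which is exactly what the basic batch strategy avoids.

# Concurrent-safe upsert: needs a unique index on (ems_id, ems_ref).
upsert_sql = <<-SQL
  INSERT INTO vms (ems_id, ems_ref, name)
  VALUES (10, 'ref-a', 'vm_a')
  ON CONFLICT (ems_id, ems_ref) DO UPDATE SET name = EXCLUDED.name
SQL

# Basic batch strategy: plain INSERT for new records (RETURNING lets us
# collect :created_records); existing records are updated separately,
# matched on the primary key. No unique index needed, but two concurrent
# workers could create duplicate rows.
insert_sql = <<-SQL
  INSERT INTO vms (ems_id, ems_ref, name)
  VALUES (10, 'ref-b', 'vm_b')
  RETURNING id
SQL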
The rest looks good to me. I think we all know the parameter list is out of control and needs refactoring, and we have plans for doing so.
Batch saving strategy that does not require unique indexes. Using batched SQL, we are able to achieve a huge improvement in saving time.