Amazon S3 objects inventorying #123
Conversation
This pull request is not mergeable. Please rebase and repush.
Checked commit aiperon@59a95ae with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
```diff
@@ -23,6 +23,8 @@
 :values:
 - machine
 :ignore_terminated_instances: true
+:s3:
+  :inventory_object_refresh: true
```
The old style of inventory refresh cannot handle a large number of objects. For example, 100k S3 objects are handled in:
- 2 minutes with the new inventory object refresher
- about 2 hours with the old refresher
Since removing the old refresher is only a matter of time, we've decided to make the new mechanism the default.
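As an illustration, a minimal sketch of how such a setting could gate the refresh path; the method names here are hypothetical, only the `Settings` key mirrors the diff above:
```ruby
# Hypothetical sketch, not the provider's actual API: pick the refresh
# strategy based on the :inventory_object_refresh key added above.
def refresh(ems)
  if Settings.ems_refresh.s3.inventory_object_refresh
    refresh_with_inventory_objects(ems) # new, batched InventoryObject path
  else
    legacy_refresh(ems)                 # old per-record save path
  end
end
```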
👍 I'll be doing the same for the other managers soon
hm, 2m vs. 2h, that is quite a difference; I wonder what makes the old refresh so slow
Most of the time is consumed on these lines:
https://github.com/ManageIQ/manageiq/blob/master/app/models/ems_refresh/save_inventory_helper.rb#L55
https://github.com/ManageIQ/manageiq/blob/master/app/models/ems_refresh/save_inventory_helper.rb#L69
To reproduce it (see the sketch below):
- change s3_objects_per_bucket_count to "scaling * 10000" in aws_stubs.rb
- run rspec plugins/manageiq-amazon-provider/spec/models/manageiq/providers/amazon/storage_manager/s3/stubbed_refresher_spec.rb
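A minimal sketch of the stub tweak, assuming the counts live in a hash helper roughly like this (the method name and the other keys are assumptions):
```ruby
# Hypothetical shape of the change in aws_stubs.rb: scale the stubbed S3
# object count so the refresher has to process ~100k objects.
def test_counts(scaling = 1)
  {
    :s3_buckets_count            => 5,               # assumed key
    :s3_objects_per_bucket_count => scaling * 10_000 # raised for the benchmark
  }
end
```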
I've rejected the idea of optimizing the legacy parser's core.
yeah, I mean the new refresh is 'the optimization' :-D
but I think that in this case you are hitting a Rails 5.0.1 bug: association.push has a horrible O(n^2) there, and it's already fixed in Rails master. So with that fixed, the numbers should not be that big for the old refresh. :-) It should be something around 15 minutes
you can try it by removing the 'unless @target.include?' here: https://github.com/rails/rails/blob/v5.0.1/activerecord/lib/active_record/associations/collection_association.rb#L663
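For illustration, a simplified sketch of why that guard is quadratic (not the actual Rails internals):
```ruby
require 'set'

records = (1..10_000).to_a

# Array#include? scans the whole target on every push, so n pushes cost
# O(n^2) comparisons in total -- this mimics the 'unless @target.include?'
# guard mentioned above.
target = []
records.each do |record|
  target << record unless target.include?(record) # linear scan per insert
end

# A Set-based membership check keeps each insert ~O(1) amortized:
target = []
seen = Set.new
records.each { |record| target << record if seen.add?(record) }
```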
Well, association.push is not the major problem there. Iterating through the hash is the bigger problem; it's roughly 80/20 (iterating the hash with find-or-create vs. association push).
Yeah, and thanks for the optimized new refresh! It's complex, but at least pretty fast :)
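A hypothetical sketch of the kind of per-record find-or-create loop being described; this is not the actual save_inventory_helper code, just the shape of the hot path:
```ruby
# Hypothetical sketch of a per-record find-or-create save loop. The detect
# is a linear scan over the association for every incoming hash, which is
# where most of the time goes according to the measurements above.
def save_inventory(association, hashes)
  hashes.each do |hash|
    found = association.detect { |r| r.ems_ref == hash[:ems_ref] } # O(n) scan per hash
    found ? found.update!(hash) : association.create!(hash)
  end
end
```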
hm, you are right: for me it's 7500s vs. 60s with the Rails fix applied. So something is really wrong in the old refresh
There is a performance issue with provider deletion. I wasn't able to delete the provider until I had deleted all of the CloudObjectStoreObject records (>40k) manually in the console. Mass deletion of objects should be implemented at the Manager level.
@aiperon a similar issue exists for cloud VMs; it's solved by using dependent => :nullify instead of :destroy, which leaves the VMs there as 'archived'. Having :dependent => :destroy for a large number of records will always be slow. This is kind of a workaround; we should probably just mark the provider as being deleted while doing all of this in the background. The difference with VMs is that we never delete them, we always archive them.
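A minimal sketch of the two association styles being contrasted; the model and association names are assumptions, not the exact ManageIQ definitions:
```ruby
class ExtManagementSystem < ApplicationRecord
  # Slow for >40k children: loads each record and runs destroy callbacks.
  has_many :cloud_object_store_objects, :dependent => :destroy

  # Workaround: a single UPDATE detaches the children, which then live on
  # as "archived" records (the approach used for VMs).
  has_many :vms, :dependent => :nullify
end
```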
@aiperon looks like a great first version 👍 Since the code here got more complex, I think we could use the nice DSL the Ansible provider is using: https://github.com/Ladas/manageiq/blob/35da1fa8e9804259fe133a834a7b48b93245464e/app/models/manageiq/providers/ansible_tower/inventory/parser/automation_manager.rb#L28-L28 The process_inventory_collection helper is too clunky when there is too much logic. But this is good to merge; we can do the refactoring in another PR while keeping the same specs passing. :-)
@Ladas
So the best approach would be to somehow mark the Manager as being deleted, which would prevent other operations on it, and then do the deletion as a background task. The main issue is that it runs in the foreground and blocks the UI, right? Or do the dependent => :nullify and a background cleanup later. But the point of :dependent => :nullify is to keep the zombie records: for VMs, we need to keep them so that e.g. reports and events won't miss the association.
I suppose it's in the background already, but somehow it doesn't do the job.
Work for the Amazon ObjectStorage (S3) provider.
Parses information about objects in S3 buckets and calculates each bucket's size and object count.
Implemented with both the old parser and the new inventory parser.
@miq-bot add_label enhancement, euwe/no