
Amazon S3 objects inventoring #123

Merged 1 commit into ManageIQ:master on Feb 21, 2017

Conversation

aiperon
Contributor

@aiperon aiperon commented Jan 27, 2017

Work for the Amazon ObjectStorage (S3) provider.

Parses information about objects in S3 buckets and calculates each bucket's size and object count.
Implemented with both the old and the new inventory parser.

@miq-bot add_label enhancement, euwe/no

@miq-bot
Member

miq-bot commented Feb 2, 2017

This pull request is not mergeable. Please rebase and repush.

@aiperon aiperon force-pushed the amazon-s3-objects-parsing branch from 525e236 to e91fc36 on February 19, 2017 18:30
@aiperon aiperon force-pushed the amazon-s3-objects-parsing branch 3 times, most recently from 6dce432 to 47eff74 on February 20, 2017 09:31
@aiperon aiperon changed the title [WIP] Amazon S3 objects listing Amazon S3 objects listing Feb 20, 2017
@aiperon aiperon changed the title Amazon S3 objects listing Amazon S3 objects inventoring Feb 20, 2017
Parsing information about objects on S3 buckets.
Calculating bucket's size and object count.
@aiperon aiperon force-pushed the amazon-s3-objects-parsing branch from 47eff74 to 59a95ae on February 20, 2017 09:34
@miq-bot
Member

miq-bot commented Feb 20, 2017

Checked commit aiperon@59a95ae with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
8 files checked, 0 offenses detected
Everything looks good. ⭐

@@ -23,6 +23,8 @@
:values:
- machine
:ignore_terminated_instances: true
:s3:
:inventory_object_refresh: true
Contributor Author

@aiperon aiperon Feb 20, 2017

The old style of inventory refresh is not able to handle a large number of objects.
For example, 100k S3 objects are handled in:

  • 2 minutes with the new inventory object refresher
  • about 2 hours with the old refresher

Since removing the old refresher is only a question of time, we've decided to make the new mechanism the default.
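For reference, the added keys nest under the provider's refresh settings; a hedged sketch of the full fragment (the :ems_refresh parent key is my assumption, only the :s3 lines come from the diff above):

```
:ems_refresh:
  :s3:
    :inventory_object_refresh: true  # default to the new, fast refresher
```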

Contributor

👍 I'll be doing the same for the other managers soon

hm, 2 m vs. 2 h is quite a big difference; I wonder what makes the old refresh so slow

Contributor Author

@aiperon aiperon Feb 21, 2017

Most of the time is spent on the following lines:
https://github.com/ManageIQ/manageiq/blob/master/app/models/ems_refresh/save_inventory_helper.rb#L55
https://github.com/ManageIQ/manageiq/blob/master/app/models/ems_refresh/save_inventory_helper.rb#L69
To reproduce it:

  • change s3_objects_per_bucket_count to "scaling * 10000" in aws_stubs.rb
  • run rspec plugins/manageiq-amazon-provider/spec/models/manageiq/providers/amazon/storage_manager/s3/stubbed_refresher_spec.rb

I've rejected the idea of optimizing the legacy parser's core.

Contributor

yeah, I mean the new refresh is 'the optimization' :-D

but I think that in this case you are hitting a Rails 5.0.1 bug: association.push has a horrible O(n^2) in there, which is already fixed in Rails master. With that fixed, the numbers for the old refresh should not be that big. :-) It should be something around 15 minutes.
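An illustrative sketch of that kind of quadratic behavior (plain Ruby, not the actual Rails internals): if every push into a collection re-checks membership with a linear scan, building the collection is O(n^2) overall, while indexing membership in a Set keeps it O(n).

```ruby
require 'set'

# Naive variant: a linear scan of the collection on every push.
def naive_push_all(records)
  collection = []
  records.each do |r|
    collection << r unless collection.include?(r) # O(n) scan per push
  end
  collection
end

# Indexed variant: a Set answers the membership question in O(1).
def indexed_push_all(records)
  seen = Set.new
  records.each_with_object([]) do |r, collection|
    collection << r if seen.add?(r) # Set#add? returns nil on duplicates
  end
end
```

Both produce the same collection; only the cost differs, which is why the gap shows up only at large object counts.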

Contributor Author

Well, association.push is not the major problem there. Iterating through the hash is the bigger problem, roughly an 80/20 split (hash iteration with find-or-create vs. association push).

Yeah, and thanks for the optimized new refresh! It's complex, but at least pretty fast. :)
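A hedged sketch of that 80/20 split (the names here are made up for illustration): finding an existing record by scanning the whole collection for each fetched item is quadratic, while building a lookup hash keyed by ems_ref once makes every find-or-create O(1).

```ruby
# Slow variant: a full scan of the existing records per fetched item.
def save_with_scan(existing, fetched)
  fetched.map do |attrs|
    found = existing.find { |r| r[:ems_ref] == attrs[:ems_ref] } # O(n) per item
    found ? found.merge(attrs) : attrs.dup
  end
end

# Fast variant: build the index once, then each lookup is O(1).
def save_with_index(existing, fetched)
  index = existing.each_with_object({}) { |r, h| h[r[:ems_ref]] = r }
  fetched.map do |attrs|
    found = index[attrs[:ems_ref]]
    found ? found.merge(attrs) : attrs.dup
  end
end
```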

Contributor

hm, you are right; for me it's 7500 s vs. 60 s with the Rails fix applied. So something is really wrong in the old refresh

@aiperon
Contributor Author

aiperon commented Feb 20, 2017

There is a performance issue with provider deletion: I wasn't able to delete the provider until I had manually deleted all of the CloudObjectStoreObject records (>40k) in the console. Mass deletion of objects should be implemented at the Manager level.

@Ladas
Contributor

Ladas commented Feb 21, 2017

@aiperon a similar issue exists for cloud VMs; it's solved by using :dependent => :nullify instead of :destroy:
https://github.com/Ladas/manageiq/blob/4240b7c1f4d42d7013c1ba11ad6d55eaa84a53bd/app/models/ext_management_system.rb#L37-L37

That then leaves the VMs there as 'archived'.

Having :dependent => :destroy on a large number of records will always be slow, so this is something of a workaround. We should probably just mark the provider as 'being deleted' and do all of this in the background. The difference with VMs is that we never delete them; we always archive them.
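In model code the two options read roughly like this (a sketch, not the actual ManageIQ declarations):

```
class ExtManagementSystem < ActiveRecord::Base
  # per-record destroy callbacks: always slow when there are many rows
  has_many :vms, :dependent => :destroy

  # alternative: detach the rows instead, leaving them 'archived',
  # and purge them later from a background task
  # has_many :vms, :dependent => :nullify
end
```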

@Ladas
Contributor

Ladas commented Feb 21, 2017

@aiperon looks like a great first version 👍 Since the code here got more complex, I think we could use the nice DSL the Ansible provider is using: https://github.com/Ladas/manageiq/blob/35da1fa8e9804259fe133a834a7b48b93245464e/app/models/manageiq/providers/ansible_tower/inventory/parser/automation_manager.rb#L28-L28

The process_inventory_collection helper gets too clunky when there is too much logic.

But this is good to merge; we can do the refactoring in another PR while keeping the same specs passing. :-)

@Ladas Ladas merged commit a2b9fe6 into ManageIQ:master Feb 21, 2017
@aiperon
Contributor Author

aiperon commented Feb 21, 2017

@Ladas
I don't think :dependent => :nullify is the appropriate way; potentially, we could get zombies.
We could set cloud_object_store_objects to :dependent => :delete_all, but they're taggable, and delete_all skips the callbacks that would clean up the tags.
Is it possible to describe a more complex before_destroy for the storage manager?
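A toy model of the zombie concern (plain Ruby, no Rails; the class and method names are made up): a per-record destroy also removes the record's tags, while a bulk delete_all-style removal is fast but leaves the tags behind as orphans.

```ruby
class ToyStore
  attr_reader :objects, :tags

  def initialize(objects, tags)
    @objects = objects # array of object ids
    @tags    = tags    # { object_id => [tag, ...] }
  end

  # Per-record cleanup: slow for many rows, but leaves nothing behind.
  def destroy_all
    @objects.each { |id| @tags.delete(id) }
    @objects.clear
  end

  # Single bulk removal: fast, but skips the tag cleanup entirely.
  def delete_all
    @objects.clear
  end
end
```

After delete_all the tags hash still holds entries for objects that no longer exist, which is exactly the orphaned-record problem with taggable models.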

@Ladas
Contributor

Ladas commented Feb 21, 2017

So the best approach would be to somehow mark the Manager as being deleted, which would prevent other operations on it, and then do the deletion as a background task. The main issue is that it runs in the foreground and blocks the UI, right?

Or we could do :dependent => :nullify with a background cleanup later. But the point of :dependent => :nullify is to keep the zombie: for VMs, we need to keep them so that e.g. reports and events won't miss the association.

@aiperon
Contributor Author

aiperon commented Feb 21, 2017

The main issue is that it runs in the foreground and blocks the UI, right?

I suppose it's already running in the background, but "somehow" it doesn't do the job.

@blomquisg blomquisg added this to the Sprint 55 Ending Feb 27, 2017 milestone Feb 27, 2017