Subclass VmScan job as ManageIQ::Providers::Amazon::CloudManager::Scanning::Job #581

chessbyte · 2019-12-12T19:16:15Z

State Machine diagram for this subclass of VmScan.


miq-bot · 2019-12-12T19:56:31Z

Checked commit chessbyte@55f214d with ruby 2.5.5, rubocop 0.69.0, haml-lint 0.20.0, and yamllint 1.10.0
4 files checked, 0 offenses detected
Everything looks fine. 🏆

chessbyte · 2019-12-12T19:57:50Z

app/models/manageiq/providers/amazon/cloud_manager/scanning/job.rb

+  #     * Then the job is marked as finished - and THAT is why the transition below is needed
+  #
+  #   - In a different worker process, ManageIQ::Providers::Amazon::AgentCoordinatorWorker::Runner
+  #     * checks if a ManageIQ agent is running in the Amazon AWS Cloud, and starts it, if needed


@roliveri @hsong-rh @jerryk55 @Fryguy @agrare why not leave the job in the scanning state here? Then, in ManageIQ::Providers::Amazon::AgentCoordinatorWorker::Runner::ResponseThread#perform_metadata_sync, move to synchronizing state before going to finished state?
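For illustration only, the ordering proposed in this comment (stay in scanning, move to synchronizing, then finished) could be sketched as a small transition table. The class, signal names, and initial state below are hypothetical and are not taken from the pull request or from the ManageIQ job code.

```ruby
# Hypothetical transition table: signal => { current_state => next_state }.
# State and signal names are illustrative, not copied from the pull request.
TRANSITIONS = {
  :start_scan    => { "waiting_to_start" => "scanning" },
  :response_seen => { "scanning"         => "synchronizing" },
  :sync_done     => { "synchronizing"    => "finished" }
}.freeze

class SketchJob
  attr_reader :state

  def initialize
    @state = "waiting_to_start"
  end

  # Apply a signal, raising if it is not allowed from the current state.
  def signal(name)
    next_state = TRANSITIONS.fetch(name, {})[@state]
    raise ArgumentError, "#{name} is not permitted in state #{@state}" unless next_state

    @state = next_state
  end
end

job = SketchJob.new
job.signal(:start_scan)     # the job stays in "scanning" while the agent works
job.signal(:response_seen)  # the response handler moves it to "synchronizing"
job.signal(:sync_done)      # only now is the job marked "finished"
puts job.state
```

With a table like this, a job cannot jump from scanning straight to finished, which is the property the question above is asking about.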

chessbyte · 2019-12-12T20:13:25Z

app/models/manageiq/providers/amazon/cloud_manager/scanning/job.rb

+  #       * puts response into SQS queue
+  #
+  #   - In a different worker process, ManageIQ::Providers::Amazon::AgentCoordinatorWorker::Runner starts a ResponseThread
+  #     * The ResponseThread checks the SQS queue intermittently.


@roliveri @hsong-rh @jerryk55 @Fryguy @agrare Can the ResponseThread be replaced by each job looking for its own Response in SQS? Was this design driven by AWS cost minimization? If so, according to https://aws.amazon.com/sqs/pricing, the first million requests per month to SQS are free and the subsequent ones cost virtually nothing.

Why would that be better? It seems more efficient the way it is.

For one thing it requires an entire extra worker. (Though if this dedicated worker morphed into an operations worker and smartstate was a type of operation, then that argument is minimized cc @agrare )

@chessbyte Just to get some ballpark figures, 10,000 Amazon instances doing SmartState once per day = 3,650,000 requests per year (and that's only polling once per Vm, not once every so often), which comes out to a grand total of $1.06 🤑
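The ballpark figure can be reproduced with a few lines of arithmetic. The instance count and one request per scan come from the comment above. The price of 40 cents per million requests and the deduction of a single free million are assumptions chosen because they reproduce the quoted total; check the pricing page for the actual terms.

```ruby
# Figures from the comment: 10,000 instances, one request per daily scan.
# ASSUMPTIONS: 40 cents per million requests and one free million in total,
# picked because they reproduce the quoted $1.06.
INSTANCES         = 10_000
SCANS_PER_YEAR    = 365
FREE_REQUESTS     = 1_000_000
CENTS_PER_MILLION = 40

def yearly_requests
  INSTANCES * SCANS_PER_YEAR
end

# Integer arithmetic in cents avoids floating point rounding surprises.
def yearly_cost_cents
  billable = [yearly_requests - FREE_REQUESTS, 0].max
  billable * CENTS_PER_MILLION / 1_000_000
end

puts yearly_requests
puts format("$%.2f", yearly_cost_cents / 100.0)
```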

It's still messier. Multiple jobs reading from the same queue. Then, if the message isn't for them, they have to re-queue the message (unless there's a better way). An extra worker may be better than having the jobs doing more work.
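A rough sketch of the concern, using Ruby's in-memory `Queue` as a stand-in for the shared response queue. This is not the SQS API, and `Response` and `find_own_response` are hypothetical names. It only shows how a job that reads a shared queue has to receive and put back messages meant for other jobs.

```ruby
# In-memory stand-in for a shared response queue; this is not the SQS API.
Response = Struct.new(:job_id, :payload)

# A job pops messages until it finds its own, pushing the others back.
# Returns the matching response (or nil) and how many receives it took.
def find_own_response(queue, job_id)
  receives = 0
  queue.size.times do
    message = queue.pop
    receives += 1
    return [message, receives] if message.job_id == job_id

    queue.push(message) # not ours: put it back for another job
  end
  [nil, receives]
end

queue = Queue.new
%w[job-1 job-2 job-3].each { |id| queue.push(Response.new(id, "scan data for #{id}")) }

message, receives = find_own_response(queue, "job-3")
puts "#{message.job_id} found after #{receives} receives, #{queue.size} messages left"
```

In this toy run the last job needs as many receives as there are pending responses, and every job repeats that work independently.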

Or the jobs could just check S3 directly, that would eliminate the queue issues.

I still think it's a lot cleaner the way it is. Is having an additional worker such a big deal?

Even if we wanted to read all the responses and then distribute them to each job via MiqQueue or another mechanism, that polling on a timer can be done via the ScheduleWorker and a new schedule. The ScheduleWorker would do nothing more than queue up some work for the Amazon Operations Worker.

Most, if not all, of what we're doing now is for a reason. We'll have to go back and revisit all of the design decisions to validate the feasibility of proposed changes.

The way it is now works well, and it takes advantage of the queue to streamline the process.
I still don't see the issue with having the worker. The worker subsystem already handles the coordination and orchestration, so we're not adding complexity. I'm really not seeing any problem that would warrant a change.

We need a central point of control to handle the agent management and queue processing. By its nature, this should be performed in an independent thread of execution, be it a separate process, or threads of an existing process. A worker seemed a logical choice to implement this.
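As a minimal sketch of the "central point of control" idea, a single reader can drain the shared queue and route each response to the job it belongs to, so every message is received once. `ResponseDispatcher`, `Reply`, and the mailbox hash are hypothetical names, and the in-memory `Queue` stands in for the real queue; this is not the existing worker code.

```ruby
# Hypothetical dispatcher: one reader drains the shared queue and routes each
# response to a per-job mailbox. Names are illustrative only.
Reply = Struct.new(:job_id, :payload)

class ResponseDispatcher
  def initialize(shared_queue)
    @shared_queue = shared_queue
    @mailboxes = Hash.new { |hash, job_id| hash[job_id] = [] }
  end

  # Drain whatever is currently queued; each message is received exactly once.
  # Single-threaded sketch, so checking empty? before pop is safe here.
  def drain
    until @shared_queue.empty?
      reply = @shared_queue.pop
      @mailboxes[reply.job_id] << reply.payload
    end
  end

  def replies_for(job_id)
    @mailboxes[job_id]
  end
end

shared = Queue.new
shared.push(Reply.new("job-1", "metadata A"))
shared.push(Reply.new("job-2", "metadata B"))

dispatcher = ResponseDispatcher.new(shared)
dispatcher.drain
puts dispatcher.replies_for("job-2").inspect
```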

If we can leverage the ScheduleWorker, instead of a dedicated worker, that's fine.
However, I believe the queue operations block, so there's no need to poll in a dedicated thread.
If this necessitates polling, we'd have to access the queue in a non-blocking manner, which would add complexity to the code.
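The difference between blocking and non-blocking access can be shown with Ruby's built-in `Queue`, used here purely as an analogy. Whether the real queue client call blocks is not established in this thread, and `try_pop` is a hypothetical helper.

```ruby
queue = Queue.new

# Non-blocking access: pop(true) raises ThreadError when the queue is empty,
# so the caller needs its own retry or polling loop around it.
def try_pop(queue)
  queue.pop(true)
rescue ThreadError
  nil
end

puts try_pop(queue).inspect

# Blocking access: pop simply waits until a producer pushes something,
# so no timer or polling loop is needed on the consumer side.
producer = Thread.new do
  sleep 0.1
  queue.push("response ready")
end
puts queue.pop
producer.join
```

The non-blocking form returns immediately with nothing to process, which is where the extra complexity mentioned above would come from.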

I'm still wondering if this is a problem that warrants a change.

Maybe because I was not very involved in the initial design, I am reviewing it now since I am refactoring. So, I think having a discussion is warranted. Whether the discussion leads to a design/code change is a separate issue.

The only reason I focused in on this particular SmartState is because it is SO different from everything else - extra workers, unusual job state transitions, the parsing of the XML happening outside of the VmScan job code to name a few things that caught my eye.

That's because Amazon is so different. But I'm not sure if all the differences in the Amazon code are required, or were just the result of expediency of development under given time constraints.

I think the current queue handling is probably the best way to go.

I'm not sure about the agent management. Hui knows most about that - but I recall it took some work to make sure it addressed all the corner cases.

The state transitions can be addressed, but with the advent of provider-specific scan jobs, that may not be a big issue.

It would be good if the XML can be parsed in a common way. I'm not sure if there's a good reason it is not.

chessbyte added refactoring smart state labels Dec 12, 2019

chessbyte requested review from agrare and Fryguy as code owners December 12, 2019 19:16

chessbyte mentioned this pull request Dec 12, 2019

Simplified VmScan base class as all the provider logic has moved into relevant subclasses ManageIQ/manageiq#19607

Merged

6 tasks

chessbyte force-pushed the subclass_vm_scan branch from fe9236f to 3a10e31 Compare December 12, 2019 19:36

Subclass VmScan job as ManageIQ::Providers::Amazon::CloudManager::Sca…

55f214d

…nning::Job

chessbyte force-pushed the subclass_vm_scan branch from 3a10e31 to 55f214d Compare December 12, 2019 19:53


agrare self-assigned this Jan 6, 2020

agrare approved these changes Jan 6, 2020


agrare merged commit 21ee878 into ManageIQ:master Jan 6, 2020

agrare added this to the Sprint 127 Ending Jan 6, 2020 milestone Jan 6, 2020

chessbyte deleted the subclass_vm_scan branch July 9, 2020 15:25
