Subclass VmScan job as ManageIQ::Providers::Amazon::CloudManager::Scanning::Job #581
Conversation
Checked commit chessbyte@55f214d with ruby 2.5.5, rubocop 0.69.0, haml-lint 0.20.0, and yamllint 1.10.0
# * Then the job is marked as finished - and THAT is why the transition below is needed
#
# - In a different worker process, ManageIQ::Providers::Amazon::AgentCoordinatorWorker::Runner
# * checks if a ManageIQ agent is running in the Amazon AWS Cloud, and starts it, if needed |
# * puts response into SQS queue
#
# - In a different worker process, ManageIQ::Providers::Amazon::AgentCoordinatorWorker::Runner starts a ResponseThread
# * The ResponseThread checks the SQS queue intermittently. |
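To make the quoted flow concrete, a rough illustration of an intermittent SQS-polling thread using the aws-sdk-sqs gem follows. This is not the actual AgentCoordinatorWorker::Runner code; the queue URL, region, polling interval, and dispatch step are all placeholders.

```ruby
require 'aws-sdk-sqs'

# Illustration only -- not the real ResponseThread implementation.
# SSA_RESPONSE_QUEUE_URL and the dispatch step are placeholders.
Thread.new do
  sqs       = Aws::SQS::Client.new(region: 'us-east-1')
  queue_url = ENV['SSA_RESPONSE_QUEUE_URL']

  loop do
    messages = sqs.receive_message(queue_url: queue_url, max_number_of_messages: 10).messages
    messages.each do |msg|
      # ... dispatch msg.body to the scan job that requested it ...
      sqs.delete_message(queue_url: queue_url, receipt_handle: msg.receipt_handle)
    end
    sleep 60 # "checks the SQS queue intermittently"
  end
end
```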
@roliveri @hsong-rh @jerryk55 @Fryguy @agrare Can the ResponseThread be replaced by each job looking for its own Response in SQS? Was this design driven by AWS cost minimization? If so, according to https://aws.amazon.com/sqs/pricing, the first million requests per month to SQS are free and the subsequent ones cost virtually nothing.
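For the sake of discussion, a per-job lookup could look roughly like the sketch below. It is only an assumption of how this might work: it presumes the agent tags each SQS message with the requesting job's GUID as a message attribute, which is not necessarily how the responses are structured today.

```ruby
require 'aws-sdk-sqs'
require 'json'

# Hypothetical sketch: each scan job polls the shared response queue for its own
# message, identified by a 'job_guid' message attribute (an assumed convention).
def fetch_my_response(sqs, queue_url, job_guid)
  resp = sqs.receive_message(
    queue_url:               queue_url,
    max_number_of_messages:  10,
    message_attribute_names: ['job_guid']
  )
  resp.messages.each do |msg|
    if msg.message_attributes['job_guid']&.string_value == job_guid
      sqs.delete_message(queue_url: queue_url, receipt_handle: msg.receipt_handle)
      return JSON.parse(msg.body)
    end
    # Not ours: make the message visible again right away for its owner.
    sqs.change_message_visibility(queue_url: queue_url, receipt_handle: msg.receipt_handle, visibility_timeout: 0)
  end
  nil
end
```

The change_message_visibility call is what would stand in for "re-queueing" a message that belongs to another job.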
Why would that be better? It seems more efficient the way it is.
For one thing, it requires an entire extra worker. (Though if this dedicated worker morphed into an operations worker and smartstate were a type of operation, that argument would be minimized. cc @agrare)
@chessbyte Just to get some ballpark figures, 10,000 amazon instances doing smartstate once per day = 3,650,000 requests (and that's only polling once per Vm, not once every so often), which comes out to a grand total of $1.06 🤑
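For the record, the arithmetic behind that estimate appears to be the following (assuming the ~$0.40-per-million standard-queue rate after the first 1M free requests, and treating the year's requests as a single billed total):

```ruby
instances         = 10_000
polls_per_day     = 1
requests_per_year = instances * polls_per_day * 365  # => 3,650,000
billable          = requests_per_year - 1_000_000    # => 2,650,000 after the free tier
cost              = billable / 1_000_000.0 * 0.40    # => ~$1.06
```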
It's still messier. Multiple jobs reading from the same queue. Then, if the message isn't for them, they have to re-queue the message (unless there's a better way). An extra worker may be better than having the jobs do more work.
Or the jobs could just check S3 directly, that would eliminate the queue issues.
I still think it's a lot cleaner the way it is. Is having an additional worker such a big deal?
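A minimal sketch of the "check S3 directly" idea, assuming each job's result lands in a per-job object; the bucket name and key layout are invented here purely for illustration, and `job` stands for the running scan job:

```ruby
require 'aws-sdk-s3'

# Hypothetical: the scan job polls S3 for its own result object instead of
# reading a shared SQS queue. Bucket name and key scheme are assumptions.
def scan_result_ready?(s3, bucket_name, job_guid)
  s3.bucket(bucket_name).object("ssa/results/#{job_guid}.json").exists?
end

s3 = Aws::S3::Resource.new(region: 'us-east-1')
if scan_result_ready?(s3, 'my-ssa-bucket', job.guid)
  body = s3.bucket('my-ssa-bucket').object("ssa/results/#{job.guid}.json").get.body.read
  # ... hand the payload to the job's state machine ...
end
```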
Even if we wanted to read all the responses and then distribute them to each job via MiqQueue or another mechanism, that polling on a timer can be done via the ScheduleWorker and a new schedule. The ScheduleWorker would do nothing more than queue up some work for the Amazon Operations Worker.
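Something along those lines might look like the sketch below; the method name is hypothetical and only meant to show the shape of the idea (the ScheduleWorker entry would do nothing but a MiqQueue.put, and the operations worker would dequeue it and drain SQS):

```ruby
# Hypothetical sketch of what the scheduled work item could look like.
# "drain_smartstate_responses" does not exist today -- it is a placeholder
# for "read the SQS responses and hand each one to its scan job".
MiqQueue.put(
  :class_name  => "ManageIQ::Providers::Amazon::CloudManager",
  :instance_id => ems.id,
  :method_name => "drain_smartstate_responses",
  :role        => "ems_operations",
  :zone        => ems.my_zone
)
```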
Most, if not all, of what we're doing now is for a reason. We'll have to go back and revisit all of the design decisions to validate the feasibility of proposed changes.
The way it is now works well, and it takes advantage of the queue to streamline the process.
I still don't see the issue with having the worker. The worker subsystem already handles the coordination and orchestration, so we're not adding complexity. I'm really not seeing any problem that would warrant a change.
We need a central point of control to handle the agent management and queue processing. By its nature, this should be performed in an independent thread of execution, be it a separate process, or threads of an existing process. A worker seemed a logical choice to implement this.
If we can leverage the ScheduleWorker, instead of a dedicated worker, that's fine.
However, I believe the queue operations block, so there's no need to poll in a dedicated thread.
If this necessitates polling, we'd have to access the queue in a non-blocking manner, which would add complexity to the code.
I'm still wondering if this is a problem that warrants a change.
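For reference, with the aws-sdk-sqs gem the "blocking" behavior is SQS long polling: wait_time_seconds makes receive_message wait up to that many seconds for a message, while a value of 0 returns immediately and forces the caller to manage its own polling loop. A small illustration (the queue URL is a placeholder):

```ruby
require 'aws-sdk-sqs'

sqs       = Aws::SQS::Client.new(region: 'us-east-1')
queue_url = ENV['SSA_RESPONSE_QUEUE_URL'] # placeholder

# Blocking style: long poll, the call waits up to 20 seconds for a message.
blocking = sqs.receive_message(queue_url: queue_url, wait_time_seconds: 20)

# Non-blocking style: short poll, returns immediately (possibly with no
# messages), so the caller has to loop and sleep itself -- the extra
# complexity mentioned above.
non_blocking = sqs.receive_message(queue_url: queue_url, wait_time_seconds: 0)
```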
Perhaps because I was not very involved in the initial design, I am reviewing it now as I refactor. So, I think having a discussion is warranted. Whether the discussion leads to a design/code change is a separate issue.
The only reason I focused on this particular SmartState is that it is SO different from everything else - extra workers, unusual job state transitions, and the parsing of the XML happening outside of the VmScan job code, to name a few things that caught my eye.
That's because Amazon is so different. But I'm not sure if all the differences in the Amazon code are required, or were just the result of expedient development under the given time constraints.
- I think the current queue handling is probably the best way to go.
- I'm not sure about the agent management. Hui knows most about that - but I recall it took some work to make sure it addressed all the corner cases.
- The state transitions can be addressed, but with the advent of provider-specific scan jobs, that may not be a big issue.
- It would be good if the XML can be parsed in a common way. I'm not sure if there's a good reason it is not.
State Machine diagram for this subclass of VmScan.