Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REARCH] Container workers #15884

Merged
merged 11 commits into from
Feb 16, 2018
Merged

[REARCH] Container workers #15884

merged 11 commits into from
Feb 16, 2018

Conversation

carbonin
Copy link
Member

Run workers in OpenShift containers.

This change allows the server to start workers by talking to OpenShift and creating deployments and scaling them when needed.

This also stubs out some existing functionality which is process specific and doesn't apply to workers running in containers.

@carbonin
Copy link
Member Author

cc @Fryguy @bdunne @gtanzillo @jrafanie @NickLaMuro

Just a look at the work so far for Openshift based worker containers

@miq-bot
Copy link
Member

miq-bot commented Aug 29, 2017

This pull request is not mergeable. Please rebase and repush.

@carbonin carbonin force-pushed the container_workers branch 2 times, most recently from 93b8ff8 to e7ad45a Compare August 29, 2017 20:54
@carbonin carbonin force-pushed the container_workers branch 2 times, most recently from 088f836 to 55d2d92 Compare August 31, 2017 22:04
require_nested :Runner

include PerEmsTypeWorkerMixin

self.required_roles = ["ems_metrics_collector"]

def self.supports_container?
true
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get rid of this now?

@@ -25,6 +25,7 @@ def ansible_secret_key
def ansible_secret_key=(key)
auth = authentication_for_type(ANSIBLE_SECRET_KEY_TYPE, "Ansible Secret Key")
update_ansible_authentication(auth, :auth_key => key)
key
Copy link
Member

@Fryguy Fryguy Sep 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not necessary as any setter method (ending in =) automatically returns the incoming param

$ irb
[1] pry(main)> def blah=(x)
[1] pry(main)*   "HI"
[1] pry(main)* end
=> :blah=
[2] pry(main)> blah = "NOT HI"
=> "NOT HI"

@@ -1,28 +1,84 @@
require 'kubeclient'

class ContainerOrchestrator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to namespace this under ManageIQ, so as not to confuse any code from the container providers

yield(definition) if block_given?
connection.create_deployment_config(definition)
rescue KubeException => e
raise unless e.message =~ /already exists/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can open this as an RFE on KubeClient where they deal with the "already exists" message and raise a different exception. Also, their exceptions should be namespaced (a second separate issue).

def create_deployment_config(name)
  ...
rescue KubeClient::AlreadyExistsException
  # suppress
end

@carbonin carbonin force-pushed the container_workers branch 2 times, most recently from 85b9f70 to b72c986 Compare September 7, 2017 19:07
@miq-bot
Copy link
Member

miq-bot commented Sep 7, 2017

This pull request is not mergeable. Please rebase and repush.

@miq-bot
Copy link
Member

miq-bot commented Oct 3, 2017

This pull request is not mergeable. Please rebase and repush.

@carbonin carbonin force-pushed the container_workers branch 2 times, most recently from 29166d4 to 5052f7b Compare December 8, 2017 21:52
end

def container_port
3001
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be 3000?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, 3000 is exposed from the container and since the UI needs to serve assets as well as run the worker httpd listens on 3000 then points to 3001 for the worker.

end

def container_image_name
"manageiq/manageiq-ui-worker"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you be able to change the org name?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we drop the "manageiq-" from the container name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to put all of this in settings like we did for the awx stuff, but I think I'm going to change that in a follow-up.

As for the name, I don't really have a preference. Thoughts @Fryguy ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carbonin and I discussed moving this into Settings, but I wanted to get this in first, and do that as an incremental improvement.


def prepare
super
MiqApache::Control.start if MiqEnvironment::Command.is_container?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was going away.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

We need it to serve static assets.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot about that 👍

end

def self.create_worker_record(*params)
w = init_worker_object(*params)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could simplify to init_worker_object(*params).tap(&:save)

end

def container_image_name
"manageiq/manageiq-base-worker"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

end

def worker_deployment_name
@worker_deployment_name ||= begin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol


# Can be overriden by including classes
def container_image_name
"manageiq/manageiq-webserver-worker"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

These types of things will probably not be needed when we use containers
exclusively, but try to support both for now.
The idea is that each leaf worker class will include one of these
concerns which will determine how it gets deployed.

ReplicaPerWorker workers are more like cattle. If we need a new one
we just add a new one; if we need one less we just delete one. We
never need to know their identity.

DeploymentPerWorker workers are more like pets, but instead of
using StatefulSets, we are using separate deployments so that we
can provide a particular environment to each of them.

This will be used to give provider workers a provider specific
queue name.

ServiceWorker workers are like ReplicaPerWorker workers, but
fronted by a service. As such they have readiness checks and must
define the service's port.
OpenShift does not allow names over 63 characters
In this container, httpd will listen on port 3000 and will serve
assets and some error documents as well as the UI
This will allow the worker (in the container) to create its own
GUID. This is necessary for worker types with multiple replicas
because we cannot tell an individual replica what its GUID is
supposed to be.

Also, we need the queue_name parameter for the deployment
per worker model. In this case we want to be able to reference
the queue name on the worker instance, but still don't want
to save that instance to the database.

We can't use create_worker_record as is, because run_single_worker.rb
also needs a way to save the worker record when it's running in a container
Before this was failing because the worker instance didn't have
any of the required values set.

Things like the pid, server, and guid will be set by the worker
when it comes up in the separate container and adds the worker row.
The container entrypoint will provide the contents of the EMS_IDS
environment variable to run_single_worker.rb if EMS_IDS is set
Copy link
Member

@NickLaMuro NickLaMuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments. Mostly questions and refactoring suggestions, but nothing seemed incorrect from my un-educated perspective.

require_nested :Runner

include PerEmsTypeWorkerMixin

self.required_roles = ["ems_metrics_collector"]

def self.supports_container?
true
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fryguy 's comment here is a bit old, but this is on the same lines:

Is it possible that we can put this method into the MiqWorker::ReplicaPerWorker modules? Or as a shared module that the new modules you have created here inherit from?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we could probably move this into the MiqWorker modules in some way rather than having it implemented explicitly in each worker.

Alternatively I could just flip the default implementation in miq_worker.rb and override where we still don't want to start a worker as a container which might provide more useful information at this point. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think either is fine, and it is probably simpler to only have it in one spot. Though, if you do invert it in miq_worker.rb, I think maybe a disclaimer comment above the method might be worth while, just so those passing through know it is disabled on certain workers. With the other approach, since you would be adding it to things like MiqWorker::ReplicaPerWorker and including it in the worker class, I think that would be more easily assumed.

end

def self.start_worker(*params)
w = create_worker_record(*params)
w = containerized_worker? ? init_worker_object(*params) : create_worker_record(*params)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think I see what your doing here, and why, but curious what this means for the future:

  • To clarify, we are doing creating worker records because it is too hard to manage our DB records with the containers that are currently running on OpenShift?
  • Does this mean scripts/tools like evm:status would have to be completely re-tooled to work when running in a container env?
  • Is the plan just to eventually drop these tables?
  • Does this mess any other portions of the MiqServer#monitor_loop, now that there aren't DB records? (looks like you covered stop_worker, but curious if there are other pieces I am missing)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, we're still going to have worker records, it's just that they need to be created by the worker container rather than by the server.

This is really because of the worker guid. We can't generate the guid here because we can't tell a replica what guid it should be using. The solution is to have the worker itself create the guid, and by extension the entire record.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's just that they need to be created by the worker container rather than by the server.

Doh... I even wrote the code that does that and forgot... okay, guess that pretty much answers most of the questions then huh?

We can't generate the guid here because we can't tell a replica what guid it should be using.

Forgive my lacking of openshift knowledge, but I assume we can't even pass a replica a ENV var if we were to make a change to support that in the run_single_worker.rb script as well? I assume no, but just thought I would check and make sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume we can't even pass a replica a ENV var if we were to make a change to support that in the run_single_worker.rb script as well?

Correct, the environment for the container is defined at the deployment config level so each replica will have the same set of environment variables.

@miq-bot
Copy link
Member

miq-bot commented Feb 16, 2018

Checked commits carbonin/manageiq@7c2818b~...7f53a13 with ruby 2.3.3, rubocop 0.52.0, haml-lint 0.20.0, and yamllint 1.10.0
23 files checked, 0 offenses detected
Everything looks fine. 🍪

Copy link
Member

@Fryguy Fryguy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, let's do separate PRs for

  • moving the hardcoded images into Settings
  • Getting rid of self.supports_container, or defaulting differently.

@carbonin carbonin changed the title [WIP] [REARCH] Container workers [REARCH] Container workers Feb 16, 2018
@Fryguy Fryguy merged commit 38ede59 into ManageIQ:master Feb 16, 2018
@Fryguy
Copy link
Member

Fryguy commented Feb 16, 2018

@Fryguy Fryguy added this to the Sprint 80 Ending Feb 26, 2018 milestone Feb 16, 2018
carbonin added a commit to carbonin/manageiq that referenced this pull request May 22, 2018
This will allow us to use the G-release architecture for the
H-release container images without undoing all of the work
in ManageIQ#15884

https://www.pivotaltracker.com/story/show/157774394
carbonin added a commit to carbonin/manageiq that referenced this pull request May 29, 2018
This will allow us to use the G-release architecture for the
H-release container images without undoing all of the work
in ManageIQ#15884

https://www.pivotaltracker.com/story/show/157774394
lpichler pushed a commit to lpichler/manageiq that referenced this pull request Jun 6, 2018
This will allow us to use the G-release architecture for the
H-release container images without undoing all of the work
in ManageIQ#15884

https://www.pivotaltracker.com/story/show/157774394
@carbonin carbonin deleted the container_workers branch August 16, 2019 17:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants