Move core job management codes to controller in go #462

Closed
typhoonzero opened this issue Nov 1, 2017 · 3 comments

Comments

@typhoonzero
Collaborator

typhoonzero commented Nov 1, 2017

Following previous discussions, we'd like to move job management, autoscaling and the job priority scheduler into a "controller", rewriting the current Python implementation in Go.

This change should make the website slimmer and make user management pluggable. After this change, we should have the following modules:

  • Static web pages for general information.
  • Simple account management.
  • paddlectl, which simply caches k8s keys and storage service keys (such as Ceph) and talks directly to the k8s api-server using the "TrainingJob" resource.
  • controller, which parses the "TrainingJob" resource into a Paddle job consisting of a master, pservers and trainers.
  • autoscaler, which runs in the background and scales the current jobs in the cluster.
  • scheduler, which determines how much each job can consume using a GPU priority algorithm.
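To make the controller's role more concrete, here is a minimal sketch of how a "TrainingJob" resource might be expanded into its master, pserver and trainer groups. The `TrainingJob`/`TrainingJobSpec` field names, the `ReplicaGroup` type and the `Expand` function are all hypothetical; they illustrate the idea and are not an agreed-upon schema for this project.

```go
package main

import "fmt"

// TrainingJobSpec is a hypothetical shape for the "TrainingJob" resource spec.
type TrainingJobSpec struct {
	Image         string
	Trainers      int // number of trainer replicas
	Pservers      int // number of parameter-server replicas
	GPUPerTrainer int // GPUs requested by each trainer pod
}

// TrainingJob is a hypothetical in-memory view of the resource.
type TrainingJob struct {
	Name      string
	Namespace string
	Spec      TrainingJobSpec
}

// ReplicaRole names the three kinds of processes in a Paddle job.
type ReplicaRole string

const (
	RoleMaster  ReplicaRole = "master"
	RolePserver ReplicaRole = "pserver"
	RoleTrainer ReplicaRole = "trainer"
)

// ReplicaGroup describes one group of identical pods the controller would create.
type ReplicaGroup struct {
	Name     string
	Role     ReplicaRole
	Replicas int
	GPU      int // GPUs per pod
}

// Expand turns one TrainingJob into its master, pserver and trainer groups.
// A single master is assumed; only trainers request GPUs in this sketch.
func Expand(job TrainingJob) []ReplicaGroup {
	return []ReplicaGroup{
		{Name: job.Name + "-master", Role: RoleMaster, Replicas: 1},
		{Name: job.Name + "-pserver", Role: RolePserver, Replicas: job.Spec.Pservers},
		{Name: job.Name + "-trainer", Role: RoleTrainer, Replicas: job.Spec.Trainers, GPU: job.Spec.GPUPerTrainer},
	}
}

func main() {
	job := TrainingJob{
		Name:      "mnist",
		Namespace: "alice",
		Spec:      TrainingJobSpec{Image: "paddlepaddle/paddle", Trainers: 4, Pservers: 2, GPUPerTrainer: 1},
	}
	for _, g := range Expand(job) {
		fmt.Printf("%s role=%s replicas=%d gpu=%d\n", g.Name, g.Role, g.Replicas, g.GPU)
	}
}
```

In a real controller, each `ReplicaGroup` would be translated into k8s objects (for example a ReplicaSet or Job per role) and reconciled whenever the TrainingJob changes.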
@Yancey1989
Collaborator

Additional feature:

  1. A priority algorithm for different users or namespaces.

scheduler determines how much each job can consume using a GPU priority algorithm

I think this will be implemented as a new Scheduler component in k8s, is that right? Only the scheduler decides which node a Pod will run on.
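Deciding how many GPUs each job may consume is a separate question from placing Pods on nodes, which a custom k8s scheduler would handle. As a minimal sketch of the quota side only, assuming per-namespace priority weights, weighted max-min fair sharing by progressive filling could look like the code below. The `Job` type, the `Allocate` function and the weight map are hypothetical and not part of this project; this is one possible technique, not the algorithm the thread settled on.

```go
package main

import "fmt"

// Job is a hypothetical view of a running TrainingJob as the quota logic sees it.
// Job names are assumed to be unique.
type Job struct {
	Name      string
	Namespace string
	MaxGPU    int // most GPUs the job can use, e.g. max trainers * GPUs per trainer
}

// Allocate divides totalGPU among jobs by weighted max-min fairness using
// progressive filling: each GPU goes to the unsaturated job with the lowest
// allocation/weight ratio. weights maps namespace -> priority weight;
// namespaces missing from the map default to weight 1, and a weight <= 0
// excludes the namespace entirely.
func Allocate(totalGPU int, jobs []Job, weights map[string]float64) map[string]int {
	alloc := make(map[string]int, len(jobs))
	for g := 0; g < totalGPU; g++ {
		best, bestRatio := -1, 0.0
		for i, j := range jobs {
			w, ok := weights[j.Namespace]
			if !ok {
				w = 1
			}
			if w <= 0 || alloc[j.Name] >= j.MaxGPU {
				continue // excluded namespace or job already at its maximum
			}
			r := float64(alloc[j.Name]) / w
			if best == -1 || r < bestRatio { // ties go to the earlier job
				best, bestRatio = i, r
			}
		}
		if best == -1 {
			break // every job is saturated; remaining GPUs stay idle
		}
		alloc[jobs[best].Name]++
	}
	return alloc
}

func main() {
	weights := map[string]float64{"prod": 2, "dev": 1}
	jobs := []Job{
		{Name: "a", Namespace: "prod", MaxGPU: 10},
		{Name: "b", Namespace: "dev", MaxGPU: 10},
	}
	fmt.Println(Allocate(6, jobs, weights)) // prints map[a:4 b:2]
}
```

With weights 2:1 the six GPUs split 4:2, and if job `a` were capped at 2 GPUs the freed capacity would flow to `b`. An autoscaler could then move each job's trainer count toward `alloc / GPUPerTrainer`.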

@typhoonzero
Collaborator Author

I think this will be implemented as a new Scheduler component in k8s, is that right?

That's right!

@gongweibao gongweibao self-assigned this Nov 3, 2017
@gongweibao
Collaborator

I'll take these:

paddlectl simply caches k8s keys and storage service keys (such as Ceph) and talks directly to the k8s api-server using the "TrainingJob" resource.
controller parses the "TrainingJob" resource into a Paddle job, including master, pserver and trainer.
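For the paddlectl side, here is a minimal sketch of building the request that creates a TrainingJob by talking directly to the api-server. It follows the standard k8s REST path for namespaced custom resources (`/apis/<group>/<version>/namespaces/<ns>/<plural>`) with a bearer token; the API group `paddlepaddle.org`, version `v1`, plural `trainingjobs`, the `cachedCreds` type and the `spec` fields are assumptions for illustration, not names defined in this project.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Hypothetical API group and version for the TrainingJob custom resource.
const (
	group   = "paddlepaddle.org"
	version = "v1"
)

// cachedCreds stands in for the k8s credentials paddlectl would cache locally.
type cachedCreds struct {
	APIServer string // base URL of the api-server
	Token     string // bearer token for the user's account
}

// newCreateRequest builds (but does not send) the POST that creates a
// TrainingJob object in namespace ns.
func newCreateRequest(c cachedCreds, ns string, job map[string]interface{}) (*http.Request, error) {
	body, err := json.Marshal(job)
	if err != nil {
		return nil, err
	}
	endpoint := fmt.Sprintf("%s/apis/%s/%s/namespaces/%s/trainingjobs", c.APIServer, group, version, ns)
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+c.Token)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	creds := cachedCreds{APIServer: "https://k8s.example.com:6443", Token: "example-token"}
	job := map[string]interface{}{
		"apiVersion": group + "/" + version,
		"kind":       "TrainingJob",
		"metadata":   map[string]interface{}{"name": "mnist"},
		"spec":       map[string]interface{}{"trainers": 4, "pservers": 2},
	}
	req, err := newCreateRequest(creds, "alice", job)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// A real client would now send req with an *http.Client configured for the cluster's TLS.
	fmt.Println(req.Method, req.URL)
}
```

Keeping the request construction separate from sending it makes the CLI easy to test without a cluster; the controller then watches the same resource path and does the expansion into pods.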


5 participants