According to previous discussions, we'd like to move job management, autoscaling, and the job priority scheduler into a "controller", and port the Python implementation to Go.
This change would make the website slimmer and make user management pluggable. Modules we should have after this change:
Static web pages for general information.
Simple account management.
paddlectl simply caches k8s keys and storage service keys (such as Ceph) and talks directly to the k8s api-server using the "TrainingJob" resource.
controller parses the "TrainingJob" resource into a paddle job, including master, pserver, and trainer.
autoscaler runs in the background and scales the current jobs in the cluster.
scheduler determines how much each job can consume using some GPU priority algorithm.
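To make the controller's parsing step more concrete, here is a minimal Go sketch of how a "TrainingJob" could be expanded into master, pserver, and trainer components. The `TrainingJobSpec` and `Component` types, their fields, and the `ParseTrainingJob` function are all hypothetical names for illustration; the real resource schema has not been settled in this issue, and a real controller would create k8s objects rather than plain structs.

```go
package main

import "fmt"

// TrainingJobSpec is a hypothetical, simplified stand-in for the
// "TrainingJob" resource; the field names are assumptions.
type TrainingJobSpec struct {
	Name        string
	Image       string
	Pservers    int
	MinTrainers int
	MaxTrainers int
}

// Component describes one group of pods the controller would create.
type Component struct {
	Role     string // "master", "pserver", or "trainer"
	Name     string
	Image    string
	Replicas int
}

// ParseTrainingJob validates the spec and expands it into the three
// parts of a paddle job. Trainers start at MinTrainers; scaling up
// toward MaxTrainers is left to the autoscaler.
func ParseTrainingJob(job TrainingJobSpec) ([]Component, error) {
	if job.Name == "" {
		return nil, fmt.Errorf("training job has no name")
	}
	if job.Pservers < 1 {
		return nil, fmt.Errorf("job %q needs at least one pserver", job.Name)
	}
	if job.MinTrainers < 1 || job.MaxTrainers < job.MinTrainers {
		return nil, fmt.Errorf("job %q has an invalid trainer range [%d, %d]",
			job.Name, job.MinTrainers, job.MaxTrainers)
	}
	return []Component{
		{Role: "master", Name: job.Name + "-master", Image: job.Image, Replicas: 1},
		{Role: "pserver", Name: job.Name + "-pserver", Image: job.Image, Replicas: job.Pservers},
		{Role: "trainer", Name: job.Name + "-trainer", Image: job.Image, Replicas: job.MinTrainers},
	}, nil
}

func main() {
	job := TrainingJobSpec{
		Name: "demo", Image: "example/paddle-job",
		Pservers: 2, MinTrainers: 3, MaxTrainers: 10,
	}
	comps, err := ParseTrainingJob(job)
	if err != nil {
		fmt.Println("invalid job:", err)
		return
	}
	for _, c := range comps {
		fmt.Printf("%s: %s x%d\n", c.Role, c.Name, c.Replicas)
	}
}
```

Keeping the parsing step as a pure function like this would make it easy to unit test separately from the code that talks to the api-server.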
Some priority algorithm for different users or different namespaces.
scheduler determines how much each job can consume using some GPU priority algorithm
I think this will be implemented as a new Scheduler component in k8s, is that right? Because only the scheduler decides which node a Pod will run on.
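As one possible reading of "some GPU priority algorithm", the sketch below decides only how many GPUs each job may consume; it does not pick nodes, which would remain the job of whichever k8s scheduler places the Pods. The algorithm itself is an assumption, not something agreed in this thread: jobs are admitted in priority order if their minimum fits, then leftover GPUs are handed out one at a time in priority order up to each job's maximum. `JobRequest` and `AllocateGPUs` are hypothetical names, and per-user or per-namespace priority could be folded into the `Priority` value.

```go
package main

import (
	"fmt"
	"sort"
)

// JobRequest is a hypothetical view of what a job asks for.
type JobRequest struct {
	Name     string
	Priority int // higher means more important
	MinGPU   int
	MaxGPU   int
}

// AllocateGPUs returns the number of GPUs granted to each admitted job.
// Jobs whose minimum does not fit are left out of the result.
func AllocateGPUs(total int, jobs []JobRequest) map[string]int {
	// Work on a copy so the caller's slice order is not changed.
	ordered := make([]JobRequest, len(jobs))
	copy(ordered, jobs)
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].Priority > ordered[j].Priority
	})

	alloc := make(map[string]int, len(ordered))
	free := total

	// Pass 1: admit jobs in priority order if their minimum fits.
	for _, j := range ordered {
		if j.MinGPU <= free {
			alloc[j.Name] = j.MinGPU
			free -= j.MinGPU
		}
	}

	// Pass 2: give out the remaining GPUs one at a time, in priority
	// order, until every admitted job is at its maximum or none are left.
	for free > 0 {
		progressed := false
		for _, j := range ordered {
			got, admitted := alloc[j.Name]
			if !admitted || got >= j.MaxGPU || free == 0 {
				continue
			}
			alloc[j.Name] = got + 1
			free--
			progressed = true
		}
		if !progressed {
			break
		}
	}
	return alloc
}

func main() {
	jobs := []JobRequest{
		{Name: "A", Priority: 10, MinGPU: 2, MaxGPU: 4},
		{Name: "B", Priority: 5, MinGPU: 2, MaxGPU: 8},
		{Name: "C", Priority: 1, MinGPU: 6, MaxGPU: 6},
	}
	fmt.Println(AllocateGPUs(8, jobs))
}
```

Handing out the leftover GPUs round-robin rather than filling the top job first is a design choice in this sketch; a strict-priority variant would simply let the first job grow to its maximum before moving on.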