Do we need paddlectl client once we have the kubernetes custom controller? #383

Open
typhoonzero opened this issue Oct 11, 2017 · 15 comments


@typhoonzero
Collaborator

Once we have a resource declared through TPR/CRD, such as:

apiVersion: paddlepaddle.org/v1
kind: TrainingJob
metadata:
  name: job-1
spec:
  image: "paddlepaddle/paddlecloud-job"
  trainer:
    entrypoint: "python train.py"
    workspace: "/home/job-1/"
    min-instance: 3
    max-instance: 6
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 1
        cpu: "800m"
        memory: "1Gi"
      requests:
        cpu: "500m"
        memory: "600Mi"
  pserver:
    min-instance: 3
    max-instance: 3
    resources:
      limits:
        cpu: "800m"
        memory: "1Gi"
      requests:
        cpu: "500m"
        memory: "600Mi"

Running kubectl create -f job.yaml is then exactly equivalent to the current paddlectl submit -jobname xxx -gpu xxx ...

The only difference is that the paddlectl client can also upload and download training data files.

@Yancey1989
Collaborator

Yancey1989 commented Oct 11, 2017

Cool! Maybe we can use kubectl instead of paddlectl? I have some ideas about this:

  • Advantages

    • Users can use some kubectl features directly, such as kubectl logs and kubectl get pods, so we don't need to implement these features on the cloud server.
    • We can use RBAC instead of Django admin to manage the users.
  • Disadvantages

    • kubectl uses YAML as its configuration file, so it's hard to drive it with command-line parameters.

@typhoonzero
Collaborator Author

Another disadvantage:
kubectl exposes too many details of Kubernetes that users may never use.

@Yancey1989
Collaborator

Yancey1989 commented Oct 11, 2017

An extra suggestion: shall we change the resource name from TrainingJob to Paddle? Maybe it makes more sense.

@gongweibao
Collaborator

gongweibao commented Oct 11, 2017

Another disadvantage:
If a YAML file is malformed, it's hard to find where the error is, so it's inconvenient for users.
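
One way to soften this is to validate the parsed manifest before submitting it and report the exact field that is wrong. The following is only a sketch: the field list is an assumption derived from the TrainingJob example at the top of this issue, the function names are made up for illustration, and the manifest is assumed to be already parsed into a Python dict.

```python
# Hypothetical schema for the TrainingJob example in this issue:
# dotted field path -> expected Python type after parsing.
REQUIRED_FIELDS = {
    "apiVersion": str,
    "kind": str,
    "metadata.name": str,
    "spec.image": str,
    "spec.trainer.entrypoint": str,
    "spec.trainer.min-instance": int,
    "spec.trainer.max-instance": int,
    "spec.pserver.min-instance": int,
    "spec.pserver.max-instance": int,
}


def lookup(manifest, dotted_path):
    """Walk the nested dict; raise KeyError naming the first missing key."""
    node = manifest
    walked = []
    for key in dotted_path.split("."):
        walked.append(key)
        if not isinstance(node, dict) or key not in node:
            raise KeyError(".".join(walked))
        node = node[key]
    return node


def validate(manifest):
    """Return a list of human-readable errors, each prefixed by a field path."""
    errors = []
    for path, expected in REQUIRED_FIELDS.items():
        try:
            value = lookup(manifest, path)
        except KeyError as exc:
            errors.append(f"{exc.args[0]}: missing")
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(
                f"{path}: expected {expected.__name__}, got {type(value).__name__}"
            )
    for role in ("trainer", "pserver"):
        try:
            low = lookup(manifest, f"spec.{role}.min-instance")
            high = lookup(manifest, f"spec.{role}.max-instance")
        except KeyError:
            continue  # already reported above
        if isinstance(low, int) and isinstance(high, int) and low > high:
            errors.append(
                f"spec.{role}: min-instance {low} exceeds max-instance {high}"
            )
    return errors
```

A client could print these messages and refuse to submit when the list is non-empty, which gives users a field path to look at rather than a generic parse failure.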

@typhoonzero
Collaborator Author

@Yancey1989 I thought TrainingJob is more general, since it's not limited to Paddle training.

@putcn

putcn commented Oct 11, 2017

This is an interesting idea. 👍
My 2 cents: can we make paddlectl a kind of proxy to kubectl, so that we can filter out the features we don't want to expose to end users before the parameters actually hit kubectl, while still keeping the same command pattern?

@helinwang
Collaborator

helinwang commented Oct 11, 2017

Maybe our local command line can take the YAML as input, so we don't have to map the user's input to YAML again.

I am more inclined not to allow our users to use kubectl, since what we want to support is just a subset of kubectl (e.g., do we want to allow the user to create any Pod?). Maybe we can use @putcn's idea: "make paddlectl a kind of proxy to kubectl, so that we can do some filtering".
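
As a rough illustration of the proxy-and-filter idea, the sketch below forwards only an allowlisted subset of subcommands to kubectl. Everything in it is an assumption made for illustration: the allowlist contents, the function name, and the choice to reject every flag except --follow. It only builds the argument list and does not invoke kubectl.

```python
# Hypothetical allowlist: kubectl verb -> resource kinds that may follow it.
# None means the verb takes a pod name rather than a resource kind.
ALLOWED_VERBS = {
    "get": {"trainingjob", "trainingjobs", "pods"},
    "delete": {"trainingjob"},
    "logs": None,
}
# Flags that are forwarded; anything else starting with "-" is rejected.
ALLOWED_FLAGS = {"--follow"}


def to_kubectl(args):
    """Return the kubectl argv for paddlectl-style args, or raise PermissionError."""
    if not args:
        raise PermissionError("no subcommand given")
    verb, rest = args[0], list(args[1:])
    if verb not in ALLOWED_VERBS:
        raise PermissionError(f"'{verb}' is not exposed to users")

    flags = [a for a in rest if a.startswith("-")]
    positional = [a for a in rest if not a.startswith("-")]
    for flag in flags:
        if flag not in ALLOWED_FLAGS:
            raise PermissionError(f"flag '{flag}' is not exposed to users")

    kinds = ALLOWED_VERBS[verb]
    if kinds is not None:
        if not positional or positional[0].lower() not in kinds:
            allowed = ", ".join(sorted(kinds))
            raise PermissionError(f"'{verb}' is limited to: {allowed}")
    return ["kubectl", verb, *rest]
```

With this shape, a request such as get pods passes through unchanged, while exec or get secrets is refused before anything reaches the cluster.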

@typhoonzero
Collaborator Author

I support @putcn's idea! Proxying and filtering is simple and easy!

@Yancey1989
Collaborator

From @helinwang

do we want to allow the user to create any Pod

I don't think so; it's not safe and it's out of our control.

From @putcn

make paddlectl a kind of proxy to kubectl, so that we can do some filtering

It's a good idea! We can use the cloud server as a proxy: paddlectl converts the command-line parameters to YAML, and the cloud server submits the YAML to Kubernetes.
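
A minimal sketch of the client-side conversion might look like the following. Only -jobname and -gpu come from the paddlectl command quoted earlier in this thread; the other flags, their defaults, and the function names are hypothetical. The manifest is serialized as JSON, which kubectl also accepts for -f files, to avoid depending on a third-party YAML library.

```python
import argparse
import json


def build_parser():
    """Parser for a paddlectl-style submit command (flag set is illustrative)."""
    parser = argparse.ArgumentParser(prog="paddlectl submit")
    parser.add_argument("-jobname", required=True)
    parser.add_argument("-gpu", type=int, default=0)
    # The flags below are assumptions, not existing paddlectl options.
    parser.add_argument("-image", default="paddlepaddle/paddlecloud-job")
    parser.add_argument("-entrypoint", default="python train.py")
    parser.add_argument("-trainers-min", type=int, default=3)
    parser.add_argument("-trainers-max", type=int, default=6)
    parser.add_argument("-pservers", type=int, default=3)
    return parser


def to_manifest(ns):
    """Map parsed flags onto the TrainingJob layout shown in this issue."""
    limits = {"cpu": "800m", "memory": "1Gi"}
    if ns.gpu > 0:
        limits["alpha.kubernetes.io/nvidia-gpu"] = ns.gpu
    return {
        "apiVersion": "paddlepaddle.org/v1",
        "kind": "TrainingJob",
        "metadata": {"name": ns.jobname},
        "spec": {
            "image": ns.image,
            "trainer": {
                "entrypoint": ns.entrypoint,
                "min-instance": ns.trainers_min,
                "max-instance": ns.trainers_max,
                "resources": {"limits": limits},
            },
            "pserver": {
                "min-instance": ns.pservers,
                "max-instance": ns.pservers,
            },
        },
    }


if __name__ == "__main__":
    manifest = to_manifest(build_parser().parse_args())
    print(json.dumps(manifest, indent=2))
```

The resulting document is what the client would send to the cloud server, which then decides whether to forward it to Kubernetes.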

@Yancey1989
Collaborator

Maybe I can develop this feature. How about pushing it to the controller branch, so that we can publish a complete feature (auto-scaling) when we merge into the develop branch?

@helinwang
Collaborator

@Yancey1989 Sure, that would be awesome!

@pineking

That's a great idea. I have one more question, @Yancey1989: why do we need the cloud server to submit the YAML to Kubernetes? Could paddlectl submit the YAML directly?

@Yancey1989
Collaborator

Hi @pineking, per the design in #378, PaddleCloud has its own account management, and RBAC in Kubernetes is too simple for it, so we cannot submit the YAML directly. I think this is the main reason.

@pineking

@Yancey1989, thanks, I will read the design.

@helinwang
Collaborator

helinwang commented Oct 18, 2017

Today's discussion result:

  1. We still need the server, since it knows about cloud storage. The command line will stay backward compatible (internally converting its parameters to YAML) and will also support submitting a YAML file directly. The client will send the YAML to the server.

  2. Eventually the controller will start, scale, and kill training jobs (for now the controller only scales jobs).
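
For item 1, the server-side check on a submitted manifest could be sketched as below. This is an assumption-laden illustration, not the PaddleCloud implementation: the Rejected exception, the admit function, and the idea of mapping each authenticated user to a namespace of the same name are all hypothetical. The manifest is assumed to be already parsed into a dict.

```python
import copy


class Rejected(Exception):
    """Raised when a submitted manifest must not be forwarded to Kubernetes."""


def admit(manifest, username):
    """Return a copy of the manifest that is safe to forward, or raise Rejected.

    Only TrainingJob resources are accepted, so a user cannot create an
    arbitrary Pod through the server. The namespace is overwritten with the
    authenticated user's name (a hypothetical per-user namespace scheme).
    """
    if manifest.get("apiVersion") != "paddlepaddle.org/v1":
        raise Rejected("unsupported apiVersion: %r" % manifest.get("apiVersion"))
    if manifest.get("kind") != "TrainingJob":
        raise Rejected("only TrainingJob may be submitted, got %r" % manifest.get("kind"))
    metadata = manifest.get("metadata")
    if not isinstance(metadata, dict) or not metadata.get("name"):
        raise Rejected("metadata.name is required")

    admitted = copy.deepcopy(manifest)  # leave the caller's object untouched
    admitted["metadata"]["namespace"] = username
    return admitted
```

Because the same check runs whether the YAML came from converted command-line parameters or from a file the user wrote, both submission paths end up with the same restrictions.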


6 participants