PaddlePaddle EDL: Elastic Deep Learning

While many hardware and software manufacturers are working on improving the running time of deep learning jobs, EDL optimizes

the global utilization of the cluster, and
the waiting time of job submitters.
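To illustrate how elastic scaling can raise cluster utilization and shorten waiting, here is a minimal Go sketch of a toy allocation policy. It is not EDL's actual scheduling algorithm. The Job type, its MinTrainers/MaxTrainers fields, and the allocate function are hypothetical. Each job is admitted at its minimum trainer count if that fits. Any idle capacity is then handed out round-robin to admitted jobs, up to each job's maximum, so no slot sits idle while a running job could still use it.

```go
package main

import "fmt"

// Job is a hypothetical elastic training job that can run with any number
// of trainers between MinTrainers and MaxTrainers (MinTrainers >= 1 assumed).
type Job struct {
	Name        string
	MinTrainers int
	MaxTrainers int
}

// allocate is a toy policy, not EDL's: admit jobs in order at their minimum
// while it fits (others stay pending with 0 trainers), then grow admitted
// jobs one trainer at a time, round-robin, until capacity or maxima run out.
func allocate(capacity int, jobs []Job) map[string]int {
	alloc := make(map[string]int, len(jobs))
	free := capacity
	for _, j := range jobs {
		if j.MinTrainers <= free {
			alloc[j.Name] = j.MinTrainers
			free -= j.MinTrainers
		} else {
			alloc[j.Name] = 0 // pending: its minimum does not fit
		}
	}
	for free > 0 {
		grew := false
		for _, j := range jobs {
			if free == 0 {
				break
			}
			n := alloc[j.Name]
			if n > 0 && n < j.MaxTrainers {
				alloc[j.Name] = n + 1
				free--
				grew = true
			}
		}
		if !grew {
			break // every admitted job is at its maximum
		}
	}
	return alloc
}

func main() {
	capacity := 10
	jobs := []Job{
		{Name: "A", MinTrainers: 2, MaxTrainers: 6},
		{Name: "B", MinTrainers: 2, MaxTrainers: 4},
		{Name: "C", MinTrainers: 8, MaxTrainers: 8},
	}
	alloc := allocate(capacity, jobs)
	used := 0
	for _, n := range alloc {
		used += n
	}
	// prints: A=6 B=4 C=0 utilization=100%
	fmt.Printf("A=%d B=%d C=%d utilization=%d%%\n", alloc["A"], alloc["B"], alloc["C"], used*100/capacity)
}
```

With 10 slots, jobs A and B grow to fill the whole cluster while C, which needs 8 trainers, waits. In an elastic system those grown jobs could later be shrunk back toward their minimums to admit C, instead of C waiting for A or B to finish.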

For more about the EDL project, please refer to this invited blog post on the official Kubernetes blog.

EDL includes two parts:

a Kubernetes controller for the elastic scheduling of distributed deep learning jobs, and
changes that make PaddlePaddle a fault-tolerant deep learning framework.

This directory contains the Kubernetes controller. For more information about fault tolerance, please refer to the design.

We deployed EDL on a real Kubernetes cluster, dlnel.com, which is open to graduate students at Tsinghua University. The performance test report of EDL on this cluster is here.

Build

glide install --strip-vendor
go build -o path/to/output github.com/paddlepaddle/edl/cmd/edl
