The DGL Operator makes it easy to run Deep Graph Library (DGL) graph neural network training, either distributed or non-distributed, on Kubernetes. Please check out here for an introduction to DGL and its distributed training philosophy.
- Kubernetes >= 1.16
You can deploy the operator with default settings by running the following commands:
git clone https://github.com/Qihoo360/dgl-operator
cd dgl-operator
kubectl create -f deploy/v1alpha1/dgl-operator.yaml
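As a quick sanity check, you can confirm the operator pod came up. The exact pod name and namespace depend on what deploy/v1alpha1/dgl-operator.yaml creates, so the grep below avoids guessing either:

```shell
# Optional: confirm the operator pod is running
# (pod name and namespace are whatever the manifest creates)
kubectl get pods --all-namespaces | grep dgl-operator
```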
You can check whether the DGLJob custom resource definition is installed via:
kubectl get crd
The output should include dgljobs.qihoo.net, like the following:
NAME                AGE
...
dgljobs.qihoo.net   1m
...
You can create a DGL job by defining a DGLJob config file. See the GraphSAGE.yaml or GraphSAGE_dist.yaml example config files for launching a single-node or multi-node GraphSAGE training job; a sketch of the manifest structure follows the commands below. You may change the config file based on your requirements.
# standalone GraphSAGE
cat examples/v1alpha1/GraphSAGE.yaml
# or a distributed version
cat examples/v1alpha1/GraphSAGE_dist.yaml
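For orientation, the skeleton below shows roughly how a DGLJob manifest is shaped. The apiVersion is inferred from the dgljobs.qihoo.net CRD above, but the spec field names, image, and command are illustrative assumptions (the operator follows a launcher/worker pattern in the spirit of mpi-operator); treat the bundled example files as the authoritative schema.

```yaml
# Minimal DGLJob sketch -- field names under `spec` are assumptions;
# see examples/v1alpha1/GraphSAGE_dist.yaml for the authoritative schema.
apiVersion: qihoo.net/v1alpha1   # group/version from the dgljobs.qihoo.net CRD
kind: DGLJob
metadata:
  name: graphsage-dist
spec:
  dglReplicaSpecs:               # assumed layout, mirroring mpi-operator-style jobs
    Launcher:
      replicas: 1
      template:
        spec:
          containers:
            - name: dgl
              image: your-registry/graphsage:latest   # hypothetical image
              command: ["python3", "train_dist.py"]   # hypothetical entrypoint
    Worker:
      replicas: 2
      template:
        spec:
          containers:
            - name: dgl
              image: your-registry/graphsage:latest
```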
Deploy the DGLJob resource to start training:
# standalone GraphSAGE
kubectl create -f examples/v1alpha1/GraphSAGE.yaml
# or a distributed version
kubectl create -f examples/v1alpha1/GraphSAGE_dist.yaml
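Once submitted, you can watch the job with standard kubectl commands. The launcher pod name below is a placeholder; list the pods to find the real one:

```shell
# List DGLJob resources (the `dgljobs` plural comes from the CRD shown above)
kubectl get dgljobs

# Watch the pods the operator creates for the job
kubectl get pods

# Follow training logs from the launcher pod (substitute the real pod name)
kubectl logs -f <launcher-pod-name>
```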
Please check out these previous works that helped inspire the creation of the DGL Operator:
- PaddleFlow/paddle-operator - Elastic Deep Learning Training based on Kubernetes by leveraging EDL and Volcano.
- kubeflow/mpi-operator - Kubernetes Operator for Allreduce-style Distributed Training.