update #2

Merged
merged 68 commits into from
Jun 28, 2019
68 commits
e176964
Lint fix for e2e package except dot import lint failures
nikita15p Jun 18, 2019
33655a1
UT cases for Job Package
thandayuthapani Jun 18, 2019
10b9069
Lint fix for admission package
nikita15p Jun 18, 2019
d63c817
Update README
thandayuthapani Jun 18, 2019
c17f586
Fix review comments
thandayuthapani Jun 18, 2019
c05f3a6
Adding Events for Action
Jun 18, 2019
85f2798
Cleanup codes
TommyLike Jun 19, 2019
b1b6023
Use submodule to integrate helm chart
TommyLike Jun 19, 2019
1a7fd0c
Merge pull request #240 from TommyLike/feature/sub_module
k82cn Jun 20, 2019
ecc5d69
Merge branch 'master' into bug/cleanup
k82cn Jun 20, 2019
2f237b1
Merge branch 'master' into events
k82cn Jun 20, 2019
610ab9d
Merge branch 'master' into utjob
k82cn Jun 20, 2019
214c579
Merge pull request #239 from TommyLike/bug/cleanup
volcano-sh-bot Jun 20, 2019
43fa73d
Merge branch 'master' into readme
k82cn Jun 20, 2019
c3b3a73
Merge branch 'master' into events
k82cn Jun 21, 2019
dcf8043
Make first job keep running
TommyLike Jun 21, 2019
b6f2952
Merge pull request #236 from thandayuthapani/readme
volcano-sh-bot Jun 21, 2019
a2f3082
Merge branch 'master' into bug/fix_statement_error
k82cn Jun 21, 2019
358c6e3
Merge pull request #243 from TommyLike/bug/fix_statement_error
volcano-sh-bot Jun 21, 2019
f84243a
Merge branch 'master' into utjob
k82cn Jun 21, 2019
2797241
Merge pull request #235 from thandayuthapani/utjob
volcano-sh-bot Jun 21, 2019
16cc271
Merge branch 'master' into events
k82cn Jun 21, 2019
9a8392b
Add UT cases for pkg/controllers/job/state package
thandayuthapani Jun 21, 2019
f0d25aa
Merge branch 'master' into e2elintfix
nikita15p Jun 21, 2019
60e31af
Merge pull request #245 from thandayuthapani/utjob
volcano-sh-bot Jun 21, 2019
657a7df
Merge branch 'master' into events
k82cn Jun 21, 2019
fe03590
Merge pull request #238 from Rajadeepan/events
volcano-sh-bot Jun 24, 2019
627fd94
Merge branch 'master' into e2elintfix
k82cn Jun 24, 2019
9a8fe35
UT cases for garbagecollector package
thandayuthapani Jun 24, 2019
3671c89
Lint fix for apis, cache, job and state package of controller
nikita15p Jun 24, 2019
e218ddd
Merge pull request #248 from nikita15p/ControllerGoLintFix
volcano-sh-bot Jun 24, 2019
bbacb11
Merge branch 'master' into e2elintfix
nikita15p Jun 24, 2019
03f4f18
Adding UT test cases to queue controller
Jun 24, 2019
b021dcb
Merge branch 'master' into utjob
thandayuthapani Jun 24, 2019
d87d735
UT cases for pkg/controllers/cache package
thandayuthapani Jun 24, 2019
462b1d7
Address Review Comments
thandayuthapani Jun 24, 2019
b092211
Merge pull request #249 from Rajadeepan/utqueue
volcano-sh-bot Jun 24, 2019
8aaca49
Merge branch 'master' into utjob
asifdxtreme Jun 24, 2019
a098a25
Merge branch 'master' into ut
asifdxtreme Jun 24, 2019
371e619
Merge branch 'master' into e2elintfix
asifdxtreme Jun 24, 2019
9ddbc90
Merge pull request #250 from thandayuthapani/ut
volcano-sh-bot Jun 24, 2019
719037c
Merge branch 'master' into utjob
thandayuthapani Jun 24, 2019
9b40520
Adding UT test cases to apis package
Jun 24, 2019
d3fcdc7
Merge pull request #251 from Rajadeepan/apisut
volcano-sh-bot Jun 25, 2019
94e23f8
Merge branch 'master' into utjob
k82cn Jun 25, 2019
d7034f2
Merge pull request #247 from thandayuthapani/utjob
volcano-sh-bot Jun 25, 2019
b065c76
Merge branch 'master' into e2elintfix
nikita15p Jun 25, 2019
0c44e1a
Merge pull request #234 from nikita15p/e2elintfix
volcano-sh-bot Jun 25, 2019
b10beba
Improving the code coverage for admission controller pkg
shivramsrivastava Jun 26, 2019
e318bd0
fix vk-controller cache
wangyuqing4 May 30, 2019
56fa5f2
fix pc panic
wangyuqing4 Jun 3, 2019
78b194c
optimize resyncTask
wangyuqing4 Jun 12, 2019
cff2aee
Adding UT for cli job package
Jun 26, 2019
573d5a5
Merge branch 'master' into errtask
k82cn Jun 27, 2019
1670bbd
Merge pull request #256 from lminzhw/errtask
volcano-sh-bot Jun 27, 2019
c2a6ce0
Merge branch 'master' into admin_ut_improved
asifdxtreme Jun 27, 2019
06762ca
Merge branch 'master' into utclijob
asifdxtreme Jun 27, 2019
f7b4c8e
KubeCon 2019 China Demo.
Jun 27, 2019
4d8751b
Merge branch 'master' into kubecon-2019-china-demo
k82cn Jun 27, 2019
02bef05
Merge pull request #257 from Rajadeepan/utclijob
volcano-sh-bot Jun 27, 2019
ab86c13
Merge pull request #259 from k82cn/kubecon-2019-china-demo
volcano-sh-bot Jun 27, 2019
deb83f0
Merge pull request #252 from shivramsrivastava/admin_ut_improved
volcano-sh-bot Jun 27, 2019
1e5f432
fix initialize job status
wangyuqing4 Jun 3, 2019
a5e70b8
fix Pending job minA
wangyuqing4 Jun 5, 2019
6d4a3f7
fix golint
lminzhw Jun 28, 2019
525aa30
fix some words in README doc
soolaugust Jun 28, 2019
9efa246
Merge pull request #262 from soolaugust/patch-1
volcano-sh-bot Jun 28, 2019
2ee41aa
Merge pull request #255 from lminzhw/job_status
volcano-sh-bot Jun 28, 2019
30 changes: 0 additions & 30 deletions .cid/volcano.yml

This file was deleted.

3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "installer/helm"]
	path = installer/helm
	url = https://github.com/volcano-sh/charts.git
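This new `.gitmodules` entry is why the README below switches to `git clone --recursive`: the Helm chart now lives in the separate `volcano-sh/charts` repository, checked out at `installer/helm`. As an illustration only, the Go sketch below parses `.gitmodules`-style text into name/path/URL records. The `Submodule` type and `parseGitmodules` function are hypothetical helpers, not part of Volcano or Git, and they handle only the simple `key = value` subset shown here.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Submodule is a hypothetical record for one [submodule "..."] section.
type Submodule struct {
	Name, Path, URL string
}

// parseGitmodules understands only a small subset of Git's config syntax:
// [submodule "name"] headers and "key = value" lines. Comments are skipped;
// quoting, escapes, and includes are not handled.
func parseGitmodules(text string) []Submodule {
	var mods []Submodule
	inSubmodule := false
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "", strings.HasPrefix(line, "#"), strings.HasPrefix(line, ";"):
			// blank line or comment
		case strings.HasPrefix(line, `[submodule "`) && strings.HasSuffix(line, `"]`):
			name := strings.TrimSuffix(strings.TrimPrefix(line, `[submodule "`), `"]`)
			mods = append(mods, Submodule{Name: name})
			inSubmodule = true
		case strings.HasPrefix(line, "["):
			inSubmodule = false // some other section; ignore its keys
		case inSubmodule:
			key, value, ok := strings.Cut(line, "=")
			if !ok {
				continue
			}
			cur := &mods[len(mods)-1]
			switch strings.TrimSpace(key) {
			case "path":
				cur.Path = strings.TrimSpace(value)
			case "url":
				cur.URL = strings.TrimSpace(value)
			}
		}
	}
	return mods
}

func main() {
	const gitmodules = "[submodule \"installer/helm\"]\n" +
		"\tpath = installer/helm\n" +
		"\turl = https://github.com/volcano-sh/charts.git\n"
	for _, m := range parseGitmodules(gitmodules) {
		fmt.Printf("%s -> %s (%s)\n", m.Name, m.Path, m.URL)
	}
}
```

Running `main` prints `installer/helm -> installer/helm (https://github.com/volcano-sh/charts.git)`. A checkout made without `--recursive` leaves `installer/helm` empty; `git submodule update --init` fetches it afterwards.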
32 changes: 25 additions & 7 deletions README.md
@@ -13,7 +13,7 @@ Volcano is a batch system built on Kubernetes. It provides a suite of mechanisms
 Kubernetes that are commonly required by many classes of batch & elastic workloads, including:
 
 1. machine learning/deep learning,
-2. bioinformatics/genomics, and
+2. bioinformatics/genomics
 3. other "big data" applications.
 
 These types of applications typically run on generalized domain
@@ -53,16 +53,34 @@ the open source community.
 
 ## Quick Start Guide
 
-The easiest way to deploy Volcano is to use the Helm chart.
+The easiest way to deploy Volcano is to use the Helm chart. Volcano can be deployed either by cloning the code or by adding the Helm repo.
 
+## Using Volcano Helm Repo
+
+Add the Volcano Helm repo using the following command:
+
+```
+helm repo add volcano https://volcano-sh.github.io/charts
+```
+
+Install Volcano using the following command:
+
+```
+helm install volcano/volcano --namespace <namespace> --name <specified-name>
+e.g.:
+helm install volcano/volcano --namespace volcano-trial --name volcano-trial
+```
+
+## Cloning Code
 ### Pre-requisites
 
 First of all, clone the repo to your local path:
 
 ```
 # mkdir -p $GOPATH/src/volcano.sh/
 # cd $GOPATH/src/volcano.sh/
-# git clone https://github.com/volcano-sh/volcano.git
+# git clone --recursive https://github.com/volcano-sh/volcano.git
 ```
 
 ### 1. Volcano Image
@@ -92,14 +110,14 @@ try command ```kind load docker-image <image-name>:<tag> ``` for each of the ima
 Secondly, install the Helm chart.
 
 ```
-helm install installer/chart --namespace <namespace> --name <specified-name>
+helm install installer/helm/chart/volcano --namespace <namespace> --name <specified-name>
-For eg :
-helm install installer/chart --namespace volcano-trial --name volcano-trial
+e.g.:
+helm install installer/helm/chart/volcano --namespace volcano-trial --name volcano-trial
 ```
 
-To Verify your installation run the following commands:
+To verify your installation, run the following commands:
 
 ```
 #1. Verify the Running Pods
26 changes: 26 additions & 0 deletions docs/samples/kubecon-2019-china/drf/nginx-1.yaml
@@ -0,0 +1,26 @@
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx-1
  labels:
    app: nginx-1
spec:
  # modify replicas according to your case
  replicas: 8
  selector:
    matchLabels:
      app: nginx-1
  template:
    metadata:
      labels:
        app: nginx-1
    spec:
      schedulerName: volcano
      containers:
        - name: nginx-1
          image: nginx
          resources:
            requests:
              cpu: "1000m"
            limits:
              cpu: "1000m"
26 changes: 26 additions & 0 deletions docs/samples/kubecon-2019-china/drf/nginx-2.yaml
@@ -0,0 +1,26 @@
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx-2
  labels:
    app: nginx-2
spec:
  # modify replicas according to your case
  replicas: 8
  selector:
    matchLabels:
      app: nginx-2
  template:
    metadata:
      labels:
        app: nginx-2
    spec:
      schedulerName: volcano
      containers:
        - name: nginx-2
          image: nginx
          resources:
            requests:
              cpu: "1000m"
            limits:
              cpu: "1000m"
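These two samples sit under a `drf` directory and both hand their pods to the `volcano` scheduler, so the demo presumably shows two ReplicaSets competing for CPU under Dominant Resource Fairness (DRF). The Go sketch below is the textbook DRF progressive-filling loop, not Volcano's `drf` plugin code: each round it grants one more pod to the job with the smallest dominant share whose next pod still fits. The cluster capacity (8 CPUs, 32768Mi) is an assumption, and memory requests are zero because the YAML only requests CPU.

```go
package main

import "fmt"

// Resource is a simple two-dimensional resource vector.
type Resource struct {
	MilliCPU float64
	MemoryMi float64
}

// Job is a hypothetical stand-in for one ReplicaSet in the demo.
type Job struct {
	Name     string
	Request  Resource // per-pod request
	Replicas int      // desired pods
	Placed   int      // pods allocated so far
}

// dominantShare returns the job's largest share of any single resource.
func dominantShare(j *Job, capacity Resource) float64 {
	cpu := float64(j.Placed) * j.Request.MilliCPU / capacity.MilliCPU
	mem := float64(j.Placed) * j.Request.MemoryMi / capacity.MemoryMi
	if cpu > mem {
		return cpu
	}
	return mem
}

// drfAllocate repeatedly grants one pod to the job with the lowest dominant
// share whose next pod still fits, until no further pod fits.
func drfAllocate(jobs []*Job, capacity Resource) {
	free := capacity
	for {
		var best *Job
		for _, j := range jobs {
			if j.Placed >= j.Replicas ||
				j.Request.MilliCPU > free.MilliCPU || j.Request.MemoryMi > free.MemoryMi {
				continue // job is satisfied or its next pod does not fit
			}
			if best == nil || dominantShare(j, capacity) < dominantShare(best, capacity) {
				best = j
			}
		}
		if best == nil {
			return
		}
		best.Placed++
		free.MilliCPU -= best.Request.MilliCPU
		free.MemoryMi -= best.Request.MemoryMi
	}
}

func main() {
	capacity := Resource{MilliCPU: 8000, MemoryMi: 32768} // assumed cluster size
	jobs := []*Job{
		{Name: "nginx-1", Request: Resource{MilliCPU: 1000}, Replicas: 8},
		{Name: "nginx-2", Request: Resource{MilliCPU: 1000}, Replicas: 8},
	}
	drfAllocate(jobs, capacity)
	for _, j := range jobs {
		fmt.Printf("%s: %d/%d pods\n", j.Name, j.Placed, j.Replicas)
	}
}
```

Under these assumptions each ReplicaSet ends up with 4 of its 8 pods. A single ReplicaSet on its own would receive all eight; DRF only splits capacity when jobs compete.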
25 changes: 25 additions & 0 deletions docs/samples/kubecon-2019-china/drf/nginx.yaml
@@ -0,0 +1,25 @@
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  # modify replicas according to your case
  replicas: 8
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          resources:
            requests:
              cpu: "1000m"
            limits:
              cpu: "1000m"
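Unlike `nginx-1` and `nginx-2`, this `nginx` ReplicaSet sets no `schedulerName`. Kubernetes defaults an empty field to `default-scheduler`, so with a default configuration Volcano does not place these pods. A minimal Go sketch of that routing rule follows; the `PodSpec` struct is a hypothetical one-field stand-in, not the real `k8s.io/api/core/v1` type.

```go
package main

import "fmt"

// defaultSchedulerName is the value Kubernetes fills in when a pod spec
// leaves schedulerName empty.
const defaultSchedulerName = "default-scheduler"

// PodSpec is a hypothetical stand-in holding only the field we care about.
type PodSpec struct {
	SchedulerName string
}

// effectiveScheduler returns the scheduler that will own a pod.
func effectiveScheduler(spec PodSpec) string {
	if spec.SchedulerName == "" {
		return defaultSchedulerName
	}
	return spec.SchedulerName
}

// handledBy reports whether the named scheduler is responsible for this pod.
func handledBy(spec PodSpec, scheduler string) bool {
	return effectiveScheduler(spec) == scheduler
}

func main() {
	specs := map[string]PodSpec{
		"nginx-1": {SchedulerName: "volcano"},
		"nginx-2": {SchedulerName: "volcano"},
		"nginx":   {}, // nginx.yaml sets no schedulerName
	}
	for _, name := range []string{"nginx-1", "nginx-2", "nginx"} {
		spec := specs[name]
		fmt.Printf("%s -> %s (volcano: %v)\n", name, effectiveScheduler(spec), handledBy(spec, "volcano"))
	}
}
```

In the demo the `nginx` pods therefore presumably compete for the same nodes without taking part in Volcano's fair sharing.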
77 changes: 77 additions & 0 deletions docs/samples/kubecon-2019-china/gang/mpi-example.yaml
@@ -0,0 +1,77 @@
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: lm-mpi-job
  labels:
    # Set the job type according to your workload
    "volcano.sh/job-type": "MPI"
spec:
  # Minimum number of pods that must be scheduled together (must not exceed the total replicas)
  minAvailable: 4
  schedulerName: volcano
  plugins:
    # Provide password-less SSH authentication between pods
    ssh: []
    # Provide the network information the job needs: hosts files, a headless service, etc.
    svc: []
  # If any pod is evicted (killed), restart the whole job
  policies:
    - event: PodEvicted
      action: RestartJob
  tasks:
    - replicas: 1
      name: mpimaster
      # When mpiexec finishes, treat the whole MPI job as completed
      policies:
        - event: TaskCompleted
          action: CompleteJob
      template:
        spec:
          # Volcano places all of its generated information under /etc/volcano
          containers:
            - command:
                - /bin/sh
                - -c
                - |
                  MPI_HOST=`cat /etc/volcano/mpiworker.host | tr "\n" ","`;
                  mkdir -p /var/run/sshd; /usr/sbin/sshd;
                  mpiexec --allow-run-as-root --host ${MPI_HOST} -np 3 mpi_hello_world;
              image: swr.cn-north-1.myhuaweicloud.com/hwstaff_z00383385/example-mpi:0.0.1
              name: mpimaster
              ports:
                - containerPort: 22
                  name: mpijob-port
              workingDir: /home
              resources:
                requests:
                  cpu: "500m"
                limits:
                  cpu: "500m"
          restartPolicy: OnFailure
          imagePullSecrets:
            - name: default-secret
    - replicas: 3
      name: mpiworker
      template:
        spec:
          containers:
            - command:
                - /bin/sh
                - -c
                - |
                  mkdir -p /var/run/sshd; /usr/sbin/sshd -D;
              image: swr.cn-north-1.myhuaweicloud.com/hwstaff_z00383385/example-mpi:0.0.1
              name: mpiworker
              ports:
                - containerPort: 22
                  name: mpijob-port
              workingDir: /home
              resources:
                requests:
                  cpu: "1000m"
                limits:
                  cpu: "1000m"
          restartPolicy: OnFailure
          imagePullSecrets:
            - name: default-secret
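The two `policies` blocks in this job map lifecycle events to actions at two levels: job-wide (`PodEvicted` → `RestartJob`) and for the `mpimaster` task only (`TaskCompleted` → `CompleteJob`). The Go sketch below models that lookup with hypothetical, simplified types rather than the Volcano API. It assumes task-level policies are consulted before job-level ones; that appears to match Volcano's job controller, but treat the precedence as an assumption.

```go
package main

import "fmt"

// LifecyclePolicy pairs an event with the action to take, as in the YAML.
type LifecyclePolicy struct {
	Event  string
	Action string
}

// resolveAction returns the action for an event raised in the given task.
// Task-level policies are checked before job-level ones (assumed precedence);
// an empty string means no policy matched.
func resolveAction(event, task string, jobPolicies []LifecyclePolicy, taskPolicies map[string][]LifecyclePolicy) string {
	for _, p := range taskPolicies[task] {
		if p.Event == event {
			return p.Action
		}
	}
	for _, p := range jobPolicies {
		if p.Event == event {
			return p.Action
		}
	}
	return ""
}

func main() {
	jobPolicies := []LifecyclePolicy{{Event: "PodEvicted", Action: "RestartJob"}}
	taskPolicies := map[string][]LifecyclePolicy{
		"mpimaster": {{Event: "TaskCompleted", Action: "CompleteJob"}},
	}
	fmt.Println(resolveAction("TaskCompleted", "mpimaster", jobPolicies, taskPolicies))        // CompleteJob
	fmt.Println(resolveAction("PodEvicted", "mpiworker", jobPolicies, taskPolicies))           // RestartJob
	fmt.Printf("%q\n", resolveAction("TaskCompleted", "mpiworker", jobPolicies, taskPolicies)) // ""
}
```

Under this model a completed worker (whose `sshd -D` normally never exits anyway) matches no policy, while the master's `mpiexec` returning ends the whole job, and an eviction anywhere restarts it.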

25 changes: 25 additions & 0 deletions docs/samples/kubecon-2019-china/gang/nginx.yaml
@@ -0,0 +1,25 @@
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  # modify replicas according to your case
  replicas: 6
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          resources:
            requests:
              cpu: "1000m"
            limits:
              cpu: "1000m"
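In the gang demo, this `nginx` ReplicaSet (six 1-CPU pods, default scheduler) presumably occupies most of the cluster so that the MPI job cannot get all of its pods at once. With `minAvailable: 4`, Volcano should bind at least four of the job's pods together or none of them, rather than start a partial MPI job whose master would wait on missing workers. The all-or-nothing check is sketched below in Go. The 8-CPU cluster size is an assumption, and this is a simplified single-resource, greedy model, not Volcano's `gang` plugin.

```go
package main

import "fmt"

// Task is a hypothetical group of identical pods within a gang job.
type Task struct {
	Name     string
	Replicas int
	MilliCPU int64 // per-pod CPU request
}

// gangAdmit greedily places pods into freeMilliCPU and returns how many were
// placed. If fewer than minAvailable fit, it returns 0: nothing is bound.
func gangAdmit(tasks []Task, minAvailable int, freeMilliCPU int64) int {
	placed := 0
	for _, t := range tasks {
		for i := 0; i < t.Replicas; i++ {
			if t.MilliCPU > freeMilliCPU {
				break // this task's remaining pods do not fit
			}
			freeMilliCPU -= t.MilliCPU
			placed++
		}
	}
	if placed < minAvailable {
		return 0 // all-or-nothing: drop the tentative placements
	}
	return placed
}

func main() {
	mpiJob := []Task{
		{Name: "mpimaster", Replicas: 1, MilliCPU: 500},
		{Name: "mpiworker", Replicas: 3, MilliCPU: 1000},
	}
	const clusterMilliCPU = 8000     // assumed cluster size
	nginxMilliCPU := int64(6 * 1000) // six 1-CPU nginx pods
	fmt.Println("with nginx running:", gangAdmit(mpiJob, 4, clusterMilliCPU-nginxMilliCPU))
	fmt.Println("without nginx:", gangAdmit(mpiJob, 4, clusterMilliCPU))
}
```

With 2 CPUs left over, only the master and one worker would fit, so the sketch admits none (`0`); with the whole cluster free, all four pods (3500m in total) are admitted.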
@@ -0,0 +1,73 @@
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: lm-horovod-job
  labels:
    "volcano.sh/job-type": Horovod
spec:
  minAvailable: 4
  schedulerName: volcano
  plugins:
    ssh: []
    svc: []
  # If any pod is evicted (killed), restart the whole job
  policies:
    - event: PodEvicted
      action: RestartJob
  tasks:
    - replicas: 1
      name: master
      policies:
        - event: TaskCompleted
          action: CompleteJob
      template:
        spec:
          containers:
            - command:
                - /bin/sh
                - -c
                - |
                  WORKER_HOST=`cat /etc/volcano/worker.host | tr "\n" ","`;
                  mkdir -p /var/run/sshd; /usr/sbin/sshd;
                  mpiexec --allow-run-as-root --host ${WORKER_HOST} -np 3 python tensorflow_mnist_lm.py;
              image: swr.cn-north-1.myhuaweicloud.com/hwstaff_z00383385/horovod-tf-mnist:0.5
              name: master
              ports:
                - containerPort: 22
                  name: job-port
              resources:
                requests:
                  cpu: "500m"
                  memory: "1024Mi"
                limits:
                  cpu: "500m"
                  memory: "1024Mi"
          restartPolicy: OnFailure
          imagePullSecrets:
            - name: default-secret
    - replicas: 3
      name: worker
      template:
        spec:
          containers:
            - command:
                - /bin/sh
                - -c
                - |
                  mkdir -p /var/run/sshd; /usr/sbin/sshd -D;
              image: swr.cn-north-1.myhuaweicloud.com/hwstaff_z00383385/horovod-tf-mnist:0.5
              name: worker
              ports:
                - containerPort: 22
                  name: job-port
              resources:
                requests:
                  cpu: "1000m"
                  memory: "2048Mi"
                limits:
                  cpu: "1000m"
                  memory: "2048Mi"
          restartPolicy: OnFailure
          imagePullSecrets:
            - name: default-secret
---
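Unlike the MPI sample, the Horovod job also requests memory. Because `minAvailable: 4` equals the total replica count (1 master + 3 workers), the gang can only start once the cluster can hold the whole job at once: 500m + 3 × 1000m = 3500m CPU and 1024Mi + 3 × 2048Mi = 7168Mi memory. The Go sketch below reproduces that arithmetic. It deliberately parses only the `m` and `Mi` suffixes used in these samples; real code would use `resource.ParseQuantity` from `k8s.io/apimachinery/pkg/api/resource`. `TaskRequest` and `gangTotals` are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSuffixed parses a quantity such as "500m" or "2048Mi" that must carry
// the given suffix. Only the suffixes used in the samples are supported.
func parseSuffixed(q, suffix string) (int64, error) {
	if !strings.HasSuffix(q, suffix) {
		return 0, fmt.Errorf("quantity %q lacks suffix %q", q, suffix)
	}
	return strconv.ParseInt(strings.TrimSuffix(q, suffix), 10, 64)
}

// TaskRequest holds one task's replica count and per-pod requests.
type TaskRequest struct {
	Replicas int64
	CPU      string // e.g. "500m"
	Memory   string // e.g. "1024Mi"
}

// gangTotals sums requests over all replicas: the free capacity the cluster
// needs when minAvailable covers every pod of the job.
func gangTotals(tasks []TaskRequest) (int64, int64, error) {
	var milliCPU, memMi int64
	for _, t := range tasks {
		cpu, err := parseSuffixed(t.CPU, "m")
		if err != nil {
			return 0, 0, err
		}
		mem, err := parseSuffixed(t.Memory, "Mi")
		if err != nil {
			return 0, 0, err
		}
		milliCPU += t.Replicas * cpu
		memMi += t.Replicas * mem
	}
	return milliCPU, memMi, nil
}

func main() {
	horovod := []TaskRequest{
		{Replicas: 1, CPU: "500m", Memory: "1024Mi"},  // master
		{Replicas: 3, CPU: "1000m", Memory: "2048Mi"}, // worker
	}
	cpu, mem, err := gangTotals(horovod)
	if err != nil {
		panic(err)
	}
	fmt.Printf("gang needs %dm CPU and %dMi memory\n", cpu, mem)
}
```

`main` prints `gang needs 3500m CPU and 7168Mi memory`. Since limits equal requests in this sample, the same totals also bound what the running job can consume.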