Update training operator release process #1347

Merged 1 commit on Aug 13, 2021
4 changes: 2 additions & 2 deletions CHANGELOG.md
@@ -1,8 +1,8 @@
# Changelog

- ## [v1.1.1](https://github.com/kubeflow/tf-operator/tree/v1.1.1) (2021-08-03)
+ ## [v1.2.0](https://github.com/kubeflow/tf-operator/tree/v1.2.0) (2021-08-03)

- [Full Changelog](https://github.com/kubeflow/tf-operator/compare/v1.1.0...v1.1.1)
+ [Full Changelog](https://github.com/kubeflow/tf-operator/compare/v1.1.0...v1.2.0)

## Features

43 changes: 43 additions & 0 deletions docs/release/release.py
@@ -0,0 +1,43 @@
from github import Github
import re


class ChangelogGenerator:
    def __init__(self, github_repo):
        # Replace <your_github_token> with your Github Token
        self._github = Github('<your_github_token>')
        self._github_repo = self._github.get_repo(github_repo)

    def generate(self, pr_id):
        pr = self._github_repo.get_pull(pr_id)

        return "{title} ([#{pr_id}]({pr_link}), @{user})".format(
            title=pr.title,
            pr_id=pr_id,
            pr_link=pr.html_url,
            user=pr.user.login
        )


# generated by `git log <oldTag>..HEAD --oneline`
payload = '''
6f1e96c4 Update container image for v1.1.1 (#1328)
47a74b73 add a specific version of tensorflow_datasets (#1305)
e3061132 Remove vendor folder (#1288)
eb362bd8 Fix invalid pointer when tfjob is deleted (#1285)
0c41b273 fix get_logs pod_names type and iteration blocking (#1280)
af5bdd58 Add job namespace to `tf_operator_jobs_*` counters (#1283)
6fd9489e fix custom_api.delete_namespaced_custom_object args (#1281)
c095f7a9 feat: upgrade kubeflow common and volcano version (#1276)
13b17b0e Use remote Kustomize build option in standalone installation instructions (#1266)
faf34868 fix: Remove the dup comment tag (#1274)
9a297876 add podgroups rule in cluster-role.yaml (#1272)
58c9bc4a Fix: the "follow" of TFJobClient.get_logs (#1254)
3d9e7c8a Add task type annotation for pods when EnableGangScheduling is true. (#1268)
8d179f70 Fix: Remove Github CD workflow (#1263)
'''

g = ChangelogGenerator("kubeflow/tf-operator")
for pr_match in re.finditer(r"#(\d+)", payload):
    pr_id = int(pr_match.group(1))
    print("* {}".format(g.generate(pr_id)))
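
The loop above matches every `#<number>` in the payload, so a commit title that mentions another issue or PR would produce an extra changelog entry. A minimal sketch of a stricter extraction follows, assuming each squash-merged commit title ends with its own PR number in parentheses. The helper name `extract_pr_ids` is hypothetical and not part of `release.py`.

```python
import re

# Two lines copied from the payload in release.py.
SAMPLE = """
eb362bd8 Fix invalid pointer when tfjob is deleted (#1285)
af5bdd58 Add job namespace to `tf_operator_jobs_*` counters (#1283)
"""


def extract_pr_ids(log_text):
    """Return the PR number that closes each `git log --oneline` line.

    Assumption: the PR number is the trailing "(#NNNN)" of the commit title.
    Lines without that suffix are skipped.
    """
    pattern = re.compile(r"\(#(\d+)\)\s*$", re.MULTILINE)
    return [int(match.group(1)) for match in pattern.finditer(log_text)]


if __name__ == "__main__":
    print(extract_pr_ids(SAMPLE))
```

Anchoring on the end of the line keeps the one-entry-per-commit behavior while ignoring references that appear earlier in a title. Whether that trade-off is wanted depends on how commits are merged in the repository.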
156 changes: 103 additions & 53 deletions docs/release/releasing.md
@@ -1,53 +1,103 @@
# Releasing the TFJob operator

Permissions

* You need to be a member of [email protected] to have access to the GCP
resources used for releasing.

* You need write permissions on the repository to create a release branch.


Look at the [postsubmit dashboard](https://k8s-testgrid.appspot.com/sig-big-data#kubeflow-tf-operator-postsubmit)
to find the latest green postsubmit.


Use the GitHub UI to cut a release branch
* Name the release branch v{MAJOR}.${MINOR}-branch

Checkout the release branch

We build TFJob operator by running the E2E test workflow.

Look at the [postsubmit dashboard](https://k8s-testgrid.appspot.com/sig-big-data#kubeflow-tf-operator-postsubmit)
to find the latest green postsubmit.

Check out that commit (in this example, we'll use `6214e560`):

Run the E2E test workflow using our release cluster

[kubeflow/testing#42](https://github.com/kubeflow/testing/issues/42) will simplify this.

```
submit_release_job.sh ${COMMIT}
```

You can monitor the workflow using the Argo UI. For our release cluster, we don't expose the Argo UI publicly, so you'll need to connect via kubectl port-forward:

```
kubectl -n kubeflow-releasing port-forward `kubectl -n kubeflow-releasing get pods --selector=app=argo-ui -o jsonpath='{.items[0].metadata.name}'` 8080:8001
```

[kubeflow/testing#43](https://github.com/kubeflow/testing/issues/43) is tracking setup of IAP to make this easier.

Make sure the Argo workflow completes successfully.
Check the junit files to make sure there were no actual test failures.
The junit files will be in [gs://kubeflow-releasing-artifacts](https://console.cloud.google.com/storage/browser/kubeflow-releasing-artifacts/logs/kubeflow_tf-operator/tf-operator-release/?project=kubeflow-releasing).
* The build artifacts will be in a directory named after the build number

If the tests pass use the GitHub UI to create a release tagged v{MAJOR}-{MINOR}-{PATCH}
* If its an RC append -RC.N
* In the notes create a link to the Docker image in GCR
* For the label use the `sha256` and not the label so it is immutable.

To release new ksonnet configs with the image following [kubeflow/kubeflow/releasing.md](https://github.com/kubeflow/kubeflow/blob/master/releasing.md).
# Releasing the training operator

## Prerequisites

1. Permissions
   - You need to be a member of [email protected].
   - You need write permissions on the repository to create a release tag or branch.

2. Prepare your GitHub token.

3. Install the GitHub Python dependency used to generate the changelog:
   ```
   pip install PyGithub
   ```
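
One way to prepare the token without pasting it into a source file is to read it from an environment variable. The sketch below is only an illustration: the variable name `GITHUB_TOKEN` and the helper `load_github_token` are assumptions, not something this repository defines.

```python
import os


def load_github_token(env=None, var_name="GITHUB_TOKEN"):
    """Return the token stored in `var_name`, or raise with a clear message.

    `GITHUB_TOKEN` is an assumed variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError("Set {} before generating the changelog".format(var_name))
    return token


if __name__ == "__main__":
    try:
        # Print only the length so the token itself never reaches the terminal.
        print("token loaded, length:", len(load_github_token()))
    except RuntimeError as err:
        print(err)
```

With PyGithub installed, the returned string could be passed to `Github(...)` in place of a hard-coded value, which lowers the risk of committing a real token by accident.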

## Release Process

1. Make sure the last commit you want to release has passed the `kubeflow-tf-operator-postsubmit` tests.

1. Check out that commit (in this example, we'll use `6214e560`).

1. Depending on the version you want to release:
   - Major or minor version: use the GitHub UI to cut a release branch and name it `v{MAJOR}.{MINOR}-branch`.
   - Patch version: you don't need to cut a release branch.

1. Create a new PR against the release branch that changes the container image in the manifest to point to that commit hash.

```
images:
  - name: kubeflow/training-operator
    newName: kubeflow/training-operator
    newTag: ${commit_hash}

> Note: the postsubmit job always builds a new image and uses `PULL_BASE_HASH` as the image tag.

1. Create a tag and push it to upstream.

```
git tag v1.2.0
git push upstream v1.2.0
```

1. Run the following command to list the one-line git commits between the last release (v1.1.0) and the current release (v1.2.0).

```
git log v1.1.0..v1.2.0 --oneline
```

1. Copy the commit history above into `release.py` and replace `<your_github_token>` with your GitHub token.
Run the Python script to generate the changelog.

```
from github import Github
import re


class ChangelogGenerator:
    def __init__(self, github_repo):
        # Replace <your_github_token> with your Github Token
        self._github = Github('<your_github_token>')
        self._github_repo = self._github.get_repo(github_repo)

    def generate(self, pr_id):
        pr = self._github_repo.get_pull(pr_id)

        return "{title} ([#{pr_id}]({pr_link}), @{user})".format(
            title=pr.title,
            pr_id=pr_id,
            pr_link=pr.html_url,
            user=pr.user.login
        )


# generated by `git log <oldTag>..<newTag> --oneline`
payload = '''
6f1e96c4 Update container image for v1.2.0 (#1328)
47a74b73 add a specific version of tensorflow_datasets (#1305)
e3061132 Remove vendor folder (#1288)
eb362bd8 Fix invalid pointer when tfjob is deleted (#1285)
0c41b273 fix get_logs pod_names type and iteration blocking (#1280)
af5bdd58 Add job namespace to `tf_operator_jobs_*` counters (#1283)
6fd9489e fix custom_api.delete_namespaced_custom_object args (#1281)
c095f7a9 feat: upgrade kubeflow common and volcano version (#1276)
13b17b0e Use remote Kustomize build option in standalone installation instructions (#1266)
faf34868 fix: Remove the dup comment tag (#1274)
9a297876 add podgroups rule in cluster-role.yaml (#1272)
58c9bc4a Fix: the "follow" of TFJobClient.get_logs (#1254)
3d9e7c8a Add task type annotation for pods when EnableGangScheduling is true. (#1268)
8d179f70 Fix: Remove Github CD workflow (#1263)
'''

g = ChangelogGenerator("kubeflow/tf-operator")
for pr_match in re.finditer(r"#(\d+)", payload):
    pr_id = int(pr_match.group(1))
    print("* {}".format(g.generate(pr_id)))
```

1. Cut a release from the tag and copy in the results from the last step. You can group commits into `Features`, `Bugs`, etc.
See the example [v1.2.0 release](https://github.com/kubeflow/tf-operator/releases/tag/v1.2.0).

1. Send a PR to update [CHANGELOG.md](../../CHANGELOG.md).
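
The grouping into `Features`, `Bugs`, and so on is done by hand in the steps above. As a rough aid, a first pass could be sketched as below, assuming that titles starting with `fix` are bug fixes and titles starting with `feat` or `add` are features. That prefix rule, the `Misc` bucket, and both function names are hypothetical, and the result would still need manual review.

```python
import re


def group_entries(titles):
    """Bucket changelog titles by a simple, assumed prefix heuristic."""
    groups = {"Features": [], "Bugs": [], "Misc": []}
    for title in titles:
        lowered = title.lower()
        if re.match(r"fix\b", lowered):
            groups["Bugs"].append(title)
        elif re.match(r"(feat|add)\b", lowered):
            groups["Features"].append(title)
        else:
            groups["Misc"].append(title)
    return groups


def render(groups):
    """Render non-empty groups as Markdown sections with bullet entries."""
    lines = []
    for heading, items in groups.items():
        if not items:
            continue
        lines.append("## {}".format(heading))
        lines.append("")
        lines.extend("* {}".format(item) for item in items)
        lines.append("")
    return "\n".join(lines).rstrip()


if __name__ == "__main__":
    # Titles taken from the payload in release.py.
    sample = [
        "Fix invalid pointer when tfjob is deleted",
        "feat: upgrade kubeflow common and volcano version",
        "Remove vendor folder",
        "add podgroups rule in cluster-role.yaml",
    ]
    print(render(group_entries(sample)))
```

The strings passed in could be the lines printed by `release.py`. Titles that do not follow either prefix land in `Misc` and need to be sorted manually.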