[Misc] (WIP) Automatic release pipeline #1861

Closed
wants to merge 5 commits into from

rexwangcc
Collaborator

rexwangcc commented Sep 11, 2020

Related issue = #1674



1. The Current Release Process

[Image: diagram of the current release process]

Potential Problems:

  • The whole process is manual and requires a substantial amount of developers' focus and effort.
  • The process is not transactional: a failure at any step requires developer intervention and potentially rollback operations. This can confuse developers and does not scale, since it limits the release process to a small set of experienced core developers.
  • The process spans a long period of time and can be affected by other git operations (such as new merges to master) happening in the meantime.
  • The release process bounces between different CI/CD providers and hosts (namely Jenkins, GitHub Actions, Read the Docs and PyPI) and is therefore error-prone.

2. Implementation Plan

The actual release automation shall be implemented iteratively, so that the migration is smoother.

  1. Implement a CLI system that
    • Triggers the build (in Jenkins?)
    • Updates the version in CMakeLists.txt
    • Runs CMake to regenerate the docs version file. As @yuanming-hu suggested, a more portable approach (so that the runner does not need a CMake setup just to run "cmake .") is to port the logic that reads CMakeLists.txt into docs/conf.py, so that it can generate and dump the version file on its own. I'm implementing this right now. (Cross-referencing functions is tricky given the unorganized state of the misc/ directory. Since some refactoring is planned for v0.8.0, I'm simply copy-pasting duplicate code between conf.py and misc/ for now.)
    • Generates the changelog content automatically.
    • Creates the release PR and commits automatically.
    • Waits for developers to merge, OR waits for the PR builds to pass and then merges (this could be done via https://github.com/pascalgn/automerge-action)
    • Merges the PR automatically
    • Bumps the stable branch
    • Cuts a new release
    • Publishes a new package on PyPI
  2. The “release” CLI command is called by a cron job periodically, for instance, twice a week.
  3. A GitHub Action that can be triggered manually, letting developers choose the bump type (major, minor or patch).
  4. Notify developers by email (or other means) when something goes wrong.
  5. Turn on release automation.
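The "Updates the version in CMakeLists.txt" step could be a small regex rewrite rather than a CMake invocation. This is a minimal sketch, assuming (hypothetically) that the version lives in three set(TI_VERSION_MAJOR/MINOR/PATCH N) lines; the variable names and the set_version helper are illustrative, not the code in misc/ci_release_pipeline.py.

```python
import re
from typing import Tuple

# Hypothetical variable names; the real CMakeLists.txt may use different ones.
VERSION_KEYS = ("TI_VERSION_MAJOR", "TI_VERSION_MINOR", "TI_VERSION_PATCH")


def set_version(cmake_text: str, version: Tuple[int, int, int]) -> str:
    """Return cmake_text with each set(TI_VERSION_* N) line rewritten to the new version."""
    for key, value in zip(VERSION_KEYS, version):
        pattern = re.compile(r"(set\(\s*%s\s+)\d+(\s*\))" % key)
        # \g<1> and \g<2> keep the surrounding text; the \g<> form avoids
        # ambiguity when the new number directly follows the group reference.
        cmake_text, count = pattern.subn(r"\g<1>%d\g<2>" % value, cmake_text)
        if count != 1:
            # Fail loudly so a human can step in instead of releasing a wrong version.
            raise ValueError("expected one set(%s ...) line, found %d" % (key, count))
    return cmake_text


if __name__ == "__main__":
    sample = "set(TI_VERSION_MAJOR 1)\nset(TI_VERSION_MINOR 2)\nset(TI_VERSION_PATCH 3)\n"
    print(set_version(sample, (1, 2, 4)))
```

Requiring exactly one match per key keeps the step strict: if the file layout ever changes, the pipeline stops early, before anything is pushed.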

3. Open Questions

  1. How do we deal with PRs that get merged to master after the release workflow has started but before it has finished?
    • This CLI uses the commit hash whenever possible, so this race condition won't happen.
  2. How do we distinguish between major, minor and patch (bug-fix) releases?
    • The (weekly) cron job always makes patch releases; minor and major releases can only be triggered manually.
  3. Semantic versioning is complicated, should we really automate it?
  4. Do we delete release branches after merge?
  5. Do we consider release candidates (e.g. vx.y.z-rc1) in addition to vMAJOR.MINOR.PATCH?
  6. (Unrelated) I noticed that misc/ is becoming more and more like a trash bin and is barely maintained or organized. Would it be worth refactoring it to some extent, so that at least some of the code in misc/ can be reused across modules?
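The versioning policy from questions 2 and 3 (cron makes patch releases only; minor and major bumps are manual) is simple enough to encode directly. A minimal sketch, assuming plain vMAJOR.MINOR.PATCH tags without release-candidate suffixes; parse_tag, bump and next_tag are hypothetical helper names.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse_tag(tag: str) -> Version:
    """Parse a 'vMAJOR.MINOR.PATCH' tag; rc suffixes are deliberately unsupported."""
    core = tag[1:] if tag.startswith("v") else tag
    major, minor, patch = core.split(".")
    return int(major), int(minor), int(patch)


def bump(version: Version, part: str = "patch") -> Version:
    """Return the next semantic version; lower components reset to zero."""
    major, minor, patch = version
    if part == "major":
        return major + 1, 0, 0
    if part == "minor":
        return major, minor + 1, 0
    if part == "patch":
        return major, minor, patch + 1
    raise ValueError("part must be 'major', 'minor' or 'patch', got %r" % (part,))


def next_tag(current_tag: str, triggered_by_cron: bool, requested_part: str = "patch") -> str:
    # Policy from the discussion: the scheduled job only makes patch releases;
    # minor/major bumps must come from a manual trigger.
    part = "patch" if triggered_by_cron else requested_part
    return "v%d.%d.%d" % bump(parse_tag(current_tag), part)
```

For example, next_tag("v1.2.3", triggered_by_cron=False, requested_part="minor") gives "v1.3.0", while a cron run always yields the next patch tag regardless of what was requested.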

Alternatives

  1. Github action: https://github.com/marketplace/actions/create-pull-request
    • This action does not provide a way to render the content of the release PR dynamically; it only takes static strings. On top of that, we need to run CMake before creating the PR.
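What "dynamically render the content of the release PR" means in practice: the PR body depends on data only known at run time (the new version, the pinned commit, the changes since the last tag). A minimal sketch; the body layout and the render_release_pr_body name are hypothetical. The titles could, for instance, come from git log --pretty=format:%s <last tag>..<commit>.

```python
from typing import List


def render_release_pr_body(version: str, base_sha: str, titles: List[str]) -> str:
    """Build the release PR description from data only known at run time."""
    lines = [
        "Release v%s" % version,
        "",
        "Based on commit %s." % base_sha,
        "",
        "Changelog:",
    ]
    if titles:
        # One bullet per merged change since the previous release.
        lines.extend("- %s" % title for title in titles)
    else:
        lines.append("- (no changes since the last release)")
    return "\n".join(lines) + "\n"
```

Building the text in the CLI keeps the PR-creation step itself trivial, which is the gap a static-string action cannot fill.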

@codecov

codecov bot commented Sep 11, 2020

Codecov Report

Merging #1861 (d27e0e5) into master (c1df404) will increase coverage by 0.90%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #1861      +/-   ##
==========================================
+ Coverage   43.23%   44.14%   +0.90%     
==========================================
  Files          44       44              
  Lines        6314     6121     -193     
  Branches     1092     1092              
==========================================
- Hits         2730     2702      -28     
+ Misses       3413     3250     -163     
+ Partials      171      169       -2     
Impacted Files Coverage Δ
python/taichi/lang/ast_checker.py 70.58% <0.00%> (-1.64%) ⬇️
python/taichi/testing.py 75.00% <0.00%> (-0.72%) ⬇️
python/taichi/lang/linalg.py 89.33% <0.00%> (-0.67%) ⬇️
python/taichi/lang/meta.py 62.31% <0.00%> (-0.54%) ⬇️
python/taichi/lang/__init__.py 41.94% <0.00%> (-0.51%) ⬇️
python/taichi/misc/util.py 17.48% <0.00%> (-0.26%) ⬇️
python/taichi/main.py 22.95% <0.00%> (-0.04%) ⬇️
python/taichi/misc/task.py 0.00% <0.00%> (ø)
python/taichi/lang/shell.py 0.00% <0.00%> (ø)
python/taichi/tools/patterns.py 0.00% <0.00%> (ø)
... and 9 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c1df404...d27e0e5.

Collaborator

archibate left a comment


One concern: if one of these stages fails, will the script terminate properly and allow a human to take care of the remaining steps manually?

(Review comment on misc/ci_release_pipeline.py, resolved.)
@yuanming-hu
Member

Thank you so much! This will save a lot of manual work (and errors made by humans, or by me).

  • How to deal with the situation when some PRs get merged to master while the release workflow starts but not yet finished?

    • This cli uses commit hash whenever possible so this race condition won't happen.

I need to figure out how to let Jenkins build a specific commit. Will update you tomorrow.

  • How to distinguish between major, minor and bug fixes?

    • (weekly) Cron Job always makes patch releases, minor and major can only be triggered manually.
  • Semantic versioning is complicated, should we really automate it?

I think we can rely on the script for patch version bumps. Minor/major version bumps will not happen more than once every three months, so it's fine to do those manually.

  • Do we delete release branches after merge?

That would be a nice plus, but it's not a must-have.

  • Do we consider release candidates (e.g. vx.y.z-rc1) in addition to vMAJOR.MINOR.PATCH?

In the foreseeable future, we can simply release a new patch version in that case :-)

  • (unrelated) I noticed that misc/ is becoming more and more like a trash bin, and barely maintained/organized, would it be more useful to refactor it to some extent, at least some code in misc can be re-used cross modules?

Right, now that folder is really a recycling center. Let's clean it up in v0.8 roadmap :-)

@rexwangcc
Collaborator Author

rexwangcc commented Sep 11, 2020

One concern: if one of these stages fails, will the script terminate properly and allow a human to take care of the remaining steps manually?

Thanks, that is a very valid concern. I assume the CLI will be invoked on ephemeral GitHub Actions/Jenkins worker VMs.

The short answer is that steps 1-4 will likely be executed on a cloned (or forked) repo on the VM. If any of them fails and the developers are notified, they can jump in, fix the issue and re-run the job without affecting the repo at all (they might have to delete the branch that was pushed to the remote origin in step 2). If the failure happens in steps 5-8, which have side effects on the remote repo, the developers will need to take over from where the ball was dropped and finish the release manually.

Ideally, the whole release pipeline would be a single transaction that rolls back if any step fails, but that is hard to implement in the first iteration (time-wise, and it is potentially over-engineering), so I ended up implementing it as a "minimal automation that requires human beings to investigate if it goes wrong" pipeline.
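The "stop at the first failure and hand over to a human" behaviour described above can be sketched as a plain sequential runner. This assumes, as in the reply, that steps 1-4 are re-runnable and steps 5-8 touch the remote repo; run_pipeline, FIRST_REMOTE_STEP and the notify callback are hypothetical, and notify stands in for the email or other notification from the plan.

```python
from typing import Callable, List, Tuple

# Assumption taken from the discussion: steps 1-4 work on the clone on the
# worker VM and can be re-run; steps 5-8 change the remote repo.
FIRST_REMOTE_STEP = 5

Step = Tuple[str, Callable[[], None]]


def run_pipeline(steps: List[Step], notify: Callable[[str], None] = print) -> int:
    """Run steps in order and stop at the first failure.

    Returns 0 on success, otherwise the 1-based number of the failed step.
    """
    for number, (name, action) in enumerate(steps, start=1):
        try:
            action()
        except Exception as exc:  # any failure stops the run and hands over to a human
            if number < FIRST_REMOTE_STEP:
                advice = "safe to re-run after fixing the issue (a pushed release branch may need deleting)"
            else:
                advice = "the remote repo was already modified; continue manually from this step"
            notify("Release step %d (%s) failed: %s. %s." % (number, name, exc, advice))
            return number
    return 0
```

This is not transactional; it only makes the failure point and the recovery path explicit, which matches the "minimal automation" scope of the first iteration.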

Also this is still WIP so suggestions are welcome! :octocat:

@rexwangcc
Collaborator Author

rexwangcc commented Sep 11, 2020

Thank you so much! This will save a lot of manual work (and errors made by a human (or me)).

  • How to deal with the situation when some PRs get merged to master while the release workflow starts but not yet finished?

    • This cli uses commit hash whenever possible so this race condition won't happen.

I need to figure out how to let Jenkins build a specific commit. Will update you tomorrow.

Thanks! Sorry, the wording might be a bit confusing. What I meant is that if this CLI is invoked by GitHub Actions, the commit hash is locked the moment the action is triggered and is used throughout the release, so we won't be bitten by the race condition. If we run this from Jenkins, then yes, we need to look into how to let Jenkins build a specific commit.
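Pinning the commit on GitHub Actions can rely on the GITHUB_SHA environment variable, which the runner sets to the SHA of the commit that triggered the workflow. A minimal sketch; release_commit is a hypothetical helper, and the fallback for other environments (e.g. Jenkins or a local run) simply resolves HEAD once at startup.

```python
import os
import subprocess


def release_commit() -> str:
    """Return the commit SHA this release run is pinned to.

    On GitHub Actions, GITHUB_SHA is fixed when the workflow is triggered, so
    PRs merged to master afterwards cannot leak into the release. Elsewhere,
    resolve HEAD exactly once and reuse the result for every later step.
    """
    sha = os.environ.get("GITHUB_SHA")
    if sha:
        return sha
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()
```

The key design point is that later steps receive this SHA explicitly instead of referring to "master", which is what actually closes the race window.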

@yuanming-hu
Member

yuanming-hu commented Sep 12, 2020

Thank you so much for everything here! Based on what you have explored, I have a few ideas to simplify the workflow:

Replace Jenkins with zhen

We can use zhen.csail.mit.edu (Ubuntu 18.04, 2x NVIDIA GTX 1080 GPU, 10 CPU cores, 32 GB memory), a machine at our lab that nobody uses, exclusively for building Taichi on Linux with CUDA.

I'll try to deploy a self-hosted GitHub Actions runner on zhen, and then we can abandon Jenkins :-) The benefit is that we no longer need to interact with Jenkins at all. As long as the pre-commit GitHub Actions checks pass, it is safe to release. This also lets us test the CUDA backend automatically.

(zhen can also be used for benchmarking Taichi, but that's another story, @xumingkuan. Later we may also want to move the format server from kun to zhen so that a single machine handles everything in Taichi.)

Update: zhen is up and running as the CUDA buildbot (#1863).

Simplify doc version generation

It seems that instead of letting CMake generate a docs/version file, we can ask docs/conf.py to read the version numbers from CMakeLists.txt, as you have done in this PR using regex. This will save us from invoking cmake ...

I just confirmed that Read the Docs clones the whole repo instead of fetching only the docs folder (https://readthedocs.org/projects/taichi/builds/11867416/), so CMakeLists.txt in the root folder should probably be readable.
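Reading the version in docs/conf.py then only needs a regex over the root CMakeLists.txt. A minimal sketch, assuming (hypothetically) set(TI_VERSION_MAJOR/MINOR/PATCH N) lines; the function names are illustrative. Sphinx picks up the module-level version and release variables from conf.py.

```python
import re

# Hypothetical variable names; adjust to whatever CMakeLists.txt actually uses.
VERSION_KEYS = ("TI_VERSION_MAJOR", "TI_VERSION_MINOR", "TI_VERSION_PATCH")


def version_from_cmake_text(text: str) -> str:
    """Extract 'MAJOR.MINOR.PATCH' from CMakeLists.txt content without running CMake."""
    parts = []
    for key in VERSION_KEYS:
        match = re.search(r"set\(\s*%s\s+(\d+)\s*\)" % key, text)
        if match is None:
            raise RuntimeError("could not find set(%s ...) in CMakeLists.txt" % key)
        parts.append(match.group(1))
    return ".".join(parts)


def version_from_cmake_file(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return version_from_cmake_text(f.read())


# In docs/conf.py (one level below the repo root), something like:
#   version = release = version_from_cmake_file(
#       os.path.join(os.path.dirname(__file__), "..", "CMakeLists.txt"))
```

Raising instead of falling back to a default version makes a broken docs build visible rather than silently publishing docs under the wrong version.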

Release Linux builds on Travis

We used to use Travis for OS X builds on the release commits and Jenkins for Linux. Since we can now confirm that Linux + CUDA works within the PR (we will use GitHub Actions on zhen instead of Jenkins, which makes this possible), we can use Travis for both Linux and OS X.

Uploading to PyPI requires a password, and Travis seems to have good mechanisms for protecting it (I'm not 100% sure how). With self-hosted GitHub Actions runners, protecting the password seems tricky.
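Whichever CI stores the secret, the upload step itself can avoid leaking it: twine reads TWINE_USERNAME and TWINE_PASSWORD from the environment, so the password never needs to appear on the command line. A minimal sketch, assuming the CI injects TWINE_PASSWORD as a secret environment variable; twine_command and upload_to_pypi are hypothetical helpers. Note that this does not solve the self-hosted runner concern, since anyone with access to that machine could still read the job environment.

```python
import glob
import os
import subprocess
from typing import List


def twine_command(files: List[str]) -> List[str]:
    # Credentials are deliberately absent from argv: twine reads TWINE_USERNAME
    # and TWINE_PASSWORD from the environment, so the secret does not show up
    # in process listings or in CI logs that echo commands.
    return ["twine", "upload", "--non-interactive"] + files


def upload_to_pypi(dist_dir: str = "dist") -> None:
    if not os.environ.get("TWINE_PASSWORD"):
        # Fail fast instead of letting twine try to prompt on a headless runner.
        raise RuntimeError("TWINE_PASSWORD is not set; expected it from the CI secret store")
    files = sorted(glob.glob(os.path.join(dist_dir, "*")))
    if not files:
        raise RuntimeError("no distributions found in %s" % dist_dir)
    subprocess.run(twine_command(files), check=True)
```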

@CLAassistant

CLAassistant commented Apr 12, 2021

CLA assistant check
All committers have signed the CLA.

@feisuzhu
Contributor

Closing ancient PR.

feisuzhu closed this Jan 16, 2023

5 participants