Evaluate Travis or move CI away #652
Comments
I can try a port over to GitHub Actions; it is supposed to be pretty painless. I think we'll lose the ability to target specific hardware, though. GitLab CI has a minutes cap, and we use a lot of minutes. |
That cap is for GitLab's cloud runners (and there's a process to request more); there's no limit on what we run on our own hardware. We could run Docker+QEMU to emulate architectures for which native hardware is hard to come by. We could also consider paying for Travis time, but we should probably migrate whatever is easy to migrate. We could use Azure Pipelines for macOS, Windows, and containerized Linux, but it's all x86-64. |
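As a rough sketch of what a QEMU-emulated job could look like under GitLab CI, assuming a Docker-executor runner whose host has qemu-user-static binfmt handlers registered (the runner tag, image, and build commands below are placeholders, not an existing configuration):

```yaml
# Hypothetical .gitlab-ci.yml job: run tests in an arm64 container on an
# x86-64 runner via QEMU user-mode emulation. Assumes qemu-user-static
# binfmt handlers are already registered on the runner host.
test-arm64:
  tags: [docker]                  # placeholder tag for a Docker-executor runner
  image: arm64v8/ubuntu:20.04     # official arm64 Ubuntu image
  script:
    - apt-get update && apt-get install -y build-essential
    - make -j2
    - make test
```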
Not sure if it costs more, less, or about the same as what Travis is going to charge, but I know that in CliMA they use Buildkite: https://buildkite.com/pricing |
Maybe I'm missing something, but how is this functionally different from GitLab-CI? Both have an open source runner that we would install on our (on-premise or cloud) hardware. |
I'm not sure how much it differs, since I'm not familiar with the functionality that GitLab-CI offers either. I suppose they used it because they could easily set up the runner for GPUs on the university's cluster. |
I played around with GitHub Actions and it's pretty easy to set up. Perhaps we do something like this:
|
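For concreteness, a minimal sketch of what such a workflow could look like (the file name, runner images, and make targets here are assumptions, not the actual configuration):

```yaml
# Hypothetical .github/workflows/ci.yml -- a minimal starting point only;
# job names, runner images, and make targets are illustrative.
name: CI
on: [push, pull_request]
jobs:
  build-and-test:
    strategy:
      matrix:
        os: [ubuntu-20.04, macos-10.15]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v2
      - name: Build
        run: make -j2
      - name: Run test suite
        run: make test
```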
Maybe @simonbyrne can answer how Buildkite is different from GitLab-CI? |
We went with Buildkite as we were able to get it to play nice with our cluster. Basically we have a cron job on the cluster which polls the Buildkite API to check if there are new jobs (it is behind a firewall, so we can't use webhooks). When there are new jobs we create a corresponding Slurm job for each (with options so that different jobs can request a specific number of tasks / GPUs), and each Slurm job launches a Buildkite agent to run it. We use Bors to handle our merging, and the Buildkite jobs are only triggered when you request a merge (this prevents random people from opening a PR and getting access to our cluster). Overall it works pretty well, scales nicely (we regularly have 100 or so agents running without problems), and is free for open source projects. Our scripts to make this work are here: https://github.com/CliMA/slurm-buildkite. They are somewhat specific to our use case, but I'm happy to answer questions if you want to adapt them. We looked into self-hosted GitHub Actions, but couldn't figure out a way to make sure a specific runner would run a specific job (the relevant issues we were stuck on are actions/runner#510 and actions/runner#620). Additionally, scaling runners looks cumbersome (you have to keep registering and unregistering runners, which we don't have to do with Buildkite: you can just start a new agent and it joins the pool). I only quickly looked into GitLab CI: from what I could tell it has the same problems as GitHub Actions (but I may be wrong). |
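For reference, routing a job to specific hardware on the Buildkite side is just agent tag/queue metadata in the pipeline file; a minimal sketch (queue names and commands are made up here, not the actual slurm-buildkite conventions):

```yaml
# Hypothetical Buildkite pipeline.yml; queue names and commands are
# illustrative, not the actual CliMA setup.
steps:
  - label: "CPU tests"
    command: make test
    agents:
      queue: cpu        # matched by agents launched inside CPU Slurm jobs
  - label: "GPU tests"
    command: make test
    agents:
      queue: gpu        # matched by agents launched inside GPU Slurm jobs
```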
Thanks, Simon. We use GitLab-CI for PETSc and have about 60 configurations that run (across various machines) as part of each pipeline. GitLab has "merge trains", which is somewhat similar to Bors (but a native UI feature). ECP has GitLab-CI running via Slurm at DOE facilities. I could track down the scripts, but it's done using the custom executor (after an attempt to MR a more HPC-specific executor prior to …). |
@jakebolewski did look into the ECP GitLab CI + Slurm integration, but I think in the end we decided it would require significant effort on behalf of the cluster admins, whereas we could run the Buildkite agent under existing user permissions. If it had already been set up on our cluster I imagine we would have used it. |
@jedbrown, any objection to moving our libCEED-only tests on Linux, macOS, the different hardware, Python, and Julia to GitHub Actions for now? We could easily move our OCCA and LIBXSMM integration testing to Noether. Then we'd only have to make a choice about where to run the MFEM and Nek5000 example tests. I can fiddle with this on the side as I work tomorrow. |
That sounds good. We can put MFEM and Nek5000 on Noether. Best would be to keep the commits pinned as you've done with caching in Travis. |
The new pricing model has limited credit for open source, so we may need to either start paying or move elsewhere (presumably GitHub Actions and/or GitLab-CI).
https://blog.travis-ci.com/2020-11-02-travis-ci-new-billing