
.gitlab-ci.yml #134 (Open)

dHannasch wants to merge 133 commits into master.
Conversation

@dHannasch (Collaborator) commented Sep 3, 2019:

This sets up a newly created repo for GitLab Continuous Integration, if it is hosted on GitLab. The tests get run, and the generated documentation is published on GitLab Pages.

This unfortunately adds a new user input, ci_https_proxy, to make it all work. On the other hand, there might be ways to use it for other, non-GitLab CI too; I don't know enough about CI to say.

This also adds a new user input, docs_require_package. It's not ideal, since the user might not know upfront whether they'll want to run their package's code while building the documentation, but we want to avoid the expense of installing their package when building the documentation if that can be avoided.
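A hypothetical sketch of how those two inputs might surface in the generated .gitlab-ci.yml (the variable wiring, job name, and example value are assumptions, not the PR's exact template):

variables:
  # populated from the ci_https_proxy input when the user sets one:
  https_proxy: "http://proxy.example.com:8080"  # hypothetical value

docs:
  script:
    # this line would only be generated when docs_require_package is "yes",
    # since installing the package just to build docs is otherwise wasted time:
    - pip install .
    - tox -e docs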

- virtualenv --version
- pip3 --version
- uname --all
# Some distributions do not include the standard lsb_release command.
@ionelmc (Owner):
How about lsb_release -a || true (like in .travis.yml)?
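In context, that suggestion would make the script block read (a sketch; the before_script section name is assumed, since the diff context above doesn't show it):

before_script:
  - virtualenv --version
  - pip3 --version
  - uname --all
  # Some distributions do not include the standard lsb_release command,
  # so don't fail the whole job when it's missing:
  - lsb_release -a || true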

@@ -0,0 +1,37 @@
image: alpine
@ionelmc (Owner):

So I think this would be some sort of compromise between your project's specific needs and what'd be general purpose.

Is there any reason you don't use an Ubuntu image? Since that's what Travis uses, it would be a natural choice for users already used to Travis.

@ionelmc (Owner):

Another thing about the base image: in a more serious setup you'd use a custom image to avoid the overhead of installing tox and pip every time. Not sure what's the best way to tackle this. Maybe there's already a "fixed up and maintained" Ubuntu image that could be used.

@dHannasch (Collaborator, Author) commented Sep 3, 2019:

I know literally nothing about Alpine, except that https://gitlab.com/pages/sphinx used it as the base image, for whatever reason.

It looks like Alpine is used because it's smaller, which makes installation (if needed) faster: https://thenewstack.io/alpine-linux-heart-docker/

> apt-get update spends almost the same amount of time to update the package index cache as Alpine spends on performing the entire system install or upgrade.

You make a good point about efficiency... now that I look a little more, https://devopstutodoc.readthedocs.io/en/latest/hebergeurs/gitlab/gitlab.html uses an image that at least already has Python installed.

@dHannasch (Collaborator, Author):

> Another thing about the base image: in a more serious setup you'd use a custom image to avoid the overhead of installing tox and pip every time. Not sure what's the best way to tackle this. Maybe there's already a "fixed up and maintained" Ubuntu image that could be used.

I have not been able to find any such thing, so ultimately I gave up and just made my own on Docker Hub. It's very, very simple: https://hub.docker.com/r/dahanna/python.3.7-git-tox-alpine/dockerfile

I don't know why such a thing didn't already exist; it seems simple and obvious, but there it is.
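For reference, pointing the CI at that prebuilt image is a one-liner (a minimal sketch; the job layout is assumed, not taken from the PR):

# Reuse the prebuilt image so git and tox need not be installed on every run:
image: dahanna/python.3.7-git-tox-alpine

test:
  script:
    - tox -e py37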

- mv dist/docs/ public/
artifacts:
paths:
- public
@ionelmc (Owner):

I guess you might want to put other stuff in public? You could simply have paths: - dist?

@dHannasch (Collaborator, Author):

The documentation doesn't really make this clear, but for GitLab Pages the folder name must be "public"; otherwise you get weird errors. See e.g. https://gitlab.com/gitlab-org/gitlab-ce/issues/40686

Hmm. It might be worth putting a comment in .gitlab-ci.yml warning users "yes it sure does look like you could change this to anything you want, but you can't." I expect most people looking at .gitlab-ci.yml would immediately assume, just like you did, that they could change the path to be anything.
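Something like this, perhaps (a sketch of that warning comment, grafted onto the job shape shown in the diff above):

pages:  # GitLab Pages also requires the job to be named exactly "pages"
  script:
    - mv dist/docs/ public/
  artifacts:
    paths:
      # Yes, it sure does look like you could change this to anything you
      # want, but you can't: GitLab Pages requires the artifact folder to
      # be named exactly "public". See gitlab-org/gitlab-ce#40686.
      - public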

(An outdated review thread on README.rst was resolved.)
@dHannasch (Collaborator, Author) commented:
.gitlab-ci.yml can be seen in action at https://gitlab.com/dHannasch/python-nameless. As of this moment, the CI is failing on sphinx -b linkcheck, because the README has a link to the GitLab Pages documentation, which takes 30 minutes to deploy. I can't think of any way around that; I think maybe everyone on GitLab will just have to deal with failing CI for the first thirty minutes after cookiecutting a new repo. It doesn't always take the full 30 minutes; it just went through for python-nameless while I was writing this comment.

@ionelmc (Owner) commented Sep 23, 2019:

So, to move this a bit forward, I would like to discuss two things:

1. Who is this for, and what are the goals? Should it test multiple Pythons? If I host an open-source project on gitlab.com it should, right? Would this have any users except you?

2. Having a base image. I've used GitLab CI in the past and we had a base image (https://hub.docker.com/r/ionelmc/docker-in-docker/dockerfile) - of course, that had very different goals (it was a private docker-compose project), but we could have something like that.

There is a question of maintenance that immediately provokes the "why don't we use the official python image?". Well... the official python image is still not good and unlikely to ever be. It took them 2 years of discussions to accept that PGO (profile-guided optimization) builds are actually a good thing.

(I just noticed that you already have a Python image, but I'd still like to discuss the official-image thing though - what to do if multiple Pythons are needed?)

@dHannasch (Collaborator, Author) commented:
This is for anyone who uses GitLab CI, much like how the Travis option is for anyone who uses Travis CI.

Ultimately, testing as many Pythons as possible would be ideal. (Of course, e.g. Travis doesn't test all Pythons either, so I figure testing only some is a good place to start; I'm thinking of starting with 3.7 and 3.6.) I haven't sorted out the details, but it's possible to use a different image for each job:

build:precise:
  image: precise:base
  <<: *build_definition  # assumes a &build_definition anchor defined elsewhere

build:trusty:
  image: trusty:base
  <<: *build_definition

So I think that can be done. But sorting out how to handle the multiple images seems secondary to actually having some images, which at the moment I don't have. I mean, I have images, but just a handful that I built by hand.

I think we want Alpine images here, to save those precious build minutes over many builds. That's also why I'm thinking a separate image for each Python --- people can enable only what they need and thus download only the images they need. (We can already set tox to ignore Pythons that are not installed, via skip_missing_interpreters, so I don't think this will need to touch the tox.ini; it'll just run tests for whichever Python is installed on the image.)
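Concretely, the one-image-per-Python idea might look like this (a sketch only; the hidden-job/anchor layout is illustrative, and the 3.6 image name is assumed to follow the naming of the existing 3.7 image, which is the only one published so far):

# Hidden job providing a shared definition via a YAML anchor:
.test_template: &test_definition
  script:
    - tox  # tox skips any interpreter not installed in the image

test:py37:
  image: dahanna/python.3.7-git-tox-alpine
  <<: *test_definition

test:py36:
  image: dahanna/python.3.6-git-tox-alpine  # assumed name, not yet published
  <<: *test_definition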

I sort of assumed that all the necessary images would just be sitting around on Docker Hub somewhere, but I have not found them yet.

A complicating factor is that, as you pointed out, we want to avoid the overhead of installing tox and git and such every time. I've made a base image with tox and git (https://cloud.docker.com/repository/docker/dahanna/python.3.7-git-tox-alpine), but it would also be good to have images with a few specific libraries that are commonly used and painful to install, principally numpy and scipy. Once we start talking about more than one image, it seems like the source of those images shouldn't be "hey, here's this image @dHannasch put on Docker Hub", but I haven't found anyone putting together anything more organized. It seems like https://hub.docker.com/_/python doesn't want to be in the business of including libraries. Some individual libraries maintain images with that one library preinstalled (e.g. https://hub.docker.com/u/pythonpillow), but I haven't found anyone tackling the use case of "I have a list of libraries I want, with length greater than one but less than four."

As you can see, right now it's a thing that works (https://gitlab.com/dHannasch/python-nameless), running tests and building docs using a single (Python 3.7) image. As for expanding the functionality to multiple images... lots of questions there.

@ionelmc (Owner) commented Sep 25, 2019:

Hmmm, interesting - I didn't know that you can set individual images for jobs. But I think installing tox is a fair compromise, and on parity with Travis. The python:X.Y-buster images inherit from buildpack-deps, so they already have git.

I guess the official images would be fine for now. The image size isn't an actual issue - layers would be cached; it's not like the CI worker would be downloading layers all day long. Alpine is not faster; as a matter of fact, it can be slower in some scenarios: http://www.etalabs.net/compare_libcs.html

@dHannasch (Collaborator, Author) commented Sep 25, 2019:

> I didn't know that you can set individual images for jobs. But I think installing tox is a fair compromise, and on parity with Travis.

I don't follow what tox has to do with images? I mean, obviously we want to install tox on any image, such as https://hub.docker.com/r/dahanna/python.3.7-git-tox-alpine/dockerfile. But installing tox doesn't install any Python versions.

I get "No results" for python:X.Y-buster on Docker Hub, and I don't see any such thing on https://hub.docker.com/_/python. Can you link to that? That sounds like a Python(x,y) image? I don't know why I never thought of that... looking now, I don't see a Python(x,y) image, but I do see an Anaconda image:

https://hub.docker.com/r/jcrist/alpine-conda
https://jcrist.github.io/conda-docker-tips.html

I wonder if using an Anaconda base image might be better.

...oh, or maybe you just meant e.g. python:3.7-buster. But now that I'm thinking about using a Python distribution, that might be the way to go.

> The image size isn't an actual issue - layers would be cached

I don't follow... I mean, if everyone uses this cookiecutter (or at least, a common set of Docker images for CI), then great: almost every random shared runner will have the right Docker image cached, so we won't have to worry about downloading it. I'm all for that. But to get there, we need something that works well for people now. It seems like the probability of any given shared runner having this particular Docker image cached when someone triggers a build is ~0. After the build it will presumably be cached, but then the next few hundred builds will be other people using other images, so by the time that runner gets allocated to someone else using this particular image, it'll have long been kicked out of the cache.

I mean... right? I don't know much about how runners work. But it seems like caching isn't going to save you unless and until yours is one of the most popular images in the world --- the standard Ubuntu image, the standard Alpine image, the standard Python image.

> Alpine is not faster; as a matter of fact, it can be slower in some scenarios: http://www.etalabs.net/compare_libcs.html

I never thought that Alpine/musl ran C faster until you shared that link. But, um, that link you shared says it is faster. I'm having a really hard time parsing this sentence.

...I mean, that link doesn't necessarily mean Alpine/musl really is faster. The author admits, "I have tried to be fair and objective, but as I am the author of musl, that may have influenced my choice of which aspects to compare." But the page you linked to definitely appears to be arguing that it is faster. It does say Alpine/musl is slower for big allocation-and-free patterns and for regex compiles; are those the "some scenarios" you're talking about? I'm very confused. ...musl is the Alpine one, right? At least, when I search for instructions on how to build Alpine images, they say the reason the builds look different is that Alpine uses musl.

More broadly, I hadn't considered at all whether any given operating system might actually be faster at running Python itself. Looking now, I don't see any clear information on that. The only thing I know makes a difference is image size (for downloading). Differences in speed of running Python might potentially make a bigger difference, but I don't see where to get more information on that.

As for image size... I mean, observationally it seems to make a big difference. I've been known to use 2 GB images such as https://hub.docker.com/r/ufoym/deepo/, and you can see the difference with the naked eye. Obviously, some libraries have lots of long-running tests, so that the time to download the image becomes pretty irrelevant. But those libraries aren't harmed by a smaller image, and newly created libraries will usually be small, with few tests.

(The other big slowdown is creating tox environments, obviously, but I think that can be managed with judicious use of --sitepackages if we have Docker images with key libraries preinstalled, plus maybe we can build a couple of different environments in parallel, or something.)

I don't have any particular attachment to Alpine specifically (except that apparently http://www.etalabs.net/compare_libcs.html says it's faster) --- I just started with it because that's what the standard stuff like https://gitlab.com/pages/sphinx used, and people talking about accelerating builds say things like:

> Whenever possible, use a tiny Linux distribution for images that run your CI jobs. Alpine Linux is probably the most popular option, but there are others.

...without saying what those "others" are.

I just learned that Alpine is maintained (?) by Docker (or at least, its creator works for Docker; it's not totally clear), which might explain why it gets more attention: https://thenewstack.io/alpine-linux-heart-docker/

(The other thing people talk about is preinstalling libraries, like you mentioned for git and tox. For things like scipy, I'm not sure if there's a nice way to do it, but if nothing else, if we have a standard set of images with a standard naming convention, we can use a script like your bootstrap.py to inspect the dependencies and rewrite .gitlab-ci.yml to use an image that has the couple of libraries the package depends on.)

@ionelmc (Owner) commented Sep 28, 2019:

Well, musl is faster on a few things; glibc is faster on most other things, but it eats more memory. Probably insignificant differences for the Python runtime compared to the difference that PGOs make - so it looks like they finally added them in the official Python images.

So, regarding Ubuntu as the default choice:

  • It allows people to move projects from GitHub/Travis to GitLab - an Ubuntu-based image obviously helps a lot with bridging the differences.
  • It has build tools preinstalled and a wide array of available libraries - this means you can install practically any dependency you want (not so easy with the much leaner Alpine).
  • You'd think that caching won't be effective with the bulky Ubuntu image, but perhaps it is reasonable to expect that the CI worker has a fast internet connection.

And yeah, I meant 3.7-buster, 2.7-buster, etc. (X.Y as variables).

@dHannasch (Collaborator, Author) commented:

> It allows people to move projects from GitHub/Travis to GitLab - an Ubuntu-based image obviously helps a lot with bridging the differences.

Eh? What exactly is the use case that you're picturing here, and how would an Ubuntu image help? Someone decides to move CI from Travis to GitLab, and they decide to regenerate their repo using this cookiecutter. It gives them a .gitlab-ci.yml which runs their tox tests. And...?
I should mention that I know very little about how Travis works; is there some context here that I'm missing?

> It has build tools preinstalled and a wide array of available libraries - this means you can install practically any dependency you want (not so easy with the much leaner Alpine).

...okay? That sounds like something that makes a difference to people writing Dockerfiles, but that someone writing a typical Python library will never know or care about.

Build tools will certainly be needed for libraries that have C extensions, but that just goes back to the general problem of selecting an image that has the dependencies of the package (same as if we needed numpy or scipy or, for that matter, Cython).

If your point is that we should be sure to let people easily swap in any image they want, with different dependencies installed, then yes, that's a good point. I just put in a || so the openssh bit will work regardless of the operating system; I think that's the only line that depends on the OS.
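Presumably something along these lines (a sketch only; the PR's actual line isn't shown in this thread, and the package-manager fallback chain is an assumption):

before_script:
  # Try apk (Alpine) first, fall back to apt-get (Debian/Ubuntu), so the
  # same .gitlab-ci.yml works regardless of the base image's OS:
  - apk add --no-cache openssh-client || (apt-get update && apt-get install -y openssh-client)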

> You'd think that caching won't be effective with the bulky Ubuntu image, but perhaps it is reasonable to expect that the CI worker has a fast internet connection.

Maybe. I don't have hard numbers. (I mean, at the moment I don't have images to compare to each other in the first place; I've just been comparing against big machine-learning images.)

(Commit messages from the branch:)

  • … GitLab can run its CI pipeline.
  • Delete .gitlab-ci.yml if we are not using GitLab.
  • If we put null as the default, then for some reason that argument is required instead of defaulting to null.
  • If the user sets an SSH_PRIVATE_KEY, then set up SSH.
  • Use the prebuilt dahanna/python.3.7-git-tox-alpine with git and tox already installed.
  • Deploy GitLab Pages even if tests fail, because if we include a link to the documentation in the README, then sphinx linkcheck will fail, so the pages will never get deployed, so sphinx linkcheck will keep failing. (See the sketch after this list.)
  • Use python -m sphinx, so that we can preinstall sphinx and use it with --sitepackages.
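A minimal sketch of that "deploy Pages even if tests fail" idea, assuming a job shape like the diff above; when: always is GitLab CI's standard way to run a job regardless of earlier failures:

pages:
  when: always  # deploy docs even if the test stage failed, breaking the
                # linkcheck deadlock described in the commit message above
  script:
    - python -m sphinx -b html docs dist/docs  # source/output paths assumed
    - mv dist/docs/ public/
  artifacts:
    paths:
      - public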
@ionelmc (Owner) commented Nov 3, 2020:

I have pushed some commits to prevent AppVeyor blowing up to 30-minute builds in certain failure scenarios. Feel free to edit (perhaps a rebase is better?).

@dHannasch (Collaborator, Author) commented:
> I have pushed some commits to prevent AppVeyor blowing up to 30-minute builds in certain failure scenarios. Feel free to edit (perhaps a rebase is better?).

Looks good to me, but I'm curious: why in this branch? Just using it as a test case to see whether it works here before putting it into master?

(Also, I'm not sure what you want me to rebase?)

@ionelmc (Owner) commented Nov 3, 2020:

Well, since this is a PR, it triggers builds on my AppVeyor account - and that blocks all my other projects on AppVeyor. Do you have cancel permissions on AppVeyor?

@dHannasch (Collaborator, Author) commented:
> Well, since this is a PR, it triggers builds on my AppVeyor account - and that blocks all my other projects on AppVeyor. Do you have cancel permissions on AppVeyor?

I do, as you can see from the last few builds:
https://ci.appveyor.com/project/ionelmc/cookiecutter-pylibrary/builds/36081973
https://ci.appveyor.com/project/ionelmc/cookiecutter-pylibrary/builds/36073061
https://ci.appveyor.com/project/ionelmc/cookiecutter-pylibrary/builds/35820811

I figured I'd let one run after you fixed the CI, just to get a green checkmark, but obviously it's not urgent, so feel free to cancel or whatever if it's getting in the way.

But I think you're trying to say something that isn't getting through, because I know nothing about AppVeyor. E.g. in 4092064, I have no idea what ErrorActionPreference is, but the commit message says "Bail out on errors. Prevents strange failure modes.", which sounds like a generically good thing that we'd want in master. Is there a reason we wouldn't want that in master?
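For context, $ErrorActionPreference is PowerShell's error-handling preference variable; setting it to Stop makes errors terminate the script instead of letting it limp on. A hypothetical appveyor.yml fragment in the spirit of that commit (not the actual diff):

init:
  # Bail out on the first PowerShell error instead of continuing into
  # strange failure modes:
  - ps: $ErrorActionPreference = 'Stop'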

@dHannasch (Collaborator, Author) commented Nov 3, 2020:

Actually, now I see you did put those commits in master? Wouldn't that automatically put them in all pull requests? Travis does that refs/pull/134/merge thing; that's why the Travis build started failing in the first place, and why it was immediately fixed as soon as you fixed master. Does AppVeyor not merge in master when it runs tests?

@ionelmc (Owner) commented Nov 4, 2020:

Oooof. My head was a mess from angrily digging through Stack Overflow for that damn problem. Can't believe PowerShell is worse than bash!

@dHannasch (Collaborator, Author) commented:
FYI, I'm playing around with this at https://gitlab.com/library-cookiecutters/cookiecutter-pylibrary.
