FEEDBACK WANTED: which directories / files / file patterns should be excluded from the Ansible installation? #65
Maybe we should decide what to include instead? Since we have no control over what collection maintainers package into artifacts, maintaining a comprehensive exclusion list seems like something that could turn into a full-time job. My include list would contain:
I must admit that it has been a while since I looked at the collection artifact, so do not be afraid to point out any omissions in my list, because I am already assuming I forgot something important ;)
Edit: Just to make it clear: I do not propose to filter the contents of sdist tarballs (for legal reasons). My inclusion list applies to things like wheels that target end-users.
I like this take from @tadeboro because
Otherwise, I think it would be fair to link this other discussion topic about reducing the size of the package and improving the installation performance: #29
For the record, here are the unnecessary/hidden files and directories that are currently deleted from Fedora packaging for Ansible 5.3.0: https://gist.github.com/dmsimard/edbbf4206b12ac77a694f96eefd60e41 (368 hits)
Unless I am mistaken, there is absolutely no use in keeping them in the package, and it would be much cleaner to not include them at all, even if they are not problematic in number or in size. In fact, the bulk of the package comes from sanity, unit and integration tests, which make up more than 20k files:
I originally thought that we shouldn't ship tests at all but my opinion has matured a bit as I worked on downstream packaging and discussed with maintainers for other distributions:
With that in mind and including what @felixfontein originally suggested, I came up with the following table (not final, just my first opinion):
@tadeboro you definitely forgot
@dmsimard I don't think we can exclude anything from the tarball, since it's our source distribution, until legal approves that. If someone wants to take a look at the current tarball: https://files.pythonhosted.org/packages/d3/67/ceac5a18e6b675e7ea5330a0737625593ab8f9706c1ed31071c29b62322c/ansible-5.3.0.tar.gz
Still haven't heard back on that topic...
Tricky question. At first, I thought it would be a good idea. But now I think you shouldn't leave out anything. You see, if the Ansible package contains collection xyz, it should contain it as it was released, IMHO. Otherwise, you'll have a difference between installing Ansible and installing ansible-core plus the collection. (You'll have a difference in where it's installed, but not in what.) I think the best way would be to urge collection authors to strip down their releases by making use of
If the collection authors think there's a good reason not to exclude a file or directory, the same reason might apply to not excluding it from Ansible.
For the record:
So there are more docs and tests than modules.
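To get the kind of per-directory breakdown referred to above for a single collection, one can count files by their top-level entry. This is a minimal sketch, assuming the collection has been extracted to a local directory; `list_files` and `count_by_top_level` are hypothetical helper names, and the `example` file list is made up for illustration, not real data from any collection.

```python
from collections import Counter
from pathlib import Path, PurePosixPath


def list_files(collection_root):
    """Return all file paths under collection_root, relative and "/"-separated."""
    root = Path(collection_root)
    return [p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()]


def count_by_top_level(relpaths):
    """Count files per top-level entry (e.g. plugins, docs, tests).

    Files that sit directly in the collection root are counted under their own name.
    """
    counts = Counter()
    for relpath in relpaths:
        parts = PurePosixPath(relpath).parts
        if parts:
            counts[parts[0]] += 1
    return counts


# Hypothetical file list, for illustration only.
example = [
    "MANIFEST.json",
    "plugins/modules/foo.py",
    "docs/foo_module.rst",
    "tests/unit/plugins/modules/test_foo.py",
    "tests/integration/targets/foo/tasks/main.yml",
]
for name, count in count_by_top_level(example).most_common():
    print(f"{name}: {count}")
```

On a real checkout, `count_by_top_level(list_files("path/to/collection"))` gives the same kind of breakdown; the path is a placeholder.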
That's basically impossible: the artefact uploaded to Galaxy serves as the source distribution of the collection (at least that's how it works for community collections), and thus it must contain everything needed to build and develop the collection. If Galaxy allowed hosting both a source distribution and a built version, that would be possible. (At least that is how I understand the legal situation. I'm no lawyer.)
Sounds like we need a definition of what a release on galaxy should be. Is it a source distribution or is it something for end users?
Linux distributions often have something like package
Sounds like a good idea, but it would take some time to implement, IMHO.
Where do you see a legal problem? I see a technical and organizational problem here, but not a legal one.
From an external standpoint, I'm also in favor of an include list that has all the basic files/directories listed (like @tadeboro proposed) and can be extended by a collection maintainer if needed. My rationale is that I expect people to forget to manage an explicit exclude list when they add a
On the flip side, this approach could break collections that need stuff outside the normal directories. Currently I cannot think of a case where this would be an issue (for the collection I maintain). Maybe this is also a case that should break, since extending the collection folder structure with individual folders is generally not expected? It should be easy to do the right thing. I think the right thing is to keep the packages small and to maintain a clean structure of folders/files for Ansible collections.
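A minimal sketch of the include-list idea with a maintainer-supplied extension. The base list below is an assumption assembled from the files and directories this thread says must stay (MANIFEST.json, meta/, plugins/, roles/, README, LICENSE, COPYING); the `extra` parameter stands in for a hypothetical per-collection setting and is not an existing Galaxy or antsibull option.

```python
from fnmatch import fnmatchcase

# Assumed base include list. Entries ending in "/" are directories and match as
# prefixes; all other entries are fnmatch-style patterns matched against the
# full relative path (note that fnmatch's "*" also matches "/").
BASE_INCLUDE = (
    "MANIFEST.json",
    "README*",
    "LICENSE*",
    "COPYING*",
    "meta/",
    "plugins/",
    "roles/",
)


def _matches(relpath, pattern):
    """Directory entries match as prefixes; other entries via fnmatchcase."""
    if pattern.endswith("/"):
        return relpath.startswith(pattern)
    return fnmatchcase(relpath, pattern)


def is_included(relpath, extra=()):
    """True if relpath ("/"-separated, relative to the collection root) is installed.

    `extra` is a hypothetical per-collection extension of the base list.
    """
    return any(_matches(relpath, pattern) for pattern in (*BASE_INCLUDE, *extra))
```

With this design, anything a collection adds outside the base list (a new top-level directory, say) is dropped unless the maintainer lists it in `extra`, which is the breakage trade-off described above.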
I'd like to step back and think again. What's the goal? Well, it's
So are we looking for the best solution, that is, should Ansible 6 contain only the minimum of required files? This would be nice; I like the purity of this. But maybe we should ask: what's good enough to reach this goal? For example, while all those
As stated above, the bulk of non-code is in
As you say, some of this documentation is redundant. On the other hand, for someone working in an air-gapped environment it might be useful to have scenario guides, filter guides, etc. available offline. But I don't really think so. All in all, I tend toward removing
I think you should start with removing
Good point. In that case I'd like this to be a hard-coded exclude list that is not configurable by a collection maintainer. If this is configurable, it will take some time for people to pick this up and update their repos. Once they have done that, they discover Ansible switched to a "better" (inclusion list) approach, and everything starts anew. This would bring the disadvantages of both solutions to collection maintainers.
I don't think we have a choice here. This is basically dictated by the GPL - or at least that's how I understood it. Again: I'm not a lawyer.
So probably mean
I mentioned above what I think the problem is. I am not a lawyer.
The changelogs/changelog.yaml file is used by the build tool (antsibull). There is no need to actually install it for end-users. But it must be in the artefact uploaded to Galaxy.
That is a valid point. I don't think collections should use files from outside the default directories (i.e. plugins/, roles/ and meta/), but right now they can. I guess we should include that in the requirements for collections included in Ansible (https://github.com/ansible-collections/overview/blob/main/collection_requirements.rst).
People interested in that can always download the ansible tarball from PyPI and extract it. It contains all the files that are part of the collections, including all docs/ folders. If you have an air-gapped environment, you have likely downloaded it anyway (unless you only downloaded the wheels).
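For the air-gapped case, pulling only the docs/ folders out of the sdist could look like this. It is a sketch, assuming the tarball lays collections out as <top-level dir>/ansible_collections/<namespace>/<name>/...; `extract_docs` and `is_collection_docs` are hypothetical helpers, and the "data" extraction filter is only passed on Python versions whose tarfile module supports it.

```python
import tarfile


def is_collection_docs(member_name):
    """True if a tar member lies inside some collection's docs/ directory."""
    parts = member_name.split("/")
    if "ansible_collections" not in parts:
        return False
    i = parts.index("ansible_collections")
    # Assumed layout: ansible_collections/<namespace>/<name>/docs/...
    return len(parts) > i + 3 and parts[i + 3] == "docs"


def extract_docs(sdist_path, dest):
    """Extract only the collections' docs/ trees from an ansible sdist (.tar.gz)."""
    with tarfile.open(sdist_path, "r:gz") as tar:
        members = [m for m in tar.getmembers() if is_collection_docs(m.name)]
        # Use the safer "data" extraction filter where this Python provides it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, members=members, **kwargs)
    return [m.name for m in members]
```

For example, `extract_docs("ansible-5.3.0.tar.gz", "ansible-docs")` would operate on the tarball linked earlier in this thread; the destination directory name is arbitrary.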
I'm fine with removing these (and a few more obvious ones, like |
I just wanted to point out that removing
I think this would be a very good idea 👍 Your original suggestion looks OK to me, and I can't think of anything to add. However, I think it should be documented somewhere that those files and directories will be removed for collections included in Ansible.
I created #70 for this.
@mariolenz I mainly added these because I am very sure that they can be removed without impacting functionality, assuming a collection does not do some very naughty things :)
What about excluding dotfiles/folders in general? That will catch other CI systems and even
@markuman Thinking about this a bit more, I think I would only do that in the collection root. I would avoid excluding files from other directories, especially
(Also some collections included in Ansible are hosted on GitLab instances. Not sure whether they have a
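The root-only rule suggested above could look like this. It is a sketch, assuming paths are "/"-separated and relative to the collection root; `is_root_dotfile` and `drop_root_dotfiles` are hypothetical helper names. Dotfiles deeper in the tree, such as files a role ships under roles/<name>/files/, are deliberately left alone.

```python
from pathlib import PurePosixPath


def is_root_dotfile(relpath):
    """True if the first path component (relative to the collection root) starts with "."."""
    parts = PurePosixPath(relpath).parts
    return bool(parts) and parts[0].startswith(".")


def drop_root_dotfiles(relpaths):
    """Remove dotfiles/dot-directories in the collection root; keep everything else."""
    return [p for p in relpaths if not is_root_dotfile(p)]
```

This catches CI configuration such as .github/, .azure-pipelines/ or a root-level .gitlab-ci.yml without a per-system list, while never touching files inside plugins/ or roles/.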
Right now we have two different strategies:
It would be great if you could quickly indicate which of these two choices you prefer, or add a comment if you prefer a third one :) For a quick reaction, use 🎉 for 1 and 🚀 for 2.
I suggest 2 for Ansible 6 and then 1 for Ansible 7. This would give collections some time to adapt (if needed).
@mariolenz for Ansible 7 we'll have to reevaluate the approach anyway, depending on how smooth (or not) things go with Ansible 6.
The feedback seems to be pretty clear in favor of 2. I'll start formulating something we can vote on (a classic yes/no vote :) ) in this comment; feel free to comment on it if you want to fix the formulation. If "public opinion" doesn't change by, say, this weekend, I'll create a voting issue on this next week, so we'll have a decision two weeks after that.
Does this sound good?
My understanding is that these files would still be in the source tarball (just not installed) which I believe is worth mentioning for the sake of clarity.
I didn't think about this before, but came to the realization while I was trying to figure out whether this could have an impact on the new collection signature verification feature. The one thing I am pondering is the implications that installing
What would be the right approach?
New version:
I already thought a bit about this some time ago. The installed collections will obviously no longer verify, since they are not complete. Which is unfortunate. But at the same time, these collections were not installed with the ansible-galaxy "package manager", but with pip, so if some validation is done it should be done with pip and not with ansible-galaxy. (Some OS packages for Ansible will already now have a similar problem since they move README files around or not include every file.) I would say:
@ansible-community/steering-committee the above question by @dmsimard is a pretty important one; this is something you all should look at and think about.
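To make the verification concern concrete: any check that compares installed files against the artifact's full file list will report the stripped files as missing. The sketch below assumes a hypothetical `expected` mapping of relative path to SHA-256 digest (roughly the kind of per-file checksum list a collection artifact's FILES.json carries); it is not how ansible-galaxy actually implements verification.

```python
import hashlib
import tempfile
from pathlib import Path


def verify_tree(root, expected):
    """Compare files under root with expected {relpath: sha256 hex digest}.

    Returns (missing, modified), both sorted lists of relative paths.
    """
    root = Path(root)
    missing, modified = [], []
    for relpath, digest in expected.items():
        path = root / relpath
        if not path.is_file():
            missing.append(relpath)
        elif hashlib.sha256(path.read_bytes()).hexdigest() != digest:
            modified.append(relpath)
    return sorted(missing), sorted(modified)


# Demo: the artifact listed a module and a unit test, but only the module was installed.
module_source = b"# module code\n"
expected = {
    "plugins/modules/foo.py": hashlib.sha256(module_source).hexdigest(),
    "tests/unit/test_foo.py": hashlib.sha256(b"# test code\n").hexdigest(),
}
with tempfile.TemporaryDirectory() as tmp:
    module_path = Path(tmp, "plugins", "modules", "foo.py")
    module_path.parent.mkdir(parents=True)
    module_path.write_bytes(module_source)
    missing, modified = verify_tree(tmp, expected)

print("missing:", missing)
print("modified:", modified)
```

Because the stripped files are missing by construction, this is consistent with the suggestion above that validation of pip-installed collections should be done with pip rather than with ansible-galaxy.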
There's a new feature in development that will allow specifying exactly which files to include: ansible/ansible#78422
So far nobody has ever complained about missing files, so I think we are on a good track. I think having one issue for this is enough, so let's keep #126, which is newer.
Summary
Since we want to strip down the installed Ansible package for Ansible 6 (see WIP PR: ansible-community/antsibull-build#342), we should discuss which files, directories and file patterns to leave out.
I would really like to avoid having a list of special cases for special directories, so let's start with something as generic as possible. Some things that really have to stay are MANIFEST.json (needed to retrieve the collections' version) and meta/runtime.yml (needed for plugin routing). README, LICENSE and COPYING files should be kept as well. Directories like plugins/ and roles/ should only be touched with care.

As a first approximation, I've added the following to the ignore list. Files are just named, directories have a trailing slash. All patterns are relative to the collection root.
- .gitignore (this is needed for development only)
- .github/ (this usually contains GitHub Actions workflows and other development-related stuff)
- .azure-pipelines/ (same: CI configuration, which is development-related)
- changelogs/ (while changelogs/changelog.yaml is needed during the build, we don't need it in the install; the other files aren't needed either)
- docs/ (this is somewhat debatable: often this contains automatic documentation extracted from the modules and plugins, see for example https://github.com/ansible-collections/community.aws/tree/main/docs. These docs are redundant. There are also other docs, like scenario guides, filter guides, etc.; see for example https://github.com/ansible-collections/amazon.aws/tree/main/docs/docsite/rst and https://github.com/ansible-collections/community.general/tree/main/docs/docsite/rst. Including these might make sense, though on the other hand, who looks for documentation in their Python site-packages directory?)
- tests/ (unit and integration tests, as well as sanity test ignore files; these should never be needed by end-users)

If you think some of the above should not be excluded, or think some more should be excluded, feel free to mention that here! The idea is to gather a lot of feedback so we can come up with a definitive list of things that should be excluded (and things that should not be excluded).
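A minimal sketch of applying the proposed ignore list, following the conventions stated above (plain names for files, trailing slash for directories, paths relative to the collection root). The function names are hypothetical, and this is not the code from the antsibull-build PR.

```python
# Proposed ignore list from this issue. Files are matched exactly; entries with a
# trailing "/" are directories and match everything below them. All entries are
# relative to the collection root.
IGNORE = (
    ".gitignore",
    ".github/",
    ".azure-pipelines/",
    "changelogs/",
    "docs/",
    "tests/",
)


def is_ignored(relpath, ignore=IGNORE):
    """True if relpath ("/"-separated, relative to the collection root) is excluded."""
    for entry in ignore:
        if entry.endswith("/"):
            if relpath.startswith(entry):
                return True
        elif relpath == entry:
            return True
    return False


def files_to_install(relpaths, ignore=IGNORE):
    """Keep only the paths that should end up in the installed package."""
    return [p for p in relpaths if not is_ignored(p, ignore)]
```

Because matching is anchored at the collection root, a nested file such as roles/foo/.gitignore or a directory called docs inside plugins/ is kept, which matches the "generic, no special cases" intent above.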