Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Determine tree artifact semantics #440

Open
ghost opened this issue Oct 18, 2021 · 7 comments
Open

Determine tree artifact semantics #440

ghost opened this issue Oct 18, 2021 · 7 comments
Labels
design P2 An issue that should be worked on when time is available tree-artifacts
Milestone

Comments

@ghost
Copy link

ghost commented Oct 18, 2021

While implementing tree artifact support for pkg_zip, I've encountered a few cases that the rules_pkg community prolly needs to sort out so that handling is consistent among the various pkg_foo rules.

A few open questions from pkg_zip to seed discussion:

  • Collision detection
    • Should pkg_foo perform collision detection at runtime to detect (and fail?) when a tree artifact file collides with a file from either pkg_files or another tree artifact? (I believe yes.)
    • What about directories? I assume we should allow two distinct tree artifacts to both stage files to a single directory.
  • Permissions
    • Should files' permissions be inherited from the attributes bound to the corresponding pkg_files rule?
    • What about directories? Do we stick to defaults (0o755/root/root) and just allow override via pkg_mkdirs?
  • Empty directories
    • What do we do if a tree artifact contains an empty directory? Do we preserve it in the final packaged artifact or ignore it?
@nacl
Copy link
Collaborator

nacl commented Nov 1, 2021

What about directories? Do we stick to defaults (0o755/root/root) and just allow override via pkg_mkdirs?

If the TreeArtifacts are sufficiently complex, they will contain many directories with permissions that need to be set, and determining these reliably is difficult during build-time.

It may be better to augment pkg_files with an attribute to control TreeArtifact directory permissions.

@nacl
Copy link
Collaborator

nacl commented Dec 6, 2021

The collision handling problem has a solution proposed in #476 (comment).

@nacl nacl added design P2 An issue that should be worked on when time is available labels Dec 6, 2021
@nacl
Copy link
Collaborator

nacl commented Jan 24, 2022

I'm going to start tracking tasks here; will update as more tickets are created:

Collision handling

Should pkg_foo perform collision detection at runtime to detect (and fail?) when a tree artifact file collides with a file from either pkg_files or another tree artifact? (I believe yes.)
What about directories? I assume we should allow two distinct tree artifacts to both stage files to a single directory.

Agreed.

Prioritize higher. This is seen by external users.

Permissions management

Should files' permissions be inherited from the attributes bound to the corresponding pkg_files rule?

This seems to be the intuitive default, but allowing for overrides for individual subset(s) of files is likely a good thing. However, nobody has specifically asked for this, so deferring until it has increased importance. It may also require some cleverness in the package builder scripts.

What about directories? Do we stick to defaults (0o755/root/root) and just allow override via pkg_mkdirs?

Directories inside tree artifacts should already be adjustable with pkg_mkdirs. I feel that this is adequate and intuitive enough for this use case. No action should need to be taken, unless we want to provide this logic in pkg_files itself.

For the named directory itself, there is currently no possible override: using pkg_mkdirs results in a detected collision. I've seen this be a problem myself in my own internal use case, so of likely higher importance.

Empty directory management

What do we do if a tree artifact contains an empty directory? Do we preserve it in the final packaged artifact or ignore it?

Intuition says keep or control with a flag, but it may not be possible to do this right now. See bazelbuild/bazel#14094 for more details.

@aiuto
Copy link
Collaborator

aiuto commented Feb 12, 2022

I think the most sustainable method is to make raw tree artifacts (that is, if encountered in build_*.py)

  • ignore empty directories
  • give all files a default mode

If users need more, they should wrap tree artifact target in a pkg_files with
options to keep empties, and change permissions in various ways. Then we
can apply it to the generated manifest, rather than in each of the builders.

@aiuto aiuto added this to the 1.0 milestone Feb 14, 2022
@nickbreen
Copy link

What about directories? Do we stick to defaults (0o755/root/root) and just allow override via pkg_mkdirs?

Directories inside tree artifacts should already be adjustable with pkg_mkdirs. I feel that this is adequate and intuitive enough for this use case. No action should need to be taken, unless we want to provide this logic in pkg_files itself.

pkg_mkdirs does not appear to effective. It can add directories with the specified attributes, but does not appear to be able to change attributes.

I've put together a test-case for handling of tree artifacts: https://github.com/nickbreen/rules_pkg_bug.

My naive expectation would be that whatever the directory and file permissions are in the TreeArtifact's contents are preserved. This is codified in the above test case: the rule that generates a tree artifact also generates a file that lists the permissions of the directories and files in the tree artifact. In particular it's worth noting that at this point umask is honoured.

It appears that pkg_files applies the specified attributes to all files and directories in the tree artifact. With attributes defaulted, this results in directories losing their execute bits.

@aiuto
Copy link
Collaborator

aiuto commented Jul 19, 2022

Interesting points.

pkg_mkdirs does not appear to effective. It can add directories with the specified attributes, but does not appear to be able to change attributes.

Right. That is WAI, but the use case for changing attributes in a tree artifact is a real one.

I've put together a test-case for handling of tree artifacts: https://github.com/nickbreen/rules_pkg_bug.

Thanks.

My naive expectation would be that whatever the directory and file permissions are in the TreeArtifact's contents are preserved. This is codified in the above test case: the rule that generates a tree artifact also generates a file that lists the permissions of the directories and files in the tree artifact. In particular it's worth noting that at this point umask is honoured.

umask being honored may be true, but it is not sufficient. In general in Bazel, once you add remote cross platform build and caching, you can't trust the file permissions created by the execution host. We need to have rule capability to make what you want explicit.

It appears that pkg_files applies the specified attributes to all files and directories in the tree artifact. With attributes defaulted, this results in directories losing their execute bits.

That sounds like a few things:

  • defaulting dirs to not-execute is simply a bug
  • not having a distinct default mode for directories from files is probably a feature request

But the real thing missing is a compossible way to modify a set of files. That is, you should be able to have a rule which takes a tree artifact as input, then lets you filter out files, rebase things, reset owners and modes by regex, .... And that should take any other valid sources (filegroups, pkg_files) in addition to tree artifacts.

@nickbreen
Copy link

But the real thing missing is a compossible way to modify a set of files. That is, you should be able to have a rule which takes a tree artifact as input, then lets you filter out files, rebase things, reset owners and modes by regex, .... And that should take any other valid sources (filegroups, pkg_files) in addition to tree artifacts.

filter_directory would be the place to do this (and is where I initially assumed one would do it) it does it almost all of that already: it just does not have an attributes parameter.

umask being honored may be true, but it is not sufficient...

I mention umask only as its behaviour might suffice as a model (though I can't find the source I was reading earlier):

  • new directories: 0777 ∧ ¬umask
  • new files: 0666 ∧ ¬umask

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design P2 An issue that should be worked on when time is available tree-artifacts
Projects
None yet
Development

No branches or pull requests

3 participants