lib.fileset should have a way to filter directories #271307
Comments
I would love to have a way to do this! I'm slightly partial to the Maybe we could change the name slightly to be something like |
That might work. Here's another alternative though:
This would return all files within subdirectories named "target". Does it satisfy the requirements?
The problem with this one is that it's not as powerful as a predicate anymore. If you wanted to only get files within directories like But, I think that might be fine, all use cases I know of only need a constant directory name (like Would this work for you @JRMurr? |
I don't like to bunch up many operations into a single call, because composition tends to be more flexible and self-explanatory. Nonetheless, maybe we have a case here where combined functionality is not to the detriment of flexibility, and does not result in ambiguity? (Note: check again)
    fileset.filterNodes {
      directory = { name, hasExt, ... }: ... /*bool*/;
      file = { name, hasExt, ... }: ... /*bool*/;
    }
Still not getting good vibes tbh.
Perhaps by joining these filter functions, the still somewhat arbitrary meaning appears more consistent, and is easier to get right? |
This requires quite a long stretch of creativity to match it to the original problem. I would not be opposed to a helper function that takes care of this elaborate thinking, i.e. something like
Hmm, I just noticed that I got the plural |
I've also thought about this. Here's how I think it could look:
    fileset.directoryFilter {
      # Whether to recurse into a directory
      recurseInto = { name, components, subpath, ... }: ... /*bool*/;
      # Whether to include a file
      includeFile = { name, dirComponents, dirSubpath, components, subpath, hasExt, type, ... }: ... /*bool*/;
    } ./.
Then we could have
    withoutDirectories = dirs: directoryFilter {
      recurseInto = { name, ... }: ! elem name dirs;
      includeFile = _: true;
    }
    withoutDirectories [ "target" ] ./.
Furthermore we can easily implement
    fileFilter = pred: directoryFilter {
      recurseInto = { ... }: true;
      includeFile = pred;
    }
And this "fixes" #269517 because it can be used to implement a replacement for It's actually not even that bad, because three major problems with
I'm liking this! |
Hi @infinisil, I've been taking a look at this proposal based on your comments in #306371 This basically seems like an even more powerful |
@andrewhamon Nice! Considering that there's already 3 👍's, I think we should definitely go for it. Would you be up to PRing that? Would be very appreciated, I'll review it :D |
Yea, I'd be happy to! I could use a little clarification on the intention behind the predicate args you listed. Also... do we need all of those? Or could we scope it down to a minimal set? I think a minimal set would be:
    recurseInto = { name, components, ... }: ... /*bool*/;
    includeFile = { name, components, type, ... }: ... /*bool*/;
I'm assuming that Also, I do wonder if "components" is the best name. It makes sense to me, but maybe "pathComponents", "pathSegments" or "pathParts" could be more clear and less generic. |
Thinking about this a bit more, I think it should be just like this for now:
    recurseInto = { components, ... }: <...>;
    includeFile = { name, hasExt, type, ... }: <...>;
We shouldn't pass a I think it should be |
If To reiterate, my use case is "select any I thought the main reservation is the performance footgun for use cases that could be much better solved in a different way - with the proposed |
i.e. this is extremely stupid, but its performance should not be surprising to anyone:
    # include all files in some/dir
    fileset.directoryFilter {
      recurseInto = { ... }: true;
      includeFile = { components, ... }: components == ["some" "dir"];
    } ./.
I am sure that no matter the API, people can invent a stupid way to use it. Please don't let that hold back knowledgeable users. |
Ah sorry, I didn't fully think through your use case! It turns out it's trickier than I anticipated, but I believe it would still be doable with the proposed interface:
    rec {
      # All files under a path that are not within a directory of a specific name
      # Doesn't have to recurse into the directories not included
      withoutDirectories = dirs:
        directoryFilter {
          recurseInto = { components, ... }: ! elem (last components) dirs;
          includeFile = { ... }: true;
        };
      # By inverting using `difference`, we get all files _within_ specific directories!
      # Unfortunately this implementation does need to recurse into all such directories
      withinDirectories = dirs: path:
        difference path (withoutDirectories dirs path);
      # All Rust files under any `bin`
      rustUnderBin = intersection
        (withinDirectories [ "bin" ] ./.)
        (fileFilter (file: file.hasExt "rs") ./.);
    }
However, we can actually have a more efficient implementation of
    # Filter out all `.git` directories, without recursing into any of them!
    difference ./. (withinDirectories [ ".git" ] ./.)
So instead of |
My initial take is:
Just to clarify, you mean some other implementation than the one you demonstrated? That makes sense, though I can't quite picture what that faster impl would look like. I need to mull this over a bit. |
I feel like the potential speedup from an optimal That said, there is a fair bit about |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/filtering-source-trees-with-nix-and-nixpkgs/19148/6 |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/easy-source-filtering-with-file-sets/29117/23 |
Problem
Sometimes one needs to filter out directories with a specific name throughout the source directory.
For example, you might have a structure like
And you want to remove all of the `target` directories.
Currently, the `lib.fileset` library does not have any function to make this work. To do that currently you need to work around it with either:
- `readDir` and `lib.fileset.unions`
- I don't want to write out
Requirements
Potential solutions
Directory name predicate
This creates a file set containing all files within directories named "target". E.g. `./foo/target/some/file.txt` would be included because one of `foo`, `target` and `some` satisfies the predicate.
Does it satisfy the requirements?
✔️ It doesn't need to recurse into the filtered out directories, but the selection needs to be inverted to actually filter out the ones with `target`:
❌ This is not intuitive, because one might expect an inverted predicate to return the complement set:
But this doesn't work: This would include `./foo/target/some/file.txt`, since `foo` satisfies the predicate.
Give the `fileFilter` predicate access to the file's directories
Like already described in #269504. This would allow filtering certain directories:
This way `./foo/target/some/file.nix` would not be included, because `target` is in `./foo/target/some`.
Does it satisfy the requirements?
Does a solution satisfying both requirements exist?
I don't know yet
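For reference, the `readDir` and `lib.fileset.unions` workaround mentioned under Problem could look roughly like the sketch below. It is only an illustration: `withoutDirsNamed` is a hypothetical helper name, not part of `lib.fileset`, and the sketch assumes `<nixpkgs>` is available on the Nix path and that symlinks and other non-directory entries can be treated as plain files.

```nix
# Sketch of the manual workaround; `withoutDirsNamed` is a hypothetical helper.
{ lib ? import <nixpkgs/lib> }:
let
  inherit (lib) fileset;

  # All files under `dir`, never descending into directories whose
  # base name is listed in `names`.
  withoutDirsNamed = names: dir:
    let
      visit = name: type:
        if type == "directory" then
          if builtins.elem name names
          then [ ]  # excluded: its contents are never read
          else [ (withoutDirsNamed names (dir + "/${name}")) ]
        else
          # Assumption: everything that is not a directory is kept as a file.
          [ (dir + "/${name}") ];
    in
    fileset.unions
      (lib.concatLists (lib.mapAttrsToList visit (builtins.readDir dir)));
in
fileset.toSource {
  root = ./.;
  fileset = withoutDirsNamed [ "target" ] ./.;
}
```

Because the recursion is written by hand, excluded directories are skipped without being read, but every other directory is still listed, and the traversal logic has to be repeated in each project that needs it.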
This issue is sponsored by Antithesis ✨