Make scan --ignore FILENAME apply to blobs in Git repositories #17

Closed
bradlarsen opened this issue Dec 16, 2022 · 3 comments
Labels: content discovery (related to enumerating or specifying content to scan); enhancement (new feature or request)

Comments

@bradlarsen
Collaborator

The scan command currently has a --ignore FILENAME option, which specifies a gitignore-style rules file of paths to ignore when scanning. Those ignore rules are applied only to plain files that are scanned, not to blobs found within Git repositories. They should also apply to Git blobs.

This is probably dependent on #16 being completed first.

@bradlarsen added the "enhancement" label Dec 16, 2022
@bradlarsen self-assigned this Dec 30, 2022
@bradlarsen added the "content discovery" label Apr 5, 2023
@bradlarsen added this to the v0.15.0 milestone Aug 23, 2023
@bradlarsen removed this from the v0.15.0 milestone Jan 18, 2024
@bradlarsen
Collaborator Author

This feature could be useful when scanning monorepos on a per-project basis: #119

@bradlarsen
Collaborator Author

To implement this today, the most expedient approach:

Some complications:

  • It seems like the GitIgnore struct would have to be duplicated between the filesystem enumerator and git enumerator, since the ignore crate doesn't expose the one that it uses
  • There are some corner cases in the semantics. If a path cannot be determined for a blob for whatever reason, should there be a warning?
  • The best that Nosey Parker could do is filter against the pathname for a blob from the commit where it was first introduced. But a blob may have multiple different paths in its entire history; only the first pathname would be used when making the "should ignore?" decision for the blob.
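
To make the last two complications concrete, here is a minimal sketch of the "should ignore?" decision. It is not Nosey Parker's code: `first_paths`, `decide`, `Decision`, the blob IDs, and the `starts_with("vendor/")` predicate are all hypothetical stand-ins, and the commit list is assumed to be ordered oldest first.

```rust
use std::collections::HashMap;

/// Outcome of the "should ignore?" decision for one blob (hypothetical type).
#[derive(Debug, PartialEq)]
enum Decision {
    Scan,
    Ignore,
    /// No path could be determined; a caller might log a warning here.
    ScanPathUnknown,
}

/// Record the first path seen for each blob.
/// Assumes `commits` is ordered oldest first; each commit lists (blob id, path).
fn first_paths(
    commits: &[Vec<(&'static str, &'static str)>],
) -> HashMap<&'static str, &'static str> {
    let mut first = HashMap::new();
    for commit in commits {
        for &(blob, path) in commit {
            // Later paths for the same blob are deliberately not recorded.
            first.entry(blob).or_insert(path);
        }
    }
    first
}

/// Decide using only the first known path; `is_ignored` stands in for the
/// real ignore-rule matcher.
fn decide(first_path: Option<&str>, is_ignored: impl Fn(&str) -> bool) -> Decision {
    match first_path {
        Some(p) if is_ignored(p) => Decision::Ignore,
        Some(_) => Decision::Scan,
        None => Decision::ScanPathUnknown,
    }
}

fn main() {
    let commits = vec![
        vec![("b1", "vendor/lib.js"), ("b2", "src/main.rs")],
        // b1 later appears under a second path, which is never consulted.
        vec![("b1", "src/lib.js")],
    ];
    let first = first_paths(&commits);
    let is_ignored = |p: &str| p.starts_with("vendor/");
    for blob in ["b1", "b2", "b3"] {
        let decision = decide(first.get(blob).copied(), &is_ignored);
        println!("{blob}: {decision:?}");
    }
}
```

In this sketch, `b1` is ignored even though it also lives at `src/lib.js` in a later commit, and `b3` (no known path) falls into a separate case where a warning could be emitted.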

@bradlarsen
Collaborator Author

There is also a general oddity in Nosey Parker's ignore rules. They are .gitignore-style rules, whose semantics are relative to the directory that contains the .gitignore file. However, Nosey Parker uses this format to specify global rules that are not intended to be directory-specific. The end result is that, essentially, all Nosey Parker ignore rules have to start with **/.
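
The anchoring behavior can be illustrated with a deliberately simplified model of gitignore matching. This is a sketch, not the `ignore` crate's implementation: `rule_matches` is a hypothetical helper that handles only literal rules (no wildcards other than a leading `**/`, no trailing slashes, no negation), and the paths are made up.

```rust
/// Simplified gitignore-style matching for literal rules only.
/// A rule containing a slash is anchored to the root directory, unless it
/// starts with "**/", in which case it may match at any depth. A rule with
/// no slash at all may also match at any depth, as in gitignore.
fn rule_matches(rule: &str, path: &str) -> bool {
    let path_parts: Vec<&str> = path.split('/').collect();
    let (any_depth, literal) = match rule.strip_prefix("**/") {
        Some(rest) => (true, rest),
        None => (!rule.contains('/'), rule),
    };
    let rule_parts: Vec<&str> = literal.split('/').collect();
    // Anchored rules are only tried at the start of the path.
    let starts = if any_depth { path_parts.len() } else { 1 };
    (0..starts).any(|start| path_parts[start..].starts_with(&rule_parts))
}

fn main() {
    let nested = "projects/a/vendor/generated/keys.txt";
    // Anchored: only matches when the directory sits at the root.
    println!("{}", rule_matches("vendor/generated", nested));
    // With the "**/" prefix the same rule applies at any depth.
    println!("{}", rule_matches("**/vendor/generated", nested));
}
```

Under this model, a rule such as `vendor/generated` does not apply to a nested project directory, which is why a rule meant to be global ends up needing the `**/` prefix.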

Perhaps the entire path-based ignore mechanism needs some rethinking in Nosey Parker.
