Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.gitignore is not respected? #199

Closed
schuellerf opened this issue Jan 15, 2025 · 13 comments
Closed

.gitignore is not respected? #199

schuellerf opened this issue Jan 15, 2025 · 13 comments
Labels
S: triage Issue needs triage.

Comments

@schuellerf
Copy link
Contributor

It seems like pyspelling does not respect .gitignore.

Are there plans to implement something like --respect_gitignore or make this the default not to spellcheck files & directories listed in .gitignore?

@gir-bot gir-bot added the S: triage Issue needs triage. label Jan 15, 2025
@facelessuser
Copy link
Owner

.gitignore isn't really for the same type of things. I guess generally, if you are ignoring stuff from the project, you might want to also avoid spell checking those files, but that is not always the case. I often spell check by built documentation which I also ignore from my project.

The PySpelling is generally versioning system agnostic. I'm not sure that I have any immediate plans to change that. It does start to open a door for requests to support other version systems as well: SVN, Mercurial, Bazaar, etc.

Generally, narrowing your file patterns is normally sufficient, but I don't really know anything about your environment, so it is hard to really advise beyond that.

@schuellerf
Copy link
Contributor Author

I got your point!
The main reason for the request was, that the behavior is different to other pre-commit hooks.

With pass_filenames: false in the .pre-commit-hooks.yaml this leads to maintaining the exceptions in .gitignore and .spellcheck.yml -> sources: to have the same behavior for all pre-commit hooks.

… and you are totally right, this only works for git, the same as pre-commit itself 🤔

@facelessuser
Copy link
Owner

Yeah, but the precommit hook was low maintenance to add.

@schuellerf
Copy link
Contributor Author

I can also suggest a PR if that helps 😅

@facelessuser
Copy link
Owner

facelessuser commented Jan 15, 2025

I'm inclined to probably wait to see if this was a frequent requested thing first. I don't think the project is hindered by the lack of this functionality, and if we were to start down this road, I'd want to get it right. That's a lot of testing.

I definitely have experience in this area as I have a file glob lib that I maintain and that is already used in this library, but that doesn't mean I'm anxious to implement and maintain a gitignore implementation that will definitely add to the maintenance cost. I'm sure the spec probably has some specific quirks that would need to be matched.

@schuellerf
Copy link
Contributor Author

Sure, no time-pressure here :-D

Actually I'd just open .gitignore and use pathspec.PathSpec.from_lines() or similar.

I thought about this some more and it might be easier to just support handing over a list of [FILES …] on the command line and using pass_filenames: true (default) from pre-commit. That would be even version control agnostic and pyspelling can then check single files in addition to the --source option. Should I propose a PR for that?

@facelessuser
Copy link
Owner

facelessuser commented Jan 16, 2025

So, I'll be honest, I'm still skeptical about the overall usefulness of this request.

  1. In my experience with the library, what you are describing is not really something I run into. Obviously, I can't speak for everyone, but I haven't really been presented with any real-world scenarios that people are struggling with.

    PySpelling lets you setup specific jobs, often for a very specific file type as you want to setup the filters appropriately for that filetype. 99% of .gitignore, in my experience, doesn't even apply to the job I'm doing. It's often a few config files or a specific folder. Since PySpelling requires an explicit source and/or pattern, it's rarely looking at most of the ignored files anyway as the included patterns greatly narrow the scope.

    I doubt most people are feeding their entire repository to PySpelling. Most are explicitly saying **/*.filetype|!ignore/these-few-exceptions. I can't speak for everyone though. Would utilizing .gitignore possibly alleviate some one-off exclusions? Sure! Is it really a large problem currently? I honestly down know.

  2. Some jobs may desire to look at some build generated files for processing.For instance, I build my documentation and spellcheck the HTML as I find that more reliable and easier to filter than raw Markdown. Those are definitely files that usually are in my .gitignore. I can definitely see some not wanting this by default depending on how they use it.

  3. Changing pass_filenames: false to true by default may break other user's expectations.

  4. I imagine with the approach of feeding in all files to source, you may run into a command line limit issue on some systems? Is it really more efficient to send the entire repository to PySpelling on command line to then just filter out one file type with a few exceptions if required?

Generally, I'm rarely running into an issue where I"m like, "oh, I wish PySpelling just ignored what was in my .gitignore". Granted, it would help if I had people highlight some pains they are experiencing that require this as I've yet to see real-world cases. Right now, this feels like a hypothetical.

Before we talk PR, I guess I'd like to hear about real-world pains to first justify such a change.

I think some testing and evaluation can be done on the side to alleviate said pains and see how practical it turned out to be. It may make a convincing argument.

Assuming it was decided, yes there is a real sizable problem to be solved, a tested solution has been evaluated and tested yielding good results, then I think we could talk PR.

If I had to guess, and we wanted .gitignore logic, an option for a job to include gitignore logic per job, or even a global option for those certain they want it everywhere may be more useable than forcing everything through source on command line. Whether pathspec or a homegrown version based on what we already have available (I don't think it would be too difficult, but it would require testing) could then be decided.

@schuellerf
Copy link
Contributor Author

I agree!
The request is too specific and specifying some directories twice is not a big deal. I'll close this issue.

Just one remark on the pass_filenames. pre-commit usually depends on this if you have a big project and start introducing pre-commit it would only check the files you change within your commit/PR, hand over only the files you did change and not complain about the whole project.
So this might be a request that will arrive if people use pyspelling from within pre-commit more often, but this discussion would definitely be off-topic of this issue…

@schuellerf schuellerf closed this as not planned Won't fix, can't repro, duplicate, stale Jan 16, 2025
@facelessuser
Copy link
Owner

@schuellerf I actually think the "only changed" file thing is actually not a bad idea, though it would require me to start bringing version control support to some degree. I have an experiment here that did just that: #193, but I haven't pulled the trigger on anything like that yet.

Some people use PySpelling in an action (https://github.com/rojopolis/spellcheck-github-actions). They've taken to providing a wrapper to provide "only changed" files for now.

I definitely recognize that there may be a desire for something like this. I've generally been waiting to see how things develop and see how people use the tool. Originally, this was just written for myself, but made public. It surprised me how much use outside of myself it has received 🙂. I'm happy to add new things, but I often like to see how real desire there is for certain things before increasing maintenance cost.

@schuellerf
Copy link
Contributor Author

Maybe really just supporting a list of files on the command line will suffice for pre-commit pass_filenames and will solve the "only changed" files with pre-commit and other use cases?
pyspelling --name myTask File1 File2 …

@schuellerf
Copy link
Contributor Author

(this way pre-commit does the heavy lifting to check if a file was changed or not)

@facelessuser
Copy link
Owner

Maybe really just supporting a list of files on the command line will suffice for pre-commit pass_filenames and will solve the "only changed" files with pre-commit and other use cases?
pyspelling --name myTask File1 File2 …

You can overload command line limits this way. If they were dumped to a file you could read in, or something like that, it would be more practical.

@schuellerf
Copy link
Contributor Author

I fully agree - even stdin would be better.
That's just the default behavior of pre-commit https://pre-commit.com/#hooks-pass_filenames
and handled there (https://github.com/pre-commit/pre-commit/blob/main/pre_commit/xargs.py#L76) 🤷‍♂️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S: triage Issue needs triage.
Projects
None yet
Development

No branches or pull requests

3 participants