ENH copyright-notice: check in the first 4096 bytes instead of 1024 #11927
Seems okay to me. Thanks for contributing! cc @MichaReiser
Looks okay, although I'm not sure why the limit even exists. The only performance issue that I can see is if the regex uses a wildcard at the start. Any regex without a wildcard at the start should end by just testing a few characters.
Okay, it seems that the somewhat arbitrary limit comes from the original lint rule. I think the limit makes sense there because that rule actually reads the file. The situation for ruff is different because we already have the file in memory. Anyway, thanks for contributing!
I'm not sure either. cc @BurntSushi. There's not a lot of context in the original implementation discussion at #4701.
Here's the regex:
Specifically:
And the search is using
Summary
Related to #5306.
The check currently looks only at the first 1024 bytes of a file, which is really not enough when the file starts with a docstring.
A more proper fix might be needed, which might be more complex (and I don't have the Rust skills to implement that). But this temporary "fix" might enable more users to use this rule.
Context: We want to use this rule in https://github.com/scikit-learn/scikit-learn/ and we got blocked by this hardcoded limit (which TBH took us quite a while to track down as the cause of the failure, since it's not documented).
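To illustrate the failure mode described above, here is a minimal Python sketch (not ruff's implementation). The pattern `NOTICE_RE`, the helper `has_notice`, and the sample file contents are all hypothetical and chosen only for the example; the sketch just shows how a notice placed after a long module docstring can fall outside a 1024-byte window while still fitting within 4096 bytes.

```python
import re

# Hypothetical pattern for illustration only; NOT ruff's actual default regex.
NOTICE_RE = re.compile(r"(?i)copyright\s+(\(c\)\s+)?\d{4}")


def has_notice(source: str, limit: int) -> bool:
    """Return True if the pattern matches within the first `limit` bytes."""
    # Slice on bytes (the limit is in bytes), dropping any multi-byte
    # character that the cut happens to split.
    head = source.encode("utf-8")[:limit].decode("utf-8", errors="ignore")
    return NOTICE_RE.search(head) is not None


# A module whose docstring alone is longer than 1024 bytes,
# followed by the copyright notice.
docstring = '"""' + "Module documentation. " * 80 + '"""\n'
source = docstring + "# Copyright (c) 2024 Example Authors\n"

print(len(docstring.encode("utf-8")))  # byte offset where the notice starts
print(has_notice(source, 1024))        # notice lies beyond the old window
print(has_notice(source, 4096))        # notice lies within the new window
```

With this sample, the notice starts well past byte 1024, so only the larger window finds it. A docstring longer than 4096 bytes would still hide the notice, which is why this change is only a temporary fix.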
Test Plan
This is already kind of tested; I modified the existing test to use the new byte limit.