-
-
Notifications
You must be signed in to change notification settings - Fork 710
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New hook 'destroyed-symlinks' #511
Conversation
No idea why coverage report is different for pypy, will investigate it. |
I've finally managed to find a time to dig into pypy issue. The reason was the following. In CPython in code like this: print('something', file=open(somefile, 'a')) file object losts its last reference it this function call and it is destroyed forcing file buffers to flush. So, for some reason, this doesn't happen in pypy. Workaround would be using explicit |
).split(b'\0'): | ||
splitted = line.split(b' ') | ||
if splitted and splitted[0] == ORDINARY_CHANGED_ENTRIES_MARKER: | ||
# variable names are taken from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these variable names are terrible, I don't particularly care if they match upstream let's make them readable
import sys | ||
from operator import methodcaller | ||
from subprocess import check_call | ||
from subprocess import check_output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the rest of the project prefers not to from
import from the standard library
def find_destroyed_symlinks(autofix: bool) -> Sequence[bytes]: | ||
destroyed_links = [] | ||
for line in check_output( | ||
['git', 'status', '--porcelain=v2', '-z'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# by something like trailing-whitespace and/or | ||
# mixed-line-ending hooks so we need to go deeper | ||
index_size = int( | ||
check_output(['git', 'cat-file', '-s', hI]).strip(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a cmd_output
helper in pre_commit_hooks/util.py
that probably makes most of this easier to read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kwargs.setdefault('stderr', subprocess.PIPE)
This eats stderr output which is probably not the best idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's just the default, you can configure it otherwise. and actually it's probably better as a pipe so the errors only display when the tool breaks
index_size = int( | ||
check_output(['git', 'cat-file', '-s', hI]).strip(), | ||
) | ||
# Most filesystems limit path length to 4096 bytes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
windows allows paths longer than this with UNC naming
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, there definitely should be a limit in order to avoid wasting RAM in case when symlink was deliberately replaced with big file, which, on the other hand is not good idea in git so I don't know what is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can probably limit the read to the original symlink destination filename's length (that's how windows usually represents these on disk anyway iirc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can probably limit the read to the original symlink destination filename's length
Plus a few bytes for various newlines. That's a good idea.
if autofix: | ||
check_call([ | ||
'git', | ||
'update-index', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pre-commit's stance is to never modify the staging area, I believe this does that so it can't be allowed -- printing this command seems fine though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, update-index is redundant here and even harmful — if the file was modified by some other hooks, symlinks would be broken. Instead, I should just unstage the file. Is this allowed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just leave it and print a command for the user to run
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if this is opt-in feature (which it is)? May be rename --autofix
to --unstage
to make it more explicit?
Printing the command (and may be some help) if --autofix
is not enabled is good idea though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's kinda two problems:
- pre-commit won't notice the differences
- it might get auto committed when wrong
if there's a way to modify the file to look like it's the symlink again in the unstaged portion that would be best
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed --autofix
, now only the message with an instruction is displayed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if there's a way to modify the file to look like it's the symlink again in the unstaged portion that would be best
If it was possible, git would have done this. That's a problem.
.pre-commit-hooks.yaml
Outdated
entry: destroyed-symlinks | ||
language: python | ||
types: [file] | ||
pass_filenames: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should still take filenames and filter the results to that -- that way during merge conflict resolution if someone else makes a mistake this doesn't blame the user performing the merge
source_repo = tmpdir.join('src') | ||
os.makedirs(source_repo, exist_ok=True) | ||
test_repo = tmpdir.join('test') | ||
with source_repo.as_cwd(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that you can use git -C source_repo ...
instead of as_cwd()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know but this way it's a bit more readable IMO.
6a43b59
to
0ba63bb
Compare
*itertools.chain( | ||
('git', 'status', '--porcelain=v2', '-z', '--'), | ||
files, | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be:
cmd_output('git', 'status', '--porcelain=v2', '-z', '--', *files)
README.md
Outdated
Detects symlinks which are changed to regular files with a content of a path which that symlink was pointing to. | ||
This usually happens on Windows in case when user without a permission for creating symlinks clones repository with symlinks. | ||
The following argument is available: | ||
- `--autofix` - unstage detected broken symlinks so they won't be commited. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this option is gone now
mode_HEAD == PERMS_LINK, | ||
mode_index != PERMS_LINK, | ||
mode_index != PERMS_NONEXIST, | ||
)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all((
here is going to be slower than putting and
to join these conditions
|
||
|
||
def normalize_content(content: bytes) -> bytes: | ||
return content.rstrip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably just inline this
print('You should unstage affected files:') | ||
print( | ||
' git reset HEAD -- {}'.format( | ||
' '.join(map(shlex.quote, destroyed_links)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please don't use map
/ filter
/ reduce
-- a list comprehension is more readable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tastes differ.
print( | ||
'And retry commit. As a long term solution ' | ||
'you may try to explicitly tell git that your ' | ||
'envionment does not support symlinks:', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
environment
'you may try to explicitly tell git that your ' | ||
'envionment does not support symlinks:', | ||
) | ||
print(' git config --local core.symlinks false') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iirc --local
is the default for this command
pre_commit_hooks/util.py
Outdated
@@ -13,7 +15,11 @@ def added_files() -> Set[str]: | |||
return set(cmd_output(*cmd).splitlines()) | |||
|
|||
|
|||
def cmd_output(*cmd: str, retcode: Optional[int] = 0, **kwargs: Any) -> str: | |||
def cmd_output( | |||
*cmd: Union[str, bytes], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mixing here used to have problems on windows, I'm not sure if it's still a problem in python3 though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted. It was needed in the previous version where everything was bytes
. I don't have Windows nearby to check at the moment, however, tests weren't failing.
They use os.fsdecode (subprocess.py#L561) which, quoting the doc, decode the path-like filename from the filesystem encoding with 'surrogateescape' error handler, or 'strict' on Windows; return str
unchanged (good bad example of mixing str
and bytes
).
0ba63bb
to
fe30ede
Compare
… regular files with a content of a path which that symlink was pointing to; move zsplit to util
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! |
The hook detects symlinks which are changed to regular files with a content of a path which that symlink was pointing to.
This usually happens on Windows in case when user without a permission for creating symlinks clones repository with symlinks.
Some implementation notes:bytes
is used for filenames in order to avoid having any troubles with encodings. After all, the hook check for destroyed symlinks, not for bad file encoding.;)
P. S. If you don't think that this hook would be useful for general audience, I'm fine with hosting it in the separate repo. Just wanted to try.