Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid traversing NTFS junction points in git clean -dfx #1976

Closed
wants to merge 6 commits into from

Conversation

dscho
Copy link
Member

@dscho dscho commented Dec 11, 2018

This addresses #607.

@dscho dscho requested a review from benpeart December 11, 2018 13:17
@dscho
Copy link
Member Author

dscho commented Dec 11, 2018

This PR supersedes #1414 and #1415.

It seems to be not exactly rare on Windows to install NTFS junction
points (the equivalent of "bind mounts" on Linux/Unix) in worktrees,
e.g. to map some development tools into a subdirectory.

In such a scenario, it is pretty horrible if `git clean -dfx` traverses
into the mapped directory and starts to "clean up".

Let's just not do that. Let's make sure before we traverse into a
directory that it is not a mount point (or junction).

This addresses git-for-windows#607

Signed-off-by: Johannes Schindelin <[email protected]>
Windows' equivalent to "bind mounts", NTFS junction points, can be
unlinked without affecting the mount target. This is clearly what users
expect to happen when they call `git clean -dfx` in a worktree that
contains NTFS junction points: the junction should be removed, and the
target directory of said junction should be left alone (unless it is
inside the worktree).

Signed-off-by: Johannes Schindelin <[email protected]>
We will use this in the next commit to implement an FSCache-aware
version of is_mount_point().

Signed-off-by: Johannes Schindelin <[email protected]>
When FSCache is active, we can cache the reparse tag and use it directly
to determine whether a path refers to an NTFS junction, without any
additional, costly I/O.

Note: this change only makes a difference with the next commit, which
will make use of the FSCache in `git clean` (contingent on
`core.fscache` set, of course).

Signed-off-by: Johannes Schindelin <[email protected]>
The `git clean` command needs to enumerate plenty of files and
directories, and can therefore benefit from the FSCache.

Signed-off-by: Johannes Schindelin <[email protected]>
Copy link

@benpeart benpeart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gives me pause:

	/*
	 * we might have removed this as part of earlier
	 * recursive directory removal, so lstat() here could
	 * fail with ENOENT.
	 */
	if (lstat(abs_path.buf, &st))
		continue;

With fscache enabled, if we've already cached the result then this lstat() will return success even if the file has been "removed this as part of earlier recursive directory removal." If I'm reading the code right, that means we could end up trying to delete the file multiple times. This is after the interactive_main_loop() so I don't think we'd prompt the user for already deleted files but it looks like that could generate warnings in remove_dirs().

@dscho dscho force-pushed the dont-clean-junctions branch from 901b11b to e004e1b Compare December 11, 2018 14:55
@dscho
Copy link
Member Author

dscho commented Dec 11, 2018

If I'm reading the code right, that means we could end up trying to delete the file multiple times.

Hmm. That's a good point. We use fscache_lstat(), and that might end up accessing cached, no longer valid data...

Let me think whether we can avoid that somehow.

path.c Outdated Show resolved Hide resolved
}
current_dev = st.st_dev;

/* Now look at the current directoru */
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* Now look at the current directoru */
/* Now look at the parent directory */

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it really is the current directory, not the parent directory, that we're looking at... but of course, it is a directory, not a directoru... 😄

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code obscures this a bit, but I am positive that this is the parent directory. We append '/.' to path above, then we append '.' here, which makes '/..'.

If the '/.' above is not needed, I suggest removing it. Maybe a 'dirname' method (which strips the tail component from a path which must not end on '/..') would make this more readable.

strbuf_addstr(path, "/.");
if (lstat(path->buf, &st)) {
/*
* If we cannot access the current directory, we cannot say
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we cannot access the current directory, and return false here, will we then proceed to clean files in this directory even though it might actually be a mount point? If so, maybe we should act on the side of caution and check is_definitely_not_a_mount_point instead.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for Linux mount points, right? Let's ignore this is a corner case.

@dscho
Copy link
Member Author

dscho commented Feb 25, 2019

@drizzd since you offered to help... After @benpeart 's comments, I think the only sensible way forward is to teach FSCache a way to invalidate entries, and to do exactly that in mingw_rmdir() and mingw_unlink().

From my perspective, the trickiest thing here is to get the invalidation right in light of multi-threading. I.e. I would want to have placeholder entries for the invalidated entries that would trigger a re-read, but that would have to be done in a way that fscache_merge() does the right thing.

That's what is holding up this Pull Request.

Co-Authored-By: dscho <[email protected]>
@drizzd
Copy link

drizzd commented Mar 9, 2019

I did some digging regarding the lstat check. I think it is redundant today.

The check and the comment were added in d871c86 (contained in Git 1.5.4-rc0). At the time, the loop iterated directly over the directory entries found by read_directory. For example, it would loop over ["a/", "a/b"]. After removing "a" recursively, the check makes sure that we do not try to remove "a/b" again.

In 6b1db43 (contained in Git 2.13.2), a new method correct_untracked_entries was added which removes entries contained in other entries. For example, "a/b" is removed from ["a/", "a/b"]. Therefore, we will not encounter "a/b" after removing "a".

@dayknchung

This comment has been minimized.

@drizzd
Copy link

drizzd commented Jul 20, 2019

Just for the record, directory symlinks created with mklink /D have file mode S_IFCHR. Therefore, git-clean treats them as regular files. It will therefore remove them as it would remove symlinks on Linux. As far as I am concerned, I see no need to change this behavior.

@drizzd
Copy link

drizzd commented Jul 20, 2019

I created a new PR #2268 since I was unable to push to dscho/dont-clean-junctions. Maybe one of the prerequisites are not met.

From https://help.github.com/en/articles/committing-changes-to-a-pull-request-branch-created-from-a-fork:

You can only make commits on pull request branches that:

  • are opened in a repository that you have push access to and that were created from a fork of that repository
  • have permission granted from the pull request creator
  • don't have branch restrictions that will prevent you from committing

@PhilipOakley
Copy link

@drizzd

Just for the record, ... etc.

Given how much discussion symlinks have had, is there a particular place (e.g. wiki https://github.com/git-for-windows/git/wiki/Symbolic-Links - last update 1yr ago) folk should be directed to that will record the latest clarifications, implications, and internal implementations? e.g. a collation of the various comments now that we have a cleaner PR

@dscho
Copy link
Member Author

dscho commented Jul 22, 2019

Closing in favor of #2268

@dscho dscho closed this Jul 22, 2019
@dscho dscho deleted the dont-clean-junctions branch July 22, 2019 11:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants