Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Symlink support #156

Merged
merged 21 commits into from
May 24, 2015
Merged

Symlink support #156

merged 21 commits into from
May 24, 2015

Conversation

kblees
Copy link

@kblees kblees commented May 20, 2015

This with git-for-windows/msys2-runtime#10 should fix issue #117

I only tested this as admin with UAC disabled, so feedback from end users with UAC is welcome.

Design goals:

  • Symlinks should be disabled by default (due to the many quirks with Windows symlinks such as permissions, UAC, D/F conflicts, '..' path resolution etc.).
  • Existing functionality should not be affected if symlinks are disabled (i.e. guard symlink-specific code with if (has_symlinks)).
  • Minimize changes to core git code and the test suite.
  • Don't try to resolve symlinks manually (i.e. let Windows do the job by opening the symlink and using Get*ByHandle()).

To run the test-suite with symlinks enabled:

export MSYS=winsymlinks:nativestrict
make test # or prove...

kblees added 6 commits May 11, 2015 19:54
…size

strbuf_readlink() calls readlink() twice if the hint argument specifies the
exact size of the link target (e.g. by passing stat.st_size as returned by
lstat()). This is necessary because 'readlink(..., hint) == hint' could
mean that the buffer was too small.

Use hint + 1 as buffer size to prevent this.

Signed-off-by: Karsten Blees <[email protected]>
strbuf_readlink() refuses to read link targets that exceed PATH_MAX (even
if a sufficient size was specified by the caller).

As some platforms support longer paths, remove this restriction (similar
to strbuf_getcwd()).

Signed-off-by: Karsten Blees <[email protected]>
GetFileAttributes cannot handle paths with trailing dir separator. The
current [l]stat implementation calls GetFileAttributes twice if the path
has trailing slashes (first with the original path passed to [l]stat, and
and a second time with a path copy with trailing '/' removed).

With Unicode conversion, we get the length of the path for free and also
have a (wide char) buffer that can be modified.

Remove trailing directory separators before calling the Win32 API.

Signed-off-by: Karsten Blees <[email protected]>
With respect to symlinks, the current stat() implementation is almost the
same as lstat(): except for the file type (st_mode & S_IFMT), it returns
information about the link rather than the target.

Implement stat by opening the file with as little permissions as possible
and calling GetFileInformationByHandle on it. This way, all link resoltion
is handled by the Windows file system layer.

If symlinks are disabled, use lstat() as before, but fail with ELOOP if a
symlink would have to be resolved.

Signed-off-by: Karsten Blees <[email protected]>
With the new mingw_stat() implementation, do_lstat() is only called from
mingw_lstat() (with follow == 0). Remove the extra function and the old
mingw_stat()-specific (follow == 1) logic.

Signed-off-by: Karsten Blees <[email protected]>
@dscho
Copy link
Member

dscho commented May 21, 2015

Thank you, this looks really good. I set aside some time to review tomorrow.

@dscho
Copy link
Member

dscho commented May 23, 2015

Thank you, this looks really good. I set aside some time to review tomorrow.

The time I set aside of course got eaten up by unforeseen real life issues. Sorry about that. Reviewing now.

@dscho
Copy link
Member

dscho commented May 23, 2015

This Pull Request is really a pleasant gift, thank you so much! It is non-intrusive (only two test cases are touched, in a well-described manner), exactly as I hoped it would be possible.

Will let my VM run the tests now.

kblees added 15 commits May 24, 2015 00:17
When obtaining lstat information for reparse points, we need to call
FindFirstFile() in addition to GetFileInformationEx() to obtain the type
of the reparse point (symlink, mount point etc.). However, currently there
is no error handling whatsoever if FindFirstFile() fails.

Call FindFirstFile() before modifying the stat *buf output parameter and
error out if the call fails.

Note: The FindFirstFile() return value includes all the data that we get
from GetFileAttributesEx(), so we could replace GetFileAttributesEx() with
FindFirstFile(). We don't do that because GetFileAttributesEx() is about
twice as fast for single files. I.e. we only pay the extra cost of calling
FindFirstFile() in the rare case that we encounter a reparse point.

Note: The indentation of the remaining reparse point code will be fixed in
the next patch.

Signed-off-by: Karsten Blees <[email protected]>
Move S_IFLNK detection to file_attr_to_st_mode() and reuse it in fscache.

Implement DT_LNK detection in dirent.c and the fscache readdir version.

Signed-off-by: Karsten Blees <[email protected]>
Git typically doesn't trust the stat.st_size member of symlinks (e.g. see
strbuf_readlink()). However, some functions take shortcuts if st_size is 0
(e.g. diff_populate_filespec()).

In mingw_lstat() and fscache_lstat(), make sure to return an adequate size.

The extra overhead of opening and reading the reparse point to calculate
the exact size is not necessary, as git doesn't rely on the value anyway.

Signed-off-by: Karsten Blees <[email protected]>
Dynamic loading of DLL functions is duplicated in several places.

Add a set of macros to simplify the process.

Signed-off-by: Karsten Blees <[email protected]>
The retry pattern is duplicated in three places. It also seems to be too
hard to use: mingw_unlink() and mingw_rmdir() duplicate the code to retry,
and both of them do so incompletely. They also do not restore errno if the
user answers 'no'.

Introduce a retry_ask_yes_no() helper function that handles retry with
small delay, asking the user, and restoring errno.

mingw_unlink: include _wchmod in the retry loop (which may fail if the
file is locked exclusively).

mingw_rmdir: include special error handling in the retry loop.

Signed-off-by: Karsten Blees <[email protected]>
Symlinks on Windows don't work the same way as on Unix systems. E.g. there
are different types of symlinks for directories and files, creating
symlinks requires administrative privileges etc.

By default, disable symlink support on Windows. I.e. users explicitly have
to enable it with 'git config [--system|--global] core.symlinks true'.

The test suite ignores system / global config files. Allow testing *with*
symlink support by checking if native symlinks are enabled in MSys2 (via
'MSYS=winsymlinks:nativestrict').

Reminder: This would need to be changed if / when we find a way to run the
test suite in a non-MSys-based shell (e.g. dash).

Signed-off-by: Karsten Blees <[email protected]>
_wunlink() / DeleteFileW() refuses to delete symlinks to directories. If
_wunlink() fails with ERROR_ACCESS_DENIED, try _wrmdir() as well.

Signed-off-by: Karsten Blees <[email protected]>
MSVCRT's _wrename() cannot rename symlinks over existing files: it returns
success without doing anything. Newer MSVCR*.dll versions probably do not
have this problem: according to CRT sources, they just call MoveFileEx()
with the MOVEFILE_COPY_ALLOWED flag.

Get rid of _wrename() and call MoveFileEx() with proper error handling.

Signed-off-by: Karsten Blees <[email protected]>
If symlinks are enabled, resolve all symlinks when changing directories,
as required by POSIX.

Note: Git's real_path() function bases its link resolution algorithm on
this property of chdir(). Unfortunately, the current directory on Windows
is limited to only MAX_PATH (260) characters. Therefore using symlinks and
long paths in combination may be problematic.

Note: GetFinalPathNameByHandleW() was introduced with symlink support in
Windows Vista. Thus, for compatibility with Windows XP, we need to load it
dynamically and behave gracefully if it isnt's available.

Signed-off-by: Karsten Blees <[email protected]>
Implement readlink() by reading NTFS reparse points. Works for symlinks
and directory junctions. If symlinks are disabled, fail with ENOSYS.

Signed-off-by: Karsten Blees <[email protected]>
Implement symlink() that always creates file symlinks. Fails with ENOSYS
if symlinks are disabled or unsupported.

Note: CreateSymbolicLinkW() was introduced with symlink support in Windows
Vista. For compatibility with Windows XP, we need to load it dynamically
and fail gracefully if it isnt's available.

Signed-off-by: Karsten Blees <[email protected]>
Symlinks on Windows have a flag that indicates whether the target is a file
or a directory. Symlinks of wrong type simply don't work. This even affects
core Win32 APIs (e.g. DeleteFile() refuses to delete directory symlinks).

However, CreateFile() with FILE_FLAG_BACKUP_SEMANTICS doesn't seem to care.
Check the target type by first creating a tentative file symlink, opening
it, and checking the type of the resulting handle. If it is a directory,
recreate the symlink with the directory flag set.

It is possible to create symlinks before the target exists (or in case of
symlinks to symlinks: before the target type is known). If this happens,
create a tentative file symlink and postpone the directory decision: keep
a list of phantom symlinks to be processed whenever a new directory is
created in mingw_mkdir().

Limitations: This algorithm may fail if a link target changes from file to
directory or vice versa, or if the target directory is created in another
process.

Signed-off-by: Karsten Blees <[email protected]>
The SVN library doesn't seem to support symlinks, even if symlinks are
enabled in MSys and Git. Use 'cp' instead of 'ln -s'.

Signed-off-by: Karsten Blees <[email protected]>
In test msysgit#49, $(pwd) must match $(readlink), which is an MSys utility.

Signed-off-by: Karsten Blees <[email protected]>
@kblees
Copy link
Author

kblees commented May 24, 2015

Updated the PR with the fixes suggested by @dscho. Previous version (and per-commit discussion) can be found here: https://github.com/kblees/git/commits/kb/symlinks-v0.

@dscho
Copy link
Member

dscho commented May 24, 2015

Thank you so much! My machine is currently building the new msys2-runtime version, which is required for this to work, but I will merge now already lest I forget ;-)

dscho added a commit that referenced this pull request May 24, 2015
@dscho dscho merged commit 49b9363 into git-for-windows:master May 24, 2015
@dscho
Copy link
Member

dscho commented May 24, 2015

msys2-runtime is uploaded.

dscho added a commit that referenced this pull request Jan 22, 2018
dscho added a commit that referenced this pull request Feb 16, 2018
dscho added a commit that referenced this pull request Mar 23, 2018
dscho added a commit that referenced this pull request Apr 3, 2018
dscho added a commit that referenced this pull request May 29, 2018
dscho added a commit that referenced this pull request May 29, 2018
PKRoma pushed a commit to PKRoma/git-for-windows that referenced this pull request Jun 22, 2018
PKRoma pushed a commit to PKRoma/git-for-windows that referenced this pull request Jun 22, 2018
PKRoma pushed a commit to PKRoma/git-for-windows that referenced this pull request Jun 22, 2018
dscho added a commit that referenced this pull request Aug 22, 2018
dscho added a commit that referenced this pull request Aug 23, 2018
dscho added a commit that referenced this pull request Aug 23, 2018
dscho added a commit that referenced this pull request Aug 23, 2018
jamill pushed a commit to jamill/git that referenced this pull request Aug 28, 2018
jamill pushed a commit to jamill/git that referenced this pull request Aug 28, 2018
jamill pushed a commit to jamill/git that referenced this pull request Sep 5, 2018
git-for-windows-ci pushed a commit that referenced this pull request Sep 10, 2018
jamill pushed a commit to jamill/git that referenced this pull request Sep 11, 2018
git-for-windows-ci pushed a commit that referenced this pull request Sep 24, 2018
dscho added a commit that referenced this pull request Oct 10, 2018
dscho added a commit that referenced this pull request Oct 12, 2018
jeffhostetler pushed a commit to jeffhostetler/git that referenced this pull request Jul 9, 2019
This PR matches the commits that Junio is tracking from [this patch series](https://public-inbox.org/git/[email protected]/).
The only commit required that is extra is a merge between `vfs-2.22.0`
and `ds/commit-graph-incremental` because we took `ds/close-object-store`
a bit earlier (we needed it). Also, there was a small conflict in `packfile.[c|h]`
related to `ds/multi-pack-index-expire`.

Here is a copy of the cover letter for that series:

The commit-graph is a valuable performance feature for repos with large
commit histories, but suffers from the same problem as git repack: it
rewrites the entire file every time. This can be slow when there are
millions of commits, especially after we stopped reading from the
commit-graph file during a write in 43d3561 (commit-graph write: don't die
if the existing graph is corrupt).

Instead, create a "chain" of commit-graphs in the
.git/objects/info/commit-graphs folder with name graph-{hash}.graph. The
list of hashes is given by the commit-graph-chain file, and also in a "base
graph chunk" in the commit-graph format. As we read a chain, we can verify
that the hashes match the trailing hash of each commit-graph we read along
the way and each hash below a level is expected by that graph file.

When writing, we don't always want to add a new level to the stack. This
would eventually result in performance degradation, especially when
searching for a commit (before we know its graph position). We decide to
merge levels of the stack when the new commits we will write is less than
half of the commits in the level above. This can be tweaked by the
`--size-multiple` and `--max-commits` options.

The performance is necessarily amortized across multiple writes, so I tested
by writing commit-graphs from the (non-rc) tags in the Linux repo. My test
included 72 tags, and wrote everything reachable from the tag using 
`--stdin-commits`. Here are the overall perf numbers:

```
write --stdin-commits:         8m 12s
write --stdin-commits --split:    28s
write --split && verify --shallow: 60s
```
jakubgs added a commit to status-im/infra-role-beacon-node-windows that referenced this pull request Jul 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants