Rebase to v2.48.1 #725
Calculate the number of symrefs, loose vs packed, and the maximal/accumulated length of local vs remote branches. Signed-off-by: Jeff Hostetler <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
In ac8acb4 (sparse-index: complete partial expansion, 2022-05-23), 'expand_index()' was updated to expand the index to a given pathspec. However, the 'path_matches_pattern_list()' method used to facilitate this has the side effect of initializing or updating the index hash variables ('name_hash', 'dir_hash', and 'name_hash_initialized'). This operation is performed on 'istate', though, not 'full'; as a result, the initialized hashes are later overwritten when copied from 'full'. To ensure the correct hashes are in 'istate' after the index expansion, change the arg used in 'path_matches_pattern_list()' from 'istate' to 'full'. Note that this does not fully solve the problem. If 'istate' does not have an initialized 'name_hash' when its contents are copied to 'full', initialized hashes will be copied back into 'istate' but 'name_hash_initialized' will be 0. Therefore, we also need to copy 'full->name_hash_initialized' back to 'istate' after the index expansion is complete. Signed-off-by: Victoria Dye <[email protected]>
Add test case to demonstrate that `git index-pack -o <idx-path> pack-path` fails if <idx-path> does not end in ".idx" when `--rev-index` is enabled. In e37d0b8 (builtin/index-pack.c: write reverse indexes, 2021-01-25) we learned to create `.rev` reverse indexes in addition to `.idx` index files. The `.rev` file pathname is constructed by replacing the suffix on the `.idx` file. The code assumes a hard-coded "idx" suffix. In a8dd7e0 (config: enable `pack.writeReverseIndex` by default, 2023-04-12) reverse indexes were enabled by default. If the `-o <idx-path>` argument is used, the index file may have a different suffix. This causes an error when it tries to create the reverse index pathname. The test here demonstrates the failure. (The test forces `--rev-index` to avoid interaction with `GIT_TEST_NO_WRITE_REV_INDEX` during CI runs.) Signed-off-by: Jeff Hostetler <[email protected]>
With this commit, we gather statistics about the sizes of commits, trees, and blobs in the repository, and then present them in the form of "hexbins", i.e. log(16) histograms that show how many objects fall into the 0..15 bytes range, the 16..255 range, the 256..4095 range, etc. For commits, we also show the total count grouped by the number of parents, and for trees we additionally show the total count grouped by number of entries in the form of "qbins", i.e. log(4) histograms. Signed-off-by: Jeff Hostetler <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
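The bucketing described above can be sketched in a few lines. This is only an illustration of log(16) and log(4) binning, not the command's actual implementation; the function names `log_bin`, `histogram`, and `bucket_range` are made up for this example.

```python
def log_bin(value: int, base: int) -> int:
    """Return the bucket index: 0 for [0, base), 1 for [base, base**2), and so on."""
    if value < 0:
        raise ValueError("sizes and counts are non-negative")
    bucket = 0
    while value >= base:
        value //= base
        bucket += 1
    return bucket


def histogram(values, base: int) -> dict:
    """Count how many values fall into each logarithmic bucket."""
    counts = {}
    for value in values:
        bucket = log_bin(value, base)
        counts[bucket] = counts.get(bucket, 0) + 1
    return counts


def bucket_range(bucket: int, base: int) -> tuple:
    """Return the inclusive (low, high) range covered by a bucket."""
    low = 0 if bucket == 0 else base ** bucket
    high = base ** (bucket + 1) - 1
    return low, high


# "hexbins" use base 16 (object sizes); "qbins" use base 4 (e.g. tree entry counts).
hexbins = histogram([0, 15, 16, 300, 5000], base=16)
qbins = histogram([1, 3, 4, 20], base=4)
```

With base 16, bucket 0 covers 0..15 bytes, bucket 1 covers 16..255, and bucket 2 covers 256..4095, matching the ranges named in the commit message.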
These seem to be custom tests to microsoft/git as they break without these changes, but these changes are not needed upstream. Signed-off-by: Derrick Stolee <[email protected]>
Add a test verifying that sparse-checkout (with and without sparse index enabled) treats untracked files & directories correctly when changing sparse patterns. Specifically, it ensures that 'git sparse-checkout set'
* deletes empty directories outside the sparse cone
* does _not_ delete untracked files outside the sparse cone
Signed-off-by: Victoria Dye <[email protected]>
Teach index-pack to silently omit the reverse index if the index file does not have the standard ".idx" suffix. In e37d0b8 (builtin/index-pack.c: write reverse indexes, 2021-01-25) we learned to create `.rev` reverse indexes in addition to `.idx` index files. The `.rev` file pathname is constructed by replacing the suffix on the `.idx` file. The code assumes a hard-coded "idx" suffix. In a8dd7e0 (config: enable `pack.writeReverseIndex` by default, 2023-04-12) reverse indexes were enabled by default. If the `-o <idx-path>` argument is used, the index file may have a different suffix. This causes an error when it tries to create the reverse index pathname. Since we do not know why the user requested a non-standard suffix for the index, we cannot guess what the proper corresponding suffix should be for the reverse index. So we disable it. The t5300 test has been updated to verify that we no longer error out and that the .rev file is not created. TODO: We could warn the user that we skipped it (perhaps only if they explicitly requested `--rev-index` on the command line). TODO: Ideally, we should add an `--rev-index-path=<path>` argument or change `--rev-index` to take a pathname. I'll leave these questions for a future series. Signed-off-by: Jeff Hostetler <[email protected]>
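The suffix handling described above boils down to a small rule, sketched here in Python. This is an illustration only; the real change lives in C in builtin/index-pack.c, and the helper name `rev_index_path` is hypothetical.

```python
from typing import Optional


def rev_index_path(idx_path: str) -> Optional[str]:
    """Derive the .rev pathname from an index pathname.

    Returns None when the index file lacks the standard ".idx" suffix,
    mirroring the "silently omit the reverse index" behavior described above.
    """
    suffix = ".idx"
    if not idx_path.endswith(suffix):
        return None  # unknown suffix: do not guess a reverse index name
    return idx_path[: -len(suffix)] + ".rev"
```

A caller would write the reverse index only when the helper returns a pathname, instead of erroring out on a non-standard `-o <idx-path>` value.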
Prefetch the value of GIT_TRACE2_DST_DEBUG during startup and before we try to open any Trace2 destination pathnames. Normally, Trace2 always silently fails if a destination target cannot be opened so that it doesn't affect the execution of a Git command. The command should run normally, but just not generate any trace data. This can make it difficult to debug a telemetry setup, since the user doesn't know why telemetry isn't being generated. If the environment variable GIT_TRACE2_DST_DEBUG is true, the Trace2 startup will print a warning message with the `errno` to make debugging easier. However, on Windows, looking up the env variable resets `errno` so the warning message always ends with `...tracing: No error` which is not very helpful. Prefetch the env variable at startup. This avoids the need to update each call-site to capture `errno` in the usual `saved-errno` variable. Signed-off-by: Jeff Hostetler <[email protected]>
Create `struct large_item` and `struct large_item_vec` to capture the n largest commits, trees, and blobs under various scaling dimensions, such as size in bytes, number of commit parents, or number of entries in a tree. Each of these have a command line option to set them independently. Signed-off-by: Jeff Hostetler <[email protected]>
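A bounded "n largest" collection of this kind can be sketched as follows. The class and field names (`LargeItem`, `LargeItemVec`, `offer`) are assumptions made for illustration and do not reflect the members of the actual C structs; the default capacity of 10 is taken from a later commit message in this series.

```python
from dataclasses import dataclass, field


@dataclass
class LargeItem:
    size: int  # the scaling dimension: bytes, parent count, or entry count
    oid: str   # object ID of the commit, tree, or blob


@dataclass
class LargeItemVec:
    capacity: int = 10
    items: list = field(default_factory=list)  # kept sorted, largest first

    def offer(self, item: LargeItem) -> None:
        """Insert item if it ranks among the `capacity` largest seen so far."""
        pos = len(self.items)
        while pos > 0 and self.items[pos - 1].size < item.size:
            pos -= 1
        if pos >= self.capacity:
            return  # smaller than everything we already keep
        self.items.insert(pos, item)
        del self.items[self.capacity:]  # drop whatever fell off the end


vec = LargeItemVec(capacity=3)
for size in (5, 1, 9, 7, 3):
    vec.offer(LargeItem(size=size, oid=f"oid-{size}"))
```

One such vector would be kept per dimension, so that each can be sized independently from the command line.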
Include the pathname of each blob or tree in the large_item_vec to help identify the file or directory associated with the OID and size information. This pathname is computed during the path walk, so it reflects the first observed pathname seen for that OID during the traversal over all of the refs. Since the file or directory could have moved (without being modified), there may be multiple "correct" pathnames for a particular OID. Since we do not control the ref traversal order, we should consider it to be a "suggested pathname" for the OID. Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Jeff Hostetler <[email protected]>
Computing `git name-rev` on each commit, tree, and blob in each of the various large_item_vec can be very expensive if there are too many refs, especially if the user doesn't need the result. Let's make it optional. The `--no-name-rev` option can save 50 calls to `git name-rev` since we have 5 large_item_vec's and each defaults to 10 items. Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Jeff Hostetler <[email protected]>
This topic branch brings in a new, experimental built-in command to assess the dimensions of a local repository. It is experimental and subject to change! It might grow new options, change its output, or even be moved into `git diagnose --analyze` or something like that. The hope is that this command, which was inspired by `git sizer` (https://github.com/github/git-sizer), will be helpful not only in diagnosing issues with large repositories, but also in modeling what shapes and sizes of repositories can be handled by Git (and as a corollary: where Git needs to improve to be able to accommodate the natural growth of repositories). Signed-off-by: Johannes Schindelin <[email protected]>
This backports the `ds/advice-sparse-index-expansion` patches into `microsoft/git` which _just_ missed the v2.46.0 window. Signed-off-by: Johannes Schindelin <[email protected]>
Cherry-pick rev-index fixes from v2.41.0.vfs.0.5 into v2.42.0.*
When using the `reset --stdin` feature on Windows, a path read from stdin may have a trailing \r that was not being removed, so it did not match the path in the index and was not reset. Signed-off-by: Kevin Willford <[email protected]>
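A minimal sketch of the problem and the fix, assuming paths are compared as plain strings against the index; the helper name `strip_trailing_cr` and the sample index are hypothetical.

```python
def strip_trailing_cr(path: str) -> str:
    """Remove a single trailing carriage return left over from CRLF input."""
    if path.endswith("\r"):
        return path[:-1]
    return path


index = {"dir/file.txt"}        # stand-in for the paths recorded in the index
raw = "dir/file.txt\r"          # what a CRLF-terminated line looks like once "\n" is stripped

matched_before = raw in index                     # lookup fails because of the "\r"
matched_after = strip_trailing_cr(raw) in index   # lookup succeeds after stripping
```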
Prefetch the value of GIT_TRACE2_DST_DEBUG during startup and before we try to open any Trace2 destination pathnames. Normally, Trace2 always silently fails if a destination target cannot be opened so that it doesn't affect the execution of a Git command. The command should run normally, but just not generate any trace data. This can make it difficult to debug a telemetry setup, since the user doesn't know why telemetry isn't being generated. If the environment variable GIT_TRACE2_DST_DEBUG is true, the Trace2 startup will print a warning message with the `errno` to make debugging easier. However, on Windows, looking up the env variable resets `errno` so the warning message always ends with `...tracing: No error` which is not very helpful. Prefetch the env variable at startup. This avoids the need to update each call-site to capture `errno` in the usual `saved-errno` variable.
It has been a long-standing practice in Git for Windows to append `.windows.<n>`, and in microsoft/git to append `.vfs.0.0`. Let's keep doing that. Signed-off-by: Johannes Schindelin <[email protected]>
…sitories (#667) This command is inspired by [`git sizer`](https://github.com/github/git-sizer), having the advantage of being much closer to the internals of Git. The intention is to provide a built-in command that can be used to analyze large repositories for performance and scaling problems, for growth over time, and to correlate with other measurements (in particular with Trace2 data collected e.g. via https://github.com/git-ecosystem/trace2receiver/).
Since we really want to be based on a `.vfs.*` tag, let's make sure that there was a new-enough one, i.e. one that agrees with the first three version numbers of the recorded default version. This prevents e.g. v2.22.0.vfs.0.<some-huge-number>.<commit> from being used when the current release train was not yet tagged. It is important to get the first three numbers of the version right because e.g. Scalar makes decisions depending on those (such as assuming that the `git maintenance` built-in is not available, even though it actually _is_ available). Signed-off-by: Johannes Schindelin <[email protected]>
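The "new enough" check can be pictured as comparing the first three version numbers of the tag with those of the recorded default version. The sketch below is an assumption-laden illustration in Python (the actual logic is part of Git's version generation script); the function names and the exact tag strings are hypothetical.

```python
import re


def first_three(version: str):
    """Extract (major, minor, patch) from strings like 'v2.48.1.vfs.0.0'."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version)
    if not match:
        return None
    return tuple(int(part) for part in match.groups())


def tag_is_new_enough(vfs_tag: str, default_version: str) -> bool:
    """Accept a .vfs.* tag only if it agrees with the default version's first three numbers."""
    tag_version = first_three(vfs_tag)
    return (
        tag_version is not None
        and ".vfs." in vfs_tag
        and tag_version == first_three(default_version)
    )
```

With this rule, a stale description such as `v2.22.0.vfs.0.<some-huge-number>.<commit>` would be rejected while the release train is at 2.48.1, and the recorded default version would be used instead.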
This header file will accumulate GVFS-specific definitions. Signed-off-by: Kevin Willford <[email protected]>
This does not do anything yet. The next patches will add various values for that config setting that correspond to the various features offered/required by GVFS. Signed-off-by: Kevin Willford <[email protected]> gvfs: refactor loading the core.gvfs config value This code change makes sure that the config value for core_gvfs is always loaded before checking it. Signed-off-by: Kevin Willford <[email protected]>
This takes a substantial amount of time, and if the user is reasonably sure that the files' integrity is not compromised, that time can be saved. Git no longer verifies the SHA-1 by default, anyway. Signed-off-by: Kevin Willford <[email protected]> Update for 2023-02-27: This feature was upstreamed as the index.skipHash config option. This resulted in some changes to the struct and some of the setup code. In particular, the config reading was moved to prepare_repo_settings(), so the core.gvfs bit check was moved there, too. Signed-off-by: Kevin Willford <[email protected]> Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Kevin Willford <[email protected]>
Prevent sparse checkout from deleting files that were marked with the skip-worktree bit and are not in the sparse-checkout file. This is because everything with the skip-worktree bit turned on is being virtualized and will be removed with the change of HEAD. There was only one failing test when running with these changes: it checked that the worktree narrows on checkout, and its failure was expected since we would no longer be narrowing the worktree. Update 2022-04-05: temporarily set 'sparse.expectfilesoutsideofpatterns' in test (until we start disabling the "remove present-despite-SKIP_WORKTREE" behavior with 'core.virtualfilesystem' in a later commit). Signed-off-by: Kevin Willford <[email protected]>
I intend to send this upstream after the 2.47.0 release cycle, but this should get to our microsoft/git users for maximum impact. Customers have been struggling to explain why the sparse index expansion advice message is showing up. The advice to run 'git clean' has not always helped folks, and sometimes it is very unclear why we are running into trouble. These changes introduce a way to log a reason for the expansion into the trace2 logs so it can be found by requesting that a user enable tracing. While testing this, I created the most standard case that happens, which is to have an existing directory match a sparse directory in the index. In this case, it showed that two log messages were required. See the last commit for this new log message. Together, these two places show this kind of message in the `GIT_TRACE2_PERF` output (trimmed for clarity):
```
region_enter | index | label:clear_skip_worktree_from_present_files_sparse
data | sparse-index | ..skip-worktree sparsedir:<my-sparse-path>/
data | index | ..sparse_path_count:362
data | index | ..sparse_lstat_count:732
region_leave | index | label:clear_skip_worktree_from_present_files_sparse
data | sparse-index | expansion-reason:failed to clear skip-worktree while sparse
```
I added some tests to demonstrate that these logs are recorded, but it also seems difficult to hit some of these cases.
These two tests in t5616-partial-clone.sh are actually already broken and there are comments supporting that. Those comments were focused on the GIT_TEST_FULL_NAME_HASH variable, but they also apply to this one. We will want to avoid issues here. Signed-off-by: Derrick Stolee <[email protected]>
In preparation for allowing both the --shallow and --path-walk options in the 'git pack-objects' builtin, create a new 'edge_aggressive' option in the path-walk API. This option will help walk the boundary more thoroughly and help avoid sending extra objects during fetches and pushes. The only use of the 'edge_hint_aggressive' option in the revision API is within mark_edges_uninteresting(), which is usually called between prepare_revision_walk() and visiting commits with get_revision(). In prepare_revision_walk(), the UNINTERESTING commits are walked until a boundary is found. We didn't use this in the past because we would mark objects UNINTERESTING after doing the initial commit walk to the boundary. While we should be marking these objects as UNINTERESTING, we shouldn't _emit_ them all via the path-walk algorithm or else our delta calculations will get really slow. Based on these observations, the way we were handling the UNINTERESTING flag in walk_objects_by_path() was overly complicated and buggy. A lot of it can be removed and simplified to work with this new approach. It also means that we will see the UNINTERESTING boundaries of paths when doing a default path-walk call, changing some existing test cases. Signed-off-by: Derrick Stolee <[email protected]>
In some instances (particularly the `read_object` hook), the `cmd` attribute is set to an `strdup()`ed value. This value needs to be released in the end! Since other users assign a non-`strdup()`ed value, be careful to add _another_ attribute (called `to_free`) that can hold a reference to such a string that needs to be released once the sub process is done. Signed-off-by: Johannes Schindelin <[email protected]>
There does not appear to be anything particularly incompatible about the --shallow and --path-walk options of 'git pack-objects'. If shallow commits are to be handled differently, then it is by the revision walk that defines the commit set and which are interesting or uninteresting. However, before the previous change, a trivial removal of the warning would cause a failure in t5500-fetch-pack.sh when GIT_TEST_PACK_PATH_WALK is enabled. The shallow fetch would provide more objects than we desired, due to some incorrect behavior of the path-walk API, especially around walking uninteresting objects. To also cover the symmetrical case of pushing from a shallow clone, add a new test to t5538-push-shallow.sh that confirms the correct behavior of pushing only the new object. This works to validate both the --path-walk and --no-path-walk case when toggling the GIT_TEST_PACK_PATH_WALK environment variable. This test would have failed in the --path-walk case if we created it before the previous change. Signed-off-by: Derrick Stolee <[email protected]>
This fixes a leak that is not detected by Git's test suite (but by microsoft/git's). Signed-off-by: Johannes Schindelin <[email protected]>
It can be notoriously difficult to detect if delta bases are being computed properly during 'git push'. Construct an example where it will make a kilobyte worth of difference when a delta base is not found. We can then use the progress indicators to distinguish between bytes and KiB depending on whether the delta base is found and used. Signed-off-by: Derrick Stolee <[email protected]>
An internal customer reported a segfault when running `git sparse-checkout set` with the `index.sparse` config enabled. I was unable to reproduce it locally, but with their help we debugged into the failing process and discovered the following stacktrace:
```
#0 0x00007ff6318fb7b0 in rehash (map=0x3dfb00d0440, newsize=1048576) at hashmap.c:125
#1 0x00007ff6318fbc66 in hashmap_add (map=0x3dfb00d0440, entry=0x3dfb5c58bc8) at hashmap.c:247
#2 0x00007ff631937a70 in hash_index_entry (istate=0x3dfb00d0400, ce=0x3dfb5c58bc8) at name-hash.c:122
#3 0x00007ff631938a2f in add_name_hash (istate=0x3dfb00d0400, ce=0x3dfb5c58bc8) at name-hash.c:638
#4 0x00007ff631a064de in set_index_entry (istate=0x3dfb00d0400, nr=8291, ce=0x3dfb5c58bc8) at sparse-index.c:255
#5 0x00007ff631a06692 in add_path_to_index (oid=0x5ff130, base=0x5ff580, path=0x3dfb4b725da "<redacted>", mode=33188, context=0x5ff570) at sparse-index.c:307
#6 0x00007ff631a3b48c in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41f60, base=0x5ff580, depth=2, pathspec=0x5ff5a0, fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:46
#7 0x00007ff631a3b60b in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41e80, base=0x5ff580, depth=1, pathspec=0x5ff5a0, fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:80
#8 0x00007ff631a3b60b in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41ac8, base=0x5ff580, depth=0, pathspec=0x5ff5a0, fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:80
#9 0x00007ff631a06a95 in expand_index (istate=0x3dfb00d0100, pl=0x0) at sparse-index.c:422
#10 0x00007ff631a06cbd in ensure_full_index (istate=0x3dfb00d0100) at sparse-index.c:456
#11 0x00007ff631990d08 in index_name_stage_pos (istate=0x3dfb00d0100, name=0x3dfb0020080 "algorithm/levenshtein", namelen=21, stage=0, search_mode=EXPAND_SPARSE) at read-cache.c:556
#12 0x00007ff631990d6c in index_name_pos (istate=0x3dfb00d0100, name=0x3dfb0020080 "algorithm/levenshtein", namelen=21) at read-cache.c:566
#13 0x00007ff63180dbb5 in sanitize_paths (argc=185, argv=0x3dfb0030018, prefix=0x0, skip_checks=0) at builtin/sparse-checkout.c:756
#14 0x00007ff63180de50 in sparse_checkout_set (argc=185, argv=0x3dfb0030018, prefix=0x0) at builtin/sparse-checkout.c:860
#15 0x00007ff63180e6c5 in cmd_sparse_checkout (argc=186, argv=0x3dfb0030018, prefix=0x0) at builtin/sparse-checkout.c:1063
#16 0x00007ff6317234cb in run_builtin (p=0x7ff631ad9b38 <commands+2808>, argc=187, argv=0x3dfb0030018) at git.c:548
#17 0x00007ff6317239c0 in handle_builtin (argc=187, argv=0x3dfb0030018) at git.c:808
#18 0x00007ff631723c7d in run_argv (argcp=0x5ffdd0, argv=0x5ffd78) at git.c:877
#19 0x00007ff6317241d1 in cmd_main (argc=187, argv=0x3dfb0030018) at git.c:1017
#20 0x00007ff631838b60 in main (argc=190, argv=0x3dfb0030000) at common-main.c:64
```
The very bottom of the stack being the `rehash()` method from `hashmap.c` as called within the `name-hash` API made me look at where these hashmaps were being used in the sparse index logic. These were being copied across indexes, which seems dangerous. Indeed, clearing these hashmaps and setting them as not initialized fixes the segfault. The second commit is a response to a test failure that happens in `t1092-sparse-checkout-compatibility.sh` where `git stash pop` starts to fail because the underlying `git checkout-index` process fails due to colliding files. Passing the `-f` flag appears to work, but it's unclear why this name-hash change causes that change in behavior.
This fixes a leak that is not detected by Git's own test suite (but by microsoft/git's, in the t9210-scalar.sh test). Signed-off-by: Johannes Schindelin <[email protected]>
This pull request aims to correct a pretty big issue when dealing with UNINTERESTING objects in the path-walk API. They somehow were only exposed when trying to perform a push from a shallow clone. This will require rewriting the upstream version so this is avoided from the start, but we can do a forward fix for now. The key issue is that the path-walk API was not walking UNINTERESTING trees at the right time, and the way it was being done was more complicated than it needed to be. This changes some of the way the path-walk API works in the presence of UNINTERESTING commits, but these are good changes to make. I had briefly attempted to remove the use of the `edge_aggressive` option in `struct path_walk_info` in favor of using the `--objects-edge-aggressive` option in the revision struct. When I started down that road, though, I somehow got myself into a bind of things not working correctly. I backed out to this version that is working with our test cases. I tested this using the thin and big pack tests in `p5313` which had the same performance as before this change. The new change is that in a shallow clone we can get the same `git push` improvements. I was hung up on testing this for a long time as I wasn't getting the same results in my shallow clone as in my regular clones. It turns out that I had forgotten to use `--no-reuse-delta` in my test command, so it was picking the deltas that were given by the initial clone instead of picking new ones per the algorithm. 🤦🏻
Git v2.48.0 has become even more stringent about leaks. Signed-off-by: Johannes Schindelin <[email protected]>
The check for dubious ownership has one particular quirk on Windows: if running as an administrator, files owned by the Administrators _group_ are considered owned by the user. The rationale for that is: When running in elevated mode, Git creates files that aren't owned by the individual user but by the Administrators group. There is yet another quirk, though: The check I introduced to determine whether the current user is an administrator uses the `CheckTokenMembership()` function with the current process token. And that check only succeeds when running in elevated mode! Let's be a bit more lenient here and look harder whether the current user is an administrator. We do this by looking for a so-called "linked token". That token exists when administrators run in non-elevated mode, and can be used to create a new process in elevated mode. And feeding _that_ token to the `CheckTokenMembership()` function succeeds! Signed-off-by: Johannes Schindelin <[email protected]>
This adds a new sub-sub-command for `test-tool`, simply passing through the command-line arguments to the `is_path_owned_by_current_user()` function. Signed-off-by: Johannes Schindelin <[email protected]>
The --path-walk option in `git pack-objects` is implied by the pack.usePathWalk=true config value. This is intended to help the packfile generation within `git push` specifically. While this config does enable the path-walk feature, it does not lead to the expected levels of compression in the cases it was designed to handle. This is due to the default implication of the --reuse-delta option as well as auto-GC. In the performance tests used to evaluate the --path-walk option, such as those in p5313, the --no-reuse-delta option is used to ensure that deltas are recomputed according to the new object walk. However, it was assumed (I assumed this) that when the objects were loose from client-side operations, better deltas would be computed during this operation. This wasn't confirmed because the test process used data that was fetched from real repositories and thus existed in packed form only. I was able to confirm that this does not reproduce when the objects to push are loose. Careful use of making the pushed commit unreachable and loosening the objects via `git repack -Ad` helps to confirm my suspicions here. Independent of this change, I'm pushing for these pipeline agents to set `gc.auto=0` before creating their Git objects. In the current setup, the repo is adding objects and then incrementally repacking them and ending up with bad cross-path deltas. This approach can help scenarios where that makes sense, but will not cover all of our users without them choosing to opt-in to background maintenance (and even then, an incremental repack could cost them efficiency). In order to make sure we are getting the intended compression in `git push`, this change forces the spawned `git pack-objects` process to use `--no-reuse-delta`. As far as I can tell, the main motivation for implying the --reuse-delta option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant of whether the current process is a client pushing to a remote or a remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will replace any bad deltas that are sent by clients (or at least, that's the hope; we've seen that repacks can also pick bad deltas). This commit also adds a test case that demonstrates that `git -c pack.usePathWalk=true push` now avoids reusing deltas. To do this, the test case constructs a pack with a horrendously inefficient delta object, then verifies that the pack on the receiving side of the `push` fails to have such an inefficient delta. The test case would probably be a lot more readable if hex numbers were used instead of octal numbers, but alas, `printf "\x<hex>"` is not portable, only `printf "\<octal>"` is. For example, dash's built-in `printf` function simply prints `\x` verbatim while bash's built-in happily converts this construct to the corresponding byte. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
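Since only the octal form of `printf` escapes is portable across shells, a small helper can translate raw bytes into that form when preparing such a test. The following Python sketch is purely illustrative and not part of the test suite; the name `printf_octal` is hypothetical.

```python
def printf_octal(data: bytes) -> str:
    """Render bytes as three-digit octal escapes usable in a shell printf format."""
    return "".join("\\%03o" % byte for byte in data)


# Three bytes one might otherwise be tempted to write as "\x00\xff\x41".
escaped = printf_octal(bytes([0x00, 0xFF, 0x41]))
```

The resulting string can be pasted into a `printf "..."` call in a POSIX shell script, where both dash and bash interpret the octal escapes the same way.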
The check for dubious ownership has one particular quirk on Windows: if running as an administrator, files owned by the Administrators _group_ are considered owned by the user. The rationale for that is: When running in elevated mode, Git creates files that aren't owned by the individual user but by the Administrators group. There is yet another quirk, though: The check I introduced to determine whether the current user is an administrator uses the `CheckTokenMembership()` function with the current process token. And that check only succeeds when running in elevated mode! Let's be a bit more lenient here and look harder whether the current user is an administrator. We do this by looking for a so-called "linked token". That token exists when administrators run in non-elevated mode, and can be used to create a new process in elevated mode. And feeding _that_ token to the `CheckTokenMembership()` function succeeds!
Update the WinGet release workflow to match the updated manifest in `microsoft/winget-pkgs`, where there are now four installation options:
- x86_64 / x64 with machine scope
- x86_64 / x64 with user scope
- aarch64 / arm64 with machine scope
- aarch64 / arm64 with user scope
Signed-off-by: Matthew John Cheetham <[email protected]>
The --path-walk option in 'git pack-objects' is implied by the pack.usePathWalk=true config value. This is intended to help the packfile generation within 'git push' specifically. While this config does enable the path-walk feature, it does not lead to the expected levels of compression in the cases it was designed to handle. This is due to the default implication of the --reuse-delta option as well as auto-GC. In the performance tests used to evaluate the --path-walk option, such as those in p5313, the --no-reuse-delta option is used to ensure that deltas are recomputed according to the new object walk. However, it was assumed (I assumed this) that when the objects were loose from client-side operations, better deltas would be computed during this operation. This wasn't confirmed because the test process used data that was fetched from real repositories and thus existed in packed form only. I was able to confirm that this does not reproduce when the objects to push are loose. Careful use of making the pushed commit unreachable and loosening the objects via 'git repack -Ad' helps to confirm my suspicions here. Independent of this change, I'm pushing for these pipeline agents to set 'gc.auto=0' before creating their Git objects. In the current setup, the repo is adding objects and then incrementally repacking them and ending up with bad cross-path deltas. This approach can help scenarios where that makes sense, but will not cover all of our users without them choosing to opt-in to background maintenance (and even then, an incremental repack could cost them efficiency). In order to make sure we are getting the intended compression in 'git push', this change makes the --path-walk option imply --no-reuse-delta when the --reuse-delta option is not provided. As far as I can tell, the main motivation for implying the --reuse-delta option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant of whether the current process is a client pushing to a remote or a remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will replace any bad deltas that are sent by clients (or at least, that's the hope; we've seen that repacks can also pick bad deltas). The --path-walk option at the moment is not compatible with reachability bitmaps, so is not planned to be used by Git servers. Thus, we can reasonably assume (for now) that the --path-walk option is assuming a client-side scenario, either a push or a repack. The repack option will be explicit about the --reuse-delta option or not. One thing to be careful about is background maintenance, which uses a list of objects instead of refs, so we condition this on the case where the --path-walk option will be effective by checking that the --revs option was provided. Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this choice. However, we already have pack.usePathWalk=true as an opt-in to "do the right thing to make my data small" as far as our internal users are concerned.
* Modify the chain between builtin/push.c, transport.c, and builtin/send-pack.c to communicate that we are in "push" mode, not within a fetch or clone. However, this seemed like overkill. It may be beneficial in the future to pass through a mode like this, but it does not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline implementation of this feature within Git for Windows and thus microsoft/git. This feature is still under review upstream.
Tests in t7900 assume the state of the `maintenance.strategy` config setting, which is set and unset by previous tests. Correct this by explicitly unsetting and re-setting the config at the start of the tests. Signed-off-by: Matthew John Cheetham <[email protected]>
Introduce a new maintenance task, `cache-local-objects`, that operates on Scalar or VFS for Git repositories with a per-volume, shared object cache (specified by `gvfs.sharedCache`) to migrate packfiles and loose objects from the repository object directory to the shared cache. Older versions of `microsoft/git` incorrectly placed packfiles in the repository object directory instead of the shared cache; this task will help clean up existing clones impacted by that issue. Migration of packfiles involves the following steps for each pack:
1. Hardlink (or copy):
   a. the .pack file
   b. the .keep file
   c. the .rev file
2. Move (or copy + delete) the .idx file
3. Delete/unlink:
   a. the .pack file
   b. the .keep file
   c. the .rev file
Moving the index file after the others ensures the pack is not read from the new cache directory until all associated files (rev, keep) exist in the cache directory also. Moving loose objects operates as a move, or copy + delete. Signed-off-by: Matthew John Cheetham <[email protected]>
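The ordering of the migration steps above can be sketched as follows. This is an illustration in Python, not the C implementation of the maintenance task; the function names are hypothetical, and skipping a missing `.keep` or `.rev` file is an assumption made for this example.

```python
import os
import shutil


def link_or_copy(src: str, dst: str) -> None:
    """Hardlink src to dst, falling back to a copy (e.g. across volumes)."""
    try:
        os.link(src, dst)
    except OSError:
        shutil.copy2(src, dst)


def migrate_pack(pack_base: str, src_dir: str, dst_dir: str) -> None:
    """Migrate one pack, e.g. pack_base='pack-1234', from src_dir to dst_dir."""
    candidates = [pack_base + ext for ext in (".pack", ".keep", ".rev")]
    # Assumption: .keep and .rev may be absent, so only handle files that exist.
    present = [n for n in candidates if os.path.exists(os.path.join(src_dir, n))]

    # Step 1: make the pack data and side files visible in the cache first.
    for name in present:
        link_or_copy(os.path.join(src_dir, name), os.path.join(dst_dir, name))

    # Step 2: move the .idx last, so readers only discover a complete pack.
    idx = pack_base + ".idx"
    shutil.move(os.path.join(src_dir, idx), os.path.join(dst_dir, idx))

    # Step 3: remove the originals from the repository object directory.
    for name in present:
        os.unlink(os.path.join(src_dir, name))
```

Because a pack is discovered through its `.idx` file, moving the index only after the other files are in place means a concurrent reader never sees a pack in the cache directory without its associated files.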
Add the `cache-local-objects` maintenance task to the list of tasks run by the `scalar run` command. It's often easier for users to run the shorter `scalar run` command than the equivalent `git maintenance` command. Signed-off-by: Matthew John Cheetham <[email protected]>
Introduce a new maintenance task, `cache-local-objects`, that operates on Scalar or VFS for Git repositories with a per-volume, shared object cache (specified by `gvfs.sharedCache`) to migrate packfiles and loose objects from the repository object directory to the shared cache. Older versions of `microsoft/git` incorrectly placed packfiles in the repository object directory instead of the shared cache; this task will help clean up existing clones impacted by that issue. Fixes #716
Range-diff relative to tentative/vfs-2.48.0
No big surprises there, basically a lot of |
Ah, Meson. You again. Signed-off-by: Johannes Schindelin <[email protected]>
Over at git-for-windows#5411, I am still doing the Git for Windows v2.48.1 release (which is held up essentially by git-for-windows/build-extra#591). Once that is done, this here PR can be "merged" via pushing.