Use GitStatusCache when it's installed. #208

Merged
merged 27 commits into from
Jan 15, 2017

cmarcusreid
Collaborator

Cache status information in external git-status-cache process and retrieve using git-status-cache-posh-client scripts. This pull request uses git-status-cache-posh-client's Get-GitStatusFromCache function when the module is installed. This function returns status information in a format that's easily consumable within posh-git.

The external cache is built on top of libgit2 and dramatically improves performance on large repositories. Cache hits do not become more expensive as repositories grow. With the cache, both cache hits and cache misses are significantly faster than the current status retrieval.

Inspired by:

My primary motivator was making posh-git usable on some very large repositories I interact with every day; this solution has worked great and has been baking for a while. Recognizing that there are previous attempts at this type of improvement I've attempted to incorporate feedback from the earlier conversations.

I'm happy to address any additional comments or feedback!


Performance

The cost of serving a cache hit in the cache process is generally between 0.1 and 0.3 ms, but this metric doesn't include the overhead involved in a full request.

The following measurements were taken on git repositories containing the specified file count. Each file was a text file containing a single sentence of text. Each case was run 5 times and the numbers reported below are averages. Each individual measurement was taken with a high resolution timer at 1 ms precision.
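The averaging procedure described above could be sketched roughly as follows. This is not the harness used for the numbers below; `Measure-AverageMilliseconds` and its parameters are hypothetical names introduced only for illustration.

```powershell
# Hypothetical helper: run a script block several times and return the
# average elapsed time in milliseconds, using a high-resolution Stopwatch.
function Measure-AverageMilliseconds {
    param(
        [Parameter(Mandatory = $true)] [scriptblock] $Action,
        [int] $Runs = 5
    )

    $total = 0.0
    for ($i = 0; $i -lt $Runs; $i++) {
        $sw = [System.Diagnostics.Stopwatch]::StartNew()
        & $Action | Out-Null          # discard the action's output
        $sw.Stop()
        $total += $sw.Elapsed.TotalMilliseconds
    }
    return $total / $Runs
}

# Example (assumes the current directory is a git repository):
# Measure-AverageMilliseconds -Action { git -c color.status=false status --short --branch 2>$null }
```

Averaging several runs smooths out one-off costs such as process start-up, which matters when individual readings are only a few milliseconds.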

Request from git-status-cache-posh-client

The measurements below capture:

  • Serializing a request to JSON in PowerShell.
  • Sending the JSON request over the named pipe.
  • Computing the current git status (cache miss) or retrieving it from the cache (cache hit) in the cache process.
  • Sending the JSON response over the named pipe back to the PowerShell client.
  • Deserializing the JSON response into a PSCustomObject

No files modified

| Repository size | Cache miss | Cache hit |
| --- | --- | --- |
| 10 file repository | 3.0 ms | 1.0 ms |
| 100 file repository | 3.0 ms | 1.0 ms |
| 1,000 file repository | 5.2 ms | 1.0 ms |
| 10,000 file repository | 22.4 ms | 1.0 ms |
| 100,000 file repository | 176.6 ms | 1.0 ms |

10 files modified

| Repository size | Cache miss | Cache hit |
| --- | --- | --- |
| 10 file repository | 2.8 ms | 1.0 ms |
| 100 file repository | 2.8 ms | 1.0 ms |
| 1,000 file repository | 4.4 ms | 1.0 ms |
| 10,000 file repository | 20.6 ms | 1.0 ms |
| 100,000 file repository | 176.6 ms | 1.0 ms |

Using git-status-cache-posh-client to back posh-git

The measurements below capture all the steps from the git-status-cache-posh-client section as well as the cost for posh-git to render the prompt. This extra step has a fixed cost of around 7-10 ms.

Costs were gathered using timestamps from posh-git's debug output. Timings for posh-git without git-status-cache and git-status-cache-posh-client are included in the "posh-git without cache" column for reference.

No files modified

| Repository size | Cache miss | Cache hit | posh-git without cache |
| --- | --- | --- | --- |
| 10 file repository | 11.8 ms | 8.6 ms | 57.0 ms |
| 100 file repository | 11.8 ms | 8.6 ms | 62.2 ms |
| 1,000 file repository | 14.2 ms | 8.6 ms | 68.2 ms |
| 10,000 file repository | 30.8 ms | 8.0 ms | 133.8 ms |
| 100,000 file repository | 181.6 ms | 9.6 ms | 754.4 ms |

10 files modified

| Repository size | Cache miss | Cache hit | posh-git without cache |
| --- | --- | --- | --- |
| 10 file repository | 12.2 ms | 9.4 ms | 63.2 ms |
| 100 file repository | 12.2 ms | 9.4 ms | 65.0 ms |
| 1,000 file repository | 14.4 ms | 9.4 ms | 72.8 ms |
| 10,000 file repository | 31.0 ms | 9.4 ms | 134.0 ms |
| 100,000 file repository | 179.6 ms | 9.0 ms | 763.6 ms |

@@ -117,45 +117,73 @@ function Get-GitStatus($gitDir = (Get-GitDirectory)) {
$filesUnmerged = @()

if($settings.EnableFileStatus -and !$(InDisabledRepository)) {
dbg 'Getting status' $sw
$status = git -c color.status=false status --short --branch 2>$null
} else {
Owner

Looks like this else block got lost in the shuffle?

Collaborator Author

Yep, but removed intentionally. If I'm reading the original code correctly the empty status set by the else block serves to make the loop no-op (and has no additional references below). In this version the loop will only be hit when settings.EnableFileStatus is true (checked before checking whether to get it from the cache or from "git status"), so the else block is now dead code. Please correct me if I'm misunderstanding.

Owner

This is what I was thinking we needed, but you're absolutely right that it's dead code.

        if($settings.EnableFileStatus -and !$(InDisabledRepository)) {
            if ($settings.EnableFileStatusFromCache -eq $null) {
                $settings.EnableFileStatusFromCache = (Get-Module GitStatusCachePoshClient) -ne $null
            }

            if ($settings.EnableFileStatusFromCache) {
                dbg 'Getting status from cache' $sw
                ...
            } else {
                dbg 'Getting status' $sw
                $status = ...
            }
        } else {
            $status = @()
            # Never used            
        }

@dahlbyk
Owner

dahlbyk commented Aug 25, 2015

My primary motivator was making posh-git usable on some very large repositories I interact with every day; this solution has worked great and has been baking for a while. Recognizing that there are previous attempts at this type of improvement I've attempted to incorporate feedback from the earlier conversations.

💯 This does seem to address all my concerns with the previous implementations.

  • Cache is optional.
  • No loss of information.
  • Effectively zero changes to the mechanism for writing status.

To test this out:

  1. Clone git-status-cache-posh-client somewhere.
  2. Run install.ps1 (this creates a bin/ directory containing GitStatusCache.exe from the latest git-status-cache release).
  3. Add Import-Module <path-to>\git-status-cache-posh-client\GitStatusCachePoshClient to $PROFILE

So far the caching seems to be working exactly as I would expect!

@dahlbyk
Owner

dahlbyk commented Aug 25, 2015

I did just introduce a merge conflict on you... 😿

@cmarcusreid
Collaborator Author

If you run install.ps1 in git-status-cache-posh-client, it should do the download on your behalf as part of setup as well as add the import to your profile. I like the idea of the warning or auto-download in the case where you enable it through another path and will follow-up on cmarcusreid/git-status-cache-posh-client#4 (may be a few days before I get a chance).

@dahlbyk
Owner

dahlbyk commented Aug 25, 2015

If you run install.ps1 in git-status-cache-posh-client, it should do the download on your behalf as part of setup as well as add the import to your profile.

Ironic that I would overlook an install.ps1 modeled after the one we have here! You should mention that in the README. 😀

@cmarcusreid
Collaborator Author

README clarified. Not sure how I forgot that. :)

@dahlbyk
Owner

dahlbyk commented Aug 25, 2015

README clarified. Not sure how I forgot that. :)

👍


I'm hoping to get a bit of visibility on this to kick the tires before merging, but I'm not too worried about it since the caching is optional.

@ajryan

ajryan commented Sep 4, 2015

I just tested this and it rocks. Waaay faster. Thanks guys!

@andrewpmartinez

So much nicer. posh-git was starting to go incredibly slow.

@MarkusAmshove
Contributor

Hi @cmarcusreid ,

this really works well!

Can you consider adding the StashStatus from #215 into your Pipelines?

I tried to file a PR at your repositories, but I'm no C++ guy and can't get the project building :-)

@Kantis

Kantis commented Sep 15, 2015

Fails on Windows 10; I had to add -UseBasicParsing to all wget calls in GitStatusCachePoshClient.ps1 to make install.ps1 run without errors.

Also required vcredist_x86 to be installed.

@cmarcusreid
Collaborator Author

Hey @MarkusAmshove!

Stash support was added to the cache in cmarcusreid/git-status-cache#12. Relevant integration for this PR is in 008836e. If you have both, posh-git should show it when the setting is enabled.

If you started using the cache before this change, you may need to update your cache bits (grab the latest git-status-cache-posh-client and call Update-GitStatusCache to download.)

You can check if the cache is returning stash information by calling Get-GitStatusFromCache:

D:\posh-git [useGitStatusCache ≡ +1 ~0 -0 ~]> git stash save "My sample stash."
Saved working directory and index state On useGitStatusCache: My sample stash.
HEAD is now at 008836e Read stash count from cache.
D:\posh-git [useGitStatusCache ≡]> $global:GitPromptSettings.EnableStashStatus = $true
D:\posh-git [useGitStatusCache ≡ (1)]> Get-GitStatusFromCache


Version           : 1
Path              : D:\posh-git
RepoPath          : D:/posh-git/.git/
WorkingDir        : D:/posh-git/
State             :
Branch            : useGitStatusCache
Upstream          : origin/useGitStatusCache
AheadBy           : 0
BehindBy          : 0
IndexAdded        : {}
IndexModified     : {}
IndexDeleted      : {}
IndexTypeChange   : {}
IndexRenamed      : {}
WorkingAdded      : {}
WorkingModified   : {}
WorkingDeleted    : {}
WorkingTypeChange : {}
WorkingRenamed    : {}
WorkingUnreadable : {}
Ignored           : {}
Conflicted        : {}
Stashes           : {@{Name=stash@{0}; Sha1Id=771975d52c2507e9c3a0bb6bc151d66af03f2ea; Message=On useGitStatusCache: My sample stash.}}

@cmarcusreid
Collaborator Author

@Kantis I don't repro on Windows 10, so I wonder what's different about our environments. Could you clarify the following to help me get this fixed?

  1. What exactly was the failure?
  2. Was this on a call to the install script, retrieving information from the cache, or both?

@Kantis

Kantis commented Sep 15, 2015

@cmarcusreid
1.

C:\dev\workspace\git-status-cache-posh-client [master ≡]> .\install.ps1
Creating directory for GitStatusCache.exe at C:\dev\workspace\git-status-cache-posh-client\bin.
Downloading C:\dev\workspace\git-status-cache-posh-client\bin\GitStatusCache.exe.
wget : The response content cannot be parsed because the Internet Explorer engine is not available, or Internet Explore
r's first-launch configuration is not complete. Specify the UseBasicParsing parameter and try again.
At C:\dev\workspace\git-status-cache-posh-client\GitStatusCachePoshClient.ps1:50 char:16
+ ...  $release = wget -Uri "https://api.github.com/repos/cmarcusreid/git-s ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotImplemented: (:) [Invoke-WebRequest], NotSupportedException
    + FullyQualifiedErrorId : WebCmdletIEDomNotSupportedException,Microsoft.PowerShell.Commands.InvokeWebRequestComman
   d

You cannot call a method on a null-valued expression.
At C:\dev\workspace\git-status-cache-posh-client\GitStatusCachePoshClient.ps1:32 char:9
+     if (-not $release.tag_name.StartsWith("v1."))
+         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

2. When calling the install script.

Started IE once now and it fixed the error. Think vcredist might still be required though?

@cmarcusreid
Collaborator Author

@Kantis Thanks. Are you by chance on a machine without Internet Explorer? I found this:

-UseBasicParsing
Uses the response object for HTML content without Document Object Model (DOM) parsing.
This parameter is required when Internet Explorer is not installed on the computers, such as on a Server Core installation of a Windows Server operating system.

EDIT: Perfect. That explains it. Thanks! Regarding vcredist: I'll take a look to see if that dependency can be removed. Opened cmarcusreid/git-status-cache#13 for that feedback.


Either way, I've logged issue cmarcusreid/git-status-cache-posh-client#9 on git-status-cache-posh-client and will get this fixed.
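A minimal sketch of how both errors in the log above could be avoided, assuming the install script fetches release metadata as JSON. `Get-ReleaseInfo` and `Test-ReleaseTag` are hypothetical helpers, not functions from git-status-cache-posh-client, and the URI is left as a parameter because the log truncates it.

```powershell
# Hypothetical helper: fetch release metadata without the IE-based DOM parser.
function Get-ReleaseInfo {
    param([Parameter(Mandatory = $true)] [string] $Uri)

    try {
        # -UseBasicParsing removes the dependency on Internet Explorer.
        $response = Invoke-WebRequest -Uri $Uri -UseBasicParsing -ErrorAction Stop
        return $response.Content | ConvertFrom-Json
    } catch {
        Write-Warning "Failed to download release information: $_"
        return $null
    }
}

# Hypothetical helper: check the tag only when a release was actually returned,
# so a failed download no longer triggers "method on a null-valued expression".
function Test-ReleaseTag {
    param($Release, [string] $Prefix = 'v1.')

    if ($null -eq $Release -or [string]::IsNullOrEmpty($Release.tag_name)) {
        return $false
    }
    return $Release.tag_name.StartsWith($Prefix)
}
```

The null guard is what keeps the first failure from cascading into the second one shown in the log.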

@lzybkr
Collaborator

lzybkr commented Jan 2, 2017

Pipe to foreach is definitely slower than AddRange, but if possible, you should avoid the cast to [string[]] because that might copy the collection.

Instead, Get-GitStatusFromCache should ensure those properties implement IEnumerable<string> to make the cast unnecessary.

@cmarcusreid
Collaborator Author

The response of Get-GitStatusFromCache is built by ConvertFrom-Json. Unfortunately ConvertFrom-Json converts arrays to Object[] even though they're strings in this case. I don't think we have a way to eliminate the cast (without doing equivalent work ourselves), but I'm open to suggestions if someone sees a better solution. I've merged 3e13ee8 based on the AddRange note.
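The array typing described above can be seen with a small stand-alone example. The JSON literal is a made-up fragment, not the actual cache response.

```powershell
# Hypothetical response fragment; the real cache response has many more fields.
$json = '{"IndexAdded":["a.txt","b.txt"]}'
$response = $json | ConvertFrom-Json

# A JSON array of strings still deserializes as Object[].
$response.IndexAdded.GetType().FullName    # System.Object[]

# The [string[]] cast produces a new, strongly typed array.
$typed = [string[]] $response.IndexAdded
$typed.GetType().FullName                  # System.String[]
```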

@dahlbyk
Owner

dahlbyk commented Jan 3, 2017

if possible, you should avoid the cast to [string[]] because that might copy the collection.

We could use [Linq.Enumerable]::Cast or [Linq.Enumerable]::OfType, though calling static generic methods seems to require a workaround.

@cmarcusreid
Collaborator Author

I would expect the LINQ version to have comparable cost to the earlier foreach as it also incurs an enumeration. Probably the next step if we want to drill into this is to profile each option and see if there's an appreciable difference. Do you feel this is a worthwhile investigation?

@lzybkr
Collaborator

lzybkr commented Jan 4, 2017

I expect the LINQ version to always win. I improved the overhead of ForEach-Object some in PowerShell 5.1, but it's still much more expensive than a method call.

I suppose it's possible if you have a huge number of objects (approaching physical memory limits), then the pipeline might win, but for typical workloads sensitive to performance, you should prefer .Net methods over the pipeline where possible.

@cmarcusreid
Collaborator Author

Got it. Right now we're using the string[] cast. Any intuition on that versus the LINQ cast?

@lzybkr
Collaborator

lzybkr commented Jan 4, 2017

They have differing semantics - LINQ cast will only do a C# cast whereas PowerShell [string[]] cast will convert each element of the array using PowerShell rules, e.g. calling ToString if the object is something other than a string.

That said, a LINQ cast avoids a copy of the array, so in theory it should be better, but it can't be called without reflection (PowerShell does not provide a way to specify a generic argument only used in the return type).

I measured the two options and LINQ is about 3x faster even with the reflection (tested with 5000 strings in an object[]) using:

# Create 5000 strings
$objArray = @("a") * 5000
$collection = [System.Collections.Generic.List[string]]::new()
$m = [System.Linq.Enumerable].GetMethod('Cast').MakeGenericMethod([string])
$collection.AddRange($m.Invoke($null, (,$objArray)))
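The difference in semantics between the two casts can be illustrated with an input that contains a non-string. This is only a sketch; `$mixed` is a contrived example and not something the cache is known to return.

```powershell
# Build the closed generic method Enumerable.Cast<string> via reflection.
$castStringSeq = [System.Linq.Enumerable].GetMethod('Cast').MakeGenericMethod([string])

$mixed = @('a.txt', 42)   # contrived input containing a non-string

# PowerShell conversion: every element is converted, so 42 becomes '42'.
$converted = [string[]] $mixed

# LINQ Cast: a plain cast per element, which fails on the integer once
# AddRange enumerates the lazy sequence.
$list = New-Object 'System.Collections.Generic.List[string]'
$linqFailed = $false
try {
    $list.AddRange($castStringSeq.Invoke($null, (,$mixed)))
} catch {
    $linqFailed = $true
}
```

So the LINQ route is only safe when every element is already a real System.String.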

@cmarcusreid
Collaborator Author

Excellent, thanks! I'll make this adjustment when I have a chance (probably sometime over the weekend).

@dahlbyk
Owner

dahlbyk commented Jan 4, 2017

Working on this now, will push to this branch.

@dahlbyk
Owner

dahlbyk commented Jan 4, 2017

Not thrilled with b07bce5, but it seems to work? PowerShell auto-enumerating stuff is tricky to work around. 😦

@dahlbyk dahlbyk mentioned this pull request Jan 6, 2017
@dahlbyk
Owner

dahlbyk commented Jan 7, 2017

That is a reasonable concern, but the command invocation overhead is unfortunately not small, I would need to measure to know which is worse.

Per comments on b07bce5, I've inlined the reflected method call instead of wrapping it in a PowerShell function.


I know a few folks have mentioned they have been running with this locally for a while. Update to this branch's latest and confirm your projects still work as expected?

dbg 'Getting stash count' $sw
$stashCount = $null | git stash list 2>$null | measure-object | Select-Object -expand Count
if ($settings.EnableFileStatusFromCache -eq $null) {
$settings.EnableFileStatusFromCache = (Get-Module GitStatusCachePoshClient) -ne $null
Collaborator

Are we sure we want to do this? It makes it harder to "turn off" this feature if something is misbehaving (or you want to test with it off). Right now, you'd have to unload (remove-module GitStatusCachePoshClient) and then be careful not to execute a command that would auto-load it again.

Collaborator Author

It auto-selects based on the module's presence the first time through (when set to the default null value), but if you manually set it to false that'll stick on subsequent calls since it's no longer null.
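The tri-state behavior described here can be sketched in isolation. `$settings` is a stand-in object and `Resolve-CacheSetting` is a hypothetical helper; posh-git performs the equivalent check inline in Get-GitStatus.

```powershell
# Stand-in for $global:GitPromptSettings with the default value of $null.
$settings = New-Object PSObject -Property @{ EnableFileStatusFromCache = $null }

# Hypothetical helper mirroring the inline check from the diff.
function Resolve-CacheSetting($settings) {
    # Only auto-detect while the setting is still at its default of $null.
    if ($null -eq $settings.EnableFileStatusFromCache) {
        $settings.EnableFileStatusFromCache = $null -ne (Get-Module GitStatusCachePoshClient)
    }
    return $settings.EnableFileStatusFromCache
}

$detected = Resolve-CacheSetting $settings   # first call: auto-detects

# An explicit override sticks, because the value is no longer $null.
$settings.EnableFileStatusFromCache = $false
$afterOverride = Resolve-CacheSetting $settings
```

Placing `$null` on the left of the comparison avoids PowerShell's collection-filtering behavior when the other operand is an array.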

Collaborator

Got it.

@cmarcusreid
Collaborator Author

It looks like rename operations currently break the prompt. Other operations look fine on my machine.

Manual execution of the cast throws an InvalidOperationException. Unable to cast object of type System.Management.Automation.PSObject to type System.String.

@rkeithhill
Collaborator

@cmarcusreid Which rename operations? Is this the rename of the prompt function that the Chocolatey install adds to your profile? If so, that will be removed before 0.7.0 is released. Importing the module now puts in place the default posh-git prompt.

@cmarcusreid
Collaborator Author

Rename operations on files in the index or working directories. :)

I believe it's just the new cast bit in this PR; I'll take a look sometime this weekend if no one beats me to it.

@@ -159,13 +159,19 @@ function Get-GitStatus($gitDir = (Get-GitDirectory)) {

$indexAdded.AddRange($castStringSeq.Invoke($null, (,@($cacheResponse.IndexAdded))))
$indexModified.AddRange($castStringSeq.Invoke($null, (,@($cacheResponse.IndexModified))))
$indexModified.AddRange($castStringSeq.Invoke($null, (,@($cacheResponse.IndexRenamed | Select-Object -ExpandProperty Old))))
Collaborator Author

Cast was choking on the System.Management.Automation.PSObject returned by Select-Object.

@@ -159,13 +159,19 @@ function Get-GitStatus($gitDir = (Get-GitDirectory)) {

$indexAdded.AddRange($castStringSeq.Invoke($null, (,@($cacheResponse.IndexAdded))))
$indexModified.AddRange($castStringSeq.Invoke($null, (,@($cacheResponse.IndexModified))))
$indexModified.AddRange($castStringSeq.Invoke($null, (,@($cacheResponse.IndexRenamed | Select-Object -ExpandProperty Old))))
$indexRenamedOld = $cacheResponse.IndexRenamed.Old
Collaborator Author

I just learned that you could do this today; what is this trick called (for search purposes)?

It returns either null (zero paths), a string (one path), or an object[] (many paths).

Collaborator

@rkeithhill rkeithhill Jan 14, 2017

Member enumeration but it is only available in PS v3 and higher. To make that work on PS v2, use:

$indexRenamedOld = foreach ($obj in $cacheResponse.IndexRenamed) { $obj.Old }

The above is a bit faster than both Select-Object and Foreach-Object and shouldn't mess with the type.
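Because the foreach form yields nothing, a single string, or an object[] depending on how many items there are, one way to get a consistent array is to wrap it in `@()`. The sample objects below are made up to resemble rename entries with Old and New properties; this is a sketch, not the code that was merged.

```powershell
# Made-up rename entries shaped like items with Old and New properties.
$none = @()
$one  = @(New-Object PSObject -Property @{ Old = 'a.txt'; New = 'b.txt' })
$many = @(
    (New-Object PSObject -Property @{ Old = 'a.txt'; New = 'b.txt' }),
    (New-Object PSObject -Property @{ Old = 'c.txt'; New = 'd.txt' })
)

# Without @(), a single item comes back as a bare string, not an array.
$raw = foreach ($obj in $one) { $obj.Old }

# With @(), the result is always an array, whatever the item count.
$oldNone = @(foreach ($obj in $none) { $obj.Old })
$oldOne  = @(foreach ($obj in $one)  { $obj.Old })
$oldMany = @(foreach ($obj in $many) { $obj.Old })
```

This works on PS v2 as well, since it does not rely on member enumeration.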

Collaborator Author

Drat. I'm reverting back to the initial foreach and add version for the renamed files as I don't see a way to use the AddRange version without doing more enumerations (extracting the property then casting from objects to strings). (Recall the earlier version with Select-Object didn't work due to Select-Object's return type.)

Please feel free to make additional tweaks if there's a faster approach.

Owner

I'm content with this.

@dahlbyk dahlbyk merged commit 6e471d2 into dahlbyk:master Jan 15, 2017
@dahlbyk
Copy link
Owner

dahlbyk commented Jan 15, 2017

🎉

Appreciate everyone's patience with getting this merged. Thanks for the contribution, @cmarcusreid!

@cmarcusreid cmarcusreid deleted the useGitStatusCache branch January 15, 2017 06:51