-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
os: using CommandLineToArgv slows process startup significantly #15588
Comments
Sounds good to me. Please send a code review. But it won't get submitted until after go1.7 is released - the tree is frozen at this moment. Thank you. This will need a better test. We used to have a test for syscall.EscapeArg, but it's got lost over time. We would run Go process as a child of another Go process and make sure the arguments parent pass to the child match arguments received by the child. We would have a table of unusual arguments to test. We should resurrect that test. It will test your new code too. I suppose we could create new test regardless of your work. I will try and create this test if I have time. Alex |
CL https://golang.org/cl/22932 mentions this issue. |
@jstarks I do not see speed up you see. I cherry picked your CL 22932 git fetch https://go.googlesource.com/go refs/changes/32/22932/2 && git cherry-pick FETCH_HEAD on top of 0a55a16 (current tip). I run go test -run=none -count=10 -bench=BenchmarkRunningGoProgram runtime on my very old Windows XP computer name old time/op new time/op delta and my not as old Windows 7 computer name old time/op new time/op delta (I used github.com/rsc/benchstat to compare old and new benchmark output). Do these figures look correct to you? What am I doing wrong? Thank you. Alex |
@alexbrainman , where is BenchmarkRunningGoProgram? I don't see it in that CL. |
@alexbrainman Thanks for picking up the trail on this one (and the random number one too). The goal of this change is to stop loading shell32.dll into all Go processes; this is where the expense comes from. There are at least a couple possibilities why you're not seeing the improvement that I saw:
One idea to figure this out is to use a debugger to determine whether shell32 is getting loaded during process run before and after the change. |
@jstarks, makes sense, thanks. We can add a new regression test to make sure we don't introduce shell32 dependencies in the future. Is there a Windows API to list the DLLs loaded in the current process? |
With our luck that API is implemented in shell32.dll... |
Looks like this: EnumProcessModules |
Do you want to check whether go binary generated with this change doesn't depend on CommandLineToArgv? Then, how about debug/pe? |
Ah, no. debug/pe doesn't handle dynamic loading. ignore this. |
It was introduced by CL 29700 - another "stop using some DLLs" CL. It has been submitted.
I will try and find Windows 10 to test on. @mattn you have Windows 10. Could you try my instructions #15588 (comment) and check if you see any improvements?
How would I do that? Thank you. Alex |
@alexbrainman I tried below. Master branch (tip)
CL 22932Note that I couldn't merge the CL.
So I merged following patch. This should be same change. Just small different in that modify export_windows_test.go. https://gist.github.com/mattn/0909a877904be548a595be732fe2bba6
Intel Core i5 2.2GHz 8GB Mem |
Thank you. That is what I see too on my computers. I will try and use EnumProcessModules to make sure that new version does not uses shell32.dll as requested,. I will report here. Alex |
I cherry picked CL 22932 git fetch https://go.googlesource.com/go refs/changes/32/22932/2 && on top of f36e1ad (current tip) I run go test -run=none -count=10 -bench=BenchmarkRunningGoProgram runtime my Windows 7 computer name old time/op new time/op delta my other Windows 7 computer name old time/op new time/op delta my Windows 10 computer name old time/op new time/op delta I also used "Process Explorer" So I have no idea what else I can try. One thing I noticed. On Windows 10 "Process Explorer" shows many more DLLs loaded for old version comparing to new Maybe my BenchmarkRunningGoProgram is the wrong way to measure small Go program running time. Suggestions are welcome. Alex |
Those p values (p=0.447) don't look good. Try running more than 10 iterations. Try like 100 or so. |
I did -count=20 on my Windows 7 computer. It is not much different name old time/op new time/op delta I will sleep on it. Always helps. Alex |
Good p value (p=0.000) now, though. |
I tried looking where all the time goes in BenchmarkRunningGoProgram, and I noticed there is a time.Sleep in os.Process.wait that makes BenchmarkRunningGoProgram slower than it needs to be. I also thought, maybe that time.Sleep is so long, we do not notice CL 22932 improvements because that extra sleep is so large. So I removed time.Sleep (see CL 31536 patch set 1). And then I cherry picked CL 22932 on top of that (see CL 31536 patch set 2). And then I tested performance improvements between CL 31536 base and ps1, and ps1 and ps2. As before I did go test -run=none -count=10 -bench=BenchmarkRunningGoProgram runtime I see this on my Windows XP: base -> ps1: ps1 -> ps2: on Windows 7 base -> ps1: ps1 -> ps2: on another Windows 7 base -> ps1: ps1 -> ps2: on Windows 10 laptop base -> ps1: ps1 -> ps2: So again I do not see much improvement in CL 22932. There is some speed up in removing time.Sleep in os.Process.wait (CL 31536 patch set 1), but all.bat fails during cmd/go testing with this change. I also tried replacing time.Sleep with SwitchToThread Windows API, but that still fails. It would be nice not to care if executable file is still locked after process completes, but a lot of our code depends on that. Maybe Windows caches DLL well, so second execution of the same process is just quick. So I run make.bat against all versions of CL 31536 (I used github.com/alexbrainman/time to measure this): my Windows XP: CL 31536 base 1m23.984s 1m24.172s 1m23.938s my Windows 7: CL 31536 base 2m14.284s 1m57.884s 2m11.774s And I do not see much improvements anywhere. Even removing time.Sleep from os.Process.wait does not look important. I am out of ideas. Unless I can see some improvements, I do not think CL 22932 is worth the trouble. Alex |
Thanks @alexbrainman. @jstarks, can you say more about the measurements you used to find that the startup time reduced from 22ms to 16ms? Those are enormous startup times to begin with. I wonder if something else was wrong on your system? |
The sleep in os.Process.wait strikes again! (Just commented on #17245.) It sounds like we need to understand that before we can understand whether we need to avoid shell32.dll. |
Did some raw measurements against loading shell32 vs not-loading.
Code here https://gist.github.com/egonelbre/d4c8bcc7e61b32fd42bedb2d6847b57a |
Why do you measure loading shell32.dll vs loading nothing? Go needs quite a few DLLs (other than shell32.dll) to run. It would be more appropriate to start with a set of DLLs that Go uses, and compare that with the same set + shell32.dll. Alex |
Sure... I collected all dll names found in stdlib, code here https://gist.github.com/egonelbre/ab6c7a4a004227a168a8229656e9863c (and binaries) Also added 32bit vs 64bit comparison, just in case... "32bit none"
"32bit stdlib"
"32bit stdlib wo shell32"
"64bit none"
"64bit stdlib"
"64bit stdlib wo shell32"
Also, it looks like |
It looks like not loading "32bit stdlib wo shell32,winmm"
"64bit stdlib wo shell32,winmm"
|
FTR, we call timeBeginPeriod for its side effects only, and I don't
think we can remove winmm.dll.
|
Oh right, it modifies how Sleep works... also found issue #8687. |
I have found my problem. The BenchmarkRunningGoProgram in runtime - I made it too simple - it does not even import os package, so it cannot tell the difference. If I import os package into BenchmarkRunningGoProgram (see patch set 1) and compare that to cherry picked CL 22932 on top of it (see patch set 2), I finally see some small improvements on my Windows 7 pc:
Alex |
CL 33264 has my changes. |
I had a thought, whether it would make sense to implement syscall code-generation such that it would load the dll-s only when necessary. For many command-line tools these dlls seem unnecessary; and the same for the functions themselves. Of course, I'm not sure how deeply are they tied into the runtime and how much more benefit it gives in practice. |
They are lazily loaded already, no? |
Never-mind... yeah... probably should stop trying to think when tired :) |
Yes, but the code that uses shell32.dll is in init function in os package. Alex |
More stats comparing different patch-sets of CL 33264. My Windows XP:
A Windows 7 (running inside of vm):
Alex |
Another Windows 10 pc comparing different patch-sets of CL 33264:
I think we should implement this change. Alex |
@alexbrainman, you wrote "comparing different patch-sets of CL 33264". Which one did you compare for your posted numbers? |
This is what I did:
Alex |
I have implemented fix in CL 37914 and 37915. But I am still trying to convince myself it is worth the complexity. I can see these speed ups for runtime.BenchmarkRunningGoProgram on my Windows 7 amd64:
on my Windows XP 386:
these are the steps I followed to collect benchmarks:
These are only Windows computers I can get my hands on. I wonder if other Windows users can see similar improvements. I am more interested in recent Windows OS versions. Please post your results here. Thank you. Alex |
Windows 10 amd64:
|
That convinces me, thank you very much, @egonelbre. Alex |
CL https://golang.org/cl/37915 mentions this issue. |
CL https://golang.org/cl/37914 mentions this issue. |
I would like to use BenchmarkRunningGoProgram to measure changes for issue #15588. So the program in the benchmark should import "os" package. It is also reasonable that basic Go program includes "os" package. For #15588. Change-Id: Ida6712eab22c2e79fbe91b6fdd492eaf31756852 Reviewed-on: https://go-review.googlesource.com/37914 Run-TryBot: Alex Brainman <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Ian Lance Taylor <[email protected]>
Windows's
os.init()
callssyscall.CommandLineToArgv
to split the Windows command line into separate arguments. This in turn loads shell32.dll and calls CommandLineToArgvW to do the actual work of splitting the command line. This is different from C programs generated by Microsoft's compiler, which use a nearly identical function in the CRT to do the splitting.For a typical Go program, this is the only thing that causes shell32.dll to be loaded. Loading shell32 is expensive, since it depends on lots of additional DLLs -- on my machine shell32 loads 13 additional DLLs that would not otherwise be loaded.
By rewriting the algorithm from
CommandLineToArgvW
directly in go, we can eliminate the need to load all these extra DLLs. This algorithm is documented at https://msdn.microsoft.com/en-us/library/17w5ykft.aspx, although I have found that there is an undocumented special case where a"
next to another"
that ends a quoted argument should be included verbatim.I have prototyped this change (3ae6766) and found on my machine that it reduces startup time for a simple Go program that pulls in
os
from 22ms to 16ms. The cost for this is about a 10KB increase in binary size.If this approach seems worthwhile then I can send out a code review.
The text was updated successfully, but these errors were encountered: