Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: test caching misses on alternate runs #23526

Closed
robpike opened this issue Jan 24, 2018 · 15 comments
Closed

cmd/go: test caching misses on alternate runs #23526

robpike opened this issue Jan 24, 2018 · 15 comments

Comments

@robpike
Copy link
Contributor

robpike commented Jan 24, 2018

% go version
go version devel +6104939432 Sun Jan 21 04:56:36 2018 +0000 darwin/amd64
% 

Checkout upspin: go get -u upspin.io/cmd/upspin. Then

% cd upspin.io
% go test -short ./...
% go test -short ./...
% go test -short ./...
% go test -short ./...
% go test -short ./...
% go test -short ./...
% go test -short ./...

The first run takes a minute or two and will of course cache everything. The next run takes 7 seconds, as everything is cached. The third run reruns the test for upspin.io/test, only. This pattern continues: Even runs report cached values and run quickly; odd runs report mostly cached runs but retest upspin.io/test and take a minute or so.

@robpike
Copy link
Contributor Author

robpike commented Jan 24, 2018

Hint: the package it's retesting on alternate runs is the only one that has no non-test files.

@robpike robpike added this to the Go1.10 milestone Jan 24, 2018
@mvdan
Copy link
Member

mvdan commented Jan 24, 2018

This might be related to the weirdness involved in loading Go packages containing only test files in the new cmd/go. For example, see the recently fixed #23395.

Was this also broken when test caching was first introduced?

@ianlancetaylor
Copy link
Member

I believe this issue has something to do with the fact that (*Schema).Start in upspin.io/upbox/schema.go runs go install upspin.io/cmd/upspin upspin.io/cmd/cacheserver. That causes those binaries to change their mtime, and since they are installed under GOPATH this changes the test input hash. But I don't understand the alternating cached/uncached, and I might be wrong that this go install is related.

@robpike
Copy link
Contributor Author

robpike commented Jan 25, 2018

Yes it does, and I have filed upspin/upspin#572

@robpike
Copy link
Contributor Author

robpike commented Jan 25, 2018

Closing as not go's fault.

@robpike robpike closed this as completed Jan 25, 2018
@robpike
Copy link
Contributor Author

robpike commented Jan 25, 2018

Reopening. After https://upspin-review.googlesource.com/c/upspin/+/18420 the behavior has changed. Now for me it never caches the upspin.io/test results. I believe this must be something we are doing in upspin, but I am reopening to get help figuring out what.

@robpike robpike reopened this Jan 25, 2018
@robpike
Copy link
Contributor Author

robpike commented Jan 25, 2018

Now, unlike previously, this doesn't cache either (cd'ing into the test directory itself):

% cd upspin.io/test
% go test -short ./...

Are there diagnostics one can print to see what caching decisions are being made?

@robpike
Copy link
Contributor Author

robpike commented Jan 25, 2018

cc @rsc

@ianlancetaylor
Copy link
Member

I can no longer recreate the problem. It does reliably cache the test results for me, whether in the test directory or not..

The debugging that I know of is to use GODEBUG=gocachehash=1 and look at the voluminous output. The lines labeled testInput are the ones used for test results. If the test does not cache, run go test twice and compare those lines. The differences should point toward the problem.

There may be a better way that I don't know about.

@robpike
Copy link
Contributor Author

robpike commented Jan 25, 2018

I just cd'd to the upspin.io directory and did go test -short ./... four times. Not one of them cached the test directory, and in fact some others were not cached:

% grep '^ok' /tmp/? | grep -v cached
/tmp/1:ok  	upspin.io/client	0.597s
/tmp/1:ok  	upspin.io/client/clientutil	0.029s
/tmp/1:ok  	upspin.io/cloud/mail/sendgrid	0.019s
/tmp/1:ok  	upspin.io/cmd/cacheserver	0.582s
/tmp/1:ok  	upspin.io/cmd/upspin	9.853s
/tmp/1:ok  	upspin.io/dir/inprocess	0.682s
/tmp/1:ok  	upspin.io/dir/server	0.197s
/tmp/1:ok  	upspin.io/dir/server/tree	0.207s
/tmp/1:ok  	upspin.io/errors	0.530s
/tmp/1:ok  	upspin.io/pack/ee	0.807s
/tmp/1:ok  	upspin.io/pack/eeintegrity	0.227s
/tmp/1:ok  	upspin.io/pack/plain	0.270s
/tmp/1:ok  	upspin.io/rpc	0.127s
/tmp/1:ok  	upspin.io/serverutil/frontend	0.026s
/tmp/1:ok  	upspin.io/serverutil/perm	2.355s
/tmp/1:ok  	upspin.io/serverutil/signup	0.039s
/tmp/1:ok  	upspin.io/shutdown	0.017s
/tmp/1:ok  	upspin.io/store/server	1.479s
/tmp/1:ok  	upspin.io/subcmd	0.016s
/tmp/1:ok  	upspin.io/test	65.905s
/tmp/2:ok  	upspin.io/cmd/upspin	10.336s
/tmp/2:ok  	upspin.io/errors	0.583s
/tmp/2:ok  	upspin.io/shutdown	0.022s
/tmp/2:ok  	upspin.io/store/server	2.114s
/tmp/2:ok  	upspin.io/test	67.280s
/tmp/3:ok  	upspin.io/cmd/upspin	10.074s
/tmp/3:ok  	upspin.io/errors	0.535s
/tmp/3:ok  	upspin.io/shutdown	0.020s
/tmp/3:ok  	upspin.io/store/server	1.834s
/tmp/3:ok  	upspin.io/test	70.321s
/tmp/4:ok  	upspin.io/cmd/upspin	9.593s
/tmp/4:ok  	upspin.io/errors	0.596s
/tmp/4:ok  	upspin.io/shutdown	0.021s
/tmp/4:ok  	upspin.io/store/server	1.547s
/tmp/4:ok  	upspin.io/test	67.230s
% 

There is some evidence that this works for people who don't have GOPATH=$HOME and/or do not have binaries in their $HOME somewhere. I have both those conditions set, which caused the original problem in part - upbox was installing binaries in $HOME==$GOPATH.

I tried diff'ing /tmp/3 and /tmp/4, since go test thinks something has changed and it is necessary to rerun the tests. It looks like there are tests that are writing files into the test tree (of course), but not into the source tree - ls -ltR shows nothing updated today. Here are some samples (the diff itself is enormous):

> HASH[open]: "err stat /var/folders/0y/zwb3fjyn3tlbj8363tp0pfqw0000gp/T/upbox155961431/[email protected]/storecache/remote,localhost:54379/B5: no such file or directory\n"
> HASH[open]: ad0e0cde76bb275e35f4d8e0e45b2e8d4cfc0b147e8574afe85671a3c8e34b05
> HASH[testInputs]: "open /var/folders/0y/zwb3fjyn3tlbj8363tp0pfqw0000gp/T/upbox155961431/[email protected]/storecache/remote,localhost:54379/B5 ad0e0cde76bb275e35f4d8e0e45b2e8d4cfc0b147e8574afe85671a3c8e34b05\n"
448546,448549c448471,448474
< HASH[stat]: "err stat /var/folders/0y/zwb3fjyn3tlbj8363tp0pfqw0000gp/T/upbox503660362/[email protected]/storecache/remote,localhost:54136/D1: no such file or directory\n"
< HASH[stat]: "err lstat /var/folders/0y/zwb3fjyn3tlbj8363tp0pfqw0000gp/T/upbox503660362/[email protected]/storecache/remote,localhost:54136/D1: no such file or directory\n"
< HASH[stat]: baed5ffccf0f45597ac7e186adb5c9827c4c8456c63b29be9dadfdea50836199
< HASH[testInputs]: "stat /var/folders/0y/zwb3fjyn3tlbj8363tp0pfqw0000gp/T/upbox503660362/[email protected]/storecache/remote,localhost:54136/D1 baed5ffccf0f45597ac7e186adb5c9827c4c8456c63b29be9dadfdea50836199\n"

> HASH[open]: "err stat /var/folders/0y/zwb3fjyn3tlbj8363tp0pfqw0000gp/T/upbox887854561/[email protected]: no such file or directory\n"
> HASH[open]: f95722b68b05064fef0179a224b671aec20389c5727ee2fff4b59b089f21b368
> HASH[testInputs]: "open /var/folders/0y/zwb3fjyn3tlbj8363tp0pfqw0000gp/T/upbox887854561/[email protected] f95722b68b05064fef0179a224b671aec20389c5727ee2fff4b59b089f21b368\n"
443498,443500c443308,443310
< HASH[open]: "err stat /var/folders/0y/zwb3fjyn3tlbj8363tp0pfqw0000gp/T/upbox529443502/[email protected]/secret.upspinkey: no such file or directory\n"
< HASH[open]: 60cd2ff5c5e2022ca218fe8cbf544fc3bfdc092e2260b683590f0bf267be04cd
< HASH[testInputs]: "open /var/folders/0y/zwb3fjyn3tlbj8363tp0pfqw0000gp/T/upbox529443502/[email protected]/secret.upspinkey 60cd2ff5c5e2022ca218fe8cbf544fc3bfdc092e2260b683590f0bf267be04cd\n"

> HASH[open]: "err stat /var/folders/0y/zwb3fjyn3tlbj8363tp0pfqw0000gp/T/upspin-storage-disk-test735005669/RE/FF/QU/M2/Nz: no such file or directory\n"
> HASH[open]: 8dfdb00d293dc185f179e4a0f23529bc728c08d832ebcdb8e943ae1876dc841d
> HASH[testInputs]: "open /var/folders/0y/zwb3fjyn3tlbj8363tp0pfqw0000gp/T/upspin-storage-disk-test735005669/RE/FF/QU/M2/Nz 8dfdb00d293dc185f179e4a0f23529bc728c08d832ebcdb8e943ae1876dc841d\n"
442049,442052c441855,441858
< HASH[stat]: "err stat /var/folders/0y/zwb3fjyn3tlbj8363tp0pfqw0000gp/T/upspin-storage-disk-test771584528/RE/RB/ND: no such file or directory\n"
< HASH[stat]: "err lstat /var/folders/0y/zwb3fjyn3tlbj8363tp0pfqw0000gp/T/upspin-storage-disk-test771584528/RE/RB/ND: no such file or directory\n"
< HASH[stat]: e8133c1d30623225c09af6c2930b857f5e9135d93924a0f704fa6e94d58945d7
< HASH[testInputs]: "stat /var/folders/0y/zwb3fjyn3tlbj8363tp0pfqw0000gp/T/upspin-storage-disk-test771584528/RE/RB/ND e8133c1d30623225c09af6c2930b857f5e9135d93924a0f704fa6e94d58945d7\n"

All those "no such file or directory" messages bother me. These seem to be tests involving files, and it's not just upbox (moreover, upbox tests are caching). I can't find a reason the errors test, for example, is rerunning though. If I do it by itself, it caches. But if I run the full test, it does not.

I'll keep digging but the output with GODEBUG=gocachehash=` is inscrutable at best.

@robpike
Copy link
Contributor Author

robpike commented Jan 25, 2018

Progress of a sort. The integration tests in upspin.io/test use upbox, but that use can be disabled. I did that, and now the only tests that don't cache are:

errors
shutdown
store/server

But why errors would fail eludes me. It works in isolation. More digging.

@robpike
Copy link
Contributor Author

robpike commented Jan 26, 2018

A much smaller case, showing the toggling of caching. A run of upspin.io/shutdown manages to break the cache for upspin.io/errors:

for i in `seq 4` ; do go test -short upspin.io/errors upspin.io/shutdown; done
ok  	upspin.io/errors	0.338s
ok  	upspin.io/shutdown	0.018s
ok  	upspin.io/errors	(cached)
ok  	upspin.io/shutdown	0.017s
ok  	upspin.io/errors	0.342s
ok  	upspin.io/shutdown	0.019s
ok  	upspin.io/errors	(cached)
ok  	upspin.io/shutdown	0.017s

Now the shutdown test is odd in that it execs a version of itself to create a child process for testing. But I cannot explain why that would cause the test cache for the errors package to go bad.

@robpike
Copy link
Contributor Author

robpike commented Jan 26, 2018

This is a clue I think:

HASH[open]: "stat 0 42001b6 2018-01-26 11:29:07.206202 +1100 AEDT false\n"
HASH[open]: abf66b1cc4913216dd42904999a4d466de5808f6f23ee81942c4c751d8eb41e5
HASH[testInputs]: "open /dev/null abf66b1cc4913216dd42904999a4d466de5808f6f23ee81942c4c751d8eb41e5\n"

The shutdown test writes to /dev/null, but /dev/null gives a different hash each time. As a total hack, I changed go test to return a constant hash for /dev/null and everything caches properly, always. I think this is the essence of the problem.

@rsc
Copy link
Contributor

rsc commented Jan 26, 2018

Do you have the latest cmd/go? The current internal/test/test.go says:

            case "stat":
                    if !filepath.IsAbs(name) {
                            name = filepath.Join(pwd, name)
                    }
                    if !inDir(name, a.Package.Root) {
                            // Do not recheck files outside the GOPATH or GOROOT root.
                            break
                    }
                    fmt.Fprintf(h, "stat %s %x\n", name, hashStat(name))
            case "open":
                    if !filepath.IsAbs(name) {
                            name = filepath.Join(pwd, name)
                    }
                    if !inDir(name, a.Package.Root) {
                            // Do not recheck files outside the GOPATH or GOROOT root.
                            break
                    }
                    fh, err := hashOpen(name)
                    if err != nil {
                            if cache.DebugTest {
                                    fmt.Fprintf(os.Stderr, "testcache: %s: input file %s: %s\n", a.Package.ImportPath, name, err)
                            }
                            return cache.ActionID{}, err
                    }
                    fmt.Fprintf(h, "open %s %x\n", name, fh)

If /dev/null is being included then the question is why inDir("/dev/null", yourGOROOT) is returning true.

@robpike
Copy link
Contributor Author

robpike commented Jan 26, 2018

As I wrote in the first message, my go command was from Jan 21, less than a week ago. I just synced and rebuilt and of course it all works.

So the bug has been fixed in the last few days. Sorry for the spam. (Fun to track down though.)

@robpike robpike closed this as completed Jan 26, 2018
@golang golang locked and limited conversation to collaborators Jan 26, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants