Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: spuriously stale dependencies #46377

Closed
mdempsky opened this issue May 25, 2021 · 10 comments
Closed

cmd/go: spuriously stale dependencies #46377

mdempsky opened this issue May 25, 2021 · 10 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@mdempsky
Copy link
Contributor

During compiler development, I periodically take snapshots of my git working tree, copy them to my workstation, and then run a bunch of tests. The script I run on the workstation currently looks like this:

cd $GOROOT/test
git checkout $TRYREV
go install -toolexec=toolstash cmd/compile
go build -a -v -gcflags=-d=inlfuncswithclosures=0 -toolexec='toolstash -cmp' std cmd
go install -v std cmd/cgo cmd/go cmd/link
go install -v -x cmd/cgo cmd/go cmd/link  # why??
go run run.go
go test -short std cmd

I've added the second go install -v -x cmd/cgo cmd/go cmd/link invocation because otherwise tests like cmd/link's TestDWARF fail because cmd/go says cmd/link is stale. I would expect the one go install invocation to be sufficient, but the go install -x output (e.g., https://gist.github.com/mdempsky/bdfc2247df2bba822b833b78309660e9 has the output from my most recent script run) confirms that cmd/cgo, cmd/go, and cmd/link are all being relinked and reinstalled.

The staleness issue bites me quite frequently during my workflow, but I haven't managed to isolate steps that reliably reproduce the failure.

/cc @bcmills

@mdempsky mdempsky added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 25, 2021
@ianlancetaylor
Copy link
Member

This is my understanding.

Running go install cmd/link will update the installed link tool if the hash code has changed. The hash code of linking a program includes the hash code of cmd/link itself. So the first install of cmd/link gives you a link tool with a hash code that records the link tool used to build it. When you run go install again, the link tool has changed, so the hash code has changed, so the linker gets rebuilt and reinstalled. This time, it won't actually change, so another go install cmd/link will see that hash code matches and it won't do anything.

This is basically why make.bash builds the tools three times. See this comment from cmd/dist/build.go:

// Toolchain2 should be semantically equivalent to toolchain1,
// but it was built using the new compilers instead of the Go 1.4 compilers,
// so it should at the least run faster. Also, toolchain1 had no build IDs
// in the binaries, while toolchain2 does. In non-release builds, the
// toolchain's build IDs feed into constructing the build IDs of built targets,
// so in non-release builds, everything now looks out-of-date due to
// toolchain2 having build IDs - that is, due to the go command seeing
// that there are new compilers. In release builds, the toolchain's reported
// version is used in place of the build ID, and the go command does not
// see that change from toolchain1 to toolchain2, so in release builds,
// nothing looks out of date.
// To keep the behavior the same in both non-release and release builds,
// we force-install everything here.
//
//	toolchain3 = mk(new toolchain, toolchain2, go_bootstrap)
//

At least, that is my understanding.

@thanm
Copy link
Contributor

thanm commented May 26, 2021

It is interesting to get some insight into this problem.

I run the cmd/link TestDWARF tests by hand a good deal (since I spend a fair amount of time making changes to the dwarf code), and it is frustrating to have the the test inform me that the linker is stale right after I just did "go install cmd/link".

Maybe we need a new "go installthreetimes cmd/link" or something like that.

@mdempsky
Copy link
Contributor Author

I see. I knew bootstrapping reinstalls multiple times, but I guess I assumed cmd/go would handle the details after that.

If it stays the user's responsibility to run go install to fixpoint and things are working as intended, then that's surprising, but okay.

@rsc
Copy link
Contributor

rsc commented May 26, 2021

Re "the user's responsiblity", this is not something that affects ordinary users. It is something that affects only people who install the tools that the go command uses to build things, namely 'cmd'.

The go command should not go out of its way to add special cases for developing the toolchain itself. That is the job of cmd/dist.

Apologies for needing to run 'go install' multiple times to reach a fixed point, but this is a fundamental consequence of the interaction between (1) cmd/link essentially depending on itself and (2) content-hash-based staleness.

@mvdan
Copy link
Member

mvdan commented May 26, 2021

For this reason I use toolstash restore && go install cmd/xxx if I'm hacking on the toolchain itself. This speeds up the iteration by making sure that I always build with the master version (stored with toolstash save), so I don't need to rebuild half of std and cmd again.

If we agree that this is something we should simply document better, maybe we can add it to #30074.

@mdempsky
Copy link
Contributor Author

Re "the user's responsiblity", this is not something that affects ordinary users. It is something that affects only people who install the tools that the go command uses to build things, namely 'cmd'.

Ack, I agree this issue has limited scope of people affected. But as someone who works primarily on cmd/compile, I still consider myself a user of cmd/go. That's all I meant.

That is the job of cmd/dist.

I'm not very familiar with cmd/dist's full functionality. After I make changes to cmd/compile, is there a cmd/dist invocation to bring things back to a fixed point with the minimal amount of redundant work?

I generally avoid running the full make.bash/all.bash script because I like the flexibility of tweaking my test script to reorder tests to prioritize whatever particular corner case I'm actively dealing with. Waiting for the complete triple-bootstrap really slows down my test cycles when I'm trying to figure out why my change broke fixedbugs/issue12345.go.

For this reason I use toolstash restore && go install cmd/xxx if I'm hacking on the toolchain itself.

Yeah, I'm thinking I should use toolstash restore in my script instead of -toolexec=toolstash. The latter is just muscle memory from doing cmd/compile-only hacking.

But does that actually address the fixpoint issue?

@mdempsky
Copy link
Contributor Author

mdempsky commented May 26, 2021

Edit: Nevermind, this was likely my fault. I forgot that I'd changed ~/pkg/toolstash/compile to a symlink so I could easily switch between a few stashed compilers. I guess that worked with toolstash save but not with toolstash restore.

My build script is currently:

toolstash restore
go tool compile -V

go install cmd/compile

go build -a -v -gcflags='-d=inlfuncswithclosures=0' -toolexec='toolstash -cmp' std cmd

# golang.org/issue/46377
go install std cmd
go install std cmd
go install std cmd

go run run.go
go test -short std cmd

But I keep running into a situation where the initial go install cmd/compile command seems to try recompiling cmd/compile with a broken compiler instead of the known-good version that I stashed and restored.

For example, right now I have this reliable behavior from cmd/go:

$ go install -a -toolexec=toolstash cmd/compile
[succeeds]

$ toolstash restore
$ go install -a cmd/compile
# math
math/trig_reduce.go:36:19: internal compiler error: math/fma.go:96:27: internal compiler error: math/expm1.go:170:14: internal compiler error: 'trigReduce': addr of canSSA expression: 
.   NAME-math.f esc(no) tc(1) Class:PAUTO Offset:0 InlFormal OnStack Used float64 # unsafe.go:23:18,trig_reduce.go:36:19

Why is cmd/go giving me different behavior depending on whether I use -toolexec=toolstash or toolstash restore immediately before?

(I'm not surprised about the compiler error: that's explained by a previous change I made. But I've reverted it in my source tree, and it wasn't in the stashed compiler, so I don't understand why the failure is persisting.)

@rsc
Copy link
Contributor

rsc commented May 27, 2021

If you want a complete fixed-point where nothing at all in the tree appears to be stale after modifying the sources to cmd/compile or cmd/link, then the minimum is

  1. go install cmd
  2. go install cmd
  3. go install cmd
  4. go install std

Looking at the inputs and outputs of each (the input to all steps is the same modified source code, so I'm only looking at the toolchain inputs, namely which toolchain is being used):

  1. Inputs = original toolchain.
    Outputs = new toolchain as compiled by original.

  2. Inputs = new toolchain as compiled by original.
    Outputs = new toolchain as compiled by (new toolchain as compiled by original).

  3. Inputs = new toolchain as compiled by (new toolchain as compiled by original).
    Outputs = new toolchain as compiled by (new toolchain as compiled by (new toolchain as compiled by original)).

Because we are using content-based (as opposed to mtime-based) staleness, a particular build result is indexed by the content hashes of the input sources (not changing here) and the binaries being used.

The binaries used as inputs in steps 1 and 2 are certainly different, since the one in step 2 has the modifications compiled into it.

The binaries used as inputs in steps 2 and 3 are different as well, at least when the modifications affect generated code.
Consider the case where the modifications contain an optimization that applies to some part of the compiler.
The output of step 1 = input of step 2 is an unoptimized compilation of the modified compiler sources.
The output of step 2 = input of step 3 is the optimized compilation of the modified compiler sources.
In the absence of compiler bugs, these two toolchains, although different in binary form, should be semantically equivalent.

Assuming those two toolchains really are semantically equivalent, they should produce identical binaries as output, except for the fact that the input hashes recorded in the binaries are different. We can't just rewrite the hashes, because maybe the toolchains aren't semantically equivalent. That is, maybe there's a bug in the compiler (it happens!).

Assuming the two toolchains really behave identically, then step 3 is a semantic repeat of step 2, just using a toolchain with different binaries. So it should produce the same outputs. But these outputs will have been compiled with themselves as their own input hashes, proving that we really have reached a fixed point and that the toolchains really are semantically equivalent, at least as applied to their own source code.

If go list says any part of cmd is stale after step 3, then there is a bug in the toolchain that is causing it not to produce deterministic outputs. This is what the staleness check in cmd/dist looks for.

Finally we need to recompile the standard library with the fixed-point compilers, which is step 4.

All this only matters if you are trying to make the compilers themselves not be reported as stale. The go command has no problem invoking stale compilers, so when I'm working on the toolchain, I just do one go install cmd (or go install cmd/link etc, whatever I'm changing) and then immediately move on to go test -short std. If you trust that the existing compiler works well enough to compile your changes, then there's no problem not forcing the fixed point. You're just trusting instead of verifying. At that point the only tests that fail are the ones that are picky about staleness.

@bcmills
Copy link
Contributor

bcmills commented May 27, 2021

At that point the only tests that fail are the ones that are picky about staleness.

For what it's worth, I just made a lot of those tests a lot less picky about staleness for #46347; see CL 322629.

@rsc
Copy link
Contributor

rsc commented May 27, 2021

Should this issue be closed?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

7 participants