Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build tags making our life difficult #938

Open
dominikh opened this issue Feb 25, 2021 · 9 comments
Open

Build tags making our life difficult #938

dominikh opened this issue Feb 25, 2021 · 9 comments
Labels

Comments

@dominikh
Copy link
Owner

dominikh commented Feb 25, 2021

Build tags cause us great pain. Some checks only operate on literals, not constants, in fear of introducing false positives but thus introducing false negatives. Other checks fail to take all effects of build tags into consideration, causing false positives.

This issue acts as a list of checks with known shortcomings caused by build tags. The goal is to address all of these issues, by teaching Staticcheck which values, types and behaviors depend on build tags, and which can be assumed to be static. Our current plan is to make use of the fact that if the build tags governing the use of a value imply the build tags of its definition, then only the one value can ever be observed. For example, the value of runtime.GOOS depends on the active build tags. If used in a file tagged with windows, the value will always be "windows". However, in a file tagged with linux, the value may be either "linux" or "android".

This approach doesn't let us know of all possible values, only of the fact that values may differ under different build configurations. As such, it itself will cause false negatives, especially in type-based analyses. Consider S1039, which flags calls of the form fmt.Sprint(x) where x's type is string. If x is defined under multiple build configurations, we can't be sure that its type is always string, even when it might be. It is not clear to me whether we should handle this. The vast majority of code isn't this stupid and we cause barely any false positives (no known ones). However, handling this correctly would cause a lot of false negatives.

Checks that only support basic literals, not consts

Note that a lot of these checks shouldn't flag code that uses constants, even if the constant's value is certain. x * 0 is useless, but x * N where N == 0 might be code that intentionally got disabled. The same applies to time.Sleep(N). However, we could flag these exclusively in gopls, so that editors render the code as dead code.

Checks that assume a constant's value is static

Checks that assume a function's behavior is static

Checks that assume a type's definition is static

Checks that assume that an expression's type is static, even though it might differ under different build tags, in particular due to := assignment

Issues that are complicated by build tags

@dominikh
Copy link
Owner Author

The other approach to build tags is to run Staticcheck multiple times, once per interesting build configuration. We can then merge the results. Each diagnostic would specify if it should be reported if 1) all build configurations of which a file is part of produced the diagnostic 2) any build configurations produced the diagnostic.

For example, calling regexp.Compile with an invalid regexp should always be reported, even if it's only invalid under one build configuration. However, dead code should only be reported if it is dead under all build configurations.

This approach leads to more accurate results, but also increases the total runtime by a factor of N, where N is the number of build configurations.

@dominikh
Copy link
Owner Author

If go/analysis allowed attaching metadata to diagnostics, then we could combine the two approaches, by selecting the merge strategy dynamically, requiring "reported under all configurations" only when a type or value is known to differ under build tags. This would increase precision and reduce the number of false negatives.

@dominikh
Copy link
Owner Author

We can also improve precision by tracking reachable code on a per-line basis instead of a per-file basis, although the definition of "reachable" differs for AST-based and IR-based analyses.

dominikh added a commit that referenced this issue Nov 19, 2021
We implement a merge mode, which takes the results of multiple runs
and merges them. On an analysis-by-analysis basis we decide whether a
diagnostic has to occur in all runs to be merged. In addition, we take
into consideration whether a file was part of a run in the first
place, to avoid incorrectly dropping diagnostics.

We add a new output format called "binary", which writes diagnostics
and metadata using gob. This data is the input to the merge mode.

Currently, the metadata includes a list of all files that were
checked. In the future, we might change to amore finegrained data,
such as which lines of code were reachable in the IR.

Updates gh-938
@dominikh
Copy link
Owner Author

Based on feedback from mvdan, we'll provide a shortcut for running and merging multiple runs on the same host:

$ staticcheck -matrix <<EOF
linux: GOOS=linux
windows: GOOS=windows
appengine: -tags=appengine
EOF
sand_appengine.go:8:28: error parsing regexp: missing argument to repetition operator: `?` [appengine] (SA1000)
sand_linux.go:5:28: error parsing regexp: missing argument to repetition operator: `*` [appengine,linux] (SA1000)
sand_windows.go:5:28: error parsing regexp: missing argument to repetition operator: `+` [windows] (SA1000)

This can be used by itself, or in combination with the -merge mode introduced in 5a97b5d.

dominikh added a commit that referenced this issue Nov 20, 2021
When using the -matrix flag, Staticcheck will read a list of build
configurations from stdin, execute all of them, and merge their
results similar to -merge. This simplifies checking multiple
configurations on the same host.

When using build configurations, the plain output will annotate
diagnostics with the names of build configurations that produced them.

The output of a -matrix run can be used together with -f binary and
passed to -merge. This allows running multiple configurations per host
and merging them.

For now, build configurations can specify any environment variables
and flags, without restrictions. That's why they have to be provided
explicitly to Staticcheck and won't be read from staticcheck.conf. In
the future we may allow restricted build configurations to be read
automatically.

Updates gh-938
@cespare
Copy link

cespare commented Jan 20, 2022

You can add S1008 to that first list of checks. It suggested a simplification based on a constant that's set to true/false using different build tags.

@dominikh
Copy link
Owner Author

dominikh commented Jan 20, 2022

@cespare Do you mean that the constant is used as the value in the return statement?

Edit: fixed that one in 246e50b

@cespare
Copy link

cespare commented Jan 20, 2022

I do! Thanks for the fix :)

@dominikh
Copy link
Owner Author

Now that we have the -merge and -matrix flags, the next question is whether we should still aim to detect build-tag-dependent behavior, to reduce the number of false positives when only checking a single configuration. However, if we do that, we'd also need a way to disable it. Otherwise, if we use -merge but the individual runs each suppress their potential false positives, we'll not have any results to merge.

@dominikh
Copy link
Owner Author

we'd also need a way to disable it.

Maybe we could have an -intent flag that spans both this, and the planned code review/aggressive checks, and is used for selecting different intentions of running Staticcheck: code review, single CI run, merged CI run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants