Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

execgen: make file i/o more "unix"-ey #56982

Closed
irfansharif opened this issue Nov 21, 2020 · 5 comments · Fixed by #57030
Closed

execgen: make file i/o more "unix"-ey #56982

irfansharif opened this issue Nov 21, 2020 · 5 comments · Fixed by #57030
Assignees
Labels
A-build-system A-sql-execution Relating to SQL execution. C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@irfansharif
Copy link
Contributor

Describe the problem

The UX around execgen i/o is a bit non-standard. Let's consider one example:

execgen -fmt=false pkg/cmd/colexec/hash_any_not_null_agg.eg.go

At first glance it may seem that it's taking the hash_any_not_null_agg.eg.go file as input, but in fact the file path is a positional argument tells execgen which specific .eg.go file to generate, which it then prints to stdout. For it's "input", the execgen program has a hardcoded list of _tmpl.go files relative to the repo root that it knows to read and use in order to generate the relevant .eg.go file.

Expected behavior
Lets consider how optgen behaves instead.

optgen -out pkg/sql/opt/operator.og.go ops pkg/sql/opt/ops/*.opt

It's a bit simpler to understand. The -out parameter specifies where the output file is to be placed. The ops argument tells optgen that it needs to generate operator definitions. The pkg/sql/opt/ops/*.opt arguments tell optgen where to look to find all relevant "input" files (the opt rules).

Additional data / screenshots
N/A

Additional context

The execgen UX has worked well for us thus far, but is exceedingly difficult to work with now that we're trying bazel in earnest (#55687). Specifically, bazel wants to auto-generate code within its sandbox where the template files are not necessarily placed in the same path as it does in the crdb repo itself. Because the template paths are hard-coded, it takes a few hacks to get it working just right with Bazel. Compare the optgen bazel gen rule with the execgen one:

# Define the generator for the operator definitions and functions.
genrule(
name = "gen-operator",
srcs = [":ops"],
outs = ["operator.og.go"],
cmd = """
$(location //pkg/sql/opt/optgen/cmd/optgen) -out $@ ops $(locations :ops)
""",
tools = ["//pkg/sql/opt/optgen/cmd/optgen"],
)

def _generate_impl(ctx):
output_files, input_files = [], []
# Gather the full list of all input template files (deconstructing it from
# the filegroup input). See [1] for a reference of all types/members within
# skylark.
#
# [1]: https://docs.bazel.build/versions/master/skylark/lib/skylark-overview.html
for tmpl in ctx.attr.templates:
input_files.extend(tmpl.files.to_list())
execgen_binary = ctx.executable.execgen
goimports_binary = ctx.executable.goimports
input_binaries = [execgen_binary, goimports_binary]
# For each target, run execgen and goimports.
for generated_file_name in ctx.attr.targets:
generated_file = ctx.actions.declare_file(generated_file_name)
ctx.actions.run_shell(
tools = input_binaries,
inputs = input_files,
outputs = [generated_file],
progress_message = "Generating pkg/cmd/colexec/%s" % generated_file_name,
command = """
%s -fmt=false pkg/sql/colexec/%s > %s
%s -w %s
""" % (execgen_binary.path, generated_file_name, generated_file.path, goimports_binary.path, generated_file.path),
)
output_files.append(generated_file)
# Construct the full set of output files for bazel to be able to analyse.
# See https://docs.bazel.build/versions/master/skylark/lib/DefaultInfo.html.
return [DefaultInfo(files = depset(output_files))]
generate = rule(
implementation = _generate_impl,
# Source all the necessary template files, output targets, the execgen and goimports binaries.
attrs = {
"templates": attr.label_list(mandatory = True, allow_files = ["tmpl.go"]),
"targets": attr.string_list(mandatory = True),
"execgen": attr.label(mandatory = True, executable = True, allow_files = True, cfg = "exec"),
"goimports": attr.label(mandatory = True, executable = True, allow_files = True, cfg = "exec"),
},
)

Right now the bazel targets for pkg/sql/colexec don't work for running tests (they do for building the test binaries, confusingly), as the sandbox places the template files in a location other than what execgen has been taught to expect.

$ bazel test @cockroach//pkg/sql/colexec:colexec_test
...
INFO: From Generating pkg/cmd/colexec/and_or_projection.eg.go:
ERROR: open pkg/sql/colexec/and_or_projection_tmpl.go: no such file or directory

If the UX looked something similar to the following:

execgen hash_sum_agg  -t <relevant template files>

Where it would print to stdout the intended contents of pkg/sql/colexec/colexecagg/hash_sum_agg.eg.go, life would be much easier. Specifically what I'm looking for is for the arguments fully specify where exactly to look for all the "inputs", instead of it internally hardcoding those paths. It'd be more “unix”-ey that way, I think. Specifically for bazel, we’d then be able to declare the full set of template files as data dependencies, while still allowing bazel to place those files in any arbitrary path within the sandbox.

To add more words to my word soup above, another reason the status quo feels gross in bazel is because the template files in one package (colexec) can be used to generate eg.go files in another (colconv for e.g.). By default bazel wants to treat colexec as a dependency for colconv, but in our code it’s the other way around. For now we've settled for not generating the eg.go~ files in colconv`, but we'll want to remove this stopgap soon.

(Aside: that’s yet another thing that's slightly better in optgen. The rules for each package are wholly contained within that package itself or in children packages.)


cc-ing @jordanlewis for triage/routing.

@blathers-crl
Copy link

blathers-crl bot commented Nov 21, 2020

Hi @irfansharif, please add a C-ategory label to your issue. Check out the label system docs.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@irfansharif irfansharif added C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels Nov 21, 2020
irfansharif added a commit to irfansharif/cockroach that referenced this issue Nov 23, 2020
The way `execgen` is constructed, it expects to find the relevant
_tmpl.go files in the hardcoded paths (which happen to be relative to
the repo root). With bazel the code-gen process occurs within the
sandbox, and the template files aren't placed in the exact same paths as
in the repo (it's placed within `external/cockroach/pkg/...` instead).
When trying to execute pkg/sql/colexec tests through bazel, the template
files aren't found in the right paths, and the `execgen` processes
subsequently fail. See cockroachdb#56982 for more details.

```
$ bazel test @cockroach//pkg/sql/colexec:colexec_test
...
INFO: From Generating pkg/cmd/colexec/and_or_projection.eg.go:
ERROR: open pkg/sql/colexec/and_or_projection_tmpl.go: no such file or directory
```

Short of changing out `execgen` to consider template files as data
dependencies, for now lets just symlink the relevant files into the
"right" path within the bazel sandbox.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Nov 23, 2020
The way `execgen` is constructed, it expects to find the relevant
_tmpl.go files in the hardcoded paths (which happen to be relative to
the repo root). With bazel the code-gen process occurs within the
sandbox, and the template files aren't placed in the exact same paths as
in the repo (it's placed within `external/cockroach/pkg/...` instead).
When trying to execute pkg/sql/colexec tests through bazel, the template
files aren't found in the right paths, and the `execgen` processes
subsequently fail. See cockroachdb#56982 for more details.

```
$ bazel test @cockroach//pkg/sql/colexec:colexec_test
...
INFO: From Generating pkg/cmd/colexec/and_or_projection.eg.go:
ERROR: open pkg/sql/colexec/and_or_projection_tmpl.go: no such file or directory
```

Short of changing out `execgen` to consider template files as data
dependencies, for now lets just symlink the relevant files into the
"right" path within the bazel sandbox.

Release note: None
@jordanlewis
Copy link
Member

It would be easy to change the way this works to be:

execgen -fmt=false -out pkg/cmd/colexec/hash_any_not_null_agg.eg.go path/to/template

would that help? There is a lot of detail in this issue, but if we just added the path to the template to the invocation, and read the template from that path, would that fix your problem?

There are some output files that take no template arguments, and therefore have no dependencies. For those, we'd just take 0 template args and that would be fine.

@jordanlewis
Copy link
Member

diff --git a/pkg/sql/colexec/execgen/cmd/execgen/main.go b/pkg/sql/colexec/execgen/cmd/execgen/main.go
index 88b92b886d..ae209084cb 100644
--- a/pkg/sql/colexec/execgen/cmd/execgen/main.go
+++ b/pkg/sql/colexec/execgen/cmd/execgen/main.go
@@ -26,10 +26,6 @@ import (
 	"github.com/cockroachdb/gostdlib/x/tools/imports"
 )
 
-var (
-	errInvalidArgCount = errors.New("invalid number of arguments")
-)
-
 func main() {
 	gen := execgenTool{stdErr: os.Stderr}
 	if !gen.run(os.Args[1:]...) {
@@ -71,12 +67,14 @@ func registerGenerator(g generator, outputFile, inputFile string) {
 func (g *execgenTool) run(args ...string) bool {
 	// Parse command line.
 	var printDeps bool
+	var outPath string
 	g.cmdLine = flag.NewFlagSet("execgen", flag.ContinueOnError)
 	g.cmdLine.SetOutput(g.stdErr)
 	g.cmdLine.Usage = g.usage
 	g.cmdLine.BoolVar(&g.fmtSources, "fmt", true, "format and imports-process generated code")
 	g.cmdLine.BoolVar(&g.verbose, "verbose", false, "print out debug information to stderr")
 	g.cmdLine.BoolVar(&printDeps, "M", false, "print the dependency list")
+	g.cmdLine.StringVar(&outPath, "out", "", "path")
 	err := g.cmdLine.Parse(args)
 	if err != nil {
 		return false
@@ -84,32 +82,33 @@ func (g *execgenTool) run(args ...string) bool {
 
 	// Get remaining args after any flags have been parsed.
 	args = g.cmdLine.Args()
-	if len(args) < 1 {
+	if len(args) > 1 {
 		g.cmdLine.Usage()
-		g.reportError(errInvalidArgCount)
+		g.reportError(errors.New("too many input arguments; only 0 or 1 supported"))
 		return false
 	}
 
-	for _, out := range args {
-		_, file := filepath.Split(out)
-		e := generators[file]
-		if e.fn == nil {
-			g.reportError(errors.Errorf("unrecognized filename: %s", file))
-			return false
-		}
-		if printDeps {
-			if e.inputFile == "" {
-				// This output file has no template dependency (its template
-				// is embedded entirely in execgenTool). Skip it.
-				continue
-			}
-			fmt.Printf("%s: %s\n", out, e.inputFile)
-		} else {
-			if err := g.generate(out, e); err != nil {
-				g.reportError(err)
-				return false
-			}
+	if outPath == "" {
+		g.cmdLine.Usage()
+		g.reportError(errors.New("no output path specified"))
+		return false
+	}
+
+	_, file := filepath.Split(outPath)
+	e := generators[file]
+	if e.fn == nil {
+		g.reportError(errors.Errorf("unrecognized filename: %s", file))
+		return false
+	}
+	if e.inputFile != "" {
+		if len(args) == 0 {
+			g.reportError(errors.Errorf("file %s expected input template %s, found none", file, e.inputFile))
 		}
+		e.inputFile = args[0]
+	}
+	if err := g.generate(outPath, e); err != nil {
+		g.reportError(err)
+		return false
 	}
 	return true
 }

@irfansharif
Copy link
Contributor Author

If we just added the path to the template to the invocation, and read the template from that path, would that fix your problem?

That would help, yes.

execgen -fmt=false -out pkg/cmd/colexec/hash_any_not_null_agg.eg.go path/to/template

It'd be even better for execgen to not directly write to the specified path, and instead print the relevant file contents to stdout so the caller could then redirect as needed. The bazel sandbox does not necessarily want to generate these files within $SANDBOX/pkg/cmd/colexec.

@jordanlewis
Copy link
Member

It'd be even better for execgen to not directly write to the specified path, and instead print the relevant file contents to stdout so the caller could then redirect as needed. The bazel sandbox does not necessarily want to generate these files within $SANDBOX/pkg/cmd/colexec.

This is already what execgen does.

craig bot pushed a commit that referenced this issue Nov 23, 2020
56874: sql: add notice for interleaved tables r=postamar a=postamar

This patch adds a deprecation notice when the user attempts to create an
interleaved table by executing a CREATE TABLE ... INTERLEAVE IN PARENT
statement.

Fixes #56579.

Release note (sql change): Added deprecation notice to CREATE TABLE ...
INTERLEAVE IN PARENT statements.

57018: roachpb: use kv.kvpb as package name for errors.proto r=knz a=tbg

We just started using `EncodedError` in `message Error`, which
means that once the first alpha goes out (soon) we will have
to face a migration if we wish to change any proto package path
that is referenced by an `Error` (i.e. most of `errors.proto`).

We do want to dismantle the `roachpb` grab bag at some point.
However, I just tried moving `errors.proto` (or even just its
relevant parts) to a new `kvpb` package, and failed miserably -
this will take a solid chunk of uninterrupted time.

So, go for the stop gap: rename the path in `errors.proto` now,
so that we're free to make the actual move "whenever" we can.

Release note: None


57027: colexec: support running colexec tests through bazel r=irfansharif a=irfansharif

The way `execgen` is constructed, it expects to find the relevant
_tmpl.go files in the hardcoded paths (which happen to be relative to
the repo root). With bazel the code-gen process occurs within the
sandbox, and the template files aren't placed in the exact same paths as
in the repo (it's placed within `external/cockroach/pkg/...` instead).
When trying to execute pkg/sql/colexec tests through bazel, the template
files aren't found in the right paths, and the `execgen` processes
subsequently fail. See #56982 for more details.

```
$ bazel test @cockroach//pkg/sql/colexec:colexec_test
...
INFO: From Generating pkg/cmd/colexec/and_or_projection.eg.go:
ERROR: open pkg/sql/colexec/and_or_projection_tmpl.go: no such file or directory
```

Short of changing out `execgen` to consider template files as data
dependencies, for now lets just symlink the relevant files into the
"right" path within the bazel sandbox.

Release note: None

Co-authored-by: Marius Posta <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
@craig craig bot closed this as completed in fccf248 Nov 24, 2020
irfansharif added a commit to irfansharif/cockroach that referenced this issue Nov 24, 2020
Now that we've added a way to source a specific template file in
`execgen` (instead of relying on hard-coded paths, see cockroachdb#56982), we can
simplify how we generate eg.go files. This lets us parallelize the
generation of these files, as the more fine-grained dependency tracking
lets bazel generate each eg.go file concurrently (previously we had to
specify the superset of all template files as dependencies for the
generation of each individual eg.go file).

There's one exception for the generation of like_ops.eg.go, the
generation of which appears to want read from a second template file.
We've special-cased the generation of this file into it's own thing.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Nov 24, 2020
Now that we've added a way to source a specific template file in
`execgen` (instead of relying on hard-coded paths, see cockroachdb#56982), we can
simplify how we generate eg.go files. This lets us parallelize the
generation of these files, as the more fine-grained dependency tracking
lets bazel generate each eg.go file concurrently (previously we had to
specify the superset of all template files as dependencies for the
generation of each individual eg.go file).

There's one exception for the generation of like_ops.eg.go, the
generation of which appears to want read from a second template file.
We've special-cased the generation of this file into it's own thing.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Nov 25, 2020
Now that we've added a way to source a specific template file in
`execgen` (instead of relying on hard-coded paths, see cockroachdb#56982), we can
simplify how we generate eg.go files. This lets us parallelize the
generation of these files, as the more fine-grained dependency tracking
lets bazel generate each eg.go file concurrently (previously we had to
specify the superset of all template files as dependencies for the
generation of each individual eg.go file).

There's one exception for the generation of like_ops.eg.go, the
generation of which appears to want read from a second template file.
We've special-cased the generation of this file into it's own thing.

Release note: None
craig bot pushed a commit that referenced this issue Nov 25, 2020
57075: colexec,bazel: re-work code-gen through bazel r=irfansharif a=irfansharif

Now that we've added a way to source a specific template file in
`execgen` (instead of relying on hard-coded paths, see #56982), we can
simplify how we generate eg.go files. This lets us parallelize the
generation of these files, as the more fine-grained dependency tracking
lets bazel generate each eg.go file concurrently (previously we had to
specify the superset of all template files as dependencies for the
generation of each individual eg.go file).

There's one exception for the generation of like_ops.eg.go, the
generation of which appears to want read from a second template file.
We've special-cased the generation of this file into it's own thing.

Release note: None

57081: roachtest: update pgjdbc blocklists r=solongordon a=solongordon

I made the following updates to the pgjdbc blocklists:
- Removed tests which are now passing due to user-defined schema
  support.
- Removed testUpdateSelectOnly since we now support this syntax as a
  no-op.
- Updated some failure reasons to "unknown" for tests which are still
  failing even though the referenced issue was closed.
- Added many BatchExecuteTest.* tests to the ignore list. These tests
  are flaky due to a combination of #54477 and the fact that the tests
  do not run in a deterministic order.

Fixes #53467
Fixes #53738
Fixes #54106

Release note: None

Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: Solon Gordon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-build-system A-sql-execution Relating to SQL execution. C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants