
Request for comments: SQL to Go code generator that leverages pgx #915

Closed

jschaf opened this issue Jan 19, 2021 · 5 comments

Comments


jschaf commented Jan 19, 2021

Hi! I'm working on sqld, a code generator that takes SQL queries and outputs type-safe Go code built on pgx. I'm a big fan of pgx, and if you have some time, I'd love to get your thoughts on how the generated code uses it, especially around:

  1. Safe usage of the pgx APIs in the generated code (example/author/queries.sql.go).
  2. Possible collaboration on a future ClientTrace API similar to the Go stdlib http tracing: https://blog.golang.org/http-tracing. There are a ton of useful metrics we could surface in pgx (DNS resolution of hostnames, response parse time, pool queue time). I've created an initial pgx-focused ClientTrace.go that I use in the generated code; a sketch of the idea follows this list.
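To make the shape concrete, here's a hypothetical hook struct modeled on net/http/httptrace.ClientTrace. The hook names and signatures are illustrative only, not the actual ClientTrace.go API:

// Hypothetical tracing hooks, modeled on httptrace.ClientTrace.
// None of these names are real pgx or pggen APIs.
type ClientTrace struct {
	// DNSDone fires after resolving the server hostname.
	DNSDone func(addrs []net.IPAddr, err error)
	// ConnAcquired fires when a connection is checked out of the
	// pool, reporting time spent waiting in the pool queue.
	ConnAcquired func(queueWait time.Duration)
	// RowsDecoded fires after the result rows are parsed, reporting
	// the response parse time.
	RowsDecoded func(parseTime time.Duration)
}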

RFC: sqld: Go code generation for Postgres (Google doc)

To summarize, sqld takes a query like:

SELECT * FROM author WHERE first_name = sqld.arg('FirstName');

And generates:

func (q *DBQuerier) FindAuthors(ctx context.Context, firstName string) ([]Author, error) {}
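Fleshed out, the generated body is roughly the standard pgx Query/Scan loop. A sketch with illustrative names (q.conn, findAuthorsSQL, and the Author columns are stand-ins, not the generator's exact output):

func (q *DBQuerier) FindAuthors(ctx context.Context, firstName string) ([]Author, error) {
	rows, err := q.conn.Query(ctx, findAuthorsSQL, firstName)
	if err != nil {
		return nil, fmt.Errorf("query FindAuthors: %w", err)
	}
	defer rows.Close()
	var authors []Author
	for rows.Next() {
		var a Author
		if err := rows.Scan(&a.AuthorID, &a.FirstName, &a.LastName); err != nil {
			return nil, fmt.Errorf("scan FindAuthors row: %w", err)
		}
		authors = append(authors, a)
	}
	return authors, rows.Err()
}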

sqld is heavily inspired by sqlc. The main differences are that sqld gets type information directly from Postgres and that it uses pgx instead of the standard library database/sql types.


jackc commented Jan 23, 2021

Regarding usage of pgx APIs:

I didn't see any red flags. The only suggestion I have is maybe to replace Query with QueryFunc. It's much harder to misuse (but I guess that doesn't matter in generated code...) and it has the potential to be faster. It's only on master at the moment, but I expect to include it in the next minor release.
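For reference, QueryFunc scans every row into one fixed set of destinations and invokes a callback per row. A minimal sketch against the FindAuthors query above (illustrative and untested):

var authors []Author
var a Author
_, err := conn.QueryFunc(ctx,
	"SELECT author_id, first_name, last_name FROM author WHERE first_name = $1",
	[]interface{}{firstName},                              // query arguments
	[]interface{}{&a.AuthorID, &a.FirstName, &a.LastName}, // scan destinations, reused for every row
	func(pgx.QueryFuncRow) error {
		authors = append(authors, a) // copy the scanned values out
		return nil
	},
)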

Regarding a trace API:

Sounds interesting. There might be a bit of an overlap with the existing logging system. Not sure if that would be a problem or not. I'd be able to review PRs or design sketches but I don't currently have availability to work on it myself.


jschaf commented Jan 24, 2021

> QueryFunc. It's much harder to misuse (but I guess that doesn't matter in generated code...) and it has the potential to be faster.

Interesting, how would QueryFunc be faster? I figured the indirection through the function call would cause QueryFunc to always be slower than Query.

> I'd be able to review PRs or design sketches but I don't currently have availability to work on it myself.

Perfect. I'll get codegen off the ground first, but I'll loop you in once I start looking at tracing. The sqlc maintainer pointed me at some prior art: https://github.com/ngrok/sqlmw, which wraps the standard library sql driver.

Also, I've renamed the project to pggen (https://github.com/jschaf/pggen) to avoid confusion with the existing sqlc library.


jackc commented Jan 27, 2021

> Interesting, how would QueryFunc be faster? I figured the indirection through the function call would cause QueryFunc to always be slower than Query.

It costs an extra function call, but it avoids the allocation for the variadic argument to Scan (technically you can avoid that with Query by reusing a destination slice, but it's not typically done).
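Concretely (a sketch; a and b stand in for whatever columns the query selects):

// Typical usage: the variadic call builds a new []interface{} for
// every row, and it escapes to the heap.
for rows.Next() {
	if err := rows.Scan(&a, &b); err != nil {
		return err
	}
}

// Manual reuse: build the destination slice once and splat it.
// This is effectively what QueryFunc's scans argument does for you.
dest := []interface{}{&a, &b}
for rows.Next() {
	if err := rows.Scan(dest...); err != nil {
		return err
	}
}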

There are also some possible future optimizations. Scan may be called with different variables for each row -- even entirely different types. This means Scan has to do type checking for each row. It might be possible to do that once for the result with QueryFunc. That would save several cycles per value read.

Though to be honest, in the vast majority of real-world cases the difference won't matter one way or the other. The big win for me is making it harder to misuse.


jschaf commented Jan 27, 2021

> There are also some possible future optimizations. Scan may be called with different variables for each row -- even entirely different types. This means Scan has to do type checking for each row. It might be possible to do that once for the result with QueryFunc.

Ah, I see -- thanks for explaining. Since I know the types ahead of time, it seems I could generate more efficient code. I know the generated use case is not the norm, but it'd be neat if I could leverage a lower-level, "unsafe" interface.


jackc commented Jan 27, 2021

> I know the generated use case is not the norm, but it'd be neat if I could leverage a lower-level, "unsafe" interface.

Well, you can use RawValues() to get the raw bytes over the wire and inline the decoding of each value. But only large result sets gain anything, and even there the gains are only ~10%.
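A sketch of that approach, assuming a single int4 column returned in binary format (which pgx's extended protocol normally negotiates for types it knows how to decode):

rows, err := conn.Query(ctx, "SELECT n FROM t")
if err != nil {
	return err
}
defer rows.Close()
for rows.Next() {
	// Raw wire bytes for column 0; only valid until the next
	// call to rows.Next.
	buf := rows.RawValues()[0]
	// Inline binary int4 decode: no Scan, no per-value allocation.
	n := int32(binary.BigEndian.Uint32(buf))
	_ = n
}
return rows.Err()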

See the benchmarks for pgx for more info -- in particular these 3:

BenchmarkSelectRowsScanSimple/1000_rows-16                           	    1748	    635939 ns/op	  233590 B/op	    4013 allocs/op
BenchmarkSelectRowsExplicitDecoding/1000_rows-16                     	    1861	    599529 ns/op	  105460 B/op	    3012 allocs/op
BenchmarkSelectRowsRawPrepared/1000_rows/binary_-_mostly-16          	    2199	    537088 ns/op	       0 B/op	       0 allocs/op

ScanSimple is the normal way pgx is used. ExplicitDecoding avoids Scan and manually decodes each value. RawPrepared measures simply sending the query and receiving the result bytes with no decoding at all; that should be the theoretical limit of performance.

As you can see, even the slowest one is within 19% of the theoretical limit. In almost all cases, network time and PG server time vastly overwhelm pgx time.

jschaf closed this as completed Feb 2, 2021