Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Context Interface #496

Closed
julienschmidt opened this issue Oct 24, 2016 · 57 comments
Closed

Implement Context Interface #496

julienschmidt opened this issue Oct 24, 2016 · 57 comments
Assignees
Milestone

Comments

@julienschmidt
Copy link
Member

See golang/go#15123

@jsha
Copy link

jsha commented Nov 3, 2016

It would be great to auto-detect support for max_statement_time (MariaDB: 10.1.1+) or max_execution_time (MySQL: 5.7.8+) and use that to support Contexts with deadlines. This can be done on a per-statement basis like so:

-- MariaDB
SET STATEMENT max_statement_time=0.1 FOR SELECT * FROM table;
-- MySQL
SELECT MAX_EXECUTION_TIME(100) * FROM table;

Documentation for MySQL statement-level limit: https://dev.mysql.com/doc/refman/5.7/en/optimizer-hints.html#optimizer-hints-execution-time

@methane
Copy link
Member

methane commented Nov 4, 2016

My personal feeling: I don't like such an implicit magic.
User can add it when they can assume what version of MySQL / MariaDB is used.

@methane
Copy link
Member

methane commented Nov 7, 2016

What should we do to cancel query?

  1. Create new connection; send "KILL QUERY "; close the connection; wait query is stopped.
  2. Just close current connection.

MySQL cli chose 1. But I don't like it because:

  • When query is cancelled because of MySQL server is under heavy load, creating new temporary connection adds load.
  • It ignores DB.SetMaxOpenConns(). It means user faces unexpected "Too many connections" error.

@jsha
Copy link

jsha commented Nov 7, 2016

(2) is not great. In https://github.com/letsencrypt/boulder/ we had been using readTimeout to terminate queries that were too slow. Unfortunately this led to a cascading outage under load. Specifically, a boulder server started N queries (based on DB.SetMaxOpenConns()), and when one of them timed out, it would open another one. From boulder's perspective, it never exceeded the DB.SetMaxOpenConns() limit. However, from MariaDB's perspective, those connections were still active until the query completed and it tried to send data. So boulder kept creating new connections until MariaDB hit its own internal limit, at which point it started returning errors to all components. Additionally, it prevented new connections from the host that created excessive connections, until we manually ran flush tables;.

I think if we go the route of implementing manual KILL QUERY, it should be via a control connection that is opened initially and only used for kills. However, I think the best solution would be to say that query killing is only supported on recent versions of MySQL / MariaDB that support max_statement_time, and provide a config option to declare that the backend supports it (you're right, automagic detection is not good).

It would probably be necessary to register two driver names for the two diverging dialects, mysql, and mariadb.

@methane
Copy link
Member

methane commented Nov 8, 2016

I don't like implement MySQL and MariaDB specific feature.
People can use this library with any servers and proxies talking MySQL protocol.

@methane
Copy link
Member

methane commented Nov 8, 2016

(2) is not great.
...
However, from MariaDB's perspective, those connections were still active until the query completed and it tried to send data.

I know it.
But if people want to avoid such situations, they can use max_execution_time manually.

I think if we go the route of implementing manual KILL QUERY, it should be via a control connection that is opened initially and only used for kills.

I also -1 on this.
Driver implements one connection, and doesn't know about sql.DB (connection pool).
For example, sql.Open() and DB.Close() doesn't call driver.
If we choose "one control connection per pool", DB.Close() doesn't close the control connection.
If we choose "one control connection per connection", it means we use double connections.

We may have another behavior option in the future.
But first and default implementation should be simple, predicable, and usable.

@kardianos
Copy link

Unfortunately the driver interface doesn't indicate when a connection pool is opening or closing. However, while it is more work, it certainly isn't impossible or too much work to open one control connection per (host, port, username, password) combination that remains open while at least one request transaction with that combination is open.

@methane
Copy link
Member

methane commented Dec 5, 2016

It may work sometimes, but it may not work sometimes.
For example, KILL query requires SUPER privilege.

I think the first and the default implementation of cancellation should be:

  • works with not only MySQL or MariaDB.
  • simple and predicable

After that, I'm not against adding "super clever, excellent, wonderful cancel implementation".
But I won't implement it because I love simplicity and predictability.

@methane
Copy link
Member

methane commented Dec 5, 2016

For example, KILL query requires SUPER privilege.

I'm sorry, it was wrong.

https://dev.mysql.com/doc/refman/5.7/en/kill.html

If you have the PROCESS privilege, you can see all threads. If you have the SUPER privilege, you can kill all threads and statements. Otherwise, you can see and kill only your own threads and statements.

But I still prefer simple implementation as default.

@kardianos
Copy link

@methane

Not wanting to go too much out on a limb, I asked @bradfitz what he thought on this issue (postgresql will have a similar issue to solve):

I'd prefer them to keep a conn around just for killing.

They seen sql/driver Driver.Open and Conn.Close. So on transition from 0 to 1 open conns: open the KILLer conn also, and don't report success unless you made both. On Conn.Close transitioning to 0 open Conns: also close the KILLer conn.

This count would be held per unique DSN.

If you'd like I could write up a POC for the bookkeeping aspect.

@methane
Copy link
Member

methane commented Dec 6, 2016

I'd prefer them to keep a conn around just for killing.

They seen sql/driver Driver.Open and Conn.Close. So on transition from 0 to 1 open conns: open the > KILLer conn also, and don't report success unless you made both. On Conn.Close transitioning to 0 open Conns: also close the KILLer conn.

Could you give me an URL for this discussion?

At least, this idea can't follow DB.SetConnMaxLifetime().
Long living connection cause a lot of issues: proxies, load balancers, wait_timeout, network configurations.

Even apart from ConnMaxLifetime, I think KILLer connection can't work with TCP loadbalancer like LVM.
In MySQL, multiple slave servers behind load balancer is one of common configuration.

@bradfitz
Copy link

bradfitz commented Dec 6, 2016

Wow, closing the connection isn't enough to cancel a query in MySQL?

and there's no cancel packet at the wire level?

If supporting multiple slave servers behind load balancer is a goal (which is probably a good goal), and there's no way to send a cancel on an existing connection and closing the connection doesn't abort a query... what options are there?

This is kinda sad.

@methane
Copy link
Member

methane commented Dec 6, 2016

Wow, closing the connection isn't enough to cancel a query in MySQL?

and there's no cancel packet at the wire level?

AFAIK, Yes.

what options are there?

Combine closing connection and (MAX_EXECUTION_TIME(N) optimizer hint 1, or max_execution_time system variable 2).

While it's not an ideal, I think it's practical enough for most applications.

@bradfitz
Copy link

bradfitz commented Dec 7, 2016

I think context cancellation is more common than deadlines.

I think you should probably also implement the "KILL nnnnn" mechanism for cancels. I think single MySQL instances are way more popular than load-balanced read-only pools. (And killing a slow SELECT query is much less important)

@methane
Copy link
Member

methane commented Dec 7, 2016

I think context cancellation is more common than deadlines.

(And killing a slow SELECT query is much less important)

Why closing connection is not enough for cancellation?

I meant "Implement cancellation by just closing connection" + "Use MAX_EXECUTION_TIME if killing slow query is important".

@bradfitz
Copy link

bradfitz commented Dec 7, 2016

Why closing connection is not enough for cancellation?

Oh, sorry, I misunderstood. If closing the connection makes MySQL stop doing work, that's great.

You could even use it for the deadlines too (which is to say: ignore deadlines entirely and just wait on ctx.Done()), so there's only one mechanism in use.

@methane
Copy link
Member

methane commented Dec 15, 2016

Oh, sorry, I misunderstood. If closing the connection makes MySQL stop doing work, that's great.

No. MySQL doesn't stop query. But client can continue working without waiting result of query.
It's like cancelling HTTP request.

@bradfitz
Copy link

No. MySQL doesn't stop query.

Oh, okay, I'm back to being sad about MySQL.

But client can continue working without waiting result of query.
It's like cancelling HTTP request.

That's not a great analogy. Servers can choose what to do when a TCP connection finishes. That's why Go has https://golang.org/pkg/net/http/#CloseNotifier so servers can stop expensive work if the client doesn't care anymore.

@methane
Copy link
Member

methane commented Dec 16, 2016

Servers can choose what to do when a TCP connection finishes.

But this is a client. Go's http client can't kill server side Apache CGI process.

It's desirable, but not required feature for clients, isn't it?
While most MySQL client library doesn't support cancelling query using side band connection,
There are so many production application using MySQL.

I'm not against adding an option like kill_on_cancel. But I prefer simple implementation by default.
There are many distributed MySQL setup. There are some servers talking MySQL protocol.
I can't test all them.

@kardianos
Copy link

kardianos commented Dec 16, 2016

You could offer an op-out in the DSN.

No, it not required to actually cancel the connection. For the same reason that go is different than PHP, I think it is a good thing to have.

EDIT: To be clear, there are many apps written in PHP and many uses of database drivers without cancelation. I write in Go because I think I can write better apps in it. I recommend implementing cancel because I think we can write better apps with it.

@ghost ghost mentioned this issue Feb 19, 2017
@kardianos
Copy link

It is my understanding both MySQL and Postgresql require using a different connection to cancel a currently running query, but we don't want to exceed any given connection pool limit.

I propose having some method where the driver connection itself can request another connection from the database pool for such cancelation needs. This would ensure any limit the user set the pool did not exceed. This would also remove any need to match on DSN and store a single open connection somewhere in the driver.

Maybe something like:

package driver

// ConnRequester may be implemented by Conn. It should store the provided
// function until the connection is closed. This is intended to be used when canceling
// queries for databases that require another connection to be opened.
//
// The reqConnSameDSN returns a driver connection using the same DSN and a function
// that returns the connection to the connection pool.
type ConnRequester interface {
    Conn

    ConnRequest(req ReqConnSameDSN)
}

type ReqConnSameDSN func(ctx context.Context) (c Conn, returnConn func(error), err error)

package mysql

func (c *MySqlConn) ConnRequest(req driver.ReqConnSameDSN) {
    c.req = req
}

func (c *MySqlConn)  runCancel(f func(*MySqlConn) error) error {
    if c.req == nil {
        return errNoCancel
    }
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second) // TODO: Get from config.
    defer cancel()

    cancelerConn, returnConn, err := c.req(ctx)
    if err != nil {
        return err
    }
    defer returnConn(err)
    mysqlCancelerConn :=  (*MySqlConn)(cancelerConn)
    err = f(mysqlCancelerConn)
    return err
}

Please bear in mind the above is just a sketch. Do you think something like this would enable a simpler implementation of cancelation?

@kardianos
Copy link

For reference, here is how lib/pq implemented cancelation: https://github.com/lib/pq/pull/535/files .
This driver could probably do something similar.

@abourget
Copy link

Could that behavior be triggered only when you do use Contexts ? Someone using Context clearly wants to be able to shutdown work, and so killing the remote process is something that'll be expected.. especially since other implementations tend to do the same (and it's the culture with contexts to try to kill as much as possible upon cancellation).

@methane
Copy link
Member

methane commented Feb 23, 2017

For reference, here is how lib/pq implemented cancelation: https://github.com/lib/pq/pull/535/files .
This driver could probably do something similar.

Hm, It seems main goroutine waiting reply regardless context.
So, if sub goroutine failed to connect and send cancel request, (e.g. network problem happens), QueryContext(ctx, ...) may not return for a long time after cancellation. (e.g. it would returns 5 minutes later even when timeout is 5 seconds.)
Is it expected behavior of common Go programmer?

On the other hand, it's pros is, additional goroutine is used only when cancellation happens.
Mostly zero overhead for normal case.

@ghost
Copy link

ghost commented Feb 25, 2017

@kardianos Well regardless of any synchronization solution this is clearly a responsibility of database/sql and not the Driver (A Driver provides support for "query execution" not "query execution order" or manging dependency between arbitrary queries) Off the top of my head, I would expect database/sql not to reuse any Connection that has a Context attached to it until the Driver releases it, but I would need to think about some more in the context of transactions..

@kardianos
Copy link

@PublicForest I don't understand what you are saying. The database pool won't re-use a connection until the driver is done with it. You were talking about a race condition where a cancellation is in flight while it is being returned.

@ghost
Copy link

ghost commented Feb 25, 2017

@kardianos I didn't say anything about connection pool. "re-use" as in execute a successor query on the same connection which has an in-flight Cancellation Go Routine for a predecessor query.
Once database/sql hands a Cancellation Context to the Driver it needs to wait for the Driver to tell it when it is allowed to issue future queries on the Connection as apposed to "fire-and-forget".

@kardianos
Copy link

When the driver query function returns, it is assumed the connection may be re-used.

@methane
Copy link
Member

methane commented Feb 25, 2017

When the driver query function returns, it is assumed the connection may be re-used.

After we send "KILL QUERY", we shouldn't reuse the connection.
Otherwise, next query sent by the connection may be cancelled.

MySQL provides connection id (thread id, unique in one server, not unique in cluster),
but not query id. There are no safe way to cancel one query.

@ghost
Copy link

ghost commented Feb 25, 2017

@kardianos So perhaps there is a need for a unique ID (incrementing int64 counter) for each Context that will be stored within the Connection for the duration of the Query as well as in the Cancellation Go Routine so it can check if it needs to discard the cancellation. I would like to see such a counter maintained by database/sql per DB since it provides the Context to begin with and Connections are scoped to DB, but the Driver can also maintain that on a global level.

@kardianos
Copy link

@PublicForest your last few comments haven't made sense to me.

@ghost
Copy link

ghost commented Feb 25, 2017

@kardianos Sorry to hear that.. If you be specific on what you don’t understand I will try to clarify.

@ghost
Copy link

ghost commented Feb 27, 2017

So it turns out that database/sql will run its own Go routine to observe the user supplied Context.
This Go routine will call the Driver’s Rollback() function and return the Connection to pool when the Context times-out (if there is no error). I’m puzzled as to why it assumes the server is ready to process an SQL Rollback statement at that point but more importantly I haven’t noticed any means of coordinating with this Go routine or getting any information as to why the Rollback was issued at the Driver level. Is there a way for the Driver to observe the Context before database/sql ?

@kardianos
Copy link

@PublicForest You aren't reading the database/sql code entirely correct.

@ghost
Copy link

ghost commented Feb 27, 2017

@kardianos Quite possibly, but thats where the debugger breaks when a WithTimout Context expires...Can you please clarify?

@kardianos
Copy link

@PublicForest Ensure you have looked at the tests that use context in database/sql. https://golang.org/src/database/sql/sql_test.go#L285 is a good place to start reading. Know the difference from the docs between a context provided to the BeginTx and QueryContext.

@ghost
Copy link

ghost commented Feb 28, 2017

@kardianos Don’t get it. Upon timeout nothing prevents awaitDone() go routine to make progress and issue a rollback on an already busy connection... There is even no guarantee that the canceling go routine will block on the Done channel while a long-standing query runs, let alone execute the cancellation code before the rollback call. In any case, why is database/sql even attempting to rollback the TX when the context expires? this is something DB servers do implicitly for START/BEGIN TRAN– COMMIT blocks when receiving cancellation request..

@kardianos
Copy link

@PublicForest What is your motivation for commenting on this issue? Do you plan to send a PR to this driver? I'm not seeing any other pull request or code repo on your profile, so I'm slightly confused.

Most of your questions can be answered in the package documentation.

@ghost
Copy link

ghost commented Mar 1, 2017

@kardianos My motivation was to help out in designing a solution for this.

@methane
Copy link
Member

methane commented Mar 1, 2017

@PublicForest Sorry, I can't understand your English.
I can't understand even you're talking about "database/sql" design or "go-sql-driver/mysql" design.

Please use simple and more descriptive sentence. Example code is helpful too.
Otherwise, your comment is confusing rather than helpful.

@smthpickboy
Copy link

Just found a problem in my testing of transactions and context.
My test code is:

// Init db somewhere
ctx, cancel := context.WithTimeout(context.Background(), 1 * time.Second)
defer cancel()
tx, err := db.BeginTx(ctx, nil)
if err != nil {
	t.Fatal(err)
}
defer tx.Rollback()
rows, err := tx.QueryContext(ctx, "SELECT * FROM dev.person WHERE last_name = ? FOR UPDATE", "Doe")
if err != nil {
	t.Fatal(err)
}
defer rows.Close()

people := []Person{}
for rows.Next() {
	p := Person{}
	err = rows.Scan(&p.FirstName, &p.LastName, &p.Email)
	if err != nil {
		t.Fatal("Scan failed:", err)
	}
	people = append(people, p)
}
fmt.Println("people:", people)
err = rows.Err()
if err != nil {
	t.Fatal("rows err:", err)
}
tx.Commit()

The time column is in DATETIME format with parseTime option enabled.

To demo the condition with high server load, I add sleep 3 sec to packets.go#L742
https://github.com/smthpickboy/mysql/blob/demo-panic-with-context/packets.go

The test code above would panic, because the goroutine(in database/sql) which runs tx.awaitDone() just close the mysqlConn which rows uses, causing the panic.

Sorry if this problem is already reported or known issue but I can't find proper solution except I do some invasive modification to both database/sql and go-sql-driver/mysql code.

I guess the rows type and Tx need some sort of synchronization, such as lock driverConn.

Any help would be really appreciated, thanks a lot!

@kardianos
Copy link

Can you show the panic stack?

@smthpickboy
Copy link

@kardianos
Thanks for reply!
The direct cause of panic is that mc is closed and mc.cfg is set to nil in (*textRows) readRow.
I installed go 1.8 using homebrew.

Here's panic stack:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x68 pc=0x11ef98b]

goroutine 5 [running]:
testing.tRunner.func1(0xc420014b60)
	/usr/local/Cellar/go/1.8/libexec/src/testing/testing.go:622 +0x29d
panic(0x1243520, 0x1399b80)
	/usr/local/Cellar/go/1.8/libexec/src/runtime/panic.go:489 +0x2cf
github.com/go-sql-driver/mysql.(*textRows).readRow(0xc4200f8000, 0xc420100080, 0x4, 0x4, 0x4, 0xc420100040)
	/Users/jacky/work/go/src/github.com/go-sql-driver/mysql/packets.go:745 +0x32b
github.com/go-sql-driver/mysql.(*textRows).Next(0xc4200f8000, 0xc420100080, 0x4, 0x4, 0x4, 0x4)
	/Users/jacky/work/go/src/github.com/go-sql-driver/mysql/rows.go:97 +0x5e
database/sql.(*Rows).nextLocked(0xc4200fe000, 0xc420051c20)
	/usr/local/Cellar/go/1.8/libexec/src/database/sql/sql.go:2151 +0x1bc
database/sql.(*Rows).Next.func1()
	/usr/local/Cellar/go/1.8/libexec/src/database/sql/sql.go:2134 +0x3c
database/sql.withLock(0x1385d20, 0xc4200fe030, 0xc420051c60)
	/usr/local/Cellar/go/1.8/libexec/src/database/sql/sql.go:2547 +0x65
database/sql.(*Rows).Next(0xc4200fe000, 0x13bd340)
	/usr/local/Cellar/go/1.8/libexec/src/database/sql/sql.go:2135 +0x83
<my project>/mysql.TestContext(0xc420014b60)
	/Users/jacky/work/go/src/legitlab.letv.cn/LVAR/commons/mysql/mysql_test.go:132 +0x431
testing.tRunner(0xc420014b60, 0x1295338)
	/usr/local/Cellar/go/1.8/libexec/src/testing/testing.go:657 +0x96
created by testing.(*T).Run
	/usr/local/Cellar/go/1.8/libexec/src/testing/testing.go:697 +0x2ca

@bgaifullin
Copy link
Contributor

bgaifullin commented Apr 28, 2017

What about using net.Conn.SetDeadline(time.Now()) in case if need to interrupt current request.
in this case the implementation can be similar to Postgres implementation with only one difference:
in implementation of method cancel:

func (mc *mysqlConn) cancel() {
    mc.netConn.SetDeadline(time.Now())
}

there is still problem with orphan queries on mysql.server, but client will interrupt any query as soon as cancel on context is called.

there is POC https://github.com/bgaifullin/mysql/tree/support_context2

@julienschmidt
Copy link
Member Author

Fixed by #608

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants