-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bad connection errors are incomplete and do not provide context of the instance being connected to #1385
Comments
"talk to thousands of MySQL" is not typical use case of most user of this driver. I won't break backward compatibility only for such rare use case. By the way, we print actual low level error behind |
"talking to thousands of MySQL servers" may not be a typical use case, but you provide the MySQL golang driver, so while I do not expect special handling of my specific use case I do believe the current driver's error handling in this case could be improved. Which "backward compatibility" changes do you think this makes? A colleague said:
So my concern right now is still the same:
e.g. A master dies (this happens and is expected) and so we need to choose the most up to date replacement, and then relocate the other replicas under the new master. Often we see the relocation request to a server we'd seen as being healthy < 5 seconds previously fails. A new connection is opened and we get a "driver: bad connection" error. That's all I see and I don't know why the connection failed. That's a real concern. Of course if I log into the server it's working fine. I can certainly try |
If stderr is not good for you, you can use SetLogger. We can not support 100% use case. In general, it's sysadmin job that reading logs and investigate errors. Applications don't need precise information. We do not know about orchestrator. We can not fix your errors. We do not know what erorr/log is needed to fix your errors. Please vendor & patch (or fork) and add anything you need in it. |
Thank you for teaching this. We support Go 1.13 currently. But don't wait we drop Go 1.13-1.17 support. Note that most ErrBadConn errors are consumed by database/sql retry loop. They are not returned to caller unless it happened in the last retry. So checking log is much better way than trying to retrieve all debug information from error. |
Hi @methane, I will look to see at least if for now forking the driver to get a better insight into what is actually happen will give us hints on the problems we are seeing. Yet partially I find it somewhat disconcerting that the error:
In both cases from what I can see there may be multiple "flavours" of this. MySQL natively returns an indication of the MySQL error but often part of the error response provides something that comes from the OS:
So you get some information from the MySQL perspective but it might be that the MySQL server gives you more details which come from the OS itself. Perhaps that additional information might be useful when the I see that the code returns Either way the error in |
A further thought here is that if there is an error of this type the upper layer retries the statement and the application may be completely unaware there was a problem. This is partly good as the app sees no problem, but it is also unaware the connection may be flaky. If an error is reported by database/sql it will be on the retried attempt so the original error will be lost. Is that right? Even worse. It might be nice to have such metrics/statistics available somewhere so the application can detect or determine such issues. If you have any sort of long running process or daemon which talks to one or more databases I can see this being interesting, but I would expect this to be perhaps exposed via database/sql or through database/sql. Maybe that is not feasible now. Exposing some metrics from the MySQL driver would be fine. If the statitics provided were only error statics at least the happy path would not be affected performancewise and the overhead of updating error counters should be negligible anyway. Do you have any thoughts on these points? |
Yes. database/sql catches most ErrBadConn and doesn't return it to application.
"print debug" is the best way. |
FWIW: I am not sure this really works. I also checked about golang/go#48309 and this is a feature request to make the retry setting optional (or configurable) which I think is what I would want for normal processing in my case of "testing health of servers I manage/monitor". That still requires you to modify your driver to return different, wrapped errors depending on the specific issues currently seen that generate Note the comment from golang/go#48309 (comment):
Related I also notice that setting So it may be good to mention this option to people as the configured behaviour might not match what's expected. |
What is this project's philosophy around supporting different versions of go? Go itself no longer supports any of the versions you mentioned, even with security patches, because their commitment is "Each major Go release is supported until there are two newer major releases" (ref). When could we expect changes that require the 1.18 interface to be incorporated? Our use case for application awareness of Edit: I see you have this stated in the README already! "We aim to support the 3 latest versions of Go." So, since it's likely go1.20 will be released in 1-2 months that is the earliest it would be reasonable to request any breaking change requiring go1.18. |
I'm concerned there is no progress on this. I think we can probably provide a PR which we're trying internally to resolve the issue and it seems to work fine for us. Reporting of the issue is provided in a better way and also the error is wrapped to ensure that the upstream database/sql layer gets the expected error, but they already handle that. So if you'd like us to provide a PR to inspect and that would make progress I see that as being very valuable. I just need to confirm with colleagues prior to doing that. |
Now logger can be configurable per DB. See #1408 |
Issue description
I manage an application called orchestrator which talks to thousands of MySQL servers and is used to detect MySQL failures and if a master or intermediate master fails to re-arrange the replication topology of the cluster to ensure it can continue to be used. This system has a reasonably short connect timeout of 1 second.
The problem I see is that we get quite a lot of errors when connecting to "the database" prior to doing operations such as reconfiguring the server or reconfiguring from which other MySQL server it should replicate. The failed connect errors are the concern. See: https://jira.percona.com/browse/DISTMYSQL-261 for some context of the issue from the application side.
Orchestrator is currently using the v1.6.0 version of the MySQL driver. I'm aware there's a newer version v1.7.0 but as far as I can see there's no change in logic around the topic being discussed.
The error message itself is "extremely vague":
driver: bad connection
does not indicate the actual problem and the logging does not indicate to which host:port this is actually happening and given the fact I'm continually polling a large number of MySQL servers identifying the source of the problem and the exact cause is actually quite important as orchestrator is intended to determine if the MySQL host is healthy or not, so identifying the reason for a connection failure is important.The bad connection errors are not that frequent but do add up. The problem is it's hard with current code to identify the source of the issue OR the specific issue that is being caused in a single log line.
I see that currently the driver logging combines multiple different conditions under the same umbrella term
errBadConnNoWrite
and in some cases it logs the error independently of the error returned to the caller.It would seem better, given recent changes in go error handling, to extend the
errBadConnNoWrite
errors returned to the caller into separate errors for each condition triggered so that applications can still detect this error witherrors.Is(err, errBadConnNoWrite)
but by wrapping the specific error with the error picked up earlier in the code (within the driver) the full error can be returned to the caller and identified more completely.Ideally I'd like the driver to report in the error something like the
mc.cfg.Addr
value of the host being talked to. If that is not considered acceptable then it would be necessary for the caller to be adapted to record this information for all connections so it can be logged with the error received from the driver when it happens.Summarising: the exact cause of
driver: bad connection
errors is not clearly identified in the error returned to the caller.I believe it would be good to return a more detailed error to clarify each of the different cases where it is returned more explicitly and suggest that users use
errors.Is()
to check forerrBadConnNoWrite
if existing functionality is required. If possible include the address of the host where this happens.If there is a better way to identify the specific errors and the host to which they correspond then please share your thoughts.
Error log
See: https://jira.percona.com/browse/DISTMYSQL-261 for some details/examples
Configuration
Driver version (or git SHA):
*Go version:
quay.io/centos/centos:stream8
docker images now show1.19.2
.*Server version:
*Server OS:
The text was updated successfully, but these errors were encountered: