-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: Log GrcpReadJournal connection failure details at debug #1231
Conversation
// Users still want to be able to figure out non-transient errors, so log with full exception details at debug | ||
val port = clientSettings.servicePortName.getOrElse(clientSettings.defaultPort.toString) | ||
if (log.isDebugEnabled) | ||
log.debug(s"Connection to ${clientSettings.serviceName}:$port for stream id $streamId failed or lost", ex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this enough? We typically don't have debug enabled. Could we add more details to the ConnectionException
instead, which I assumed will be logged at an outer layer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We initially had that as a normal exception, but then losing connections is a pretty normal problem, and since it reconnects it's usually not a problem.
Full stack traces becomes a large amount of noise in the logs in a lot of scenarios, so we made the exception text short and skipped the stack trace back in #980
Best would be if we could make the exception in a scenario where it is repeatedly retry-failing for a number of times, but there is no way to observe that, the exception is thrown here but the retry is done by a restart source elsewhere. I can't think of a good way to do this other than possibly a change in how the restart source logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I agree about the stack trace, but can't we make the exception message contain some more information for this scenario?
Or do you mean that the interesting stuff is inside the root cause and we can't know the interesting scenario and it would be too much to include the message of the root cause?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect the specific TLS exception details, if present, are multiple causes down in a stack trace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, then this is better than nothing, let's merge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
References https://github.com/lightbend/kalix-runtime/issues/2916