-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/util/log: introduce fluent.sink.conn.errors
metric to log package
#106607
pkg/util/log: introduce fluent.sink.conn.errors
metric to log package
#106607
Conversation
4eb0a10
to
0e59caf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main feedback concerns the structure of the new package and the ownership of the registry (1st and 2nd commit). The rest checks out.
Reviewed 5 of 5 files at r1, 12 of 12 files at r2, 5 of 5 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)
pkg/cli/demo.go
line 235 at r3 (raw file):
// decide how to enable logging. serverCfg.Stores.Specs = nil logMetrics := metric.NewLogMetricsRegistry()
I believe this change should be moved form the 3rd commit to the 2nd commit.
pkg/server/tenant.go
line 651 at r3 (raw file):
lr, ok := r.(*metric.Registry) if !ok { panic(fmt.Sprintf("unexpected type for LogMetrics registry: %T", r))
panic(errors.AssertionFailedf(...))
pkg/util/log/flags.go
line 93 at r2 (raw file):
// // The returned logShutdownFn can be used to gracefully shut down logging facilities. func ApplyConfig(config logconfig.Config, metrics LogMetrics) (logShutdownFn func(), err error) {
I don't love this change.
It seems to me that the (new) logmetrics
package should instantiate the registry in a init
function as a singleton, inject it into the log package only once (with a specific function just for that purpose).
Then the cli/start
code would retrieve a reference to that singleton for passing to the server initialization.
(I am especially concerned by CLI unit tests that call start
repeatedly. I don't think we want each iteration of the unit test to create a new metric registry.)
pkg/util/metric/log_metrics.go
line 11 at r1 (raw file):
// licenses/APL.txt. package metric
my preference would be to have the log metrics defined in a sub-package of log
, e.g. util/log/logmetrics
.
This way someone who is exploring the code around logging gets a "full picture" from the sub-directory tree.
pkg/util/metric/log_metrics.go
line 68 at r1 (raw file):
l.mu.Lock() defer l.mu.Unlock() if counter, ok := l.mu.counters[metric]; ok {
It seems to me that if the metric does not exist, the program should crash -- it would be an implementation bug.
Maybe a panic? panic(errors.AssertionFailedf(...))
0e59caf
to
ea63366
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback, it was very helpful. RFAL!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @knz)
pkg/server/tenant.go
line 651 at r3 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
panic(errors.AssertionFailedf(...))
Ack, but no longer needed as we don't need to worry about this case anymore 😌
pkg/cli/demo.go
line 235 at r3 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
I believe this change should be moved form the 3rd commit to the 2nd commit.
Ack - no longer relevant as this entire init process has been greatly simplified thanks to your other comment 🙂
pkg/util/log/flags.go
line 93 at r2 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
I don't love this change.
It seems to me that the (new)
logmetrics
package should instantiate the registry in ainit
function as a singleton, inject it into the log package only once (with a specific function just for that purpose).Then the
cli/start
code would retrieve a reference to that singleton for passing to the server initialization.(I am especially concerned by CLI unit tests that call
start
repeatedly. I don't think we want each iteration of the unit test to create a new metric registry.)
Thanks for this advice, I somehow felt the entire thing was still too brittle.
Using an init
function made everything much simpler & we no longer have to worry about nil
log metrics.
I ended up adding an init
to the MetricsRecorder
file for this, since the new logmetrics package isn't imported many places otherwise. LMK if you think there's a better place for this.
pkg/util/metric/log_metrics.go
line 11 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
my preference would be to have the log metrics defined in a sub-package of
log
, e.g.util/log/logmetrics
.
This way someone who is exploring the code around logging gets a "full picture" from the sub-directory tree.
Done.
pkg/util/metric/log_metrics.go
line 68 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
It seems to me that if the metric does not exist, the program should crash -- it would be an implementation bug.
Maybe a panic?panic(errors.AssertionFailedf(...))
Agreed & Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 23 files at r5, 2 of 16 files at r6, 4 of 17 files at r7, 14 of 14 files at r8, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)
pkg/server/server.go
line 1844 at r8 (raw file):
// Grab the log metrics registry. r := log.LogMetricsRegistry()
nit: you could retrieve the registry directly with the right type (*metric.Registry
) if you put the accessor in package logmetrics
instead.
pkg/server/tenant.go
line 646 at r8 (raw file):
// Grab the log metrics registry. r := log.LogMetricsRegistry()
ditto
pkg/server/status/recorder.go
line 134 at r8 (raw file):
nodeRegistry *metric.Registry // logRegistry contains the global metrics registry used by the logging // package. NB: code paths exist where this registry is never initialized,
nit: pls provide an example of such code path.
pkg/server/status/recorder.go
line 159 at r8 (raw file):
// The logmetrics package is not imported by much, meaning // that its own init function is not a reliable place to initialize // log metrics. Being that the recorder is the primary user of
I disagree. you can use
import (
_ ".../util/log/logmetrics"
)
at the top of this file, and you'd get the right behavior.
pkg/server/status/recorder.go
line 254 at r8 (raw file):
nodeReg.AddLabel("node_id", strconv.Itoa(int(desc.NodeID))) if !mr.hasLabel(logReg.GetLabels(), "node_id") { logReg.AddLabel("node_id", strconv.Itoa(int(desc.NodeID)))
This does not work for me. The log metric registry should be read-only.
If we have multiple in-memory servers running one after the other (e.g. think about multiple go tests in one package), the logic here would make the first one "win" the right to get its node_id
in there. The others that then access the log package afterwards would not have "their" node_id in there and the person looking at the log output would be utterly confused
Is there a way to have the counters in the singleton in logmetrics
, and the labels in an object that's different in each serveR?
pkg/util/log/clog.go
line 115 at r8 (raw file):
} // LogMetricsRegistry returns the metrics registry used by
As explained above, you can put this accessor in logmetrics
. The Set function below can remain here.
ea63366
to
9a8bf20
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, great feedback. I've made some changes, PTAL!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @knz)
pkg/server/server.go
line 1844 at r8 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
nit: you could retrieve the registry directly with the right type (
*metric.Registry
) if you put the accessor in packagelogmetrics
instead.
Thanks for the tip, this worked well and cleaned up the interface. Done.
pkg/server/tenant.go
line 646 at r8 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
ditto
Done.
pkg/server/status/recorder.go
line 134 at r8 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
nit: pls provide an example of such code path.
Ah, this was leftovers that I missed/forgot to remove. No such case exists anymore, since the recorder imports pkg/util/log/logmetrics
, which ensures initialization.
Removed.
pkg/server/status/recorder.go
line 159 at r8 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
I disagree. you can use
import ( _ ".../util/log/logmetrics" )at the top of this file, and you'd get the right behavior.
Thanks for this, it worked nicely. Done.
pkg/server/status/recorder.go
line 254 at r8 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
This does not work for me. The log metric registry should be read-only.
If we have multiple in-memory servers running one after the other (e.g. think about multiple go tests in one package), the logic here would make the first one "win" the right to get its
node_id
in there. The others that then access the log package afterwards would not have "their" node_id in there and the person looking at the log output would be utterly confusedIs there a way to have the counters in the singleton in
logmetrics
, and the labels in an object that's different in each serveR?
Good point, thank you for bringing this up.
What we can do is export a function from the package, logmetrics.NewRegistry()
, that adds the same singleton counters to a fresh registry per-server. This way, each server can add its own labels without anything colliding.
I've implemented this and found it worked nicely during testing. LMK your thoughts.
pkg/util/log/clog.go
line 115 at r8 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
As explained above, you can put this accessor in
logmetrics
. The Set function below can remain here.
Done, great tip! Thanks.
8aadbf0
to
d3e490a
Compare
43ee2f8
to
4be4af6
Compare
Logging is a critical subsystem within CRDB, but as things are today, we have very little observability into logging itself. For starters, we have no metrics in the logging package at all! This makes it difficult to observe things within the log package. For example, if a fluent-server log sink fails to connect to FluentBit, how can we tell? We get some STDOUT message, but that's about it. It's time to get some metrics into the log package. Doing so is a bit of a tricky dance, because pretty much every package in CRDB imports the logging package, meaning you almost always get a circular dependency when trying to make use of any library within pkg/util/log. Therefore, we must inject metrics functionality into the logging package. This patch provides a means of doing so. Within the log package, we add a new interface called `LogMetrics` with functions that enable its users to modify metrics. The implementation of the interface can live elsewhere, where circular dependencies aren't such a pain. We can then inject the implementation into the log package. This patch also provides a basic implementation to be used. Future patches will plumb the implementation into the log package and actually modify the supported metrics. Release note: none
This patch plumbs the LogMetricRegistry into the log package as part of the initialization of the package where the implementation lives. With the log metrics plumbed into the log package and made available in the server, we can now add them to the metrics recorder used by the server, as well as begin using metrics within the log package. Release note: none
This patch adds the necessary plumbing to inject the metrics used by the logging package into the MetricsRecorders used by servers in CockroachDB. With this plumbing complete, we're ready to begin using metrics within the log package. Release note: none
Previous patches introduced the infrastructure necessary to record metrics within the log package without introducing circular dependencies. With all that plumbing done, we can now increment the metric representing fluentbit connection errors whenever a fluent-server log sink fails to establish a connection. Release note (ops change): This patch introduces the counter metric `fluent.sink.conn.errors` to the CockroachDB tsdb, which is exposed to `/_status/vars` clients as `fluent_sink_conn_errors`. The metric is incremented whenever a `fluent-server` log sink fails to establish a connection to the log sink pointed to by the `address` for the sink in the provided log config.
4be4af6
to
abd9c99
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 14 files at r9, 2 of 3 files at r10, 5 of 5 files at r11, 1 of 1 files at r12, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @dhartunian)
TFTR! bors r=knz |
Build failed (retrying...): |
Build failed (retrying...): |
Build succeeded: |
Logging is a critical subsystem within CRDB, but as things
are today, we have very little observability into logging
itself. For starters, we have no metrics in the logging
package at all!
This makes it difficult to observe things within the log
package. For example, if a fluent-server log sink fails
to connect to FluentBit, how can we tell? We get some STDOUT
message, but that's about it.
It's time to get some metrics into the log package.
Doing so is a bit of a tricky dance, because pretty much every
package in CRDB imports the logging package, meaning you
almost always get a circular dependency when trying to make
use of any library within pkg/util/log. Therefore, this PR
injects metrics functionality into the logging package.
It does so via a new interface called
LogMetrics
withfunctions that enable its users to modify metrics.
The implementation of the interface can live elsewhere,
such as the metrics package itself, whe circular dependencies
aren't such a pain. We can then inject the implementation
into the log package.
With all that plumbing done, the PR also introduces a new metric,
fluent.sink.conn.errors
representing fluentbit connection errorswhenever a fluent-server log sink fails to establish a connection.
We can see the metric represented below in a test, where no
![Screenshot 2023-07-11 at 1 32 57 PM](https://private-user-images.githubusercontent.com/8194877/252747651-ccacf84e-e585-4a76-98af-ed145629f9ef.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyOTQ3MTIsIm5iZiI6MTczOTI5NDQxMiwicGF0aCI6Ii84MTk0ODc3LzI1Mjc0NzY1MS1jY2FjZjg0ZS1lNTg1LTRhNzYtOThhZi1lZDE0NTYyOWY5ZWYucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxMSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTFUMTcyMDEyWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZWEwYjZkYjQyM2VlYmMxMmY3ZjUyMTc1OTkyNzA4ZGZmNDVlNWM1Y2JkNDZiNzVlNWY3ODM3ZTFhNDY3YjI2NyZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.Ba6pA6qgyOcivlmRTgA9Xy_J4PDM6Am1nr8LoobRSAk)
fluent-server was running for a moment, before starting it.
Note that the connection errors level off once the server was
started:
Release note (ops change): This patch introduces the counter
metric
fluent.sink.conn.errors
to the CockroachDB tsdb,which is exposed to
/_status/vars
clients asfluent_sink_conn_errors
. The metric is incremented whenevera
fluent-server
log sink fails to establish a connection tothe log sink pointed to by the
address
for the sink in theprovided log config.
Epic: CC-9681
Addresses: #102753