-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache subrings returned by Ring.ShuffleShardWithLookback()
#283
Conversation
tl;dr: I think my mental model is slightly different than what code says, but I can't see a problem with the code. I will spend more time with it. I find the image bit confusing. Lookback period indicates that we are looking for recently-registered instances (within lookback period), however image indicates looking "forward". Here is an example:
Looking at the PR code, I don't understand following details:
(After manually stepping through tests, I think it's doing the right thing, it just doesn't quite match my mental model. I need more time to fully understand the code.) I don't think we can reuse cached entries for different lookback periods. For example in my example, let's say there are three requests for subring, all three are done at 13:00 and use the same identifier, but different lookback periods:
All these are valid replies and can be reused (first can be reused "forever", second can be reused until 13:05, third until 10:00+12h = 22:00). I think we should cache all three, and include lookback period (5m, 1h and 12h respectively) in the cache key. |
I think the difference in the way we're thinking about it is that I think about the lookback windows as being "starting at T, where T = now - lookback period L", whereas based on your explanation, you're thinking about them as "ending at now, looking back L". I think these are equivalent here, based on two conditions:
This might be why my diagram was confusing - it was based on how I think about it. The diagram was meant to show looking backward, but emphasised the start of the lookback window as that's the important timestamp for caching here.
This is me trying to prevent an edge case we might not care about, and is probably best explained with an example. Imagine we have an instance that registered at 10:30, and we're operating with a two hour lookback window. We've previously cached the subring for a user that includes this new instance. It's now 12:29:59, and two goroutines, A and B, are both about to request the subring for the same user. Let's say the following happens:
This is probably another symptom of how I'm thinking about the problem: I might need better names here - how about something like
At present, Mimir is the only system that uses With the code as it stands, I believe it'll behave correctly with the example you gave, but the cache won't be particularly effective (it'll cache the first subring, and recompute but not store the second and third). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job! The logic LGTM.
* Bump dskit version to bring in grafana/dskit#283. * Add changelog entry.
Also update httpsnoop to version 1.0.3, which supports Unwrap method as well.
* Exempt 503 (Service Unavailable) from error logging (#97) Part of cortexproject/cortex#810 * Add gRPC streaming middleware. (#95) * Add gRPC streaming middleware. Adds a series of gRPC stream middleware for logging, user header propagation and tracing. This is laying groundwork for using gRPC streaming between ingesters and queriers in Cortex. * Update the vendored versions. * Pin kuberesolver because the API has changed. * Always put code under github.com/weaveworks/common; Dependancies between packages won't work if the code is checked out in github.com/tomwilkie, for instance. Also, common is a very common name for a repo, so its not uncommon for the repo to be called something like weaveworks-common. Signed-off-by: Tom Wilkie <[email protected]> * When trace reporting is enabled, don't enable trace sampling unless asked (#102) * Don't enable trace sampling by default * Don't export new unused member, simplify docstring * Abstract logging for the middleware (#96) * Move dedupe code to dedupe.go * Add unified logging interface. * Adapt middleware/, user/, signals/ and server/ to new logging interfaces. Allow user to specify logging implementation in server, if non is specified default to logrus. As part of this we need log level as server config, and for convenience we expose the server logger. * Update vendoring * Expose the various level types. Signed-off-by: Tom Wilkie <[email protected]> * Don't shutdown the gRPC server the minute a signal is received. (#99) gRPC behaviour should be the same as HTTP: Run() exits when a signal is received, and Stop() stops the handling of requests. And as we no longer call GracefulShutdown in Run, call it in Stop - one should always try to shutdown gracefully. Signed-off-by: Tom Wilkie <[email protected]> * Add user and org ID to span if known (#100) * Propagate tracing context over httpgrpc (#105) * Propagate tracing context over httpgrpc * Log warnings for failed trace inject/extract * Don't use full URL as metric label on unmatched path (#108) Random people on the Internet can send in arbitrary paths, so collapse them all into "other" when reporting metrics. * Raise default server timeouts from 5s to 30s (#109) Primary motivation is to allow a standard Go profiling request to complete, for which I only need the write timeout, but I raised the other 5-second timeouts too since it isn't that unusual to have operations or data uploads take longer. * Return errors when http/grpc servers exit. (#92) * Return errors when http/grpc servers exit. There was a case where the HTTP server died but the process exited. We need to exit when the servers exit. Signed-off-by: Goutham Veeramachaneni <[email protected]> * Expose jaeger metrics (#114) Signed-off-by: Goutham Veeramachaneni <[email protected]> * Improve logging of http errors (#115) * Improve http request logging When logging an http request, e.g. on 500 error, put all the headers and response on one line. Also strip cookies and csrf tokens on security grounds. * Add a test to generate an http error to check formatting * Update httpgrpc to match cortexproject/cortex#910 (#117) * Refactor: extract functions HTTPRequest(), WriteResponse(), WriteError() * Regenerate httpgrpc.pb.go with protoc-gen-gogo from https://github.com/gogo/protobuf Change taken from cortexproject/cortex@f807690 where no explanation was given. * Expose the HTTP server from the server struct. (#118) * Expose the HTTP server from the server struct. This allows users to inject HTTP requests and have the logged, instrumented and traced correctly. * Fix #120: buffer the errChan, as one of the servers could return an error before Run starts listening on errChan. Signed-off-by: Tom Wilkie <[email protected]> * Add HTTP tracing middleware (#119) * Print traceID with request (#123) Signed-off-by: Goutham Veeramachaneni <[email protected]> * Also set global logging when initializing logrus (#122) * Refactor - s/logusEntry/logrusEntry * Also set global logging when initializing logrus This PR sets the global logging instance when calling `logging.Setup(level)`. Previously, only the standard logrus logger was set up and if anyone used the new funcs such as `logging.Info()` it was discarded. * Don't trace http requests when using httpgrpc. (#124) * Don't trace http requests when using httpgrpc. * Demonstrate how the same can be acheived with middleware.Tracer. * Print logrus output in logfmt format (#121) * Print logrus output in logfmt format This changes the way logrous outputs log messages in case there is no TTY attached. Changes from ERRO: 2018/08/23 20:28:10.075952 Sample log message extra=foo to time="2018-08-23T20:28:10Z" level=info msg="Sample log message" extra="foo" which follows the `logfmt` key-value format by also providing `time`, `level`, and `msg` in adition to the extra fields. The PR removes the custom `textFormatter` to have the default `logrus.TextFormatter` in place which takes care of detecting whether a terminal is attached or not and switches between text and logfmt output automatically. It appears the custom formatter was to prevent differences in rendering depending on color/non-color output which I don't necessarily see as an issue. * dep ensure to fix tests * Update more golang/protobuf to gogo/protobuf (#127) * Update 'dep' to 0.5.0 * Update more golang/protobuf to gogo/protobuf * add constraints for downstream users * Label http request handler tracing spans with the matched route name (#126) * Pass options to tracing middleware Signed-off-by: Goutham Veeramachaneni <[email protected]> * Use the route matcher instead Signed-off-by: Goutham Veeramachaneni <[email protected]> * Fix dropped namespace in url parsing (#128) * Update grpc-opentracing to latest version at new home (#129) * go-grpc-middleware has moved (#130) * Update to latest kuberesolver (#131) * Update to latest kuberesolver and use a load balancing policy Signed-off-by: Goutham Veeramachaneni <[email protected]> * Log which ports the server is listening on. (#143) Signed-off-by: Tom Wilkie <[email protected]> * server: add PathPrefix to config for all routes This can be used to serve an API from a prefix (e.g. /prometheus/) without having to re-write every HTTP Handler. * Allow server.Config to be unmarshaled from YAML. Signed-off-by: Tom Wilkie <[email protected]> * Dont flag context.Canceled as an error in tracing Cancellations are always caused by something else, so having them show up as errors in the tracing UI is distracting. * Allow overriding max message sizes and concurrent streams in the server. NB This use the gRPC defaults as the flag defaults. Signed-off-by: Tom Wilkie <[email protected]> * exclude Authorization headers from log messages. * server: use -server.path-prefix for metrics and debug endpoints Previously calling .Subrouter() would overwrite the instrumentation handlers and they'd be ignored when PathPrefix was specified. This meant one could never get back metrics or debug endpoints. * Skip logging context cancelled warning msgs Context cancellations are controlled by clients and logging this message can cause more more confusion as mentioned here: cortexproject/cortex#1279 Signed-off-by: Neeraj Poddar <[email protected]> * Allow log levels to be marshalled to YAML. Test it works. Signed-off-by: Tom Wilkie <[email protected]> * Fix lint warning * Expose RegisterInstrumentation setting Fix #156 Signed-off-by: Xiang Dai <[email protected]> * Update prometheus/client_golang and prometheus/procfs Signed-off-by: Ganesh Vernekar <[email protected]> * Allow user to specify HTTP and gRPC bind addresses (not just ports.) This allows us to use 'localhost' in tests, and prevents an 'Allow connections from...' dialog on MacOS when running unit tests. Signed-off-by: Tom Wilkie <[email protected]> * Change naming from host to address Signed-off-by: Thore Kruess <[email protected]> * Revert changes to server_test.go * Extend tests to check server works with flag defaults (with exception of http port, which we override for the convenience of people running the tests as an ordinary user who can't bind to port 80) * limit the number of connections to server Signed-off-by: Jacob Lisi <[email protected]> * Tell Go runtime we have stopped listening to signals If we exit the handler loop, e.g. after receiving SIGTERM, then we don't want Go to attempt further delivery of signals. * Fix kuberesolver URLs to have three slashes * Add more test cases prompted by review feedback * Trace Websocket requests Update the OpenTracing library to a version which supports Websockets, and remove the workaround for when it didn't. * Update gogo proto to 1.3.0 Signed-off-by: Cyril Tovena <[email protected]> * add flags to configure keep alive configs (#174) * add flags & yaml to configure keep alive configs Signed-off-by: Jacob Lisi <[email protected]> * Fixed gRPC server keepalive flags registration Signed-off-by: Marco Pracucci <[email protected]> * Refactor: extract common error decoding * Re-do protogen for fake_server with current gRPC * Extend server tests with canceled gRPC functions * Extend cancelation checking to include gRPC errors * Extend error instrumentation tests to http * Instrument canceled gRPC calls with a unique label * Return error in configuring jaeger * add log format option * Add tls support to http server Signed-off-by: Annanay <[email protected]> * Use require package in tests, correct TLS fields in server config Signed-off-by: Annanay <[email protected]> * Load certs only if config options are specified Signed-off-by: Annanay <[email protected]> * Add metrics namespace to server Signed-off-by: Annanay <[email protected]> * use same system as level for format * Add tls support to grpc server, add dummy certs Signed-off-by: Annanay <[email protected]> * Addressed @tomwilkie's review comments Signed-off-by: Annanay <[email protected]> * Fix broken tests Signed-off-by: Annanay <[email protected]> * Add TODO note about using copied function Signed-off-by: Annanay <[email protected]> * Use ConfigToTLSConfig util function from node_exporter Signed-off-by: Annanay <[email protected]> * Log errors when write fails, not 200 See discussion here: cortexproject/cortex#2483 (comment) Signed-off-by: Goutham Veeramachaneni <[email protected]> * Address @bboreham's comments Signed-off-by: Annanay <[email protected]> * fix function documentation * Move shebang to first line, add commit hash to link Signed-off-by: Annanay <[email protected]> * Add test and warn only if not context cancelled Signed-off-by: Goutham Veeramachaneni <[email protected]> * Added option to disable signal handling by Server. (#191) * Make signal-handling by Server configurable. * Fixed unrelated lint warnings. * Log X-Forwarded-For for every request Signed-off-by: Michel Hollands <[email protected]> * Address review comments and add Forwarded and X-Real-IP support Signed-off-by: Michel Hollands <[email protected]> * Remove making standard middleware optional Signed-off-by: Michel Hollands <[email protected]> * Remove unused fields as well Signed-off-by: Michel Hollands <[email protected]> * Add option to turn logging of source IPs on Signed-off-by: Michel Hollands <[email protected]> * Add sourceIPs tag in tracing Signed-off-by: Michel Hollands <[email protected]> * Add custom regex for extracting IP from header Signed-off-by: Michel Hollands <[email protected]> * Rename file so it's clearer what it does Signed-off-by: Michel Hollands <[email protected]> * Use better names Signed-off-by: Michel Hollands <[email protected]> * Do no use log prefix in config var name Signed-off-by: Michel Hollands <[email protected]> * Address review comments Signed-off-by: Michel Hollands <[email protected]> * Use same way of access struct Signed-off-by: Michel Hollands <[email protected]> * Address review comments Signed-off-by: Michel Hollands <[email protected]> * Replace interceptor with httpsnoop library. This library supports additional interfaces other than http.Hijacker. Signed-off-by: Peter Štibraný <[email protected]> * Metrics for HTTP and gRPC message sizes and inflight requests. Signed-off-by: Peter Štibraný <[email protected]> * Added metrics test. Signed-off-by: Peter Štibraný <[email protected]> * s/Kuberneretes/Kubernetes/ Signed-off-by: Jack Baldry <[email protected]> * Make HTTP middleware optional (#194) * Make HTTP middleware optional * Allow the Router to be set by caller Signed-off-by: Michel Hollands <[email protected]> * Add exemplar support to middleware Signed-off-by: beorn7 <[email protected]> * tracing: Allow specifying JAEGER_ENDPOINT instead of sampling server or local agent port closes #203 * Expose grpc keepalive enforcement policy options. Signed-off-by: Peter Štibraný <[email protected]> * s/faviourite/preferred Changed the wording to avoid US/UK English concerns. * Set the content-length of httpgrpc request. http.NewRequest can't set it while using a `io.NopCloser`. Signed-off-by: Cyril Tovena <[email protected]> * Replace `ioutil.NopCloser` with a custom one. This way we can easily get direct access to the underlaying `bytes.Buffer`. This means we don't need to copy the data anymore. Signed-off-by: Cyril Tovena <[email protected]> * Fix panic in server i13n matching NotFoundHandler When router can't find the route but it has the NotFoundHandler set, the RouteMatch doesn't have the Route set but it has MatchErr set to ErrNotFound. Added a check for RouteMatch.Route to be not nil before accessing it, and added a special "notfound" route name to be used when NotFoundHandler is used. Signed-off-by: Oleg Zaytsev <[email protected]> * Unindent getRouteName by using early returns Signed-off-by: Oleg Zaytsev <[email protected]> * support custom registerer * Add metric "tcp_connections" for number of TCP connections to server. Counts the number of accepted connections and the number of closed connections, by intercepting the listener for each port (http/grpc). Signed-off-by: Steve Simpson <[email protected]> * add tcpv4 support * tracing: add options to NewFromEnv Signed-off-by: Dave Henderson <[email protected]> * Refactor: move ExtractTraceID functions to tracing package Signed-off-by: Bryan Boreham <[email protected]> * Add exemplars to 'instrument' package Whenever a duration is observed and we have a trace ID, store it as a Prometheus exemplar. Signed-off-by: Bryan Boreham <[email protected]> * refactor: re-use ObserveWithExemplar function Signed-off-by: Bryan Boreham <[email protected]> * Add exemplars to gRPC instrumentation We need to call the tracing interceptor before the metric interceptor so that the latter can extract the trace ID. Signed-off-by: Bryan Boreham <[email protected]> * Allow blank network in server.New There are a lot of tests which create a `server.Config{}` and don't populate the network fields; allow this by setting `DefaultNetwork` if blank. Also pick up a bugfix where `HTTPListenNetwork` was used for gRPC. Signed-off-by: Bryan Boreham <[email protected]> * gRPC stream middleware: move metrics after tracing So that we can pass the trace ID as an exemplar. Signed-off-by: Bryan Boreham <[email protected]> * Migrating from go-kit/kit/log to the slimmer go-kit/log Signed-off-by: Danny Kopping <[email protected]> * Add benchmark for Debugf() Extract function to wrap with level, caller, timestamp. Signed-off-by: Bryan Boreham <[email protected]> * Filter by log level before anything else This saves a lot of work walking the stack to find the caller, and a bit more to format the timestamp. Recommended at go-kit/log#14 (comment) Signed-off-by: Bryan Boreham <[email protected]> * Avoid Sprintf when log line is filtered out Defer the Sprintf until after the level check, by using an auxilliary struct Signed-off-by: Bryan Boreham <[email protected]> * Revert "logging: Add ability to deduplicate log messages (#48)" This reverts commit 0aecff0. It has never de-duplicated a single line for us in production. * fix master build - run `go mod tidy` - change logging test to use `go-kit/log` * replace node_exporter https package with exporter-toolkit The node_exporter/https package has been moved to exporter-toolkit as of v1.1.0. weaveworks/common was still using this package, which was preventing downstream importers (i.e., grafana/agent) from updating their dependency on node_exporter. This change also required bumping kuberesolver to v2.4.0 (the next version after v2.1.0) which uses a newer version of the gRPC library where type names have changed. kuberesolver diff: sercand/kuberesolver@v2.1.0...v2.4.0 * Added tcp_connections_limit metric Signed-off-by: Marco Pracucci <[email protected]> * Add a tag with the user-agent to traces This is useful to understand which kind of client is behaving in a certain way. Signed-off-by: Christian Simon <[email protected]> * undo whitespace changes to server/server_test.go * Merged both tag setters into a single MWSpanObserver Signed-off-by: Christian Simon <[email protected]> * middleware: Add gRPC instrumentation utilities from dskit Signed-off-by: Arve Knudsen <[email protected]> * middleware: Rename client instrumentation functions Signed-off-by: Arve Knudsen <[email protected]> * middleware: Move client instrumentation functions to grpc_instrumentation.go Signed-off-by: Arve Knudsen <[email protected]> * middleware: Slight cleanup Signed-off-by: Arve Knudsen <[email protected]> * add flag to log requests at info level * fix comment for LogRequestsAtInfoLevel Co-authored-by: Victor Cinaglia <[email protected]> * add yaml tag & add LogRequestsAtInfoLevel to RegisterFlags * add method to set log level for logging with request * Add user and org labels to observed exemplars Exemplars provide information about traces, and traces usually reference to the user that served the request, however, searching the user in the trace is usually a tedious task, and it would be nice to be able to overwiew whether the slow requests (exemplars) from your panel corresponds to the same user. Signed-off-by: Oleg Zaytsev <[email protected]> * Revert "add method to set log level for logging with request" This reverts commit 3422262. * call logWithRequest once * middleware: Simplify as suggested by @pracucci Signed-off-by: Arve Knudsen <[email protected]> * fix typo in registering LogRequestAtInfoLevel flag * Removed debug log on successfuly gRPC requests from GRPCServerLog Signed-off-by: Marco Pracucci <[email protected]> * Make debug logging on successful gRPC requests from GRPCServerLog optional * Update server/server.go Co-authored-by: Marco Pracucci <[email protected]> * Apply changes according to code review * server: Remove advanced TLS config parameters Remove advanced TLS config parameters stemming from github.com/prometheus/exporter-toolkit/web, that were introduced in commit 56b8fa7. Motivation for their removal being that users would most likely not want to change them, and they add corresponding configuration parameters to the Grafana Mimir project, that we don't want. We also think they're not interesting to the Grafana Tempo and Loki projects. Signed-off-by: Arve Knudsen <[email protected]> * Revert "Add user and org labels to observed exemplars" * server: Expose `http` and `grpc` listen addresses. The main rationale is for "choosing" random unbinded port for testing. It's one of the Go's idiom to pass port number `0` to enforce Go's net's package to bind to some random port that is free. ``` func main() { httpListener, err := net.Listen("tcp", "0.0.0.0:0") if err != nil { panic(err) } grpcListener, err := net.Listen("tcp", "0.0.0.0:0") if err != nil { panic(err) } h := http.Server{} go func() { log.Println("http serving at", httpListener.Addr()) h.Serve(httpListener) }() g := grpc.Server{} go func() { log.Println("grpc serving at", grpcListener.Addr()) g.Serve(grpcListener) }() time.Sleep(20 * time.Second) } ``` And that will bind random "unassigned" port. ``` 2022/08/08 16:51:49 grpc serving at [::]:35359 2022/08/08 16:51:49 http serving at [::]:44671 ``` The problem is, currently there is no way to know what those ports are. This PR exposes those addresses so that, we can test server easily and pass on the listen addresses to all it's clients. Signed-off-by: Kaviraj <[email protected]> * Bugfix server tcp metrics to use the configured registry or the default Prometheus registry Server TCP related metrics tcp_connections and tcp_connections_limit are currently always registered with the default prometheus registry: https://github.com/prometheus/client_golang/blob/main/prometheus/registry.go#L57 As a result, it can cause an attempt to register duplicate metrics collector failing with a panic. This commit updates these metrics to use the configured Registry, if any, otherwise, the default Prometheus Registry. log and gatherer initializations were also moved for consistency. * Fix incorrect import of obsolete github.com/go-kit/kit/log package Signed-off-by: Dave Henderson <[email protected]> * httpgrpc/server: Update NewClient to not use WithBalancerName On projects consuming this module as a library, gRPC might be set at newer versions where the deprecated #WithBalancerName has been removed already. Given that the version used by this module already offers a forward-compatible method, this commit uses that instead of the deprecated #WithBalancerName. This is a simplified version of #240, without touching the gRPC versions. Fixes #239 Signed-off-by: Juraci Paixão Kröhling <[email protected]> * Fix 'make protos' Use a 3rd-party image `namely/protoc:1.22_1`, which generates identical output for two of the protos in this repo. The third one, httpgrpc, I have not managed to match exactly, but the result is very close. Signed-off-by: Bryan Boreham <[email protected]> * Implement `http.Flusher` interface on Log middleware (#257) This adds `Flush()` to the `middleware.badResponseLoggingWriter`, making it implement `http.Flusher` if the wrapped `http.ResponseWriter` does. * Allow config of TLS cipher suites and min version (#256) * Server: allow config of TLS cipher suites and min version Add a single parameter for each, not split across HTTP and gRPC. Required change upstream - prometheus/exporter-toolkit#110. Downstream projects rely on CLI parameters to generate docstrings, so we add `--server.tls-cipher-suites` and `--server.tls-min-version`. Both CLI and yaml require comma-separated lists of cipher suites, which is different to the yaml array format supported by prometheus/exporter-toolkit. The names accepted are from Go, listed here: https://pkg.go.dev/crypto/tls#pkg-constants Signed-off-by: Bryan Boreham <[email protected]> * Tweak TLS server option help text Signed-off-by: Bryan Boreham <[email protected]> * Reproducable httpgrpc.pb.go build Use gogo protobuf options explicitly to generate the same Equal, String, GoString etc functions in the protobug implementation. Add tools.go to download the gogo.proto dependency. Add vendor directory as include for protoc to load the dependency. Update protoc version otherwise there is a diff in the generated files, most notably const _ = proto.GoGoProtoPackageIsVersion3 is replaced with const _ = proto.GoGoProtoPackageIsVersion2 Signed-off-by: György Krajcsovits <[email protected]> * Make gogo.proto import vendoring agnostic prometheus/alertmanager imports gogo.proto as gogoproto/gogo.proto This is because it does not use vendoring and needs to look up gogoproto in the module cache, which results in a path like /home/user/go/pkg/mod/github.com/gogo/[email protected] To avoid having @v1.3.2 in the proto definition it then sets this as an include path and import only gogoproto/gogo.proto. Signed-off-by: György Krajcsovits <[email protected]> * Updated prometheus/exporter-toolkit to v0.8.1 * Fixed tests * Use ChainInterceptor from upstream grpc And eliminate dependency on `grpc-ecosystem/go-grpc-middleware`. Functionality is identical; the feature was added upstream in v1.21 for client side and v1.28 for server side. Signed-off-by: Bryan Boreham <[email protected]> * Small testcode lint fixes (#269) * Use built in string conversion of recoder.Body * Defer close after error check VS Code complains otherwise. Signed-off-by: György Krajcsovits <[email protected]> * server test: don't compare error for exact equality DeepEqual is brittle against changes in implementation. Signed-off-by: Bryan Boreham <[email protected]> * server: remove unused fields from Client struct Signed-off-by: Bryan Boreham <[email protected]> * replace deprecated 'grpc.WithInsecure' Signed-off-by: Bryan Boreham <[email protected]> * middleware: fix lint complaint about buf.String Signed-off-by: Bryan Boreham <[email protected]> * miscelaneous lint fixes Checking errors, etc. Signed-off-by: Bryan Boreham <[email protected]> * server: check error before closing connection lint was complaining Signed-off-by: Bryan Boreham <[email protected]> * Log middleware now accepts a list of headers to exclude Previously the log middleware would exclude printing only the headers "Cookie", "Authorization" and "X-Csrf-Token". I believe the users of this library may want to ommit more headers than those. To do so I've changed how the middleware is initialize to optionally add more headers to the list. Also I've added a unit test to make sure the behavior works as expected. Also I've added an optional configuration parameter to the server to set LogRequestHeaders from there. This parameter was already an option in the log middleware but was not exposed in the server. Signed-off-by: Jesus Vazquez <[email protected]> * Enable native histograms for the request_duration_seconds histogram This is an opt-in via setting Config.MetricsNativeHistogramFactor so that nobody will expose native histograms by accident. Even if native histograms are enabled, the conventional histograms are still maintained and presented as before to scrapers incapabale of scraping native histograms. This commit also includes an update of prometheus/client_golang to v1.14.0 as that is the earliest release supporting native histograms. Native histograms were tested previously in the sparse-histograms branch without any problems. This commit obsoletes the sparse-histograms branch. The histograms request_message_bytes and response_message_bytes are left as conventional histograms for now. Maybe they would work well as native histograms with a bucket factor of 2, as high resolution is not the aim here. But we can decide about that once we have gathered more experience with the native histograms for request_duration_seconds. The settings for limiting the bucket count (i.e. NativeHistogramMaxBucketNumber = 100 and NativeHistogramMinResetDuration = 1h) are both conservative and most likely good enough for the expected use cases. We can make them configurable later, if users feel the need. The idea here is to not add too many knobs prematurely and thereby avoid confusion. Signed-off-by: beorn7 <[email protected]> * update grpc * Add Unwrap method to http.ResponseWriter implementations. (#283) Also update httpsnoop to version 1.0.3, which supports Unwrap method as well. * Add description to command line flags (#287) Add description to log-request command line flags * Add support to route both GRPC and HTTP over the HTTP server (#288) * add support for grpc on the http port Signed-off-by: Joe Elliott <[email protected]> * Make DisableRequestSuccessLog configurable. (#284) - NewLogMiddleware is a public method. Adding a new parameter would make this PR a breaking change one. - However, the behavior is the same: whatever is configured for the existing DisableRequestSuccessLog, it will be used by the log middleware. - Test option to not log successful requests. * Allow users to manually register metrics * Clean up some linter warnings * grpc: use errors.Is to check if error is Canceled This improves behaviour when one error wraps another. Signed-off-by: Bryan Boreham <[email protected]> * Make gRPC logging optional via a custom interface (#299) * middleware: add OptionalLogging interface with method ShouldLog E.g. if the error is caused by overload, then we don't want to log it because that uses more resource. * Add test for gRPC logging, patterned after the one for http logging. Signed-off-by: Bryan Boreham <[email protected]> * Add provenance and license comments to files migrated from weaveworks/common, update package names * Update imports to match new package paths * Fix linting issues. * Fix package paths in protobuf descriptors and regenerate with gogo/protobuf * Fix linting issues in migrated code. * Remove dependency on deprecated package * Replace use of `golang.org/x/net/context` with `context` * Replace use of `prometheus` package with `promauto` where possible * Add changelog entry. --------- Signed-off-by: Tom Wilkie <[email protected]> Signed-off-by: Goutham Veeramachaneni <[email protected]> Signed-off-by: Neeraj Poddar <[email protected]> Signed-off-by: Xiang Dai <[email protected]> Signed-off-by: Ganesh Vernekar <[email protected]> Signed-off-by: Thore Kruess <[email protected]> Signed-off-by: Jacob Lisi <[email protected]> Signed-off-by: Cyril Tovena <[email protected]> Signed-off-by: Marco Pracucci <[email protected]> Signed-off-by: Annanay <[email protected]> Signed-off-by: Michel Hollands <[email protected]> Signed-off-by: Peter Štibraný <[email protected]> Signed-off-by: Jack Baldry <[email protected]> Signed-off-by: beorn7 <[email protected]> Signed-off-by: Oleg Zaytsev <[email protected]> Signed-off-by: Steve Simpson <[email protected]> Signed-off-by: Dave Henderson <[email protected]> Signed-off-by: Bryan Boreham <[email protected]> Signed-off-by: Danny Kopping <[email protected]> Signed-off-by: Christian Simon <[email protected]> Signed-off-by: Arve Knudsen <[email protected]> Signed-off-by: Juraci Paixão Kröhling <[email protected]> Signed-off-by: György Krajcsovits <[email protected]> Signed-off-by: Jesus Vazquez <[email protected]> Signed-off-by: Joe Elliott <[email protected]> Co-authored-by: Julius Volz <[email protected]> Co-authored-by: Tom Wilkie <[email protected]> Co-authored-by: Marcus Cobden <[email protected]> Co-authored-by: Bryan Boreham <[email protected]> Co-authored-by: Goutham Veeramachaneni <[email protected]> Co-authored-by: Roland Schilter <[email protected]> Co-authored-by: Adam Shannon <[email protected]> Co-authored-by: Tom Wilkie <[email protected]> Co-authored-by: Anthony Woods <[email protected]> Co-authored-by: Neeraj Poddar <[email protected]> Co-authored-by: Bryan Boreham <[email protected]> Co-authored-by: Xiang Dai <[email protected]> Co-authored-by: Ganesh Vernekar <[email protected]> Co-authored-by: Thore <[email protected]> Co-authored-by: Jacob Lisi <[email protected]> Co-authored-by: Cyril Tovena <[email protected]> Co-authored-by: Marco Pracucci <[email protected]> Co-authored-by: Aditya C S <[email protected]> Co-authored-by: Sean Liao <[email protected]> Co-authored-by: Annanay <[email protected]> Co-authored-by: Goutham Veeramachaneni <[email protected]> Co-authored-by: Roli Schilter <[email protected]> Co-authored-by: Peter Štibraný <[email protected]> Co-authored-by: Michel Hollands <[email protected]> Co-authored-by: Peter Štibraný <[email protected]> Co-authored-by: Jack Baldry <[email protected]> Co-authored-by: Daniel Holbach <[email protected]> Co-authored-by: Michel Hollands <[email protected]> Co-authored-by: beorn7 <[email protected]> Co-authored-by: Chance Zibolski <[email protected]> Co-authored-by: Oleg Zaytsev <[email protected]> Co-authored-by: Robert Fratto <[email protected]> Co-authored-by: Steve Simpson <[email protected]> Co-authored-by: 3Xpl0it3r <[email protected]> Co-authored-by: Dave Henderson <[email protected]> Co-authored-by: Danny Kopping <[email protected]> Co-authored-by: Christian Simon <[email protected]> Co-authored-by: Arve Knudsen <[email protected]> Co-authored-by: colin-stuart <[email protected]> Co-authored-by: Victor Cinaglia <[email protected]> Co-authored-by: Jeanette Tan <[email protected]> Co-authored-by: zenador <[email protected]> Co-authored-by: Kaviraj <[email protected]> Co-authored-by: Susana Ferreira <[email protected]> Co-authored-by: Juraci Paixão Kröhling <[email protected]> Co-authored-by: Stefan Prodan <[email protected]> Co-authored-by: Justin Lei <[email protected]> Co-authored-by: György Krajcsovits <[email protected]> Co-authored-by: Sebastian Rabenhorst <[email protected]> Co-authored-by: George Krajcsovits <[email protected]> Co-authored-by: Jesus Vazquez <[email protected]> Co-authored-by: Alan Protasio <[email protected]> Co-authored-by: Joe Elliott <[email protected]> Co-authored-by: Dylan Guedes <[email protected]> Co-authored-by: Piotr Gwizdala <[email protected]>
What this PR does:
This PR adds caching for
Ring.ShuffleShardWithLookback()
.ShuffleShardWithLookback
is a potentially expensive operation (see benchmarks in #281), and the results are cacheable.As far as I can tell, the only system that uses$T_1$ , then subsequent requests for $T_1$ .
ShuffleShardWithLookback
is Mimir - I couldn't find any references in Loki, Tempo or Phlare. So I've implemented a cache that suits Mimir's access pattern, namely that we only request subrings with lookback windows later than previous lookback windows. For example, if we request a subring foruser-1
with lookback window starting atuser-1
will have a lookback window starting at or afterShuffleShardWithLookback
will still return correct results if this access pattern is not followed, but the cache will not be effective.Cached subrings are returned if no members of the subring registered between the beginning of the cached lookback window and the beginning of the requested lookback window. This is best illustrated visually:
Explanation:
(Please check my logic carefully here: I'm not super familiar with the ring implementation and may have missed some edge cases.)
The cache implementation follows a similar pattern to the caching implemented for
Ring.ShuffleShard()
, and all cached subrings are discarded when the ring topology changes.Which issue(s) this PR fixes:
(none)
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]