forked from cortexproject/cortex
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add metrics for remaining planned compactions #3
Open
ac1214
wants to merge
16
commits into
shuffle-sharding-compactor
Choose a base branch
from
add-metrics
base: shuffle-sharding-compactor
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…ortexproject#4262) * add MaxRetries to WaitInstanceState Signed-off-by: Albert <[email protected]> * update CHANGELOG.md Signed-off-by: Albert <[email protected]> * Add timeout for waiting on compactor to become ACTIVE in the ring. Signed-off-by: Albert <[email protected]> * add MaxRetries variable back to WaitInstanceState Signed-off-by: Albert <[email protected]> * Fix linting issues Signed-off-by: Albert <[email protected]> * Remove duplicate entry from changelog Signed-off-by: Albert <[email protected]> * Address PR comments and set timeout to be configurable Signed-off-by: Albert <[email protected]> * Address PR comments and fix tests Signed-off-by: Albert <[email protected]> * Update unit tests Signed-off-by: Albert <[email protected]> * Update changelog and fix linting Signed-off-by: Albert <[email protected]> * Fixed CHANGELOG entry order Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Albert <[email protected]> Co-authored-by: Marco Pracucci <[email protected]>
* MergeIterator: allocate less memory at first We were allocating 24x the number of streams of batches, where each batch holds up to 12 samples. By allowing `c.batches` to reallocate when needed, we avoid the need to pre-allocate enough memory for all possible scenarios. * chunk_test: fix innacurate end time on chunks The `through` time is supposed to be the last time in the chunk, and having it one step higher was throwing off other tests and benchmarks. * MergeIterator benchmark: add more realistic sizes At 15-second scrape intervals a chunk covers 30 minutes, so 1,000 chunks is about three weeks, a highly un-representative test. Instant queries, such as those done by the ruler, will only fetch one chunk from each ingester. Signed-off-by: Bryan Boreham <[email protected]>
* Expose default configuration values for memberlist. Set the defaults for various memberlist configuration values based on the "Default LAN" configuration. The only result of this change is that the defaults are now visible and are in the documentation. This also means that if the default values change, then the changes are visible in the documentation, where as before they would have gone unnoticed. To prevent this being a breaking change, the existing behaviour is retained, in case anyone is explicitly setting the values to zero and expecting the default to be used. Signed-off-by: Steve Simpson <[email protected]> * Remove use of zero value as default value indicator. Signed-off-by: Steve Simpson <[email protected]> * Review comments. Signed-off-by: Steve Simpson <[email protected]> * Review comments. Signed-off-by: Steve Simpson <[email protected]>
cortexproject#4342) * Allow setting ring heartbeat timeout to zero to disable timeout check. This change allows the various ring heartbeat timeouts to be configured with zero, as a means of disabling the timeout. This is expected to be used with a separate enhancement to allow disabling heartbeats. When the heartbeat timeout is disabled, instances will always appear as healthy in the ring. Signed-off-by: Steve Simpson <[email protected]> * Review comments. Signed-off-by: Steve Simpson <[email protected]>
…time. (cortexproject#4317) * Add a new config and metric for reporting ruler query execution wall time. Signed-off-by: Tyler Reid <[email protected]> * Spacing and PR number fixup Signed-off-by: Tyler Reid <[email protected]> * Wrap the defer in a function to make it defer after the return rather than after the if block. Add a unit test to validate we're tracking time correctly. Signed-off-by: Tyler Reid <[email protected]> * Use seconds for our duration rather than nanoseconds Signed-off-by: Tyler Reid <[email protected]> * Review comment fixes Signed-off-by: Tyler Reid <[email protected]> * Update config flag in the config docs Signed-off-by: Tyler Reid <[email protected]> * Pass counter rather than counter vector for metrics query function Signed-off-by: Tyler Reid <[email protected]> * Fix comment in MetricsQueryFunction Signed-off-by: Tyler Reid <[email protected]> * Move query metric and log to separate function. Add log message for ruler query time. Signed-off-by: Tyler Reid <[email protected]> * Update config file and change log to show this a per user metric Signed-off-by: Tyler Reid <[email protected]> * code review fixes Signed-off-by: Tyler Reid <[email protected]> * update log message for ruler query metrics Signed-off-by: Tyler Reid <[email protected]> * Remove append and just use the array for key values in the log messag Signed-off-by: Tyler Reid <[email protected]> * Add query-frontend component to front end log message Signed-off-by: Tyler Reid <[email protected]>
I thought it would be good to put a security page into the docs, so that it shows up in a search. Content is just pointing at other resources. Signed-off-by: Bryan Boreham <[email protected]>
…xproject#4345) * Optimise memberlist kv store access by storing data unencoded. The following profile data was taken from running 50 idle ingesters with memberlist, with almost everything at default values (5s heartbeats): ``` 52.16% mergeBytesValueForKey +- 52.16% mergeValueForKey +- 47.84% computeNewValue +- 27.24% codec Proto Decode +- 26.25% mergeWithTime ``` It is apparent from the this that a lot of time is spent on the memberlist receive path, as might be expected, specifically, the merging of the update into the current state. The cost however is not in decoding the incoming states (occurs in `mergeBytesValueForKey` before `mergeValueForKey`), but in fact decoding _current state_ of the value in the store (as it is stored encoded). The ring state was measured at 123K (50 ingesters), so it makes sense that decoding could be costly. This can be avoided by storing the value in it's decoded `Mergeable` form. When doing this, care has to be taken to deep copy the value when accessed, as it is modified in place before being updated in the store, and accessed outside the store mutex. Note a side effect of this change is that is no longer straightforward to expose the `memberlist_kv_store_value_bytes` metric, as this reported the size of the encoded data, therefore it has been removed. Signed-off-by: Steve Simpson <[email protected]> * Typo. Signed-off-by: Steve Simpson <[email protected]> * Review comments. Signed-off-by: Steve Simpson <[email protected]>
…o. (cortexproject#4344) * Allow disabling of ring heartbeats by setting relevant options to zero. Signed-off-by: Steve Simpson <[email protected]> * Review comments. Signed-off-by: Steve Simpson <[email protected]>
…#4346) * Expose configuration of memberlist packet compression. Allows manually specifying whether memberlist should compress packets via a new configuration flag: `-memberlist.enable-compression`. This typically has little benefit for Cortex, as the ring state messages are already compressed with Snappy, the second layer of compression does not achieve any additional saving. It's not clear cut whether there might still be some benefit for internal memberlist messages; this needs to be evaluated in a environment of some reasonable scale. Signed-off-by: Steve Simpson <[email protected]> * Review comments. Signed-off-by: Steve Simpson <[email protected]> * Review comments. Signed-off-by: Steve Simpson <[email protected]>
…exproject#4348) It was only waiting one second for the second sync to complete, which is probably too harsh a deadline than necessary for overloaded systems. Signed-off-by: Steve Simpson <[email protected]>
…xproject#4349) The test is writing a single silence and checking a metric which indicates whether replicating the silence has been attempted yet. This is so we can check later on that no replication activity occurs. The assertions later on in the test are passing, but the first one is not, indicating that the replication doesn't trigger early enough. This makes sense because the replication is not synchronous with the writing of the silence. Signed-off-by: Steve Simpson <[email protected]>
) * Add proposal document Signed-off-by: Gofman <[email protected]> Signed-off-by: ilangofman <[email protected]> * Minor text modifications Signed-off-by: ilangofman <[email protected]> * Implement requested changes to the proposal Signed-off-by: ilangofman <[email protected]> * Fix mention of Compactor instead of purger in proposal Signed-off-by: ilangofman <[email protected]> * Fixed wording and spelling in proposal Signed-off-by: ilangofman <[email protected]> * Update the cache invalidation method Signed-off-by: ilangofman <[email protected]> * Fix wording on cache invalidation section Signed-off-by: ilangofman <[email protected]> * Minor wording additions Signed-off-by: ilangofman <[email protected]> * Remove white-noise from text Signed-off-by: ilangofman <[email protected]> * Remove the deleting state and change cache invalidation Signed-off-by: ilangofman <[email protected]> * Add deleted state and update cache invalidation Signed-off-by: ilangofman <[email protected]> * Add one word to clear things up Signed-off-by: ilangofman <[email protected]> * update api limits section Signed-off-by: ilangofman <[email protected]> * ran clean white noise Signed-off-by: ilangofman <[email protected]>
Signed-off-by: Albert <[email protected]>
Signed-off-by: Albert <[email protected]>
Signed-off-by: Albert <[email protected]>
alvinlin123
force-pushed
the
shuffle-sharding-compactor
branch
from
January 14, 2022 22:03
db0595b
to
8e78a51
Compare
roystchiang
pushed a commit
that referenced
this pull request
Apr 6, 2022
Added workflow for building prerelease images
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Signed-off-by: Albert [email protected]
What this PR does:
Adds
cortex_compactor_remaining_planned_compactions
to track the remaining number of compactions to be completed by the cortex compactor.Which issue(s) this PR fixes:
Fixes #
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]