Queue full warning #121
+1
I have the same error during high load with a 6 GB heap. The strange thing is that graphite-carbon manages to handle the load while cyanite cannot.
Same here; it then falls over with an out-of-memory error in under 25 minutes in a low-volume staging environment.
I am also trying to PoC this, and as soon as I send 1/8th of our prod traffic to the cyanite process I receive these errors straight away. Was this an issue in previous versions?
I am in the exact same situation as @gmlwall: this error starts almost immediately after cyanite starts up. Similarly, I know my standard graphite-carbon stack can handle this load. Also, I have used the
I added the following to the config to try to give the process a bit more headroom: queues: with default-poolsize: (no. of CPU cores on your server), to ensure it wasn't limited by CPU usage. It lasted a bit longer before the I/O errors appeared, but not much longer :(
@altvnk I will start using graphite-stresser to validate further commits and resolve this issue, thanks.
@altvnk would you mind sharing the -Xmx/-Xms params you used for cyanite and the stresser params you used to reproduce the issue?
Sure.
Off topic: I've also run into issues with
@altvnk Hi, thanks for the feedback. The latest commit helps a lot but doesn't solve all issues; I'm working on fixing the behavior. The issue with graphite-api is known and registered; I'm currently changing the way cyanite interacts with grafana. Cheers!
Hi @altvnk, I'm nearing the end of my work on wip/instrumented. I haven't tested the index write-path just yet, but going to cassandra I can now push with:

```yaml
engine:
  rules:
    default: [ "5s:1h" ]
api:
  port: 8080
input:
  - type: carbon
    port: 2003
index:
  type: empty
queues:
  defaults:
    ingestq:
      pool-size: 100
      queue-capacity: 2000000
    writeq:
      pool-size: 100
      queue-capacity: 2000000
store:
  cluster: 'localhost'
  keyspace: 'metric'
logging:
  level: info
  console: true
```

Note how you can now provide per-queue defaults. The trick here is to give the inputq some room to breathe. There are now metrics exposed through JMX and /tmp/csv by default, which can help make sure the write-path and input-queues are in good shape; I'll add the ability to flush these to cassandra as well for good measure. With this I can successfully run at -Xms512m -Xmx512m with no trouble. When a queue-full error occurs, it no longer kills the daemon either. This is all in the

Once this is done, this will leave room to update the API part to be compatible with graphite-api, and the release will be around the corner.
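To make the "per-queue defaults" idea concrete: conceptually, each named queue's effective settings are its own entries laid over a set of shared defaults. The sketch below is a hypothetical Python illustration of that resolution logic (the names and numbers are made up for illustration; this is not cyanite's actual code, which is Clojure):

```python
# Hypothetical shared defaults for any queue; values are illustrative only.
DEFAULTS = {"pool-size": 10, "queue-capacity": 100000}

def queue_config(overrides, defaults=DEFAULTS):
    """Resolve a queue's effective settings: explicit values win over defaults."""
    return {**defaults, **(overrides or {})}

# A queue given explicit headroom, and one that falls back entirely to defaults.
print(queue_config({"pool-size": 100, "queue-capacity": 2000000}))
print(queue_config(None))
```

The point of the merge order is that a queue only needs to spell out the settings it wants to change; everything else inherits the shared values.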
Awesome! Will take a look into it again as soon as possible. |
@AnderEnder @gmlwall @jeffpierce @nherson #134 fixes this issue and should be satisfactory. Please pay attention to your index configuration; I recommend opting for the elasticsearch index if you're in a real-world scenario. I'm leaving this issue open for now.
Not sure what I'm doing wrong, but I have empty metrics in Cassandra. The schema was created from schema.cql.
Hi @altvnk, with the above config the index will not be provisioned, since I use the "empty" indexer. Cheers!
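For reference, switching away from the empty indexer is a one-key change relative to the config shown earlier in the thread (assuming the index type is selected the same way as `type: empty` is above):

```yaml
index:
  type: memory
```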
Right, I changed the index to memory; I forgot to note this.
OK, in the example above I've only changed the index to
Sorry for the confusion; I tested on another Cassandra cluster and metrics are being written. Now, about the original issue: performance has improved and I'm no longer seeing the error messages.
Improve performance and Graphite compatibility.

- [X] Refactor search interface
- [X] Cassandra search implementation
- [X] Graphite query parser
- [X] Load test procedure

Refactor search interface
-------------------------

The search functionality now puts a lot less responsibility in the hands of implementers. Three functions are now expected from implementations:

```clojure
(defprotocol MetricIndex
  (push-segment! [this pos segment path length])
  (by-pos [this pos])
  (by-segment [this pos segment]))
```

Based on these primitives, cyanite now builds a simple inverted index of the following structure, given the input paths `collectd.web01.cpu` and `collectd.web02.cpu`:

```json
{
  "segments": {
    0: [ "collectd" ],
    1: [ "web01", "web02" ],
    2: [ "cpu" ]
  },
  "paths": {
    [0, "collectd"]: [["collectd.web01.cpu", 3], ["collectd.web02.cpu", 3]],
    [1, "web01"]:    [["collectd.web01.cpu", 3]],
    [1, "web02"]:    [["collectd.web02.cpu", 3]],
    [2, "cpu"]:      [["collectd.web01.cpu", 3], ["collectd.web02.cpu", 3]]
  }
}
```

Given this structure, cyanite will now split paths into segments and perform globbing queries on segments:

- `push-segment!`: register a new path.
- `by-pos`: yield all segments at a given position.
- `by-segment`: yield paths for a (position, segment) tuple.

The globbing implementation is somewhat naive and leaves room for improvement; implementations should aim to sort segments. Subsequent commits will bypass `by-pos` whenever possible and perform prefix lookups up to the first wildcard to further reduce lookup times.

Two implementations of this protocol are provided:

- `AgentIndex` stores segments and paths in memory; updates go through an agent.
- `CassandraIndex` provides Cassandra-backed storage for paths and segments.

The ElasticSearch-backed index is now gone, but a compatible implementation will be provided as a subsequent improvement.

Graphite Query Parser
---------------------

A tokenizer for the Graphite syntax has been living in the tree for a while. A subset of the syntax is now handled and may be translated to an AST. A `run-query!` function is also provided which will walk tokens to extract paths and query paths, handling globs, and will then be reduced to the result of the operation if successful. The following operations are already implemented:

- `sumSeries`
- `divideSeries`
- `scale`
- `absolute`

Globbing is handled by https://github.com/pyr/globber and adheres to the globbing rules available in common shells.

Load testing procedure
----------------------

Cyanite now integrates https://github.com/feangulo/graphite-stresser for development. The baseline against which it is tested is a workload of 200000 metrics per second, flushed at a 5 second interval, with a maximum heap size of 512m.

Remaining work
--------------

These changes fix & improve the following ongoing issues:

- #119
- The path indexing part of #121.
- Most of the work needed for #136.
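The inverted index described in the PR above is easy to sketch. The following is a minimal Python illustration of the three primitives (`push-segment!`, `by-pos`, `by-segment`, rendered as methods) and a glob query built on top of them, using `fnmatch` for shell-style globbing. The class and method names are illustrative; this is not cyanite's implementation, which is Clojure:

```python
import fnmatch
from collections import defaultdict

class MemoryIndex:
    """Toy in-memory inverted index over dotted metric paths."""
    def __init__(self):
        self.segments = defaultdict(set)  # pos -> {segment, ...}
        self.paths = defaultdict(set)     # (pos, segment) -> {(path, length), ...}

    def push_segment(self, pos, segment, path, length):
        """Register one segment of a path (cf. push-segment!)."""
        self.segments[pos].add(segment)
        self.paths[(pos, segment)].add((path, length))

    def by_pos(self, pos):
        """Yield all segments seen at a given position."""
        return sorted(self.segments[pos])

    def by_segment(self, pos, segment):
        """Yield (path, length) tuples for a position/segment pair."""
        return sorted(self.paths[(pos, segment)])

    def register(self, path):
        """Split a dotted path into segments and index each one."""
        parts = path.split(".")
        for pos, segment in enumerate(parts):
            self.push_segment(pos, segment, path, len(parts))

    def query(self, pattern):
        """Glob each pattern segment against the segments at its position,
        keeping only paths whose segment count matches the pattern's."""
        parts = pattern.split(".")
        candidates = None
        for pos, pat in enumerate(parts):
            matched = set()
            for segment in self.by_pos(pos):
                if fnmatch.fnmatch(segment, pat):
                    matched.update(p for p, l in self.by_segment(pos, segment)
                                   if l == len(parts))
            candidates = matched if candidates is None else candidates & matched
        return sorted(candidates or [])

idx = MemoryIndex()
idx.register("collectd.web01.cpu")
idx.register("collectd.web02.cpu")
print(idx.query("collectd.web*.cpu"))  # ['collectd.web01.cpu', 'collectd.web02.cpu']
```

This also shows why sorting segments (as the PR suggests) matters: with sorted segments, a prefix lookup up to the first wildcard could skip most of the `by_pos` scan.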
Hi @altvnk, Thanks for your help in testing this. I will close this ticket for now and focus on finishing the direct grafana integration. Sounds as if the release is getting really close now :-)
Hi! I'm getting some problems with cyanite after a few minutes of running. I'm using [graphite-stresser](https://github.com/feangulo/graphite-stresser) to load some data. The heap is set to 512m. Here is a log fragment:
Does it mean that cyanite fails due to slow writes into Cassandra?
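That would be consistent with the "queue full" warnings discussed above: if a bounded queue is drained (here, by Cassandra writes) more slowly than carbon input fills it, enqueue attempts start failing once capacity is reached, instead of the heap growing without bound. A minimal Python sketch of that dynamic, using the standard library's bounded `queue.Queue` as a stand-in (illustrative only, not cyanite's code):

```python
import queue

def ingest(q, n):
    """Producer: try to enqueue n metrics without blocking; count rejections."""
    dropped = 0
    for i in range(n):
        try:
            q.put_nowait(("metric", i))
        except queue.Full:
            dropped += 1  # this is the moment a "queue full" warning would fire
    return dropped

q = queue.Queue(maxsize=1000)  # bounded, like queue-capacity in the config
dropped = ingest(q, 5000)      # no consumer draining: writes are "slow"
print(dropped)                 # -> 4000: everything past capacity is rejected
```

With a faster consumer (or a larger `maxsize`, analogous to raising `queue-capacity`), fewer offers fail, which matches the "room to breathe" advice earlier in the thread.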