ui: graphs don't load on 30 node cluster #24018

Closed · jordanlewis opened this issue Mar 19, 2018 · 4 comments

@jordanlewis (Member) commented:

On a 30 node cluster, graphs hang forever in the Admin UI:

[screenshot: Admin UI graphs stuck in a loading state]

This happens even if I change from "All Nodes" to a single node.

The cause is that the query endpoint returns a 500, with the message:

insufficient memory budget to attempt query

The input to that endpoint is:

curl 'http://<redacted>:26258/ts/query' -H 'Origin: http://<redacted>:26258' -H 'Accept-Encoding: gzip, deflate' -H 'Accept-Language: en-US,en;q=0.9' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36' -H 'Content-Type: application/x-protobuf' -H 'Accept: application/x-protobuf' -H 'Referer: http://<redacted>:26258/' -H 'Connection: keep-alive' -H 'Grpc-Timeout: 30000m' --data-binary $'\x08\x80\xf8\xd6\x9b\xde\x87\xd7\x8e\x15\x10\x80\xd0\xe5\xfc\xcd\x88\xd7\x8e\x15\x1a\x1f\n\x17cr.node.sql.query.count\x10\x01\x18\x02 \x02\x1a\'\n\x1fcr.node.sql.service.latency-p50\x10\x03\x18\x03 \x00\x1a\'\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x03 \x00 \x80\xc8\xaf\xa0%' --compressed

or

 --data-binary $'\x08\x80\xa4\xea\x8d\x8b\xf7\xd6\x8e\x15\x10\x80\x84\x90\xa4\xc6\x88\xd7\x8e\x15\x1a \n\x18cr.node.sql.select.count\x10\x01\x18\x02 \x02\x1a(\n cr.node.sql.distsql.select.count\x10\x01\x18\x02 \x02\x1a \n\x18cr.node.sql.update.count\x10\x01\x18\x02 \x02\x1a \n\x18cr.node.sql.insert.count\x10\x01\x18\x02 \x02\x1a \n\x18cr.node.sql.delete.count\x10\x01\x18\x02 \x02\x1a*\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x011\x1a*\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x012\x1a*\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x013\x1a*\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x014\x1a*\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x015\x1a*\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x016\x1a*\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x017\x1a*\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x018\x1a*\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x019\x1a+\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x0210\x1a+\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x0211\x1a+\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x0212\x1a+\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x0213\x1a+\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x0214\x1a+\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x0215\x1a+\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x0216\x1a+\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x0217\x1a+\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x0218\x1a+\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x0219\x1a+\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x0220\x1a+\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x0221\x1a+\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x0222\x1a+\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x0223\x1a+\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x0224\x1a+\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x0225\x1a+\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x0226\x1a+\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x0227\x1a+\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x0228\x1a+\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x0229\x1a+\n\x1fcr.node.sql.service.latency-p99\x10\x03\x18\x02 \x00*\x0230\x1a\x1c\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x011\x1a\x1c\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x012\x1a\x1c\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x013\x1a\x1c\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x014\x1a\x1c\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x015\x1a\x1c\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x016\x1a\x1c\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x017\x1a\x1c\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x019\x1a\x1d\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x0212\x1a\x1d\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x0211\x1a\x1d\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x0210\x1a\x1c\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x018\x1a\x1d\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x0213\x1a\x1d\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x0214\x1a\x1d\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x0215\x1a\x1d\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x0216\x1a\x1d\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x0217\x1a\x1d\n\x11cr.store.replicas\x10\x01\x18\x02 
\x00*\x0218\x1a\x1d\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x0219\x1a\x1d\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x0220\x1a\x1d\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x0221\x1a\x1d\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x0222\x1a\x1d\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x0223\x1a\x1d\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x0224\x1a\x1d\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x0225\x1a\x1d\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x0226\x1a\x1d\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x0227\x1a\x1d\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x0228\x1a\x1d\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x0229\x1a\x1d\n\x11cr.store.replicas\x10\x01\x18\x02 \x00*\x0230\x1a\x19\n\x11cr.store.capacity\x10\x01\x18\x02 \x00\x1a#\n\x1bcr.store.capacity.available\x10\x01\x18\x02 \x00\x1a\x1e\n\x16cr.store.capacity.used\x10\x01\x18\x02 \x00 \x80\xc8\xaf\xa0%' --compressed

Is there any solution to this besides upping the memory on the boxes? I would hope there's a way to issue whatever queries are necessary on the backend in a less memory-intensive way.

@jordanlewis (Member, Author) commented:

In addition to this, the logs are full of:

W180319 15:45:07.207127 526906784 server/server.go:1521  [n1] error closing gzip response writer: http: request method or response status code does not allow body
W180319 15:45:07.243826 526951087 server/server.go:1521  [n1] error closing gzip response writer: http: request method or response status code does not allow body

@mrtracy (Contributor) commented Mar 19, 2018:

The memory limit on time series queries is hard-coded at a very low level: 64 MiB for the entire system and 1 MiB per query. It looks like 1 MiB is not enough to pull even the minimum number of keys (3) into memory at one time on a 30-node cluster: we need roughly 43.3K per node, and at 30 nodes we only have about 33.3K per node in the budget.

@petermattis Need to make a call here; the options are:

  • Raise the overall limit (currently 64 MiB).
  • Lower the number of simultaneous workers allowed (currently 64), which raises the per-worker budget.
  • Don't actually fail when we hit the minimum budget; just go ahead and query with the minimum necessary time span. This would allow memory usage to grow without bound, but at a modest rate directly related to the number of nodes; a 500-node cluster would use up to ~1.5 GB if 64 queries were being served at the same time (see the arithmetic sketch below).
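
For reference, here is a back-of-the-envelope Go sketch of both computations above; the ~43.3K-per-node figure is the one quoted in this comment, and this is not the actual tsdb accounting code:

package main

import "fmt"

func main() {
	const (
		perQueryBudget = 1 << 20  // hard-coded 1 MiB budget per query
		perNodeNeeded  = 43_300.0 // ~43.3K needed per node, per the comment above
	)

	// Why the 30-node cluster fails: the per-query budget split across all
	// sources leaves less than the minimum needed per node.
	nodes := 30.0
	fmt.Printf("budget per node: %.1fK (need ~%.1fK)\n",
		perQueryBudget/nodes/1000, perNodeNeeded/1000)
	// budget per node: 35.0K (need ~43.3K) -> "insufficient memory budget"

	// Option 3: never fail, just take the minimum each query needs. Worst
	// case with 64 concurrent queries against a 500-node cluster:
	workers, bigCluster := 64.0, 500.0
	fmt.Printf("worst case: %.2f GB\n", workers*bigCluster*perNodeNeeded/1e9)
	// worst case: 1.39 GB -- in the same ballpark as the ~1.5 GB estimate above
}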

Independently of this, I will also fix it so that single-node queries work correctly.

@bdarnell (Contributor) commented:

Let's reduce the number of simultaneous workers - I'd go with 8 workers for an 8MiB limit each.

mrtracy pushed a commit to mrtracy/cockroach that referenced this issue Mar 19, 2018
Adjusts time series QueryWorkerMax, the maximum number of
simultaneous workers, down to 8 from 64.

Also adjusts the estimatedSourceCount for individual queries to match
the source count for queries which provide a specific list of sources.

Resolves cockroachdb#24018

Release note (Admin UI): Fixed a bug where graphs would not display on
clusters with large numbers of nodes.
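
A minimal sketch of what the estimatedSourceCount adjustment amounts to, with hypothetical names and shape rather than the actual ts package code:

package main

import "fmt"

// estimatedSourceCount is a hypothetical sketch of the adjustment described
// above: when a query names specific sources, budget for exactly that many,
// instead of assuming every node in the cluster contributes data.
func estimatedSourceCount(querySources []string, clusterNodes int) int {
	if len(querySources) > 0 {
		return len(querySources) // a single-node graph only needs one source's worth of budget
	}
	return clusterNodes // queries with no explicit source list still assume all nodes
}

func main() {
	fmt.Println(estimatedSourceCount([]string{"1"}, 30)) // 1: a single-node query now fits the budget
	fmt.Println(estimatedSourceCount(nil, 30))           // 30: a cluster-wide query
}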

@jordanlewis (Member, Author) commented:

Please backport this! If not for 2.0, then for 2.0.1.

dhartunian added a commit to dhartunian/cockroach that referenced this issue Jan 11, 2022
Previously, the memory limit for all `tsdb` workers was set at a static
64MiB. This cap created issues seen in cockroachdb#24018 where this limit was hit
on a 30 node cluster. To alleviate the issue, the number of workers was
reduced.

We've currently hit this limit again as part of load testing with larger
clusters and have decided to make the per-query worker memory limit
dynamic. The per-worker limit is now raised based on the amount of memory
available to the SQL Pool via the `MemoryPoolSize` configuration
variable. This is set to be 25% of the system memory by default. The
`tsdb` memory cap per-worker is now doubled until it reaches `1/128` of
the memory pool setting.

For example, on a node with 128 - 256 GiB of memory, this will
correspond to 512 MiB allocated per worker.

TODO(davidh): Can the tests be faster? They iterate on a server create.
TODO(davidh): Is 1/128 a good setting? How do we validate this?
TODO(davidh): Should this behavior be gated behind a feature flag? It's
possible that on some memory-constrained deployments a sudden spike in memory
usage by tsdb could cause problems.

Resolves cockroachdb#72986

Release note (ops change): customers running clusters with 240 nodes or
more can effectively access tsdb metrics.
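
A minimal sketch of the sizing rule this commit message describes, using hypothetical names rather than the actual identifiers:

package main

import "fmt"

// tsdbWorkerBudget doubles the previous static 64 MiB cap until it reaches
// 1/128 of the SQL memory pool. Names are illustrative, not the identifiers
// used in the change.
func tsdbWorkerBudget(sqlMemoryPoolSize int64) int64 {
	const base = 64 << 20 // previous static cap: 64 MiB
	target := sqlMemoryPoolSize / 128
	budget := int64(base)
	for budget*2 <= target {
		budget *= 2
	}
	return budget
}

func main() {
	// MemoryPoolSize defaults to 25% of system RAM, so a 256 GiB node has a
	// 64 GiB SQL pool; 1/128 of that is 512 MiB, matching the example above.
	pool := int64(64) << 30
	fmt.Printf("per-worker tsdb budget: %d MiB\n", tsdbWorkerBudget(pool)>>20)
	// per-worker tsdb budget: 512 MiB
}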
dhartunian added a commit to dhartunian/cockroach that referenced this issue Jan 14, 2022
Previously, the memory limit for all `tsdb` workers was set at a static
64MiB. This cap created issues seen in cockroachdb#24018 where this limit was hit
on a 30 node cluster. To alleviate the issue, the number of workers was
reduced, raising the per-worker allocation.

We've currently hit this limit again as part of load testing with larger
clusters and have decided to make the per-query worker memory limit
dynamic. The per-worker limit is now raised based on the amount of memory
available to the SQL Pool via the `MemoryPoolSize` configuration
variable. This is set to be 25% of the system memory by default. The
`tsdb` memory cap per-worker is now doubled until it reaches `1/128` of
the memory pool setting.

For example, on a node with 128 - 256 GiB of memory, this will
correspond to 512 MiB allocated for all running `tsdb` queries.

In addition, the ts server is now connected to the same `BytesMonitor`
instance as the SQL memory monitor, and workers will be capped at double
the query limit. Results are monitored as before, but no cap is introduced
there since none was present previously.

This behavior is gated behind a private cluster setting that's enabled
by default.

TODO(davidh): Can the tests be faster? They iterate on a server create.
TODO(davidh): Is 1/128 a good setting? How do we validate this?

Resolves cockroachdb#72986

Release note (ops change): customers running clusters with 240 nodes or
more can effectively access tsdb metrics.
dhartunian added a commit to dhartunian/cockroach that referenced this issue Jan 27, 2022
Previously, the memory limit for all `tsdb` workers was set at a static
64MiB. This cap created issues seen in cockroachdb#24018 where this limit was hit
on a 30 node cluster. To alleviate the issue, the number of workers was
reduced, raising the per-worker allocation.

We've currently hit this limit again as part of load testing with larger
clusters and have decided to make the per-query worker memory limit
dynamic. The per-worker limit is now raised based on the amount of memory
available to the SQL Pool via the `MemoryPoolSize` configuration
variable. This is set to be 25% of the system memory by default. The
`tsdb` memory cap per-worker is now doubled until it reaches `1/128` of
the memory pool setting.

For example, on a node with 128 - 256 GiB of memory, this will
correspond to 512 MiB allocated for all running `tsdb` queries.

In addition, the ts server is now connected to the same `BytesMonitor`
instance as the SQL memory monitor, and workers will be capped at double
the query limit. Results are monitored as before, but no cap is introduced
there since none was present previously.

This behavior is gated behind a private cluster setting that's enabled
by default and sets the ratio at 1/128 of the SQL memory pool.

Resolves cockroachdb#72986

Release note (ops change): customers running clusters with 240 nodes or
more can effectively access tsdb metrics.
dhartunian added a commit to dhartunian/cockroach that referenced this issue Feb 2, 2022
Previously, the memory limit for all `tsdb` workers was set at a static
64MiB. This cap created issues seen in cockroachdb#24018 where this limit was hit
on a 30 node cluster. To alleviate the issue, the number of workers was
reduced, raising the per-worker allocation.

We've currently hit this limit again as part of load testing with larger
clusters and have decided to make the per-query worker memory limit
dynamic.

This PR introduces a new CLI flag `--max-tsdb-memory` to mirror the
functionality of the `--max-sql-memory` flag by accepting bytes or a
percentage of system memory. The default is set to be `1%` of system
memory or 64 MiB, whichever is greater. This ensures that timeseries query
performance after this PR is equal to or better than before, without eating
too far into the memory budget for SQL.

In addition, the ts server is now connected to the same `BytesMonitor`
instance as the SQL memory monitor, and workers will be capped at double
the query limit. Results are monitored as before, but no cap is introduced
there since none was present previously.

Resolves cockroachdb#72986

Release note (cli change, ops change): A new CLI flag, `--max-tsdb-memory`,
can set the memory budget for timeseries queries when processing requests
from the Metrics page in DB Console. Most customers should not need to tweak
this setting, as the default of 1% of system memory or 64 MiB, whichever is
greater, is adequate for most deployments. In the case where a deployment of
hundreds of nodes has low per-node memory available (below 8 GiB, for
instance), it may be necessary to increase this value to `2%` or higher in
order to render timeseries graphs for the cluster using the DB Console.
Otherwise, the default settings will be adequate for the vast majority of
deployments.
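
Roughly, the default described in this release note, as a sketch with a hypothetical helper (the real flag also accepts absolute byte sizes and percentages such as `--max-tsdb-memory=2%`):

package main

import "fmt"

// defaultTSDBMemory sketches the default described above: the greater of
// 1% of system memory or 64 MiB. Hypothetical helper, not the actual
// flag-handling code.
func defaultTSDBMemory(systemMemBytes int64) int64 {
	const floor = 64 << 20         // 64 MiB
	budget := systemMemBytes / 100 // 1% of system memory
	if budget < floor {
		return floor
	}
	return budget
}

func main() {
	for _, gib := range []int64{4, 8, 32} {
		fmt.Printf("%2d GiB RAM -> %3d MiB tsdb budget\n",
			gib, defaultTSDBMemory(gib<<30)>>20)
	}
	//  4 GiB RAM ->  64 MiB (1% would be ~41 MiB, so the 64 MiB floor applies)
	//  8 GiB RAM ->  81 MiB
	// 32 GiB RAM -> 327 MiB
}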
craig bot pushed a commit that referenced this issue Feb 15, 2022
74563: kv,kvcoord,sql: poison txnCoordSender after a retryable error r=lidorcarmel a=lidorcarmel

Previously, kv users could lose parts of a transaction without getting an
error. After Send() returned a retryable error, the state of the txn was
reset, which made it usable again. If the caller ignored the error, they could
continue applying more operations without realizing the first part of the
transaction had been discarded. See more details in the issue (#22615).

The simplest example is when the retryable closure of DB.Txn() returns nil
instead of returning the retryable error back to the retry loop; in this case
the retry loop declares success without realizing that the first part of the
transaction (all the operations before the retryable error) was lost.

This PR leaves the txn in a "poisoned" state after encountering an error, so
that all future operations fail fast. The caller is therefore expected to
reset the txn handle back to a usable state intentionally, by calling
Txn.PrepareForRetry(). In the simple case of DB.Txn() the retry loop will
reset the handle and run the retry even if the callback returned nil.

Closes #22615

Release note: None
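
The poisoned-handle idea in miniature, as a generic Go sketch with illustrative names rather than the actual kv.Txn API:

package main

import (
	"errors"
	"fmt"
)

var errRetryable = errors.New("retryable error")

// txnHandle is a toy stand-in for a transaction handle: once an operation
// hits a retryable error the handle is poisoned and every later operation
// fails fast, instead of silently resetting and losing earlier work.
type txnHandle struct {
	poisoned bool
}

func (t *txnHandle) Run(op func() error) error {
	if t.poisoned {
		return errors.New("txn poisoned: call PrepareForRetry before reuse")
	}
	err := op()
	if errors.Is(err, errRetryable) {
		t.poisoned = true
	}
	return err
}

// PrepareForRetry is the explicit reset that the caller (or the DB.Txn retry
// loop) must perform before running the whole closure again from the start.
func (t *txnHandle) PrepareForRetry() { t.poisoned = false }

func main() {
	txn := &txnHandle{}
	_ = txn.Run(func() error { return errRetryable }) // first op hits a retryable error
	fmt.Println(txn.Run(func() error { return nil })) // buggy caller ignored it: fails fast now
	txn.PrepareForRetry()                             // the retry loop resets the handle...
	fmt.Println(txn.Run(func() error { return nil })) // ...and the full retry succeeds: <nil>
}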

74662: tsdb: expand mem per worker based on sql pool size r=dhartunian a=dhartunian

Previously, the memory limit for all `tsdb` workers was set at a static
64MiB. This cap created issues seen in #24018 where this limit was hit
on a 30 node cluster. To alleviate the issue, the number of workers was
reduced, raising the per-worker allocation.

We've currently hit this limit again as part of load testing with larger
clusters and have decided to make the per-query worker memory limit
dynamic. The per-worker limit is now raised based on the amount of memory
available to the SQL Pool via the `MemoryPoolSize` configuration
variable. This is set to be 25% of the system memory by default. The
`tsdb` memory cap per-worker is now doubled until it reaches `1/128` of
the memory pool setting.

For example, on a node with 128 - 256 GiB of memory, this will
correspond to 512 MiB allocated for all running `tsdb` queries.

In addition, the ts server is now connected to the same `BytesMonitor`
instance as the SQL memory monitor, and workers will be capped at double
the query limit. Results are monitored as before, but no cap is introduced
there since none was present previously.

This behavior is gated behind a private cluster setting that's enabled
by default and sets the ratio at 1/128 of the SQL memory pool.

Resolves #72986

Release note (ops change): customers running clusters with 240 nodes or
more can effectively access tsdb metrics.

75677: randgen: add PopulateRandTable r=mgartner a=msbutler

PopulateRandTable populates the caller's table with random data. This helper
function aims to make it easier for engineers to develop randomized tests that
leverage randgen / sqlsmith.

Informs #72345

Release note: None

76334: opt: fix missing filters after join reordering r=mgartner a=mgartner

#### opt: add TES, SES, and rules to reorderjoins

This commit updates the output of the `reorderjoins` opt test command to
display the initial state of the `JoinOrderBuilder`. It adds information to
the output, including the TES, SES, and conflict rules for each edge.

Release note: None

#### opt: fix missing filters after join reordering

This commit eliminates logic in the `assoc`, `leftAsscom`, and
`rightAsscom` functions in the join order builder that aimed to prevent
generating "orphaned" predicates, where one or more referenced relations
are not in a join's input. In rare cases, this logic had the side effect
of creating invalid conflict rules for edges, which could prevent valid
predicates from being added to reordered join trees.

It is safe to remove these conditionals because they are unnecessary.
The CD-C algorithm already prevents generation of orphaned predicates by
checking that the total eligibility set (TES) is a subset of a join's
input vertices. In our implementation, this is handled by the
`checkNonInnerJoin` and `checkInnerJoin` functions.
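
As a tiny illustration of that TES check, here is a sketch using a hypothetical bit set of relation IDs (the real checks live in the `checkNonInnerJoin` and `checkInnerJoin` functions named above):

package main

import "fmt"

// relSet is a toy bit set of relation IDs: bit i set means relation i is present.
type relSet uint64

func (s relSet) subsetOf(other relSet) bool { return s&^other == 0 }

// edgeApplies reports whether a predicate (edge) may be attached to a join
// whose combined input relations are `inputs`: every relation in the
// predicate's total eligibility set (TES) must be present in the input,
// which is exactly what rules out "orphaned" predicates.
func edgeApplies(tes, inputs relSet) bool { return tes.subsetOf(inputs) }

func main() {
	const (
		a relSet = 1 << iota
		b
		c
	)
	tes := a | c                         // predicate references relations a and c
	fmt.Println(edgeApplies(tes, a|b))   // false: c is missing, predicate would be orphaned
	fmt.Println(edgeApplies(tes, a|b|c)) // true: safe to place the filter on this join
}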

Fixes #76522

Release note (bug fix): A bug has been fixed which caused the query optimizer
to omit join filters in rare cases when reordering joins, which could
result in incorrect query results. This bug was present since v20.2.


Co-authored-by: Lidor Carmel <[email protected]>
Co-authored-by: David Hartunian <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>