sql: prevent unbounded growth of `conn.writerState.buf` and `conn.msgBuilder` #80497

mgartner · 2022-04-25T17:59:34Z

A wide row returned to the client can cause the following buffers in pgwire to grow to a large size. The backing array is not GC'd until the session is terminated. This can cause memory to grow indefinitely as more and more connections grow these buffers.

conn.writeState.buf:

cockroach/pkg/sql/pgwire/conn.go

Line 86 in d7f3d51

buf bytes.Buffer
writeBuffer.wrapped (writeBuffer is the type of conn.msgBuilder):

cockroach/pkg/sql/pgwire/write_buffer.go

Line 30 in c097a16

wrapped bytes.Buffer

Some extra information from @yuzefovich:

I just wanted to add when these buffers are reset for reuse:

conn.msgBuilder which wraps bytes.Buffer is reset in initMsg and finishMsg, and each message contains at most one row

conn.writerState.buf is reset in conn.Flush which is called by conn.maybeFlush when more than 16KiB worth of data has been buffered, at least one full row. Unless results_buffer_size session variable has been changed.

Jira issue: CRDB-15591

The text was updated successfully, but these errors were encountered:

mgartner · 2022-04-25T18:01:22Z

I'm not sure exactly how this would work. What pool of memory would these buffers consume? What would happen as they consumed more and more memory? Would the query have less memory available for execution, causing memory errors? Would we prevent new connections from being created?

mgartner · 2022-04-25T18:49:10Z

A different approach to prevent these buffers from growing past results_buffer_size would be to flush the buffer in the middle of writing a row to the buffer. Are there any limitations in the pgwire protocol that would prevent this?

yuzefovich · 2022-04-25T18:56:45Z

A different approach to prevent these buffers from growing past results_buffer_size would be to flush the buffer in the middle of writing a row to the buffer. Are there any limitations in the pgwire protocol that would prevent this?

Maybe @rafiss would know.

yuzefovich · 2022-04-26T00:50:39Z

What pool of memory would these buffers consume?

I think it would consume from the root SQL memory monitor (i.e. --max-sql-memory) or from sql monitor (pgwire.Server.sqlMemoryPool, which is a child of root SQL memory monitor) - either seems appropriate.

What would happen as they consumed more and more memory?

Hm, if a memory budget error is encountered when composing a PGWire message, then the connection would probably become toast. We need to think more about this one.

Would the query have less memory available for execution, causing memory errors?

Any other consumer of the memory from our accounting system would be more likely to run into "budget exceeded" errors, queries issued by other connections including. I think this is expected and acceptable though.

Would we prevent new connections from being created?

Probably no. Imagining a scenario when a particular connection has accumulated a very large bytes.Buffer and ran into a budget exceeded error, once that connection is closed, the memory budget should open up, so other connections can proceed to be established.

If, however, we got a situation when a newly-established connection runs into the same error when it itself doesn't have a large bytes.Buffer, then we're probably in a bad spot already (for other internal queries), so the node probably should be restarted. We could also have some "minimum allowance" that doesn't get registered with the memory accounting system which would allow the new connection to proceed (e.g. so that it can be used to cancel other, memory-intensive queries), but I'm not sure whether it'd be a good idea.

michae2 · 2022-04-26T18:42:37Z

@mgartner says this isn't terribly urgent.

rafiss · 2022-04-26T18:46:26Z

A different approach to prevent these buffers from growing past results_buffer_size would be to flush the buffer in the middle of writing a row to the buffer. Are there any limitations in the pgwire protocol that would prevent this?

I think it would be allowed in the protocol. We should be able to write tests for it.

cucaroach · 2022-04-26T19:05:07Z

Doesn't each session have its own monitor, would that make more sense? I've seen session-root and session monitors,not sure what the diff is. Also even if we flush couldn't we have a large column value that defeated mid-row flushing?

yuzefovich · 2022-04-26T20:13:26Z

Doesn't each session have its own monitor, would that make more sense?

Yeah, there is one. Eventually everything would still consume from the root SQL monitor (i.e. against --max-sql-memory).

mgartner · 2022-04-26T20:22:01Z

Any other consumer of the memory from our accounting system would be more likely to run into "budget exceeded" errors, queries issued by other connections including. I think this is expected and acceptable though.

I'd argue that this wouldn't be acceptable. There would still be a "poisoning" effect where a previous query makes future queries on the connection more likely to hit memory errors. You could imagine a situation where the buffers grow so large that there are only a few bytes left in the budget to process a query and so every query results in "budget exceeded".

I think the best options, in order, would be:

Flush the buffers in the middle of a row if results_buffer_size is hit.
After a query has been processed (or maybe before), check if the buffers have exceeded results_buffer_size, and if so, allocate a new slice so that the old can be GC'd.

mgartner · 2022-04-26T20:24:02Z

To be clear, I think memory accounting of the buffers could be useful to prevent other types of OOMs. But I think we need something more than that to prevent these buffers from growing past results_buffer_size.

mgartner · 2022-04-27T23:51:14Z

I'm curious why both these buffers exist. It looks like we stage results in conn.msgBuilder (a writeBuffer), and then periodically copy the contents of conn.msgBuilder to conn.writeState.buf when writeBuffer.finishMsg() is called. The contents of conn.writeState.buf are periodically flushed to the network when conn.Flush() is called. Is there a fundamental reason we cannot simplify this to a single buffer that is periodically flushed to the network?

mgartner · 2022-05-04T19:15:01Z

@andreimatei it's been awhile, but your name is featured prominently in the git history of pgwire. Maybe you have an answer for my last question?

andreimatei · 2022-05-04T19:28:40Z

Is writeState.buf immutable, while msgBuilder is changing? You need to not send a message until it was finished, and also each message seems to be prefixed with its length, which you only know at finishMsg() time. So I'm not sure how you do these with a single buffer / if it'd be pretty. Other than that, I don't think there's anything too deep here.

mgartner · 2022-05-05T14:15:00Z

Thanks @andreimatei!

So I'm not sure how you do these with a single buffer / if it'd be pretty.

A section of the msgBuilder buffer is reserved for the length and written over in finishMsg. Seems like you could do the same with writeState.buf as long as you were sure not to flush to the network before the message was finished. But I agree it might not be pretty.

and also each message seems to be prefixed with its length

I think this rules out my suggestion above:

Flush the buffers in the middle of a row if results_buffer_size is hit.

So this is the best option I can think of at the moment:

After a query has been processed, check if the buffers have exceeded results_buffer_size, and if so, allocate a new slice so that the old can be GC'd.

mgartner · 2022-06-21T17:47:14Z

@cucaroach I've unassigned you because I believe you've had other things on your plate. If you're still making progress on this, feel free to reassign yourself, otherwise, don't worry about it for now.

There are two buffers in `pgwire.conn` that are used when sending results to the client. They have to buffer each row in its entirety before flushing, so wide rows can cause the capacity of these buffers to exceed the limit. Previously, the buffers were never reallocated even when they exceeded the size limit. This could cause a long-running session to hold on to large amounts of memory until the session was ended. This increased the risk of OOM, in addition to being inefficient. This commit adds a method `maybeReallocate` to `pgwire.conn`, which reallocates each buffer if it has reached the size limit. `maybeReallocate` is called upon closing `Sync` and `Flush` command results. These are natural places to reallocate the buffers, since they already flush the buffers to the client and are called roughly at the statement granularity, rather than for every row. This allows long-running sessions to bound their memory usage without causing reallocation on every wide row that is sent to the client. Informs cockroachdb#80497 Release note (performance improvement): Long-running sql sessions are now less likely to maintain large allocations for long periods of time, which decreases the risk of OOM and improves memory utilization.

There are two buffers in `pgwire.conn` that are used when sending results to the client. They have to buffer each row in its entirety before flushing, so wide rows can cause the capacity of these buffers to exceed the limit. Previously, the buffers were never reallocated even when they exceeded the size limit. This could cause a long-running session to hold on to large amounts of memory until the session was ended. This increased the risk of OOM, in addition to being inefficient. This commit adds a method `maybeReallocate` to `pgwire.conn`, which reallocates each buffer if it has reached the size limit. `maybeReallocate` is called upon closing `Sync` and `Flush` command results. These are natural places to reallocate the buffers, since they already flush the buffers to the client and are called roughly at the statement granularity, rather than for every row. This allows long-running sessions to bound their memory usage without causing reallocation on every wide row that is sent to the client. Informs cockroachdb#80497 Release note (performance improvement): Long-running sql sessions are now less likely to maintain large allocations for long periods of time, which decreases the risk of OOM and improves memory utilization. Release justification: bug fix to reduce likelihood of OOM

85949: sql: reset conn buffers after each query r=DrewKimball a=DrewKimball **sql: reset conn buffers after each query** There are two buffers in `pgwire.conn` that are used when sending results to the client. They have to buffer each row in its entirety before flushing, so wide rows can cause the capacity of these buffers to exceed the limit. Previously, the buffers were never reallocated even when they exceeded the size limit. This could cause a long-running session to hold on to large amounts of memory until the session was ended. This increased the risk of OOM, in addition to being inefficient. This commit adds a method `maybeReallocate` to `pgwire.conn`, which reallocates each buffer if it has reached the size limit. `maybeReallocate` is called upon closing `Sync` and `Flush` command results. These are natural places to reallocate the buffers, since they already flush the buffers to the client and are called roughly at the statement granularity, rather than for every row. This allows long-running sessions to bound their memory usage without causing reallocation on every wide row that is sent to the client. Informs #80497 Release note (performance improvement): Long-running sql sessions are now less likely to maintain large allocations for long periods of time, which decreases the risk of OOM and improves memory utilization. Release justification: bug fix to reduce likelihood of OOM **sql: remove unnecessary maybeFlush call** This commit fixes an oversight of #83870 that added a redundant call to `maybeAppend` in `limitedCommandResult.AddRow`. Release note: None Release justification: low-risk fix for oversight in previous bug fix Co-authored-by: DrewKimball <[email protected]>

DrewKimball · 2022-08-24T10:31:37Z

I'm leaving this open and moving to the backlog, since it sounds like with some more significant changes we could ensure the buffers never exceed the limit (e.g. we could flush in the middle of a row if we know the length before buffering).

There are two buffers in `pgwire.conn` that are used when sending results to the client. They have to buffer each row in its entirety before flushing, so wide rows can cause the capacity of these buffers to exceed the limit. Previously, the buffers were never reallocated even when they exceeded the size limit. This could cause a long-running session to hold on to large amounts of memory until the session was ended. This increased the risk of OOM, in addition to being inefficient. This commit adds a method `maybeReallocate` to `pgwire.conn`, which reallocates each buffer if it has reached the size limit. `maybeReallocate` is called upon closing `Sync` and `Flush` command results. These are natural places to reallocate the buffers, since they already flush the buffers to the client and are called roughly at the statement granularity, rather than for every row. This allows long-running sessions to bound their memory usage without causing reallocation on every wide row that is sent to the client. Informs #80497 Release note (performance improvement): Long-running sql sessions are now less likely to maintain large allocations for long periods of time, which decreases the risk of OOM and improves memory utilization. Release justification: bug fix to reduce likelihood of OOM

mgartner added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team labels Apr 25, 2022

michae2 assigned cucaroach Apr 26, 2022

mgartner changed the title ~~sql: add memory accounting for conn.writeState.buf and writeBuffer.wrapped~~ sql: prevent unbound growth of conn.writeState.buf and writeBuffer.wrapped Apr 27, 2022

mgartner changed the title ~~sql: prevent unbound growth of conn.writeState.buf and writeBuffer.wrapped~~ sql: prevent unbounded growth of conn.writeState.buf and writeBuffer.wrapped Apr 27, 2022

mgartner changed the title ~~sql: prevent unbounded growth of conn.writeState.buf and writeBuffer.wrapped~~ sql: prevent unbounded growth of conn.writerState.buf and conn.writeBuffer.wrapped Apr 27, 2022

mgartner changed the title ~~sql: prevent unbounded growth of conn.writerState.buf and conn.writeBuffer.wrapped~~ sql: prevent unbounded growth of conn.writerState.buf and conn.msgBuilder Apr 27, 2022

jlinder added sync-me and removed sync-me labels May 20, 2022

mgartner unassigned cucaroach Jun 21, 2022

rytaft assigned DrewKimball Jul 21, 2022

DrewKimball mentioned this issue Aug 11, 2022

sql: reset conn buffers after each query #85949

Merged

blathers-crl bot mentioned this issue Aug 24, 2022

release-22.1: sql: reset conn buffers after each query #86797

Merged

blathers-crl bot mentioned this issue Aug 24, 2022

release-21.2: sql: reset conn buffers after each query #86798

Merged

michae2 mentioned this issue Feb 19, 2023

sql: free scanner and parser allocations after large statement #97334

Open

mgartner added this to SQL Queries Jul 24, 2023

mgartner moved this to Backlog (DO NOT ADD NEW ISSUES) in SQL Queries Jul 24, 2023

DrewKimball removed their assignment Mar 15, 2024

DrewKimball moved this from Backlog (DO NOT ADD NEW ISSUES) to New Backlog in SQL Queries Mar 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: prevent unbounded growth of `conn.writerState.buf` and `conn.msgBuilder` #80497

sql: prevent unbounded growth of `conn.writerState.buf` and `conn.msgBuilder` #80497

mgartner commented Apr 25, 2022 •

edited by cockroach-jira-scripts

Loading

mgartner commented Apr 25, 2022

mgartner commented Apr 25, 2022

yuzefovich commented Apr 25, 2022

yuzefovich commented Apr 26, 2022

michae2 commented Apr 26, 2022

rafiss commented Apr 26, 2022 •

edited

Loading

cucaroach commented Apr 26, 2022

yuzefovich commented Apr 26, 2022

mgartner commented Apr 26, 2022

mgartner commented Apr 26, 2022 •

edited

Loading

mgartner commented Apr 27, 2022

mgartner commented May 4, 2022

andreimatei commented May 4, 2022

mgartner commented May 5, 2022

mgartner commented Jun 21, 2022

DrewKimball commented Aug 24, 2022

sql: prevent unbounded growth of conn.writerState.buf and conn.msgBuilder #80497

sql: prevent unbounded growth of conn.writerState.buf and conn.msgBuilder #80497

Comments

mgartner commented Apr 25, 2022 • edited by cockroach-jira-scripts Loading

mgartner commented Apr 25, 2022

mgartner commented Apr 25, 2022

yuzefovich commented Apr 25, 2022

yuzefovich commented Apr 26, 2022

michae2 commented Apr 26, 2022

rafiss commented Apr 26, 2022 • edited Loading

cucaroach commented Apr 26, 2022

yuzefovich commented Apr 26, 2022

mgartner commented Apr 26, 2022

mgartner commented Apr 26, 2022 • edited Loading

mgartner commented Apr 27, 2022

mgartner commented May 4, 2022

andreimatei commented May 4, 2022

mgartner commented May 5, 2022

mgartner commented Jun 21, 2022

DrewKimball commented Aug 24, 2022

sql: prevent unbounded growth of `conn.writerState.buf` and `conn.msgBuilder` #80497

sql: prevent unbounded growth of `conn.writerState.buf` and `conn.msgBuilder` #80497

mgartner commented Apr 25, 2022 •

edited by cockroach-jira-scripts

Loading

rafiss commented Apr 26, 2022 •

edited

Loading

mgartner commented Apr 26, 2022 •

edited

Loading