Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: prevent unbounded growth of conn.writerState.buf and conn.msgBuilder #80497

Open
mgartner opened this issue Apr 25, 2022 · 16 comments
Open
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team

Comments

@mgartner
Copy link
Collaborator

mgartner commented Apr 25, 2022

A wide row returned to the client can cause the following buffers in pgwire to grow to a large size. The backing array is not GC'd until the session is terminated. This can cause memory to grow indefinitely as more and more connections grow these buffers.

Some extra information from @yuzefovich:

I just wanted to add when these buffers are reset for reuse:

  • conn.msgBuilder which wraps bytes.Buffer is reset in initMsg and finishMsg, and each message contains at most one row
  • conn.writerState.buf is reset in conn.Flush which is called by conn.maybeFlush when more than 16KiB worth of data has been buffered, at least one full row. Unless results_buffer_size session variable has been changed.

Jira issue: CRDB-15591

@mgartner mgartner added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team labels Apr 25, 2022
@mgartner
Copy link
Collaborator Author

I'm not sure exactly how this would work. What pool of memory would these buffers consume? What would happen as they consumed more and more memory? Would the query have less memory available for execution, causing memory errors? Would we prevent new connections from being created?

@mgartner
Copy link
Collaborator Author

A different approach to prevent these buffers from growing past results_buffer_size would be to flush the buffer in the middle of writing a row to the buffer. Are there any limitations in the pgwire protocol that would prevent this?

@yuzefovich
Copy link
Member

A different approach to prevent these buffers from growing past results_buffer_size would be to flush the buffer in the middle of writing a row to the buffer. Are there any limitations in the pgwire protocol that would prevent this?

Maybe @rafiss would know.

@yuzefovich
Copy link
Member

What pool of memory would these buffers consume?

I think it would consume from the root SQL memory monitor (i.e. --max-sql-memory) or from sql monitor (pgwire.Server.sqlMemoryPool, which is a child of root SQL memory monitor) - either seems appropriate.

What would happen as they consumed more and more memory?

Hm, if a memory budget error is encountered when composing a PGWire message, then the connection would probably become toast. We need to think more about this one.

Would the query have less memory available for execution, causing memory errors?

Any other consumer of the memory from our accounting system would be more likely to run into "budget exceeded" errors, queries issued by other connections including. I think this is expected and acceptable though.

Would we prevent new connections from being created?

Probably no. Imagining a scenario when a particular connection has accumulated a very large bytes.Buffer and ran into a budget exceeded error, once that connection is closed, the memory budget should open up, so other connections can proceed to be established.

If, however, we got a situation when a newly-established connection runs into the same error when it itself doesn't have a large bytes.Buffer, then we're probably in a bad spot already (for other internal queries), so the node probably should be restarted. We could also have some "minimum allowance" that doesn't get registered with the memory accounting system which would allow the new connection to proceed (e.g. so that it can be used to cancel other, memory-intensive queries), but I'm not sure whether it'd be a good idea.

@michae2
Copy link
Collaborator

michae2 commented Apr 26, 2022

@mgartner says this isn't terribly urgent.

@rafiss
Copy link
Collaborator

rafiss commented Apr 26, 2022

A different approach to prevent these buffers from growing past results_buffer_size would be to flush the buffer in the middle of writing a row to the buffer. Are there any limitations in the pgwire protocol that would prevent this?

I think it would be allowed in the protocol. We should be able to write tests for it.

@cucaroach
Copy link
Contributor

Doesn't each session have its own monitor, would that make more sense? I've seen session-root and session monitors,not sure what the diff is. Also even if we flush couldn't we have a large column value that defeated mid-row flushing?

@yuzefovich
Copy link
Member

Doesn't each session have its own monitor, would that make more sense?

Yeah, there is one. Eventually everything would still consume from the root SQL monitor (i.e. against --max-sql-memory).

@mgartner
Copy link
Collaborator Author

Any other consumer of the memory from our accounting system would be more likely to run into "budget exceeded" errors, queries issued by other connections including. I think this is expected and acceptable though.

I'd argue that this wouldn't be acceptable. There would still be a "poisoning" effect where a previous query makes future queries on the connection more likely to hit memory errors. You could imagine a situation where the buffers grow so large that there are only a few bytes left in the budget to process a query and so every query results in "budget exceeded".

I think the best options, in order, would be:

  1. Flush the buffers in the middle of a row if results_buffer_size is hit.
  2. After a query has been processed (or maybe before), check if the buffers have exceeded results_buffer_size, and if so, allocate a new slice so that the old can be GC'd.

@mgartner
Copy link
Collaborator Author

mgartner commented Apr 26, 2022

To be clear, I think memory accounting of the buffers could be useful to prevent other types of OOMs. But I think we need something more than that to prevent these buffers from growing past results_buffer_size.

@mgartner mgartner changed the title sql: add memory accounting for conn.writeState.buf and writeBuffer.wrapped sql: prevent unbound growth of conn.writeState.buf and writeBuffer.wrapped Apr 27, 2022
@mgartner mgartner changed the title sql: prevent unbound growth of conn.writeState.buf and writeBuffer.wrapped sql: prevent unbounded growth of conn.writeState.buf and writeBuffer.wrapped Apr 27, 2022
@mgartner
Copy link
Collaborator Author

I'm curious why both these buffers exist. It looks like we stage results in conn.msgBuilder (a writeBuffer), and then periodically copy the contents of conn.msgBuilder to conn.writeState.buf when writeBuffer.finishMsg() is called. The contents of conn.writeState.buf are periodically flushed to the network when conn.Flush() is called. Is there a fundamental reason we cannot simplify this to a single buffer that is periodically flushed to the network?

@mgartner mgartner changed the title sql: prevent unbounded growth of conn.writeState.buf and writeBuffer.wrapped sql: prevent unbounded growth of conn.writerState.buf and conn.writeBuffer.wrapped Apr 27, 2022
@mgartner mgartner changed the title sql: prevent unbounded growth of conn.writerState.buf and conn.writeBuffer.wrapped sql: prevent unbounded growth of conn.writerState.buf and conn.msgBuilder Apr 27, 2022
@mgartner
Copy link
Collaborator Author

mgartner commented May 4, 2022

@andreimatei it's been awhile, but your name is featured prominently in the git history of pgwire. Maybe you have an answer for my last question?

@andreimatei
Copy link
Contributor

Is writeState.buf immutable, while msgBuilder is changing? You need to not send a message until it was finished, and also each message seems to be prefixed with its length, which you only know at finishMsg() time. So I'm not sure how you do these with a single buffer / if it'd be pretty. Other than that, I don't think there's anything too deep here.

@mgartner
Copy link
Collaborator Author

mgartner commented May 5, 2022

Thanks @andreimatei!

So I'm not sure how you do these with a single buffer / if it'd be pretty.

A section of the msgBuilder buffer is reserved for the length and written over in finishMsg. Seems like you could do the same with writeState.buf as long as you were sure not to flush to the network before the message was finished. But I agree it might not be pretty.

and also each message seems to be prefixed with its length

I think this rules out my suggestion above:

  1. Flush the buffers in the middle of a row if results_buffer_size is hit.

So this is the best option I can think of at the moment:

  1. After a query has been processed, check if the buffers have exceeded results_buffer_size, and if so, allocate a new slice so that the old can be GC'd.

@jlinder jlinder added sync-me and removed sync-me labels May 20, 2022
@mgartner
Copy link
Collaborator Author

@cucaroach I've unassigned you because I believe you've had other things on your plate. If you're still making progress on this, feel free to reassign yourself, otherwise, don't worry about it for now.

DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Aug 11, 2022
There are two buffers in `pgwire.conn` that are used when sending results
to the client. They have to buffer each row in its entirety before flushing,
so wide rows can cause the capacity of these buffers to exceed the limit.

Previously, the buffers were never reallocated even when they exceeded the
size limit. This could cause a long-running session to hold on to large
amounts of memory until the session was ended. This increased the risk of
OOM, in addition to being inefficient.

This commit adds a method `maybeReallocate` to `pgwire.conn`, which
reallocates each buffer if it has reached the size limit. `maybeReallocate`
is called upon closing `Sync` and `Flush` command results. These are
natural places to reallocate the buffers, since they already flush the
buffers to the client and are called roughly at the statement granularity,
rather than for every row. This allows long-running sessions to bound their
memory usage without causing reallocation on every wide row that is sent
to the client.

Informs cockroachdb#80497

Release note (performance improvement): Long-running sql sessions are
now less likely to maintain large allocations for long periods of time,
which decreases the risk of OOM and improves memory utilization.
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Aug 11, 2022
There are two buffers in `pgwire.conn` that are used when sending results
to the client. They have to buffer each row in its entirety before flushing,
so wide rows can cause the capacity of these buffers to exceed the limit.

Previously, the buffers were never reallocated even when they exceeded the
size limit. This could cause a long-running session to hold on to large
amounts of memory until the session was ended. This increased the risk of
OOM, in addition to being inefficient.

This commit adds a method `maybeReallocate` to `pgwire.conn`, which
reallocates each buffer if it has reached the size limit. `maybeReallocate`
is called upon closing `Sync` and `Flush` command results. These are
natural places to reallocate the buffers, since they already flush the
buffers to the client and are called roughly at the statement granularity,
rather than for every row. This allows long-running sessions to bound their
memory usage without causing reallocation on every wide row that is sent
to the client.

Informs cockroachdb#80497

Release note (performance improvement): Long-running sql sessions are
now less likely to maintain large allocations for long periods of time,
which decreases the risk of OOM and improves memory utilization.
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Aug 15, 2022
There are two buffers in `pgwire.conn` that are used when sending results
to the client. They have to buffer each row in its entirety before flushing,
so wide rows can cause the capacity of these buffers to exceed the limit.

Previously, the buffers were never reallocated even when they exceeded the
size limit. This could cause a long-running session to hold on to large
amounts of memory until the session was ended. This increased the risk of
OOM, in addition to being inefficient.

This commit adds a method `maybeReallocate` to `pgwire.conn`, which
reallocates each buffer if it has reached the size limit. `maybeReallocate`
is called upon closing `Sync` and `Flush` command results. These are
natural places to reallocate the buffers, since they already flush the
buffers to the client and are called roughly at the statement granularity,
rather than for every row. This allows long-running sessions to bound their
memory usage without causing reallocation on every wide row that is sent
to the client.

Informs cockroachdb#80497

Release note (performance improvement): Long-running sql sessions are
now less likely to maintain large allocations for long periods of time,
which decreases the risk of OOM and improves memory utilization.
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Aug 16, 2022
There are two buffers in `pgwire.conn` that are used when sending results
to the client. They have to buffer each row in its entirety before flushing,
so wide rows can cause the capacity of these buffers to exceed the limit.

Previously, the buffers were never reallocated even when they exceeded the
size limit. This could cause a long-running session to hold on to large
amounts of memory until the session was ended. This increased the risk of
OOM, in addition to being inefficient.

This commit adds a method `maybeReallocate` to `pgwire.conn`, which
reallocates each buffer if it has reached the size limit. `maybeReallocate`
is called upon closing `Sync` and `Flush` command results. These are
natural places to reallocate the buffers, since they already flush the
buffers to the client and are called roughly at the statement granularity,
rather than for every row. This allows long-running sessions to bound their
memory usage without causing reallocation on every wide row that is sent
to the client.

Informs cockroachdb#80497

Release note (performance improvement): Long-running sql sessions are
now less likely to maintain large allocations for long periods of time,
which decreases the risk of OOM and improves memory utilization.

Release justification: bug fix to reduce likelihood of OOM
craig bot pushed a commit that referenced this issue Aug 18, 2022
85949: sql: reset conn buffers after each query r=DrewKimball a=DrewKimball

**sql: reset conn buffers after each query**

There are two buffers in `pgwire.conn` that are used when sending results
to the client. They have to buffer each row in its entirety before flushing,
so wide rows can cause the capacity of these buffers to exceed the limit.

Previously, the buffers were never reallocated even when they exceeded the
size limit. This could cause a long-running session to hold on to large
amounts of memory until the session was ended. This increased the risk of
OOM, in addition to being inefficient.

This commit adds a method `maybeReallocate` to `pgwire.conn`, which
reallocates each buffer if it has reached the size limit. `maybeReallocate`
is called upon closing `Sync` and `Flush` command results. These are
natural places to reallocate the buffers, since they already flush the
buffers to the client and are called roughly at the statement granularity,
rather than for every row. This allows long-running sessions to bound their
memory usage without causing reallocation on every wide row that is sent
to the client.

Informs #80497

Release note (performance improvement): Long-running sql sessions are
now less likely to maintain large allocations for long periods of time,
which decreases the risk of OOM and improves memory utilization.

Release justification: bug fix to reduce likelihood of OOM

**sql: remove unnecessary maybeFlush call**

This commit fixes an oversight of #83870 that added a redundant call to
`maybeAppend` in `limitedCommandResult.AddRow`.

Release note: None

Release justification: low-risk fix for oversight in previous bug fix

Co-authored-by: DrewKimball <[email protected]>
@DrewKimball
Copy link
Collaborator

I'm leaving this open and moving to the backlog, since it sounds like with some more significant changes we could ensure the buffers never exceed the limit (e.g. we could flush in the middle of a row if we know the length before buffering).

blathers-crl bot pushed a commit that referenced this issue Aug 24, 2022
There are two buffers in `pgwire.conn` that are used when sending results
to the client. They have to buffer each row in its entirety before flushing,
so wide rows can cause the capacity of these buffers to exceed the limit.

Previously, the buffers were never reallocated even when they exceeded the
size limit. This could cause a long-running session to hold on to large
amounts of memory until the session was ended. This increased the risk of
OOM, in addition to being inefficient.

This commit adds a method `maybeReallocate` to `pgwire.conn`, which
reallocates each buffer if it has reached the size limit. `maybeReallocate`
is called upon closing `Sync` and `Flush` command results. These are
natural places to reallocate the buffers, since they already flush the
buffers to the client and are called roughly at the statement granularity,
rather than for every row. This allows long-running sessions to bound their
memory usage without causing reallocation on every wide row that is sent
to the client.

Informs #80497

Release note (performance improvement): Long-running sql sessions are
now less likely to maintain large allocations for long periods of time,
which decreases the risk of OOM and improves memory utilization.

Release justification: bug fix to reduce likelihood of OOM
blathers-crl bot pushed a commit that referenced this issue Aug 24, 2022
There are two buffers in `pgwire.conn` that are used when sending results
to the client. They have to buffer each row in its entirety before flushing,
so wide rows can cause the capacity of these buffers to exceed the limit.

Previously, the buffers were never reallocated even when they exceeded the
size limit. This could cause a long-running session to hold on to large
amounts of memory until the session was ended. This increased the risk of
OOM, in addition to being inefficient.

This commit adds a method `maybeReallocate` to `pgwire.conn`, which
reallocates each buffer if it has reached the size limit. `maybeReallocate`
is called upon closing `Sync` and `Flush` command results. These are
natural places to reallocate the buffers, since they already flush the
buffers to the client and are called roughly at the statement granularity,
rather than for every row. This allows long-running sessions to bound their
memory usage without causing reallocation on every wide row that is sent
to the client.

Informs #80497

Release note (performance improvement): Long-running sql sessions are
now less likely to maintain large allocations for long periods of time,
which decreases the risk of OOM and improves memory utilization.

Release justification: bug fix to reduce likelihood of OOM
@mgartner mgartner moved this to Backlog (DO NOT ADD NEW ISSUES) in SQL Queries Jul 24, 2023
@DrewKimball DrewKimball removed their assignment Mar 15, 2024
@DrewKimball DrewKimball moved this from Backlog (DO NOT ADD NEW ISSUES) to New Backlog in SQL Queries Mar 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team
Projects
Status: Backlog
Development

No branches or pull requests

8 participants