Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't copy utilityStmt into the incorrect memory context #2128

Merged
merged 1 commit into from
May 1, 2018
Merged

Conversation

lithp
Copy link
Contributor

@lithp lithp commented Apr 25, 2018

While working on #2119 I discovered #2127, plpgsql functions which call COPY fail on the second invocation. There's a full explanation in #2127, but the important part is that we copyObject into a memory context which gets thrown away. This PR removes the copy.

I don't think this is the right solution. I wonder if you have any feedback which could point me to the correct solution. Some ideas I've come up with:

  1. Also remove the other copy which was introduced with this one. It's not a bad thing that we scribble on the parsetree we're given.
  2. Somehow figure out which memory context pstmt is in, and copyObject into that context.
  3. Detect when we're inside SPI, or otherwise working with cached plans, and copy into CacheMemoryContext.

(2) seems like the best solution, but I'm not sure it's possible.

@codecov
Copy link

codecov bot commented Apr 25, 2018

Codecov Report

Merging #2128 into master will increase coverage by 0.06%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2128      +/-   ##
==========================================
+ Coverage   93.72%   93.79%   +0.06%     
==========================================
  Files         100      100              
  Lines       25895    25907      +12     
==========================================
+ Hits        24271    24300      +29     
+ Misses       1624     1607      -17

@marcocitus
Copy link
Member

You could use GetMemoryChunkContext to implement (2). It seems like we already do that in at least one case in ProcessCopyStmt. We would need to be super careful that we don't leak memory into CacheMemoryContext and if we need to free memory then we need to be super careful that we don't cause crashes.

I guess we do various kinds of trickery with COPY, which is not very compatible with the way PG10+ handles utility commands in PL/pgSQL. For example, we generate a whole different parse tree when you run COPY distributed_table TO STDOUT because we replace distributed_table by (SELECT * FROM distributed_table) to make it work. It seems like this "new" parse tree would needs to be allocated in the same memory context as the original parse tree. I'm not sure whether that means we should clean up the old parse tree.

We should maybe also reconsider some of the ways in which we (ab)use the COPY utility hook. For example, it seems we have some special logic in the utility hook around COPY table FROM "pg_job_cache/..." which scribbles on the parse tree to allow a non-superuser to do it, but then I wonder why that goes through the utility hook at all and doesn't just call the copy functions directly (which also seems safer).

@mtuncer
Copy link
Member

mtuncer commented Apr 26, 2018

@lithp could you update the title

@lithp lithp changed the title Fix 2127 COPY inside plpgsql func fails after first invocation Apr 26, 2018
@lithp lithp changed the title COPY inside plpgsql func fails after first invocation Don't copy utilityStmt into the incorrect memory context Apr 26, 2018
@lithp
Copy link
Contributor Author

lithp commented Apr 26, 2018

Nice catch with GetMemoryChunkContext! I can't believe that every single pointer we use is preceded by a pointer to the memory context it's inside of.

You could use GetMemoryChunkContext to implement (2). It seems like we already do that in at least one case in ProcessCopyStmt. We would need to be super careful that we don't leak memory into CacheMemoryContext and if we need to free memory then we need to be super careful that we don't cause crashes.

I think we're safe to copyObject without free()ing anything. We're not really inside CacheMemoryContext, just a child of it. Every plan has it's own CachedPlan context which is a child of CacheMemoryContext. That entire context is thrown away when the cached plan is no longer needed. (I added some logging and confirmed that pstmt really is inside a CachedPlan context)

I guess we do various kinds of trickery with COPY, which is not very compatible with the way PG10+ handles utility commands in PL/pgSQL. For example, we generate a whole different parse tree when you run COPY distributed_table TO STDOUT because we replace distributed_table by (SELECT * FROM distributed_table) to make it work. It seems like this "new" parse tree would needs to be allocated in the same memory context as the original parse tree. I'm not sure whether that means we should clean up the old parse tree.

I've pushed another commit which implements (2). For now I'm copying into the original memory context and leaving the old parse tree alone.

The only other danger I can think of is that some other part of postgres holds onto a reference to the old utilityStmt and uses it and now we have a split-brain of sorts. There are a lot of places in postgres which touch a utilityStmt but I skimmed through most of them and didn't find anything unduly worrying (We should be extra careful when changing the type of utilityStmt, as that might change the resultDesc or bypass important checks such as those which disallow transaction commands from inside SPI). That's also a problem which would have existed before this PR, this isn't making things any worse.


parsetree = copyObject(parsetree);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty hesitant to say we should perform all of ProcessCopyStmt under the parsetree's planner context, it seems like a good way to get a memory leak (I don't mean a long-lived one, I just mean running the same function over and over in the same connection could keep adding things to the cached plan's context, right?)

I have two alternatives:

  • Copy the object into your own context, pass it to ProcessCopyStmt, then copy the return value into the plan context. This way we're only copying what we end up needing for the cached plan (i.e. the parsetree)
  • Keep the code as-is, except switch back to the previous context before calling ProcessCopyStmt. To do this, you'll need to go into ProcessCopyStmt and find anywhere you write to the parsetree. So far as I can see, this is its relation->schemaname (already handled on line 826 in the past, apparently by me), the filename (on line 913), and the entirety of lines 845 to 870 (which build an entirely new copy statement and select query, which would need to be in the right context)

Performing two copies seems silly but I suppose it's the easier solution?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeh, this definitely leaks a bunch of memory into CachedPlan, which has session lifetime.


parsetree = copyObject(parsetree);
parsetree = ProcessCopyStmt((CopyStmt *) parsetree, completionTag,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternately, you could refactor all of ProcessCopyStmt to ensure it never mutates the parsetree it is handed. Instead, you'd declare variables like newSchemaName or newFileName or newQuery and use them to build a new CopyStatement at the very bottom of the method (in the right memory context), and return that. This is probably both cleanest and most performant in the long run, but I wouldn't mind if you go with the double-copy + a TODO to get unblocked.

@jasonmp85
Copy link
Contributor

OK, so I don't really like the whole ProcessCopyStmt being performed under the planned statement's memory context: it just seems sort of risky with many calls in a long-lived session or transaction.

I think you can copyObject (in the current context) before calling ProcessCopyStmt (with that copy), and then do a second copy on its return value in the planned statement's context. This seems most safe and correct, at the minor cost of an extra copy…

If you want to go with another approach, see the two alternatives I suggest (getting rid of mutations to the input parsetree altogether and building a new one at the end of our Process functions, or just meticulously auditing where we assign allocated memory to parsetree fields and appropriately gating those with calls to memory context switch).

I'll leave it up to you as to which you choose; I think the double-copy is fine to unblock you, though. Long-term we should refactor the Process functions to prevent any modifications to the parsetree and to instead create a new parsetree in the right memory context if they need to change things.

Copy link
Contributor

@jasonmp85 jasonmp85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave a couple of impressions on this approach and some alternatives. Pick one (and maybe @marcocitus can give this a second look) and :shipit:.


parsetree = copyObject(parsetree);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeh, this definitely leaks a bunch of memory into CachedPlan, which has session lifetime.

@lithp
Copy link
Contributor Author

lithp commented Apr 28, 2018

I think you can copyObject (in the current context) before calling ProcessCopyStmt (with that copy), and then do a second copy on its return value in the planned statement's context. This seems most safe and correct, at the minor cost of an extra copy…

Since this is blocking the code-freeze I decided to go with the quicker option. It does leak an extra copy of utilityStmt into the CachedPlan context each time it's called, but I think that's not a large cost, and it's safer than trying to free() the previous statement.

This also has the advantage of being fool-proof, future coders of ProcessCopyStmt can't accidentally leak more memory into the cache context.

Also, I'm pretty sure that the first copy is unnecessary, I don't think it matters whether we scribble on it, but just to be safe I left the first copy in, utility statements are usually (always?) small so this shouldn't add a noticeable overhead.

@lithp lithp force-pushed the fix-2127 branch 2 times, most recently from cfce59b to 274a434 Compare April 28, 2018 05:05
-----------------------

(1 row)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to also add a test with some Citus-specific COPY logic that scribbles on the parse tree (e.g. COPY distributed_table TO ...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added it but it can't really show correctness, just that we don't crash.

COPY […] TO STDOUT isn't permitted in PL/pgSQL, and writing to a file isn't something we do in tests, so that leaves sending it to a program, which works but doesn't really print any output.

I considered writing out to some sort of csv file in the results directory and then adding a correct csv file in the expected directory (they'd probably have to end in out) and adding a blank test with that name to trick pg_regress/diff into comparing the output and expectation of the COPY TO for us, but that seemed way too hacky.

So I've just added a test that verifies we don't crash.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I've actually removed the test because I'm seeing weird behavior in Travis with COPY TO PROGRAM (namely a broken pipe). It no longer crashes, at any rate.

parsetree = ProcessCopyStmt((CopyStmt *) parsetree, completionTag,
&commandMustRunAsOwner);

previousContext = MemoryContextSwitchTo(planContext);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This warrants a comment and maybe even point out the memory leak.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both done.


parsetree = copyObject(parsetree);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use an intermediate variable here? It gets a bit confusing that we're dealing with 3 different values of parsetree.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@jasonmp85
Copy link
Contributor

I'll give this a quick look after the team meeting and we can get it merged.

@jasonmp85 jasonmp85 self-assigned this Apr 30, 2018
@jasonmp85
Copy link
Contributor

@marcocitus I talked to @lithp and I'm gonna clean this up according to your latest feedback and merge.

Can't we add a pfree and get rid of the leak? Or am I mistaken?

@marcocitus
Copy link
Member

Can't we add a pfree and get rid of the leak? Or am I mistaken?

Assuming we'd recursively free the tree, then that may be what we should be doing, but we'd have to make very sure that the original parse tree won't get used after the call to the process utility hook.

@marcocitus
Copy link
Member

I talked to @lithp and I'm gonna clean this up according to your latest feedback and merge.

Fine with me.

@jasonmp85
Copy link
Contributor

Assuming we'd recursively free the tree, then that may be what we should be doing, but we'd have to make very sure that the original parse tree won't get used after the call to the process utility hook.

I wrongly assumed there'd be some sort of recursive free in PostgreSQL, akin to copy and equality functions. I suppose it's a testament to the memory functions of PostgreSQL that I've never really needed it.

@jasonmp85 jasonmp85 force-pushed the fix-2127 branch 3 times, most recently from 934fe18 to 37c4828 Compare May 1, 2018 06:47
utilityStmt sometimes (such as when it's inside of a plpgsql function)
comes from a cached plan, which is kept in a child of the
CacheMemoryContext. When we naively call copyObject we're copying it into
a statement-local context, which corrupts the cached plan when it's
thrown away.
@lithp lithp merged commit f8fb7a2 into master May 1, 2018
@marcocitus marcocitus deleted the fix-2127 branch May 4, 2018 21:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants