
planner: cleanup prepare cache when client send deallocate #8332

Merged
merged 7 commits into pingcap:master on Nov 20, 2018

Conversation

@lysu (Contributor) commented Nov 15, 2018

What problem does this PR solve?

Ref #8330: TiDB's current prepared-statement deallocate handler doesn't release plan-cache memory; this PR fixes it.

What is changed and how it works?

Delete the cached plan when the client deallocates a prepared statement (see the sketch after this list):

  • in handleStmt for the binary protocol
  • in deallocExec for the text protocol "deallocate prepare stmtX"
  • add some test cases
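
A minimal, standalone sketch of the idea (illustrative Go only; the planCacheKey struct and map-based sessionPlanCache below are simplified stand-ins, not TiDB's actual types): the per-connection plan cache is keyed by (pstmtID, schemaVersion), and deallocating a statement drops the cached plan so it no longer pins memory.

```go
package main

import "fmt"

// planCacheKey mirrors the (pstmtID, schemaVersion) key used in this PR,
// but this struct and the map-based cache are illustrative, not TiDB code.
type planCacheKey struct {
	pstmtID       uint32
	schemaVersion int64
}

// sessionPlanCache is a toy per-connection plan cache: entries stay here
// until the statement is deallocated (or evicted by a real LRU).
type sessionPlanCache map[planCacheKey]string

// deallocate mimics the cleanup this PR adds: when the client sends a
// deallocate (binary stmtClose or textual DEALLOCATE PREPARE), drop the
// cached plan so it no longer pins memory.
func (c sessionPlanCache) deallocate(pstmtID uint32, schemaVersion int64) {
	delete(c, planCacheKey{pstmtID: pstmtID, schemaVersion: schemaVersion})
}

func main() {
	cache := sessionPlanCache{}
	cache[planCacheKey{pstmtID: 1, schemaVersion: 100}] = "cached plan for stmt 1"
	cache.deallocate(1, 100)
	fmt.Println(len(cache)) // 0: the plan no longer holds memory
}
```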

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)

Using the code from the issue, OOM no longer occurs.

Code changes

  • Has exported function/method change
  • Has interface methods change

Side effects

  • Increased code complexity

Related changes

  • Need to cherry-pick to the release branch

Remaining question

The current implementation simply uses the current sessionVars; if sql_mode or schema_version changed before the deallocate, we cannot free the previously cached item.



@lysu added the type/bugfix (This PR fixes a bug.) and sig/planner (SIG: Planner) labels on Nov 15, 2018
@lysu (Contributor, Author) commented Nov 15, 2018

/run-all-tests

@lysu (Contributor, Author) commented Nov 15, 2018

PTAL @dbjoa, @eurekaka if you're free, thanks~

@zz-jason (Member) left a comment:

LGTM

@dbjoa (Contributor) commented Nov 15, 2018

@lysu Do we need to delete these obsolete plans from the plan cache? By virtue of LRU, the plans will be destroyed eventually.

planner/core/cache.go (outdated review thread, resolved)
session/session.go (outdated review thread, resolved)
func (key *pstmtPlanCacheKey) SetPstmtIDSchemaVersion(pstmtID uint32, schemaVersion int64) {
	key.pstmtID = pstmtID
	key.schemaVersion = schemaVersion
	key.hash = nil
A Member commented:

It's better to set key.hash = key.hash[:0]. In https://github.com/pingcap/tidb/pull/8332/files#diff-76a70a17b419c3333a3ff060d8f7c330R73:

if len(key.hash) == 0 {
	// calculate hash value
}

A Contributor replied:

key.hash = key.hash[:0] will not release the memory, which is a kind of leak, while key.hash = nil does.

@lysu (Author) replied:

@tiancaiamao I reuse this key because it can be reused when there is more than one key (we could also allocate a new one every time); if we keep the reuse approach, [:0] is more suitable here.
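
For context, a standalone illustration of the trade-off discussed in this thread (illustrative Go, not the PR's actual code; the cacheKey struct here is a simplified stand-in): Hash() memoizes its result and only recomputes when the cached bytes are empty, and resetting with [:0] keeps the backing array for reuse while assigning nil releases it to the GC.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// cacheKey memoizes its hash bytes; Hash() only recomputes after a reset.
// The struct is a simplified stand-in for the key type under review.
type cacheKey struct {
	pstmtID       uint32
	schemaVersion int64
	hash          []byte
}

func (k *cacheKey) Hash() []byte {
	if len(k.hash) == 0 { // recompute after the key was reset
		var buf [12]byte
		binary.BigEndian.PutUint32(buf[:4], k.pstmtID)
		binary.BigEndian.PutUint64(buf[4:], uint64(k.schemaVersion))
		k.hash = append(k.hash, buf[:]...)
	}
	return k.hash
}

func main() {
	k := &cacheKey{pstmtID: 1, schemaVersion: 100}
	fmt.Printf("hash: %x, cap: %d\n", k.Hash(), cap(k.hash))

	// Reset strategy 1: keep the backing array so the next Hash() call
	// can append into it without a fresh allocation.
	k.hash = k.hash[:0]
	fmt.Println("after [:0] -> len:", len(k.hash), "cap:", cap(k.hash)) // capacity preserved

	// Reset strategy 2: drop the reference so the old array becomes
	// garbage; the next Hash() call must allocate again.
	k.hash = nil
	fmt.Println("after nil  -> len:", len(k.hash), "cap:", cap(k.hash)) // capacity 0
}
```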

if planCacheEnabled {
	if i > 0 {
		cacheKey.(plannercore.PstmtCacheKeyMutator).SetPstmtIDSchemaVersion(
			stmtID, s.sessionVars.PreparedStmts[stmtID].SchemaVersion,
A Member commented:

Do we only need to reset stmtID?

@lysu (Author) replied on Nov 19, 2018:

In the normal situation the session environment is the same as at the last execution, and here we can only get the current session vars, so we CAN only reset stmtID here. It works well most of the time, although it does nothing in some corner cases (e.g. prepare, execute, then change sql_mode, then deallocate).

@zz-jason (Member) commented:

Agreed with @dbjoa. #8330 described a bad usage of the plan cache; the cache capacity should not be set too large.

@lysu (Contributor, Author) commented Nov 15, 2018

Hi @dbjoa, I think it's better to do that, because the same stmtID for this connection will never be used again; we should remove these plans just like we remove SessionVars.PreparedStmts. A plan tree is heavier than a StmtNode AST tree (many function nodes, type nodes... many small objects, a lot of GC pressure).

IMHO, LRU capacity should be a protection mechanism that protects maximum memory usage rather than a way to free memory; LRU only ensures the cached items never overflow the capacity, but it cannot make sure an item is ever destroyed.

In a real production environment it is hard to set the right minimum capacity value, so some memory will still be held. The current plan cache is at the connection level, so for an application with a big connection pool, or deployed across 100+ instances, this adds up to a lot of memory in total, even if only a few items cannot be freed per connection.

I think it is better to release resources when they are no longer needed; in this case the delete isn't expensive, and freeing the memory is done asynchronously by the GC.

@dbjoa (Contributor) commented Nov 15, 2018

In order to fix #8330, we can change util.kvcache to use the byte-size as its capacity instead of the number of elements.
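
A rough, standalone sketch of that idea (illustrative Go, not the actual util/kvcache code): the LRU tracks a byte budget and evicts least-recently-used entries until a new entry fits. The per-entry size here is supplied by the caller, e.g. estimated once at Put time, which is exactly the bookkeeping question raised below.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry records a cached key together with its estimated byte size.
type entry struct {
	key  string
	size int64
}

// sizedLRU is an illustrative LRU bounded by total byte size rather than
// by element count; it is a sketch, not the real util/kvcache implementation.
type sizedLRU struct {
	capacity int64 // byte budget
	used     int64
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element holding *entry
}

func newSizedLRU(capacity int64) *sizedLRU {
	return &sizedLRU{capacity: capacity, order: list.New(), items: map[string]*list.Element{}}
}

// Put stores a key with a caller-provided size estimate, evicting the
// least recently used entries until the new entry fits the byte budget.
func (c *sizedLRU) Put(key string, size int64) {
	if el, ok := c.items[key]; ok {
		c.used -= el.Value.(*entry).size
		c.order.Remove(el)
		delete(c.items, key)
	}
	for c.used+size > c.capacity && c.order.Len() > 0 {
		oldest := c.order.Back()
		c.used -= oldest.Value.(*entry).size
		delete(c.items, oldest.Value.(*entry).key)
		c.order.Remove(oldest)
	}
	c.items[key] = c.order.PushFront(&entry{key: key, size: size})
	c.used += size
}

func main() {
	c := newSizedLRU(100)
	c.Put("plan-a", 60)
	c.Put("plan-b", 60)               // evicts plan-a: 60+60 > 100 bytes
	fmt.Println(len(c.items), c.used) // 1 60
}
```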

@eurekaka (Contributor) commented:

> LRU capacity should be a protection mechanism that protects maximum memory usage rather than a way to free memory

+1
If we want to limit the memory of kvcache, we have to do bookkeeping for the memory usage of each cached plan? That doesn't seem like an easy job.

@lysu (Contributor, Author) commented Nov 15, 2018

@dbjoa, I agree byte size is better than element count and can solve the OOM, but it still keeps n bytes of unused data per connection and wastes resources.

Again, I think LRU capacity should be a protection mechanism

@zz-jason (Member) commented:

@dbjoa It's hard to calculate the memory consumed by a cached plan. The present plan cache is at the session level; once there are hundreds of connections, TiDB can still OOM even with a low cache capacity.

> In a real production environment it is hard to set the right minimum capacity value, so some memory will still be held. The current plan cache is at the connection level, so for an application with a big connection pool, or deployed across 100+ instances, this adds up to a lot of memory in total, even if only a few items cannot be freed per connection.

On this point, I agree with @lysu.

I think maybe we can use a global plan cache to fix #8330.

@eurekaka (Contributor) commented:

Would a global plan cache introduce contention? IIRC, Oracle used to have this problem.
https://yq.aliyun.com/articles/55698?spm=a2c4e.11153940.blogcont55719.4.41dc4c02Qbdunp
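
To illustrate where the contention would come from, a standalone sketch (illustrative Go, assuming the simplest possible design of a single mutex-guarded map, which is not necessarily how a real global plan cache would be built): every connection's lookup serializes on the same lock.

```go
package main

import (
	"fmt"
	"sync"
)

// globalPlanCache is an illustrative shared cache: every session goes
// through the same mutex, so hot prepare/execute workloads contend on it.
type globalPlanCache struct {
	mu    sync.Mutex
	plans map[string]string
}

func (c *globalPlanCache) Get(key string) (string, bool) {
	c.mu.Lock() // every connection serializes here
	defer c.mu.Unlock()
	p, ok := c.plans[key]
	return p, ok
}

func main() {
	c := &globalPlanCache{plans: map[string]string{"q1": "plan"}}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ { // 8 "connections" hammering the same lock
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				c.Get("q1")
			}
		}()
	}
	wg.Wait()
	fmt.Println("all lookups done")
}
```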

@zz-jason (Member) commented Nov 15, 2018

This PR cannot fix this situation: the capacity of the session-level plan cache is reasonable, but a lot of connections execute prepare, execute, execute, ... operations without any deallocate operation.

In that situation, the unused cache entries cannot be cleaned by the method introduced in this PR.

@dbjoa (Contributor) commented Nov 15, 2018

  • I have concerns about the performance of a global plan cache because it needs a lock to prevent race conditions. The lock management cost might not be cheap for a CPU-intensive workload like compiling query statements.
  • We can compute the size of an object once, when the object is put. In order to know the object size, we can use memory.Sizeof.

@lysu (Contributor, Author) commented Nov 15, 2018

@zz-jason yes, that case can be handled when the user forgets to close statements by following max_prepared_stmt_count https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_max_prepared_stmt_count,
and by sending the prepared-but-not-deallocated count to Grafana; it seems this work is on @tiancaiamao's roadmap :D

For long-lived connections that never close their statements, maybe we need a TTL? But I think leaving them open is the user's intention, so maybe we'd better not close them as long as the total number doesn't exceed the capacity.

executor/prepared_test.go (outdated review thread, resolved)
}

// SetPstmtIDSchemaVersion implements the PstmtCacheKeyMutator interface to change the pstmtID and schemaVersion of a cache key,
// so we can reuse the Key instead of allocating a new one every time.
A Contributor commented:

Why do we need to reuse the Key?

for _, stmtID := range retryInfo.DroppedPreparedStmtIDs {
	delete(s.sessionVars.PreparedStmts, stmtID)
if len(retryInfo.DroppedPreparedStmtIDs) > 0 {
	planCacheEnabled := plannercore.PreparedPlanCacheEnabled()
A Contributor commented:

Will the deallocate prepare stmt1 be in the retry history?

@lysu (Author) replied on Nov 19, 2018:

I did some digging into it.

The result is that deallocate prepare stmt1 will be added into the history, while the protocol-level stmtClose isn't.

Just as #2473 (comment) said, we only add exec into the history, and retry reuses exec, so that PR made stmtClose delay the statement's destruction until the retry finished.

But it seems deallocate prepare stmt1 was forgotten; it looks like there is a bug in deallocate prepare stmt1 with retry in the master code.

@lysu (Author) replied:

@tiancaiamao I retried:

  • for prepare stmt1 from 'xx' and deallocate prepare stmt1: both are added to the history, and every retry will re-prepare and close again, so no lazy cleanup is needed; it's also unusual for people to use prepare via the text protocol, so it should be ok.
  • but for the binary protocol, stmtPrepare and stmtClose will NOT be added to the history, so the lazy cleanup here is needed.

Summary: it seems there is no problem.

A Contributor replied:

Retrying prepare stmt1 from 'xx' will prepare twice, and retrying deallocate prepare stmt1 will deallocate twice?
What will happen in the plan cache?

@lysu (Author) replied:

After this PR it will be added to the plan cache twice and removed from the plan cache twice...

I suddenly realized there may be a problem when a "prepare stmt from xx" happens inside a transaction without a dealloc: retry will create many useless statements on the server. Maybe handleStmtPrepare's way is better; we should use a unified way to handle these two entrances.

util/kvcache/simple_lru.go (outdated review thread, resolved)
@@ -70,12 +70,14 @@ type pstmtPlanCacheKey struct {

// Hash implements Key interface.
func (key *pstmtPlanCacheKey) Hash() []byte {
A Contributor commented:

By the way, how about refactoring kv.SimpleCache to use []byte as the key? (not in this PR)
I don't see any benefit in its Key definition. @lysu

@tiancaiamao (Contributor) commented:

LGTM

@tiancaiamao added the status/LGT1 (Indicates that a PR has LGTM 1.) label on Nov 19, 2018
@eurekaka (Contributor) left a comment:

LGTM

@eurekaka added the status/LGT2 (Indicates that a PR has LGTM 2.) label and removed status/LGT1 on Nov 20, 2018
@eurekaka (Contributor) commented:

/run-all-tests

@tiancaiamao merged commit 61ee0da into pingcap:master on Nov 20, 2018
lysu added a commit that referenced this pull request Nov 20, 2018