-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
executor: add concurrency limit on the union executor #16815
Conversation
If we do not limit the concurrency on the union executor, 'select * from t limit 1000' could make TiDB OOM on a large partition table.
PTAL @XuHuaiyu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -1461,19 +1461,66 @@ func (e *UnionExec) Open(ctx context.Context) error { | |||
return nil | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tiancaiamao maybe also need to make child.Open
under flow-control...
some executors will buildCopRequest(make region cache or pd became busy) in "Open phase" 😭
github.com/pingcap/tidb/store/tikv.splitRanges(0xc003586178, 0xc00024bf80, 0xc0bb230bd0, 0xc003585f78, 0x100000000000000, 0xc4a1d6)
/home/jenkins/agent/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/store/tikv/coprocessor.go:295 +0x592
github.com/pingcap/tidb/store/tikv.buildCopTasks(0xc003586178, 0xc00024bf80, 0xc0bb230bd0, 0x23b0000, 0x1e70e80, 0xc038b15840, 0x240b760, 0xc0bb230ba0, 0x208
0)
/home/jenkins/agent/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/store/tikv/coprocessor.go:275 +0x114
github.com/pingcap/tidb/store/tikv.(*CopClient).Send(0xc0567eb1a0, 0x240b760, 0xc12e277cb0, 0xc0bba52a80, 0xc02d5a0b00, 0x10, 0x10)
/home/jenkins/agent/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/store/tikv/coprocessor.go:91 +0x205
github.com/pingcap/tidb/distsql.Select(0x240b760, 0xc12e277cb0, 0x2443e00, 0xc03eb2aa50, 0xc0bba52a80, 0xc01ba72100, 0x1f, 0x1f, 0xc01ba0f680, 0xc0035862f0, ...)
/home/jenkins/agent/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/distsql/distsql.go:44 +0x151
github.com/pingcap/tidb/distsql.SelectWithRuntimeStats(0x240b760, 0xc12e277cb0, 0x2443e00, 0xc03eb2aa50, 0xc0bba52a80, 0xc01ba72100, 0x1f, 0x1f, 0xc01ba0f680, 0xc05246ae00, ...)
/home/jenkins/agent/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/distsql/distsql.go:89 +0x97
github.com/pingcap/tidb/executor.selectResultHook.SelectResult(0x0, 0x240b760, 0xc12e277cb0, 0x2443e00, 0xc03eb2aa50, 0xc0bba52a80, 0xc01ba72100, 0x1f, 0x1f, 0xc01ba0f680, ...)
/home/jenkins/agent/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/executor/table_reader.go:48 +0x1c4
github.com/pingcap/tidb/executor.(*TableReaderExecutor).buildResp(0xc0283e6480, 0x240b760, 0xc12e277cb0, 0xc03d9ef018, 0x1, 0x1, 0x1, 0x0, 0x0, 0x0)
/home/jenkins/agent/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/executor/table_reader.go:187 +0x35d
github.com/pingcap/tidb/executor.(*TableReaderExecutor).Open(0xc0283e6480, 0x240b760, 0xc12e277cb0, 0x1f, 0x1f)
/home/jenkins/agent/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/executor/table_reader.go:120 +0x3cd
github.com/pingcap/tidb/executor.(*baseExecutor).Open(0xc01ba703c0, 0x240b760, 0xc12e277cb0, 0x0, 0x0)
/home/jenkins/agent/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/executor/executor.go:99 +0x7a
github.com/pingcap/tidb/executor.(*LimitExec).Open(0xc01ba703c0, 0x240b760, 0xc12e277cb0, 0x0, 0x0)
close since #19827 |
No release note, Please follow https://github.com/pingcap/community/blob/master/contributors/release-note-checker.md |
What problem does this PR solve?
If we do not limit the concurrency of the union executor,
'select * from t limit 1000' could make TiDB OOM on a large partition table.
Problem Summary:
TiDB v3.0.3, on a large partition table
The union calls
Close
but the underlying childen node still running:For each partition, the union executor use a new goroutine.
The table reader has load too much data before the limit could take effect.
What is changed and how it works?
What's Changed:
How it Works:
Set a concurrency limiter for the union executor.
Do not spawn all goroutine immediately.
Related changes
Check List
Tests
The concurrency case is not easy to test, in this commit there is a simple test that just cover the new code.
Side effects
Maybe, because of the introduce of the concurrency limit.
Release note