executor: add concurrency limit on the union executor #16815

Closed


tiancaiamao
Contributor

What problem does this PR solve?

If we do not limit the concurrency of the union executor,
'select * from t limit 1000' could make TiDB OOM on a large partition table.

Problem Summary:

TiDB v3.0.3, on a large partition table

select * from t limit 0,1000


The union executor calls Close, but the underlying child nodes are still running:

goroutine 5579939 [chan receive]:
github.com/pingcap/tidb/executor.(*UnionExec).Close(0xc0123ce620, 0xbc02d7, 0xc00004f900)
/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/executor/executor.go:1385 +0x9a
github.com/pingcap/tidb/executor.(*baseExecutor).Close(0xc00eee8140, 0x3, 0xc00004f970)
/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/executor/executor.go:110 +0x7e
github.com/pingcap/tidb/executor.(*LimitExec).Close(0xc00eee8140, 0xc3312a, 0xc01a655470)
/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/executor/executor.go:874 +0x41
github.com/pingcap/tidb/executor.(*recordSet).Close(0xc00c03b590, 0xc027012580, 0x0)
/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/executor/adapter.go:133 +0x38
github.com/pingcap/tidb/server.(*tidbResultSet).Close(0xc00c03b5e0, 0xc027012580, 0xc015ee4000)
/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/server/driver_tidb.go:383 +0x46
github.com/pingcap/parser/terror.Call(0xc01a655580)
/home/jenkins/workspace/release_tidb_3.0/go/pkg/mod/github.com/pingcap/parser@v0.0.0-20190806084718-1a31cabbaef2/terror/terror.go:348 +0x2b
github.com/pingcap/tidb/server.(*clientConn).writeResultset.func1(0x0, 0x228ece0, 0xc00c03b5e0, 0xc01a655738, 0x227cee0, 0xc011835b90, 0xc00a1db110)
/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/server/conn.go:1254 +0x582
github.com/pingcap/tidb/server.(*clientConn).writeResultset(0xc00a1db110, 0x227cee0, 0xc011835b90, 0x228ece0, 0xc00c03b5e0, 0xc00000fa00, 0x0, 0x0, 0x0)
/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/server/conn.go:1280 +0x14b
github.com/pingcap/tidb/server.(*clientConn).handleQuery(0xc00a1db110, 0x227cee0, 0xc011835b90, 0xc015eedec1, 0x3c, 0x0, 0x0)
/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/server/conn.go:1191 +0x212
github.com/pingcap/tidb/server.(*clientConn).dispatch(0xc00a1db110, 0x227cee0, 0xc011835b90, 0xc015eedec1, 0x3d, 0x3d, 0x0, 0x0)
/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/server/conn.go:897 +0x5c6
github.com/pingcap/tidb/server.(*clientConn).Run(0xc00a1db110, 0x227cee0, 0xc011835b90)
/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/server/conn.go:652 +0x258
github.com/pingcap/tidb/server.(*Server).onConn(0xc00b59c400, 0xc00a1db110)
/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/server/server.go:440 +0x481
created by github.com/pingcap/tidb/server.(*Server).Run
/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/server/server.go:357 +0x83d

goroutine 5582348 [chan receive]:
github.com/pingcap/tidb/distsql.(*selectResult).getSelectResp(0xc026dbf740, 0x3564890, 0x0)
/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/distsql/select_result.go:166 +0x22b
github.com/pingcap/tidb/distsql.(*selectResult).Next(0xc026dbf740, 0x227cee0, 0xc011835b90, 0xc00b529500, 0x0, 0x0)
/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/distsql/select_result.go:147 +0x82
github.com/pingcap/tidb/executor.(*tableResultHandler).nextChunk(0xc00b86d800, 0x227cee0, 0xc011835b90, 0xc00b529500, 0xc00ccd7d90, 0x1)
/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/executor/table_reader.go:224 +0x5c
github.com/pingcap/tidb/executor.(*TableReaderExecutor).Next(0xc010ea4000, 0x227cee0, 0xc011835b90, 0xc00b529500, 0x0, 0x0)
/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/executor/table_reader.go:148 +0xb5
github.com/pingcap/tidb/executor.Next(0x227cee0, 0xc011835b90, 0x2283da0, 0xc010ea4000, 0xc00b529500, 0xc002e2a9c0, 0x20)
/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/executor/executor.go:191 +0xbd
github.com/pingcap/tidb/executor.(*LimitExec).Next(0xc00b576960, 0x227cee0, 0xc011835b90, 0xc00b529500, 0x0, 0x0)
/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/executor/executor.go:843 +0x29d
github.com/pingcap/tidb/executor.Next(0x227cee0, 0xc011835b90, 0x2283620, 0xc00b576960, 0xc00b529500, 0xc0123ce620, 0xc0262a4058)
/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/executor/executor.go:191 +0xbd
github.com/pingcap/tidb/executor.(*UnionExec).resultPuller(0xc0123ce620, 0x227cee0, 0xc011835b90, 0x12)
/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/executor/executor.go:1340 +0x272
created by github.com/pingcap/tidb/executor.(*UnionExec).initialize
/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/executor/executor.go:1308 +0x131

The union executor starts a new goroutine for each partition.
The table readers have already loaded too much data before the limit can take effect.

What is changed and how it works?

What's Changed:

How it Works:

Set a concurrency limiter for the union executor.
Do not spawn all goroutines immediately.
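
As a rough illustration of the idea (not the code in this PR), the sketch below reads children through a fixed number of worker goroutines and stops handing out new children once enough rows have been collected. The names `child` and `unionLimited`, and the use of `[]int` as a stand-in for rows, are assumptions made for the example.

```go
package main

import "sync"

// child is a hypothetical stand-in for one partition's reader:
// calling it returns all rows of that partition.
type child func() []int

// unionLimited reads children with at most `concurrency` goroutines in
// flight and stops starting new children once `want` rows are collected.
// It returns the first `want` rows and the number of children started.
func unionLimited(children []child, concurrency, want int) ([]int, int) {
	if concurrency < 1 {
		concurrency = 1
	}
	var (
		mu   sync.Mutex
		rows []int
		wg   sync.WaitGroup
	)
	next := make(chan child) // unbuffered: a send waits for a free worker

	// A fixed pool of pullers instead of one goroutine per child.
	for i := 0; i < concurrency; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range next {
				got := c()
				mu.Lock()
				rows = append(rows, got...)
				mu.Unlock()
			}
		}()
	}

	started := 0
	for _, c := range children {
		mu.Lock()
		enough := len(rows) >= want
		mu.Unlock()
		if enough {
			break // the limit is satisfied; remaining children are never read
		}
		next <- c
		started++
	}
	close(next)
	wg.Wait()

	if len(rows) > want {
		rows = rows[:want]
	}
	return rows, started
}
```

With 100 children of 10 rows each, `unionLimited(children, 2, 15)` starts only a handful of children rather than all 100, which is the effect the limiter is meant to have on a `limit` query over many partitions.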

Related changes

  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test

The concurrency case is not easy to test; this commit includes a simple test that only covers the new code.

Side effects

  • Performance regression

Possibly, because of the newly introduced concurrency limit.

Release note

If we do not limit the concurrency on the union executor,
'select * from t limit 1000' could make TiDB OOM on a large partition table.
@tiancaiamao tiancaiamao requested a review from a team as a code owner April 24, 2020 17:40
@ghost ghost requested review from wshwsh12 and removed request for a team April 24, 2020 17:40
@tiancaiamao
Contributor Author

PTAL @XuHuaiyu

@github-actions github-actions bot added the sig/execution SIG execution label Apr 24, 2020
@tiancaiamao tiancaiamao added this to the v4.0.0-ga milestone Apr 26, 2020
Contributor

@wshwsh12 wshwsh12 left a comment


LGTM

@@ -1461,19 +1461,66 @@ func (e *UnionExec) Open(ctx context.Context) error {
return nil
}
Contributor

@lysu lysu Sep 1, 2020


@tiancaiamao maybe we also need to put child.Open under flow control...

Some executors call buildCopRequest in the Open phase, which can make the region cache or PD busy 😭

github.com/pingcap/tidb/store/tikv.splitRanges(0xc003586178, 0xc00024bf80, 0xc0bb230bd0, 0xc003585f78, 0x100000000000000, 0xc4a1d6)
        /home/jenkins/agent/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/store/tikv/coprocessor.go:295 +0x592
github.com/pingcap/tidb/store/tikv.buildCopTasks(0xc003586178, 0xc00024bf80, 0xc0bb230bd0, 0x23b0000, 0x1e70e80, 0xc038b15840, 0x240b760, 0xc0bb230ba0, 0x2080)
        /home/jenkins/agent/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/store/tikv/coprocessor.go:275 +0x114
github.com/pingcap/tidb/store/tikv.(*CopClient).Send(0xc0567eb1a0, 0x240b760, 0xc12e277cb0, 0xc0bba52a80, 0xc02d5a0b00, 0x10, 0x10)
        /home/jenkins/agent/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/store/tikv/coprocessor.go:91 +0x205
github.com/pingcap/tidb/distsql.Select(0x240b760, 0xc12e277cb0, 0x2443e00, 0xc03eb2aa50, 0xc0bba52a80, 0xc01ba72100, 0x1f, 0x1f, 0xc01ba0f680, 0xc0035862f0, ...)
        /home/jenkins/agent/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/distsql/distsql.go:44 +0x151
github.com/pingcap/tidb/distsql.SelectWithRuntimeStats(0x240b760, 0xc12e277cb0, 0x2443e00, 0xc03eb2aa50, 0xc0bba52a80, 0xc01ba72100, 0x1f, 0x1f, 0xc01ba0f680, 0xc05246ae00, ...)
        /home/jenkins/agent/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/distsql/distsql.go:89 +0x97
github.com/pingcap/tidb/executor.selectResultHook.SelectResult(0x0, 0x240b760, 0xc12e277cb0, 0x2443e00, 0xc03eb2aa50, 0xc0bba52a80, 0xc01ba72100, 0x1f, 0x1f, 0xc01ba0f680, ...)
        /home/jenkins/agent/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/executor/table_reader.go:48 +0x1c4
github.com/pingcap/tidb/executor.(*TableReaderExecutor).buildResp(0xc0283e6480, 0x240b760, 0xc12e277cb0, 0xc03d9ef018, 0x1, 0x1, 0x1, 0x0, 0x0, 0x0)
        /home/jenkins/agent/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/executor/table_reader.go:187 +0x35d
github.com/pingcap/tidb/executor.(*TableReaderExecutor).Open(0xc0283e6480, 0x240b760, 0xc12e277cb0, 0x1f, 0x1f)
        /home/jenkins/agent/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/executor/table_reader.go:120 +0x3cd
github.com/pingcap/tidb/executor.(*baseExecutor).Open(0xc01ba703c0, 0x240b760, 0xc12e277cb0, 0x0, 0x0)
        /home/jenkins/agent/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/executor/executor.go:99 +0x7a
github.com/pingcap/tidb/executor.(*LimitExec).Open(0xc01ba703c0, 0x240b760, 0xc12e277cb0, 0x0, 0x0)
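
One way to read this suggestion, sketched under assumptions: instead of opening every child up front, each worker opens a child right before reading it, so the number of children that are open at once is bounded by the worker count. The `executor` interface, `sliceExec`, and `drainLazily` below are hypothetical and simplified; they are not TiDB's actual types.

```go
package main

import "sync"

// executor is a hypothetical, simplified executor interface.
type executor interface {
	Open() error
	Next() (rows []int, err error) // empty rows means the child is exhausted
	Close() error
}

// sliceExec is an in-memory child used only for illustration.
type sliceExec struct{ data []int }

func (s *sliceExec) Open() error  { return nil }
func (s *sliceExec) Close() error { return nil }
func (s *sliceExec) Next() ([]int, error) {
	out := s.data
	s.data = nil
	return out, nil
}

// drainLazily opens each child inside a worker, right before reading it.
// It returns the total row count, the peak number of children that were
// open at the same time, and the first error seen.
func drainLazily(children []executor, concurrency int) (total, peak int, err error) {
	if concurrency < 1 {
		concurrency = 1
	}
	var (
		mu   sync.Mutex
		open int
		wg   sync.WaitGroup
	)
	jobs := make(chan executor)
	for i := 0; i < concurrency; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range jobs {
				mu.Lock()
				open++
				if open > peak {
					peak = open
				}
				mu.Unlock()

				n, e := 0, c.Open() // Open happens under the same limit as Next
				for e == nil {
					var rows []int
					rows, e = c.Next()
					if len(rows) == 0 {
						break
					}
					n += len(rows)
				}
				if ce := c.Close(); e == nil {
					e = ce
				}

				mu.Lock()
				open--
				total += n
				if e != nil && err == nil {
					err = e
				}
				mu.Unlock()
			}
		}()
	}
	for _, c := range children {
		jobs <- c
	}
	close(jobs)
	wg.Wait()
	return total, peak, err
}
```

In this sketch an expensive Open (such as one that builds coprocessor tasks) runs for at most `concurrency` children at a time. For brevity it keeps draining the remaining children after an error, which a real executor would likely not do.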

@AilinKid
Contributor

AilinKid commented Sep 8, 2020

close since #19827

@AilinKid AilinKid closed this Sep 8, 2020
@sre-bot
Contributor

sre-bot commented Sep 8, 2020

@tiancaiamao tiancaiamao deleted the union-concurrency-limit branch September 16, 2020 02:52