hashJoin, NLJoin: prone to OOM when a single left-table row matches a large number of right-table rows #38384

Open
wencycool opened this issue Oct 10, 2022 · 2 comments
Labels
type/enhancement The issue or PR belongs to an enhancement.

Comments

@wencycool

Enhancement

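Looking at the code linked below: during a hash join, when a single probe-table row matches a very large number of build-table rows, the matched result set is not tracked by the memory tracker and cannot be spilled to disk, which easily leads to OOM. The behavior can be reproduced by running select * from a (a few rows) cross join b (millions or tens of millions of rows) with a plan that uses hash join.
For hash join, I suspect this is related to the method func (c *hashRowContainer) GetMatchedRowsAndPtrs, where matched = append(matched, matchedRow) is never reported to the memory tracker (no consume).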

https://github.com/pingcap/tidb/blob/master/executor/hash_table.go#:~:text=matched%20%3D%20append(matched%2C%20matchedRow)
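To make the suspicion concrete, here is a minimal sketch of the pattern, not TiDB's actual code. The `tracker` type, `rowSize`, `collectUntracked`, and `collectTracked` are hypothetical names invented for illustration; in particular, `tracker` is a toy stand-in and not TiDB's memory tracker.

```go
package main

import "fmt"

// tracker is a hypothetical, simplified memory tracker (an assumption for
// this sketch; it is not TiDB's real tracker type).
type tracker struct {
	consumed int64 // bytes reported so far
	quota    int64 // budget in bytes
}

// consume records bytes and reports whether the quota is now exceeded.
func (t *tracker) consume(bytes int64) bool {
	t.consumed += bytes
	return t.consumed > t.quota
}

// row is a fixed-size stand-in for a build-side row.
type row struct{ payload [64]byte }

const rowSize = 64 // bytes per row in this sketch

// collectUntracked mirrors the pattern described in the issue: every matching
// build row is appended, and the tracker is never told about the growth.
func collectUntracked(build []row, t *tracker) []row {
	var matched []row
	for _, r := range build {
		matched = append(matched, r)
	}
	return matched
}

// collectTracked reports each row before appending it and stops as soon as
// the quota would be exceeded, so the caller can react (for example, spill).
func collectTracked(build []row, t *tracker) (matched []row, overQuota bool) {
	for _, r := range build {
		if t.consume(rowSize) {
			return matched, true
		}
		matched = append(matched, r)
	}
	return matched, false
}

func main() {
	build := make([]row, 1000) // pretend all 1000 rows match one probe row

	t1 := &tracker{quota: 4096}
	m1 := collectUntracked(build, t1)
	fmt.Println("untracked:", len(m1), "rows, tracker saw", t1.consumed, "bytes")

	t2 := &tracker{quota: 4096}
	m2, over := collectTracked(build, t2)
	fmt.Println("tracked:", len(m2), "rows, over quota:", over)
}
```

In the untracked variant the slice grows to all matching rows while the tracker still reports zero bytes, which is the situation the issue describes.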

@wencycool wencycool added the type/enhancement The issue or PR belongs to an enhancement. label Oct 10, 2022
@AilinKid
Contributor

AilinKid commented Oct 11, 2022

s/chinese/english

For each probe row, the hash join collects all matched build-side rows without memory tracking, which can cause OOM during the matching procedure.

Already acknowledged. Actually, GetMatchedRowsAndPtrs already returns the ptrs to the caller, so we can iterate over the ptrs instead. Thanks.

Ref: #35630
