Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

statistics: release session met error to avoid possible leak #59522

Closed

Conversation

winoros
Copy link
Member

@winoros winoros commented Feb 13, 2025

What problem does this PR solve?

Issue Number: ref #54022 close #59524

Problem Summary:

We should release the session pool after abandoning it.

What changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

Fix the issue that the internal session used by statistics may not be released and cause possible memory leak.
修复统计信息是使用的内部会话在遇到错误时可能没有被释放的问题,该问题可能导致内存溢出。

@ti-chi-bot ti-chi-bot bot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Feb 13, 2025
Copy link

ti-chi-bot bot commented Feb 13, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from winoros and additionally assign wjhuang2016 for approval(Please ensuring that each of them provides their approval before proceeding). For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added sig/planner SIG: Planner size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 13, 2025
@winoros
Copy link
Member Author

winoros commented Feb 13, 2025

/hold
feel free to unhold after 1 lgtm of sql-infra team

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 13, 2025
@winoros
Copy link
Member Author

winoros commented Feb 13, 2025

/unhold
oh, the approver from the domain package is enough.
no need to add the manually checking

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-triage-completed and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Feb 13, 2025
Copy link

codecov bot commented Feb 13, 2025

Codecov Report

Attention: Patch coverage is 35.29412% with 11 lines in your changes missing coverage. Please review.

Project coverage is 74.8468%. Comparing base (27f8ff3) to head (410d3ef).
Report is 16 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #59522        +/-   ##
================================================
+ Coverage   73.0108%   74.8468%   +1.8359%     
================================================
  Files          1693       1739        +46     
  Lines        468044     476199      +8155     
================================================
+ Hits         341723     356420     +14697     
+ Misses       105314      97299      -8015     
- Partials      21007      22480      +1473     
Flag Coverage Δ
integration 48.8469% <11.7647%> (?)
unit 72.1958% <35.2941%> (-0.0175%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 52.6910% <ø> (ø)
parser ∅ <ø> (∅)
br 61.6709% <ø> (+16.1884%) ⬆️

@@ -85,6 +86,8 @@ func CallWithSCtx(pool util.SessionPool, f func(sctx sessionctx.Context) error,
defer func() {
if err == nil { // only recycle when no error
pool.Put(se)
} else {
infosync.DeleteInternalSession(se)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to put the session in the pool after cleaning up the dirty state..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but how to clean it(

Copy link
Contributor

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that if we return the session back to pool, the pool should be able to call DeleteInternalSession

tidb/pkg/domain/domain.go

Lines 1332 to 1346 in f6bf8e8

sysSessionPool: util.NewSessionPool(
capacity, factory,
func(r pools.Resource) {
_, ok := r.(sessionctx.Context)
intest.Assert(ok)
infosync.StoreInternalSession(r)
},
func(r pools.Resource) {
sctx, ok := r.(sessionctx.Context)
intest.Assert(ok)
intest.AssertFunc(func() bool {
txn, _ := sctx.Txn(false)
return txn == nil || !txn.Valid()
})
infosync.DeleteInternalSession(r)

Can you reproduce this problem in our environment first? So we can use goref or other ways to know why the reference is kept

@Rustin170506
Copy link
Member

I see that if we return the session back to pool, the pool should be able to call DeleteInternalSession

It is challenging to determine whether an error stems from the business logic or from the internal session. This complicates error handling when issues arise. As a result, we decided not to return the session to the pool, but this decision has led to memory leak problems. If we continue failing and create numerous sessions without properly releasing them, it exacerbates the issue.

I propose that we introduce a new API for the session pool that allows the caller to destroy a session instead of directly returning it.

@lance6716
Copy link
Contributor

I see that if we return the session back to pool, the pool should be able to call DeleteInternalSession

It is challenging to determine whether an error stems from the business logic or from the internal session. This complicates error handling when issues arise. As a result, we decided not to return the session to the pool, but this decision has led to memory leak problems. If we continue failing and create numerous sessions without properly releasing them, it exacerbates the issue.

I propose that we introduce a new API for the session pool that allows the caller to destroy a session instead of directly returning it.

This is not related to check the source of error? Line 88 of pkg/statistics/handle/util/util.go should also call DeleteInternalSession

@Rustin170506
Copy link
Member

This is not related to check the source of error? Line 88 of pkg/statistics/handle/util/util.go should also call DeleteInternalSession

If we do not check the source of the error, we cannot put it back in the pool directly. This is also why we call DeleteInternalSession from the caller side.

@winoros
Copy link
Member Author

winoros commented Feb 17, 2025

closed since a better one is implemented

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/planner SIG: Planner size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

release the internal session which may meet error like #54022
4 participants