
Memory leak in instrumentation-pg #2479

Closed
yandrushchak opened this issue Oct 11, 2024 · 21 comments · Fixed by #2484
Assignees
Labels
bug Something isn't working pkg:instrumentation-pg priority:p1 Bugs which cause problems in end-user applications such as crashes, data inconsistencies

Comments

@yandrushchak

What version of OpenTelemetry are you using?

@opentelemetry/instrumentation-pg: 0.45.1

What version of Node are you using?

v20.15.1

What did you do?

Used @opentelemetry/instrumentation-pg with TypeORM and pg-pool. Received a warning about a possible memory leak, which happens because listeners are attached to the pool on every query. Partial stack trace:

MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 connect listeners added to [BoundPool]. MaxListeners is 10. Use emitter.setMaxListeners() to increase limit
    at genericNodeError (node:internal/errors:984:15)
    at wrappedFn (node:internal/errors:538:14)
    at _addListener (node:events:593:17)
    at BoundPool.addListener (node:events:611:10)
    at BoundPool.connect (c:/test/node_modules\@opentelemetry\instrumentation-pg\src\instrumentation.ts:461:12)
    at <anonymous> (c:/test/node_modules\typeorm\driver\src\driver\postgres\PostgresDriver.ts:1175:25)
    at new Promise (<anonymous>)
    at PostgresDriver.obtainMasterConnection (c:/test/node_modules\typeorm\driver\src\driver\postgres\PostgresDriver.ts:1174:16)
    at PostgresQueryRunner.connect (c:/test/node_modules\typeorm\driver\src\driver\postgres\PostgresQueryRunner.ts:111:18)
    at PostgresQueryRunner.query (c:/test/node_modules\typeorm\driver\src\driver\postgres\PostgresQueryRunner.ts:248:47)
....

What did you expect to see?

No memory leak.

Additional context

This issue was introduced in this commit - 12adb43#diff-1360de1aed58c38c5cbd3a1759a8af761049ea2c562a477964623fb2358b2730R379
Since pg.Pool.connect is called for each query to obtain a connection, duplicate event listeners are attached each time, which would probably also lead to wrong metrics data.

It seems that, to fix this, an additional check is needed to make sure the event listeners are added only on the first connect call.
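For illustration, a minimal sketch of such a guard (hypothetical helper and marker names, not the actual patch): the instrumented connect only attaches its handlers the first time it runs for a given pool.

// Hypothetical sketch: attach metric listeners at most once per pool instance.
const LISTENERS_ATTACHED = Symbol('otel.pg-pool.listeners-attached');

function patchedConnect(originalConnect, pool, onAcquire) {
  if (!pool[LISTENERS_ATTACHED]) {
    pool.on('acquire', onAcquire); // same idea for 'connect', 'remove', etc.
    pool[LISTENERS_ATTACHED] = true;
  }
  return originalConnect.call(pool);
}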

@yandrushchak yandrushchak added the bug Something isn't working label Oct 11, 2024
@astrot1988

same issue

@clintonb

We observed this issue after upgrading from @opentelemetry/auto-instrumentations-node 0.46.1 to 0.50.1.

@clintonb

Here is a stack trace from our container startup. The line in instrumentation.js corresponds to:

this.on('acquire', () => {
  plugin._connectionsCounter = utils.updateCounter(
    this,
    plugin._connectionsCount,
    plugin._connectionPendingRequests,
    plugin._connectionsCounter
  );
});
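(For anyone who wants to confirm the accumulation locally, here is a minimal repro sketch, assuming a reachable Postgres configured via the usual PG* environment variables and the affected instrumentation-pg version registered: the 'acquire' listener count grows with every connect call.)

// Repro sketch: with the affected instrumentation-pg version registered,
// every pool.connect() adds another 'acquire' listener to the same BoundPool.
const { Pool } = require('pg');

async function main() {
  const pool = new Pool();
  for (let i = 1; i <= 15; i++) {
    const client = await pool.connect();
    client.release();
    console.log(`connect #${i}: acquire listeners = ${pool.listenerCount('acquire')}`);
  }
  await pool.end();
}

main().catch(console.error);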

(node:1) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 remove listeners added to [BoundPool]. MaxListeners is 10. Use emitter.setMaxListeners() to increase limit
    at new Promise (<anonymous>)
    at /srv/node_modules/.pnpm/@nestjs+terminus@10.2.3_drcoupryhn6e72cgxfoljijmlu/node_modules/@nestjs/terminus/dist/health-indicator/database/typeorm.health.js:17:71
    at Generator.next (<anonymous>)
    at TypeOrmHealthIndicator.<anonymous> (/srv/node_modules/.pnpm/@nestjs+terminus@10.2.3_drcoupryhn6e72cgxfoljijmlu/node_modules/@nestjs/terminus/dist/health-indicator/database/typeorm.health.js:134:28)
    at TypeOrmHealthIndicator.pingDb (/srv/node_modules/.pnpm/@nestjs+terminus@10.2.3_drcoupryhn6e72cgxfoljijmlu/node_modules/@nestjs/terminus/dist/health-indicator/database/typeorm.health.js:94:16)
    at __awaiter (/srv/node_modules/.pnpm/@nestjs+terminus@10.2.3_drcoupryhn6e72cgxfoljijmlu/node_modules/@nestjs/terminus/dist/health-indicator/database/typeorm.health.js:13:12)
    at new Promise (<anonymous>)
    at /srv/node_modules/.pnpm/@nestjs+terminus@10.2.3_drcoupryhn6e72cgxfoljijmlu/node_modules/@nestjs/terminus/dist/health-indicator/database/typeorm.health.js:17:71
    at Generator.next (<anonymous>)
    at TypeOrmHealthIndicator.<anonymous> (/srv/node_modules/.pnpm/@nestjs+terminus@10.2.3_drcoupryhn6e72cgxfoljijmlu/node_modules/@nestjs/terminus/dist/health-indicator/database/typeorm.health.js:107:40)
    at DataSource.<anonymous> (/srv/node_modules/.pnpm/opentelemetry-instrumentation-typeorm@0.41.0_@opentelemetry+api@1.9.0/node_modules/opentelemetry-instrumentation-typeorm/dist/src/typeorm.js:267:42)
    at ContextAPI.with (/srv/node_modules/.pnpm/@opentelemetry+api@1.9.0/node_modules/@opentelemetry/api/build/src/api/context.js:60:46)
    at AsyncLocalStorageContextManager.with (/srv/node_modules/.pnpm/@opentelemetry+context-async-hooks@1.26.0_@opentelemetry+api@1.9.0/node_modules/@opentelemetry/context-async-hooks/build/src/AsyncLocalStorageContextManager.js:33:40)
    at AsyncLocalStorage.run (node:async_hooks:346:14)
    at /srv/node_modules/.pnpm/opentelemetry-instrumentation-typeorm@0.41.0_@opentelemetry+api@1.9.0/node_modules/opentelemetry-instrumentation-typeorm/dist/src/typeorm.js:267:82
    at TypeormInstrumentation._endSpan (/srv/node_modules/.pnpm/opentelemetry-instrumentation-typeorm@0.41.0_@opentelemetry+api@1.9.0/node_modules/opentelemetry-instrumentation-typeorm/dist/src/typeorm.js:283:30)
    at /srv/node_modules/.pnpm/opentelemetry-instrumentation-typeorm@0.41.0_@opentelemetry+api@1.9.0/node_modules/opentelemetry-instrumentation-typeorm/dist/src/typeorm.js:267:106
    at DataSource.query (/srv/node_modules/.pnpm/typeorm@0.3.17_better-sqlite3@9.4.0_ioredis@5.3.2_pg-query-stream@4.5.3_pg@8.11.3__pg@8.11.3__nbwcw7yzxgjrpbcbytvl7a4bfy/node_modules/typeorm/data-source/DataSource.js:345:42)
    at PostgresQueryRunner.query (/srv/node_modules/.pnpm/typeorm@0.3.17_better-sqlite3@9.4.0_ioredis@5.3.2_pg-query-stream@4.5.3_pg@8.11.3__pg@8.11.3__nbwcw7yzxgjrpbcbytvl7a4bfy/node_modules/typeorm/driver/postgres/PostgresQueryRunner.js:174:47)
    at PostgresQueryRunner.connect (/srv/node_modules/.pnpm/typeorm@0.3.17_better-sqlite3@9.4.0_ioredis@5.3.2_pg-query-stream@4.5.3_pg@8.11.3__pg@8.11.3__nbwcw7yzxgjrpbcbytvl7a4bfy/node_modules/typeorm/driver/postgres/PostgresQueryRunner.js:68:18)
    at PostgresDriver.obtainMasterConnection (/srv/node_modules/.pnpm/typeorm@0.3.17_better-sqlite3@9.4.0_ioredis@5.3.2_pg-query-stream@4.5.3_pg@8.11.3__pg@8.11.3__nbwcw7yzxgjrpbcbytvl7a4bfy/node_modules/typeorm/driver/postgres/PostgresDriver.js:870:16)
    at new Promise (<anonymous>)
    at /srv/node_modules/.pnpm/typeorm@0.3.17_better-sqlite3@9.4.0_ioredis@5.3.2_pg-query-stream@4.5.3_pg@8.11.3__pg@8.11.3__nbwcw7yzxgjrpbcbytvl7a4bfy/node_modules/typeorm/driver/postgres/PostgresDriver.js:871:25
    at BoundPool.connect (/srv/node_modules/.pnpm/@opentelemetry+instrumentation-pg@0.45.1_@opentelemetry+api@1.9.0/node_modules/@opentelemetry/instrumentation-pg/build/src/instrumentation.js:306:22)
    at BoundPool.addListener (node:events:611:10)
    at _addListener (node:events:593:17)
    at wrappedFn (node:internal/errors:538:14)
    at genericNodeError (node:internal/errors:984:15)
    

@thecopy

thecopy commented Oct 15, 2024

Same issue; our database calls are now very, very slow as well. Downgrading to auto-instrumentations-node v0.48.0 solved this issue for us.

@yandrushchak
Author

As a workaround, downgrading auto-instrumentations-node to 0.50.0 solves it, since that version references "@opentelemetry/instrumentation-pg": "^0.44.0", which doesn't have this issue.

@pichlermarc pichlermarc added priority:p1 Bugs which cause problems in end-user applications such as crashes, data inconsistencies pkg:instrumentation-pg labels Oct 15, 2024
@pichlermarc
Member

FYI @maryliag (component owner)

@maryliag
Contributor

Hi everyone, thank you for raising this issue! Indeed I made a mistake and was adding several event listeners to the same pool. Although this is obviously not ideal, I want to clarify the comment

duplicated event listeners are attached each time, which would probably also lead to wrong metrics data.

This (wrong metric data) is not happening, because I did notice the updater was getting called a few times, so I made sure it always sends the correct value, even if it is called multiple times. I just didn't realize why it was getting called multiple times 😓
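Roughly, the idea is that the updater reads absolute values from the pool and only records the change since the last report, so duplicate calls are harmless; a simplified sketch (hypothetical names, not the actual utils.updateCounter):

// Simplified sketch: report the delta against the last observed total, so
// calling the updater several times for the same event does not double-count.
function updateConnectionsCounter(pool, connectionsUpDownCounter, lastTotal) {
  const current = pool.totalCount; // absolute state read from the pool
  connectionsUpDownCounter.add(current - lastTotal);
  return current; // caller stores this as the new lastTotal
}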

Thanks to the input from this issue, I was able to create a fix for it, along with some minor performance improvements, in #2484.

If you continue seeing this issue on a version with this fix, please let me know!

@maryliag maryliag self-assigned this Oct 15, 2024
@zuk

zuk commented Oct 15, 2024

Thanks @maryliag !

We've been seeing some serious performance issues that coincidentally began right after we updated our opentelemetry deps.

Specifically, over the course of several hours, db-pool.connect durations would get slower and slower, growing from a few ms to 2+ seconds, until eventually the whole process would come to a halt waiting for db connections to open. Rolling back @opentelemetry/auto-instrumentations-node ^0.50.1 -> 0.48.0 fixed the problem for now.

@julioantonini

Hi guys!
I was facing the same issue, so for now I downgraded @opentelemetry/auto-instrumentations-node to version 0.49.0.
This fixes the problem temporarily until a hotfix is released.

"@opentelemetry/auto-instrumentations-node": "0.49.0"

@maryliag
Contributor

durations would get slower and slower, eventually growing from a few ms to 2+ seconds until eventually the whole process would come to a halt

This seems to align with new listeners being added on every new connection, which is what the PR I created fixes, so hopefully it will solve your problem.
I appreciate everybody's patience until this gets merged and added to a new release 🙏 And I apologize for any problems this might have caused.

Also, I hope the new pool-related metrics will be helpful 😄

@peterabbott

Experienced this too; had to roll back to 0.44.0 of instrumentation-pg. It caused a significant performance impact on one of our systems.

@peterabbott

peterabbott commented Oct 16, 2024

To add: when using the event listeners, there should be a matching remove to avoid this.

https://github.com/open-telemetry/opentelemetry-js-contrib/commit/12adb4354f09ade438cd96340bdfd1f715b5fed3#diff-1360de1aed58c3[…]4623fb2358b2730R379

It would be nice if the metrics could be opt-in.
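For example, a minimal sketch of what a matching remove could look like (hypothetical helper, not the merged fix): keep references to the handlers that were attached so they can be detached again when the instrumentation is disabled or the pool goes away.

// Sketch: return a disposer holding the matching removeListener calls.
function attachPoolMetricListeners(pool, onAcquire, onRemove) {
  pool.on('acquire', onAcquire);
  pool.on('remove', onRemove);
  return () => {
    pool.removeListener('acquire', onAcquire);
    pool.removeListener('remove', onRemove);
  };
}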

@maryliag
Contributor

maryliag commented Oct 16, 2024

there should be a matching remove to avoid this

When do you suggest the removal should occur?

@maryliag
Contributor

@opentelemetry/instrumentation-pg: 0.46.0 has been released with the fix

@clintonb

I still see an error and immense DB slowness with the latest version of the package.

(node:1) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 release listeners added to [BoundPool]. MaxListeners is 10. Use emitter.setMaxListeners() to increase limit

@pichlermarc
Member

pichlermarc commented Oct 18, 2024

@clintonb thanks for the feedback - would you mind sharing the full stack trace of that error again (for the new version)? I'm trying to track down possible ways this flag could fail.

(the memory leak is currently the highest priority, once we've sorted this out we can tackle performance improvements in a separate issue)

@pichlermarc pichlermarc reopened this Oct 18, 2024
@clintonb

Hmm... the trace shows @opentelemetry+instrumentation-pg@0.45.0 in the pnpm path. I think there may be a funky interaction with @opentelemetry/auto-instrumentations-node going on here.

(node:1) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 release listeners added to [BoundPool]. MaxListeners is 10. Use emitter.setMaxListeners() to increase limit
    at fromFirestoreWithSqlAnnotations (/srv/dist/shared/dataAccess/User.js:152:27)
    at getUserByFirestoreID (/srv/dist/shared/dataAccess/User.js:247:46)
    at Repository.findOneOrFail (/srv/node_modules/.pnpm/typeorm@0.3.17_better-sqlite3@9.4.0_ioredis@5.3.2_pg-query-stream@4.5.3_pg@8.11.3__pg@8.11.3__nbwcw7yzxgjrpbcbytvl7a4bfy/node_modules/typeorm/repository/Repository.js:266:29)
    at EntityManager.findOneOrFail (/srv/node_modules/.pnpm/typeorm@0.3.17_better-sqlite3@9.4.0_ioredis@5.3.2_pg-query-stream@4.5.3_pg@8.11.3__pg@8.11.3__nbwcw7yzxgjrpbcbytvl7a4bfy/node_modules/typeorm/entity-manager/EntityManager.js:637:21)
    at EntityManager.<anonymous> (/srv/node_modules/.pnpm/opentelemetry-instrumentation-typeorm@0.41.0_@opentelemetry+api@1.9.0/node_modules/opentelemetry-instrumentation-typeorm/dist/src/typeorm.js:175:42)
    at ContextAPI.with (/srv/node_modules/.pnpm/@opentelemetry+api@1.9.0/node_modules/@opentelemetry/api/build/src/api/context.js:60:46)
    at AsyncLocalStorageContextManager.with (/srv/node_modules/.pnpm/@opentelemetry+context-async-hooks@1.26.0_@opentelemetry+api@1.9.0/node_modules/@opentelemetry/context-async-hooks/build/src/AsyncLocalStorageContextManager.js:33:40)
    at AsyncLocalStorage.run (node:async_hooks:346:14)
    at /srv/node_modules/.pnpm/opentelemetry-instrumentation-typeorm@0.41.0_@opentelemetry+api@1.9.0/node_modules/opentelemetry-instrumentation-typeorm/dist/src/typeorm.js:175:82
    at TypeormInstrumentation._endSpan (/srv/node_modules/.pnpm/opentelemetry-instrumentation-typeorm@0.41.0_@opentelemetry+api@1.9.0/node_modules/opentelemetry-instrumentation-typeorm/dist/src/typeorm.js:283:30)
    at /srv/node_modules/.pnpm/opentelemetry-instrumentation-typeorm@0.41.0_@opentelemetry+api@1.9.0/node_modules/opentelemetry-instrumentation-typeorm/dist/src/typeorm.js:175:106
    at EntityManager.findOne (/srv/node_modules/.pnpm/typeorm@0.3.17_better-sqlite3@9.4.0_ioredis@5.3.2_pg-query-stream@4.5.3_pg@8.11.3__pg@8.11.3__nbwcw7yzxgjrpbcbytvl7a4bfy/node_modules/typeorm/entity-manager/EntityManager.js:596:14)
    at SelectQueryBuilder.<anonymous> (/srv/node_modules/.pnpm/opentelemetry-instrumentation-typeorm@0.41.0_@opentelemetry+api@1.9.0/node_modules/opentelemetry-instrumentation-typeorm/dist/src/typeorm.js:185:37)
    at SelectQueryBuilder.getOne (/srv/node_modules/.pnpm/typeorm@0.3.17_better-sqlite3@9.4.0_ioredis@5.3.2_pg-query-stream@4.5.3_pg@8.11.3__pg@8.11.3__nbwcw7yzxgjrpbcbytvl7a4bfy/node_modules/typeorm/query-builder/SelectQueryBuilder.js:711:36)
    at SelectQueryBuilder.<anonymous> (/srv/node_modules/.pnpm/opentelemetry-instrumentation-typeorm@0.41.0_@opentelemetry+api@1.9.0/node_modules/opentelemetry-instrumentation-typeorm/dist/src/typeorm.js:185:37)
    at SelectQueryBuilder.getRawAndEntities (/srv/node_modules/.pnpm/typeorm@0.3.17_better-sqlite3@9.4.0_ioredis@5.3.2_pg-query-stream@4.5.3_pg@8.11.3__pg@8.11.3__nbwcw7yzxgjrpbcbytvl7a4bfy/node_modules/typeorm/query-builder/SelectQueryBuilder.js:684:40)
    at SelectQueryBuilder.executeEntitiesAndRawResults (/srv/node_modules/.pnpm/typeorm@0.3.17_better-sqlite3@9.4.0_ioredis@5.3.2_pg-query-stream@4.5.3_pg@8.11.3__pg@8.11.3__nbwcw7yzxgjrpbcbytvl7a4bfy/node_modules/typeorm/query-builder/SelectQueryBuilder.js:2034:37)
    at SelectQueryBuilder.loadRawResults (/srv/node_modules/.pnpm/typeorm@0.3.17_better-sqlite3@9.4.0_ioredis@5.3.2_pg-query-stream@4.5.3_pg@8.11.3__pg@8.11.3__nbwcw7yzxgjrpbcbytvl7a4bfy/node_modules/typeorm/query-builder/SelectQueryBuilder.js:2183:43)
    at PostgresQueryRunner.query (/srv/node_modules/.pnpm/typeorm@0.3.17_better-sqlite3@9.4.0_ioredis@5.3.2_pg-query-stream@4.5.3_pg@8.11.3__pg@8.11.3__nbwcw7yzxgjrpbcbytvl7a4bfy/node_modules/typeorm/driver/postgres/PostgresQueryRunner.js:174:47)
    at PostgresQueryRunner.connect (/srv/node_modules/.pnpm/typeorm@0.3.17_better-sqlite3@9.4.0_ioredis@5.3.2_pg-query-stream@4.5.3_pg@8.11.3__pg@8.11.3__nbwcw7yzxgjrpbcbytvl7a4bfy/node_modules/typeorm/driver/postgres/PostgresQueryRunner.js:68:18)
    at PostgresDriver.obtainMasterConnection (/srv/node_modules/.pnpm/typeorm@0.3.17_better-sqlite3@9.4.0_ioredis@5.3.2_pg-query-stream@4.5.3_pg@8.11.3__pg@8.11.3__nbwcw7yzxgjrpbcbytvl7a4bfy/node_modules/typeorm/driver/postgres/PostgresDriver.js:870:16)
        at new Promise (<anonymous>)
    at /srv/node_modules/.pnpm/typeorm@0.3.17_better-sqlite3@9.4.0_ioredis@5.3.2_pg-query-stream@4.5.3_pg@8.11.3__pg@8.11.3__nbwcw7yzxgjrpbcbytvl7a4bfy/node_modules/typeorm/driver/postgres/PostgresDriver.js:871:25
    at BoundPool.connect (/srv/node_modules/.pnpm/@opentelemetry+instrumentation-pg@0.45.0_@opentelemetry+api@1.9.0/node_modules/@opentelemetry/instrumentation-pg/build/src/instrumentation.js:309:22)
    at BoundPool.addListener (node:events:611:10)
    at _addListener (node:events:593:17)
    at wrappedFn (node:internal/errors:538:14)
    at genericNodeError (node:internal/errors:984:15)

@nanotower

nanotower commented Oct 28, 2024

It's happening for me as well.
"@opentelemetry/auto-instrumentations-node": "^0.50.2" is importing "@opentelemetry/instrumentation-pg": "^0.47.0", which in turn is one minor version above the fix.

(node:66) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 acquire listeners added to [BoundPool]. MaxListeners is 10. Use emitter.setMaxListeners() to increase limit
    at genericNodeError (node:internal/errors:984:15)
    at wrappedFn (node:internal/errors:538:14)
    at _addListener (node:events:593:17)
    at BoundPool.addListener (node:events:611:10)
    at BoundPool.connect (/usr/src/app/node_modules/@opentelemetry/instrumentation-pg/build/src/instrumentation.js:306:22)
    at PostgresDriver.acquireConnection (/usr/src/app/node_modules/kysely/dist/cjs/dialect/postgres/postgres-driver.js:21:41)
    at RuntimeDriver.acquireConnection (/usr/src/app/node_modules/kysely/dist/cjs/driver/runtime-driver.js:46:47)
    at DefaultConnectionProvider.provideConnection (/usr/src/app/node_modules/kysely/dist/cjs/driver/default-connection-provider.js:10:47)
    at DefaultQueryExecutor.provideConnection (/usr/src/app/node_modules/kysely/dist/cjs/query-executor/default-query-executor.js:22:41)
    at DefaultQueryExecutor.executeQuery (/usr/src/app/node_modules/kysely/dist/cjs/query-executor/query-executor-base.js:36:27)

@maryliag
Contributor

Hey @nanotower, any way to get more info on your stack trace? I want to confirm that the version being used is indeed the one you imported.
For example, in the trace above shared by Clinton you can see

at BoundPool.connect (/srv/node_modules/.pnpm/@opentelemetry+instrumentation-pg@0.45.0_@opentelemetry+api@1.9.0/node_modules/@opentelemetry/instrumentation-pg/build/src/instrumentation.js:309:22)
    at BoundPool.addListener (node:events:611:10)

which clearly shows @opentelemetry+instrumentation-pg@0.45.0, but I don't see the version in your trace itself, just

at BoundPool.connect (/usr/src/app/node_modules/@opentelemetry/instrumentation-pg/build/src/instrumentation.js:306:22)

which points to a line (306) that, in the current version, is a comment.
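One quick way to confirm which version actually got resolved is to run npm ls @opentelemetry/instrumentation-pg (or pnpm why @opentelemetry/instrumentation-pg) in the project root and check the reported version against the one containing the fix.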

@nanotower

You're right, @maryliag. My node_modules are running

"name": "@opentelemetry/instrumentation-pg",
"version": "0.45.1",

That's a minor version below the one with the fix. Thanks for the heads up.

@maryliag
Contributor

Since there haven't been any more reports of this issue with the new version, I will close it.
