feat: WIP - Stress Test Part 2 #557
Conversation
zetaclient/zetacore_observer.go
Outdated
co.logger.ZetaChainWatcher.Info().Msgf("numStreams: %d; protocol: %+v; conn: %+v", numStreams, pCount, cCount) | ||
if outTxMan.numActiveProcessor == 0 { | ||
co.logger.ZetaChainWatcher.Warn().Msgf("no active outbound tx processor; safeMode: %v", safeMode) | ||
numStreamsReleased := releaseAllStreams(host.Network()) |
@kevinssgh this line works around the leaking streams by manually clearing them when progress is slow.
To see the leakage:
comment this line out (releaseAllStreams), and grep the zetaclient logs with:
docker logs -f zetaclient2 | grep -e 'blocked' -e 'StartTryProcess' -e 'EndTryProcess' -e 'outstanding' -e 'numStreams' -e 'released' -e 'rate'
You will see that when progress is slow and the scheduler slows down to 1 keysign every 20 blocks, there is often no keysign ceremony going on, yet the number of streams can still be in the 1000s. I think this is a clear indication that streams are leaking, as there is no reason to keep many streams open when there is no keysign ceremony.
Decomposing the streams by protocol ID, you'll see that most leaked streams use protocol ID "/p2p/join-party-leader".
This line manually closes/resets all such streams when there is clearly no keysign going on.
With this workaround, the stress test is able to run for 35 min at around 1 tx/s without stalling; see the sketch below.
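For reference, here is a minimal sketch of what a release helper along these lines could look like; the exact name, signature, and placement of releaseAllStreams in this PR are assumptions, and the import path depends on the go-libp2p version in use:

import "github.com/libp2p/go-libp2p/core/network"

// releaseAllStreams walks every connection on the libp2p network and resets
// its streams, returning how many were released. It is only meant to be called
// when no keysign ceremony is in flight, so no legitimate stream is dropped.
func releaseAllStreams(n network.Network) int {
	released := 0
	for _, conn := range n.Conns() {
		for _, s := range conn.GetStreams() {
			_ = s.Reset() // Reset aborts both ends of the stream and frees its resources
			released++
		}
	}
	return released
}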
co.logger.ZetaChainWatcher.Info().Msgf("100 blocks outbound tx processing rate: %.2f", float64(lastProcessedNonce-zblockToProcessedNonce[bn-100])/100.0) | ||
co.logger.ZetaChainWatcher.Info().Msgf("since block 0 outbound tx processing rate: %.2f", float64(lastProcessedNonce)/(1.0*float64(bn))) | ||
} | ||
host := co.Tss.Server.P2pCommunication.GetHost() |
go-tss is instrumented (zeta-chain/go-tss#6) to expose the Server and P2pCommunication fields, solely so that GetHost() (the libp2p host) can be called from zetaclient.
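As a rough illustration of how the exposed host can feed the per-protocol stream counts logged above (only the GetHost() chain and the log line come from the diff; the counting code and variable names are assumptions, and protocol refers to go-libp2p's protocol package):

host := co.Tss.Server.P2pCommunication.GetHost()
nw := host.Network()
numStreams := 0
pCount := map[protocol.ID]int{} // open streams grouped by protocol ID
cCount := len(nw.Conns())       // number of live connections
for _, conn := range nw.Conns() {
	for _, s := range conn.GetStreams() {
		numStreams++
		pCount[s.Protocol()]++
	}
}
co.logger.ZetaChainWatcher.Info().Msgf("numStreams: %d; protocol: %+v; conn: %+v", numStreams, pCount, cCount)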
lenMap[k] = len(v)
}
log.Warn().Msgf("analyzing StreamMgr: %d keys; ", numKeys) | ||
log.Warn().Msgf("StreamMgr statistics by msgID: %v", lenMap) |
@kevinssgh contrast the streams tracked by StreamMgr with the streams tracked by the libp2p Host; here's a log:
2023-05-14T15:48:08Z WRN analyzing StreamMgr: 10 keys;
2023-05-14T15:48:08Z WRN StreamMgr statistics by msgID: map[04f7182d57c0687add6addd0a85bc2955d6dd93a02409c6d41be645952c3cfea:1 2a4dbfbff7b34639be505ab4b5ecae197c62adae727c6f78a3691429bf3cdd4a:1 3de54c25db618d025e5225d8ed50bb9493aa0374b10fc2864ebd4af9a13be0c2:1 6f40b24a1cfe8ee6b4a51613043cbec1d551ede50fd5036b9bddc32b80db1111:1 706ec764bf9937472d2b3835978c0ce5f7e0e2525ec69eaab3575b74423bb3c3:1 8042cd0f74774ddf7657d6aade3079517002ed827dc7fdbac0b52e7feaad67ab:1 9692c933ffff66256164948e0044475e47ac030df1095eadbcebd61bcb8e6113:1 c49035206730506bfc25d7db36b5b8bc1cf1cf6a765636d8c6f47a8f09b63f35:1 ccc8023b6865cc8dd9b95988a998beba209014bf1eb7a8fee10780a77efc0ecd:5 f2276a207aa78d8d5eec235835820dbd0d054c53dc9d4d4430ddfc7b01666257:1]
2023-05-14T15:48:08Z WRN released 1178 streams chain=ZetaChain module=ZetaChainWatcher
It suggests that there are a lot of "stray" streams that are somehow not tracked by StreamMgr, and therefore not released by StreamMgr.ReleaseStreams().
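One illustrative way to quantify that gap, assuming lenMap is the per-msgID map built above and host is the libp2p host (variable names are assumptions, not code from the PR):

tracked := 0
for _, n := range lenMap {
	tracked += n // streams StreamMgr knows about, summed over msgIDs
}
actual := 0
for _, conn := range host.Network().Conns() {
	actual += len(conn.GetStreams()) // streams the libp2p host actually holds
}
log.Warn().Msgf("StreamMgr tracks %d streams; libp2p host holds %d; stray: %d", tracked, actual, actual-tracked)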
According to these logs, the main issue is that the stream manager in go-tss is not consistent with the resource manager in libp2p.
go-tss tries to organize stream resources by message ID and releases them on that basis. It also releases any unknown streams that are not associated with a message ID.
Somehow, stream resources are being created without the knowledge of the go-tss stream manager. The change I made does not address these unaccounted-for streams. The stream handler functions should be the main point of entry and the primary indication that new streams are being created.
This is probably where I will look next...
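A hypothetical sketch of that direction: wrap the stream handler so every inbound stream is registered with the stream manager the moment it is created (the handler and helper names below are placeholders, not the actual go-tss API):

host.SetStreamHandler("/p2p/join-party-leader", func(s network.Stream) {
	// Register the stream immediately so nothing created here escapes accounting;
	// streamMgr.AddStream, msgIDFromStream, and handleJoinPartyLeader are hypothetical names.
	streamMgr.AddStream(msgIDFromStream(s), s)
	handleJoinPartyLeader(s)
})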
…calculation. Separated smoke test and stress test into different commands
Moved changes into new PR: #681
Description
Added an optional stress test that sends continuous cross-chain swaps and withdrawals.
Closes:
Type of change
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Include instructions and any relevant details so others can reproduce.
Checklist: