
feat: WIP - Stress Test Part 2 #557

Closed
wants to merge 18 commits into from

Conversation

kevinssgh
Contributor

Description

Added an optional stress test that sends continuous cross-chain swaps and withdrawals.

Closes:

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Include instructions and any relevant details so others can reproduce.

  • Tested CCTX in localnet
  • Tested in development environment
  • Go unit tests
  • Go integration tests
  • Tested via GitHub Actions

Checklist:

  • I have added unit tests that prove my fix/feature works

@kevinssgh kevinssgh requested a review from brewmaster012 May 8, 2023 15:44
co.logger.ZetaChainWatcher.Info().Msgf("numStreams: %d; protocol: %+v; conn: %+v", numStreams, pCount, cCount)
if outTxMan.numActiveProcessor == 0 {
co.logger.ZetaChainWatcher.Warn().Msgf("no active outbound tx processor; safeMode: %v", safeMode)
numStreamsReleased := releaseAllStreams(host.Network())
Collaborator

@brewmaster012 May 13, 2023


@kevinssgh this line works around the stream leak by manually clearing streams when progress is slow.

To see the leakage:

comment this line out (the releaseAllStreams call), and grep the zetaclient logs with:

docker logs -f zetaclient2 | grep -e 'blocked' -e 'StartTryProcess'  -e 'EndTryProcess' -e 'outstanding'   -e 'numStreams' -e 'released' -e 'rate'

You will see that when progress is slow and the scheduler slows down to one keysign every 20 blocks, there is often no keysign ceremony going on, yet the number of streams may still be in the 1000s. I think this is a clear indication that streams are leaking, as there is no reason to hold many streams when no keysign ceremony is running.

Decomposing the streams by protocol ID, you'll see that most leaked streams use the protocol ID "/p2p/join-party-leader".

This line manually closes/resets all such streams when there is clearly no keysign going on.

By adding this workaround, the stress test is able to run for 35 minutes without stopping, at around 1 tx/s.
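The workaround described above can be sketched as follows. This is a minimal, self-contained illustration, not the actual zetaclient code: the Stream, Conn, and Network types here are stand-ins for the real go-libp2p network types, and the logic assumed is simply "reset every open stream speaking the join-party protocol".

```go
package main

import "fmt"

// Stand-ins for the libp2p network types (hypothetical simplification).
type Stream struct {
	Protocol string
	open     bool
}

func (s *Stream) Reset() { s.open = false }

type Conn struct{ Streams []*Stream }

type Network struct{ Conns []*Conn }

// releaseAllStreams sketches the workaround: when no keysign ceremony is
// running, any open stream on the "/p2p/join-party-leader" protocol is
// assumed to be leaked and is reset. Returns the number of streams released.
func releaseAllStreams(n *Network) int {
	released := 0
	for _, c := range n.Conns {
		for _, s := range c.Streams {
			if s.open && s.Protocol == "/p2p/join-party-leader" {
				s.Reset()
				released++
			}
		}
	}
	return released
}

func main() {
	n := &Network{Conns: []*Conn{
		{Streams: []*Stream{
			{Protocol: "/p2p/join-party-leader", open: true},
			{Protocol: "/ipfs/ping/1.0.0", open: true},
		}},
	}}
	fmt.Println("released", releaseAllStreams(n), "streams")
}
```

In the real code, the equivalent walk would go through `host.Network().Conns()` and each connection's `GetStreams()`, resetting streams whose protocol ID matches.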

co.logger.ZetaChainWatcher.Info().Msgf("100 blocks outbound tx processing rate: %.2f", float64(lastProcessedNonce-zblockToProcessedNonce[bn-100])/100.0)
co.logger.ZetaChainWatcher.Info().Msgf("since block 0 outbound tx processing rate: %.2f", float64(lastProcessedNonce)/(1.0*float64(bn)))
}
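The two rate metrics logged above can be isolated into a small sketch. The names `zblockToProcessedNonce`, `bn`, and `lastProcessedNonce` are taken from the snippet; wrapping them in a helper function is my addition for illustration.

```go
package main

import "fmt"

// rates computes the two throughput metrics from the snippet above:
// the outbound tx processing rate over the last 100 blocks, and the
// average rate since block 0. zblockToProcessedNonce maps a ZetaChain
// block height to the highest processed outbound nonce seen at that height.
func rates(zblockToProcessedNonce map[int64]uint64, bn int64, lastProcessedNonce uint64) (last100, sinceGenesis float64) {
	last100 = float64(lastProcessedNonce-zblockToProcessedNonce[bn-100]) / 100.0
	sinceGenesis = float64(lastProcessedNonce) / float64(bn)
	return
}

func main() {
	// e.g. at block 200, nonce 150 processed; at block 100, nonce 50 was processed.
	hist := map[int64]uint64{100: 50}
	l100, all := rates(hist, 200, 150)
	fmt.Printf("100 blocks outbound tx processing rate: %.2f; since block 0: %.2f\n", l100, all)
}
```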
host := co.Tss.Server.P2pCommunication.GetHost()
Collaborator


go-tss is instrumented (zeta-chain/go-tss#6) to expose the Server and P2pCommunication fields, solely to be able to call GetHost() (the libp2p host).

lenMap[k] = len(v)
}
log.Warn().Msgf("analyzing StreamMgr: %d keys; ", numKeys)
log.Warn().Msgf("StreamMgr statistics by msgID: %v", lenMap)
Collaborator

@brewmaster012 May 14, 2023


@kevinssgh contrast the streams tracked by StreamMgr with those tracked by the libp2p Host; here's a log:

2023-05-14T15:48:08Z WRN analyzing StreamMgr: 10 keys; 
2023-05-14T15:48:08Z WRN StreamMgr statistics by msgID: map[04f7182d57c0687add6addd0a85bc2955d6dd93a02409c6d41be645952c3cfea:1 2a4dbfbff7b34639be505ab4b5ecae197c62adae727c6f78a3691429bf3cdd4a:1 3de54c25db618d025e5225d8ed50bb9493aa0374b10fc2864ebd4af9a13be0c2:1 6f40b24a1cfe8ee6b4a51613043cbec1d551ede50fd5036b9bddc32b80db1111:1 706ec764bf9937472d2b3835978c0ce5f7e0e2525ec69eaab3575b74423bb3c3:1 8042cd0f74774ddf7657d6aade3079517002ed827dc7fdbac0b52e7feaad67ab:1 9692c933ffff66256164948e0044475e47ac030df1095eadbcebd61bcb8e6113:1 c49035206730506bfc25d7db36b5b8bc1cf1cf6a765636d8c6f47a8f09b63f35:1 ccc8023b6865cc8dd9b95988a998beba209014bf1eb7a8fee10780a77efc0ecd:5 f2276a207aa78d8d5eec235835820dbd0d054c53dc9d4d4430ddfc7b01666257:1]
2023-05-14T15:48:08Z WRN released 1178 streams chain=ZetaChain module=ZetaChainWatcher

It suggests that there are a lot of "stray" streams that are somehow not being tracked by StreamMgr, and therefore not released by StreamMgr.ReleaseStreams().
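The mismatch in the log can be made concrete with a small sketch. The map layout mirrors the StreamMgr-by-msgID statistics printed above; the helper name `strayStreams` and the stand-in numbers are assumptions for illustration.

```go
package main

import "fmt"

// strayStreams sketches the diagnosis: StreamMgr tracks streams keyed by
// msgID, while the libp2p host knows the true open-stream count. Any
// difference is "stray" streams that StreamMgr.ReleaseStreams() can never
// reclaim, because StreamMgr never learned about them.
func strayStreams(trackedByMsgID map[string]int, hostStreamCount int) int {
	tracked := 0
	for _, n := range trackedByMsgID {
		tracked += n
	}
	return hostStreamCount - tracked
}

func main() {
	// Shape matches the log above: 10 msgIDs, most tracking a single stream.
	lenMap := map[string]int{"msgA": 1, "msgB": 5}
	// Suppose the host reports 1184 streams while StreamMgr tracks only 6.
	fmt.Println("stray streams:", strayStreams(lenMap, 1184))
}
```

With the numbers from the log (10 keys tracking ~14 streams versus 1178 streams released by the manual sweep), nearly all of the host's streams are untracked.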

Contributor Author


According to these logs, the main issue is that the stream manager in go-tss is not consistent with the resource manager in libp2p.

go-tss tries to organize stream resources by message ID and releases them on that basis. It also releases any unknown streams that are not associated with a message ID.

Somehow, stream resources are being created without the knowledge of the go-tss stream manager. The change I made does not address this unaccounted-stream issue. The stream handler functions should be the main point of entry, and the main indication of new streams being created.
This is probably what I will look into next...
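One way to act on the idea above (that handlers are the entry point for new streams) would be to wrap every stream handler so each incoming stream is recorded before being handled. This is a hypothetical sketch, not go-tss code: Stream, Handler, and StreamTracker are stand-ins for the libp2p network.Stream/network.StreamHandler types and whatever bookkeeping StreamMgr actually does.

```go
package main

import (
	"fmt"
	"sync"
)

// Stand-ins for libp2p's network.Stream and network.StreamHandler.
type Stream struct{ ID string }

type Handler func(Stream)

// StreamTracker records every stream that passes through a wrapped handler,
// so no stream can be created via a handler without being accounted for.
type StreamTracker struct {
	mu   sync.Mutex
	seen map[string]bool
}

// Wrap returns a handler that records the incoming stream before
// delegating to the original handler.
func (t *StreamTracker) Wrap(h Handler) Handler {
	return func(s Stream) {
		t.mu.Lock()
		t.seen[s.ID] = true
		t.mu.Unlock()
		h(s)
	}
}

func main() {
	tracker := &StreamTracker{seen: make(map[string]bool)}
	handler := tracker.Wrap(func(s Stream) { fmt.Println("handling", s.ID) })
	handler(Stream{ID: "stream-1"})
	fmt.Println("tracked:", len(tracker.seen))
}
```

If every protocol handler were registered through such a wrapper, any stream still unaccounted for would have to originate outside the handler path (e.g. outbound dials), which would narrow the search.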

@kevinssgh kevinssgh changed the title WIP - Stress Test Part 2 feat: WIP - Stress Test Part 2 May 16, 2023
@kevinssgh
Contributor Author

Moved changes into a new PR: #681

@kevinssgh kevinssgh closed this Jun 6, 2023
@lumtis lumtis deleted the stress-test2 branch October 27, 2023 23:13