[CI][Testing] Add BuildJet CI runners for unit tests, integration tests; cache Docker images #4906
Codecov Report
Additional details and impacted files

@@            Coverage Diff             @@
##           master    #4906      +/-   ##
==========================================
+ Coverage   55.92%   56.11%   +0.19%
==========================================
  Files         963      965       +2
  Lines       89617    89689      +72
==========================================
+ Hits        50115    50331     +216
+ Misses      35752    35606     -146
- Partials     3750     3752       +2
Awesome
Do we know the approximate new runtime with these runners?
Yes, usually about 15 minutes, but it sometimes takes up to about 25 minutes if some jobs need to be retried.
unittest.SkipUnless(s.T(), unittest.TEST_TODO, "flaky")
s.RunTestEpochJoinAndLeave(flow.RoleVerification, s.AssertNetworkHealthyAfterVNChange)
I think I saw @jordanschalm fixing this test in another PR.
Yes, I will remove this from quarantine once his PR is merged.
Awesome, great work Misha.
utils/test_matrix/test_matrix.go
Outdated
@@ -72,8 +80,10 @@ func generateTestMatrix(targetPackages map[string][]string, otherPackages []stri

// listTargetPackages returns a map-list of target packages to run as separate CI jobs, based on a list of target package prefixes.
// It also returns a list of the "seen" packages that can then be used to extract the remaining packages to run (in a separate CI job).
func listTargetPackages(targetPackagePrefixes []string, allFlowPackages []string) (map[string][]string, map[string]string) {
func listTargetPackages(targetPackagePrefixes []string, allFlowPackages []string) (targets, map[string]string) {
//func listTargetPackages(targetPackagePrefixes []string, allFlowPackages []string) (map[string][]string, map[string]string) {
//func listTargetPackages(targetPackagePrefixes []string, allFlowPackages []string) (map[string][]string, map[string]string) {
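For readers following the review of listTargetPackages, here is a minimal sketch of the idea its doc comment describes: group packages under target prefixes and record which ones were "seen", so the leftovers can go to a catch-all job. This is not the code in utils/test_matrix/test_matrix.go. The names groupByPrefix, remaining and formatJob, the longest-prefix-wins rule, and the use of module-relative package paths are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// groupByPrefix assigns each package to the longest matching target prefix
// (an assumed rule). It returns the packages per prefix and a set of "seen"
// packages, so the caller can compute what is left for a catch-all job.
func groupByPrefix(prefixes, pkgs []string) (map[string][]string, map[string]bool) {
	// Try longer prefixes first so "engine/execution" wins over "engine".
	sorted := append([]string(nil), prefixes...)
	sort.Slice(sorted, func(i, j int) bool { return len(sorted[i]) > len(sorted[j]) })

	targets := make(map[string][]string)
	seen := make(map[string]bool)
	for _, pkg := range pkgs {
		for _, prefix := range sorted {
			// Match whole path segments only: "engine" must not claim "enginex/foo".
			if pkg == prefix || strings.HasPrefix(pkg, prefix+"/") {
				targets[prefix] = append(targets[prefix], pkg)
				seen[pkg] = true
				break
			}
		}
	}
	return targets, seen
}

// remaining returns the packages not claimed by any target prefix.
func remaining(pkgs []string, seen map[string]bool) []string {
	var rest []string
	for _, pkg := range pkgs {
		if !seen[pkg] {
			rest = append(rest, pkg)
		}
	}
	return rest
}

// formatJob renders one CI job as "name: pkg1 pkg2 ...".
func formatJob(name string, pkgs []string) string {
	return fmt.Sprintf("%s: %s", name, strings.Join(pkgs, " "))
}

func main() {
	// Hypothetical, module-relative package paths.
	pkgs := []string{"engine/access/rpc", "engine/execution/ingestion", "engine/common", "state/protocol", "utils/io"}
	prefixes := []string{"engine", "engine/execution", "state"}

	targets, seen := groupByPrefix(prefixes, pkgs)
	for _, prefix := range prefixes {
		fmt.Println(formatJob(prefix, targets[prefix]))
	}
	fmt.Println(formatJob("others", remaining(pkgs, seen)))
}
```

Returning the "seen" set alongside the grouped map keeps the leftover computation a single pass over the full package list.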
@@ -34,5 +36,6 @@ func (s *EpochJoinAndLeaveVNSuite) SetupTest() {
// TestEpochJoinAndLeaveVN should update verification nodes and assert healthy network conditions
// after the epoch transition completes. See health check function for details.
func (s *EpochJoinAndLeaveVNSuite) TestEpochJoinAndLeaveVN() {
unittest.SkipUnless(s.T(), unittest.TEST_TODO, "flaky")
unittest.SkipUnless(s.T(), unittest.TEST_TODO, "flaky")
can remove after merging with master :)
Unquarantined! Thanks for the fix!
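As background on the quarantine line discussed in this thread, the sketch below shows one common way such a gate can work: a test tagged with a reason is skipped unless an environment variable named after that reason is set. Whether the unittest.SkipUnless helper behaves this way is not shown in this PR. The names skipReason, shouldSkip, shouldSkipEnv and skipMessage, and the environment-variable convention, are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// skipReason names a category of quarantined tests (hypothetical type).
type skipReason string

const (
	reasonTODO  skipReason = "TEST_TODO"
	reasonFlaky skipReason = "TEST_FLAKY"
)

// shouldSkip reports whether a test tagged with reason should be skipped.
// Assumed convention: the test runs only when the variable named after the
// reason has a non-empty value.
func shouldSkip(reason skipReason, lookup func(string) string) bool {
	return lookup(string(reason)) == ""
}

// shouldSkipEnv applies shouldSkip to the process environment.
func shouldSkipEnv(reason skipReason) bool {
	return shouldSkip(reason, os.Getenv)
}

// skipMessage builds the text a test could pass to t.Skip.
func skipMessage(reason skipReason, note string) string {
	return fmt.Sprintf("skipping (%s): %s", reason, note)
}

func main() {
	if shouldSkipEnv(reasonTODO) {
		fmt.Println(skipMessage(reasonTODO, "flaky"))
		return
	}
	fmt.Println("running test")
}
```

Inside a real test the same check would guard a call to t.Skip, so removing a test from quarantine is a one-line deletion, as happened here.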
# push:
# paths:
# - '.github/workflows/flaky-test-monitor.yml'
# - '.github/workflows/ci.yml'
Should this be un-commented or removed?
Thanks. The Flaky Test Monitor CI workflow needs a major overhaul now that the main CI workflow uses custom runners and Docker caching. I will address that in a separate PR.
…ow/flow-go into misha/6894-buildjet-ci-test
[CI][Testing] Add BuildJet CI runners for unit tests, integration tests; cache Docker images
This PR encompasses many features / improvements to CI and flaky tests in order to
Major Features
Docker Build
- …) on a powerful BuildJet runner, and all the integration test jobs load the Docker images from cache. This eliminates the need for each integration test job to build Docker images; this job takes about 5 minutes to build and cache all Docker images.
Custom CI Runners
Unit Tests Improvements
- … engine/execution:buildjet-4vcpu-ubuntu-2204, network/test:buildjet-16vcpu-ubuntu-2204)
- engine job and all its sub-packages are good examples of how this works:
  engine/access engine/collection engine/common engine/consensus engine/execution/ingestion:buildjet-8vcpu-ubuntu-2204 engine/execution/computation engine/execution engine/verification engine:buildjet-4vcpu-ubuntu-2204
- … state and storage packages out of others into separate jobs since the others job was becoming too big
Unit Tests Splits (28 CI jobs)
- … engine unit tests into multiple smaller jobs (some with custom runners) because the single engine job was very large and flaky:
  engine/access engine/collection engine/common engine/consensus engine/execution/ingestion:buildjet-8vcpu-ubuntu-2204 engine/execution/computation engine/execution engine/verification engine:buildjet-4vcpu-ubuntu-2204
- network tests further split out into:
  network/alsp network/test/cohort1:buildjet-16vcpu-ubuntu-2204 network/test/cohort2:buildjet-4vcpu-ubuntu-2204 network/p2p/connection network/p2p/p2pnode:buildjet-4vcpu-ubuntu-2204 network/p2p/scoring network/p2p network
- module job split up into module/dkg, module
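The package:runner entries in the lists above can be read as "run this package's job on this runner". A minimal sketch of parsing them follows. It is not the PR's implementation: the target type, parseTarget and parseTargets are hypothetical names, and the fallback runner ubuntu-latest is an assumption about what a job gets when no runner is given.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultRunner is an assumed fallback for entries without a runner suffix;
// the actual default is not stated in this PR.
const defaultRunner = "ubuntu-latest"

// target pairs a package prefix with the CI runner label for its job.
type target struct {
	Package string
	Runner  string
}

// parseTarget splits an entry such as
// "engine/execution/ingestion:buildjet-8vcpu-ubuntu-2204" into its package
// prefix and runner label. Entries without ":" get defaultRunner.
func parseTarget(entry string) (target, error) {
	pkg, runner, found := strings.Cut(entry, ":")
	if pkg == "" {
		return target{}, fmt.Errorf("empty package in entry %q", entry)
	}
	if !found {
		return target{Package: pkg, Runner: defaultRunner}, nil
	}
	if runner == "" {
		return target{}, fmt.Errorf("empty runner in entry %q", entry)
	}
	return target{Package: pkg, Runner: runner}, nil
}

// parseTargets parses a whitespace-separated list of entries.
func parseTargets(list string) ([]target, error) {
	var out []target
	for _, entry := range strings.Fields(list) {
		t, err := parseTarget(entry)
		if err != nil {
			return nil, err
		}
		out = append(out, t)
	}
	return out, nil
}

func main() {
	targets, err := parseTargets("network/alsp network/test/cohort1:buildjet-16vcpu-ubuntu-2204 network")
	if err != nil {
		panic(err)
	}
	for _, t := range targets {
		fmt.Printf("%s -> %s\n", t.Package, t.Runner)
	}
}
```

Keeping the runner next to the package name means a job can be moved to a larger runner by editing a single entry.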
Integration Tests (15 CI jobs)
- … access integration tests (integration/tests/access/access_api_test.go) into smaller tests because the job was very large and flaky
Quarantined Flaky Tests
The following tests were quarantined because they kept failing or flaking even on the larger CI runners:
- TestEpochJoinAndLeaveVN (integration/tests/epochs/cohort2/epoch_join_and_leave_vn_test.go) - waiting for [Flaky Test] Modifies block rate in VN test to address sealing lagging finalization #4975 to be merged so this test can be unquarantined
- TestSealingAndVerificationPassThrough (integration/tests/bft/framework/passthrough_test.go)
- TestUnicastRateLimit_Messages (network/test/cohort1/network_test.go)
Unquarantined Tests
The following tests were removed from quarantine because they can now run on larger runners, where they no longer fail in CI. They ran without problems locally but used to fail in CI:
- TestMeshNetTestSuite (network/test/cohort1/meshengine_test.go)
- TestEchoEngineTestSuite (network/test/cohort2/echoengine_test.go)
- network/p2p/p2pnode/resourceManager_test.go increased to full load test. Previously this test had to operate at about 10% load capacity in order to pass in CI, but now it runs on a larger runner and can run at full load.
Other
- … localnet-test CI job as it's not relevant anymore and it was taking a long time to run

ref: https://github.com/dapperlabs/flow-go/issues/6894