chore: Parallelize Tests #197
This will use all available cores to run tests, and parallelize by all - suite/class/method. On machines with multiple cores, this will vastly improve test performance.

These times are on my M1 MBP with 10 (8P + 2E) cores. They were reported by Maven on running `mvn verify`. I first ran `mvn verify` and ignored the time. Then I ran it thrice without this change, and thrice with this change. All times in seconds.

| | Run 1 | Run 2 | Run 3 | Average |
| ------ | ----: | ----: | ----: | ------: |
| Before | 876 | 873 | 878 | 876 |
| After | 300 | 300 | 301 | 300 |
| Savings| | | | 576 |
| % | | | | 66 |
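The diff itself is not reproduced in this conversation, so the following is only a sketch of what a Surefire configuration matching the description above might look like. It assumes the JUnit 4.7+ provider and uses Surefire's `parallel`, `threadCount`, and `perCoreThreadCount` parameters; the specific values are assumptions, not the settings from this PR.

```xml
<!-- Sketch only: values are assumptions, not copied from this PR. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- "all" runs suites, classes, and methods in parallel -->
    <parallel>all</parallel>
    <!-- With perCoreThreadCount enabled, threadCount is multiplied
         by the number of available cores -->
    <threadCount>1</threadCount>
    <perCoreThreadCount>true</perCoreThreadCount>
  </configuration>
</plugin>
```

Scaling the thread count by core count means the degree of parallelism differs from machine to machine, which is relevant to the timing-dependent failures discussed in the comments.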
I would love to merge this PR, but the tests are failing when I build on my quad-core machine.
Thanks for checking @allancth. One of the benefits of parallel tests is that they sometimes expose undetected environment assumptions or execution order dependencies. I don't see test failures for the pull request on any of the machines where I tested the pull request, including:
Do the tests fail consistently for you or intermittently? Is there anything particular about the machine that is different from the configurations that I was running?
Thanks @MarkEWaite for the response. I am still running the tests, making sure I am checking out the correct repo, tinkering with this and that. The laptop that is running the tests has a quad-core i7 Ivy Bridge chipset running Ubuntu 22.04 + 16GB RAM. One thing that I think might have an effect on this PR is that I have disabled the

The number of failing tests is inconsistent, but so are the commands I am running. I think I will start by investigating the warning "The POM for org.jenkins-ci.tools:maven-hpi-plugin:jar:3.47 is missing, no dependency information available" that appears when I run `mvn verify` or `mvn clean test`. Do you know the reason for this warning, or whether it could be the cause of the failing tests?

CORRECTION: Not HyperThreading but Intel's SpeedStep technology.
@MarkEWaite I've enabled SpeedStep in BIOS and reran the test. In one instance, there are 2 failures when running

I think I can conclude that the speed of the CPU, as in my case here, affects the outcome of the tests. It takes about 30 mins (last test took ~26 mins) for each

The strange thing is that even after I've commented out the change (surefire-plugin), the tests still fail. I need to test more though.

EDIT: I have deleted the .m2 repo for jenkinsci and jenkins-ci and am rebuilding the module. This might take a while.

Do not let me stop this PR from being merged since it has no issue on other tested machines. It is very likely the problem is all at my end here.
I think that it would be unwise for this to be merged when tests are failing for you as the maintainer. The tests are most valuable for you as the maintainer. Others gain some small benefit from the existence of the tests, but you need them to be confident that any changes you make or any changes that you accept are in good condition.

Could you share the names of the tests that are failing and the test failure messages in case others can investigate? It may be possible to better understand the failure modes by reading the source code. The Jenkins test harness was recently enhanced to fail tests if they have not correctly ended jobs at the end of the test. That type of failure can depend on processor performance, memory performance, and disc speed.

I'll be out of the office until the end of the week, so I won't be able to provide more help until I return, but others may be able to explore further.
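The result of mvn verify, 2 tests are failing
- TriggeredBuildSelectorTest#testUseNewest
- TriggeredBuildSelectorTest#testUseOldestNested

2 tests have error
- LegacyJobConfigMigrationMonitorTest.workflowJob_param_copy_legacy_migration
- LegacyJobConfigMigrationMonitorTest.workflowJob_param_copy_legacy_production

I've uploaded the build log here. I am still running the tests to make sure I am getting a consistent outcome. UPDATE: Running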
One of the failures is "180 second test timeout exceeded". That means one of the tests running in parallel on the machine was unable to complete in the 3 minutes that are usually allowed for tests. I think that means the machine running the tests will need to run fewer tests in parallel. I assume that may be due to insufficient memory or slow disc I/O.

Slow disc I/O and low memory are both valid and reasonable conditions for a plugin maintainer. Maintainers have different machines that they can use. No harm in that.

I think that may indicate this pull request should not be merged, but we should consider alternate ways of allowing parallel execution of tests on ci.jenkins.io while not requiring parallel execution of tests in all configurations. I think you should not merge this pull request until after I've returned to the office next week and can investigate further. I'm on vacation with my family now and won't be able to spend time evaluating alternatives.

I left my YubiKey at home, so I can't easily log in to GitHub.
On Wed, Jul 19, 2023 at 1:03 AM allancth ***@***.***> wrote:
The result of mvn verify, 2 tests are failing
- TriggeredBuildSelectorTest#testUseNewest
- TriggeredBuildSelectorTest#testUseOldestNested
2 tests have error
- LegacyJobConfigMigrationMonitorTest.workflowJob_param_copy_legacy_migration
- LegacyJobConfigMigrationMonitorTest.workflowJob_param_copy_legacy_production
I've uploaded the build log here <https://drive.google.com/file/d/17Y0hUqkKUssHaiF7v9QMtNeWoSYjm1ag/view?usp=sharing>. I am still running the tests to make sure I am getting consistent outcome.
I ran
All 3 tests have the error LegacyJobConfigMigrationMonitorTest#workflowJob_param_copy_legacy_migration:577.

The blanket solution I think we could have is to move the plugin configuration under a profile. By default the profile is active and can be deactivated with

Meanwhile, I am looking into the surefire-plugin to see if it can select specific tests or group certain tests to run in parallel, but I think a more thorough understanding of the test structure is required.
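The profile idea in this comment could be sketched roughly as follows. This is not part of the PR: the profile id `parallel-tests` is hypothetical, and the Surefire values are assumptions chosen only to illustrate the structure.

```xml
<!-- Sketch only: the profile id "parallel-tests" is hypothetical. -->
<profiles>
  <profile>
    <id>parallel-tests</id>
    <activation>
      <!-- Active unless explicitly deactivated on the command line -->
      <activeByDefault>true</activeByDefault>
    </activation>
    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-surefire-plugin</artifactId>
          <configuration>
            <!-- Assumed values, not taken from this PR -->
            <parallel>all</parallel>
            <threadCount>1</threadCount>
            <perCoreThreadCount>true</perCoreThreadCount>
          </configuration>
        </plugin>
      </plugins>
    </build>
  </profile>
</profiles>
```

With a profile like this, Maven's standard deactivation syntax would be `mvn verify -P '!parallel-tests'`. One caveat with `activeByDefault` is that Maven also switches such a profile off whenever another profile from the same POM is activated explicitly.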
I think there's a workaround for my case. If the property jenkins.test.timeout is 0, jenkins-test-harness will have an infinite timeout. I guess I can specify this parameter when I am building the plugin. There's nothing that needs to be changed, I think.
Testing done
All existing tests pass
Submitter checklist