chore: Parallelize Tests #197

rahulsom · 2023-07-17T14:55:32Z

This will use all available cores to run tests, and parallelize by all - suite/class/method. On machines with multiple cores, this will vastly improve test performance. These times are on my M1 MBP with 10 (8P + 2E) cores. They were reported by maven on running mvn verify.

I first ran mvn verify and ignored the time.
Then I ran it thrice without this change, and thrice with this change.

All times in seconds.

	Run 1	Run 2	Run 3	Average
Before	876	873	878	876
After	300	300	301	300
Savings				576
%				66

Testing done

All existing tests pass

Submitter checklist

Give feedback

Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
Ensure that the pull request title represents the desired changelog entry
Please describe what you did
Link to relevant issues in GitHub or Jira
Link to relevant pull requests, esp. upstream and downstream changes
Ensure you have provided tests - that demonstrates feature works or fixes the issue
Options

This will use all available cores to run tests, and parallelize by all - suite/class/method. On machines with multiple cores, this will vastly improve test performance. These times are on my M1 MBP with 10 (8P + 2E) cores. They were reported by maven on running `mvn verify`. I first ran `mvn verify` and ignored the time. Then I ran it thrice without this change, and thrice with this change. All times in seconds. | | Run 1 | Run 2 | Run 3 | Average | | ------ | ----: | ----: | ----: | ------: | | Before | 876 | 873 | 878 | 876 | | After | 300 | 300 | 301 | 300 | | Savings| | | | 576 | | % | | | | 66 |

allancth · 2023-07-18T03:11:01Z

I would love to merge this PR but the tests are failing when building in my Quad cores ~~notebook~~ laptop. Not sure why but I am looking into it but I am quite sure it's not introduced in this PR. If other maintainers has no issue with this, please help merge or let me know I can merge it. Thanks.

MarkEWaite · 2023-07-18T03:22:00Z

Thanks for checking @allancth. One of the benefits of parallel tests it that they sometimes expose undetected environment assumptions or execution order dependencies.

I don't see test failures for the pull request on any of the machines where I tested the pull request, including:

4 core Intel i5-2400 running Debian Linux testing with 32 GB RAM
4 core Intel i5-6400 running Ubuntu Linux 22.04 with 32 GB RAM
12 core AMD Ryzen 5 5600X running Red Hat Enterprise Linux 8 with 32 GB RAM
8 core Intel Atom C2750 running FreeBSD 13.1 with 32 GB RAM

Do the tests fail consistently for you or intermittently? Anything particular about the machine that is different from the configurations that I was running?

allancth · 2023-07-18T04:46:02Z

Thanks @MarkEWaite for the response.

I am still running the tests, making sure I am checking out the correct repo, tinkering with this and that. The laptop that is running the tests has a Quad core i7 Ivy Bridge chipset running Ubuntu 22.04 + 16GB RAM.

One thing that I think might have an effect on this PR is that I have disabled the ~~HyperThreading~~ SpeedStep feature in BIOS so that I can cap the CPU to run at 1.2GHz to prevent over heating, which happens easily. Having said that, as I am responding and running the tests in the background, I don't think this is the cause.

The number of tests that is failing is inconsistent, but so do the commands I am running. I think I will start the investigation of the warning "The POM for org.jenkins-ci.tools:maven-hpi-plugin:jar:3.47 is missing, no dependency information available" when I did a mvn verify or a mvn clean test. Maybe if you know the reason or if this warning could be the cause of the failing tests?

CORRECTION: Not HyperThreading but Intel's SpeedStep technology.

allancth · 2023-07-18T07:39:52Z

@MarkEWaite I've enabled SpeedStep in BIOS and reran the test. In one instance, there are 2 failures when running mvn verify. Next, I reran the test with for the particular failed test case and the test passed.

I think I can conclude the speed of the CPU as in my case here, affects the outcome of the test. It takes about ~30 mins (last test took ~26 mins) for each mvn verify. I'm not sure what else I can do.

Strange thing is even after I've commented out the change (surefire-plugin), the test still fail. I need to test more though.

EDIT: I have deleted the .m2 repo for jenkinsci and jenkins-ci and rebuilding the module. This might take a while. Do not let me stop this PR from being merged since it has no issue on other tested machines. It is very likely the problem is all at my end here.

MarkEWaite · 2023-07-18T11:04:21Z

I think that it would be unwise for this to be merged when tests are failing for you as the maintainer. The tests are most valuable for you as the maintainer. Others gain some small benefit from the existence of the tests, but you need them to be confident that any changes you make or any changes that you accept are in good condition.

Could you share the names of the tests that are failing and the test failure messages in case others can investigate? It may be possible to better understand the failure modes by reading the source code. The Jenkins test harness was recently enhanced to fail tests if they have not correctly ended jobs at the end of the test. That type of failure can depend on processor performance, memory performance, and disc speed.

I'll be out of the office until the end of the week, so I won't be able to provide more help until I return, but others may be able to explore further.

allancth · 2023-07-19T07:02:56Z

The result of mvn verify, 2 tests are failing

TriggeredBuildSelectorTest#testUseNewest
TriggeredBuildSelectorTest#testUseOldestNested

2 tests have error

LegacyJobConfigMigrationMonitorTest#workflowJob_param_copy_legacy_migration
LegacyJobConfigMigrationMonitorTest#workflowJob_param_copy_legacy_production

I've uploaded the build log here. I am still running the tests to make sure I am getting consistent outcome.

UPDATE: Running mvn verify with the changes commented out returns success. Build log here. I'll run a few more tests over the next few days to verify the outcome is consistent.

MarkEWaite · 2023-07-19T12:52:05Z

One of the failures is "180 second test timeout exceeded". That means one of the tests running in parallel on the machine was unable to complete in the 3 minutes that are usually allowed for tests. I think that means the machine running the tests will need to run fewer tests in parallel. I assume that may be due to insufficient memory or slow disc I/O. Either slow disc I/O or low memory are both valid and reasonable conditions for a plugin maintainer. Maintainers have different machines that they can use. No harm in that. I think that may indicate this pull request should not be merged but we should consider alternate ways of allowing parallel execution of tests on ci.jenkins.io while not requiring parallel execution of tests in all configurations. I think you should not merge this pull request until after I've returned to the office next week and can investigate further. I'm on vacation with my family now and won't be able to spend time evaluating alternatives. I left my yubikey at home, so I can't easily login to GitHub. ...

…

On Wed, Jul 19, 2023 at 1:03 AM allancth ***@***.***> wrote: The result of mvn verify, 2 tests are failing - TriggeredBuildSelectorTest#testUseNewest - TriggeredBuildSelectorTest#testUseOldestNested 2 tests have error - LegacyJobConfigMigrationMonitorTest.workflowJob_param_copy_legacy_migration - LegacyJobConfigMigrationMonitorTest.workflowJob_param_copy_legacy_production I've uploaded the build log here <https://drive.google.com/file/d/17Y0hUqkKUssHaiF7v9QMtNeWoSYjm1ag/view?usp=sharing>. I am still running the tests to make sure I am getting consistent outcome. — Reply to this email directly, view it on GitHub <#197 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABGIDPFBXEVPP242C2LWWDXQ6BCXANCNFSM6AAAAAA2NCEFDY> . You are receiving this because you were mentioned.Message ID: ***@***.***>

allancth · 2023-07-20T02:01:08Z

I ran mvn verify another 2 times.

Result for second test
- 2 failures, 1 error
- Failures:
  - CopyArtifactTest#testTriggeredBuildSelectorToMatrix:908
  - CopyArtifactTest#testTriggeredBuildSelectorWithParentOfParent:836
- Error:
  - LegacyJobConfigMigrationMonitorTest#workflowJob_param_copy_legacy_migration:577
Result of third test
- 0 failure, 1 error
- Error:
  - LegacyJobConfigMigrationMonitorTest#remove_succeeded:838
  - LegacyJobConfigMigrationMonitorTest#workflowJob_param_copy_legacy_migration:577

All 3 tests have the error LegacyJobConfigMigrationMonitorTest#workflowJob_param_copy_legacy_migration:577. The blanket solution I think we could have is to have the plugin move under a profile. By the default the profile is set to active and can be deactivated with mvn verify -P -parallel.

Meanwhile, I am looking into the surefire-plugin to see if it can specify specific tests or group certain tests to run in parallel, but I think a more thorough understanding of the test structure is required.

allancth · 2023-07-20T07:24:50Z

I think there's a workaround for my case. If the property jenkins.test.timeout is 0, jenkins-test-harness will have infinite timeout. I guess I can specify this parameter when I am building the plugin. There's nothing that needs to be changed, I think.

rahulsom requested a review from a team as a code owner July 17, 2023 14:55

allancth merged commit d689167 into jenkinsci:master Jul 31, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: Parallelize Tests #197

chore: Parallelize Tests #197

rahulsom commented Jul 17, 2023

Submitter checklist

allancth commented Jul 18, 2023 •

edited

Loading

MarkEWaite commented Jul 18, 2023

allancth commented Jul 18, 2023 •

edited

Loading

allancth commented Jul 18, 2023 •

edited

Loading

MarkEWaite commented Jul 18, 2023

allancth commented Jul 19, 2023 •

edited

Loading

MarkEWaite commented Jul 19, 2023 via email

allancth commented Jul 20, 2023

allancth commented Jul 20, 2023

chore: Parallelize Tests #197

chore: Parallelize Tests #197

Conversation

rahulsom commented Jul 17, 2023

Testing done

Submitter checklist

allancth commented Jul 18, 2023 • edited Loading

MarkEWaite commented Jul 18, 2023

allancth commented Jul 18, 2023 • edited Loading

allancth commented Jul 18, 2023 • edited Loading

MarkEWaite commented Jul 18, 2023

allancth commented Jul 19, 2023 • edited Loading

MarkEWaite commented Jul 19, 2023 via email

allancth commented Jul 20, 2023

allancth commented Jul 20, 2023

allancth commented Jul 18, 2023 •

edited

Loading

allancth commented Jul 18, 2023 •

edited

Loading

allancth commented Jul 18, 2023 •

edited

Loading

allancth commented Jul 19, 2023 •

edited

Loading