Tracking flaky tests #41
Test frameworks that I've used all had the ability to add tags or labels to individual test cases. Uses of these tags range from skipping tests that require network availability or other hard resources, to isolating platform-specific tests, to tracking flaky tests. I've only looked at gtest, not any of the other test suites we use. gtest itself has only one mechanism for selecting which tests to run, and it's based on the test name. That means we'd have to get ugly with test names like FLAKY_WINONLY_ActualTestName and then invoke gtest with a filter argument. Even this doesn't buy us everything we would want. A moderately sophisticated setup would let us run flaky tests but not fail the build when they fail; the most sophisticated would run the tests and, if they start failing or succeeding consistently, raise an error or open a GitHub issue. To get something like that across our myriad testing providers, I'd expect we would need to process the xunit or TAP output ourselves.
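As a rough illustration of that last point, here is a minimal sketch of post-processing xunit (JUnit XML) output ourselves. It assumes a hypothetical FLAKY_ prefix in test names; the file path and the prefix convention are placeholders, not anything we currently use:

```python
#!/usr/bin/env python3
# Sketch: split failing test cases in a JUnit/xunit XML report into "hard"
# failures and failures of tests whose names carry a FLAKY_ marker, so flaky
# failures can be reported without failing the build.
import sys
import xml.etree.ElementTree as ElementTree


def split_failures(xml_path):
    hard, flaky = [], []
    for case in ElementTree.parse(xml_path).getroot().iter('testcase'):
        if case.find('failure') is None and case.find('error') is None:
            continue  # test passed or was skipped
        name = '{}.{}'.format(case.get('classname', ''), case.get('name', ''))
        (flaky if 'FLAKY_' in name else hard).append(name)
    return hard, flaky


if __name__ == '__main__':
    # e.g. ./check_results.py build/some_package/test_results/some_test.xml  (hypothetical path)
    hard, flaky = split_failures(sys.argv[1])
    for name in flaky:
        print('flaky failure (not fatal): ' + name)
    for name in hard:
        print('failure: ' + name)
    sys.exit(1 if hard else 0)
```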
Summary of flaky test categories.

- Flaky tests requiring missing features
  - rmw_get_node_names forgetting dead nodes (ros2/ros2#371):
  - Python wait_for_service (ros2/rclpy#58, ros2/rclpy#161, ros2/system_tests#244):
- Unexpected test output preventing launch_testing from matching (assignee @dhood): eProsima/Fast-DDS#128, ros2/demos#126
- Test output not being captured correctly
- Shutdown not being triggered/responded to correctly (assignee @dhood): ros2/rmw_implementation#25
- Performance-related
- Maybe performance-related (assignee @dirk-thomas): ros2/rclcpp#355, ros2/system_tests#217
- Looks like an actual bug (assignee @clalancette): ros2/ros2#387
- Service not available
- Parameter-related (assignee @dhood): ros2/rclcpp#356, ros2/rcl_interfaces#18, ros2/rmw_fastrtps#143, ros2/rmw_fastrtps#142
- Errors during shutdown (assignee @dhood): ros2/system_tests#220
- No connection
Update: this makes all test_communication tests pass on the nightlies: http://ci.ros2.org/view/nightly/job/nightly_win_rep/898/#showFailuresLink One thing to keep in mind is that the nightlies will now take much longer as
@mikaelarguedas @ros2/team I'm thinking about closing this old list of flaky tests. We definitely have new flakiness, but it is tracked elsewhere. Any objections?
@clalancette The goal is to close this in favor of individual tickets for all the tests listed here that are still flaky, is that correct?
My main motivation is that this list of flaky tests is pretty stale now. Some of them are still around (
Sounds good; as long as we have a way to track them and not forget about them, that works for me 👍
I think I now have issues covering all of the problems that we still had lingering from this list. In particular:
The rest don't appear to be happening anymore. I'm going to close this out in favor of those split out tickets; feel free to comment here or add new tickets if I've missed anything.
To avoid reducing the SNR I'll open a meta ticket for this.
TL;DR we need a good way to track flaky tests over time.
Long term solution: improve the testing logic to run every single test many times, whether or not another test of the same package failed. This is captured in ros2/ci#19. We could also find a way to label tests as flaky on a per-build basis; @nuclearsandwich mentioned that he has done that with other test frameworks in the past, and we could look into doing it for our test suites.
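Not an existing tool, just a sketch of what the rerun-and-classify idea could look like for a single test executable (the command, script name, and repetition count below are made up):

```python
#!/usr/bin/env python3
# Sketch: run one test command repeatedly and classify it as stable, always
# failing, or flaky based on the mix of passing and failing runs.
import subprocess
import sys


def classify(command, runs=20):
    passes = sum(
        subprocess.run(command, capture_output=True).returncode == 0
        for _ in range(runs))
    if passes == runs:
        return 'stable'
    if passes == 0:
        return 'always failing'
    return 'flaky ({} of {} runs passed)'.format(passes, runs)


if __name__ == '__main__':
    # e.g. ./classify_test.py ./test_publisher_subscriber  (hypothetical test binary)
    print(classify(sys.argv[1:]))
```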
Short term solution: aggregate failures over repeated nightlies by poking the Jenkins API. Minimal script available here
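(This is not the script linked above, just a sketch of the approach: walk recent builds of a nightly job through the Jenkins JSON API and count how often each test case failed. The Jenkins URL and job name are placeholders.)

```python
#!/usr/bin/env python3
# Sketch: aggregate per-test failure counts across recent builds of one
# Jenkins job using the JSON API (.../api/json and .../testReport/api/json).
import json
import urllib.error
import urllib.request
from collections import Counter

JENKINS_URL = 'http://ci.ros2.org'      # placeholder
JOB_NAME = 'nightly_linux_repeated'     # placeholder job name


def get_json(url):
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def failing_cases(build_url):
    """Yield 'classname.name' for every failing case in one build's test report."""
    try:
        report = get_json(build_url + 'testReport/api/json')
    except urllib.error.HTTPError:
        return  # this build has no test report
    for suite in report.get('suites', []):
        for case in suite.get('cases', []):
            if case.get('status') in ('FAILED', 'REGRESSION'):
                yield '{}.{}'.format(case['className'], case['name'])


def main():
    job = get_json('{}/job/{}/api/json?tree=builds[url]'.format(JENKINS_URL, JOB_NAME))
    failures = Counter()
    for build in job['builds']:
        failures.update(failing_cases(build['url']))
    for test, count in failures.most_common():
        print('{:4d}  {}'.format(count, test))


if __name__ == '__main__':
    main()
```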
Result of the script run today on all jobs since the beta2 release: