This repository has been archived by the owner on Feb 4, 2021. It is now read-only.

Tracking flaky tests #41

Closed
mikaelarguedas opened this issue Jul 21, 2017 · 8 comments
Labels
enhancement New feature or request

Comments

@mikaelarguedas
Member

To avoid reducing the SNR I'll open a meta ticket for this.

TL;DR we need a good way to track flaky tests over time.
Long term solution: Improve the testing logic to run every single test many times, whether or not another test of the same package failed. This is captured in ros2/ci#19. We could also find a way to label tests as flaky on a per-build basis; @nuclearsandwich mentioned that he did that in other test frameworks in the past, and we could look into doing it on our test suites.

Short term solution: Aggregate failures on repeated nightlies by querying the Jenkins API. Minimal script available here

Result of the script run today on all jobs since the beta2 release:

$ python3 ./aggregate_flaky_tests.py -n 21 --skip
 - [ ] AllocatorTest__rmw_connext_cpp.allocator_unique_ptr, ['nightly_linux_repeated']
 - [ ] TestGetNodeNames__rmw_connext_cpp.test_rcl_get_node_names, ['nightly_osx_repeated']
 - [ ] TestGetNodeNames__rmw_fastrtps_cpp.test_rcl_get_node_names, ['nightly_linux-aarch64_repeated', 'nightly_osx_repeated']
 - [ ] TestServiceFixture__rmw_connext_cpp.test_service_nominal, ['nightly_osx_repeated']
 - [ ] TestServiceFixture__rmw_fastrtps_cpp.test_service_nominal, ['nightly_osx_repeated']
 - [ ] WaitSetTestFixture__rmw_connext_cpp.zero_timeout, ['nightly_win_rep']
 - [ ] WaitSetTestFixture__rmw_fastrtps_cpp.finite_timeout, ['nightly_win_rep']
 - [ ] projectroot.gtest_executor__rmw_connext_cpp, ['nightly_linux_repeated']
 - [ ] projectroot.gtest_executor__rmw_fastrtps_cpp, ['nightly_osx_repeated']
 - [ ] projectroot.gtest_local_parameters__rmw_connext_cpp, ['nightly_linux_repeated', 'nightly_win_rep']
 - [ ] projectroot.gtest_local_parameters__rmw_fastrtps_cpp, ['nightly_linux-aarch64_repeated', 'nightly_linux_repeated']
 - [ ] projectroot.gtest_multiple_service_calls__rmw_connext_cpp, ['nightly_linux_repeated']
 - [ ] projectroot.gtest_multithreaded__rmw_connext_cpp, ['nightly_linux_repeated', 'nightly_osx_repeated', 'nightly_win_rep']
 - [ ] projectroot.gtest_multithreaded__rmw_fastrtps_cpp, ['nightly_linux_repeated', 'nightly_osx_repeated']
 - [ ] projectroot.gtest_services_in_constructor__rmw_connext_cpp, ['nightly_osx_repeated']
 - [ ] projectroot.gtest_services_in_constructor__rmw_fastrtps_cpp, ['nightly_osx_repeated']
 - [ ] projectroot.gtest_timeout_subscriber__rmw_connext_cpp, ['nightly_win_rep']
 - [ ] projectroot.rclpytests, ['nightly_osx_repeated']
 - [ ] projectroot.rosidl_generator_py.test_publisher_subscriber__Nested__rclpy__rmw_fastrtps_cpp, ['nightly_linux_repeated']
 - [ ] projectroot.rosidl_generator_py.test_requester_replier__Empty__rclpy__rmw_fastrtps_cpp, ['nightly_linux-aarch64_repeated', 'nightly_linux_repeated', 'nightly_osx_repeated', 'nightly_win_rep']
 - [ ] projectroot.rosidl_generator_py.test_requester_replier__Primitives__rclpy__rmw_fastrtps_cpp, ['nightly_linux-aarch64_repeated', 'nightly_linux_repeated', 'nightly_osx_repeated', 'nightly_win_rep']
 - [ ] projectroot.test_client_scope_consistency_cpp__rmw_connext_cpp, ['nightly_linux_repeated', 'nightly_osx_repeated', 'nightly_win_rep']
 - [ ] projectroot.test_client_scope_consistency_cpp__rmw_fastrtps_cpp, ['nightly_linux_repeated', 'nightly_osx_repeated']
 - [ ] projectroot.test_client_scope_cpp__rmw_connext_cpp, ['nightly_linux_repeated', 'nightly_osx_repeated']
 - [ ] projectroot.test_client_scope_cpp__rmw_fastrtps_cpp, ['nightly_osx_repeated']
 - [ ] projectroot.test_composition__rmw_connext_cpp, ['nightly_linux_repeated', 'nightly_osx_repeated']
 - [ ] projectroot.test_composition__rmw_fastrtps_cpp, ['nightly_linux-aarch64_repeated', 'nightly_linux_repeated', 'nightly_osx_repeated', 'nightly_win_rep']
 - [ ] projectroot.test_demo_cyclic_pipeline__rmw_connext_cpp, ['nightly_linux_repeated', 'nightly_osx_repeated']
 - [ ] projectroot.test_demo_cyclic_pipeline__rmw_fastrtps_cpp, ['nightly_linux-aarch64_repeated', 'nightly_linux_repeated', 'nightly_osx_repeated']
 - [ ] projectroot.test_externally_defined_services, ['nightly_osx_repeated']
 - [ ] projectroot.test_find_weak_nodes, ['nightly_osx_repeated']
 - [ ] projectroot.test_get_node_names__rmw_connext_cpp, ['nightly_osx_repeated']
 - [ ] projectroot.test_get_node_names__rmw_fastrtps_cpp, ['nightly_linux-aarch64_repeated', 'nightly_osx_repeated']
 - [ ] projectroot.test_graph__rmw_fastrtps_cpp, ['nightly_linux_repeated', 'nightly_osx_repeated', 'nightly_win_rep']
 - [ ] projectroot.test_node, ['nightly_linux-aarch64_repeated']
 - [ ] projectroot.test_pendulum__rmw_connext_cpp.test_pendulum__rmw_connext_cpp, ['nightly_linux_repeated']
 - [ ] projectroot.test_pendulum_teleop__rmw_connext_cpp, ['nightly_linux_repeated']
 - [ ] projectroot.test_secure_publisher_subscriber__DynamicArrayNested__rmw_fastrtps_cpp__secure_comm_0, ['nightly_linux_repeated', 'nightly_osx_repeated', 'nightly_win_rep']
 - [ ] projectroot.test_secure_publisher_subscriber__DynamicArrayNested__rmw_fastrtps_cpp__secure_comm_1, ['nightly_win_rep']
 - [ ] projectroot.test_secure_publisher_subscriber__DynamicArrayNested__rmw_fastrtps_cpp__secure_comm_2, ['nightly_win_rep']
 - [ ] projectroot.test_secure_publisher_subscriber__DynamicArrayNested__rmw_fastrtps_cpp__secure_comm_3, ['nightly_win_rep']
 - [ ] projectroot.test_secure_publisher_subscriber__Empty__rmw_fastrtps_cpp__secure_comm_0, ['nightly_win_rep']
 - [ ] projectroot.test_secure_publisher_subscriber__Empty__rmw_fastrtps_cpp__secure_comm_2, ['nightly_win_rep']
 - [ ] projectroot.test_service__rmw_connext_cpp, ['nightly_osx_repeated']
 - [ ] projectroot.test_service__rmw_fastrtps_cpp, ['nightly_osx_repeated']
 - [ ] projectroot.test_services__rmw_connext_cpp, ['nightly_win_rep']
 - [ ] projectroot.test_services__rmw_fastrtps_cpp, ['nightly_linux_repeated']
 - [ ] projectroot.test_services_cpp__rmw_fastrtps_cpp, ['nightly_linux_repeated']
 - [ ] projectroot.test_showimage_cam2image__rmw_fastrtps_cpp, ['nightly_linux-aarch64_repeated', 'nightly_linux_repeated', 'nightly_win_rep']
 - [ ] projectroot.test_tlsf__rmw_connext_cpp, ['nightly_linux_repeated']
 - [ ] projectroot.test_tutorial_add_two_ints_server_add_two_ints_client__rmw_connext_cpp, ['nightly_osx_repeated']
 - [ ] projectroot.test_tutorial_add_two_ints_server_add_two_ints_client__rmw_fastrtps_cpp, ['nightly_linux-aarch64_repeated', 'nightly_linux_repeated', 'nightly_win_rep']
 - [ ] projectroot.test_tutorial_add_two_ints_server_add_two_ints_client_async__rmw_fastrtps_cpp, ['nightly_linux_repeated', 'nightly_osx_repeated', 'nightly_win_rep']
 - [ ] projectroot.test_tutorial_list_parameters__rmw_connext_cpp, ['nightly_linux_repeated']
 - [ ] projectroot.test_tutorial_list_parameters__rmw_fastrtps_cpp, ['nightly_linux-aarch64_repeated', 'nightly_linux_repeated', 'nightly_osx_repeated']
 - [ ] projectroot.test_tutorial_list_parameters_async__rmw_fastrtps_cpp, ['nightly_linux-aarch64_repeated', 'nightly_linux_repeated', 'nightly_osx_repeated']
 - [ ] projectroot.test_tutorial_parameter_events__rmw_connext_cpp, ['nightly_osx_repeated']
 - [ ] projectroot.test_tutorial_parameter_events__rmw_fastrtps_cpp, ['nightly_osx_repeated']
 - [ ] projectroot.test_tutorial_parameter_events_async__rmw_fastrtps_cpp, ['nightly_linux_repeated']
 - [ ] projectroot.test_tutorial_set_and_get_parameters__rmw_connext_cpp, ['nightly_osx_repeated']
 - [ ] projectroot.test_tutorial_set_and_get_parameters__rmw_fastrtps_cpp, ['nightly_linux-aarch64_repeated', 'nightly_linux_repeated', 'nightly_osx_repeated']
 - [ ] projectroot.test_tutorial_set_and_get_parameters_async__rmw_fastrtps_cpp, ['nightly_linux-aarch64_repeated', 'nightly_linux_repeated', 'nightly_osx_repeated']
 - [ ] projectroot.test_wait__rmw_connext_cpp, ['nightly_win_rep']
 - [ ] projectroot.test_wait__rmw_fastrtps_cpp, ['nightly_win_rep']
 - [ ] test.test_timer.test_timer_zero_callbacks1000hertz, ['nightly_osx_repeated']
 - [ ] test_add_two_ints_server_add_two_ints_client__rmw_connext_cpp_.test_executable, ['nightly_osx_repeated']
 - [ ] test_add_two_ints_server_add_two_ints_client__rmw_fastrtps_cpp_.test_executable, ['nightly_linux-aarch64_repeated']
 - [ ] test_add_two_ints_server_add_two_ints_client__rmw_fastrtps_cpp_Release.test_executable, ['nightly_win_rep']
 - [ ] test_add_two_ints_server_add_two_ints_client_async__rmw_fastrtps_cpp_.test_executable, ['nightly_linux_repeated', 'nightly_osx_repeated']
 - [ ] test_add_two_ints_server_add_two_ints_client_async__rmw_fastrtps_cpp_Release.test_executable, ['nightly_win_rep']
 - [ ] test_client_scope_consistency_cpp__rmw_fastrtps_cpp_.test_client_scope_consistency_cpp, ['nightly_linux_repeated', 'nightly_osx_repeated']
 - [ ] test_client_scope_cpp__rmw_fastrtps_cpp_.test_client_scope_cpp, ['nightly_osx_repeated']
 - [ ] test_composition__rmw_connext_cpp_.test_dlopen_composition, ['nightly_osx_repeated']
 - [ ] test_composition__rmw_connext_cpp_.test_linktime_composition, ['nightly_osx_repeated']
 - [ ] test_composition__rmw_fastrtps_cpp_.test_dlopen_composition, ['nightly_osx_repeated']
 - [ ] test_composition__rmw_fastrtps_cpp_Release.test_api_srv_composition, ['nightly_win_rep']
 - [ ] test_multithreaded__rmw_connext_cpp.multi_access_publisher_intra_process, ['nightly_linux_repeated', 'nightly_osx_repeated', 'nightly_win_rep']
 - [ ] test_multithreaded__rmw_fastrtps_cpp.multi_consumer_intra_process, ['nightly_linux_repeated', 'nightly_osx_repeated']
 - [ ] test_pendulum__rmw_connext_cpp.test_executable, ['nightly_linux_repeated']
 - [ ] test_publisher_subscriber__Nested__rclpy__rmw_fastrtps_cpp_.test_publisher_subscriber, ['nightly_linux_repeated']
 - [ ] test_requester_replier__Empty__rclpy__rmw_fastrtps_cpp_.test_requester_replier, ['nightly_linux-aarch64_repeated', 'nightly_linux_repeated', 'nightly_osx_repeated']
 - [ ] test_requester_replier__Empty__rclpy__rmw_fastrtps_cpp_Release.test_requester_replier, ['nightly_win_rep']
 - [ ] test_requester_replier__Primitives__rclpy__rmw_fastrtps_cpp_.test_requester_replier, ['nightly_linux-aarch64_repeated', 'nightly_linux_repeated', 'nightly_osx_repeated']
 - [ ] test_requester_replier__Primitives__rclpy__rmw_fastrtps_cpp_Release.test_requester_replier, ['nightly_win_rep']
 - [ ] test_secure_publisher_subscriber__DynamicArrayNested__rmw_fastrtps_cpp__secure_comm_0_.test_secure_publisher_subscriber, ['nightly_linux_repeated', 'nightly_osx_repeated']
 - [ ] test_secure_publisher_subscriber__DynamicArrayNested__rmw_fastrtps_cpp__secure_comm_0_Release.test_secure_publisher_subscriber, ['nightly_win_rep']
 - [ ] test_secure_publisher_subscriber__DynamicArrayNested__rmw_fastrtps_cpp__secure_comm_1_Release.test_secure_publisher_subscriber, ['nightly_win_rep']
 - [ ] test_secure_publisher_subscriber__DynamicArrayNested__rmw_fastrtps_cpp__secure_comm_2_Release.test_secure_publisher_subscriber, ['nightly_win_rep']
 - [ ] test_secure_publisher_subscriber__DynamicArrayNested__rmw_fastrtps_cpp__secure_comm_3_Release.test_secure_publisher_subscriber, ['nightly_win_rep']
 - [ ] test_secure_publisher_subscriber__Empty__rmw_fastrtps_cpp__secure_comm_0_Release.test_secure_publisher_subscriber, ['nightly_win_rep']
 - [ ] test_secure_publisher_subscriber__Empty__rmw_fastrtps_cpp__secure_comm_2_Release.test_secure_publisher_subscriber, ['nightly_win_rep']
 - [ ] test_services__rmw_connext_cpp_Release.test_services, ['nightly_win_rep']
 - [ ] test_services_cpp__rmw_fastrtps_cpp_.test_services_cpp, ['nightly_linux_repeated']
 - [ ] test_showimage_cam2image__rmw_fastrtps_cpp_.test_reliable_qos, ['nightly_linux-aarch64_repeated', 'nightly_linux_repeated']
 - [ ] test_showimage_cam2image__rmw_fastrtps_cpp_Release.test_reliable_qos, ['nightly_win_rep']
 - [ ] test_timeout_subscriber__rmw_connext_cpp.timeout_subscriber, ['nightly_win_rep']
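
For readers without access to the script, the aggregation step can be sketched roughly as follows. This is not the actual aggregate_flaky_tests.py; it is a minimal sketch that assumes each build's test report has already been fetched and parsed into the JSON shape the Jenkins JUnit plugin is expected to return (suites containing cases with className, name and status). The function names and the set of failing statuses are assumptions made for illustration, and fetching from Jenkins is left out entirely.

```python
from collections import defaultdict

# Assumption: statuses the Jenkins JUnit plugin reports for a failing case.
FAILING_STATUSES = {"FAILED", "REGRESSION"}


def failed_tests(report):
    """Return the set of 'className.name' ids that failed in one parsed report."""
    failed = set()
    for suite in report.get("suites", []):
        for case in suite.get("cases", []):
            if case.get("status") in FAILING_STATUSES:
                failed.add("{}.{}".format(case["className"], case["name"]))
    return failed


def aggregate(reports_by_job):
    """Map each failing test id to the sorted list of jobs it failed in.

    reports_by_job: {job_name: [parsed test report, ...]}, one report per build.
    """
    jobs_by_test = defaultdict(set)
    for job, reports in reports_by_job.items():
        for report in reports:
            for test_id in failed_tests(report):
                jobs_by_test[test_id].add(job)
    return {test: sorted(jobs) for test, jobs in sorted(jobs_by_test.items())}


def format_checklist(aggregated):
    """Render the aggregate in the same checklist style as the output above."""
    return [" - [ ] {}, {}".format(test, jobs) for test, jobs in aggregated.items()]
```

Calling format_checklist(aggregate(reports_by_job)) on the reports of the last N builds of each repeated nightly job yields one checklist line per test that failed at least once, followed by the jobs it failed in.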
@nuclearsandwich
Member

@nuclearsandwich mentioned that he did that in other test frameworks in the past

Test frameworks that I've used all had the ability to add tags or labels to individual test cases. Use of these tags ranges from skipping tests that require network availability or other hard resources to isolating platform specific tests to tracking flaky tests.

I've only looked at gtest, not any of the other test suites we use. gtest itself has only one mechanism for selecting which tests to run, and it's based on the test name. That means we'd have to get ugly with test names like FLAKY_WINONLY_ActualTestName and then invoke gtest with a filter argument.

Even this doesn't buy us everything we would want. A moderately sophisticated use would let us run flaky tests without failing the build when they fail. The most sophisticated use would have us run the tests and, if they start failing or succeeding consistently, raise an error or open a GitHub issue.
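
To make the "start failing or succeeding consistently" idea concrete, here is a minimal sketch of the decision for a single test marked flaky. The window size (min_runs) and the label strings are hypothetical choices, not something any of our tooling defines.

```python
def classify(history, min_runs=5):
    """Classify a test from its recent results.

    history: list of booleans, oldest first; True means the run passed.
    min_runs: hypothetical window size; only the last min_runs results count.
    """
    recent = history[-min_runs:]
    if len(recent) < min_runs:
        return "insufficient-data"
    if all(recent):
        # A test marked flaky that now always passes: the marker can go.
        return "consistently-passing"
    if not any(recent):
        # A test marked flaky that now always fails: this is a real failure.
        return "consistently-failing"
    return "flaky"
```

Either of the two "consistently" results is the point at which a build error or a new issue would be raised.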

In order to get something like that across our myriad testing providers I'd expect we would need to process xunit or tap output ourselves.
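
As a rough sketch of what processing the xunit output ourselves could look like, the snippet below splits failures into "hard" and "flaky" using the name-prefix convention mentioned above. The FLAKY_ prefix, the function name and the exit-code policy are assumptions for illustration; the only thing taken as given is the common JUnit-style layout of testcase elements carrying classname and name attributes with failure or error children.

```python
import xml.etree.ElementTree as ET

# Hypothetical naming convention, as in FLAKY_WINONLY_ActualTestName.
FLAKY_PREFIX = "FLAKY_"


def split_failures(xunit_xml):
    """Return (hard_failures, flaky_failures) as lists of 'classname.name' ids."""
    root = ET.fromstring(xunit_xml)
    hard, flaky = [], []
    # iter() also covers reports whose root is <testsuites> wrapping <testsuite>.
    for case in root.iter("testcase"):
        if case.find("failure") is None and case.find("error") is None:
            continue  # passed or skipped
        name = case.get("name", "")
        test_id = "{}.{}".format(case.get("classname", ""), name)
        if name.startswith(FLAKY_PREFIX):
            flaky.append(test_id)
        else:
            hard.append(test_id)
    return hard, flaky


def exit_code(xunit_xml):
    """Fail the build only on failures of tests not marked flaky."""
    hard, _flaky = split_failures(xunit_xml)
    return 1 if hard else 0
```

Flaky failures are still returned, so they can be reported or tracked over time without turning the build red.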

@dhood
Member

dhood commented Aug 4, 2017

Summary of flaky test categories.

Flaky tests requiring missing features

rmw_get_node_names forgetting dead nodes (ros2/ros2#371):

  • projectroot.test_get_node_names__rmw_connext_cpp, ['nightly_osx_repeated']
  • projectroot.test_get_node_names__rmw_fastrtps_cpp, ['nightly_linux-aarch64_repeated', 'nightly_osx_repeated'] example

Python wait_for_service (ros2/rclpy#58, ros2/rclpy#161, ros2/system_tests#244):

  • test_requester_replier__Empty__rclpy__rmw_fastrtps_cpp_.test_requester_replier, ['nightly_linux-aarch64_repeated', 'nightly_linux_repeated', 'nightly_osx_repeated']
  • test_requester_replier__Empty__rclpy__rmw_fastrtps_cpp_Release.test_requester_replier, ['nightly_win_rep']
  • test_requester_replier__Primitives__rclpy__rmw_fastrtps_cpp_.test_requester_replier, ['nightly_linux-aarch64_repeated', 'nightly_linux_repeated', 'nightly_osx_repeated']
  • test_requester_replier__Primitives__rclpy__rmw_fastrtps_cpp_Release.test_requester_replier, ['nightly_win_rep']

Unexpected test output prevents launch_testing from matching

assignee @dhood eProsima/Fast-DDS#128, ros2/demos#126

  • test_showimage_cam2image__rmw_fastrtps_cpp_.test_reliable_qos, ['nightly_linux-aarch64_repeated', 'nightly_linux_repeated'] example: Test should pass, regex fails due to fastrtps error log

  • projectroot.test_tutorial_add_two_ints_server_add_two_ints_client_async__rmw_fastrtps_cpp, ['nightly_linux_repeated', 'nightly_osx_repeated', 'nightly_win_rep'] example: Test should pass, regex fails due to fastrtps error log

  • projectroot.test_tutorial_add_two_ints_server_add_two_ints_client__rmw_fastrtps_cpp, ['nightly_linux-aarch64_repeated', 'nightly_linux_repeated', 'nightly_win_rep'] example: Test should pass, regex fails due to fastrtps error log

  • test_pendulum__rmw_connext_cpp: PRESWriterHistoryDriver_completeBeAsynchPub:!make_sample_reclaimable. Small depth of pendulum demo causes Connext output (demos#126)

  • projectroot.test_tutorial_add_two_ints_server_add_two_ints_client__rmw_connext_cpp, ['nightly_osx_repeated'] example: PRESPsService_assertFilteredwrrRecord:pres filtered writer remote reader already created; asked for details on forum

Test output not being captured correctly

  • test_composition__rmw_fastrtps_cpp_Release.test_api_srv_composition, ['nightly_win_rep'] example: No output from test_api_composition process captured but they were loaded successfully
  • test_composition__rmw_fastrtps_cpp_.test_manual_composition, ['nightly_osx_repeated'] example: Publisher and client appear to be alive but don't output anything

Shutdown not being triggered/responded to correctly

assignee @dhood ros2/rmw_implementation#25

  • projectroot.test_tutorial_list_parameters_async__rmw_fastrtps_cpp, ['nightly_linux-aarch64_repeated', 'nightly_linux_repeated', 'nightly_osx_repeated']
  • projectroot.test_tutorial_parameter_events__rmw_fastrtps_cpp, ['nightly_osx_repeated'] example: All output looks to be the same as the passing version of the test, but it doesn't shut down.
  • projectroot.test_tutorial_list_parameters__rmw_fastrtps_cpp, ['nightly_linux-aarch64_repeated', 'nightly_linux_repeated', 'nightly_osx_repeated'] example

Performance-related

  • WaitSetTestFixture__rmw_connext_cpp.zero_timeout, ['nightly_win_rep'] example windows only
  • WaitSetTestFixture__rmw_fastrtps_cpp.finite_timeout, ['nightly_win_rep'] example windows only, portable and eatable (at least)
  • test.test_timer.test_timer_zero_callbacks1000hertz, ['nightly_osx_repeated'] example: waits for 0.0005s for a timer with period 0.001s, checking it doesn't receive anything, but it does. Underneath it must have taken longer than expected to wake up. osx only, mini2 and dosa (at least)

Maybe performance-related

assignee @dirk-thomas ros2/rclcpp#355, ros2/system_tests#217

  • test_multithreaded__rmw_connext_cpp.multi_access_publisher_intra_process, ['nightly_linux_repeated', 'nightly_osx_repeated', 'nightly_win_rep'] example
  • test_multithreaded__rmw_fastrtps_cpp.multi_consumer_intra_process, ['nightly_linux_repeated', 'nightly_osx_repeated'] example

Looks like an actual bug

assignee @clalancette ros2/ros2#387

  • projectroot.test_find_weak_nodes, ['nightly_osx_repeated'] example: mutex lock failed
  • projectroot.gtest_executor__rmw_fastrtps_cpp, ['nightly_osx_repeated'] example: mutex lock failed
  • projectroot.test_graph__rmw_fastrtps_cpp, ['nightly_linux_repeated', 'nightly_osx_repeated', 'nightly_win_rep'] example1 osx: mutex lock failed; example 2 osx: pointer being freed was not allocated; example linux: double free; example windows
  • projectroot.test_node, ['nightly_linux-aarch64_repeated', 'nightly_osx_repeated'] example osx: mutex lock failed; example arm: guard condition handle not from this implementation
  • test_services__rmw_connext_cpp_Release.test_services, ['nightly_win_rep'] example: Error in take_request: error not set @mikaelarguedas

Service not available

  • test_services_cpp__rmw_fastrtps_cpp_.test_services_cpp, ['nightly_linux_repeated'] example
  • test_client_scope_consistency_cpp__rmw_fastrtps_cpp_.test_client_scope_consistency_cpp, ['nightly_linux_repeated', 'nightly_osx_repeated'] example

Parameter-related

assignee @dhood ros2/rclcpp#356, ros2/rcl_interfaces#18, ros2/rmw_fastrtps#143, ros2/rmw_fastrtps#142

  • projectroot.test_tutorial_list_parameters__rmw_fastrtps_cpp, + async version + connext version ['nightly_linux-aarch64_repeated', 'nightly_linux_repeated', 'nightly_osx_repeated']
  • projectroot.test_tutorial_parameter_events__rmw_fastrtps_cpp, + async version + connext version ['nightly_osx_repeated']
  • projectroot.test_tutorial_set_and_get_parameters__rmw_fastrtps_cpp, + async version + connext version ['nightly_linux-aarch64_repeated', 'nightly_linux_repeated', 'nightly_osx_repeated']

Errors during shutdown

assignee @dhood ros2/system_tests#220

No connection

  • test_secure_publisher_subscriber__DynamicArrayNested__rmw_fastrtps_cpp__secure_comm_0_Release.test_secure_publisher_subscriber, ['nightly_win_rep'] example: No output from subscriber, at least timer callback prints should be there

@mikaelarguedas
Member Author

mikaelarguedas commented Dec 3, 2017

Update:
Now that a version of the Python wait_for_service has been merged (ros2/rclpy#161, ros2/system_tests#244), all the Python service tests pass 🎉

As a result, all test_communication tests now pass in the nightlies: http://ci.ros2.org/view/nightly/job/nightly_win_rep/898/#showFailuresLink

One thing to keep in mind is that the nightlies will now take much longer, as the test_communication tests will be repeated 20 times. For example, last night on Windows they took ~3h40min out of a total job time of ~7h45min.

@clalancette

@mikaelarguedas @ros2/team I'm thinking about closing this old list of flaky tests. We definitely have new flakiness, but it is tracked elsewhere. Any objections?

@mikaelarguedas
Member Author

@clalancette The goal is to close this in favor of individual tickets for all the tests listed here that are still flaky, is that correct?

@clalancette

My main motivation is that this list of flaky tests is pretty stale now. Some of them are still around (get_node_names is one of them), but a lot of them we haven't seen in a while. Since that is the case, I think that it would make more sense to just split out the remaining ones that we still see, and then close this out. I'll open a couple of individual issues for the ones I know are definitely still around.

@mikaelarguedas
Member Author

Sounds good. As long as we have a way to track them and not forget them, that works for me 👍

@clalancette

I think I now have issues covering all of the problems that we still had lingering from this list. In particular:

The rest don't appear to be happening anymore. I'm going to close this out in favor of those split out tickets; feel free to comment here or add new tickets if I've missed anything.

4 participants