Test for multithreaded execution #72

Merged
merged 1 commit into from
Dec 3, 2015

Conversation

jacquelinekay
Contributor

This PR tests MultiThreadedExecutor spin and spin_some in 4 test cases:

  1. Create 1 publisher and 2*N subscribers where N is the hardware concurrency of the machine, in an attempt to saturate the execution with work that must be done.
  2. Same as above, but with intraprocess turned on.
  3. Create 1 service and 2*N clients.
  4. Create 1 publisher, 1 subscriber, and N timers that all attempt to publish a message on the publisher at the same time.
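The saturation idea behind cases 1 to 3 can be sketched without ROS at all. The block below is a minimal illustration using only the C++ standard library, not the test from this PR: `worker_count`, `run_saturated_callbacks`, and `expected_callbacks` are hypothetical names, and the worker loop is a stand-in for an executor, not the rclcpp MultiThreadedExecutor. It queues 2*N callbacks per message, drains them with N threads, and counts invocations in an atomic.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// N = hardware concurrency; hardware_concurrency() may return 0 if unknown.
inline std::size_t worker_count()
{
  return std::max<std::size_t>(1, std::thread::hardware_concurrency());
}

// Expected number of callback invocations: 2*N subscribers per message.
inline std::size_t expected_callbacks(std::size_t num_messages)
{
  return 2 * worker_count() * num_messages;
}

// Hypothetical stand-in for an executor (NOT rclcpp): N worker threads drain
// a shared queue holding one callback per (message, subscriber) pair.
inline std::size_t run_saturated_callbacks(std::size_t num_messages)
{
  const std::size_t n = worker_count();
  const std::size_t num_subscribers = 2 * n;

  std::atomic<std::size_t> counter{0};
  std::queue<std::function<void()>> work;
  std::mutex work_mutex;

  // All work is queued before the workers start, so there is more pending
  // work than threads from the first moment on.
  for (std::size_t m = 0; m < num_messages; ++m) {
    for (std::size_t s = 0; s < num_subscribers; ++s) {
      work.push([&counter]() { ++counter; });
    }
  }

  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < n; ++i) {
    workers.emplace_back([&work, &work_mutex]() {
      for (;;) {
        std::function<void()> task;
        {
          // Hold the lock only while taking a task, not while running it.
          std::lock_guard<std::mutex> lock(work_mutex);
          if (work.empty()) {
            return;
          }
          task = std::move(work.front());
          work.pop();
        }
        task();
      }
    });
  }
  for (auto & worker : workers) {
    worker.join();
  }

  assert(work.empty());  // every queued callback was taken by some worker
  return counter.load();
}
```

A check in the style of these tests would compare `run_saturated_callbacks(5)` against `expected_callbacks(5)`, which is the same "count the callbacks" criterion the PR uses.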

Connects to ros2/ros2#92

@jacquelinekay jacquelinekay added the in progress Actively being worked on (Kanban column) label Nov 12, 2015
@jacquelinekay jacquelinekay self-assigned this Nov 12, 2015
#endif

static inline void multi_consumer_pub_sub_test(bool intra_process) {
rclcpp::init(0, nullptr);
Member

This calls init twice which currently is not a problem but might be as soon as the function does some real initialization work.

Contributor Author

But I don't have a custom main which calls init before starting the tests. Where does init get called twice?

Also, I call shutdown in all of the test cases, generally to stop the while loop in spin. Do I have to call init at the beginning of each test case since it is likely that the previous test case called shutdown?

Member

Both tests multi_consumer_single_producer and multi_consumer_intraprocess are calling this function and therefore init.

Currently we don't have a function symmetric to init. So I can't recommend anything but calling it only once in the main.

Contributor Author

Why can't shutdown be made symmetric to init?

Member

It could be made symmetric. I was just pointing out that it currently is not.

Contributor Author

This (calling init, then shutdown, then init) actually is giving me a problem when I run all the test cases at once. I believe ros2/rclcpp#152 fixes the problem.
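To make the "symmetric" idea in this thread concrete, here is a minimal sketch of an init/shutdown pair that tolerates the init, shutdown, init sequence. Everything in the `sketch` namespace is hypothetical and uses only the standard library. It is not the rclcpp API and not the fix in ros2/rclcpp#152, just an illustration of what symmetry could look like, with an RAII guard so each test case cleans up after itself.

```cpp
#include <atomic>
#include <cassert>

namespace sketch
{

// Hypothetical global state standing in for whatever init() would set up.
inline std::atomic<bool> & running()
{
  static std::atomic<bool> flag{false};
  return flag;
}

// Returns true only for the call that actually performed initialization.
inline bool init()
{
  bool expected = false;
  return running().compare_exchange_strong(expected, true);
}

// Symmetric to init(): returns true only if it undid a prior init().
inline bool shutdown()
{
  bool expected = true;
  return running().compare_exchange_strong(expected, false);
}

inline bool ok()
{
  return running().load();
}

// RAII helper: one guard per test case gives init ... shutdown pairs that
// can be repeated across test cases in the same process.
class InitGuard
{
public:
  InitGuard() : owns_(init()) {}
  ~InitGuard()
  {
    if (owns_) {
      shutdown();
    }
  }
  InitGuard(const InitGuard &) = delete;
  InitGuard & operator=(const InitGuard &) = delete;

  // True if this guard performed the initialization and will undo it.
  bool owns() const { return owns_; }

private:
  bool owns_;
};

}  // namespace sketch
```

With this shape, a second guard created while the first is alive does nothing, and a guard created after the first one is destroyed initializes again, which is the sequence that caused trouble when running all test cases at once.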

@jacquelinekay jacquelinekay added in review Waiting for review (Kanban column) and removed in progress Actively being worked on (Kanban column) labels Nov 13, 2015
ament_add_gtest(
gtest_multithreaded__${middleware_impl}
"test/test_multithreaded.cpp"
TIMEOUT 30
Member

How long is the expected runtime? Could the timeout be shorter?

@jacquelinekay jacquelinekay added in progress Actively being worked on (Kanban column) and removed in review Waiting for review (Kanban column) labels Nov 16, 2015
@jacquelinekay
Contributor Author

I'm moving this back to In Progress to work on a segfault I've discovered in this test.

@jacquelinekay
Contributor Author

With the fix @gerkey pushed yesterday, I ran this test on repeat and got a segfault in the multi_access_publisher_intra_process case after 245 iterations. Previously it would take 1-10 iterations to get a segfault on my machine. I also no longer see deadlock.

I will investigate the segfault and try to reproduce it. The core dump appears to have a corrupt stack and the code wasn't built in Debug mode.

@dirk-thomas
Member

For better debugging I would also recommend using a debug build of the rmw implementation you are using.

@gerkey
Member

gerkey commented Dec 3, 2015

@jacquelinekay, are you testing with the changes in ros2/rclcpp#165?

@jacquelinekay
Contributor Author

With the changes made in ros2/rclcpp#165, I have run the test for over 400 iterations without seeing a segfault or deadlock.

@jacquelinekay
Contributor Author

I've rebased this branch onto intra_process_lock since that set of branches is required to make this test pass reliably.

@gerkey
Member

gerkey commented Dec 3, 2015

+1 (with ros2/rclcpp#165 required for reliable execution)

This test helped us find some nasty races. Good stuff.

@gerkey gerkey mentioned this pull request Dec 3, 2015
@@ -12,6 +12,9 @@ endif()

find_package(ament_cmake REQUIRED)

#set (CMAKE_CXX_COMPILER "clang++")
#set (CMAKE_LINKER "llvm-ld")

Member

This should be removed before merging.

@jacquelinekay jacquelinekay added in review Waiting for review (Kanban column) and removed in progress Actively being worked on (Kanban column) labels Dec 3, 2015
jacquelinekay added a commit that referenced this pull request Dec 3, 2015
Test for multithreaded execution
@jacquelinekay jacquelinekay merged commit d585bff into master Dec 3, 2015
@jacquelinekay jacquelinekay deleted the multithreaded branch December 3, 2015 18:06
@dirk-thomas
Member

This PR seems to be the reason that the master build on Windows is failing now: http://ci.ros2.org/job/ci_windows_opensplice/733/

Why did the previous CI jobs pass? Did you change further stuff after that?

@dirk-thomas
Member

With the PR ros2/rclcpp#167 the code builds at least. But some tests fail: http://ci.ros2.org/job/ci_windows_opensplice/734/testReport/

@jacquelinekay
Contributor Author

Those test cases pass reliably on my Linux machine. I suspect the failures could be due to quirks in the Windows thread scheduler.

@dirk-thomas
Member

While running these tests I also saw the following output. The test passed but the values for the first iterations look pretty wrong. Shouldn't this fail the test somehow?

[ RUN      ] test_multithreaded__rmw_connext_dynamic_cpp.multi_consumer_intra_process
RTI Data Distribution Service Evaluation License issued to OSRF (OSRF01) dthomas@osrfoundation.org For non-production use only.
Expires on 08-dec-2015 See www.rti.com for more information.
callback()    1 with message data 3223089
callback()    2 with message data 3223089
callback()    3 with message data 3223089
callback()    4 with message data 3223089
callback()    1 with message data 1
callback()    2 with message data 1
callback()    3 with message data 1
callback()    4 with message data 1
callback()    5 with message data 2
callback()    6 with message data 2
callback()    7 with message data 2
callback()    8 with message data 2
callback()    9 with message data 3
callback()   10 with message data 3
callback()   11 with message data 3
callback()   12 with message data 3
callback()   13 with message data 4
callback()   14 with message data 4
callback()   15 with message data 4
callback()   16 with message data 4
callback()   17 with message data 5
callback()   18 with message data 5
callback()   19 with message data 5
callback()   20 with message data 5
[       OK ] test_multithreaded__rmw_connext_dynamic_cpp.multi_consumer_intra_process (3875 ms)

@jacquelinekay
Contributor Author

The tests actually never check the data; they only check the number of times the callback was called (stored in an atomic uint called counter).

I also never see that error on my machine. Which platform are you running on? And is it every time you run the test case or does it happen randomly?
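For illustration, a check that would also flag the unexpected payloads in the output above could look roughly like the sketch below. It uses only the standard library. `CallbackStats`, `counting_callback`, and `deliver` are hypothetical names, not code from this PR, and the assumption that valid message data lies in 1..max_expected is inferred from the log (values 1 to 5), not from the test source.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Shared state updated from several threads at once.
struct CallbackStats
{
  std::atomic<unsigned> counter{0};   // number of callback invocations
  std::atomic<unsigned> bad_data{0};  // invocations with an unexpected payload
};

// Hypothetical subscriber callback: counts every call, as the PR's tests do,
// and additionally records payloads outside the assumed range 1..max_expected.
inline void counting_callback(CallbackStats & stats, unsigned data, unsigned max_expected)
{
  ++stats.counter;
  if (data < 1 || data > max_expected) {
    ++stats.bad_data;
  }
}

// Simulates delivery: one thread per subscriber, each receiving every payload.
inline void deliver(
  CallbackStats & stats, const std::vector<unsigned> & payloads,
  unsigned num_subscribers, unsigned max_expected)
{
  std::vector<std::thread> subscribers;
  for (unsigned s = 0; s < num_subscribers; ++s) {
    subscribers.emplace_back([&stats, &payloads, max_expected]() {
      for (unsigned data : payloads) {
        counting_callback(stats, data, max_expected);
      }
    });
  }
  for (auto & subscriber : subscribers) {
    subscriber.join();
  }
}
```

Feeding it the payloads seen in the log (3223089 followed by 1 to 5) with 4 subscribers, a count-only check on `counter` still sees one call per payload per subscriber, while `bad_data` ends up nonzero and could be asserted to be 0 to make such a run fail.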

@dirk-thomas
Member

The snippet is from http://ci.ros2.org/job/ci_osx/541/consoleFull at time 33:57.272.
