Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lock the condition mutex during setup #143

Closed
wants to merge 1 commit into from

Conversation

dhood
Copy link
Member

@dhood dhood commented Aug 18, 2017

Otherwise one-shot notifications will be missed and we'll wait for them forever if no timeout is in use.

This happens often for parameter tests because list_parameters, for example, gets its response during the setup. It will call notify_one() on the condition variable, but it does it before we're actually waiting on the condition variable.

  • Linux Build Status (console output shows no test failures but jenkins is showing it as unstable)
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status (known flaky tests)

@dhood dhood added the in review Waiting for review (Kanban column) label Aug 18, 2017
@dhood dhood self-assigned this Aug 18, 2017
@dhood
Copy link
Member Author

dhood commented Aug 18, 2017

this looks to have made a connext service test fail 15 times. I'm running this job to see if it still happens when combined with #142 and ros2/system_tests#219: Build Status

@mikaelarguedas
Copy link
Member

Can you point to which job you are referring to for the 15 failures? The failing test is a test across rmw_implementations ?

@dhood
Copy link
Member Author

dhood commented Aug 18, 2017

I'm referring to CI linux #3039

Something weird is going on with the Jenkins display. clinking on the failing test takes you to http://ci.ros2.org/job/ci_linux/3039/testReport/(root)/projectroot/test_client_scope_cpp__rmw_connext_cpp_2/ (note the _2 at the end) but the http://ci.ros2.org/job/ci_linux/3039/testReport/(root)/projectroot/test_client_scope_cpp__rmw_connext_cpp and the console output show there were no failures 😕 maybe it's just my browser.. do you see the linux CI job as green or yellow @mikaelarguedas ?

@mikaelarguedas
Copy link
Member

mikaelarguedas commented Aug 18, 2017

I see the icon as yellow.
Yeah I think there is a bug in the test failure process. The test failed once and was thus marked as failed but the status didnt get updated the second time when it passed.

This text being connext only I don't think this PR is the source of the 1 failure though (unrelated flakyness IMO)

@dhood
Copy link
Member Author

dhood commented Aug 18, 2017

Ok, so that's weird, but either way, I agree that it should be unrelated to this change

Otherwise one-shot notifications will be missed and we'll wait for them forever
@dirk-thomas dirk-thomas force-pushed the lock_condition_mutex_earlier branch from d750107 to a173bfa Compare August 22, 2017 00:27
@dirk-thomas
Copy link
Member

Likely superseded by #147.

@dhood
Copy link
Member Author

dhood commented Aug 22, 2017

I didn't see that the block of code after attaching the condition variable to the listeners is checking if they have triggered. That check should have been all that's necessary to prevent us from going into wait after the condition variable has been notified, but there was a race condition in the client listener that caused its hasData call to be inaccurate. This has been fixed in https://github.com/ros2/rmw_fastrtps/pull/147/files#diff-094a7e569b5045ca9d05bc1251ed8e4eL84 so this PR is not needed

@dhood dhood closed this Aug 22, 2017
@dhood dhood deleted the lock_condition_mutex_earlier branch August 22, 2017 17:19
@dhood dhood removed the in review Waiting for review (Kanban column) label Aug 22, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants