Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lock views during system PostUpdates #1001

Merged
merged 2 commits into from
Sep 1, 2021
Merged

Conversation

adlarkin
Copy link
Contributor

@adlarkin adlarkin commented Aug 30, 2021

Signed-off-by: Ashton Larkin ashton@openrobotics.org

🦟 Bug fix

Fixes a race condition discovered in #959 (comment)

Summary

As a consequence of the changes in #856, new entities aren't added to a view until the view is used, because we need access to the template signature of the view when adding entities to it. This introduces a potential race condition when processing system PostUpdates since system PostUpdates are run in parallel (see #959 (comment) for a more detailed explanation).

The fix proposed here is to assign a mutex to every view that exists. Then, when processing system PostUpdates, we lock the view while entities are being added to it. Since every view has its own mutex, this means that different views can still be accessed in parallel during PostUpdate calls - the only time blocking may occur is when two system PostUpdates are trying to read from the same view. So, there shouldn't be any major performance hits here.

Checklist

  • Signed all commits for DCO
  • Added tests
  • Updated documentation (as needed)
  • Updated migration guide (as needed)
  • codecheck passed (See contributing)
  • All tests passed (See test coverage)
  • While waiting for a review on your PR, please help review another open pull request to support the maintainers

Note to maintainers: Remember to use Squash-Merge

Signed-off-by: Ashton Larkin <ashton@openrobotics.org>
@adlarkin adlarkin requested review from iche033 and chapulina August 30, 2021 22:28
@github-actions github-actions bot added the 🏯 fortress Ignition Fortress label Aug 30, 2021
Copy link
Contributor Author

@adlarkin adlarkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got a few comments/questions for PR reviewers.

@adlarkin adlarkin mentioned this pull request Aug 30, 2021
8 tasks
@codecov
Copy link

codecov bot commented Aug 30, 2021

Codecov Report

Merging #1001 (3954958) into main (628fe9e) will increase coverage by 0.02%.
The diff coverage is 90.90%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1001      +/-   ##
==========================================
+ Coverage   63.77%   63.80%   +0.02%     
==========================================
  Files         246      246              
  Lines       19996    20015      +19     
==========================================
+ Hits        12753    12771      +18     
- Misses       7243     7244       +1     
Impacted Files Coverage Δ
include/ignition/gazebo/EntityComponentManager.hh 100.00% <ø> (ø)
...e/ignition/gazebo/detail/EntityComponentManager.hh 93.42% <70.00%> (-1.69%) ⬇️
src/EntityComponentManager.cc 84.83% <100.00%> (+0.19%) ⬆️
src/SimulationRunner.cc 93.92% <100.00%> (+0.18%) ⬆️
src/ServerPrivate.cc 87.15% <0.00%> (+0.55%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 628fe9e...3954958. Read the comment docs.

@iche033
Copy link
Contributor

iche033 commented Aug 31, 2021

what's a good way to test this PR? Should I merge these changes with #959 and try cloning lights?

@adlarkin
Copy link
Contributor Author

adlarkin commented Aug 31, 2021

what's a good way to test this PR? Should I merge these changes with #959 and try cloning lights?

Yeah, that's what I did! Cloning lights should segfault without these changes. I'd try cloning a light several times with/without these changes to make sure that the race condition is fixed.

Signed-off-by: Ashton Larkin <ashton@openrobotics.org>
@iche033
Copy link
Contributor

iche033 commented Aug 31, 2021

hmm I merged these changes into the adlarkin/clone_entities branch but I'm still getting crashes. I see this error msg in the console:

[Err] [BaseStorage.hh:927] Another item already exists with name: tube_light_copy

I followed the steps you mentioned in #959:

  1. Pause sim
  2. clone tube light
  3. remove tube light clone through Entity Tree
  4. clone tube light again
  5. Run sim -> crash

I don't get the crash if sim is not paused though.

@adlarkin
Copy link
Contributor Author

I don't get the crash if sim is not paused though.

Oh yeah, so the steps I mentioned about multiple clones/removals while simulation is paused in #959 is actually a separate bug that I don't believe is related to this PR (sorry about the confusion). That seems to be something separate that needs to be fixed, which is why I brought it up, but it has nothing to do with the race condition (at least, I don't think). With this PR, you should have no issues cloning/removing lights while simulation is running, and should have no issues cloning an entity while simulation is paused, as long as you don't remove it and try to re-clone it again while simulation is paused.

@iche033
Copy link
Contributor

iche033 commented Sep 1, 2021

ok I see. I tried the adlarkin/clone_entities again and I get a crash when cloning the light while sim is running. With the changes in this PR, I no longer get the crash.

@adlarkin adlarkin merged commit f48085b into main Sep 1, 2021
@adlarkin adlarkin deleted the adlarkin/views_race_condition branch September 1, 2021 03:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏯 fortress Ignition Fortress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants