Skip to content
This repository has been archived by the owner on Nov 8, 2022. It is now read-only.

Fixes #1482, removed unsafe double RLock #1525

Merged
merged 1 commit into from
Feb 24, 2017

Conversation

katarzyna-z
Copy link
Contributor

@katarzyna-z katarzyna-z commented Feb 16, 2017

Fixes #1482

Summary of changes:

  • removed unsafe double RLock, Rlock is hold before SelectAP() in these lines: L404, L453, L487

Testing done:

  • tested using this script: test.sh

@intelsdi-x/snap-maintainers

@IRCody
Copy link
Contributor

IRCody commented Feb 16, 2017

What makes the double RLock unsafe and how is it related to #1482?

@katarzyna-z
Copy link
Contributor Author

katarzyna-z commented Feb 16, 2017

in golang documentation (https://golang.org/src/sync/rwmutex.go) there is following information:

// If a goroutine holds a RWMutex for reading, it must not expect this or any
// other goroutine to be able to also take the read lock until the first read
// lock is released. In particular, this prohibits recursive read locking.
// This is to ensure that the lock eventually becomes available;
// a blocked Lock call excludes new readers from acquiring the lock.

I've also found this issue: golang/go#15418.
Testing Snap I've noticed that there is a problem with locking mutex in (*pool).Unsubscribe and (*pool).Subscribe.

@@ -361,10 +361,8 @@ func (p *pool) SubscriptionCount() int {
}

// SelectAP selects an available plugin from the pool
// the method is not thread safety, it should be protected outside of the body
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/safety/safe/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected :]

Copy link
Collaborator

@jcooklin jcooklin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once @IRCody also gives a +1

@IRCody
Copy link
Contributor

IRCody commented Feb 17, 2017

Thanks for the links. That's an interesting and subtle difference (more subtle if you read the entire comment on the type).

Do we want the locks to be removed from inside pool or for them to be used inside pool only? Other pool functions contain the same code for locking/unlocking that we are removing here, do they have a similar bug?

@katarzyna-z katarzyna-z force-pushed the fix-unstoppable-task branch 2 times, most recently from 20f15b2 to ac9e9bf Compare February 17, 2017 14:35
@katarzyna-z
Copy link
Contributor Author

I've found one more double RLock in (p *pool) Eligible() (there are Rlocks in (p *pool) Count() and (p *pool) SubscriptionCount()) and lock for reading in (p *pool) IncRestartCount where value is modified.

I anlyzed locking of pool and pool is locked outside only three times in:
(ap *availablePlugins) collectMetrics, (ap *availablePlugins) publishMetrics, (ap *availablePlugins) processMetrics so it seems that the rest of locks inside pool is correct.

Copy link
Collaborator

@jcooklin jcooklin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIP - Do not merge. Testing using the harness @katarzyna-z provided is still underway. We are seeing failures on a multi core physical system that have yet to be understood. On dev systems and VMs with multiple cores the test harness works (with the PR) as expected.

@IRCody
Copy link
Contributor

IRCody commented Feb 22, 2017

The changes in this PR do seem to fix some pretty serious issues that are identified by the script. The failures we were seeing were a fail by the tester(me) and not the PR 👎 . With fresh eyes the fix works as described/expected.

I think we might want to have more discussion about whether we want the pool mutex to be locked from inside or outside the pool. I think it might be confusing to do both (since that is at least partially what led to this issue). I know after talking with @jcooklin about it that we want to have some consistency. Maybe we can have that discussion here?

@katarzyna-z
Copy link
Contributor Author

katarzyna-z commented Feb 23, 2017

I've updated the PR - I moved Rlocks outside pool to (p *pool) SelectAP, please see 4cf884c. Testing Snap I don't see bad consequences of these changes but I would rather remove the latest commit in my PR.
Taking look at code control/available_plugin.go#L443, before moving the Rlock to (p *pool) SelectAP pool was locked for reading (L453) till L471. When RLock is moved to (p *pool) SelectAP scope of lock is shorter, there are operation on availablePlugin but available plugin may be removed by (p *pool) SelectAndKill(control/strategy/pool.go#L307 and control/strategy/pool.go#L331).

@jcooklin
Copy link
Collaborator

jcooklin commented Feb 24, 2017

Testing Snap I don't see bad consequences of these changes but I would rather remove the latest commit in my PR....

I agree. LGTM after removing latest commit.

@katarzyna-z katarzyna-z force-pushed the fix-unstoppable-task branch 2 times, most recently from ac9e9bf to 70d5b5e Compare February 24, 2017 06:38
Removed double RLock in (p *pool) Eligible()

Replaced lock for reading with lock for writing (RLock/RUnlock -> Lock/Unlock) in (p *pool) IncRestartCount
@katarzyna-z
Copy link
Contributor Author

katarzyna-z commented Feb 24, 2017

I've removed the latest commit and squashed the rest of commits.

@IRCody @jcooklin Thanks for tests and review! 🌹

@PatrykMatyjasek
Copy link
Contributor

LGTM

@PatrykMatyjasek PatrykMatyjasek merged commit 00d1c24 into intelsdi-x:master Feb 24, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants