-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix concurrent read and wirte group error #1447
fix concurrent read and wirte group error #1447
Conversation
Signed-off-by: denghuan <denghuan@actionsky.com>
@juliusv Could you please confirm if this problem is existed? |
Sorry for the delay, I'll look at this soon. Adding a testcase that fails because of the race detector would be helpful and prevent a regression. |
@@ -271,7 +271,9 @@ func (d *Dispatcher) processAlert(alert *types.Alert, route *Route) { | |||
ag, ok := group[fp] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part should probably be protected too. It might be simpler to just add a defer
after the first lock (L262):
fp := groupLabels.Fingerprint()
d.mtx.Lock()
defer d.mtx.Unlock()
@stuartnelson3 ok, I'll add a testcase. |
Signed-off-by: denghuan <denghuan@actionsky.com>
@stuartnelson3 I'm sorry that I can't add a testcase because it's very hard to recurrent. It only occurred once in my production, and the panic reason is that concurrent read and write map. |
👍 for me. Looking at the code it is indeed quite complicated to add a test that will catch the issue in a deterministic fashion. |
Looks good. In general, it feels like we're throwing locks all over the place because we're directly accessing these maps. It might simplify things to have an abstraction around a map that manages this, and access that abstraction from our code. |
* fix concurrent read and wirte group Signed-off-by: denghuan <denghuan@actionsky.com> * make lock more elegant Signed-off-by: denghuan <denghuan@actionsky.com>
There is no lock when adding aggrGroup to group map. If another goroutine call Dispatcher.Groups or Dispatcher.run clear up aggrGroups, alertmanager will be panic.