-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix actual master and add -race
in CI
#907
Conversation
@@ -32,7 +34,7 @@ type testServer struct { | |||
|
|||
type mockHandler struct { | |||
// the number of times a query executed | |||
queryCount int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why we have this race. (Although I can reproduce it with -race
test in my local environment.)
In the unit test (I choose TestDriverOptions_SetRetriesOff
), the happened-before relation should be conn.QueryContent
starts -> mockHandler.handleQuery
modifies h.queryCount
-> conn.QueryContent
ends -> TestDriverOptions_SetRetriesOff
reads srv.handler.queryCount
. I need more time to check it 😢
Because even if we use atomic variable to avoid race on this variable, the race of testing logic will let the check fail if atomic.Load happens before atomic.Add
==================
WARNING: DATA RACE
Read at 0x00c0000147e8 by goroutine 19:
github.com/go-mysql-org/go-mysql/driver.TestDriverOptions_SetRetriesOff()
/home/lance/Projects/go-mysql/driver/driver_options_test.go:78 +0x26d
testing.tRunner()
/usr/lib/go/src/testing/testing.go:1689 +0x21e
testing.(*T).Run.gowrap1()
/usr/lib/go/src/testing/testing.go:1742 +0x44
Previous write at 0x00c0000147e8 by goroutine 23:
github.com/go-mysql-org/go-mysql/driver.(*mockHandler).handleQuery()
/home/lance/Projects/go-mysql/driver/driver_options_test.go:313 +0xb2
github.com/go-mysql-org/go-mysql/driver.(*mockHandler).HandleQuery()
/home/lance/Projects/go-mysql/driver/driver_options_test.go:370 +0x44
github.com/go-mysql-org/go-mysql/server.(*Conn).dispatch()
/home/lance/Projects/go-mysql/server/command.go:78 +0x3e9
github.com/go-mysql-org/go-mysql/server.(*Conn).HandleCommand()
/home/lance/Projects/go-mysql/server/command.go:53 +0xd6
github.com/go-mysql-org/go-mysql/driver.CreateMockServer.func1.1()
/home/lance/Projects/go-mysql/driver/driver_options_test.go:287 +0x96
Goroutine 19 (running) created at:
testing.(*T).Run()
/usr/lib/go/src/testing/testing.go:1742 +0x825
testing.runTests.func1()
/usr/lib/go/src/testing/testing.go:2161 +0x85
testing.tRunner()
/usr/lib/go/src/testing/testing.go:1689 +0x21e
testing.runTests()
/usr/lib/go/src/testing/testing.go:2159 +0x8be
testing.(*M).Run()
/usr/lib/go/src/testing/testing.go:2027 +0xf17
main.main()
_testmain.go:67 +0x2bd
Goroutine 23 (running) created at:
github.com/go-mysql-org/go-mysql/driver.CreateMockServer.func1()
/home/lance/Projects/go-mysql/driver/driver_options_test.go:281 +0x53
==================
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into this more closely too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should stop the mock server before reading the queryCount:
func TestDriverOptions_SetRetriesOff(t *testing.T) {
log.SetLevel(log.LevelDebug)
srv := CreateMockServer(t)
defer srv.Stop()
...
rows, err := conn.QueryContext(context.TODO(), "select * from slow;")
...
require.EqualValues(t, 1, srv.handler.queryCount.Load())
}
func TestDriverOptions_SetRetriesOff(t *testing.T) {
log.SetLevel(log.LevelDebug)
srv := CreateMockServer(t)
defer srv.Stop() // it's ok here to call it twice
...
rows, err := conn.QueryContext(context.TODO(), "select * from slow;")
srv.Stop() // stop here before accessing to queryCount
...
require.EqualValues(t, 1, srv.handler.queryCount.Load())
}
What do you think about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the original int
variable, I see the data race is still reported even if we manually stop the mock server. So it's likely that with atomic variables the Load may still happen before Add
Sorry I'm oncall this week, I think I can start to locate the problem in next week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have unsynchronized access to this int
from different goroutines that are created here:
handler := &mockHandler{}
go func() {
for {
conn, _ := l.Accept()
go func() {
co, _ := server.NewCustomizedConn(conn, defaultServer, inMemProvider, handler)
for {
_ = co.HandleCommand() // common `handler` is reused by different goroutines
}
}()
}
}()
TestDriverOptions_SetRetriesOn sends 3 queries, and we panic on the second.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I have understood now. We have set readTimeout
for the tests TestDriverOptions_SetRetriesOn
and TestDriverOptions_SetRetriesOff
. So when conn.QueryContext
returns it's caused by the timeout, rather than receieved the response. It means when conn.QueryContext
returns (*mockHandler).handleQuery
may not be executed to generate the response.
I was mistakenly thought that there's a happened-before relation of h.queryCount++
-> (*mockHandler).handleQuery
finished -> mock server write response -> conn.QueryContext
read the response and return -> testing goroutine reads srv.handler.queryCount
. But it's not true because the returning of conn.QueryContext
is not caused by getting a response. So the write and read of queryCount
indeed has no happened-before relation which is a data race.
And I suggest this fix
diff --git a/driver/driver_options_test.go b/driver/driver_options_test.go
index eaa863f..204d2bd 100644
--- a/driver/driver_options_test.go
+++ b/driver/driver_options_test.go
@@ -10,6 +10,7 @@ import (
"reflect"
"strconv"
"strings"
+ "sync"
"testing"
"time"
@@ -33,12 +34,16 @@ type testServer struct {
type mockHandler struct {
// the number of times a query executed
queryCount int
+ modifier *sync.WaitGroup
}
func TestDriverOptions_SetRetriesOn(t *testing.T) {
log.SetLevel(log.LevelDebug)
srv := CreateMockServer(t)
defer srv.Stop()
+ var wg sync.WaitGroup
+ srv.handler.modifier = &wg
+ wg.Add(3)
conn, err := sql.Open("mysql", "root@127.0.0.1:3307/test?readTimeout=1s")
defer func() {
@@ -52,6 +57,7 @@ func TestDriverOptions_SetRetriesOn(t *testing.T) {
// we want to get a golang database/sql/driver ErrBadConn
require.ErrorIs(t, err, sqlDriver.ErrBadConn)
+ wg.Wait()
// here we issue assert that even though we only issued 1 query, that the retries
// remained on and there were 3 calls to the DB.
require.Equal(t, 3, srv.handler.queryCount)
@@ -61,6 +67,9 @@ func TestDriverOptions_SetRetriesOff(t *testing.T) {
log.SetLevel(log.LevelDebug)
srv := CreateMockServer(t)
defer srv.Stop()
+ var wg sync.WaitGroup
+ srv.handler.modifier = &wg
+ wg.Add(1)
conn, err := sql.Open("mysql", "root@127.0.0.1:3307/test?readTimeout=1s&retries=off")
defer func() {
@@ -73,6 +82,7 @@ func TestDriverOptions_SetRetriesOff(t *testing.T) {
// we want the native error from this driver implementation
require.ErrorIs(t, err, mysql.ErrBadConn)
+ wg.Wait()
// here we issue assert that even though we only issued 1 query, that the retries
// remained on and there were 3 calls to the DB.
require.Equal(t, 1, srv.handler.queryCount)
@@ -309,6 +319,11 @@ func (h *mockHandler) UseDB(dbName string) error {
}
func (h *mockHandler) handleQuery(query string, binary bool, args []interface{}) (*mysql.Result, error) {
+ defer func() {
+ if h.modifier != nil {
+ h.modifier.Done()
+ }
+ }()
h.queryCount++
ss := strings.Split(query, " ")
switch strings.ToLower(ss[0]) {
We add synchronization and still use int
. It does not rely on time interval 1 second (readTimeout
) or 5 seconds (we inject the sleep in around line 345 of driver_options_test.go). And we can reduce these two values now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, we don't need int
, because WaitGroup.Wait()
will hang until the required number of requests.
The only case is if there are more retries than necessary (eg. h.queryCount==4 and wg.Add(3)
). But here will we get unsynchronized access?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we can remove the int
now. And maybe add -race
to ci.yml to find other unsynchronized access and fail that PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I'll update this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, if you are busy (not response in a few days) I can update this PR.
part of the fix is included in #912 |
Before the new tag we need to fix the actual main:
1.