Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix actual master and add -race in CI #907

Merged
merged 5 commits into from
Sep 10, 2024
Merged

Fix actual master and add -race in CI #907

merged 5 commits into from
Sep 10, 2024

Conversation

atercattus
Copy link
Member

Before the new tag we need to fix the actual main:
1.

run golangci-lint
  Running [/home/runner/golangci-lint-1.60.1-linux-amd64/golangci-lint run  --timeout=3m] in [/home/runner/work/go-mysql/go-mysql] ...
  driver/driver.go:206:8: SA1032: arguments have the wrong order (staticcheck)
  			if !goErrors.Is(sqldriver.ErrSkip, err) {
  			    ^
  mysql/util.go:200:7: SA4003: every value of type uint64 is <= math.MaxUint64 (staticcheck)
  	case n <= 0xffffffffffffffff:
  	     ^
  
  Error: issues found
=== RUN   TestDriverOptions_SetRetriesOn
==================
WARNING: DATA RACE
Read at 0x00c0004ed2e8 by goroutine 22:
  github.com/go-mysql-org/go-mysql/driver.(*mockHandler).handleQuery()
      /data/go/github.com/go-mysql-org/go-mysql/driver/driver_options_test.go:312 +0x68
  github.com/go-mysql-org/go-mysql/driver.(*mockHandler).HandleQuery()
      /data/go/github.com/go-mysql-org/go-mysql/driver/driver_options_test.go:369 +0x44
  github.com/go-mysql-org/go-mysql/server.(*Conn).dispatch()
      /data/go/github.com/go-mysql-org/go-mysql/server/command.go:78 +0x3e9
  github.com/go-mysql-org/go-mysql/server.(*Conn).HandleCommand()
      /data/go/github.com/go-mysql-org/go-mysql/server/command.go:53 +0xd6
  github.com/go-mysql-org/go-mysql/driver.CreateMockServer.func1.1()
      /data/go/github.com/go-mysql-org/go-mysql/driver/driver_options_test.go:287 +0x96

Previous write at 0x00c0004ed2e8 by goroutine 19:
  github.com/go-mysql-org/go-mysql/driver.(*mockHandler).handleQuery()
      /data/go/github.com/go-mysql-org/go-mysql/driver/driver_options_test.go:312 +0x7d
  github.com/go-mysql-org/go-mysql/driver.(*mockHandler).HandleQuery()
      /data/go/github.com/go-mysql-org/go-mysql/driver/driver_options_test.go:369 +0x44
  github.com/go-mysql-org/go-mysql/server.(*Conn).dispatch()
      /data/go/github.com/go-mysql-org/go-mysql/server/command.go:78 +0x3e9
  github.com/go-mysql-org/go-mysql/server.(*Conn).HandleCommand()
      /data/go/github.com/go-mysql-org/go-mysql/server/command.go:53 +0xd6
  github.com/go-mysql-org/go-mysql/driver.CreateMockServer.func1.1()
      /data/go/github.com/go-mysql-org/go-mysql/driver/driver_options_test.go:287 +0x96

@atercattus atercattus self-assigned this Aug 20, 2024
@atercattus atercattus requested a review from lance6716 August 20, 2024 06:18
@@ -32,7 +34,7 @@ type testServer struct {

type mockHandler struct {
// the number of times a query executed
queryCount int
Copy link
Collaborator

@lance6716 lance6716 Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we have this race. (Although I can reproduce it with -race test in my local environment.)

In the unit test (I choose TestDriverOptions_SetRetriesOff), the happened-before relation should be conn.QueryContentstarts -> mockHandler.handleQuery modifies h.queryCount -> conn.QueryContent ends -> TestDriverOptions_SetRetriesOff reads srv.handler.queryCount. I need more time to check it 😢

Because even if we use atomic variable to avoid race on this variable, the race of testing logic will let the check fail if atomic.Load happens before atomic.Add


==================
WARNING: DATA RACE
Read at 0x00c0000147e8 by goroutine 19:
  github.com/go-mysql-org/go-mysql/driver.TestDriverOptions_SetRetriesOff()
      /home/lance/Projects/go-mysql/driver/driver_options_test.go:78 +0x26d
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /usr/lib/go/src/testing/testing.go:1742 +0x44

Previous write at 0x00c0000147e8 by goroutine 23:
  github.com/go-mysql-org/go-mysql/driver.(*mockHandler).handleQuery()
      /home/lance/Projects/go-mysql/driver/driver_options_test.go:313 +0xb2
  github.com/go-mysql-org/go-mysql/driver.(*mockHandler).HandleQuery()
      /home/lance/Projects/go-mysql/driver/driver_options_test.go:370 +0x44
  github.com/go-mysql-org/go-mysql/server.(*Conn).dispatch()
      /home/lance/Projects/go-mysql/server/command.go:78 +0x3e9
  github.com/go-mysql-org/go-mysql/server.(*Conn).HandleCommand()
      /home/lance/Projects/go-mysql/server/command.go:53 +0xd6
  github.com/go-mysql-org/go-mysql/driver.CreateMockServer.func1.1()
      /home/lance/Projects/go-mysql/driver/driver_options_test.go:287 +0x96

Goroutine 19 (running) created at:
  testing.(*T).Run()
      /usr/lib/go/src/testing/testing.go:1742 +0x825
  testing.runTests.func1()
      /usr/lib/go/src/testing/testing.go:2161 +0x85
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1689 +0x21e
  testing.runTests()
      /usr/lib/go/src/testing/testing.go:2159 +0x8be
  testing.(*M).Run()
      /usr/lib/go/src/testing/testing.go:2027 +0xf17
  main.main()
      _testmain.go:67 +0x2bd

Goroutine 23 (running) created at:
  github.com/go-mysql-org/go-mysql/driver.CreateMockServer.func1()
      /home/lance/Projects/go-mysql/driver/driver_options_test.go:281 +0x53
==================

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into this more closely too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should stop the mock server before reading the queryCount:

func TestDriverOptions_SetRetriesOff(t *testing.T) {
	log.SetLevel(log.LevelDebug)
	srv := CreateMockServer(t)
	defer srv.Stop()
...
	rows, err := conn.QueryContext(context.TODO(), "select * from slow;")
...
	require.EqualValues(t, 1, srv.handler.queryCount.Load())
}
func TestDriverOptions_SetRetriesOff(t *testing.T) {
	log.SetLevel(log.LevelDebug)
	srv := CreateMockServer(t)
	defer srv.Stop() // it's ok here to call it twice
...
	rows, err := conn.QueryContext(context.TODO(), "select * from slow;")
	srv.Stop() // stop here before accessing to queryCount
...
	require.EqualValues(t, 1, srv.handler.queryCount.Load())
}

What do you think about it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the original int variable, I see the data race is still reported even if we manually stop the mock server. So it's likely that with atomic variables the Load may still happen before Add

Sorry I'm oncall this week, I think I can start to locate the problem in next week.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have unsynchronized access to this int from different goroutines that are created here:

handler := &mockHandler{}

go func() {
	for {
		conn, _ := l.Accept()
		go func() {
			co, _ := server.NewCustomizedConn(conn, defaultServer, inMemProvider, handler)
			for {
				_ = co.HandleCommand() // common `handler` is reused by different goroutines
			}
		}()
	}
}()

TestDriverOptions_SetRetriesOn sends 3 queries, and we panic on the second.

Copy link
Collaborator

@lance6716 lance6716 Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I have understood now. We have set readTimeout for the tests TestDriverOptions_SetRetriesOn and TestDriverOptions_SetRetriesOff. So when conn.QueryContext returns it's caused by the timeout, rather than receieved the response. It means when conn.QueryContext returns (*mockHandler).handleQuery may not be executed to generate the response.

I was mistakenly thought that there's a happened-before relation of h.queryCount++ -> (*mockHandler).handleQuery finished -> mock server write response -> conn.QueryContext read the response and return -> testing goroutine reads srv.handler.queryCount. But it's not true because the returning of conn.QueryContext is not caused by getting a response. So the write and read of queryCount indeed has no happened-before relation which is a data race.

And I suggest this fix

diff --git a/driver/driver_options_test.go b/driver/driver_options_test.go
index eaa863f..204d2bd 100644
--- a/driver/driver_options_test.go
+++ b/driver/driver_options_test.go
@@ -10,6 +10,7 @@ import (
        "reflect"
        "strconv"
        "strings"
+       "sync"
        "testing"
        "time"
 
@@ -33,12 +34,16 @@ type testServer struct {
 type mockHandler struct {
        // the number of times a query executed
        queryCount int
+       modifier   *sync.WaitGroup
 }
 
 func TestDriverOptions_SetRetriesOn(t *testing.T) {
        log.SetLevel(log.LevelDebug)
        srv := CreateMockServer(t)
        defer srv.Stop()
+       var wg sync.WaitGroup
+       srv.handler.modifier = &wg
+       wg.Add(3)
 
        conn, err := sql.Open("mysql", "root@127.0.0.1:3307/test?readTimeout=1s")
        defer func() {
@@ -52,6 +57,7 @@ func TestDriverOptions_SetRetriesOn(t *testing.T) {
        // we want to get a golang database/sql/driver ErrBadConn
        require.ErrorIs(t, err, sqlDriver.ErrBadConn)
 
+       wg.Wait()
        // here we issue assert that even though we only issued 1 query, that the retries
        // remained on and there were 3 calls to the DB.
        require.Equal(t, 3, srv.handler.queryCount)
@@ -61,6 +67,9 @@ func TestDriverOptions_SetRetriesOff(t *testing.T) {
        log.SetLevel(log.LevelDebug)
        srv := CreateMockServer(t)
        defer srv.Stop()
+       var wg sync.WaitGroup
+       srv.handler.modifier = &wg
+       wg.Add(1)
 
        conn, err := sql.Open("mysql", "root@127.0.0.1:3307/test?readTimeout=1s&retries=off")
        defer func() {
@@ -73,6 +82,7 @@ func TestDriverOptions_SetRetriesOff(t *testing.T) {
        // we want the native error from this driver implementation
        require.ErrorIs(t, err, mysql.ErrBadConn)
 
+       wg.Wait()
        // here we issue assert that even though we only issued 1 query, that the retries
        // remained on and there were 3 calls to the DB.
        require.Equal(t, 1, srv.handler.queryCount)
@@ -309,6 +319,11 @@ func (h *mockHandler) UseDB(dbName string) error {
 }
 
 func (h *mockHandler) handleQuery(query string, binary bool, args []interface{}) (*mysql.Result, error) {
+       defer func() {
+               if h.modifier != nil {
+                       h.modifier.Done()
+               }
+       }()
        h.queryCount++
        ss := strings.Split(query, " ")
        switch strings.ToLower(ss[0]) {

We add synchronization and still use int. It does not rely on time interval 1 second (readTimeout) or 5 seconds (we inject the sleep in around line 345 of driver_options_test.go). And we can reduce these two values now.

Copy link
Member Author

@atercattus atercattus Aug 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, we don't need int, because WaitGroup.Wait() will hang until the required number of requests.

The only case is if there are more retries than necessary (eg. h.queryCount==4 and wg.Add(3)). But here will we get unsynchronized access?

Copy link
Collaborator

@lance6716 lance6716 Aug 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we can remove the int now. And maybe add -race to ci.yml to find other unsynchronized access and fail that PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I'll update this PR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, if you are busy (not response in a few days) I can update this PR.

@lance6716
Copy link
Collaborator

part of the fix is included in #912

…tual-master

Signed-off-by: lance6716 <lance6716@gmail.com>
Signed-off-by: lance6716 <lance6716@gmail.com>
Signed-off-by: lance6716 <lance6716@gmail.com>
@lance6716 lance6716 changed the title Fix actual master Fix actual master and add -race in CI Sep 10, 2024
@lance6716 lance6716 merged commit 3a665d0 into master Sep 10, 2024
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants