fix data race on tx.Stats #373

fuweid · 2023-01-02T14:36:02Z

Fixes: #213

Signed-off-by: Wei Fu fuweid89@gmail.com

RFC: Even if updating the type of Stats interface breaks the package usage, I think it should define the size of field with specific type instead of int. So we can use atomic to update the stats instead of introducing the lock.

cc @ahrtr @ptabor

Fixes: etcd-io#213 Signed-off-by: Wei Fu <fuweid89@gmail.com>

ahrtr · 2023-01-03T00:00:38Z

The change looks good to me.

Could you please update the workflows/tests.yaml to add race test using matrix? It can prevent such kind of issue from happening again in future.

Reference: https://github.com/etcd-io/etcd/blob/main/.github/workflows/tests.yaml

Signed-off-by: Wei Fu <fuweid89@gmail.com>

fuweid · 2023-01-03T06:43:04Z

mark it draft for testing.

Signed-off-by: Wei Fu <fuweid89@gmail.com>

The test log: ================== WARNING: DATA RACE Write at 0x00c000388370 by goroutine 37: go.etcd.io/bbolt/internal/btesting.MustOpenDBWithOption() /home/runner/work/bbolt/bbolt/internal/btesting/btesting.go:52 +0x1c7 go.etcd.io/bbolt_test.TestDB_Concurrent_WriteTo.func1() /home/runner/work/bbolt/bbolt/db_test.go:685 +0x1f4 go.etcd.io/bbolt_test.TestDB_Concurrent_WriteTo·dwrap·18() /home/runner/work/bbolt/bbolt/db_test.go:714 +0x47 Previous write at 0x00c000388370 by goroutine 83: go.etcd.io/bbolt/internal/btesting.MustOpenDBWithOption() /home/runner/work/bbolt/bbolt/internal/btesting/btesting.go:52 +0x1c7 go.etcd.io/bbolt_test.TestDB_Concurrent_WriteTo.func1() /home/runner/work/bbolt/bbolt/db_test.go:685 +0x1f4 go.etcd.io/bbolt_test.TestDB_Concurrent_WriteTo·dwrap·18() /home/runner/work/bbolt/bbolt/db_test.go:714 +0x47 Goroutine 37 (running) created at: go.etcd.io/bbolt_test.TestDB_Concurrent_WriteTo() /home/runner/work/bbolt/bbolt/db_test.go:714 +0x724 testing.tRunner() /opt/hostedtoolcache/go/1.17.13/x64/src/testing/testing.go:1259 +0x22f testing.(*T).Run·dwrap·21() /opt/hostedtoolcache/go/1.17.13/x64/src/testing/testing.go:1306 +0x47 Goroutine 83 (running) created at: go.etcd.io/bbolt_test.TestDB_Concurrent_WriteTo() /home/runner/work/bbolt/bbolt/db_test.go:714 +0x724 testing.tRunner() /opt/hostedtoolcache/go/1.17.13/x64/src/testing/testing.go:1259 +0x22f testing.(*T).Run·dwrap·21() /opt/hostedtoolcache/go/1.17.13/x64/src/testing/testing.go:1306 +0x47 ================== Signed-off-by: Wei Fu <fuweid89@gmail.com>

fuweid · 2023-01-03T14:32:11Z

.github/workflows/tests.yaml

+            ;;
+          linux-amd64-unit-test-4-cpu-race)
+            CPU=4 ENABLE_RACE=true make test-simulate
+            CPU=4 ENABLE_RACE=true make test


No idea about the cpu usage when enable -race.

It seems that the standard github action runner doesn't allow user to run high cpu usage even if it provides two vcore to us. It will shutdown the runner or kill the process with SIGTERM.

ubuntu-latest: jobs fail with error code 143 actions/runner-images#6680 (comment)

The runner has received a shutdown signal. actions/runner-images#6709

I didn't see it before. maybe we can enable -race later?

@ahrtr

REF:

https://github.com/fuweid/bbolt/actions/runs/3827981238/jobs/6513066220 (runner sends SIGTERM to go test

https://github.com/fuweid/bbolt/actions/runs/3828126044/jobs/6513369814 (shutdown the runner)

https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources

Etcd is running unit tests with 'race': https://github.com/etcd-io/etcd/actions/runs/3821152800/jobs/6500015229

I would try...
If it would not work, I would try running the --shorttests alone (that should be quick enough to not trigger high CPU usage).

Yeah. I was trying to use new go version, use taskset -c 0 to limit it in one core and limit it with one go queue.
But it all failed. Maybe it hits some limitation in the underlay of runner...

Side note: It always fails when test for array freelist type.

My mental model is that we should be deprecating array-freelist...
I'm not aware about adventages over the map-driven implementation.
So I would be OK for not testing in --race the array one.

ptabor · 2023-01-03T14:49:00Z

Makefile


-.PHONY: race fmt test lint
+.PHONY: fmt test test-simulate lint


Does it add value to have both here: "test" and "test-simulate" ?
It seems to me that test-simulate is subset of the 'test'.

If we cannot afford running all tests in the --race mode, probably --short and 'test-simulate' in the race mode is offering a cross-cutting coverage.

Agreed.

Since we still need to support array-freelist although I agree to deprecate it, I would suggest to only run test with flag --short for array-freelist.

FYI.

https://github.com/etcd-io/etcd/blob/e73f55d4e94666c99558baa2fd4e365aeaca4dc4/scripts/test.sh#L100

https://github.com/etcd-io/etcd/blob/e73f55d4e94666c99558baa2fd4e365aeaca4dc4/tests/e2e/etcd_config_test.go#L33

bbolt/simulation_test.go

Lines 34 to 36 in b654ce9

if testing.Short() {

t.Skip("skipping test in short mode.")

}

I deleted the test-simulate since it is duplicate.
And I split the make test into make test-freelist-hashmap and make test-freelist-array so that we can use EXTRA_TESTFLAGS=-short for array.

ptabor · 2023-01-03T14:58:50Z

tx.go

@@ -672,27 +673,27 @@ func (tx *Tx) Page(id int) (*PageInfo, error) {
 // TxStats represents statistics about the actions performed by the transaction.
 type TxStats struct {
 	// Page statistics.
-	PageCount int // number of page allocations
-	PageAlloc int // total bytes allocated
+	PageCount int64 // number of page allocations


I would go a step forward:

Warn on the tx.Stats() method that direct usage of the fields is deprecated
due to lack of safety on e.g. 32-bit architectures.

Add to this class 2 methods for each field:

func (t* TxStats)IncrementPageCount(delta int64) {
return atomic...
}

func (t* TxStats)GetPageCount(delta int64) {
return atomic.LoadInt64(...)
}

such that user's can write safe code without depending on the arch-properties.

Great point!

The only minor comment is about the name on the Getter method, It isn't a best practice to add the prefix Get, for example, PageCount is better than GetPageCount in above example. FYI. https://go.dev/doc/effective_go#Getters

Agree. But we already have txStats.PageCount field.
I assume we cannot rename it (within major version) to txStats.pageCount as it would be breaking change of public API surface.
We also cannot have a field and method with the same name in a single class.

Updated. PTAL~

I would suggest to add comment to notify that we are planning not to expose each field, e.g change PageCount to pageCount .

Signed-off-by: Wei Fu <fuweid89@gmail.com>

ptabor · 2023-01-04T12:33:04Z

Makefile


-.PHONY: race fmt test lint
+.PHONY: fmt test test-freelist-hashmap test-freelist-array lint


Now it's still redundant... test resolves to both: test-freelist-{...} targets. And no race run by default at all.

I tried this combination in action and 75% successful rate for the race matrix 😂

Let me add the following matrix in the follow-up pr.

linux-amd64-unit-test-2-cpu-simulate-race) CPU=2 ENABLE_RACE=true EXTRA_TESTFLAGS="-test.run='TestSimulate_(100op|1000op)'" make test ;;

ahrtr · 2023-01-04T22:44:08Z

.github/workflows/tests.yaml

-        os: [ubuntu-latest, windows-latest]
-    runs-on: ${{ matrix.os }}
+        target:
+        - linux-amd64-unit-test-4-cpu


It would be better we have the following three targets to cover the no-race cases:

linux-amd64-unit-test-1-cpu

linux-amd64-unit-test-2-cpu

linux-amd64-unit-test-4-cpu

Of course, it can be addressed in a separate PR if there is no any objection.

ahrtr

Overall looks good to me with a couple of minor comments which can be addressed in a separate PR.

Thank you @fuweid

fuweid · 2023-01-05T02:39:15Z

Thanks @ahrtr @ptabor for the review!

#373 Follow-up

fix data race on tx.Stats

27ac0b8

Fixes: etcd-io#213 Signed-off-by: Wei Fu <fuweid89@gmail.com>

fuweid force-pushed the fix-213 branch from 8cecc67 to ab1cb49 Compare January 3, 2023 06:34

Enable -race in github action

99a93a6

Signed-off-by: Wei Fu <fuweid89@gmail.com>

fuweid force-pushed the fix-213 branch from ab1cb49 to 99a93a6 Compare January 3, 2023 06:36

fuweid marked this pull request as draft January 3, 2023 06:42

.github: run coverage in individual job

f8fa364

Signed-off-by: Wei Fu <fuweid89@gmail.com>

fuweid force-pushed the fix-213 branch 2 times, most recently from 662922b to e878ba8 Compare January 3, 2023 07:26

.github: split test job into test-{linux/windows}

5104c82

Signed-off-by: Wei Fu <fuweid89@gmail.com>

fuweid force-pushed the fix-213 branch from e878ba8 to 5104c82 Compare January 3, 2023 07:27

fuweid force-pushed the fix-213 branch 7 times, most recently from cbc6940 to 4e98e8f Compare January 3, 2023 14:20

fuweid commented Jan 3, 2023

View reviewed changes

fuweid marked this pull request as ready for review January 3, 2023 14:33

ptabor reviewed Jan 3, 2023

View reviewed changes

fuweid added 2 commits January 4, 2023 20:05

.github/Makefile: support EXTRA_TESTFLAGS for array freelist

b7f2da4

Signed-off-by: Wei Fu <fuweid89@gmail.com>

TxStats: provides Getter/Inc functions

dd4458c

Signed-off-by: Wei Fu <fuweid89@gmail.com>

ptabor reviewed Jan 4, 2023

View reviewed changes

ptabor approved these changes Jan 4, 2023

View reviewed changes

ahrtr reviewed Jan 4, 2023

View reviewed changes

ahrtr approved these changes Jan 4, 2023

View reviewed changes

ahrtr merged commit a938f00 into etcd-io:master Jan 4, 2023

fuweid deleted the fix-213 branch January 5, 2023 02:38

fuweid mentioned this pull request Jan 5, 2023

#373 Follow-up #375

Merged

ahrtr added a commit that referenced this pull request Jan 5, 2023

Merge pull request #375 from fuweid/213-follow-up

ff5cb3f

#373 Follow-up

fuweid mentioned this pull request Mar 27, 2023

[3.4] tests: Avoid testing package root tests in e2e etcd-io/etcd#15562

Merged

This was referenced Oct 14, 2023

panic: 64bit unaligned in arm32 #577

Closed

Remove the usage of atomic package #582

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix data race on tx.Stats #373

fix data race on tx.Stats #373

fuweid commented Jan 2, 2023

ahrtr commented Jan 3, 2023

fuweid commented Jan 3, 2023

fuweid Jan 3, 2023

ptabor Jan 3, 2023

fuweid Jan 3, 2023 •

edited

Loading

ptabor Jan 3, 2023

ptabor Jan 3, 2023

ahrtr Jan 4, 2023

fuweid Jan 4, 2023

ptabor Jan 3, 2023

ahrtr Jan 4, 2023

ptabor Jan 4, 2023

fuweid Jan 4, 2023

ahrtr Jan 4, 2023

ptabor Jan 4, 2023

fuweid Jan 4, 2023

fuweid Jan 4, 2023

ahrtr Jan 4, 2023

fuweid Jan 5, 2023

ahrtr left a comment

fuweid commented Jan 5, 2023


		.PHONY: race fmt test lint
		.PHONY: fmt test test-simulate lint


		.PHONY: race fmt test lint
		.PHONY: fmt test test-freelist-hashmap test-freelist-array lint

fix data race on tx.Stats #373

fix data race on tx.Stats #373

Conversation

fuweid commented Jan 2, 2023

ahrtr commented Jan 3, 2023

fuweid commented Jan 3, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fuweid Jan 3, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ahrtr left a comment

Choose a reason for hiding this comment

fuweid commented Jan 5, 2023

fuweid Jan 3, 2023 •

edited

Loading