Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data race when using testutil/network #11692

Closed
4 tasks
prettymuchbryce opened this issue Apr 19, 2022 · 2 comments · Fixed by #11724
Closed
4 tasks

Data race when using testutil/network #11692

prettymuchbryce opened this issue Apr 19, 2022 · 2 comments · Fixed by #11724

Comments

@prettymuchbryce
Copy link
Contributor

Summary of Bug

When running tests which use testutil/network (such as the cli tests generated via ignite), data races occur. I believe this is because the listener in server/api/server.go is being written and read concurrently in separate goroutines without any locking.

This happens because while there is a global lock which attempts to wrap the testutil/network creation, the startInProcess function invokes apiSrv.Start in a separate goroutine. The main routine then simply waits 5 seconds and assumes the API server has been properly created during this time. I believe the Go race detector will consider this is a data race since it's possible one routine could call Close() concurrently while the initial goroutine is still in the process of calling Start(), both of which access the underlying listener.

WARNING: DATA RACE
Read at 0x00c000187170 by goroutine 18:
  github.com/cosmos/cosmos-sdk/server/api.(*Server).Close()
      /home/runner/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.45.3/server/api/server.go:123 +0x269
  github.com/cosmos/cosmos-sdk/testutil/network.(*Network).Cleanup()


Previous write at 0x00c000187170 by goroutine 13:
  github.com/cosmos/cosmos-sdk/server/api.(*Server).Start()
      /home/runner/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.45.3/server/api/server.go:109 +0x2e4
  github.com/cosmos/cosmos-sdk/testutil/network.startInProcess.func1()
      /home/runner/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.45.3/testutil/network/util.go:86 +0x104

Goroutine 18 (running) created at:
  testing.(*T).Run()
      /opt/hostedtoolcache/go/1.17.8/x64/src/testing/testing.go:1306 +0x726
  testing.runTests.func1()

  Goroutine 13 (running) created at:
  github.com/cosmos/cosmos-sdk/testutil/network.startInProcess()
      /home/runner/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.45.3/testutil/network/util.go:85 +0xe0d
  github.com/syndtr/goleveldb/leveldb/storage.(*fileStorage).List()
      /home/runner/go/pkg/mod/github.com/syndtr/goleveldb@v1.0.1-0.20200815110645-5c35d600f0ca/leveldb/storage/file_storage.go:458 +0x3ea
  fmt.Fscanf()

One way to properly resolve this would be to wrap access to the listener in a mutex in server/api/server.go:

func (s *Server) Start(cfg config.Config) error {
+	m.Lock()
	if cfg.Telemetry.Enabled {
		m, err := telemetry.New(cfg.Telemetry)
		if err != nil {
+                       m.Unlock()
			return err
		}

		s.metrics = m
		s.registerMetrics()
	}

	tmCfg := tmrpcserver.DefaultConfig()
	tmCfg.MaxOpenConnections = int(cfg.API.MaxOpenConnections)
	tmCfg.ReadTimeout = time.Duration(cfg.API.RPCReadTimeout) * time.Second
	tmCfg.WriteTimeout = time.Duration(cfg.API.RPCWriteTimeout) * time.Second
	tmCfg.MaxBodyBytes = int64(cfg.API.RPCMaxBodyBytes)

	listener, err := tmrpcserver.Listen(cfg.API.Address, tmCfg)
	if err != nil {
+              m.Unlock()
		return err
	}

	s.registerGRPCGatewayRoutes()

	s.listener = listener
	var h http.Handler = s.Router

	if cfg.API.EnableUnsafeCORS {
		allowAllCORS := handlers.CORS(handlers.AllowedHeaders([]string{"Content-Type"}))
+	         m.Unlock()
		return tmrpcserver.Serve(s.listener, allowAllCORS(h), s.logger, tmCfg)
	}

	s.logger.Info("starting API server...")
+	m.Unlock()
	return tmrpcserver.Serve(s.listener, s.Router, s.logger, tmCfg)
}

// Close closes the API server.
func (s *Server) Close() error {
+	m.Lock()
+	defer m.Unlock()
	return s.listener.Close()
}

Version

v0.45.3

Steps to Reproduce

  1. Create a project with ignite
  2. Generate a new module with cli tests. i.e.
ignite scaffold module mymodule --require-registration
ignite scaffold message MyMessage mymodule --module mymodule
  1. Run tests with race detector. i.e. go test ./... -mod=readonly -timeout 12m -race -coverprofile=coverage.out -covermode=atomic -tags='ledger test_ledger_mock'

For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@alexanderbez
Copy link
Contributor

alexanderbez commented Apr 19, 2022

Thanks @prettymuchbryce for creating such a detailed issue! Yeah, as you've pointed out it's technically possible to call Close prior to Start returning from the goroutine it was executed in.

While in practice, this would never happen as you'd wait for Start to return (as we do), the go compiler doesn't know this. So we should fix it for the sake of passing race detecting tests.

Since you've already posted the diff, which looks reasonable, would you open to creating a PR?

@prettymuchbryce
Copy link
Contributor Author

Since you've already posted the diff, which looks reasonable, would you open to creating a PR?

Sure. I can take a crack at it shortly.

@mergify mergify bot closed this as completed in #11724 Apr 25, 2022
mergify bot pushed a commit that referenced this issue Apr 25, 2022
## Description

Closes: #11692

This PR adds a `sync.Mutex` to `api.Server` in order to prevent data race issues. In practice, these data races are unlikely to occur as you'd typically wait for Start to return. However, this does cause an issue with testing using the `-race` flag when using the `testutil/network` package. Currently races occur by default when running tests on projects generated using `ignite`. We should fix this for the sake of passing race detecting tests.

---

### Author Checklist

*All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.*

I have...

- [X] included the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [X] added `!` to the type prefix if API or client breaking change
- [X] targeted the correct branch (see [PR Targeting](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting))
- [X] provided a link to the relevant issue or specification
- [X] followed the guidelines for [building modules](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules)
- [X] included the necessary unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing)
- [x] added a changelog entry to `CHANGELOG.md`
- [X] included comments for [documenting Go code](https://blog.golang.org/godoc)
- [X] updated the relevant documentation or specification
- [x] reviewed "Files changed" and left comments if necessary
- [ ] confirmed all CI checks have passed

### Reviewers Checklist

*All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.*

I have...

- [ ] confirmed the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [ ] confirmed `!` in the type prefix if API or client breaking change
- [ ] confirmed all author checklist items have been addressed 
- [ ] reviewed state machine logic
- [ ] reviewed API design and naming
- [ ] reviewed documentation is accurate
- [ ] reviewed tests and test coverage
- [ ] manually tested (if applicable)
mergify bot pushed a commit that referenced this issue Apr 25, 2022
## Description

Closes: #11692

This PR adds a `sync.Mutex` to `api.Server` in order to prevent data race issues. In practice, these data races are unlikely to occur as you'd typically wait for Start to return. However, this does cause an issue with testing using the `-race` flag when using the `testutil/network` package. Currently races occur by default when running tests on projects generated using `ignite`. We should fix this for the sake of passing race detecting tests.

---

### Author Checklist

*All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.*

I have...

- [X] included the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [X] added `!` to the type prefix if API or client breaking change
- [X] targeted the correct branch (see [PR Targeting](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting))
- [X] provided a link to the relevant issue or specification
- [X] followed the guidelines for [building modules](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules)
- [X] included the necessary unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing)
- [x] added a changelog entry to `CHANGELOG.md`
- [X] included comments for [documenting Go code](https://blog.golang.org/godoc)
- [X] updated the relevant documentation or specification
- [x] reviewed "Files changed" and left comments if necessary
- [ ] confirmed all CI checks have passed

### Reviewers Checklist

*All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.*

I have...

- [ ] confirmed the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [ ] confirmed `!` in the type prefix if API or client breaking change
- [ ] confirmed all author checklist items have been addressed
- [ ] reviewed state machine logic
- [ ] reviewed API design and naming
- [ ] reviewed documentation is accurate
- [ ] reviewed tests and test coverage
- [ ] manually tested (if applicable)

(cherry picked from commit 97d735f)

# Conflicts:
#	CHANGELOG.md
mergify bot pushed a commit that referenced this issue Apr 25, 2022
## Description

Closes: #11692

This PR adds a `sync.Mutex` to `api.Server` in order to prevent data race issues. In practice, these data races are unlikely to occur as you'd typically wait for Start to return. However, this does cause an issue with testing using the `-race` flag when using the `testutil/network` package. Currently races occur by default when running tests on projects generated using `ignite`. We should fix this for the sake of passing race detecting tests.

---

### Author Checklist

*All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.*

I have...

- [X] included the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [X] added `!` to the type prefix if API or client breaking change
- [X] targeted the correct branch (see [PR Targeting](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting))
- [X] provided a link to the relevant issue or specification
- [X] followed the guidelines for [building modules](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules)
- [X] included the necessary unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing)
- [x] added a changelog entry to `CHANGELOG.md`
- [X] included comments for [documenting Go code](https://blog.golang.org/godoc)
- [X] updated the relevant documentation or specification
- [x] reviewed "Files changed" and left comments if necessary
- [ ] confirmed all CI checks have passed

### Reviewers Checklist

*All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.*

I have...

- [ ] confirmed the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [ ] confirmed `!` in the type prefix if API or client breaking change
- [ ] confirmed all author checklist items have been addressed
- [ ] reviewed state machine logic
- [ ] reviewed API design and naming
- [ ] reviewed documentation is accurate
- [ ] reviewed tests and test coverage
- [ ] manually tested (if applicable)

(cherry picked from commit 97d735f)

# Conflicts:
#	CHANGELOG.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants