Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embed mutex in the server's websocket client #200

Merged
merged 3 commits into from
Sep 20, 2023

Conversation

evan-bradley
Copy link
Contributor

The Gorilla websocket package states that only one concurrent write can be made to a connection at a time. The library will try to detect concurrent writes and will panic if it detects that one is happening.

We might see concurrent writes in two places:

  1. If the server application doesn't guard writes when writing to an agent connection.
  2. If the OpAMP server library is responding to a message from the agent while the server application is also sending a message using its reference to the agent connection obtained from OnConnectedFunc.

In my opinion it would make sense to prevent concurrent writes within the library to relieve the application from having to worry about this detail. This does necessitate a mutex per connection, but I don't see any significant performance impact from the mutexes.

Without the mutex in wsConnection:

go test -bench=. -run=^# -count=5
goos: linux
goarch: amd64
pkg: github.com/open-telemetry/opamp-go/server
cpu: 12th Gen Intel(R) Core(TM) i9-12900
BenchmarkSendToClient-24           14610            118877 ns/op
BenchmarkSendToClient-24           17510            326918 ns/op
BenchmarkSendToClient-24           16641            267099 ns/op
BenchmarkSendToClient-24           16287            239085 ns/op
BenchmarkSendToClient-24           16243            238157 ns/op
PASS
ok      github.com/open-telemetry/opamp-go/server       23.877s

With the mutex:

go test -bench=. -run=^# -count=5
goos: linux
goarch: amd64
pkg: github.com/open-telemetry/opamp-go/server
cpu: 12th Gen Intel(R) Core(TM) i9-12900
BenchmarkSendToClient-24           15170            175727 ns/op
BenchmarkSendToClient-24           16916            307997 ns/op
BenchmarkSendToClient-24           16446            249592 ns/op
BenchmarkSendToClient-24           16563            260181 ns/op
BenchmarkSendToClient-24           17136            315825 ns/op
PASS
ok      github.com/open-telemetry/opamp-go/server       25.690s

This is a fairly rudimentary solution, but it does solve the issue I was seeing and is straightforward. I'm open to more sophisticated ways of managing the write concurrency if there's a deficiency with this approach.

This will prevent concurrent writes both in server implementations
that store a reference to the client from the OnConnectedFunc callback.
Concurrent writes can either occur when the server library
responds to agent messages or when the implementation
tries to send messages to the server on different threads.
@evan-bradley evan-bradley requested a review from a team September 15, 2023 15:54
@codecov
Copy link

codecov bot commented Sep 15, 2023

case <-timeout.Done():
t.Error("Client failed to connect before timeout")
default:
if _, ok := srvConnVal.Load().(types.Connection); ok == true {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get a linter warning that ok is unused if I just have ok as the if-statement condition. ok == true fixes the warning, not sure how else to address this.

Copy link
Member

@tigrannajaryan tigrannajaryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for fixing.

@tigrannajaryan
Copy link
Member

Hint: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat is very useful for showing before/after benchmark comparison.

Here is what it produces for your benchmark results:

name             old time/op  new time/op  delta
SendToClient-24   238µs ±50%   262µs ±33%   ~     (p=0.690 n=5+5)

Looks like it is close, but could use a more stable execution environment for benchmarking. :-)

@tigrannajaryan tigrannajaryan merged commit fd3066f into open-telemetry:main Sep 20, 2023
@tigrannajaryan
Copy link
Member

@evan-bradley opamp-go v0.9.0 released: https://github.com/open-telemetry/opamp-go/releases/tag/v0.9.0

evan-bradley added a commit to open-telemetry/opentelemetry-collector-contrib that referenced this pull request Dec 12, 2023
**Description:**

Add tests that verify the Supervisor's behavior using a real Collector
binary against an OpAMP server written with the opamp-go library. This
tests a basic set of functionality mostly based on what is currently
listed as implemented in the Supervisor's feature table.

Right now these tests start the Supervisor using the Go API to avoid
process handling, but could be changed to start it through a binary in
the future. I've placed these tests in the E2E testing workflow even
though they run fairly quickly since they depend on a built Collector
binary, may become more expansive in the future, and don't fit in any of
the existing jobs in the `build-and-test` workflow. I'm open to placing
them in another location if we'd prefer them elsewhere.

This also updates opamp-go to v0.9.0 to take advantage of
open-telemetry/opamp-go#200.

**Link to tracking Issue:**

Resolves
#24292
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants