You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
CoordinatorState, in internal/pkg/agent/application/coordinator/state.go, has several race conditions / potential blockages:
Congestion in one subscriber can block the whole Coordinator. This is already noted in (*CoordinatorState).Subscribe, where the documentation says // Note: Not reading from a subscription channel will cause the Coordinator to block.. (The primary client of the CoordinatorState change notifications is the ElasticAgentControl.StateWatch RPC stream, which is subject to congestion based on network conditions.)
Multiple state changes in a short interval can be sent out of order, causing subscribers to finalize on the wrong value: a mutex lock on subMx ensures that a changed state is sent to all subscribers before advancing to the next one, but if multiple changes accumulate while it is being sent there is nothing that checks which order the accumulated changes are sent in.
Related to the first two, because every change tries to send fully to every subscriber, even when states are sent in the right order they may not be the most current values at the time they are sent.
Subscribers may not receive state changes that happen shortly after they subscribe: the subscriber list isn't updated until after the initial state has been queued, and changes that happen during that interval will be missed.
Subscribers that become congested may never receive some updates even after they recover -- after a one-second timeout, that state change will no longer be sent to that subscriber.
This should probably be addressed by making state notifications a variable-size select with reflect.Select -- that way updates will always happen immediately for any active listeners, and a congested subscriber won't affect other subscribers or the Coordinator.
The text was updated successfully, but these errors were encountered:
CoordinatorState
, ininternal/pkg/agent/application/coordinator/state.go
, has several race conditions / potential blockages:Coordinator
. This is already noted in(*CoordinatorState).Subscribe
, where the documentation says// Note: Not reading from a subscription channel will cause the Coordinator to block.
. (The primary client of theCoordinatorState
change notifications is theElasticAgentControl.StateWatch
RPC stream, which is subject to congestion based on network conditions.)subMx
ensures that a changed state is sent to all subscribers before advancing to the next one, but if multiple changes accumulate while it is being sent there is nothing that checks which order the accumulated changes are sent in.This should probably be addressed by making state notifications a variable-size select with
reflect.Select
-- that way updates will always happen immediately for any active listeners, and a congested subscriber won't affect other subscribers or the Coordinator.The text was updated successfully, but these errors were encountered: