-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FNS polish - metrics, max msg size, default flag values #2517
Conversation
WalkthroughThis pull request introduces several changes across multiple files related to gRPC streaming configuration and metrics management. Key modifications include updating default values for gRPC streaming constants, enhancing the Changes
Possibly related PRs
Suggested labels
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Outside diff range and nitpick comments (5)
protocol/streaming/ws/websocket_server.go (1)
59-61
: Approve with suggestions: Consider making the message size limit configurableThe addition of a maximum message size limit for WebSocket connections is a good security practice. It helps prevent potential denial-of-service attacks and improves overall system stability.
However, consider the following improvements:
Make the limit configurable, perhaps through an environment variable or configuration file. This would allow for easier adjustments without code changes.
Add a comment explaining the rationale behind the 10 MB limit.
Example implementation:
const defaultMaxMessageSize = 10 * 1024 * 1024 // 10 MB // MaxMessageSize can be set through an environment variable var MaxMessageSize = getEnvInt("WS_MAX_MESSAGE_SIZE", defaultMaxMessageSize) // In the Handler method: // Set maximum message size to prevent potential DoS attacks conn.SetReadLimit(int64(MaxMessageSize))protocol/lib/metrics/metric_keys.go (1)
88-88
: Approved: New SubscriptionId metric keyThe addition of
SubscriptionId
as a metric key is a valuable improvement. It aligns perfectly with the PR objective of tagging metrics by subscription ID, which will enable more granular analysis of system behavior.Consider renaming the constant to
SubscriptionID
(with uppercase "ID") to be consistent with common Go naming conventions for acronyms.protocol/app/flags/flags_test.go (1)
122-124
: LGTM: Consistent update of gRPC streaming parameters with optimistic execution.The increase in
GrpcStreamingMaxBatchSize
andGrpcStreamingMaxChannelBufferSize
from 2000 to 10000 is consistently applied here, demonstrating compatibility with optimistic execution.There's a minor typo in the test case name: "canbe" should be "can be". Consider updating it to improve readability:
- "success - optimistic execution canbe enabled with gRPC streaming": { + "success - optimistic execution can be enabled with gRPC streaming": {protocol/streaming/full_node_streaming_manager.go (2)
723-723
: Monitor buffer state with appropriate metric emission frequencyWhile emitting metrics after adding updates to the cache is useful for monitoring, ensure that the frequency of
sm.EmitMetrics()
calls does not introduce performance overhead, especially if updates are added to the cache very frequently.
761-761
: Prevent excessive metric emissions during error handlingIn
RemoveSubscriptionsAndClearBufferIfFull
, callingsm.EmitMetrics()
immediately after clearing the buffer and subscriptions may not be necessary, as the buffer is now empty. Evaluate whether this metric emission provides additional value or if it can be omitted to reduce overhead.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
- protocol/app/flags/flags.go (1 hunks)
- protocol/app/flags/flags_test.go (4 hunks)
- protocol/lib/metrics/metric_keys.go (2 hunks)
- protocol/streaming/full_node_streaming_manager.go (9 hunks)
- protocol/streaming/ws/websocket_server.go (1 hunks)
🧰 Additional context used
🔇 Additional comments (12)
protocol/lib/metrics/metric_keys.go (2)
76-76
: Approved: Improved metric for subaccount updatesThe change from
GrpcSendSubaccountSnapshotLatency
toGrpcSendSubaccountUpdateCount
is a good improvement. This new metric will provide more actionable data on the frequency of subaccount updates, which aligns well with the PR objective of enhancing metrics. It will help in monitoring system activity more effectively, especially in light of the mentioned increase in traffic from subaccounts.
Line range hint
1-94
: Summary: Effective metrics enhancementsThe changes to this file effectively support the PR objectives by:
- Shifting focus from latency to count for subaccount updates, providing more actionable data.
- Introducing a subscription ID metric, enabling more granular analysis.
These modifications will help in better monitoring and understanding system behavior, especially in light of the increased traffic from subaccounts mentioned in the PR description.
protocol/app/flags/flags.go (3)
75-76
: Summary: Increased gRPC streaming capacityThe changes to
DefaultGrpcStreamingMaxBatchSize
andDefaultGrpcStreamingMaxChannelBufferSize
are consistent and align with the PR objectives to handle increased traffic. These modifications should improve the system's ability to handle higher loads, particularly from subaccounts and taker orders.Key points:
- Both values increased from 2000 to 10000, providing a 5x capacity increase.
- The changes are isolated to default values and don't require modifications to other parts of the code.
- The
Validate
function remains valid as it only checks for positive values.Recommendations:
- Conduct thorough performance testing to ensure the system can handle the increased capacity effectively.
- Monitor system resource usage, particularly memory consumption and CPU utilization, after deployment.
- Consider implementing adaptive scaling mechanisms for these values in future iterations if traffic patterns are highly variable.
76-76
: Approved: Increased DefaultGrpcStreamingMaxChannelBufferSizeThe increase from 2000 to 10000 for
DefaultGrpcStreamingMaxChannelBufferSize
is in line with the PR objectives to handle increased traffic. This change should improve the system's ability to handle bursts of messages without dropping them.However, please consider the following:
- Monitor memory usage, as larger channel buffers will increase memory consumption.
- Keep an eye on message processing delays, as larger queues might lead to increased latency.
- Ensure that the system can effectively process messages at this increased rate to prevent buffer overflow.
To verify the impact of this change and ensure system stability, consider running the following commands:
75-75
: Approved: Increased DefaultGrpcStreamingMaxBatchSizeThe increase from 2000 to 10000 for
DefaultGrpcStreamingMaxBatchSize
aligns with the PR objectives to handle increased traffic. This change should improve throughput for gRPC streaming.However, please consider the following:
- Monitor memory usage, as larger batches may increase memory consumption.
- Keep an eye on latency for processing individual messages within a batch.
- Ensure that clients can handle larger batch sizes effectively.
To verify the impact of this change, consider running the following commands:
protocol/app/flags/flags_test.go (4)
92-93
: LGTM: Updated gRPC streaming parameters align with PR objectives.The increase in
GrpcStreamingMaxBatchSize
andGrpcStreamingMaxChannelBufferSize
from 2000 to 10000 is consistent with the PR's goal of updating default flag values to handle increased traffic from subaccounts and taker orders. This change should allow for processing larger batches and maintaining larger buffer sizes in gRPC streaming.
103-104
: LGTM: Consistent update of gRPC streaming parameters.The increase in
GrpcStreamingMaxBatchSize
andGrpcStreamingMaxChannelBufferSize
from 2000 to 10000 is consistently applied here, maintaining parity with the gRPC-only configuration. This ensures that the updated values are used regardless of whether WebSocket streaming is enabled alongside gRPC streaming.
260-261
: LGTM: Updated default values for gRPC streaming parameters.The increase in the expected default values for
GrpcStreamingMaxBatchSize
andGrpcStreamingMaxChannelBufferSize
from 2000 to 10000 ensures consistency with the actual implementation. This change aligns with the PR's objective of updating default flag values to handle increased traffic.
Line range hint
1-380
: Summary: Consistent updates to gRPC streaming parameters in test cases.The changes in this file consistently update the
GrpcStreamingMaxBatchSize
andGrpcStreamingMaxChannelBufferSize
values from 2000 to 10000 across various test cases. These updates align well with the PR's objective of adjusting default flag values to handle increased traffic from subaccounts and taker orders. The changes are applied consistently in different scenarios, including gRPC-only streaming, combined gRPC and WebSocket streaming, and when optimistic execution is enabled.A minor typo was found in one test case name, which has been noted for correction. Overall, these changes should improve the system's ability to handle larger batches and maintain larger buffer sizes in gRPC streaming, contributing to better performance under increased load.
protocol/streaming/full_node_streaming_manager.go (3)
377-381
: Ensure metric increments align with monitoring objectivesIn the
sendStreamUpdates
method, you're incrementingmetrics.GrpcAddToSubscriptionChannelCount
each time updates are attempted to be sent to a subscription's channel. Verify that this metric provides meaningful insights and aligns with your monitoring goals, without introducing significant overhead.
745-745
: Check buffer overflow handling logicAfter adding subaccount updates to the cache, you're calling
sm.RemoveSubscriptionsAndClearBufferIfFull()
. Verify that the conditions for buffer overflow are correctly set and that this function effectively prevents buffer overflows without unnecessarily disconnecting subscribers.
238-241
: Confirm label correctness for metricsWhen incrementing
metrics.GrpcSendResponseToSubscriberCount
, ensure that the labels applied correctly represent the subscription ID. Accurate labeling is crucial for effective metric aggregation and monitoring.Run the following script to verify that metric labels are correctly applied:
// Buffer is full. Emit metric and drop subscription. | ||
sm.EmitMetrics() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid emitting metrics inside tight loops or error paths
Calling sm.EmitMetrics()
inside the default case when the subscription channel is full could lead to performance issues, especially under high load. Emitting metrics is an I/O operation and might be resource-intensive if called frequently. Consider removing this call or ensuring it's throttled appropriately.
metrics.IncrCounterWithLabels( | ||
metrics.GrpcAddToSubscriptionChannelCount, | ||
1, | ||
metrics.GetLabelForIntValue(metrics.SubscriptionId, int(id)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid duplicate metric increments to prevent misleading data
You're incrementing metrics.GrpcAddToSubscriptionChannelCount
in both sendStreamUpdates
and FlushStreamUpdatesWithLock
. This could result in counting the same event twice, leading to inaccurate metrics. Decide on a single place to increment this counter to ensure the metric reflects the actual number of events accurately.
// Buffer is full. Emit metric and drop subscription. | ||
sm.EmitMetrics() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Limit metric emissions in high-frequency error scenarios
Similar to previous comments, calling sm.EmitMetrics()
when a subscription's channel is full within a select default case can lead to performance degradation under high load. Consider removing this call or implementing a rate-limiting mechanism to prevent excessive metric emissions.
metrics.IncrCounter( | ||
metrics.GrpcSendSubaccountUpdateCount, | ||
1, | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add labels to GrpcSendSubaccountUpdateCount
for better granularity
Currently, metrics.GrpcSendSubaccountUpdateCount
is incremented without any labels. Adding labels such as the subaccount ID can provide more granular insights into metric data, which can be valuable for debugging and monitoring.
Apply this diff to include the subaccount ID as a label:
metrics.IncrCounter(
- metrics.GrpcSendSubaccountUpdateCount,
- 1,
+ metrics.GrpcSendSubaccountUpdateCount,
+ 1,
+ metrics.GetLabelForStringValue(metrics.SubaccountId, subaccountUpdate.SubaccountId.String()),
)
Committable suggestion was skipped due to low confidence.
metrics.AddSample( | ||
metrics.GrpcStreamNumUpdatesBuffered, | ||
float32(len(sm.streamUpdateCache)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use SetGauge
for gauge metrics instead of AddSample
In the EmitMetrics
method, you replaced metrics.SetGauge
with metrics.AddSample
for the GrpcStreamNumUpdatesBuffered
metric. Since this metric represents the current number of updates buffered, which is a gauge (a value that can go up or down), it would be more appropriate to continue using metrics.SetGauge
to accurately reflect the current state rather than accumulating samples.
Apply this diff to revert to using SetGauge
:
- metrics.AddSample(
+ metrics.SetGauge(
metrics.GrpcStreamNumUpdatesBuffered,
float32(len(sm.streamUpdateCache)),
)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
metrics.AddSample( | |
metrics.GrpcStreamNumUpdatesBuffered, | |
float32(len(sm.streamUpdateCache)), | |
metrics.SetGauge( | |
metrics.GrpcStreamNumUpdatesBuffered, | |
float32(len(sm.streamUpdateCache)), |
metrics.AddSampleWithLabels( | ||
metrics.GrpcSubscriptionChannelLength, | ||
float32(len(subscription.updatesChannel)), | ||
metrics.GetLabelForIntValue(metrics.SubscriptionId, int(subscription.subscriptionId)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use SetGaugeWithLabels
for current values
For the GrpcSubscriptionChannelLength
metric, you're now using metrics.AddSampleWithLabels
. If the intention is to record the current length of each subscription's update channel (which is a gauge), consider using metrics.SetGaugeWithLabels
to set the value directly. This provides an accurate snapshot of each channel's length at the time of metric emission.
Apply this diff to use SetGaugeWithLabels
:
- metrics.AddSampleWithLabels(
+ metrics.SetGaugeWithLabels(
metrics.GrpcSubscriptionChannelLength,
float32(len(subscription.updatesChannel)),
metrics.GetLabelForIntValue(metrics.SubscriptionId, int(subscription.subscriptionId)),
)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
metrics.AddSampleWithLabels( | |
metrics.GrpcSubscriptionChannelLength, | |
float32(len(subscription.updatesChannel)), | |
metrics.GetLabelForIntValue(metrics.SubscriptionId, int(subscription.subscriptionId)), | |
metrics.SetGaugeWithLabels( | |
metrics.GrpcSubscriptionChannelLength, | |
float32(len(subscription.updatesChannel)), | |
metrics.GetLabelForIntValue(metrics.SubscriptionId, int(subscription.subscriptionId)), |
Porting over changes from https://github.com/dydxprotocol/v4-chain/commits/feat/full-node-streaming/ feature branch
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Chores