
Add metrics emitted from BulkExecutor responsible for dynamic batch size tuning. #43716

Merged
merged 10 commits into from
Jan 9, 2025


9 changes: 9 additions & 0 deletions sdk/cosmos/azure-cosmos/CHANGELOG.md
@@ -13,6 +13,15 @@
* Added support to allow changing http2 max connection pool size with system property `COSMOS.HTTP2_MAX_CONNECTION_POOL_SIZE` and system variable `COSMOS_HTTP2_MAX_CONNECTION_POOL_SIZE`. - [PR 42947](https://github.com/Azure/azure-sdk-for-java/pull/42947)
* Added support to allow changing http2 min connection pool size with system property `COSMOS.HTTP2_MIN_CONNECTION_POOL_SIZE` and system variable `COSMOS_HTTP2_MIN_CONNECTION_POOL_SIZE`. - [PR 42947](https://github.com/Azure/azure-sdk-for-java/pull/42947)
* Added options to fine-tune settings for bulk operations. - [PR 43509](https://github.com/Azure/azure-sdk-for-java/pull/43509)
* Added the following metrics. - See [PR 43716](https://github.com/Azure/azure-sdk-for-java/pull/43716)
  * `cosmos.client.req.gw.bulkOpCountPerEvaluation`
  * `cosmos.client.req.gw.bulkOpRetriedCountPerEvaluation`
  * `cosmos.client.req.gw.bulkGlobalOpCount`
  * `cosmos.client.req.gw.bulkTargetMaxMicroBatchSize`
  * `cosmos.client.req.rntbd.bulkOpCountPerEvaluation`
  * `cosmos.client.req.rntbd.bulkOpRetriedCountPerEvaluation`
  * `cosmos.client.req.rntbd.bulkGlobalOpCount`
  * `cosmos.client.req.rntbd.bulkTargetMaxMicroBatchSize`

### 4.65.0 (2024-11-19)

52 changes: 30 additions & 22 deletions sdk/cosmos/azure-cosmos/docs/Metrics.md
@@ -113,31 +113,39 @@ The micrometer.io documentation has a list with samples on how to create a `Mete

### Metrics for requests to the Cosmos DB Gateway endpoint

| Name | Unit | Default Percentiles | Description |
|--------------------------------------|-------------|------------------------|------------------------------------------------------------|
| cosmos.client.req.gw.requests | # requests | None | Number of requests |
| cosmos.client.req.gw.latency | duration | 95th, 99th + histogram | End-to-end duration spent for processing the request |
| cosmos.client.req.gw.timeline.xxx | duration | 95th, 99th + histogram | Duration spent in different stages of the request pipeline |
| cosmos.client.req.gw.actualItemCount | # | None | For feed operations (query, readAll, readMany, change feed) and batch operations this meter captures the actual item count in responses from the service |
| cosmos.client.req.reqPayloadSize | bytes | None | The request payload size in bytes |
| cosmos.client.req.rspPayloadSize | bytes | None | The response payload size in bytes |
| Name | Unit | Default Percentiles | Description |
|------------------------------------------------------|------------|------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
| cosmos.client.req.gw.requests | # requests | None | Number of requests |
| cosmos.client.req.gw.latency | duration | 95th, 99th + histogram | End-to-end duration spent for processing the request |
| cosmos.client.req.gw.timeline.xxx | duration | 95th, 99th + histogram | Duration spent in different stages of the request pipeline |
| cosmos.client.req.gw.actualItemCount | # | None | For feed operations (query, readAll, readMany, change feed) and batch operations this meter captures the actual item count in responses from the service |
| cosmos.client.req.reqPayloadSize | bytes | None | The request payload size in bytes |
| cosmos.client.req.rspPayloadSize | bytes | None | The response payload size in bytes |
| cosmos.client.req.gw.bulkOpCountPerEvaluation | # | None | Batch operation count (executed as part of bulk operation) per batch size evaluation cycle |
| cosmos.client.req.gw.bulkOpRetriedCountPerEvaluation | # | None | Batch operation retried count (executed as part of bulk operation) per batch size evaluation cycle |
| cosmos.client.req.gw.bulkGlobalOpCount | # | None | Overall Batch operation count (executed as part of bulk operation) per physical partition |
| cosmos.client.req.gw.bulkTargetMaxMicroBatchSize | # | None | Target max batch size for Batch operation executed as part of bulk operation |
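
The four bulk meters share one naming pattern across both endpoint types: `cosmos.client.req.<endpoint>.<suffix>`, with `gw` for the Gateway and `rntbd` for direct TCP. A minimal sketch of composing these names, for example to build a dashboard filter or an allow-list, could look like the following. The `BulkMeterNames` class and its `forEndpoint` method are hypothetical helpers for illustration and are not part of the SDK.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of azure-cosmos) that composes the bulk meter names. */
final class BulkMeterNames {
    // Suffixes taken from the metrics tables in Metrics.md.
    private static final String[] SUFFIXES = {
        "bulkOpCountPerEvaluation",
        "bulkOpRetriedCountPerEvaluation",
        "bulkGlobalOpCount",
        "bulkTargetMaxMicroBatchSize"
    };

    private BulkMeterNames() { }

    /** Returns the four bulk meter names for an endpoint type: "gw" or "rntbd". */
    static List<String> forEndpoint(String endpoint) {
        if (!"gw".equals(endpoint) && !"rntbd".equals(endpoint)) {
            throw new IllegalArgumentException("Unknown endpoint type: " + endpoint);
        }
        List<String> names = new ArrayList<>();
        for (String suffix : SUFFIXES) {
            names.add("cosmos.client.req." + endpoint + "." + suffix);
        }
        return names;
    }
}
```

Calling `BulkMeterNames.forEndpoint("rntbd")` yields the four `cosmos.client.req.rntbd.bulk*` names listed in the changelog entry.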

### Metrics for communication with the Cosmos DB backend replicas via direct TCP (aka RNTBD)

| Name | Unit | Percentiles | Description |
|----------------------------------------------------------|-------------|------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| cosmos.client.req.rntbd.requests | # requests | None | Number of requests |
| cosmos.client.req.rntbd.latency | duration | 95th, 99th + histogram | End-to-end duration spent for processing the request |
| cosmos.client.req.rntbd.backendLatency | duration | 95th, 99th + histogram | Duration spent for processing the request in the Cosmos DB service endpoint (self-attested by backend) |
| cosmos.client.req.rntbd.timeline.xxx | duration | 95th, 99th + histogram | Duration spent in different stages of the request pipeline |
| cosmos.client.req.rntbd.actualItemCount | # | None | For feed operations (query, readAll, readMany, change feed) and batch operations this meter captures the actual item count in responses from the service |
| cosmos.client.req.reqPayloadSize | bytes | None | The request payload size in bytes |
| cosmos.client.req.rspPayloadSize | bytes | None | The response payload size in bytes |
| cosmos.client.req.rntbd.addressResolution.requests | # requests | None | Number of physical address resolution requests of replica for a certain partition |
| cosmos.client.req.rntbd.addressResolution.latency | duration | 95th, 99th + histogram | Duration spent for resolving physical addresses of replica for a certain partition |
| cosmos.client.req.rntbd.stats.endpoint.acquiredChannels | # | None | Number of actively used TCP connections per Cosmos DB service endpoint |
| cosmos.client.req.rntbd.stats.endpoint.availableChannels | # | None | Number of established TCP connections per Cosmos DB service endpoint that are not actively used. The total number of established connections would be availableChannels + acquiredChannels. |
| cosmos.client.req.rntbd.stats.endpoint.inflightRequests | # | 95th, 99th + histogram | Number of concurrently processed requests per Cosmos DB service endpoint |
| Name | Unit | Percentiles | Description |
|----------------------------------------------------------|------------|------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| cosmos.client.req.rntbd.requests | # requests | None | Number of requests |
| cosmos.client.req.rntbd.latency | duration | 95th, 99th + histogram | End-to-end duration spent for processing the request |
| cosmos.client.req.rntbd.backendLatency | duration | 95th, 99th + histogram | Duration spent for processing the request in the Cosmos DB service endpoint (self-attested by backend) |
| cosmos.client.req.rntbd.timeline.xxx | duration | 95th, 99th + histogram | Duration spent in different stages of the request pipeline |
| cosmos.client.req.rntbd.actualItemCount | # | None | For feed operations (query, readAll, readMany, change feed) and batch operations this meter captures the actual item count in responses from the service |
| cosmos.client.req.reqPayloadSize | bytes | None | The request payload size in bytes |
| cosmos.client.req.rspPayloadSize | bytes | None | The response payload size in bytes |
| cosmos.client.req.rntbd.addressResolution.requests | # requests | None | Number of physical address resolution requests of replica for a certain partition |
| cosmos.client.req.rntbd.addressResolution.latency | duration | 95th, 99th + histogram | Duration spent for resolving physical addresses of replica for a certain partition |
| cosmos.client.req.rntbd.stats.endpoint.acquiredChannels | # | None | Number of actively used TCP connections per Cosmos DB service endpoint |
| cosmos.client.req.rntbd.stats.endpoint.availableChannels | # | None | Number of established TCP connections per Cosmos DB service endpoint that are not actively used. The total number of established connections would be availableChannels + acquiredChannels. |
| cosmos.client.req.rntbd.stats.endpoint.inflightRequests | # | 95th, 99th + histogram | Number of concurrently processed requests per Cosmos DB service endpoint |
| cosmos.client.req.rntbd.bulkOpCountPerEvaluation | # | None | Batch operation count (executed as part of bulk operation) per batch size evaluation cycle |
| cosmos.client.req.rntbd.bulkOpRetriedCountPerEvaluation | # | None | Batch operation retried count (executed as part of bulk operation) per batch size evaluation cycle |
| cosmos.client.req.rntbd.bulkGlobalOpCount | # | None | Overall Batch operation count (executed as part of bulk operation) per physical partition |
| cosmos.client.req.rntbd.bulkTargetMaxMicroBatchSize | # | None | Target max batch size for Batch operation executed as part of bulk operation |
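
One way to read the per-evaluation meters together: dividing `bulkOpRetriedCountPerEvaluation` by `bulkOpCountPerEvaluation` gives a retry rate for an evaluation cycle, which can then be tracked alongside `bulkTargetMaxMicroBatchSize`. The sketch below computes that rate and applies an illustrative adjustment rule. The thresholds, step sizes, and the `BulkTuningSketch` class are assumptions made for illustration and do not describe the SDK's actual tuning logic.

```java
/** Illustrative only: names, thresholds, and the adjustment rule are assumptions, not SDK behavior. */
final class BulkTuningSketch {
    private BulkTuningSketch() { }

    /** Fraction of operations retried in one evaluation cycle; 0.0 when no operations ran. */
    static double retryRate(long opCount, long retriedCount) {
        if (opCount < 0 || retriedCount < 0) {
            throw new IllegalArgumentException("Counts must be non-negative");
        }
        return opCount == 0 ? 0.0 : (double) retriedCount / opCount;
    }

    /**
     * Halves the batch size when the retry rate is high, grows it by about 10%
     * when the retry rate is low, and otherwise keeps it. The result is clamped
     * to the range [min, max].
     */
    static int nextTargetBatchSize(int current, double retryRate, int min, int max) {
        int next = current;
        if (retryRate > 0.10) {            // assumed upper threshold
            next = current / 2;            // back off multiplicatively
        } else if (retryRate < 0.01) {     // assumed lower threshold
            next = current + Math.max(1, current / 10); // grow gradually
        }
        return Math.max(min, Math.min(max, next));
    }
}
```

With these assumed thresholds, a cycle with 100 operations and 20 retries would halve a target of 100 to 50, while a cycle with no retries would raise a target of 50 to 55.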

### Metrics for RNTBD service endpoints (across operations, no operation-level tags)
