Add metrics emitted from BulkExecutor responsible for dynamic batch size tuning. (#43716)

* Comment out failing tests.

* Adding javadoc.

* Adding targetMaxMicroBatchSize as client telemetry metric.

* Remove unnecessary public API from `CosmosBatchResponse`.

* Added javadoc and modified CHANGELOG.md

* Added javadoc.

* Fix tests.

* Document new metrics in Metrics.md
jeet1995 authored Jan 9, 2025
1 parent e948b62 commit 71f176d
Showing 11 changed files with 819 additions and 134 deletions.

9 changes: 9 additions & 0 deletions sdk/cosmos/azure-cosmos/CHANGELOG.md
@@ -13,6 +13,15 @@
* Added support to allow changing http2 max connection pool size with system property `COSMOS.HTTP2_MAX_CONNECTION_POOL_SIZE` and system variable `COSMOS_HTTP2_MAX_CONNECTION_POOL_SIZE`. - [PR 42947](https://github.com/Azure/azure-sdk-for-java/pull/42947)
* Added support to allow changing http2 max connection pool size with system property `COSMOS.HTTP2_MIN_CONNECTION_POOL_SIZE` and system variable `COSMOS_HTTP2_MIN_CONNECTION_POOL_SIZE`. - [PR 42947](https://github.com/Azure/azure-sdk-for-java/pull/42947)
* Added options to fine-tune settings for bulk operations. - [PR 43509](https://github.com/Azure/azure-sdk-for-java/pull/43509)
* Added the following metrics. - See [PR 43716](https://github.com/Azure/azure-sdk-for-java/pull/43716)
  * `cosmos.client.req.gw.bulkOpCountPerEvaluation`
  * `cosmos.client.req.gw.bulkOpRetriedCountPerEvaluation`
  * `cosmos.client.req.gw.bulkGlobalOpCount`
  * `cosmos.client.req.gw.bulkTargetMaxMicroBatchSize`
  * `cosmos.client.req.rntbd.bulkOpCountPerEvaluation`
  * `cosmos.client.req.rntbd.bulkOpRetriedCountPerEvaluation`
  * `cosmos.client.req.rntbd.bulkGlobalOpCount`
  * `cosmos.client.req.rntbd.bulkTargetMaxMicroBatchSize`
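
The eight metric names in this changelog entry are the cross product of two transport prefixes (`gw`, `rntbd`) and four suffixes. A minimal sketch of a helper that builds and recognizes them, for example when filtering meters in a dashboard or a test, is shown here. The class `BulkMetricNames` and its methods are hypothetical and not part of the SDK; only the metric name strings come from this commit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of azure-cosmos. Only the name strings are taken from the changelog.
public final class BulkMetricNames {
    // Transport prefixes: gateway and direct TCP (RNTBD).
    private static final String[] PREFIXES = {
        "cosmos.client.req.gw",
        "cosmos.client.req.rntbd"
    };

    // Suffixes of the metrics added in PR 43716.
    private static final String[] SUFFIXES = {
        "bulkOpCountPerEvaluation",
        "bulkOpRetriedCountPerEvaluation",
        "bulkGlobalOpCount",
        "bulkTargetMaxMicroBatchSize"
    };

    private BulkMetricNames() {
    }

    // Returns all eight names, gateway names first, in changelog order.
    public static List<String> all() {
        List<String> names = new ArrayList<>();
        for (String prefix : PREFIXES) {
            for (String suffix : SUFFIXES) {
                names.add(prefix + "." + suffix);
            }
        }
        return names;
    }

    // True only for an exact match against one of the eight names.
    public static boolean isBulkTuningMetric(String meterName) {
        return all().contains(meterName);
    }

    public static void main(String[] args) {
        all().forEach(System.out::println);
    }
}
```

Exact matching is used instead of a `bulk` substring check so that unrelated meters are not picked up by accident.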

### 4.65.0 (2024-11-19)

52 changes: 30 additions & 22 deletions sdk/cosmos/azure-cosmos/docs/Metrics.md
@@ -113,31 +113,39 @@ The micrometer.io documentation has a list with samples on how to create a `Mete

### Metrics for requests to the Cosmos DB Gateway endpoint

| Name | Unit | Default Percentiles | Description |
|--------------------------------------|-------------|------------------------|------------------------------------------------------------|
| cosmos.client.req.gw.requests | # requests | None | Number of requests |
| cosmos.client.req.gw.latency | duration | 95th, 99th + histogram | End-to-end duration spent for processing the request |
| cosmos.client.req.gw.timeline.xxx | duration | 95th, 99th + histogram | Duration spent in different stages of the request pipeline |
| cosmos.client.req.gw.actualItemCount | # | None | For feed operations (query, readAll, readMany, change feed) and batch operations this meter capture the actual item count in responses from the service |
| cosmos.client.req.reqPayloadSize | bytes | None | The request payload size in bytes |
| cosmos.client.req.rspPayloadSize | bytes | None | The response payload size in bytes |
| Name | Unit | Default Percentiles | Description |
|------------------------------------------------------|------------|------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
| cosmos.client.req.gw.requests | # requests | None | Number of requests |
| cosmos.client.req.gw.latency | duration | 95th, 99th + histogram | End-to-end duration spent for processing the request |
| cosmos.client.req.gw.timeline.xxx | duration | 95th, 99th + histogram | Duration spent in different stages of the request pipeline |
| cosmos.client.req.gw.actualItemCount | # | None | For feed operations (query, readAll, readMany, change feed) and batch operations this meter capture the actual item count in responses from the service |
| cosmos.client.req.reqPayloadSize | bytes | None | The request payload size in bytes |
| cosmos.client.req.rspPayloadSize | bytes | None | The response payload size in bytes |
| cosmos.client.req.gw.bulkOpCountPerEvaluation | # | None | Batch operation count (executed as part of bulk operation) per batch size evaluation cycle |
| cosmos.client.req.gw.bulkOpRetriedCountPerEvaluation | # | None | Batch operation retried count (executed as part of bulk operation) per batch size evaluation cycle |
| cosmos.client.req.gw.bulkGlobalOpCount | # | None | Overall Batch operation count (executed as part of bulk operation) per physical partition |
| cosmos.client.req.gw.bulkTargetMaxMicroBatchSize | # | None | Target max batch size for Batch operation executed as part of bulk operation |

### Metrics for communication with the Cosmos DB backend replicas via direct TCP (aka RNTBD)

| Name | Unit | Percentiles | Description |
|----------------------------------------------------------|-------------|------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| cosmos.client.req.rntbd.requests | # requests | None | Number of requests |
| cosmos.client.req.rntbd.latency | duration | 95th, 99th + histogram | End-to-end duration spent for processing the request |
| cosmos.client.req.rntbd.backendLatency | duration | 95th, 99th + histogram | Duration spent for processing the request in the Cosmos DB service endpoint (self-attested by backend) |
| cosmos.client.req.rntbd.timeline.xxx | duration | 95th, 99th + histogram | Duration spent in different stages of the request pipeline |
| cosmos.client.req.rntbd.actualItemCount | # | None | For feed operations (query, readAll, readMany, change feed) and batch operations this meter capture the actual item count in responses from the service |
| cosmos.client.req.reqPayloadSize | bytes | None | The request payload size in bytes |
| cosmos.client.req.rspPayloadSize | bytes | None | The response payload size in bytes |
| cosmos.client.req.rntbd.addressResolution.requests | # requests | None | Number of physical address resolution requests of replica for a certain partition |
| cosmos.client.req.rntbd.addressResolution.latency | duration | 95th, 99th + histogram | Duration spent for resolving physical addresses of replica for a certain partition |
| cosmos.client.req.rntbd.stats.endpoint.acquiredChannels | # | None | Number of actively used TCP connections per Cosmos DB service endpoint |
| cosmos.client.req.rntbd.stats.endpoint.availableChannels | # | None | Number of established TCP connections per Cosmos DB service endpoint that are not actively used. The total number of established connections would be availableChannels + acquiredChannels. |
| cosmos.client.req.rntbd.stats.endpoint.inflightRequests | # | 95th, 99th + histogram | Number of concurrently processed requests per Cosmos DB service endpoint |
| Name | Unit | Percentiles | Description |
|----------------------------------------------------------|------------|------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| cosmos.client.req.rntbd.requests | # requests | None | Number of requests |
| cosmos.client.req.rntbd.latency | duration | 95th, 99th + histogram | End-to-end duration spent for processing the request |
| cosmos.client.req.rntbd.backendLatency | duration | 95th, 99th + histogram | Duration spent for processing the request in the Cosmos DB service endpoint (self-attested by backend) |
| cosmos.client.req.rntbd.timeline.xxx | duration | 95th, 99th + histogram | Duration spent in different stages of the request pipeline |
| cosmos.client.req.rntbd.actualItemCount | # | None | For feed operations (query, readAll, readMany, change feed) and batch operations this meter capture the actual item count in responses from the service |
| cosmos.client.req.reqPayloadSize | bytes | None | The request payload size in bytes |
| cosmos.client.req.rspPayloadSize | bytes | None | The response payload size in bytes |
| cosmos.client.req.rntbd.addressResolution.requests | # requests | None | Number of physical address resolution requests of replica for a certain partition |
| cosmos.client.req.rntbd.addressResolution.latency | duration | 95th, 99th + histogram | Duration spent for resolving physical addresses of replica for a certain partition |
| cosmos.client.req.rntbd.stats.endpoint.acquiredChannels | # | None | Number of actively used TCP connections per Cosmos DB service endpoint |
| cosmos.client.req.rntbd.stats.endpoint.availableChannels | # | None | Number of established TCP connections per Cosmos DB service endpoint that are not actively used. The total number of established connections would be availableChannels + acquiredChannels. |
| cosmos.client.req.rntbd.stats.endpoint.inflightRequests | # | 95th, 99th + histogram | Number of concurrently processed requests per Cosmos DB service endpoint |
| cosmos.client.req.rntbd.bulkOpCountPerEvaluation | # | None | Batch operation count (executed as part of bulk operation) per batch size evaluation cycle |
| cosmos.client.req.rntbd.bulkOpRetriedCountPerEvaluation | # | None | Batch operation retried count (executed as part of bulk operation) per batch size evaluation cycle |
| cosmos.client.req.rntbd.bulkGlobalOpCount | # | None | Overall Batch operation count (executed as part of bulk operation) per physical partition |
| cosmos.client.req.rntbd.bulkTargetMaxMicroBatchSize | # | None | Target max batch size for Batch operation executed as part of bulk operation |
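
One way to read `bulkOpCountPerEvaluation` together with `bulkOpRetriedCountPerEvaluation` is as a rough retry rate per evaluation cycle. The sketch below assumes both values were sampled for the same cycle and the same physical partition, and that dividing one by the other is a meaningful signal. This commit does not state whether retried operations are also included in the operation count, so the result is an approximation. The class and method names are hypothetical and not part of the SDK.

```java
// Hypothetical helper, not part of azure-cosmos.
public final class BulkRetryRatio {
    private BulkRetryRatio() {
    }

    // opCount:      sampled value of ...bulkOpCountPerEvaluation
    // retriedCount: sampled value of ...bulkOpRetriedCountPerEvaluation
    // Returns retriedCount / opCount, or 0.0 when no operations were recorded.
    public static double retryRatio(long opCount, long retriedCount) {
        if (opCount < 0 || retriedCount < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        if (opCount == 0) {
            return 0.0; // avoid division by zero for an idle cycle
        }
        return (double) retriedCount / opCount;
    }

    public static void main(String[] args) {
        // Example values only; not measurements from a real workload.
        System.out.println(retryRatio(200, 50));
    }
}
```

A sustained high ratio alongside a shrinking `bulkTargetMaxMicroBatchSize` would be consistent with the executor reducing batch size in response to retries, but that interpretation is an assumption and not something this diff documents.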

### Metrics for RNTBD service endpoints (across operations, no operation-level tags)
