-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added connection_pool_size
configuration property (preview)
#276
Conversation
We have three things that are configurable from `requests` connection pooling perspective: * `pool_connections` - Number of urllib3 connection pools to cache before discarding the least recently used pool. Python requests default value is 10. This PR increases it to 20. * `pool_maxsize` - The maximum number of connections to save in the pool. Improves performance in multithreaded situations. For now, we're setting it to the same value as connection_pool_size. * `pool_block` - If pool_block is False, then more connections will are created, but not saved after the first use. Block when no free connections are available. urllib3 ensures that no more than pool_maxsize connections are used at a time. Prevents platform from flooding. By default, requests library doesn't block. We start with blocking. This PR introduces `connection_pool_size` configuration property, which sets `pool_connections` and `pool_maxsize` to the same value. If not set, it uses 20 instead of default 10.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on increasing connection pool size & blocking. However, I don't know about increasing # of connection pools.
databricks/sdk/core.py
Outdated
# Number of urllib3 connection pools to cache before discarding the least | ||
# recently used pool. Python requests default value is 10. | ||
pool_connections = cfg.connection_pool_size | ||
if pool_connections is None: | ||
pool_connections = 20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the need to increase the connection pool size to support concurrent requests. However, why do we need multiple connection pools? Looking at requests
, HttpAdapter's underlying PoolManager uses the same pool based on the <host,port,scheme> tuple, which for a single client will only be one endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also do think that we need just one pool, but somehow the default client was using the same value for both.
how about exposing two configuration properties, marking them experimental, and figure out the best defaults later? single SDK client will query only one host.
how should we name pool_maxsize
? connection_pool_max
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default uses the same value for both because it uses the same constant for each parameter as default values in its constructor:
def __init__(
self,
pool_connections=DEFAULT_POOLSIZE,
pool_maxsize=DEFAULT_POOLSIZE,
max_retries=DEFAULT_RETRIES,
pool_block=DEFAULT_POOLBLOCK,
):
DEFAULT_POOLSIZE
is set to 10.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit on naming, but otherwise the change seems fine.
databricks/sdk/core.py
Outdated
connection_pool_size: int = ConfigAttribute() | ||
connection_pool_max_size: int = ConfigAttribute() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's my suggestion for the naming:
max_connection_pools
: maximum number of connection pools used by the HttpAdapter.max_connections_per_pool
: maximum number of connections per pool used by the HttpAdapter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. let's get it shipped
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #276 +/- ##
==========================================
+ Coverage 52.55% 52.57% +0.02%
==========================================
Files 33 33
Lines 19823 19833 +10
==========================================
+ Hits 10418 10428 +10
Misses 9405 9405
☔ View full report in Codecov by Sentry. |
* Added `connection_pool_size` configuration property (preview) ([#276](#276)). * Fixed OAuth M2M corner case in `WorkspaceClient` where `DATABRICKS_ACCOUNT_ID` is present in the environment ([#273](#273)). API Changes: * Changed `create()` method for [a.account_storage_credentials](https://databricks-sdk-py.readthedocs.io/en/latest/account/account_storage_credentials.html) account-level service to return `databricks.sdk.service.catalog.AccountsStorageCredentialInfo` dataclass. * Changed `get()` method for [a.account_storage_credentials](https://databricks-sdk-py.readthedocs.io/en/latest/account/account_storage_credentials.html) account-level service to return `databricks.sdk.service.catalog.AccountsStorageCredentialInfo` dataclass. * Changed `update()` method for [a.account_storage_credentials](https://databricks-sdk-py.readthedocs.io/en/latest/account/account_storage_credentials.html) account-level service to return `databricks.sdk.service.catalog.AccountsStorageCredentialInfo` dataclass. * Changed `create()` method for [w.connections](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/connections.html) workspace-level service with new required argument order. * Changed `update()` method for [w.connections](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/connections.html) workspace-level service with new required argument order. * Removed `options_kvpairs` field for `databricks.sdk.service.catalog.ConnectionInfo`. * Removed `properties_kvpairs` field for `databricks.sdk.service.catalog.ConnectionInfo`. * Added `options` field for `databricks.sdk.service.catalog.ConnectionInfo`. * Added `properties` field for `databricks.sdk.service.catalog.ConnectionInfo`. * Added `provisioning_state` field for `databricks.sdk.service.catalog.ConnectionInfo`. * Added `securable_kind` field for `databricks.sdk.service.catalog.ConnectionInfo`. * Added `securable_type` field for `databricks.sdk.service.catalog.ConnectionInfo`. * Removed `options_kvpairs` field for `databricks.sdk.service.catalog.CreateConnection`. * Removed `properties_kvpairs` field for `databricks.sdk.service.catalog.CreateConnection`. * Added `options` field for `databricks.sdk.service.catalog.CreateConnection`. * Added `properties` field for `databricks.sdk.service.catalog.CreateConnection`. * Changed `algorithm` field for `databricks.sdk.service.catalog.SseEncryptionDetails` to no longer be required. * Removed `options_kvpairs` field for `databricks.sdk.service.catalog.UpdateConnection`. * Added `options` field for `databricks.sdk.service.catalog.UpdateConnection`. * Added `databricks.sdk.service.catalog.AccountsStorageCredentialInfo` dataclass. * Added `databricks.sdk.service.catalog.ConnectionInfoSecurableKind` dataclass. * Added `databricks.sdk.service.catalog.ProvisioningState` dataclass. * Added `data_security_mode` field for `databricks.sdk.service.compute.CreateCluster`. * Added `docker_image` field for `databricks.sdk.service.compute.CreateCluster`. * Added `single_user_name` field for `databricks.sdk.service.compute.CreateCluster`. * Removed `schema` field for `databricks.sdk.service.iam.PartialUpdate`. * Added `schemas` field for `databricks.sdk.service.iam.PartialUpdate`. OpenAPI SHA: 1e3533f94335f0e6c5d9262bc1fea95b3ddcb0e1, Date: 2023-08-11
* Added `connection_pool_size` configuration property (preview) ([#276](#276)). * Fixed OAuth M2M corner case in `WorkspaceClient` where `DATABRICKS_ACCOUNT_ID` is present in the environment ([#273](#273)). API Changes: * Changed `create()` method for [a.account_storage_credentials](https://databricks-sdk-py.readthedocs.io/en/latest/account/account_storage_credentials.html) account-level service to return `databricks.sdk.service.catalog.AccountsStorageCredentialInfo` dataclass. * Changed `get()` method for [a.account_storage_credentials](https://databricks-sdk-py.readthedocs.io/en/latest/account/account_storage_credentials.html) account-level service to return `databricks.sdk.service.catalog.AccountsStorageCredentialInfo` dataclass. * Changed `update()` method for [a.account_storage_credentials](https://databricks-sdk-py.readthedocs.io/en/latest/account/account_storage_credentials.html) account-level service to return `databricks.sdk.service.catalog.AccountsStorageCredentialInfo` dataclass. * Changed `create()` method for [w.connections](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/connections.html) workspace-level service with new required argument order. * Changed `update()` method for [w.connections](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/connections.html) workspace-level service with new required argument order. * Removed `options_kvpairs` field for `databricks.sdk.service.catalog.ConnectionInfo`. * Removed `properties_kvpairs` field for `databricks.sdk.service.catalog.ConnectionInfo`. * Added `options` field for `databricks.sdk.service.catalog.ConnectionInfo`. * Added `properties` field for `databricks.sdk.service.catalog.ConnectionInfo`. * Added `provisioning_state` field for `databricks.sdk.service.catalog.ConnectionInfo`. * Added `securable_kind` field for `databricks.sdk.service.catalog.ConnectionInfo`. * Added `securable_type` field for `databricks.sdk.service.catalog.ConnectionInfo`. * Removed `options_kvpairs` field for `databricks.sdk.service.catalog.CreateConnection`. * Removed `properties_kvpairs` field for `databricks.sdk.service.catalog.CreateConnection`. * Added `options` field for `databricks.sdk.service.catalog.CreateConnection`. * Added `properties` field for `databricks.sdk.service.catalog.CreateConnection`. * Changed `algorithm` field for `databricks.sdk.service.catalog.SseEncryptionDetails` to no longer be required. * Removed `options_kvpairs` field for `databricks.sdk.service.catalog.UpdateConnection`. * Added `options` field for `databricks.sdk.service.catalog.UpdateConnection`. * Added `databricks.sdk.service.catalog.AccountsStorageCredentialInfo` dataclass. * Added `databricks.sdk.service.catalog.ConnectionInfoSecurableKind` dataclass. * Added `databricks.sdk.service.catalog.ProvisioningState` dataclass. * Added `data_security_mode` field for `databricks.sdk.service.compute.CreateCluster`. * Added `docker_image` field for `databricks.sdk.service.compute.CreateCluster`. * Added `single_user_name` field for `databricks.sdk.service.compute.CreateCluster`. * Removed `schema` field for `databricks.sdk.service.iam.PartialUpdate`. * Added `schemas` field for `databricks.sdk.service.iam.PartialUpdate`. OpenAPI SHA: 1e3533f94335f0e6c5d9262bc1fea95b3ddcb0e1, Date: 2023-08-11
We have three things that are configurable from
requests
connection pooling perspective:pool_connections
- Number of urllib3 connection pools to cache before discarding the least recently used pool. Python requests default value is 10. This PR increases it to 20.pool_maxsize
- The maximum number of connections to save in the pool. Improves performance in multithreaded situations. For now, we're setting it to the same value as connection_pool_size.pool_block
- If pool_block is False, then more connections will are created, but not saved after the first use. Block when no free connections are available. urllib3 ensures that no more than pool_maxsize connections are used at a time. Prevents platform from flooding. By default, requests library doesn't block. We start with blocking.This PR introduces two new configuration properties:
connection_pool_size
configuration property, which setspool_connections
. Defaults to 20.connection_pool_max_size
, which setspool_maxsize
. Defaults to the same value asconnection_pool_size
.