Expose connection_pool_maxsize on Index and add docstrings #415
Problem
To explore the impact on performance, I want to expose a configuration kwarg for `connection_pool_maxsize` on `Index`.

Solution
This `connection_pool_maxsize` value is passed into `urllib3.PoolManager` as `maxsize`. This parameter controls how many connections are cached for a given host. If we use a large number of threads to increase parallelism but `maxsize` is relatively small, we incur unnecessary overhead establishing and discarding the connections beyond the `maxsize` that can be cached.

By default, `connection_pool_maxsize` is set to `multiprocessing.cpu_count() * 5`. In Google Colab the CPU count is only 2, so the pool caches at most 10 connections per host, which is fairly limiting.

Usage
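The connection churn described in the Solution section can be illustrated with a small, stdlib-only model. This is a hypothetical sketch, not the client's or urllib3's code: `default_pool_maxsize` and `simulate_connection_churn` are illustrative names. The model assumes that every thread needs its own connection in each round of requests and that the pool behaves like a non-blocking urllib3 pool: a new connection is opened when no idle one is available, and a returned connection is closed once `maxsize` connections are already cached.

```python
import multiprocessing
from typing import Optional


def default_pool_maxsize(cpu_count: Optional[int] = None) -> int:
    """Default described in this PR: multiprocessing.cpu_count() * 5."""
    if cpu_count is None:
        cpu_count = multiprocessing.cpu_count()
    return cpu_count * 5


def simulate_connection_churn(threads: int, maxsize: int, rounds: int) -> dict:
    """Hypothetical toy model of a non-blocking per-host connection pool.

    In each round, `threads` requests run concurrently and each one needs
    its own connection. Idle cached connections are reused first; any
    shortfall is covered by opening new connections. When the requests
    finish, at most `maxsize` connections are kept and the rest are closed.
    """
    cached = 0      # idle connections currently held by the pool
    opened = 0      # total connections established (TCP/TLS setup cost)
    discarded = 0   # total connections closed because the pool was full
    for _ in range(rounds):
        reused = min(cached, threads)
        new = threads - reused
        opened += new
        cached -= reused
        in_use = reused + new  # always equals `threads`
        # Return connections; anything beyond `maxsize` is discarded.
        kept = min(in_use, maxsize - cached)
        discarded += in_use - kept
        cached += kept
    return {"opened": opened, "discarded": discarded, "cached": cached}


if __name__ == "__main__":
    colab_default = default_pool_maxsize(2)  # 2 CPUs * 5 = 10
    print(simulate_connection_churn(threads=50, maxsize=colab_default, rounds=100))
    print(simulate_connection_churn(threads=50, maxsize=50, rounds=100))
```

Under these assumptions, 50 threads against the Colab default of 10 open 4010 connections over 100 rounds (discarding 4000 of them), while a `maxsize` of 50 opens only 50 and reuses them throughout. Real request timing is not this uniform, so the model only shows the direction of the effect, not its size.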
Type of Change
Test Plan
I ran some local performance tests and saw that this setting does have an impact on performance.