-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parallelize local memory arbitration #9649
Conversation
✅ Deploy Preview for meta-velox canceled.
|
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
}; | ||
|
||
// Invoked to check if the memory growth will exceed the memory pool's max | ||
// capacity limit or the arbitrator's node capacity limit. | ||
bool checkCapacityGrowth(const MemoryPool& pool, uint64_t targetBytes) const; | ||
bool checkCapacityGrowth(ArbitrationOperation* op) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of passing around the same object everywhere, shall we just pass this operation object to runLocalArbitration() and keep track of an instance of operation of currently running global one? It seems unnecessary to pass this stateful item around for global aribtration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can set the current running arbitration operation as a member of shared arbitrator object but there a quite a few utilities shared between local and global arbitration operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like the only shared one is checkFreeCapacity()?
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Left some comments. Also since this changed the arbitration workflow of SharedArbitrator, shall we also put detailed documentation on top of the SharedArbitrator class?
50e0ce7
to
ce4d9f0
Compare
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary: Parallelize the local memory arbitration execution. The workload flow of memory arbitration process changes as follows: First wait in the arbitration queue to serialize the memory arbitration request processing from the same query pool. Adds ArbitrationQueue data structure for this which contains the wait promises of the arbitration requests from the same query pool. The arbitration queue is protected by arbitrator lock. Second calls runLocalArbitration to run local arbitration which acquires the reader lock of arbitration lock (which is added by this pr for local/global arbitration execution control). Then it ensures the request memory pool is within the capacity limit (this might trigger spill to reclaim the used memory from the request pool itself), and tries to allocate free capacity from the arbitrator or reclaim the free memory from the other queries which is protected by arbitrator lock. Third if runLocalArbitration can't reclaim sufficient memory, then proceeds with runGlobalArbitration which acquires the writer lock of arbitration lock. Then it reclaims the free capacity from the arbitrator or the other queries. And at last reclaims the used memory from the other queries by spilling. ArbitrationOperation is added to contains the state of arbitration request processing to simplify the code implementation Reviewed By: tanjialiang, oerling Differential Revision: D56685200 Pulled By: xiaoxmeng
This pull request was exported from Phabricator. Differential Revision: D56685200 |
d2f1781
to
cef833d
Compare
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary: Parallelize the local memory arbitration execution. The workload flow of memory arbitration process changes as follows: First wait in the arbitration queue to serialize the memory arbitration request processing from the same query pool. Adds ArbitrationQueue data structure for this which contains the wait promises of the arbitration requests from the same query pool. The arbitration queue is protected by arbitrator lock. Second calls runLocalArbitration to run local arbitration which acquires the reader lock of arbitration lock (which is added by this pr for local/global arbitration execution control). Then it ensures the request memory pool is within the capacity limit (this might trigger spill to reclaim the used memory from the request pool itself), and tries to allocate free capacity from the arbitrator or reclaim the free memory from the other queries which is protected by arbitrator lock. Third if runLocalArbitration can't reclaim sufficient memory, then proceeds with runGlobalArbitration which acquires the writer lock of arbitration lock. Then it reclaims the free capacity from the arbitrator or the other queries. And at last reclaims the used memory from the other queries by spilling. ArbitrationOperation is added to contains the state of arbitration request processing to simplify the code implementation. This PR also adds an option to disable global arbitration and only do local arbitration which can always run in parallel. With this option, for 2hrs stress test (1000 query concurrency at coordinator) with Meta internal production workloads replay, the averaged query execution time has been reduced by ~30% and cpu time is kept the same. The memory arbitration wall time for query without triggering spilled is only 9 mins in total. Reviewed By: tanjialiang, oerling Differential Revision: D56685200 Pulled By: xiaoxmeng
This pull request was exported from Phabricator. Differential Revision: D56685200 |
Summary: Parallelize the local memory arbitration execution. The workload flow of memory arbitration process changes as follows: First wait in the arbitration queue to serialize the memory arbitration request processing from the same query pool. Adds ArbitrationQueue data structure for this which contains the wait promises of the arbitration requests from the same query pool. The arbitration queue is protected by arbitrator lock. Second calls runLocalArbitration to run local arbitration which acquires the reader lock of arbitration lock (which is added by this pr for local/global arbitration execution control). Then it ensures the request memory pool is within the capacity limit (this might trigger spill to reclaim the used memory from the request pool itself), and tries to allocate free capacity from the arbitrator or reclaim the free memory from the other queries which is protected by arbitrator lock. Third if runLocalArbitration can't reclaim sufficient memory, then proceeds with runGlobalArbitration which acquires the writer lock of arbitration lock. Then it reclaims the free capacity from the arbitrator or the other queries. And at last reclaims the used memory from the other queries by spilling. ArbitrationOperation is added to contains the state of arbitration request processing to simplify the code implementation. This PR also adds an option to disable global arbitration and only do local arbitration which can always run in parallel. With this option, for 2hrs stress test (1000 query concurrency at coordinator) with Meta internal production workloads replay, the averaged query execution time has been reduced by ~30% and cpu time is kept the same. The memory arbitration wall time for query without triggering spilled is only 9 mins in total. Reviewed By: tanjialiang, oerling Differential Revision: D56685200 Pulled By: xiaoxmeng
This pull request was exported from Phabricator. Differential Revision: D56685200 |
@xiaoxmeng merged this pull request in 5911129. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
Summary: Parallelize the local memory arbitration execution. The workload flow of memory arbitration process changes as follows: First wait in the arbitration queue to serialize the memory arbitration request processing from the same query pool. Adds ArbitrationQueue data structure for this which contains the wait promises of the arbitration requests from the same query pool. The arbitration queue is protected by arbitrator lock. Second calls runLocalArbitration to run local arbitration which acquires the reader lock of arbitration lock (which is added by this pr for local/global arbitration execution control). Then it ensures the request memory pool is within the capacity limit (this might trigger spill to reclaim the used memory from the request pool itself), and tries to allocate free capacity from the arbitrator or reclaim the free memory from the other queries which is protected by arbitrator lock. Third if runLocalArbitration can't reclaim sufficient memory, then proceeds with runGlobalArbitration which acquires the writer lock of arbitration lock. Then it reclaims the free capacity from the arbitrator or the other queries. And at last reclaims the used memory from the other queries by spilling. ArbitrationOperation is added to contains the state of arbitration request processing to simplify the code implementation. This PR also adds an option to disable global arbitration and only do local arbitration which can always run in parallel. With this option, for 2hrs stress test (1000 query concurrency at coordinator) with Meta internal production workloads replay, the averaged query execution time has been reduced by ~30% and cpu time is kept the same. The memory arbitration wall time for query without triggering spilled is only 9 mins in total. Pull Request resolved: facebookincubator#9649 Reviewed By: tanjialiang, oerling Differential Revision: D56685200 Pulled By: xiaoxmeng fbshipit-source-id: f8db71e4cae05f24f913464c37c5bc1e6528083b
Summary: Parallelize the local memory arbitration execution. The workload flow of memory arbitration process changes as follows: First wait in the arbitration queue to serialize the memory arbitration request processing from the same query pool. Adds ArbitrationQueue data structure for this which contains the wait promises of the arbitration requests from the same query pool. The arbitration queue is protected by arbitrator lock. Second calls runLocalArbitration to run local arbitration which acquires the reader lock of arbitration lock (which is added by this pr for local/global arbitration execution control). Then it ensures the request memory pool is within the capacity limit (this might trigger spill to reclaim the used memory from the request pool itself), and tries to allocate free capacity from the arbitrator or reclaim the free memory from the other queries which is protected by arbitrator lock. Third if runLocalArbitration can't reclaim sufficient memory, then proceeds with runGlobalArbitration which acquires the writer lock of arbitration lock. Then it reclaims the free capacity from the arbitrator or the other queries. And at last reclaims the used memory from the other queries by spilling. ArbitrationOperation is added to contains the state of arbitration request processing to simplify the code implementation. This PR also adds an option to disable global arbitration and only do local arbitration which can always run in parallel. With this option, for 2hrs stress test (1000 query concurrency at coordinator) with Meta internal production workloads replay, the averaged query execution time has been reduced by ~30% and cpu time is kept the same. The memory arbitration wall time for query without triggering spilled is only 9 mins in total. Pull Request resolved: facebookincubator#9649 Reviewed By: tanjialiang, oerling Differential Revision: D56685200 Pulled By: xiaoxmeng fbshipit-source-id: f8db71e4cae05f24f913464c37c5bc1e6528083b
Parallelize the local memory arbitration execution. The workload flow of memory arbitration process
changes as follows:
First wait in the arbitration queue to serialize the memory arbitration request processing from the same
query pool. Adds ArbitrationQueue data structure for this which contains the wait promises of the
arbitration requests from the same query pool. The arbitration queue is protected by arbitrator lock.
Second calls runLocalArbitration to run local arbitration which acquires the reader lock of arbitration
lock (which is added by this pr for local/global arbitration execution control). Then it ensures the request
memory pool is within the capacity limit (this might trigger spill to reclaim the used memory from the
request pool itself), and tries to allocate free capacity from the arbitrator or reclaim the free memory from
the other queries which is protected by arbitrator lock.
Third if runLocalArbitration can't reclaim sufficient memory, then proceeds with runGlobalArbitration which
acquires the writer lock of arbitration lock. Then it reclaims the free capacity from the arbitrator or the other
queries. And at last reclaims the used memory from the other queries by spilling.
ArbitrationOperation is added to contains the state of arbitration request processing to simplify the code
implementation.
This PR also adds an option to disable global arbitration and only do local arbitration which can always
run in parallel. With this option, for 2hrs stress test (1000 query concurrency at coordinator) with Meta
internal production workloads replay, the averaged query execution time has been reduced by ~30% and cpu
time is kept the same. The memory arbitration wall time for query without triggering spilled is only 9 mins in
total.