Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelize local memory arbitration #9649

Closed
wants to merge 1 commit into from

Conversation

xiaoxmeng
Copy link
Contributor

@xiaoxmeng xiaoxmeng commented Apr 28, 2024

Parallelize the local memory arbitration execution. The workload flow of memory arbitration process
changes as follows:
First wait in the arbitration queue to serialize the memory arbitration request processing from the same
query pool. Adds ArbitrationQueue data structure for this which contains the wait promises of the
arbitration requests from the same query pool. The arbitration queue is protected by arbitrator lock.

Second calls runLocalArbitration to run local arbitration which acquires the reader lock of arbitration
lock (which is added by this pr for local/global arbitration execution control). Then it ensures the request
memory pool is within the capacity limit (this might trigger spill to reclaim the used memory from the
request pool itself), and tries to allocate free capacity from the arbitrator or reclaim the free memory from
the other queries which is protected by arbitrator lock.

Third if runLocalArbitration can't reclaim sufficient memory, then proceeds with runGlobalArbitration which
acquires the writer lock of arbitration lock. Then it reclaims the free capacity from the arbitrator or the other
queries. And at last reclaims the used memory from the other queries by spilling.

ArbitrationOperation is added to contains the state of arbitration request processing to simplify the code
implementation.

This PR also adds an option to disable global arbitration and only do local arbitration which can always
run in parallel. With this option, for 2hrs stress test (1000 query concurrency at coordinator) with Meta
internal production workloads replay, the averaged query execution time has been reduced by ~30% and cpu
time is kept the same. The memory arbitration wall time for query without triggering spilled is only 9 mins in
total.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 28, 2024
Copy link

netlify bot commented Apr 28, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 65ce31d
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6639a98df0ce1300089312e5

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

};

// Invoked to check if the memory growth will exceed the memory pool's max
// capacity limit or the arbitrator's node capacity limit.
bool checkCapacityGrowth(const MemoryPool& pool, uint64_t targetBytes) const;
bool checkCapacityGrowth(ArbitrationOperation* op) const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of passing around the same object everywhere, shall we just pass this operation object to runLocalArbitration() and keep track of an instance of operation of currently running global one? It seems unnecessary to pass this stateful item around for global aribtration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can set the current running arbitration operation as a member of shared arbitrator object but there a quite a few utilities shared between local and global arbitration operations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like the only shared one is checkFreeCapacity()?

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@tanjialiang tanjialiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Left some comments. Also since this changed the arbitration workflow of SharedArbitrator, shall we also put detailed documentation on top of the SharedArbitrator class?

@xiaoxmeng xiaoxmeng force-pushed the par branch 4 times, most recently from 50e0ce7 to ce4d9f0 Compare May 2, 2024 23:33
@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

xiaoxmeng added a commit to xiaoxmeng/velox that referenced this pull request May 4, 2024
Summary:
Parallelize the local memory arbitration execution. The workload flow of memory arbitration process
changes as follows:
First wait in the arbitration queue to serialize the memory arbitration request processing from the same
query pool. Adds ArbitrationQueue data structure for this which contains the wait promises of the
arbitration requests from the same query pool. The arbitration queue is protected by arbitrator lock.

Second calls runLocalArbitration to run local arbitration which acquires the reader lock of arbitration
lock (which is added by this pr for local/global arbitration execution control). Then it ensures the request
memory pool is within the capacity limit (this might trigger spill to reclaim the used memory from the
request pool itself), and tries to allocate free capacity from the arbitrator or reclaim the free memory from
the other queries which is protected by arbitrator lock.

Third if runLocalArbitration can't reclaim sufficient memory, then proceeds with runGlobalArbitration which
acquires the writer lock of arbitration lock. Then it reclaims the free capacity from the arbitrator or the other
queries. And at last reclaims the used memory from the other queries by spilling.

ArbitrationOperation is added to contains the state of arbitration request processing to simplify the code
implementation


Reviewed By: tanjialiang, oerling

Differential Revision: D56685200

Pulled By: xiaoxmeng
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D56685200

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

xiaoxmeng added a commit to xiaoxmeng/velox that referenced this pull request May 7, 2024
Summary:
Parallelize the local memory arbitration execution. The workload flow of memory arbitration process
changes as follows:
First wait in the arbitration queue to serialize the memory arbitration request processing from the same
query pool. Adds ArbitrationQueue data structure for this which contains the wait promises of the
arbitration requests from the same query pool. The arbitration queue is protected by arbitrator lock.

Second calls runLocalArbitration to run local arbitration which acquires the reader lock of arbitration
lock (which is added by this pr for local/global arbitration execution control). Then it ensures the request
memory pool is within the capacity limit (this might trigger spill to reclaim the used memory from the
request pool itself), and tries to allocate free capacity from the arbitrator or reclaim the free memory from
the other queries which is protected by arbitrator lock.

Third if runLocalArbitration can't reclaim sufficient memory, then proceeds with runGlobalArbitration which
acquires the writer lock of arbitration lock. Then it reclaims the free capacity from the arbitrator or the other
queries. And at last reclaims the used memory from the other queries by spilling.

ArbitrationOperation is added to contains the state of arbitration request processing to simplify the code
implementation.

This PR also adds an option to disable global arbitration and only do local arbitration which can always
run in parallel. With this option, for 2hrs stress test (1000 query concurrency at coordinator) with Meta
internal production workloads replay, the averaged query execution time has been reduced by ~30% and cpu
time is kept the same. The memory arbitration wall time for query without triggering spilled is only 9 mins in
total.


Reviewed By: tanjialiang, oerling

Differential Revision: D56685200

Pulled By: xiaoxmeng
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D56685200

Summary:
Parallelize the local memory arbitration execution. The workload flow of memory arbitration process
changes as follows:
First wait in the arbitration queue to serialize the memory arbitration request processing from the same
query pool. Adds ArbitrationQueue data structure for this which contains the wait promises of the
arbitration requests from the same query pool. The arbitration queue is protected by arbitrator lock.

Second calls runLocalArbitration to run local arbitration which acquires the reader lock of arbitration
lock (which is added by this pr for local/global arbitration execution control). Then it ensures the request
memory pool is within the capacity limit (this might trigger spill to reclaim the used memory from the
request pool itself), and tries to allocate free capacity from the arbitrator or reclaim the free memory from
the other queries which is protected by arbitrator lock.

Third if runLocalArbitration can't reclaim sufficient memory, then proceeds with runGlobalArbitration which
acquires the writer lock of arbitration lock. Then it reclaims the free capacity from the arbitrator or the other
queries. And at last reclaims the used memory from the other queries by spilling.

ArbitrationOperation is added to contains the state of arbitration request processing to simplify the code
implementation.

This PR also adds an option to disable global arbitration and only do local arbitration which can always
run in parallel. With this option, for 2hrs stress test (1000 query concurrency at coordinator) with Meta
internal production workloads replay, the averaged query execution time has been reduced by ~30% and cpu
time is kept the same. The memory arbitration wall time for query without triggering spilled is only 9 mins in
total.


Reviewed By: tanjialiang, oerling

Differential Revision: D56685200

Pulled By: xiaoxmeng
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D56685200

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng merged this pull request in 5911129.

@xiaoxmeng xiaoxmeng deleted the par branch May 7, 2024 06:50
Copy link

Conbench analyzed the 1 benchmark run on commit 59111293.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this pull request May 8, 2024
Summary:
Parallelize the local memory arbitration execution. The workload flow of memory arbitration process
changes as follows:
First wait in the arbitration queue to serialize the memory arbitration request processing from the same
query pool. Adds ArbitrationQueue data structure for this which contains the wait promises of the
arbitration requests from the same query pool. The arbitration queue is protected by arbitrator lock.

Second calls runLocalArbitration to run local arbitration which acquires the reader lock of arbitration
lock (which is added by this pr for local/global arbitration execution control). Then it ensures the request
memory pool is within the capacity limit (this might trigger spill to reclaim the used memory from the
request pool itself), and tries to allocate free capacity from the arbitrator or reclaim the free memory from
the other queries which is protected by arbitrator lock.

Third if runLocalArbitration can't reclaim sufficient memory, then proceeds with runGlobalArbitration which
acquires the writer lock of arbitration lock. Then it reclaims the free capacity from the arbitrator or the other
queries. And at last reclaims the used memory from the other queries by spilling.

ArbitrationOperation is added to contains the state of arbitration request processing to simplify the code
implementation.

This PR also adds an option to disable global arbitration and only do local arbitration which can always
run in parallel. With this option, for 2hrs stress test (1000 query concurrency at coordinator) with Meta
internal production workloads replay, the averaged query execution time has been reduced by ~30% and cpu
time is kept the same. The memory arbitration wall time for query without triggering spilled is only 9 mins in
total.

Pull Request resolved: facebookincubator#9649

Reviewed By: tanjialiang, oerling

Differential Revision: D56685200

Pulled By: xiaoxmeng

fbshipit-source-id: f8db71e4cae05f24f913464c37c5bc1e6528083b
Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this pull request Jun 7, 2024
Summary:
Parallelize the local memory arbitration execution. The workload flow of memory arbitration process
changes as follows:
First wait in the arbitration queue to serialize the memory arbitration request processing from the same
query pool. Adds ArbitrationQueue data structure for this which contains the wait promises of the
arbitration requests from the same query pool. The arbitration queue is protected by arbitrator lock.

Second calls runLocalArbitration to run local arbitration which acquires the reader lock of arbitration
lock (which is added by this pr for local/global arbitration execution control). Then it ensures the request
memory pool is within the capacity limit (this might trigger spill to reclaim the used memory from the
request pool itself), and tries to allocate free capacity from the arbitrator or reclaim the free memory from
the other queries which is protected by arbitrator lock.

Third if runLocalArbitration can't reclaim sufficient memory, then proceeds with runGlobalArbitration which
acquires the writer lock of arbitration lock. Then it reclaims the free capacity from the arbitrator or the other
queries. And at last reclaims the used memory from the other queries by spilling.

ArbitrationOperation is added to contains the state of arbitration request processing to simplify the code
implementation.

This PR also adds an option to disable global arbitration and only do local arbitration which can always
run in parallel. With this option, for 2hrs stress test (1000 query concurrency at coordinator) with Meta
internal production workloads replay, the averaged query execution time has been reduced by ~30% and cpu
time is kept the same. The memory arbitration wall time for query without triggering spilled is only 9 mins in
total.

Pull Request resolved: facebookincubator#9649

Reviewed By: tanjialiang, oerling

Differential Revision: D56685200

Pulled By: xiaoxmeng

fbshipit-source-id: f8db71e4cae05f24f913464c37c5bc1e6528083b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants