Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Omit non-needs_drop drop_in_place in vtables #122662

Merged
merged 1 commit into from
May 28, 2024

Conversation

Mark-Simulacrum
Copy link
Member

@Mark-Simulacrum Mark-Simulacrum commented Mar 17, 2024

This replaces the drop_in_place reference with null in vtables. On librustc_driver.so, this drops about ~17k (11%) dynamic relocations from the output, since many vtables can now be placed in read-only memory, rather than having a relocated pointer included.

This makes a tradeoff by adding a null check at vtable call sites. I'm not sure that's readily avoidable without changing the vtable format (e.g., so that we can use a pc-relative relocation instead of an absolute address, and avoid the dynamic relocation that way). But it seems likely that the check is cheap at runtime.

Accepted MCP: rust-lang/compiler-team#730

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 17, 2024
@Mark-Simulacrum

This comment was marked as outdated.

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 17, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 17, 2024
Omit non-needs_drop drop_in_place in vtables

This replaces the drop_in_place reference with null in vtables. On librustc_driver.so, this drops about ~17k (11%) dynamic relocations from the output, since many vtables can now be placed in read-only memory, rather than having a relocated pointer included.

This makes a tradeoff by adding a null check at vtable call sites. I'm not sure that's readily avoidable without changing the vtable format (e.g., so that we can use a pc-relative relocation instead of an absolute address, and avoid the dynamic relocation that way). But it seems likely that the check is cheap at runtime.

r? `@Mark-Simulacrum` (opening for perf first)
@bors

This comment was marked as outdated.

@rust-log-analyzer

This comment has been minimized.

@bors

This comment was marked as outdated.

1 similar comment
@bors

This comment was marked as outdated.

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (58f8d0e): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.7% [-0.7%, -0.7%] 2
Improvements ✅
(secondary)
-0.6% [-0.6%, -0.4%] 6
All ❌✅ (primary) -0.7% [-0.7%, -0.7%] 2

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.2% [2.2%, 2.2%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-4.9% [-6.1%, -2.8%] 3
All ❌✅ (primary) 2.2% [2.2%, 2.2%] 1

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-1.1% [-1.1%, -1.1%] 1
All ❌✅ (primary) - - 0

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.2% [-0.6%, -0.0%] 76
Improvements ✅
(secondary)
-1.4% [-5.9%, -0.1%] 64
All ❌✅ (primary) -0.2% [-0.6%, -0.0%] 76

Bootstrap: 668.447s -> 665.442s (-0.45%)
Artifact size: 312.75 MiB -> 312.55 MiB (-0.07%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 18, 2024
@Kobzol
Copy link
Contributor

Kobzol commented Mar 18, 2024

Can't we instead just emit a single no-op drop function shared by all these types, instead of using null, to avoid the null check? That should save binary size, but avoid a runtime check (even though it will probably be predicted correctly if the drop call is on a generic type, rather through &dyn). The single function should be easily cached in the instruction cache.

@Mark-Simulacrum
Copy link
Member Author

Let's discuss on the MCP Zulip thread to avoid splitting the discussion.

@Mark-Simulacrum
Copy link
Member Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 27, 2024
@bors
Copy link
Contributor

bors commented Apr 27, 2024

⌛ Trying commit 080d085 with merge e7c1602...

bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 27, 2024
Omit non-needs_drop drop_in_place in vtables

This replaces the drop_in_place reference with null in vtables. On librustc_driver.so, this drops about ~17k (11%) dynamic relocations from the output, since many vtables can now be placed in read-only memory, rather than having a relocated pointer included.

This makes a tradeoff by adding a null check at vtable call sites. I'm not sure that's readily avoidable without changing the vtable format (e.g., so that we can use a pc-relative relocation instead of an absolute address, and avoid the dynamic relocation that way). But it seems likely that the check is cheap at runtime.

Accepted MCP: rust-lang/compiler-team#730
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Apr 27, 2024

💔 Test failed - checks-actions

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 27, 2024
@rust-log-analyzer

This comment has been minimized.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 5, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request May 5, 2024
…-obk,bjorn3

Omit non-needs_drop drop_in_place in vtables

This replaces the drop_in_place reference with null in vtables. On librustc_driver.so, this drops about ~17k (11%) dynamic relocations from the output, since many vtables can now be placed in read-only memory, rather than having a relocated pointer included.

This makes a tradeoff by adding a null check at vtable call sites. I'm not sure that's readily avoidable without changing the vtable format (e.g., so that we can use a pc-relative relocation instead of an absolute address, and avoid the dynamic relocation that way). But it seems likely that the check is cheap at runtime.

Accepted MCP: rust-lang/compiler-team#730
@bors
Copy link
Contributor

bors commented May 5, 2024

⌛ Testing commit ab17641 with merge 9618db9...

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented May 5, 2024

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels May 5, 2024
@jieyouxu
Copy link
Member

@bors r- (looks like some tests are failing?)

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 26, 2024
This replaces the drop_in_place reference with null in vtables. On
librustc_driver.so, this drops about ~17k dynamic relocations from the
output, since many vtables can now be placed in read-only memory, rather
than having a relocated pointer included.

This makes a tradeoff by adding a null check at vtable call sites.
That's hard to avoid without changing the vtable format (e.g., to use a
pc-relative relocation instead of an absolute address, and avoid the
dynamic relocation that way). But it seems likely that the check is
cheap at runtime.
@Mark-Simulacrum
Copy link
Member Author

r? @bjorn3 for another review, missed that I need to use helper.llbb_with_cleanup when skipping drop in the ssa code. Does cranelift need a similar construction? I can't see anything mentioning cleanup/funclets in codegen_cranelift...

@rustbot ready

@rustbot
Copy link
Collaborator

rustbot commented May 28, 2024

Could not assign reviewer from: bjorn3.
User(s) bjorn3 are either the PR author, already assigned, or on vacation, and there are no other candidates.
Use r? to specify someone else to assign.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 28, 2024
@bjorn3
Copy link
Member

bjorn3 commented May 28, 2024

missed that I need to use helper.llbb_with_cleanup when skipping drop in the ssa code. Does cranelift need a similar construction? I can't see anything mentioning cleanup/funclets in codegen_cranelift...

Unwinding panics are not yet supported by Cranelift.

@bjorn3
Copy link
Member

bjorn3 commented May 28, 2024

@bors r+

@bors
Copy link
Contributor

bors commented May 28, 2024

📌 Commit 4c002fc has been approved by bjorn3

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 28, 2024
@bors
Copy link
Contributor

bors commented May 28, 2024

⌛ Testing commit 4c002fc with merge 8c4db85...

@bors
Copy link
Contributor

bors commented May 28, 2024

☀️ Test successful - checks-actions
Approved by: bjorn3
Pushing 8c4db85 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label May 28, 2024
@bors bors merged commit 8c4db85 into rust-lang:master May 28, 2024
7 checks passed
@rustbot rustbot added this to the 1.80.0 milestone May 28, 2024
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (8c4db85): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.6% [-1.5%, -0.2%] 9
Improvements ✅
(secondary)
-0.5% [-1.0%, -0.2%] 18
All ❌✅ (primary) -0.6% [-1.5%, -0.2%] 9

Max RSS (memory usage)

Results (primary 2.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.1% [1.5%, 2.7%] 2
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 2.1% [1.5%, 2.7%] 2

Cycles

Results (secondary -2.4%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.4% [-2.4%, -2.4%] 1
All ❌✅ (primary) - - 0

Binary size

Results (primary -0.3%, secondary -1.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.3% [-0.6%, -0.0%] 84
Improvements ✅
(secondary)
-1.0% [-5.9%, -0.1%] 64
All ❌✅ (primary) -0.3% [-0.6%, -0.0%] 84

Bootstrap: 669.344s -> 668.572s (-0.12%)
Artifact size: 318.39 MiB -> 318.08 MiB (-0.10%)

@rustbot rustbot removed the perf-regression Performance regression. label May 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants