Omit non-needs_drop drop_in_place in vtables #122662

Mark-Simulacrum · 2024-03-17T21:52:36Z

This replaces the drop_in_place reference with null in vtables. On librustc_driver.so, this drops about 17k (11%) dynamic relocations from the output, since many vtables can now be placed in read-only memory, rather than having a relocated pointer included.

This makes a tradeoff by adding a null check at vtable call sites. I'm not sure that's readily avoidable without changing the vtable format (e.g., so that we can use a pc-relative relocation instead of an absolute address, and avoid the dynamic relocation that way). But it seems likely that the check is cheap at runtime.
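To make the tradeoff concrete, here is a minimal sketch of the idea in plain Rust. It is not the compiler's actual vtable layout or codegen: `VTable`, `drop_glue`, `vtable_for`, and `drop_via_vtable` are hypothetical names invented for illustration, and `Option<unsafe fn(..)>` stands in for a nullable function pointer slot.

```rust
use std::mem::{align_of, needs_drop, size_of};
use std::ptr;

/// Hypothetical hand-rolled stand-in for a vtable header (illustration only).
pub struct VTable {
    /// `None` models the null entry stored for types that need no drop glue.
    pub drop_in_place: Option<unsafe fn(*mut ())>,
    pub size: usize,
    pub align: usize,
}

/// Type-erased drop glue for `T`.
unsafe fn drop_glue<T>(p: *mut ()) {
    unsafe { ptr::drop_in_place(p as *mut T) }
}

/// Build the table for `T`, leaving the drop slot empty when `T` has nothing to drop.
pub fn vtable_for<T>() -> VTable {
    VTable {
        drop_in_place: if needs_drop::<T>() {
            Some(drop_glue::<T> as unsafe fn(*mut ()))
        } else {
            None
        },
        size: size_of::<T>(),
        align: align_of::<T>(),
    }
}

/// The call site pays for one null check before the indirect call.
///
/// Safety: `data` must point to a valid value of the type `vt` was built for,
/// and that value must not be used or dropped again afterwards.
pub unsafe fn drop_via_vtable(data: *mut (), vt: &VTable) {
    if let Some(drop_fn) = vt.drop_in_place {
        unsafe { drop_fn(data) }
    }
}
```

In this sketch `vtable_for::<u32>()` holds no function address at all, which is the property that lets such a table avoid a relocated pointer, while `vtable_for::<String>()` still carries one.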

Accepted MCP: rust-lang/compiler-team#730

r? `@Mark-Simulacrum` (opening for perf first)

rust-timer · 2024-03-18T04:50:27Z

Finished benchmarking commit (58f8d0e): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.7%	[-0.7%, -0.7%]	2
Improvements ✅ (secondary)	-0.6%	[-0.6%, -0.4%]	6
All ❌✅ (primary)	-0.7%	[-0.7%, -0.7%]	2

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.2%	[2.2%, 2.2%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-4.9%	[-6.1%, -2.8%]	3
All ❌✅ (primary)	2.2%	[2.2%, 2.2%]	1

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-1.1%	[-1.1%, -1.1%]	1
All ❌✅ (primary)	-	-	0

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.2%	[-0.6%, -0.0%]	76
Improvements ✅ (secondary)	-1.4%	[-5.9%, -0.1%]	64
All ❌✅ (primary)	-0.2%	[-0.6%, -0.0%]	76

Bootstrap: 668.447s -> 665.442s (-0.45%)
Artifact size: 312.75 MiB -> 312.55 MiB (-0.07%)

Kobzol · 2024-03-18T13:36:03Z

Can't we instead just emit a single no-op drop function shared by all these types, instead of using null, to avoid the null check? That should save binary size, but avoid a runtime check (even though it will probably be predicted correctly if the drop call is on a generic type, rather than through &dyn). The single function should be easily cached in the instruction cache.
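For comparison, the shared no-op alternative might look roughly like the sketch below. The names `noop_drop`, `drop_glue`, and `drop_slot_for` are hypothetical and only illustrate the shape of the idea; this is not how rustc builds vtables.

```rust
use std::mem::needs_drop;
use std::ptr;

/// One no-op function shared by every type that has nothing to drop.
unsafe fn noop_drop(_p: *mut ()) {}

/// Type-erased drop glue for `T`.
unsafe fn drop_glue<T>(p: *mut ()) {
    unsafe { ptr::drop_in_place(p as *mut T) }
}

/// The slot is never null, so callers can invoke it unconditionally.
pub fn drop_slot_for<T>() -> unsafe fn(*mut ()) {
    if needs_drop::<T>() {
        drop_glue::<T> as unsafe fn(*mut ())
    } else {
        noop_drop as unsafe fn(*mut ())
    }
}
```

Call sites skip the null check here, but every slot still stores a function address, so, as far as I understand, it would presumably still need the relocated pointer that the null approach removes.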

Mark-Simulacrum · 2024-03-18T13:40:01Z

Let's discuss on the MCP Zulip thread to avoid splitting the discussion.

Mark-Simulacrum · 2024-04-27T18:40:36Z

@bors try @rust-timer queue

bors · 2024-04-27T18:41:46Z

⌛ Trying commit 080d085 with merge e7c1602...


bors · 2024-04-27T19:02:08Z

💔 Test failed - checks-actions

…-obk,bjorn3 Omit non-needs_drop drop_in_place in vtables

bors · 2024-05-05T03:44:10Z

⌛ Testing commit ab17641 with merge 9618db9...

bors · 2024-05-05T04:31:14Z

💔 Test failed - checks-actions

jieyouxu · 2024-05-26T21:57:26Z

@bors r- (looks like some tests are failing?)


Mark-Simulacrum · 2024-05-28T10:45:57Z

r? @bjorn3 for another review, missed that I need to use helper.llbb_with_cleanup when skipping drop in the ssa code. Does cranelift need a similar construction? I can't see anything mentioning cleanup/funclets in codegen_cranelift...

@rustbot ready

rustbot · 2024-05-28T10:45:59Z

Could not assign reviewer from: bjorn3.
User(s) bjorn3 are either the PR author, already assigned, or on vacation, and there are no other candidates.
Use r? to specify someone else to assign.

bjorn3 · 2024-05-28T14:42:21Z

> missed that I need to use helper.llbb_with_cleanup when skipping drop in the ssa code. Does cranelift need a similar construction? I can't see anything mentioning cleanup/funclets in codegen_cranelift...

Unwinding panics are not yet supported by Cranelift.

bjorn3 · 2024-05-28T14:46:40Z

@bors r+

bors · 2024-05-28T14:46:43Z

📌 Commit 4c002fc has been approved by bjorn3

It is now in the queue for this repository.

bors · 2024-05-28T16:04:16Z

⌛ Testing commit 4c002fc with merge 8c4db85...

bors · 2024-05-28T18:20:16Z

☀️ Test successful - checks-actions
Approved by: bjorn3
Pushing 8c4db85 to master...

rust-timer · 2024-05-28T20:00:29Z

Finished benchmarking commit (8c4db85): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.6%	[-1.5%, -0.2%]	9
Improvements ✅ (secondary)	-0.5%	[-1.0%, -0.2%]	18
All ❌✅ (primary)	-0.6%	[-1.5%, -0.2%]	9

Max RSS (memory usage)

Results (primary 2.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.1%	[1.5%, 2.7%]	2
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	2.1%	[1.5%, 2.7%]	2

Cycles

Results (secondary -2.4%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.4%	[-2.4%, -2.4%]	1
All ❌✅ (primary)	-	-	0

Binary size

Results (primary -0.3%, secondary -1.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.3%	[-0.6%, -0.0%]	84
Improvements ✅ (secondary)	-1.0%	[-5.9%, -0.1%]	64
All ❌✅ (primary)	-0.3%	[-0.6%, -0.0%]	84

Bootstrap: 669.344s -> 668.572s (-0.12%)
Artifact size: 318.39 MiB -> 318.08 MiB (-0.10%)

rustbot assigned Mark-Simulacrum Mar 17, 2024

rustbot added the S-waiting-on-review and T-compiler labels Mar 17, 2024
