-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
trans: Make type names in LLVM IR independent of crate-nums and source locations. #37640
Conversation
r? @Aatch (rust_highfive has picked a reviewer for you, use r? to override) |
Collision handling make little sense to me. Types will never get their “LLVM names” exposed (except maybe through debug info?) nor should never use them to influence what code does (as opposed to using I feel like all this extra work the compiler would end up doing is just a waste. |
I agree with @nagisa here. In fact, I'd venture to say LLVM's "type system" is a misguided attempt at some sort of higher-level-than-machine "MIR" for C and C++, and their regrets alone won't rewrite LLVM. It's less insightful than the debuginfo which is still presented in a deplorable manner in LLVM IR. I vote for not emitting any type or value names (other than |
Yes, I agree, the more I think about it. How about we have something that is readable (for debugging), like the item path and any disambiguation is done through a cgu-local counter (which might already be provided by llvm, as @nagisa mentioned). That should be fine for incr. comp. For that we only need to get rid of source locations and crate nums in the names. On November 7, 2016 5:38:02 PM EST, Eduard-Mihai Burtescu notifications@github.com wrote:
Sent from my Android device with K-9 Mail. Please excuse my brevity. |
I feel like we can tell LLVM to not preserve any names (other than statics and fns, as eddyb already noted; there’s a switch for that somewhere, I believe) unless |
OK, I did some tests with and without human-readable names and I could hardly detect a difference in space requirements or compile times. E.g. the libstd rlib was 5784 KB with readable names and 5750 KB without, libcore 8402 KB with and 8397 KB without - so 0.5% and 0.05%. |
Are trans times not affected? In that case just making the type names simpler is enough IMO. |
Not as far as I can tell. |
4ed52fb
to
5e13f0c
Compare
Update: See text of PR description. |
☔ The latest upstream changes (presumably #37670) made this pull request unmergeable. Please resolve the merge conflicts. |
5e13f0c
to
b69bc15
Compare
Rebased. I think there is nothing controversial about this PR anymore. It fixes the incr. comp. stability problems with making type names worse. |
@bors r+ |
📌 Commit b69bc15 has been approved by |
☔ The latest upstream changes (presumably #37104) made this pull request unmergeable. Please resolve the merge conflicts. |
b69bc15
to
47b8656
Compare
@bors r=brson Rebased. |
📌 Commit 47b8656 has been approved by |
@bors r+ |
💡 This pull request was already approved, no need to approve it again.
|
📌 Commit 47b8656 has been approved by |
Oops, I see @brson r+'d already... well, I approve anyhow. =) |
Rollup of 30 pull requests - Successful merges: #37190, #37368, #37481, #37503, #37527, #37535, #37551, #37584, #37600, #37613, #37615, #37659, #37662, #37669, #37682, #37688, #37690, #37692, #37693, #37694, #37695, #37696, #37698, #37699, #37705, #37708, #37709, #37716, #37724, #37727 - Failed merges: #37640, #37689, #37717
🔒 Merge conflict |
☔ The latest upstream changes (presumably #37730) made this pull request unmergeable. Please resolve the merge conflicts. |
47b8656
to
4a32030
Compare
@bors r=brson Rebased. |
📌 Commit 4a32030 has been approved by |
🔒 Merge conflict |
☔ The latest upstream changes (presumably #37675) made this pull request unmergeable. Please resolve the merge conflicts. |
Before this PR, type names could depend on the cratenum being used for a given crate and also on the source location of closures. Both are undesirable for incremental compilation where we cache LLVM IR and don't want it to depend on formatting or in which order crates are loaded.
4a32030
to
276f052
Compare
@bors r=brson Rebased. |
📌 Commit 276f052 has been approved by |
⌛ Testing commit 276f052 with merge 435246b... |
trans: Make type names in LLVM IR independent of crate-nums and source locations. UPDATE: This PR makes the type names we assign in LLVM IR independent of the type definition's location in the source code and the order in which extern crates are loaded. The new type names look like the old ones, except for closures and the `<crate-num>.` prefix being gone. Resolution of name clashes (e.g. of the same type in different crate versions) is left to LLVM (which will just append `.<counter>` to the name). ORIGINAL TEXT: This PR makes the type names we assign in LLVM IR independent of the type definition's location in the source code. Before, the type of closures contained the closures definition location. The new naming scheme follows the same pattern that we already use for symbol names: We have a human readable prefix followed by a hash that makes sure we don't have any collisions. Here is an example of what the new names look like: ```rust // prog.rs - example program mod mod1 { pub struct Struct<T>(pub T); } fn main() { use mod1::Struct; let _s = Struct(0u32); let _t = Struct('h'); let _x = Struct(Struct(0i32)); } ``` Old: ```llvm %"mod1::Struct<u32>" = type { i32 } %"mod1::Struct<char>" = type { i32 } %"mod1::Struct<mod1::Struct<i32>>" = type { %"mod1::Struct<i32>" } %"mod1::Struct<i32>" = type { i32 } ``` New: ```llvm %"prog::mod1::Struct<u32>::ejDrT" = type { i32 } %"prog::mod1::Struct<char>::2eEAU" = type { i32 } %"prog::mod1::Struct<prog::mod1::Struct<i32>>::ehCqR" = type { %"prog::mod1::Struct<i32>::$fAo2" } %"prog::mod1::Struct<i32>::$fAo2" = type { i32 } ``` As you can see, the new names are slightly more verbose, but also more consistent. There is no difference now between a local type and one from another crate (before, non-local types where prefixed with `<crate-num>.` as in `2.std::mod1::Type1`). There is a bit of design space here. For example, we could leave off the crate name for local definitions (making names shorter but less consistent): ```llvm %"mod1::Struct<u32>::ejDrT" = type { i32 } %"mod1::Struct<char>::2eEAU" = type { i32 } %"mod1::Struct<mod1::Struct<i32>>::ehCqR" = type { %"mod1::Struct<i32>::$fAo2" } %"mod1::Struct<i32>::$fAo2" = type { i32 } ``` We could also put the hash in front, which might be more readable: ```llvm %"ejDrT.mod1::Struct<u32>" = type { i32 } %"2eEAU.mod1::Struct<char>" = type { i32 } %"ehCqR.mod1::Struct<mod1::Struct<i32>>" = type { %"$fAo2.mod1::Struct<i32>" } %"$fAo2.mod1::Struct<i32>" = type { i32 } ``` We could probably also get rid of the hash if we used full DefPaths and crate-nums (though I'm not yet a 100% sure if crate-nums could mess with incremental compilation). ```llvm %"mod1::Struct<u32>" = type { i32 } %"mod1::Struct<char>" = type { i32 } %"mod1::Struct<mod1::Struct<i32>>" = type { %"mod1::Struct<i32>" } %"mod1::Struct<i32>" = type { i32 } %"2.std::mod1::Type1" = type { ... } ``` I would prefer the solution with the hashes because it is nice and consistent conceptually, but visually it's admittedly a bit uglier. Maybe @rust-lang/compiler would like to bikeshed a little about this. On a related note: Has anyone ever tried if the LTO-linker will merge equal types with different names? (^ @brson, @alexcrichton ^) If not, that would be a reason to make type names more consistent.
UPDATE:
This PR makes the type names we assign in LLVM IR independent of the type definition's location in the source code and the order in which extern crates are loaded. The new type names look like the old ones, except for closures and the
<crate-num>.
prefix being gone. Resolution of name clashes (e.g. of the same type in different crate versions) is left to LLVM (which will just append.<counter>
to the name).ORIGINAL TEXT:
This PR makes the type names we assign in LLVM IR independent of the type definition's location in the source code. Before, the type of closures contained the closures definition location. The new naming scheme follows the same pattern that we already use for symbol names: We have a human readable prefix followed by a hash that makes sure we don't have any collisions. Here is an example of what the new names look like:
Old:
New:
As you can see, the new names are slightly more verbose, but also more consistent. There is no difference now between a local type and one from another crate (before, non-local types where prefixed with
<crate-num>.
as in2.std::mod1::Type1
).There is a bit of design space here. For example, we could leave off the crate name for local definitions (making names shorter but less consistent):
We could also put the hash in front, which might be more readable:
We could probably also get rid of the hash if we used full DefPaths and crate-nums (though I'm not yet a 100% sure if crate-nums could mess with incremental compilation).
I would prefer the solution with the hashes because it is nice and consistent conceptually, but visually it's admittedly a bit uglier. Maybe @rust-lang/compiler would like to bikeshed a little about this.
On a related note: Has anyone ever tried if the LTO-linker will merge equal types with different names?
(^ @brson, @alexcrichton ^)
If not, that would be a reason to make type names more consistent.