
Support LTO #10741

Closed
alexcrichton opened this issue Nov 30, 2013 · 6 comments
Labels
A-linkage Area: linking into static, shared libraries and binaries E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot.

Comments

@alexcrichton
Member

A "static rust library" is simply an 'rlib' file, which in turn is just an ar archive. Right now the archive includes crate.o and all native static dependencies. There's no reason this file couldn't also include crate.bc to allow for LTO.

I would imagine that a --lto option would be specified when creating an executable or a dynamic library (I don't think this makes sense for static libraries?). When --lto is specified, we load the bc files from all of the upstream rlib archives, throw them all into LLVM, and then tell LLVM to run its LTO passes.

Basically, this would mean shoving more things into rlib archives by default (we need to make sure that using rlib files still works) and then implementing the --lto option to load the bc files from those archives.
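To make "load the bc files from archives" a little more concrete, here is a minimal Rust sketch of reading and writing an ar archive in memory. It assumes the common ar layout (an `!<arch>\n` magic followed by 60-byte member headers, with member data padded to an even offset) and only supports short member names, so it ignores the GNU `//` long-name table and BSD `#1/` names. `Member`, `parse_ar`, `write_ar`, `bytecode_members` and `without_objects` are hypothetical names for illustration, not rustc's code.

```rust
/// One member of an `ar` archive (e.g. an rlib): its name and raw bytes.
#[derive(Debug, Clone, PartialEq)]
pub struct Member {
    pub name: String,
    pub data: Vec<u8>,
}

const GLOBAL_MAGIC: &[u8] = b"!<arch>\n";
const HEADER_LEN: usize = 60;

/// Parses an in-memory `ar` archive into its members.
/// Assumption: short names only (no GNU `//` table, no BSD `#1/` names).
pub fn parse_ar(bytes: &[u8]) -> Result<Vec<Member>, String> {
    if !bytes.starts_with(GLOBAL_MAGIC) {
        return Err("missing !<arch> magic".to_string());
    }
    let mut members = Vec::new();
    let mut pos = GLOBAL_MAGIC.len();
    while pos < bytes.len() {
        let header = bytes
            .get(pos..pos + HEADER_LEN)
            .ok_or("truncated member header")?;
        if &header[58..60] != b"`\n" {
            return Err("bad header terminator".to_string());
        }
        // Bytes 0..16 hold the name; GNU ar ends it with '/' and pads with spaces.
        let name = String::from_utf8_lossy(&header[0..16])
            .trim_end()
            .trim_end_matches('/')
            .to_string();
        // Bytes 48..58 hold the member size as space-padded decimal ASCII.
        let size = std::str::from_utf8(&header[48..58])
            .map_err(|e| e.to_string())?
            .trim()
            .parse::<usize>()
            .map_err(|e| e.to_string())?;
        let start = pos + HEADER_LEN;
        let data = bytes
            .get(start..start + size)
            .ok_or("truncated member data")?
            .to_vec();
        members.push(Member { name, data });
        // Member data is followed by one padding byte when its length is odd.
        pos = start + size + size % 2;
    }
    Ok(members)
}

/// Serialises members into an `ar` archive with zeroed metadata.
/// Assumption: every name is at most 15 bytes so the header stays 60 bytes.
pub fn write_ar(members: &[Member]) -> Vec<u8> {
    let mut out = GLOBAL_MAGIC.to_vec();
    for m in members {
        let header = format!(
            "{:<16}{:<12}{:<6}{:<6}{:<8}{:<10}`\n",
            format!("{}/", m.name),
            0,
            0,
            0,
            644,
            m.data.len()
        );
        assert_eq!(header.len(), HEADER_LEN, "member name too long for this sketch");
        out.extend_from_slice(header.as_bytes());
        out.extend_from_slice(&m.data);
        if m.data.len() % 2 == 1 {
            out.push(b'\n');
        }
    }
    out
}

/// Selects the bytecode members (`*.bc`) that an LTO step would load.
pub fn bytecode_members(members: &[Member]) -> Vec<&Member> {
    members.iter().filter(|m| m.name.ends_with(".bc")).collect()
}

/// Drops the object-file members (`*.o`), mirroring the idea of
/// copying an rlib with foo.o removed before linking.
pub fn without_objects(members: &[Member]) -> Vec<Member> {
    members.iter().filter(|m| !m.name.ends_with(".o")).cloned().collect()
}
```

Writing an archive with `foo.o` and `foo.bc`, parsing it back, and filtering for `.bc` yields just the bytecode member; a real rlib would also contain native static dependencies and a symbol table, which this sketch does not model.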

Note that native libraries should all still "just work" to the greatest extent possible.

Not quite an easy bug, but certainly an interesting project!

@emberian
Member

emberian commented Dec 1, 2013

cc @thestinger

@alexcrichton
Member Author

I have this implemented, but there are a few caveats. First, our current linkage model means that LTO is only a reasonable option for staticlib and binary outputs; it doesn't make sense for rlibs or dylibs. It also means that running LTO requires all upstream dependencies to be available as rlibs. This is how I implemented it:

  • Whenever an rlib is generated, in addition to crate.o being inserted into the archive, crate.bc is also inserted into the archive (the optimized bytecode).
  • During the run_passes phase of the compiler, the following steps are taken:
    • First, run the function/module pass managers over the local LLVM module
    • Load each upstream dependency's bytecode file, and "link into" the current LLVM module (using llvm apis)
    • Run an internalization pass which internalizes all symbols except those determined reachable (for the current module)
    • Run the LTO passes (there's an LTO pass manager provided by LLVM that's used)
    • Emit the bytecode of this entire module
  • When linking, take each upstream libfoo.rlib, copy it to a temporary location, remove foo.o from the archive, and then link as usual. I found it necessary (at least on OSX) to remove the object files because otherwise the linker would still pull all of them in (because their symbols are exported, I would imagine).
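The "link into the current module, then internalize everything not reachable" steps can be modelled with a toy symbol table. This is a hedged Rust sketch, not LLVM's API or the code in the branch: each module is a map from symbol name to a linkage plus the symbols it calls, `internalize` demotes everything outside a caller-supplied reachable set, and `strip_dead` stands in for the dead-code elimination that internalization makes possible. All names here (`Linkage`, `Symbol`, `Module`, `link_into`, `internalize`, `strip_dead`) are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Linkage {
    External,
    Internal,
}

/// Toy stand-in for an LLVM function: its linkage and the symbols it calls.
#[derive(Clone, Debug)]
pub struct Symbol {
    pub linkage: Linkage,
    pub calls: Vec<String>,
}

/// Toy module: symbol name -> definition.
pub type Module = BTreeMap<String, Symbol>;

/// "Links" upstream modules into the local one by merging their definitions.
/// Assumption: no symbol is defined in more than one crate.
pub fn link_into(local: &mut Module, upstream: Vec<Module>) {
    for m in upstream {
        local.extend(m);
    }
}

/// Marks every symbol that is not in `reachable` as internal.
pub fn internalize(module: &mut Module, reachable: &BTreeSet<String>) {
    for (name, sym) in module.iter_mut() {
        if !reachable.contains(name) {
            sym.linkage = Linkage::Internal;
        }
    }
}

/// Removes internal symbols that cannot be reached from any external symbol.
pub fn strip_dead(module: &mut Module) {
    let mut live: BTreeSet<String> = BTreeSet::new();
    // Start the walk from every symbol that is still externally visible.
    let mut stack: Vec<String> = module
        .iter()
        .filter(|(_, s)| s.linkage == Linkage::External)
        .map(|(n, _)| n.clone())
        .collect();
    while let Some(name) = stack.pop() {
        if !live.insert(name.clone()) {
            continue; // already visited
        }
        if let Some(sym) = module.get(&name) {
            stack.extend(sym.calls.iter().cloned());
        }
    }
    module.retain(|name, _| live.contains(name));
}
```

With `main` as the only reachable symbol, an upstream function that `main` calls survives as an internal symbol while an unused upstream function is dropped, which is the effect behind the smaller `fn main() {}` binary reported below.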

Optimization-wise, this appears to work as well as one would expect. I got fn main() {} to compile to a 1.3MB executable, and ~700KB without librustuv. These are the same numbers that I was getting in #10740 with -ffunction-sections + --gc-sections, which is a promising number. This also means that a solution to this issue may end up closing #10740.

Sadly this comes with a caveat, and that caveat is that this is about the slowest possible thing the compiler can do. Here's a breakdown of the timing:

  • 0.397s loading std.bc and linking into the current module
  • 0.103s loading rustuv.bc
  • 1.894s running LTO optimization passes
  • 5.067s emitting the output file (running the codegen passes)
  • 0.396s preparing libstd.rlib without std.o
  • 0.060s preparing librustuv.rlib
  • 1.416s running the system linker

The total runtime of the compiler was 9.655s. Remember that the program in question is fn main() {}.

I don't think this means we shouldn't have LTO support, but we would seriously need to look into optimizing it. This is an absurd runtime. Even if loading the bc files and preparing the new rlibs took zero time (both can certainly be optimized more than in my implementation), we'd still be looking at a runtime of at least 8s.
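The 8s floor follows directly from the breakdown: the listed phases sum to 9.333s of the 9.655s total, and the three phases that would remain if bytecode loading and rlib preparation were free (LTO passes, codegen, system linker) sum to 8.377s. A small Rust sketch that recomputes this; the phase labels are paraphrased, and the reducible/irreducible split is the assumption stated in the comment, not a measurement.

```rust
/// Phase timings in seconds, copied from the breakdown in this comment.
pub const PHASES: [(&str, f64); 7] = [
    ("load std.bc and link into module", 0.397),
    ("load rustuv.bc", 0.103),
    ("run LTO optimization passes", 1.894),
    ("emit output file (codegen)", 5.067),
    ("prepare libstd.rlib without std.o", 0.396),
    ("prepare librustuv.rlib", 0.060),
    ("run system linker", 1.416),
];

/// Reported total compiler runtime in seconds.
pub const TOTAL: f64 = 9.655;

/// Assumption from the comment: only bytecode loading and rlib preparation
/// could be driven to zero; every other phase stays.
pub fn is_reducible(label: &str) -> bool {
    label.starts_with("load") || label.starts_with("prepare")
}

/// Sum of every listed phase.
pub fn listed_total() -> f64 {
    PHASES.iter().map(|&(_, t)| t).sum()
}

/// Runtime left if the reducible phases took no time at all.
pub fn lower_bound() -> f64 {
    PHASES
        .iter()
        .filter(|&&(label, _)| !is_reducible(label))
        .map(|&(_, t)| t)
        .sum()
}
```

`lower_bound()` comes to about 8.377s, which is where "at least 8s" comes from; the remaining ~0.32s of the 9.655s total is not attributed to any listed phase.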

What do others think about this? Did I make a fundamental misstep somewhere? Is this actually a reasonable runtime for LTO?

As reference, my work in progress can be found at alexcrichton@e5416c8

@huonw
Member

huonw commented Dec 3, 2013

@alexcrichton I tried running that branch locally (x64 linux), and I met with:

note: /home/huon/rust/build/x86_64-unknown-linux-gnu/stage2/lib/rustc/x86_64-unknown-linux-gnu/lib/libmorestack.a(morestack.o): In function `__morestack':
(.text+0x9): undefined reference to `rust_stack_exhausted'
collect2: error: ld returned 1 exit status

in linking. (Also, how did you get those fine-grained numbers?)

@alexcrichton
Member Author

I used a bit of an ad-hoc method to get those numbers (manual instrumentation with the time function), but I've formalized it a bit more now.
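Manual instrumentation of this kind can be as simple as wrapping each phase in a timer and printing the results. The sketch below is hypothetical: `time_phase` and `report` are illustrative names, not rustc's own timing helper, and it relies only on `std::time::Instant` to record a label and duration per phase.

```rust
use std::time::{Duration, Instant};

/// Runs `f`, records how long it took under `label`, and returns its result.
/// Hypothetical helper in the spirit of ad-hoc phase instrumentation.
pub fn time_phase<T>(log: &mut Vec<(String, Duration)>, label: &str, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let result = f();
    log.push((label.to_string(), start.elapsed()));
    result
}

/// Formats the collected timings one phase per line, e.g. "1.894s run LTO passes".
pub fn report(log: &[(String, Duration)]) -> String {
    log.iter()
        .map(|(label, d)| format!("{:.3}s {}", d.as_secs_f64(), label))
        .collect::<Vec<_>>()
        .join("\n")
}
```

Each compiler phase would be wrapped as `time_phase(&mut log, "run LTO passes", || ...)`, and `report(&log)` printed at the end.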

I think I fixed that problem in alexcrichton@2ba6426, although I'm a little scared that it ran successfully on OSX in the first place...

@thestinger
Contributor

@alexcrichton: The individual crates are supposed to be somewhat optimized (not the full regular module pass) before the final LTO pass. Look at what clang does when compiling like this.

@alexcrichton
Member Author

Interesting! I was hoping to avoid a "compile this library for LTO-enabled linkage" flag, but perhaps that may end up being inevitable. Sadly though I don't think that it will help the compile times at all (it'll probably just end up slowing them down a little with more optimizations being run at the end).

flip1995 pushed a commit to flip1995/rust that referenced this issue Jun 2, 2023
[`large_stack_arrays`]: check array initializer expressions

Fixes rust-lang#10741.
Prior to this PR, the lint only checked array repeat expressions (ie. `[T; n]`). Now it also checks array initializer expressions.

changelog: [`large_stack_arrays`]: check array initializer expressions

4 participants