Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize and improve the proc_macro RPC interface for cross-thread execution #86816

Closed
wants to merge 12 commits into from

Conversation

mystor
Copy link
Contributor

@mystor mystor commented Jul 2, 2021

This series of patches performs various changes to the RPC interface used by proc_macro to improve its performance. From my local testing (which involved doing check --test runs on serde's test suite, based on #49219 (comment)) these pages have small improvements to same-thread execution, and significant improvements to cross-thread execution.

Some of the major changes include:

  • Stash frequently accessed interned values (Span::{def,call,mixed}_site) in the client bridge, so no RPC is required to access them,
  • Support non-blocking messaging to the server for calls which can be handled async (such as TokenStream::clone() or composing Token{Tree,Stream}s together),
  • Reduce the number of messages required for common TokenStream operations (removes TokenStreamBuilder, TokenStreamIter in favour of passing Vec<TokenTree> over RPC),
  • Store an Option<bridge::client::TokenStream> in the TokenStream type to avoid TokenStream::new() call overhead,
  • Add a local symbol interner within the client bridge to hold Symbols in the client, along with support to serialize them over RPC,
  • Decompose the token types Group, Punct, Ident, and Literal into struct types around more primitive handles (TokenStream, Symbol, and Span), meaning that common operations on these types, such as accessing .span() or .delimiter() require no RPC calls,
  • Use crossbeam-channel for cross-thread message passing, to avoid mpsc::channel's overhead,
  • Added a flag (-Z proc_macro_cross_thread) to turn on the CrossThread execution strategy for proc macros without rebuilding.

The PR is organized into a sequence of commits which build on top of one another, and could be reviewed independently.

Some example times from my machine (there's unfortunately variation in times run-to-run, but I tried to pick representative examples):

PR (SameThread)
~/src/serde/test_suite$ ( export RUSTFLAGS="-Z proc_macro_cross_thread=no"; cargo +dev check --tests && rm -f ../target/debug/deps/libtest_*.rmeta && time cargo +dev check --tests )
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
    Checking serde_test_suite v0.0.0 (/home/nika/src/serde/test_suite)
    Finished dev [unoptimized + debuginfo] target(s) in 0.76s

real	0m0.779s
user	0m2.256s
sys	0m0.459s
PR (CrossThread)
~/src/serde/test_suite$ ( export RUSTFLAGS="-Z proc_macro_cross_thread=yes"; cargo +dev check --tests && rm -f ../target/debug/deps/libtest_*.rmeta && time cargo +dev check --tests )
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
    Checking serde_test_suite v0.0.0 (/home/nika/src/serde/test_suite)
    Finished dev [unoptimized + debuginfo] target(s) in 0.83s

real	0m0.847s
user	0m2.971s
sys	0m0.579s
HEAD (SameThread)
~/src/serde/test_suite$ cargo +dev check --tests && rm -f ../target/debug/deps/libtest_*.rmeta && time cargo +dev check --tests
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
    Checking serde_test_suite v0.0.0 (/home/nika/src/serde/test_suite)
    Finished dev [unoptimized + debuginfo] target(s) in 0.85s

real	0m0.866s
user	0m2.608s
sys	0m0.395s
HEAD (CrossThread2)
~/src/serde/test_suite$ cargo +dev check --tests && rm -f ../target/debug/deps/libtest_*.rmeta && time cargo +dev check --tests
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
    Checking serde_test_suite v0.0.0 (/home/nika/src/serde/test_suite)
    Finished dev [unoptimized + debuginfo] target(s) in 7.49s

real	0m7.504s
user	0m8.036s
sys	0m13.600s

(note: there will likely be changes required to RA to support these changes as well, however this patch currently includes no changes to RA or it's copy of the proc_macro crate)

mystor added 8 commits July 2, 2021 12:54
This greatly improves the performance of the very frequently called
`call_site()` macro when running in a cross-thread configuration.
This has minimal impact without changes from a later part, as it also requires
support from the ExecutionStrategy. Messages are able to return owning handles
without waiting by allocating an ID from the client thread, and passing it up
to the server to fill with a response. If the server panics, it will be an ICE,
and no further requests will be handled.
…bug flag to use it

This new strategy supports avoiding waiting for a reply for noblock messages.
This strategy requires using a channel-like approach (similar to the previous
CrossThread1 approach).

This new CrossThread execution strategy takes a type parameter for the channel
to use, allowing rustc to use a more efficient channel which the proc_macro
crate could not declare as a dependency.
Compared to mpsc::channel, crossbeam-channel has significantly lower overhead.
… and iterate TokenStreams

This significantly reduces the cost of common interactions with TokenStream
when running with the CrossThread execution strategy, by reducing the number of
RPC calls required.
@rust-highfive
Copy link
Collaborator

r? @petrochenkov

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 2, 2021
@rust-log-analyzer

This comment has been minimized.

@eddyb
Copy link
Member

eddyb commented Jul 2, 2021

Just to get ahead of this a bit: I was aware that some of this was being worked on by @mystor, and long-term I believe we want all of these changes, but I'm not sure how to approach this PR with all of them at once.

If I find enough free time, I could maybe help split into several parts (there's some interesting design changes that aren't strictly tied to cross-thread-specific optimizations, like drastically cutting down on RPC "object" types, keeping interned symbols a strings in the client, etc.)

But as of right now I can't promise anything. If someone else is willing to review this in bulk, feel free to go ahead, but even then, I doubt we could land it without an MCP.

@mystor
Copy link
Contributor Author

mystor commented Jul 2, 2021

I'd be willing to split this into smaller PRs based on the individual patches in the review (each of which should be independent). I unfortunately don't know how to do this on github, however (reading suggests it's not possible). Perhaps we could start with a smaller PR which only includes the changes up to afc36cc, then do the range for the wait/nowait changes (up to 8db4f28), and then the non-remote object changes (up to efedf62). I think it should also be possible to rebase the struct changes to not include the wait/nowait changes if they're particularly controversial (as they only impact CrossThread).

mystor added 4 commits July 2, 2021 14:33
This greatly reduces round-trips to fetch relevant extra information about the
token in proc macro code, and avoids RPC messages to create Punct tokens.
This greatly reduces round-trips to fetch relevant extra information about the
token in proc macro code, and avoids RPC messages to create Group tokens.
This requires a dependency on `unicode-normalization` and `rustc_lexer`, which
is currently not possible for `proc_macro`. Instead, a second `extern "C" fn`
is provided by the compiler server to perform these steps from any thread.

String values are interned in both the server and client, meaning that
identifiers can be stringified without any RPC roundtrips without substantially
inflating their size.

RPC messages passing symbols include the full un-interned value, and are
re-interned on the receiving side. This could potentially be optimized in the
future.

The symbol infrastructure will alwo be used for literals in a following part.
This builds on the symbol infrastructure built for ident to replicate the
`LitKind` and `Lit` structures in rustc within the `proc_macro` client,
allowing literals to be fully created and interacted with from the client
thread. Only parsing and subspan operations still require sync RPC.
@rust-log-analyzer
Copy link
Collaborator

The job mingw-check failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
configure: rust.channel         := nightly
configure: rust.debug-assertions := True
configure: llvm.assertions      := True
configure: dist.missing-tools   := True
configure: build.configure-args := ['--enable-sccache', '--disable-manage-submodu ...
configure: writing `config.toml` in current directory
configure: 
configure: run `python /checkout/x.py --help`
configure: 
---
 Documenting std v0.0.0 (/checkout/library/std)
    Finished release [optimized] target(s) in 8.87s
    Checking std v0.0.0 (/checkout/library/std)
 Documenting proc_macro v0.0.0 (/checkout/library/proc_macro)
error: unresolved link to `Ident::to_string`
   --> library/proc_macro/src/lib.rs:954:26
    |
954 |     /// by [`to_string`](Self::to_string).
    |                          ^^^^^^^^^^^^^^^ the struct `Ident` has no field or associated item named `to_string`
    |
    = note: `-D rustdoc::broken-intra-doc-links` implied by `-D warnings`
error: aborting due to previous error

error: could not document `proc_macro`


Caused by:
  process didn't exit successfully: `/checkout/obj/build/bootstrap/debug/rustdoc --edition=2018 --crate-type lib --crate-name proc_macro library/proc_macro/src/lib.rs --target x86_64-unknown-linux-gnu -o /checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/doc --error-format=json --json=diagnostic-rendered-ansi --markdown-css rust.css --markdown-no-toc -Z unstable-options --resource-suffix 1.55.0 --index-page /checkout/src/doc/index.md -L dependency=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps -L dependency=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/release/deps --extern std=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/libstd-08f5c12deca4d678.rmeta --cfg=bootstrap -Dwarnings '-Wrustdoc::invalid_codeblock_attributes' --crate-version '1.55.0-nightly
  (c0413432c
  2021-07-02)' '-Zcrate-attr=doc(html_root_url="https://doc.rust-lang.org/nightly/")'` (exit status: 1)


command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "rustdoc" "--target" "x86_64-unknown-linux-gnu" "-Zbinary-dep-depinfo" "-j" "16" "--release" "--locked" "--color" "always" "--features" "panic-unwind backtrace compiler-builtins-c" "--manifest-path" "/checkout/library/test/Cargo.toml" "-p" "proc_macro" "-Zskip-rustdoc-fingerprint" "--" "--markdown-css" "rust.css" "--markdown-no-toc" "-Z" "unstable-options" "--resource-suffix" "1.55.0" "--index-page" "/checkout/src/doc/index.md"


failed to run: /checkout/obj/build/bootstrap/debug/bootstrap doc --stage 0 library/test
Build completed unsuccessfully in 0:00:32

@mystor
Copy link
Contributor Author

mystor commented Jul 2, 2021

Closing this PR for now in favour of #86822 which should provide the SameThread benefits as this patch, but is smaller. If that patch is merged, the CrossThread changes in this PR could be rebased on top of those changes.

@mystor mystor closed this Jul 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants