Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proc_macro: implement TokenTree, TokenKind, hygienic quote!, and other API #40939

Merged
merged 13 commits into from
Jul 6, 2017

Conversation

jseyfried
Copy link
Contributor

@jseyfried jseyfried commented Mar 31, 2017

All new API is gated behind #![feature(proc_macro)] and may be used with #[proc_macro], #[proc_macro_attribute], and #[proc_macro_derive] procedural macros.

More specifically, this PR adds the following in proc_macro:

// `TokenStream` constructors:
impl TokenStream { fn empty() -> TokenStream { ... } }
impl From<TokenTree> for TokenStream { ... }
impl From<TokenKind> for TokenStream { ... }
impl<T: Into<TokenStream>> FromIterator<T> for TokenStream { ... } 
macro quote($($t:tt)*) { ... } // A hygienic `TokenStream` quoter

// `TokenStream` destructuring: 
impl TokenStream { fn is_empty(&self) -> bool { ... } }
impl IntoIterator for TokenStream { type Item = TokenTree; ... }

struct TokenTree { span: Span, kind: TokenKind }
impl From<TokenKind> for TokenTree { ... }
impl Display for TokenTree { ... }

struct Span { ... } // a region of source code along with expansion/hygiene information
impl Default for Span { ... } // a span from the current procedural macro definition
impl Span { fn call_site() -> Span { ... } } // the call site of the current expansion
fn quote_span(span: Span) -> TokenStream;

enum TokenKind {
    Group(Delimiter, TokenStream), // A delimited sequence, e.g. `( ... )`
    Term(Term), // a unicode identifier, lifetime ('a), or underscore
    Op(char, Spacing), // a punctuation character (`+`, `,`, `$`, etc.).
    Literal(Literal), // a literal character (`'a'`), string (`"hello"`), or number (`2.3`)
}

enum Delimiter {
    Parenthesis, // `( ... )`
    Brace, // `[ ... ]`
    Bracket, // `{ ... }`
    None, // an implicit delimiter, e.g. `$var`, where $var is  `...`.
}

struct Term { ... } // An interned string
impl Term {
    fn intern(string: &str) -> Symbol { ... }
    fn as_str(&self) -> &str { ... }
}

enum Spacing {
    Alone, // not immediately followed by another `Op`, e.g. `+` in `+ =`.
    Joint, // immediately followed by another `Op`, e.g. `+` in `+=`
}

struct Literal { ... }
impl Display for Literal { ... }
impl Literal {
    fn integer(n: i128) -> Literal { .. } // unsuffixed integer literal
    fn float(n: f64) -> Literal { .. } // unsuffixed floating point literal
    fn u8(n: u8) -> Literal { ... } // similarly: i8, u16, i16, u32, i32, u64, i64, f32, f64
    fn string(string: &str) -> Literal { ... }
    fn character(ch: char) -> Literal { ... }
    fn byte_string(bytes: &[u8]) -> Literal { ... }
}

For details on quote! hygiene, see this example and declarative macros 2.0.

r? @nrc

@jseyfried
Copy link
Contributor Author

@eddyb
Copy link
Member

eddyb commented Mar 31, 2017

Can't we call TokenTree just Token? I'm a bit disappointed in how I fooled myself we could get rid of the "tree" model but flattening seems to not be in the cards.
However, I still think we can keep simpler names in the API.

@SimonSapin
Copy link
Contributor

Nice! I like the overall approach, some comments on the details:

impl IntoIterator for TokenStream { type Item = TokenTree; ... }

Would impls for &TokenStream and &mut TokenStream also be useful?

struct Symbol { ... } // An interned string

To me the word "symbol" evokes punctuation, excluding letters. Could this be called something else? Identifier, Ident, Word, … ? (Are the first two not appropriate because "identifier" excludes keywords?)

Joint, // immediately followed by another `Op`, e.g. `+` in `+=`

So the = in += would be a separate token? &&, .., or :: would similarly be two tokens each? (>>= or ... three tokens?) Please expand the doc-comments in the PR to explain this.

impl Display for Literal { ... }
impl Literal {

I’d also add:

  • fn byte_string(bytes: &[u8]) -> Literal, much more convenient that building a Sequence of u8 literals for a slice literal. (And more efficient?)
  • Maybe fn byte(byte: u8) -> Literal similar to Literal::u8 but with byte literal syntax, though it only makes a difference when looking at generated code for debugging.
  • fn as_str(&self) -> Option<&str> and other fn as_*(&self) -> Option<*> methods to extract the value. (Though number literals without a type suffix are tricky.) The Display impl technically provides the same information, but writing a Rust parser (even just for literals) is a complex undertaking.

In src/libproc_macro/lib.rs

#[unstable(feature = "proc_macro", issue = "38356")]
#[macro_export]
macro_rules! quote { () => {} }

In src/libproc_macro/quote.rs

macro_rules! quote {
    () => { TokenStream::empty() };
    ($($t:tt)*) => { [ $( quote_tree!($t), )* ].iter().cloned().collect::<TokenStream>() };
}

Why are there two quote macros, with one empty, and only the empty one exported?

@SimonSapin
Copy link
Contributor

I think this PR addresses all of my previous comments in #38356 (comment) 👍

@bors
Copy link
Contributor

bors commented Mar 31, 2017

☔ The latest upstream changes (presumably #40950) made this pull request unmergeable. Please resolve the merge conflicts.

@jseyfried
Copy link
Contributor Author

@eddyb

I'm a bit disappointed in how I fooled myself we could get rid of the "tree" model but flattening seems to not be in the cards.

Do you mean w.r.t. the internals? I believe we can still use the internal flattened representation you proposed with this API (imo, would be nice but not high priority).

@SimonSapin

Would impls for &TokenStream and &mut TokenStream also be useful?

impl IntoIterator for &TokenStream { type Item = TokenTree; ... } and/or
impl TokenStream { fn iter(&self) -> TokenIter { ... } } could be useful -- both would be equivalent to tokens.clone().into_iter().

impl IntoIterator for &TokenStream { type Item = &TokenTree; ... } isn't possible (yet, perhaps), and would make an internal "flattened" internal representation more complex / less efficient.

TokenStream uses shared, ref-counted memory, so I don't think impl IntoIterator for &mut TokenStream makes sense.

To me the word "symbol" evokes punctuation, excluding letters. Could this be called something else?

Yeah. Currently, Symbol is just an interned string, so Word/Ident don't seem appropriate. If we restrict the type to valid TokenKind::Words, I'd prefer Word for symmetry with TokenKind. Otherwise, perhaps Str/InternedStr? cc @nrc

So the = in += would be a separate token? &&, .., or :: would similarly be two tokens each?

Yeah, it's TokenKind::Op(char, OpKind), so a just single character per Op -- I'll mention that in the doc comment.

I’d also add [more methods for Literal]

Yeah, I agree that we should have Literal::{byte, byte_string, as_*}. This API isn't intended to be exhaustive, e.g. we also want API for emitting "first class" diagnostics (@sergio is working on this).

Why are there two quote macros, with one empty, and only the empty one exported?

The non-empty one is incomplete (by design) and only used to bootstrap the internal implementation of proc_macro::quote. The exported one is a placeholder for the internal implementation (à la lang items). After we support stability hygiene and stability-checking macro resolutions, we should be able to replace the internal implementation with a #[proc_macro].

@bors
Copy link
Contributor

bors commented Apr 5, 2017

☔ The latest upstream changes (presumably #40811) made this pull request unmergeable. Please resolve the merge conflicts.

nrc
nrc previously requested changes Apr 10, 2017
Copy link
Member

@nrc nrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a bunch of changes requested, but most are just about docs, and I don't think there is anything major. Assuming all the fixes are straightforward, then r=me

pub struct TokenStream {
inner: TokenStream_,
}
pub struct TokenStream(tokenstream::TokenStream);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not this PR (and pretty nitty), but just pointing out that it should either token_stream or Tokenstream, currently the naming is inconsistent.


/// An iterator over `TokenTree`s.
#[unstable(feature = "proc_macro", issue = "38356")]
pub struct TokenIter {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we call this a TokenTreeIter? That leaves room for a flattening TokenIter and means the name and target correspond

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nrc Yeah, TokenTreeIter makes more sense.

On a similar note, @eddyb proposed using TokenNode instead of TokenKind for better correspondence to TokenTree and to leave room for Token/TokenKind -- what do you think?

}

impl TokenTree {
fn from_raw(stream: tokenstream::TokenStream, next: &mut Option<tokenstream::TokenStream>)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like from_raw as a name - 'raw' can mean so many different things, and perhaps the most likely here is as in raw string which is wrong. How about from_internal or from_rustc_internal or something like that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change to from_internal.

TokenTree { span: Span(span), kind: kind }
}

fn to_raw(self) -> tokenstream::TokenStream {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

likewise here.

And somewhat connected, is there a way to mark the API which we expect macro authors to use? Is it all the pub API? Or is some of that transitional or only intended for the compiler to use?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's all the pub API not in __internal (which is unusable without #![feature(proc_macro_internals)]).

@@ -80,7 +524,11 @@ pub struct LexError {
/// all of the contents.
#[unstable(feature = "proc_macro_internals", issue = "27812")]
#[doc(hidden)]
#[path = ""]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this for? I'm not sure what the effect is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mod quote; in __internal; this allows the file to be libproc_macro/quote.rs. Thinking about this some more, I'll move mod quote into the crate root and remove the #[path = ""].


//! # Quasiquoter
//! This file contains the implementation internals of the quasiquoter provided by `qquote!`.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add an overview of how the quasi-quoter works please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

@@ -196,6 +201,10 @@ impl TokenStream {
}
}

pub fn builder() -> TokenStreamBuilder {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be better as a free function rather than a method on TokenStream

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or it could be a new method on TokenStreamBuilder

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change to TokenStreamBuilder::new() (motivation for TokenStream::builder() was to avoid making people use anything else into their scopes).

@@ -225,13 +234,107 @@ impl TokenStream {
}
true
}

pub fn as_tree(self) -> (TokenTree, bool /* joint? */) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these public methods get docs please? (In particular here, some more explanation of what joint means would be good).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, since this takes self by value, should this be named into_tree instead? Or are the conventions different for reference counted things

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nrc will do.
@lfairy Perhaps, I don't think it matters much.

@@ -8,47 +8,37 @@
// option. This file may not be copied, modified, or distributed
// except according to those terms.

#![feature(plugin, plugin_registrar, rustc_private)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like anyone with an existing procedural macro is going to have to make some significant changes. Could you write up either here or on internals or somewhere, a guide to the changes needed to make a proc macro work after this PR please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Existing procedural macros will lose the syntax::tokenstream::TokenStream quoter from proc_macro_internals, AFAIK other than that they won't have to make any changes. I could include a syntax::tokenstream::TokenStream quoter for back-compat.

How many people are actually using "existing procedural macros" though? This PR doesn't effect the widely used legacy plugin system; I didn't think many people were using SyntaxExtension::ProcMacro from #36154.

@@ -455,3 +461,38 @@ pub fn is_op(tok: &Token) -> bool {
_ => true,
}
}

#[derive(Clone, Eq, PartialEq, Debug)]
pub struct LazyTokenStream(RefCell<Option<TokenStream>>);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to be used by macro authors? If so could you document with why and when.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, nothing in libsyntax can be used by proc macro authors.

}

#[derive(Clone, Eq, PartialEq, Debug)]
pub struct LazyTokenStream(RefCell<Option<TokenStream>>);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an aside: if the dynamic borrow checks end up too inefficient, then this could be re-implemented on top of MoveCell.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::cell::Cell is now like MoveCell :) rust-lang/rfcs#1651

let mut opt_stream = self.0.borrow_mut();
if opt_stream.is_none() {
*opt_stream = Some(f());
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: extra semicolon

@Mark-Simulacrum Mark-Simulacrum added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Apr 14, 2017
@carols10cents
Copy link
Member

Friendly ping that there are merge conflicts and some comments for you @jseyfried! ❤️

@abonander
Copy link
Contributor

abonander commented Apr 23, 2017

Should we use a different feature gate for the unstable APIs? I'm under the impression that language feature gates and library feature gates are meant to be distinct from each other, and proc_macro currently falls under the language side.

@nikomatsakis
Copy link
Contributor

Hmm, is there a strict separation between language/lib feature-gates? I wouldn't think so.

@abonander
Copy link
Contributor

It's kind of implied by the structure of the Unstable Book but that might be more of an issue for #41476 where this came up originally.

@arielb1
Copy link
Contributor

arielb1 commented Apr 25, 2017

status: still waiting for #40847.

@arielb1
Copy link
Contributor

arielb1 commented May 2, 2017

status: still waiting for #40847.

1 similar comment
@carols10cents
Copy link
Member

status: still waiting for #40847.

@oli-obk
Copy link
Contributor

oli-obk commented Jul 9, 2017

The lack of expansion info broke clippy together with proc macros, since code inside derives will now trigger lints. Is there any new way to detect expanded code? If not, what needs to be done in order to create expansion info that doesn't influence save analysis.

See https://github.com/Manishearth/rust-clippy/issues/1877

@nrc
Copy link
Member

nrc commented Jul 13, 2017

Since we no longer have the extra expansion information, @nrc's save-analysis code tries to re-parses the call site span and expects it to match the proc-macro generated tokens.

Ah, I wondered why I was seeing so many errors in the RLS :-(

I think we must fix this. This feels like another reason not to conflate macro hygiene with expansion history. That might be too hard a way to fix this though.

If we want to fix this, we could instead mark tokens from FromStr for TokenStream specially so that save-analysis would know that they are "weird", but so that they would still behave like unexpanded tokens w.r.t. hygiene. I don't have a strong opinion here.

Not using spans for hygiene seems a much nicer fix here - the span should reflect the expansion history and the hygiene should be altered (to erase hygiene, essentially). I expect we want the correct expansion info for errors as well as save-analysis.

@oli-obk
Copy link
Contributor

oli-obk commented Jul 13, 2017

I already have a fix in #43179

@jseyfried
Copy link
Contributor Author

jseyfried commented Jul 13, 2017

@nrc
We could instead just include hygiene bending information along with expansion information so that span expansion info and hygiene info remain unified (@oli-obk's solution is fine for impl FromStr for TokenStream, so we're OK for now).

That is, I don't think think hygiene bending requires us to split up span/hygiene info or construct an "fake" expansion history just for hygiene. I think it'd be nicer to have a single source of truth for expansion info that is expressive enough for hygiene bending.

bors added a commit that referenced this pull request Jul 20, 2017
Tidy: allow common lang+lib features

This allows changes to the Rust language that have both library
and language components share one feature gate.

The feature gates need to be "about the same change", so that both
library and language components must either be both unstable, or
both stable, and share the tracking issue.

Removes the ugly "proc_macro" exception added by #40939.

Closes #43089
@nrc
Copy link
Member

nrc commented Jul 25, 2017

So #43179 did not fix the expansion info issue (at least not completely) - I'm still seeing errors in the RLS where we are treating macro-generated names as hand-written ones (and I think this is causing #43371 too).

bors added a commit that referenced this pull request Aug 1, 2017
Three small fixes for save-analysis

First commit does some naive deduplication of macro uses. We end up with lots of duplication here because of the weird way we get this data (we extract a use for every span generated by a macro use).

Second commit is basically a typo fix.

Third commit is a bit interesting, it partially reverts a change from #40939 where temporary variables in format! (and thus println!) got a span with the primary pointing at the value stored into the temporary (e.g., `x` in `println!("...", x)`). If `format!` had a definition it should point at the temporary in the macro def, but since it is built-in, that is not possible (for now), so `DUMMY_SP` is the best we can do (using the span in the callee really breaks save-analysis because it thinks `x` is a definition as well as a reference).

There aren't a test for this stuff because: the deduplication is filtered by any of the users of save-analysis, so it is purely an efficiency change. I couldn't actually find an example for the second commit that we have any machinery to test, and the third commit is tested by the RLS, so there will be a test once I update the RLS version and and uncomment the previously failing tests).

r? @jseyfried
@aidanhs aidanhs mentioned this pull request Aug 10, 2017
bors added a commit that referenced this pull request Feb 15, 2019
Remove `LazyTokenStream`.

`LazyTokenStream` was added in #40939. Perhaps it was an effective optimization then, but no longer. This PR removes it, making the code both simpler and faster.

r? @alexcrichton
Centril added a commit to Centril/rust that referenced this pull request Feb 20, 2019
…etrochenkov

Remove `LazyTokenStream`.

`LazyTokenStream` was added in rust-lang#40939. Perhaps it was an effective optimization then, but no longer. This PR removes it, making the code both simpler and faster.

r? @alexcrichton
Centril added a commit to Centril/rust that referenced this pull request Feb 23, 2019
…etrochenkov

Remove `LazyTokenStream`.

`LazyTokenStream` was added in rust-lang#40939. Perhaps it was an effective optimization then, but no longer. This PR removes it, making the code both simpler and faster.

r? @alexcrichton
Centril added a commit to Centril/rust that referenced this pull request Feb 23, 2019
…etrochenkov

Remove `LazyTokenStream`.

`LazyTokenStream` was added in rust-lang#40939. Perhaps it was an effective optimization then, but no longer. This PR removes it, making the code both simpler and faster.

r? @alexcrichton
Centril added a commit to Centril/rust that referenced this pull request Feb 23, 2019
…etrochenkov

Remove `LazyTokenStream`.

`LazyTokenStream` was added in rust-lang#40939. Perhaps it was an effective optimization then, but no longer. This PR removes it, making the code both simpler and faster.

r? @alexcrichton
Centril added a commit to Centril/rust that referenced this pull request Feb 23, 2019
…etrochenkov

Remove `LazyTokenStream`.

`LazyTokenStream` was added in rust-lang#40939. Perhaps it was an effective optimization then, but no longer. This PR removes it, making the code both simpler and faster.

r? @alexcrichton
Centril added a commit to Centril/rust that referenced this pull request Feb 23, 2019
…etrochenkov

Remove `LazyTokenStream`.

`LazyTokenStream` was added in rust-lang#40939. Perhaps it was an effective optimization then, but no longer. This PR removes it, making the code both simpler and faster.

r? @alexcrichton
BoredApe8461 added a commit to BoredApe8461/proc-macro2 that referenced this pull request Aug 12, 2024
This is verified to compile and work against the current version of
rust-lang/rust#40939.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.