RFC: Declarative macro metavariable expressions #3086
Repetition counting can be done efficiently with recursion like so: playground. Even with 100 000 elements, it compiles in ~5s on my 7+ year old laptop. Maybe something like this should be added to
macro_rules! count {
() => { 0 };
($odd:tt $($a:tt $b:tt)*) => {
(count!($($a)*) << 1) | 1
};
($($a:tt $even:tt)*) => {
(count!($($a)*) << 1)
};
}
We could make this faster with more splitting cases. For example, instead of just two branches, you could use 4 branches, and that gets the time down to 1.5s for 100 000 elements on my machine:
macro_rules! count {
() => { 0 };
($($a:tt $b:tt $c:tt $d:tt)*) => {
count!($($a)*) << 2
};
($odd:tt $($a:tt $b:tt $c:tt $d:tt)*) => {
(count!($($a)*) << 2) | 1
};
($odd_1:tt $odd_2:tt $($a:tt $b:tt $c:tt $d:tt)*) => {
(count!($($a)*) << 2) | 2
};
($odd_1:tt $odd_2:tt $odd_3:tt $($a:tt $b:tt $c:tt $d:tt)*) => {
(count!($($a)*) << 2) | 3
};
} |
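As a quick sanity check of the two-branch version above, the sketch below evaluates it for a few small inputs. The `0usize` base case and the `FIVE` constant are assumptions added for illustration; the macro as posted expands to an unsuffixed `0`.

```rust
// Two-branch bit-splitting counter, as posted above, with a typed base case
// (the `usize` suffix is an assumption made for this sketch).
macro_rules! count {
    () => { 0usize };
    // Odd number of tokens: peel one off, halve the rest, set the low bit.
    ($odd:tt $($a:tt $b:tt)*) => {
        (count!($($a)*) << 1) | 1
    };
    // Even number of tokens: halve and shift.
    ($($a:tt $even:tt)*) => {
        count!($($a)*) << 1
    };
}

// The expansion is a constant expression, so it can initialise a `const`.
const FIVE: usize = count!(a b c d e);

fn main() {
    assert_eq!(count!(), 0);
    assert_eq!(count!(a), 1);
    assert_eq!(count!(a b), 2);
    assert_eq!(count!(a b c d e f g), 7);
    println!("FIVE = {}", FIVE);
}
```

Each recursive call sees half as many tokens as the previous one, so the recursion depth grows with the number of bits in the length rather than with the length itself.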
But then every single macro writer also needs to know how to do that, and it does not handle the other cases that this RFC will initially support (index, length, ignore). |
Not if it's in
macro_rules! ignore {
($($a:tt)*) => ();
} This just leaves |
That macro is very clever! It doesn't match anything I found when I was trying to find out how to do this, and it's a lot better in terms of performance. Since it's counting the bits in the length, its recursion depth is only logarithmic in the length.

In an earlier draft of an RFC that targeted counts specifically (rendered), I benchmarked the ones I found in the wild. The common recursive cases caused compiler stack overflows and took tens of seconds for large counts. The more efficient implementations (taking the length of a slice) still took 6 or more seconds for 10,000 items. My prototype implementation was considerably faster (it's effectively instant to get the count or index, since it was just accessing internal compiler state), which is what motivated working on this RFC.

Interestingly, I just repeated my benchmark with the slice-length approach and it's also much improved, so kudos to everyone working on compiler performance. New benchmarks (time to compile a simple program that counts a repetition with 10024 items, average of 100 runs):
|
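For readers unfamiliar with the "length of a slice" technique mentioned in the comment above, here is a minimal sketch of one common way to write it. The macro names `replace_with_unit` and `count_by_slice` are hypothetical, and this is not necessarily the exact implementation that was benchmarked.

```rust
// Replace any single token tree with the unit value `()`.
macro_rules! replace_with_unit {
    ($item:tt) => { () };
}

// Build an array of `()` with one element per token tree, then ask the
// slice for its length. No recursion is involved.
macro_rules! count_by_slice {
    ($($item:tt)*) => {
        <[()]>::len(&[$(replace_with_unit!($item)),*])
    };
}

fn main() {
    assert_eq!(count_by_slice!(), 0);
    assert_eq!(count_by_slice!(a b c), 3);
    // Delimited groups count as a single token tree each.
    assert_eq!(count_by_slice!((a b) [c] {d}), 3);
    println!("ok");
}
```

The compiler still has to expand the repetition and parse one array element per item, which is the redundant work the RFC aims to avoid.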
More benchmarks, increasing the length to 100,251 items:
|
Even more benchmarks. Since most users don't count a single macro with tens of thousands of items, I figured a more reasonable benchmark would be large numbers of invocations of macros, each counting a modest number of items, simulating a large codebase with lots of uses of a macro that needs to count things. For example, the codebase I work in has hundreds of invocations of I made a new test case that counted 1,000 instances of a repetition with 101 items. The results were a bit surprising:
You can see a cut down version counting 20 x 500 in the playground. This takes around 8s to compile, although that varied quite a bit. If you add more count invocations then it typically hits the playground timeout. |
Oh wow! I wasn't expecting that! In that case, it looks like the metavariable approach does carry its weight better than I expected. |
@markbt |
However, I don't want to frame this RFC as being just about the performance of counting. These alternatives don't help so well with the |
current repetition, and the `#` character had been used in similar proposals
for that. There was some reservation expressed for the use of the `#` token
because of the cognitive burden of another sigil, and its common use in the
`quote!` macro.
IIRC, I already asked this on Zulip, but could you document what syntactic space is available for this feature (and similar features) in general?
The main issue with choosing a syntax here is that pretty much any syntax is valid on the right hand side of the macro because it represents macro output, which can be an arbitrary token stream.
So, we are lucky that the combination `$ {` turned out to be reserved.
Are there any other reserved combinations that can be used for new macro features?
I believe `$[ ... ]` was also available.
As @Lokathor says, this is mentioned briefly in the "Future possibilities" section at the bottom, but I will expand on it.
Currently the `$` symbol can only be followed by an identifier or `(`, so anything else can be used in a language extension. This RFC specifies `${ .. }` and `$$`, but `$[ .. ]` remains invalid, as does `$` followed by any other symbol (so `$@`, `$:`, `$!` or similar could be used).
Additionally, metavariable expressions are intended to be extensible themselves. This RFC defines `count`, `index`, `length` and `ignore`, but future RFCs can add additional expressions of the form `${foo(fooargs...)}`. Anything that fits within this pattern and can be evaluated by the macro transcriber would be a suitable candidate for another expression.
I like this, and I like that it carves out an extensible space for future improvements to macro syntax.
I worry slightly that `${...}` and `$(...)` might look too similar. That might be an artifact of my font.
In particular, it's going to be tricky to give good diagnostics when a user writes `$(...)` when they meant to write `${...}`, and vice versa, especially if their macro body happens to refer to names like `count` or `length`.
But I don't have great counter-suggestions; `$[...]` might be just as bad (though I do think it is easier to distinguish from `$(...)`). The only other counter-suggestion I can think of is `${{...}}`, but that might be a bridge too far.
I see your worry. I also have some reservations about the `)}` cluster in `${count(ident)}`.
We don't have to use a delimited block. Since currently anything other than `ident` or `(..)` is invalid after `$`, we could use some other sigil. Some examples:
* `$:count(ident)`, e.g. `let v = Vec::with_capacity($:count(values))`
* `$@count(ident)`, e.g. `let v = Vec::with_capacity($@count(values))`
* `$!count(ident)`, e.g. `let v = Vec::with_capacity($!count(values))`

Using the last one as an example, this would be parsed as: `$` `!` `<metavar-expr-function>` `(` `<metavar-expr-args...>` `)`
Other suggestions also welcome.
This looks great to me. |
I was a little surprised that RFC PR #88 was not referenced in the text nor in the comments so far on this PR. |
Based on discussion in today's @rust-lang/lang meeting, I'm proposing to merge this, to gauge consensus via rfcbot (rather than, as we sometimes do, waiting until we already know what the rfcbot outcome will be before proposing it). People are still welcome to take further time to review. @rfcbot merge |
Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members: No concerns currently listed. Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
Somehow I missed that PR when I was doing my research, so thanks for the pointer. It makes a good point that these index numbers can be used as tuple indexes (something that isn't possible with computed values). For example, it becomes possible to write a macro like this:
Which outputs:
Without this you'd need some kind of destructuring of the tuple, which is hard as you need names for the fields. |
Since metavariable expressions always apply during the expansion of the macro,
they cannot be used in recursive macro definitions. To allow recursive macro
definitions to use metavariable expressions, the `$$` expression expands to a
single `$` token.
Are there any places where this would want to go deeper? Would it be helpful to have `$$$` that expands to `$$`, instead of needing `$$$$`? (Does `$$$$` work with this RFC, actually? Is exponential escaping bad, or fine because "just don't go that deep"?)
Going deeper is only necessary if macro definitions are multiple-times recursive (a macro that defines a macro that defines a macro), and you want to defer metavariable expansions or repetitions to the inner macros in ways that are otherwise ambiguous. The doubling up of the escape characters for each level is necessary so that at each nesting level you can represent a meta-variable whose name is stored in another meta-variable. An even number n of `$` followed by `var` (e.g. `$$$$var`) expands to n/2 `$`s followed by a literal `var` (e.g. `$$var`). An odd number n of `$` expands to (n-1)/2 `$`s followed by the expansion of `$var` (e.g. if `$var` == `foo` then `$$$$$var` expands to `$$foo`).
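The even/odd rule above can be modelled as plain string arithmetic. The sketch below is only a model of the rule as stated in this comment, not compiler code; the function name `expand_escaped` and its parameters are hypothetical.

```rust
/// Models one level of expansion for `n` dollar signs followed by `var`,
/// where `binding` is what `$var` is bound to at this level.
/// Even n: n/2 dollars followed by the literal name `var`.
/// Odd n: (n-1)/2 dollars followed by the expansion of `$var`.
fn expand_escaped(n: usize, var: &str, binding: &str) -> String {
    assert!(n >= 1, "at least one `$` is required");
    if n % 2 == 0 {
        format!("{}{}", "$".repeat(n / 2), var)
    } else {
        format!("{}{}", "$".repeat((n - 1) / 2), binding)
    }
}

fn main() {
    // The two examples from the comment above.
    assert_eq!(expand_escaped(4, "var", "foo"), "$$var");
    assert_eq!(expand_escaped(5, "var", "foo"), "$$foo");
    // The ordinary cases: `$var` and `$$var`.
    assert_eq!(expand_escaped(1, "var", "foo"), "foo");
    assert_eq!(expand_escaped(2, "var", "foo"), "$var");
    println!("ok");
}
```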
An example of where this would be necessary in existing code is here. This code is currently using `$dol` as a hack for what `$$` would provide, and `$dol $arg` would become `$$$arg`.
This is the same as for `\`-escaping in strings, and most other kinds of escaping in other languages, so it should be familiar to users.
Although sixteen `$` in a row wouldn't be great, quadruply-recursive macro definitions are probably not a great idea either, and it should be possible to break the macro down into separate parts with less nesting if that does become a concern.
The author believes it is worth the overhead of new syntax, as even though
there exist workarounds for obtaining the information if it's really needed,
these workarounds are sometimes difficult to discover and naive
implementations can significantly harm compiler performance.
I agree that the workarounds for these are non-obvious, and that it's worth giving them well-known names. But I think the jump to new syntax could be better-motivated in the RFC.
For example, why not `std::macro_utils::count!(...)` instead of `${count(...)}`? If it can be written as a macro as @RustyYato showed, that would then leave it up to an implementation to choose whether to add special compiler code to optimize it or just decide that the binary matching trick is good enough.
(I suspect it won't be too hard to convince me that syntax is worth it, but I'd still like to see it addressed in the RFC text.)
The workaround macros work by expanding to `count!($( $value )*)`: i.e. the compiler must generate a sequence by expanding the repetition, re-parsing it as part of the `count!` macro invocation, and then computing the length. This is the redundant additional work that this RFC seeks to address.
The reason for new syntax is that these expansions occur during macro transcription, rather than as their own macro expansions. `${count(ident)}` would be transcribed directly to the literal count, whereas `count!(ident)` in a macro body would be transcribed to `count!(ident)` (there is no change as the transcriber has nothing to do - it doesn't peek inside macro invocations), at which point the information about what `ident` means is lost and the `count!` macro has no knowledge about what it is counting or what context it is counting it in.
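The workaround pattern described above can be sketched on stable Rust as follows. The macro names `count` and `vec_with_exact_capacity` are hypothetical, and the simple linear `count!` is used only to keep the example short.

```rust
// A simple (linear-recursion) counting macro, used here only for illustration.
macro_rules! count {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count!($($tail)*) };
}

// The outer macro has to re-expand the whole repetition into the `count!`
// invocation: the transcriber emits `count!(<value> <value> ...)`, and that
// token sequence is then parsed and matched again by `count!`.
macro_rules! vec_with_exact_capacity {
    ($($value:expr),+) => {{
        let mut v = Vec::with_capacity(count!($($value)*));
        $( v.push($value); )*
        v
    }};
}

fn main() {
    let v = vec_with_exact_capacity![10, 20, 30];
    assert_eq!(v, vec![10, 20, 30]);
    assert!(v.capacity() >= 3);
    println!("{:?}", v);
}
```

Writing `count!(value)` without the surrounding `$( ... )*` would count the single token `value`, which is the loss of context the comment describes.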
Another way to think of metavariable expressions is as "macro transcriber directives". You can then think of the macro transcriber as performing the following:
* `$var` => the value of `var`
* `$( ... )*` => a repetition
* `${ directive }` => a special transcriber directive
* `$$` => `$`

Perhaps describing it like this makes it a bit clearer that these are special things for the transcriber to do (not necessarily limited to counts and indexes, but that is what this RFC focuses on).
We could special-case these macro invocations during transcription, but that feels like a worse solution. It would make it harder to understand what the macro transcriber is going to do with arbitrary code if you don't remember all of the special macros that don't work like other macros.
(Conversely, I think there might be existing special macros that might have been better written as metavariable expressions if they had already existed. While I haven't thought it through fully, `file!()`, `line!()` and `column!()` spring to mind as candidates).
```
${count(x, 1)} ${count(x, 2)} ${count(x, 3)} $( a $( b $( $x )* )* )*
```

The three values this expands to are the number of outer-most repetitions (the
This might be my ignorance of macros speaking, but does the RFC need to specify which kind of fragment they produce? Are they allowed to expand to `(0+4)` or `1+1+1+1` or `2*2` or `4_usize`, or only to exactly `4`? Is there any way I can start to depend on that expansion, like writing a macro that checks a length by only accepting a particular token?
It doesn't, but it would be a good change to add it. It should expand to a literal with the appropriate value and no suffix (i.e. only and exactly `4`). This allows consistent use in things like `stringify!` and tuple indexing. Type inference should be able to infer the correct type when it is used in code (and also produce an error if the value that is produced won't fit inside the target type).
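A small stable-Rust sketch of why an unsuffixed literal is the convenient choice: the same token can serve as a tuple index, be stringified, and take whatever integer type the context requires. The helper macros `field`, `literal_text` and `four` are hypothetical stand-ins, since `${count(..)}` itself is not available on stable.

```rust
// Uses a literal token as a tuple index; a suffixed literal such as `1usize`
// is not accepted in this position.
macro_rules! field {
    ($tuple:expr, $idx:tt) => { $tuple.$idx };
}

// Turns the literal token into text.
macro_rules! literal_text {
    ($lit:tt) => { stringify!($lit) };
}

// Stand-in for an expression that transcribes to the unsuffixed literal `4`.
macro_rules! four {
    () => { 4 };
}

fn main() {
    let pair = ("zero", "one");
    assert_eq!(field!(pair, 1), "one");
    assert_eq!(literal_text!(4), "4");

    // The unsuffixed literal takes its type from the context.
    let small: u8 = four!();
    let large: u64 = four!();
    assert_eq!(u64::from(small), large);
    println!("ok");
}
```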
Thanks for the review @scottmcm. I will add some text to the RFC to expand on these points (although probably not until the weekend). |
On Sun, Mar 07, 2021 at 07:48:51PM -0800, scottmcm wrote:
This might be my ignorance of macros speaking, but does the RFC need to specify which kind of fragment they produce? Are they allowed to expand to `(0+4)` or `1+1+1+1` or `2*2` or `4_usize`, or only to exactly `4`? Is there any way I can start to depend on that expansion, like writing a macro that checks a length by only accepting a particular token?
I think it'd be a good idea to clarify that it expands to a single
integer literal, yes.
|
@rfcbot reviewed |
On Sat, Mar 13, 2021 at 02:58:12AM -0800, Mark Juggurnauth-Thomas wrote:
@markbt commented on this pull request.
> +[rationale-and-alternatives]: #rationale-and-alternatives
+
+This RFC proposes a modest but powerful extension to macro syntax that makes
+it possible to obtain information that the compiler already knows, but
+requires inefficient and complex techniques to obtain in the macro.
+
+The original proposal was for a shorter syntax to provide the count of
+repetitions: `$#ident`. During discussions of this syntax, it became clear
+that it was not obvious as to which number this referred to: the count of
+repetitions at this level, or the length of the current repetition. It also
+does not provide a way to discover counts or lengths for other repetition
+depths. There was also interest in being able to discover the index of the
+current repetition, and the `#` character had been used in similar proposals
+for that. There was some reservation expressed for the use of the `#` token
+because of the cognitive burden of another sigil, and its common use in the
+`quote!` macro.
I see your worry. I also have some reservations about the `)}` cluster in `${count(ident)}`.
We don't have to use a delimited block.
We don't, but I like that this proposal does so, because that then
reserves all possible syntax within the delimited block for future
extensions. And it feels less line-noise-y than other alternatives, as
well as more distinct from `$ident`.
|
I've kept it as |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
cc @rust-lang/libs This isn't a libs RFC, but in discussions in the language team meeting, we felt that since this was introducing a new family of built-in names (similar to built-in macros), it'd be appropriate to make sure libs was aware and didn't have any concerns. |
Dropping nomination - looks like this is on track and doesn't need T-lang discussion right now. |
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. The RFC will be merged soon. |
Huzzah! The @rust-lang/lang team has decided to accept this RFC. If you'd like to follow along with its development, please subscribe to the tracking issue rust-lang/rust#83527. |
…lett Stabilize `$$` in Rust 1.63.0

# Stabilization proposal

This PR proposes the stabilization of a subset of `#![feature(macro_metavar_expr)]` or more specifically, the stabilization of dollar-dollar (`$$`).

Tracking issue: rust-lang#83527
Version: 1.63 (2022-06-28 => beta, 2022-08-11 => stable).

## What is stabilized

```rust
macro_rules! foo {
    () => {
        macro_rules! bar {
            ( $$( $$any:tt )* ) => { $$( $$any )* };
        }
    };
}

fn main() {
    foo!();
}
```

## Motivation

For more examples, see the [RFC](https://github.com/markbt/rfcs/blob/macro_metavar_expr/text/0000-macro-metavar-expr.md).

Users must currently resort to a tricky and not so well-known hack to declare nested macros with repetitions.

```rust
macro_rules! foo {
    ($dollar:tt) => {
        macro_rules! bar {
            ( $dollar ( $any:tt )* ) => { $dollar ( $any )* };
        }
    };
}

fn main() {
    foo!($);
}
```

As seen above, such hack is fragile and makes work with declarative macros much more unpleasant. Dollar-dollar (`$$`), on the other hand, makes nested macros more intuitive.

## What isn't stabilized

`count`, `ignore`, `index` and `length` are not being stabilized due to the lack of consensus.

## History

* 2021-02-22, [RFC: Declarative macro metavariable expressions](rust-lang/rfcs#3086)
* 2021-03-26, [Tracking Issue for RFC 3086: macro metavariable expressions](rust-lang#83527)
* 2022-02-01, [Implement macro meta-variable expressions](rust-lang#93545)
* 2022-02-25, [[1/2] Implement macro meta-variable expressions](rust-lang#94368)
* 2022-03-11, [[2/2] Implement macro meta-variable expressions](rust-lang#94833)
* 2022-03-12, [Fix remaining meta-variable expression TODOs](rust-lang#94884)
* 2019-03-21, [[macro-metavar-expr] Fix generated tokens hygiene](rust-lang#95188)
* 2022-04-07, [Kickstart the inner usage of macro_metavar_expr](rust-lang#95761)
* 2022-04-07, [[macro_metavar_expr] Add tests to ensure the feature requirement](rust-lang#95764)

## Non-stabilized expressions

rust-lang#83527 lists several concerns about some characteristics of `count`, `index` and `length` that effectively make their stabilization unfeasible. `$$` and `ignore`, however, are not part of any discussion and thus are suitable for stabilization.

It is not in the scope of this PR to detail each concern or suggest any possible converging solution. Such thing should be restrained in this tracking issue.

## Tests

This list is a subset of https://github.com/rust-lang/rust/tree/master/src/test/ui/macros/rfc-3086-metavar-expr

* [Ensures that nested macros have correct behavior](https://github.com/rust-lang/rust/blob/master/src/test/ui/macros/rfc-3086-metavar-expr/dollar-dollar-has-correct-behavior.rs)
* [Compares produced tokens to assert expected outputs](https://github.com/rust-lang/rust/blob/master/src/test/ui/macros/rfc-3086-metavar-expr/feature-gate-macro_metavar_expr.rs)
* [Checks the declarations of the feature](https://github.com/rust-lang/rust/blob/master/src/test/ui/macros/rfc-3086-metavar-expr/required-feature.rs)
* [Verifies all possible errors that can occur due to incorrect user input](https://github.com/rust-lang/rust/blob/master/src/test/ui/macros/rfc-3086-metavar-expr/syntax-errors.rs)

## Possible future work

Once consensus is achieved, other nightly expressions can be stabilized.

Thanks `@markbt` for creating the RFC and thanks to `@petrochenkov` and `@mark-i-m` for reviewing the implementations.
Implement `${count()}` metavariable expression

This PR implements `${count()}` metavariable expression for MBE as described in [RFC 3086][rfc]. See [this section][sec-count] of the RFC for its semantics.

Additionally, this PR changes the type of depth parameter of `${index()}` from u32 to usize so as to match how [rustc parses it][index-usize].

Part of #11952
Fixes #14871

[rfc]: rust-lang/rfcs#3086
[sec-count]: https://github.com/rust-lang/rfcs/blob/master/text/3086-macro-metavar-expr.md#count
[index-usize]: https://github.com/rust-lang/rust/blob/ddad0576caf8d0515ed453e04b468977c7d3dfc1/compiler/rustc_expand/src/mbe/metavar_expr.rs#L22
refactor: higher-order http method macro

we want to switch to rust nightly to be able to make use of rust metavariable expansions as defined by RFC rust-lang/rfcs#3086 and as tracked by rust-lang/rust#83527. other references include rust-lang/rust#99035.

this feature was stabilized in 1.63, then unstabilized again in rust-lang/rust#99435 and is now only available in rust nightly, awaiting restabilization. however, the feature is stable enough for our use case, which is why i'm going ahead and enabling it.
Add new syntax to declarative macros to give their authors easy access to additional metadata about macro metavariables, such as the index, length, or count of macro repetitions.
This RFC has been drafted as part of the lang-team declarative macro repetition counts project.
Rendered