Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Guard Patterns #3637

Merged
merged 18 commits into from
Sep 4, 2024
Merged

Guard Patterns #3637

merged 18 commits into from
Sep 4, 2024

Conversation

max-niederman
Copy link
Contributor

@max-niederman max-niederman commented May 15, 2024

text/3637-guard-patterns.md Outdated Show resolved Hide resolved
text/3637-guard-patterns.md Show resolved Hide resolved
@kennytm
Copy link
Member

kennytm commented May 15, 2024

the keyword if is currently in the follow-set of a $:pat:

macro_rules! check_pat {
    ($p:pat) => { 1 };
    ($p:pat if $e:expr) => { 2 };
}

fn main() {
    assert_eq!(check_pat!(Some(3) | None), 1);
    assert_eq!(check_pat!(Some(e) if e == 3), 2);
    // assert_eq!(check_pat!((Some(e) if e == 3)), 1); // <- workaround for existing editions
}

this means at minimum this RFC will require a new Edition, or require that $:pat can only match PatternNoTopGuard


The only bindings available to guard conditions are
- bindings from the scope containing the pattern match, if any; and
- bindings introduced by identifier patterns _within_ the guard pattern.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with #2294 (rust-lang/rust#51114) the guard can be an if let clause that introduces binding to the match arm:

#![feature(if_let_guard)]

fn main() {
    match Some(Some(4)) {
        Some(a) if let Some(b) = a => assert_eq!(b, 4),
        _ => unreachable!(),
    }
}

how will if-let guard interact with this RFC?

// error?
(Some(a) if let Some(b) = a.foo()) => using(b),

// error?
Some(a if let Some(b) = a.foo()) => using(b),

// error? assuming the `b` on both branches have the same type
(Some(Some(a)) if let Ok(b) = a.foo()) | (Some(Some(a)) if let Err(b) = a.bar()) => using(b),

// definitely error
(Some(a) if let Some(b) = a.foo()) | None => using(b),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I wasn't aware of this feature. If you look in the Future Possibilities section I do mention the possibility of allowing if let, but I hadn't realized it would be necessary for this RFC. I think probably all three of those patterns should be allowed, but to be cautious it might be better to restrict it to work only on the top-level for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessary since if let guards aren't stable. Most likely when both features are enabled together then if let guards would be allowed in patterns everywhere that if guards are. Implementation-wise we can also forbid combining these two features to start with if that proves too tricky.

_ => {
// the user doesn't have enough credit, return an error message
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same example using is expressions:

if plan is Plan::Regular && credit >= 100 || plan is Plan::Premium && credit >= 80 {
  // complete the transaction
} else {
  // return an error
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's not the same, the example uses user.subscription_plan() and user.credit() here rather than plan and credit. this rewrite will evaluate plan twice.

plus you don't even need is expression here as there is no binding, matches!(plan, Plan::Regular) is sufficient, and if Plan derives PartialEq you could even use plan == Plan::Regular.

@ehuss ehuss added the T-lang Relevant to the language team, which will review and decide on the RFC. label May 15, 2024
@max-niederman
Copy link
Contributor Author

the keyword if is currently in the follow-set of a $:pat:

macro_rules! check_pat {
    ($p:pat) => { 1 };
    ($p:pat if $e:expr) => { 2 };
}

fn main() {
    assert_eq!(check_pat!(Some(3) | None), 1);
    assert_eq!(check_pat!(Some(e) if e == 3), 2);
    // assert_eq!(check_pat!((Some(e) if e == 3)), 1); // <- workaround for existing editions
}

this means at minimum this RFC will require a new Edition, or require that $:pat can only match PatternNoTopGuard

True, forgot to check that. I think I'd prefer making pat fragments match only PatternNoTopGuard to gating it on a new edition, since you can always just wrap the guard in parentheses to invoke a pattern with it.

@programmerjake
Copy link
Member

True, forgot to check that. I think I'd prefer making pat fragments match only PatternNoTopGuard to gating it on a new edition, since you can always just wrap the guard in parentheses to invoke a pattern with it.

I think a better plan is to make :pat match PatternNoTopGuard on edition 2021, and then the next edition (2024 if we're not too late already, otherwise whatever edition comes after that) would have :pat match Pattern. There would also be a :pat_no_top_guard (or some shorter name) that works on all editions.

This means this feature can be stabilized without waiting for a new edition.

This is just like what happened with :pat and :pat_param
https://doc.rust-lang.org/edition-guide/rust-2021/or-patterns-macro-rules.html

@petrochenkov
Copy link
Contributor

This may be a bit vague, but running an arbitrary logic for conditions in patterns sort of goes against the idea of a "pattern".

Pattern is a "leaf" element of a condition and if you need to follow up on it using executable logic, you either

  • open a new block {}, like in if-let
  • or use some sugar to resyntax that block into a chain, ideally like expr is pat(binding) && condition(binding), or with if-let chains as a handicapped variant of that.

(This mostly applies to nonexhaustive matching, but matches involving if guards are by definition non-exhaustive and have to have fallback "else" clauses at some level.)

The subscription_plan example in particular looks like an if-else chain forced into a match, and would look better either without a match, or without if guards.
Basically, my general conclusion is that not everything should be a match, regular ifs and elses are good too.


I have encountered the specific issue of two match arms being the same but not mergeable, but not too often.
Maybe a compromise solution extending match arms but not extending patterns would be suitable (since match arms already allow executable conditions anyway).
There are other situations where it's useful, not involving if guards, for example

match foo {
  A(_want_to_name_this_binding_explicitly_for_exhaustiveness_even_if_it_is_not_used) => {
    // same logic
  }
  B /* no equivalent binding here */ => {
   // same logic
  }
}

@kennytm
Copy link
Member

kennytm commented May 16, 2024

There are other situations where it's useful, not involving if guards, for example

match foo {
  A(_want_to_name_this_binding_explicitly_for_exhaustiveness_even_if_it_is_not_used) => {
    // same logic
  }
  B /* no equivalent binding here */ => {
   // same logic
  }
}

if you need to merge the two branches you could just use a comment instead of a real name?

match foo {
    A(_ /* name this binding explicitly for exhaustiveness */) | B => {
        // logic
    }
}

@max-niederman
Copy link
Contributor Author

the keyword if is currently in the follow-set of a $:pat:

macro_rules! check_pat {
    ($p:pat) => { 1 };
    ($p:pat if $e:expr) => { 2 };
}

fn main() {
    assert_eq!(check_pat!(Some(3) | None), 1);
    assert_eq!(check_pat!(Some(e) if e == 3), 2);
    // assert_eq!(check_pat!((Some(e) if e == 3)), 1); // <- workaround for existing editions
}

this means at minimum this RFC will require a new Edition, or require that $:pat can only match PatternNoTopGuard

@kennytm @programmerjake

I've added a section explaining how we can handle this the same way it was handled for or-patterns.

> _PatternNoTopGuard_ :\
> &nbsp;&nbsp; &nbsp;&nbsp; `|`<sup>?</sup> _PatternNoTopAlt_ ( `|` _PatternNoTopAlt_ )<sup>\*</sup>

With `if let` and `while let` expressions now using `PatternNoTopGuard`. `let` statements and function parameters can continue to use `PatternNoTopAlt`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A for expression also accepts a Pattern, but the pattern is required to be irrefutable.

    for Ok(x) | Err(x) in vec![Ok(5), Err(6), Ok(7), Err(8)] {
        println!("{x}");
    }

I think this be changed to PatternNoTopGuard too?

Expression Irrefutable Current grammar This RFC
let(/else) PatternNoTopAlt PatternNoTopAlt
Function & closure parameter PatternNoTopAlt PatternNoTopAlt
match Pattern MatchArmGuard? Pattern
if/while let Pattern PatternNoTopGuard
for Pattern PatternNoTopGuard

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They could be, but wouldn't it be better to give an error saying the pattern must be refutable rather than a parse error?

@dlight
Copy link

dlight commented May 17, 2024

This may be a bit vague, but running an arbitrary logic for conditions in patterns sort of goes against the idea of a "pattern".

I think that allowing guards in nested patterns is a natural step after allowing | in nested patterns, which is done since 1.53.

Guards are already part of the match construct, and allowing it in inner patterns doesn't actually bring a new functionality, it's just syntax sugar. The compiler is presumably allowed to hoist the guards outside the patterns (if they guarantee the guard only runs once), like you would do manually today.

(Outside match, adding an inner guard means just wrapping the whole thing inside an enclosing if; again, just syntax sugar)

The only issue with guards for me is that it trips the exhaustiveness checking (the compiler doesn't know if two or more guards cover all possibilities, so you need to delete the last guard to make it happy), but that's already an issue for the current guard feature in match, and allowing inner guards don't make it any worse.

(There is an analogous issue with bare if: the compiler can't detect if two or more successive if /else if are exhaustive and you need to replace the last one with else, and this is relevant when analyzing control flow)

But anyway I think that this language feature is just syntax sugar and should be approached as such.

@clarfonthey
Copy link
Contributor

This feels like a direct counter to the proposal in #3573: instead of allowing patterns as conditions, allow conditions as patterns. And I'll be frank, for the reasons mentioned, I don't like this.

Adding a condition to a pattern essentially means that the pattern isn't covered at all, and so, it's just adding the pattern and condition together as a single condition, with the minor difference that if the pattern was covered completely in a previous branch, it'd be noticed as dead code.

So... I would much rather have this just as the ability to combine patterns in an if statement beforehand, rather than a way of making match arms more confusing.

@coolreader18
Copy link

Would this allow disjoint sets of bindings? e.g.

(A(x, y) if x == y) | B(x) => ...

To me it seems intuitive that this would be allowed (as it would now have a reason to exist, whereas before there would be no reason to bind a variable in only one branch of an or-pattern). Trying to access x would give a this variable was not bound in all branches error, or maybe allow accessing an x from an outer scope? Not sure about that.

@programmerjake
Copy link
Member

(A(x, y) if x == y) | B(x) => ...

x is bound in both branches, maybe you meant y since that's not mentioned in the B branch?

text/3637-guard-patterns.md Outdated Show resolved Hide resolved
[drawbacks]: #drawbacks

Rather than matching only by structural properties of ADTs, equality, and ranges of certain primitives, guards give patterns the power to express arbitrary restrictions on types. This necessarily makes patterns more complex both in implementation and in concept.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another drawback: patterns may now have arbitrary side-effects by evaluating expressions. Currently, no kind of pattern executes arbitrary code, and therefore in the current language version, unsafe code could rely on either using a caller-supplied $p:pat inside a macro, or using a pattern containing a pattern macro from another crate, while a safety invariant is broken but unobservable.

This is probably fine as a change because no guarantee has been written, but it rules out deciding later that patterns won't ever have side-effects.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in current language versions, unsafe code can not rely on user-supplied patterns not executing arbitrary code: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=1f3c26c1c0ae0732148ad5987321d2ce
matching against a const may call PartialEq::eq

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@programmerjake your sample code generated the indirect_structural_match lint, suggesting this is not intended to be compilable:

warning: to use a constant of type `Vec<u8>` in a pattern, `Vec<u8>` must be annotated with `#[derive(PartialEq)]`
  --> src/main.rs:17:9
   |
17 |         C => {}
   |         ^
   |
   = warning: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!
   = note: for more information, see issue #120362 <https://github.com/rust-lang/rust/issues/120362>
   = note: the traits must be derived, manual `impl`s are not sufficient
   = note: see https://doc.rust-lang.org/stable/std/marker/trait.StructuralPartialEq.html for details
   = note: `#[warn(indirect_structural_match)]` on by default

basically the manual impl StructuralPartialEq for S violated the definition of structural equality by having a PartialEq that does not compare structurally (also AFAIK StructuralPartialEq is perma-unstable).

Deref pattern rust-lang/rust#87121 is another place that user code may be called during pattern matching, but the hole is sealed by another perma-unstable DerefPure marker trait.

@joshtriplett
Copy link
Member

This seems reasonable. I've wanted this a few times myself, and it's always inconvenient to work around.

It looks like the issue of edition handling has been addressed, and the issue of if let guards is covered in "future work" (which seems sufficient since if let guards aren't stable yet).

Let's see if we have a consensus on this:

@rfcbot merge

@joshtriplett joshtriplett added the I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. label Jun 13, 2024
@rfcbot
Copy link
Collaborator

rfcbot commented Jun 13, 2024

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

cc @rust-lang/lang-advisors: FCP proposed for lang, please feel free to register concerns.
See this document for info about what commands tagged team members can give me.

@traviscross
Copy link
Contributor

@rustbot labels -I-lang-nominated

We discussed this in the lang triage meeting today. With the clarification, people were happy to check boxes, and this is now in FCP.

@rustbot rustbot removed the I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. label Aug 21, 2024
@kennytm
Copy link
Member

kennytm commented Aug 21, 2024

Not sure if it was missed or not but the RFC text / stabilization report should explain #3637 (comment) why an if pattern allowing arbitrary expressions with potential side effect is fine, but the deref! and const patterns require the restricted traits DerefPure and StructuralPartialEq respectively to prove their pureness --- what makes deref! special and if not.

@Nadrieril
Copy link
Member

For deref patterns the core reason is because an impure implementation would cause unsoundness; e.g.

// EvilBox's `deref` implementation returns `true` or `false` alternatively on each call. The following
// match calls it twice. This is guaranteed to cause UB.
match EvilBox::new(false) {
    deref!(true) => {}
    deref!(false) => {}
}

I'd guess it's same thing for constant patterns. There is no such risk of unsoundness for guard patterns.

@clarfonthey
Copy link
Contributor

I should add as the primary person against this change (from what I can see) that my main issues aren't about soundness, but rather the way I feel they're overloading patterns beyond what I feel is appropriate.

I have no doubt that this feature could be implemented well and used responsibly in the language, and just don't like it for the reasons I've described. But also, a lot of the reasons I've provided are theoretical, difficult to clarify, and ultimately just a suggestion, so, I don't feel the need to argue them any more. Folks are largely in favour of this change, so, I think it's okay to go forward.

@max-niederman
Copy link
Contributor Author

Not sure if it was missed or not but the RFC text / stabilization report should explain #3637 (comment) why an if pattern allowing arbitrary expressions with potential side effect is fine, but the deref! and const patterns require the restricted traits DerefPure and StructuralPartialEq respectively to prove their pureness --- what makes deref! special and if not.

@kennytm, I've added a section clarifying this.

Also, I've added a short paragraph on the macro use cases for mismatched bindings @Nadrieril was talking about.

@tmandry
Copy link
Member

tmandry commented Aug 23, 2024

@rfcbot reviewed

This RFC does not propose to change what bindings are allowed in disjunctions, even when those bindings are used only within guard patterns.

If you ask me we should absolutely do this, and this RFC provides ample justification for doing so. I can't think of any reason we would keep the restriction. Would you consider moving this to an "Open question" so we can consider it at stabilization time without a new RFC?

@tmandry
Copy link
Member

tmandry commented Aug 23, 2024

I just noticed the comment above about this from @traviscross:

@max-niederman: Does this RFC propose, or mean to propose, changing the rules for what bindings are allowed on either side of disjunctive patterns?

If so, a lot more detail is going to be needed in the RFC about how this would work.

I don't think much more detail is needed. The following rules should suffice:

  1. Same as today: If a name is bound in any part of a pattern, it shadows existing definitions of the name.
  2. Same as today: If a name bound by a pattern is used in the body, it must be defined in every part of a disjunction and be the same type in each.
  3. Removed: Bindings introduced in one branch of a disjunction must be introduced in all branches.
  4. Added: If a name is bound in multiple parts of a disjunction, it must be bound to the same type in every part. (Enforced today by the combination of 2 and 3. We could remove this later. Whether to remove it can be an open question decided by a simple FCP.)

If a binding is introduced in one branch but never used in an if guard, we should of course lint as part of the standard unused variables lint.

This would, for example, allow

match Some(0) {
    Some(x if x > 0) | None => {},
    _ => {}
}

but not

match Some((0, "foo")) {
    Some((x, _) if x > 0) | Some((_, x) if x.len() > 0) | None => {},
    // ERROR: ^ x must be the same type in all positions in a pattern
    _ => {}
}

and also not

let x = 0;
match Ok::<()>(()) {
    Ok(x) | Err(_) => dbg!(x),
    // ERROR: x is not bound in all parts of this "or" pattern
    // note: The x on line 1 is shadowed by this usage of x in the pattern
}

edit: Added rule 1 about shadowing; other edits for clarity; added code examples.

@tmandry tmandry added the I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. label Aug 24, 2024
@max-niederman
Copy link
Contributor Author

max-niederman commented Aug 24, 2024

@tmandry: I quite like the rules you've proposed, and since there's a lot of interest I've moved the question of whether mismatched bindings should be allowed as a part of this feature to the open questions section as you've suggested, including a summary of your proposal.

I'm not certain what the procedure is for making a substantive change like this during FCP, but I'm assuming that it'll be discussed at the next lang team meeting and we can move forward according to the consensus?

@tmandry
Copy link
Member

tmandry commented Aug 24, 2024

Thanks @max-niederman! Yes, we'll discuss it in the triage meeting next week. Since it's an open question I think it's unlikely to get objections.

That said, if everyone likes the rules including you, we can discuss including them normatively instead of leaving them as an open question. If anyone speaks up against that we'll just leave it as an open question.

@kennytm
Copy link
Member

kennytm commented Aug 24, 2024

Off topic for this RFC but the changes outlined in #3637 (comment) (esp. removing E0408 and regarding shadowing) is also how the binding rules of #3573 is expressions involving || should work.

@traviscross traviscross removed the I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. label Aug 28, 2024
@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this RFC. and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Aug 31, 2024
@rfcbot
Copy link
Collaborator

rfcbot commented Aug 31, 2024

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@traviscross traviscross added the I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. label Sep 1, 2024
@joshtriplett
Copy link
Member

@tmandry Based on discussion in today's @rust-lang/lang meeting: could you please propose that as a tiny PR atop this RFC (once merged), and propose that PR for FCP right away?

@traviscross traviscross merged commit 7b7e8b9 into rust-lang:master Sep 4, 2024
@traviscross
Copy link
Contributor

The lang team has now accepted this RFC, and we've now merged it.

Thanks to @max-niederman for pushing forward this important work, and thanks to all those who reviewed the RFC and provided useful feedback.

For further updates, please follow the tracking issue:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this RFC. I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. T-lang Relevant to the language team, which will review and decide on the RFC. to-announce
Projects
None yet
Development

Successfully merging this pull request may close these issues.