Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More generic impl of Replacer for closures #1048

Closed
wants to merge 13 commits into from

Conversation

JanBeh
Copy link

@JanBeh JanBeh commented Jul 20, 2023

Implement Replacer also for closures that return a type which depends on lifetime of the argument. This allows performing a no-op replacement where the closure returns the whole match, for example, without needing to clone the captured string.

See Conditional regex replacement on URLO for the genesis of this approach.

JanBeh added 2 commits July 20, 2023 10:50
Implement `Replacer` also for closures that return a type which depends
on lifetime of the argument. This allows performing a no-op replacement
where the closure returns the whole match, for example, without needing
to clone the captured string.
Fixed code formatting using rustfmt.
Less generic implementation, which results in a better documented API.
@BurntSushi
Copy link
Member

This is interesting. Can you explain how it works? Also, adding a doc test to confirm it fixes what you think it does would be helpful. (Although I'm unlikely to merge this, see below.)

One part I'm concerned about here is &'a Captures<'a>. To me that means 'a will be the shorter of the lifetime of &Captures and the haystack, and it seems like that could lead to borrow errors.

I'm not too keen on this particular solution personally indeed because of the extra public trait here. It seems like a small thing, but it makes something that is already a little difficult to understand even more difficult.

@BurntSushi
Copy link
Member

Oh, I see from the urlo post that this only helps cases that enable the closure_lifetime_binder feature on nightly? I think that would also be insufficient motivation as well for adding this.

Do not use the same lifetime for reference and type parameter of
`Captures`.
@JanBeh
Copy link
Author

JanBeh commented Jul 20, 2023

One part I'm concerned about here is &'a Captures<'a>. To me that means 'a will be the shorter of the lifetime of &Captures and the haystack, and it seems like that could lead to borrow errors.

I think this should not be an issue due to subtyping, but I fixed it anyway, see adf66a8.

That last change requires a slightly different usage now:

#![feature(closure_lifetime_binder)]

use regex::{Captures, Regex};

fn main() {
    let s1 = "Hello World!";
    let s2 = Regex::new("World").unwrap().replace_all(
        s1,
        for<'a, 'b> |caps: &'a Captures<'b>| -> &'a str {
            &caps[0] // don't replace
        }
    );
    assert_eq!(s2, "Hello World!");
}

Can you explain how it works? Also, adding a doc test to confirm it fixes what you think it does would be helpful.

I will try to do both later.

I'm not too keen on this particular solution personally indeed because of the extra public trait here. It seems like a small thing, but it makes something that is already a little difficult to understand even more difficult.

Targeting a potential new regex 2.0.0, the question is whether an additional lifetime parameter (as proposed in #776 and #777) would be better. I feel like the extra trait might be the better trade-off (besides not breaking backward compatibility).

However, I agree that the helper trait is a downside and makes the API appear more complex.

I see from the urlo post that this only helps cases that enable the closure_lifetime_binder feature on nightly? I think that would also be insufficient motivation as well for adding this.

In either case, it would probably be wise to wait until/if closure_lifetime_binder stabilizes.

@BurntSushi
Copy link
Member

Gotya. And yeah I would be happy to re-litigate this in more detail if and when a regex 2.0 ever happens.

JanBeh added 3 commits July 20, 2023 20:53
Hide `ReplacerClosure` trait as an implementation detail and describe
for which closures the `Replacer` trait is implemented in the
documentation instead.

Added documentation tests.
Fixed error in documentation comment in private module.
Ignore documentation test that needs unstable Rust.
@JanBeh
Copy link
Author

JanBeh commented Jul 20, 2023

I'm not too keen on this particular solution personally indeed because of the extra public trait here.

That issue is addressed now by hiding the trait in a private module, see a40d8f5. The added documentation of Replacer explains for which closures the Replacer trait is implemented. I also added corresponding documentation tests.

[…] I would be happy to re-litigate this in more detail if and when a regex 2.0 ever happens.

I think this should rather depend on the closure_lifetime_binder feature instead of a new regex API. As I said, this proposal shouldn't break the regex API.

Note, however, that even using stable Rust, there is a use case. The relaxed bound can be used for functions (instead of closures) that return Cow, for example (see added documentation test):

let re = Regex::new(r"[0-9]+").unwrap();
fn prepend_odd<'a>(caps: &'a Captures<'_>) -> Cow<'a, str> {
    if caps[0].len() % 2 == 1 {
        Cow::Owned(format!("0{}", &caps[0]))
    } else {
        Cow::Borrowed(&caps[0])
    }
}
let result = re.replace_all("1234,12345", prepend_odd);
assert_eq!(result, "1234,012345");

Do not use closure lifetime binder but helper function for coercion in
documentation test.
@JanBeh
Copy link
Author

JanBeh commented Jul 20, 2023

It's also possible to use the relaxed bound for closures on stable Rust as pointed out in this URLO post. I updated the documentation test, see 43222f2.

Added documentation comment on `Replacer` impl for closures.
@JanBeh
Copy link
Author

JanBeh commented Jul 20, 2023

Can you explain how it works?

Basically the idea is to work around the problem that when specifying a bound for a closure, e.g. F: FnMut(&Captures<'_>) you have to specify a return type due to syntactic reasons (as otherwise the closure is expected to return ()). So, for example, you write F: FnMut(&Captures<'_>) -> T. Type T has to be declared outside of the bound, which means it can't depend on the anonymous lifetime of the closure's argument.

The idea of this PR is to use a helper trait which allows to add a bound that doesn't specifies the return type T of the closure. The return type is accessible through an associated type (this one). That helper trait is parameterized with a lifetime 'a that's the lifetime of the &'a Captures<'_> reference. Thus, the return type may also depend on that lifetime. In order for a closure to implement Replacer it must implement that helper trait for all 'a.

In the most recent draft, I have hidden the helper trait as an implementation detail in a private module. With 2c8c8dd, a documentation comment explains what the (hidden) complex implementation actually means. Hiding the helper trait should give some leverage when future language features are added to Rust, which might render the helper trait superfluous.

JanBeh added 2 commits July 21, 2023 08:15
Removed unnecessary import of `Replacer` in doc test.
Use same lifetime for reference and type parameter of `Captures` (again)
because `Captures<'a>` is covariant over `'a`. This covers closures that
accept a `&'a Captures<'b>` as argument and have a result type that
depends either on `'b` or `'a`. Documentation was updated and
corresponding test cases have been added to `tests/misc.rs`.

A link to the blanket implementation has been added to the
"Implementation by closures" section of the documentation on `Replacer`.
@JanBeh
Copy link
Author

JanBeh commented Jul 21, 2023

One part I'm concerned about here is &'a Captures<'a>. To me that means 'a will be the shorter of the lifetime of &Captures and the haystack, and it seems like that could lead to borrow errors.

I think this should not be an issue due to subtyping, but I fixed it anyway, see adf66a8.

That last change requires a slightly different usage now: […]

I was undoing that change again in 664a0f2 because it is sufficient for a closure to accept &'a Captures<'a> for all 'a. The caller (i.e. the implementation of Replacer for closures) can always coerce any &'a Captures<'b> to &'a Captures<'a> as Captures<'b> is covariant over 'b.

This also allows a closure to return a type that depends either on 'a or 'b if it is more generic and has two distinct lifetimes in its argument. Test cases have been added to tests/misc.rs for that matter.

@JanBeh
Copy link
Author

JanBeh commented Jul 21, 2023

With the recent changes, I would like to know if this has a chance to land (before regex 2.0.0).

I tried to address all concerns. Specifically:

  • The helper trait is a private implementation detail now, which is not exposed to the API.
  • Documentation for the blanket implementation describes for which closures Replacer is implemented (even though the implementation bound itself is a hidden detail).
  • The changes should be backward compatible; all tests are passing.
  • As pointed out above, it's possible to use the generalized implementations of Replacer also on stable Rust. The documentation comment and test shows how this works. Using the unstable closure_lifetime_binder feature is optional and could improve ergonomics in the future (but isn't a blocker).

If there is further interest in this, I could reproduce the changes also for the bytes module of the crate (which is basically a copy that's just using [u8] instead of str, right?).

@zjp-CN
Copy link

zjp-CN commented Jul 21, 2023

as Captures<'b> is covariant over 'b

What if Captures<'b> becomes incovariant over 'b? You'll have another breaking change then.

Beside, it'd be better to keep the lifetime contract right instead of weirdly working.
When you first see &Captures<'_>, '_ and lifetime on & are already distinct. To make them the same is overdone IMO.

@JanBeh
Copy link
Author

JanBeh commented Jul 21, 2023

What if Captures<'b> becomes incovariant over 'b?

That would already be a breaking change, so I don' think that's a problem.

Beside, it'd be better to keep the lifetime contract right instead of weirdly working.

I'm not sure what's "right" here. Consider the already existing impl<'a> Replacer for &'a Cow<'a, str>, which also uses a single lifetime 'a instead of two lifetimes 'a, 'b: 'a.

Not sure what's the correct approach. I tried to use two different lifetimes, but then the helper trait ReplacerClosure should also be parameterized over both lifetimes (such that the output type may depend on either lifetime).

I tried this:

-    pub trait ReplacerClosure<'a>
+    pub trait ReplacerClosure<'a, 'b>
     where
-        Self: FnMut(&'a Captures<'a>) -> <Self as ReplacerClosure<'a>>::Output,
+        'b: 'a,
+        Self: FnMut(&'a Captures<'b>) -> <Self as ReplacerClosure<'a, 'b>>::Output,
     {
-        /// Return type of the closure (may depend on lifetime `'a`).
+        /// Return type of the closure (may depend on lifetime `'a` or `'b`).
         type Output: AsRef<str>;
     }
-    impl<'a, F: ?Sized, O> ReplacerClosure<'a> for F
+    impl<'a, 'b, F: ?Sized, O> ReplacerClosure<'a, 'b> for F
     where
-        F: FnMut(&'a Captures<'a>) -> O,
+        'b: 'a,
+        F: FnMut(&'a Captures<'b>) -> O,
         O: AsRef<str>,
     {

But then I also have to do that:

-impl<F: for<'a> ReplacerClosure<'a>> Replacer for F {
+impl<F: for<'a, 'b: 'a> ReplacerClosure<'a, 'b>> Replacer for F {

Which isn't valid syntax (I also asked on URLO about this):

error: lifetime bounds cannot be used in this context
    --> src/regex/string.rs:2591:21
     |
2591 | impl<F: for<'a, 'b: 'a> ReplacerClosure<'a, 'b>> Replacer for F {
     |                     ^^

If I instead use:

impl<F: for<'a, 'b> ReplacerClosure<'a, 'b>> Replacer for F {

then the compiler won't be able to tell that F only needs to implement ReplacerClosure<'a, 'b> for 'b: 'a; and then several tests fail, e.g.:

error: implementation of `regex::regex::string::replacer_closure::ReplacerClosure` is not general enough
  --> tests/replace.rs:7:24
   |
7  |               assert_eq!(re.$which($search, $replace), $result);
   |                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ implementation of `regex::regex::string::replacer_closure::ReplacerClosure` is not general enough
...
84 | / replace!(
85 | |     closure_returning_reference,
86 | |     replace,
87 | |     r"([0-9]+)",
...  |
90 | |     "age: 2"
91 | | );
   | |_- in this macro invocation
   |
   = note: `[closure@tests/replace.rs:89:5: 89:37]` must implement `regex::regex::string::replacer_closure::ReplacerClosure<'0, '1>`, for any two lifetimes `'0` and `'1`...
   = note: ...but it actually implements `regex::regex::string::replacer_closure::ReplacerClosure<'_, '2>`, for some specific lifetime `'2`
   = note: this error originates in the macro `replace` (in Nightly builds, run with -Z macro-backtrace for more info)

I believed using a single lifetime instead of two lifetimes is okay here (which avoids these problems). But I'm not sure.


Note that counterintuitively, using a single lifetime 'a in the F: FnMut(&'a Captures<'a>) -> O bound (here) results in a more relaxed bound (which also can cover all the test cases).

Demanding F to be implemented for &'a Captures<'b> for all 'a, 'b: 'a is a stricter (though maybe reasonable, yet unnecessary) requirement.


I think the current approach is harmless, though it's not possible to go from &'a Captures<'a> to the stricter &'a Captures<'b> later (but vice versa). If this a concern, a more conservative approach could be taken (e.g. this one). That would leave both variants open for the future, but currently doesn't allow the return type to depend on 'b.

Also see: rust-lang/rfcs#3261.

JanBeh added a commit to JanBeh/regex that referenced this pull request Jul 21, 2023
This is an alternative flavor of PR rust-lang#1048:

Require `Replacer` closures to take a `&'a Captures<'b>` (for all `'a,
'b: 'a`) as argument and have a return type that must not depend on `'b`
(but only on `'a`).
@BurntSushi
Copy link
Member

Thanks for the explanation, that was helpful. To say where I'm at currently:

  • ReplacerClosure is still part of the public API because it shows up in a bound in a public API. I don't see a way around this. Making it so users of the crate can't import has the advantage of not confusing the public API, but it also has the disadvantage of removing options from the user in case it would be useful to use it in bounds of their own.
  • It's not totally clear to me that this isn't a breaking change. I would generally expect the kind of breakage to happen here to be inference related, if at all. Inference breakages are something I could tolerate, but only if they are incredibly rare. Testing this sort of change in std is easier because of crater, but I don't have a tool like that to do that sort of testing before unleashing this change on to everyone.
  • Looking at this holistically, I don't think the juice is worth the squeeze. The docs in this PR talk about unstable features and show examples with weird coerce function hacks. That's not the kind of thing I want to see in the regex crate personally. I'd much rather tell people to go write their own replacement routine then to deal with these type system shenanigans.

@JanBeh
Copy link
Author

JanBeh commented Jul 21, 2023

@BurntSushi thanks for your feedback.

  • ReplacerClosure is still part of the public API because it shows up in a bound in a public API.

rustdoc displays it as a non-linked trait in regard to the blanket implementation of Replacer for F. However, as it can't be named from outside the crate, it doesn't really affect the users of the API in any way except in regard to which closures implement Replacer. The trait that's really exposed to the user of the API is Replacer (before and after the changes), and not ReplacerClosure.

Nonetheless, having an unnameable trait showing up in the API can be considered somewhat ugly (similar to the sealed trait pattern that's commonly used).

  • It's not totally clear to me that this isn't a breaking change. I would generally expect the kind of breakage to happen here to be inference related, if at all. Inference breakages are something I could tolerate, but only if they are incredibly rare. Testing this sort of change in std is easier because of crater, but I don't have a tool like that to do that sort of testing before unleashing this change on to everyone.

I understand, and I don't fully overlook this either.

  • Looking at this holistically, I don't think the juice is worth the squeeze. The docs in this PR talk about unstable features and show examples with weird coerce function hacks. That's not the kind of thing I want to see in the regex crate personally. I'd much rather tell people to go write their own replacement routine then to deal with these type system shenanigans.

Would you reconsider this if/when closure_lifetime_binder stabilizes?

I would propose to add these changes to the draft to make the current approach more "clean" and to highlight the open issues, and then close this PR for future reference. It could be reopened or recreated when there is some progress on that matter.

JanBeh added 2 commits July 22, 2023 12:47
Refactored code to be able to use two different lifetimes `'a` and `'b`
for the `&'a Captures<'b>` argument while allowing the return type to
depend on either `'a` or `'b`.
Do not mention unstable features in documentation. Do not include
coercing helper function in example code.
@JanBeh
Copy link
Author

JanBeh commented Jul 22, 2023

I would propose to add these changes to the draft to make the current approach more "clean" […]

Thanks to this post I have been able to refactor the code to use two different lifetimes 'a and 'b in &'a Captures<'b> (d89b31a).

See also:

which is related to the problems I encountered when solving this.

  • The docs in this PR talk about unstable features and show examples with weird coerce function hacks.

I have improved that (1aab0b8). The example code isn't using any hacks anymore (but simply a straight-forward function).

@BurntSushi
Copy link
Member

Would you reconsider this if/when closure_lifetime_binder stabilizes?

I would propose to add these changes to the draft to make the current approach more "clean" and to highlight the open issues, and then close this PR for future reference. It could be reopened or recreated when there is some progress on that matter.

SGTM. I'm open to re-considering, but probably some effort will need to be put into ensuring this isn't a breaking change. (I don't necessarily mean "technically breaking" but rather "practically breaking.")

The existence of the helper trait is a bummer but not a blocker for me. The breaking changes and the requirement to use coerce function hacks are bigger issues to me. What I'd want here is for the obvious thing to "just work." Otherwise it's probably too much ceremony to be worth doing IMO.

@JanBeh
Copy link
Author

JanBeh commented Jul 23, 2023

Closing this then. Could be reconsidered when

or a similar mechanism is implemented, which makes using such closures more idiomatic.

Thanks for all the feedback and assessment.

@JanBeh JanBeh closed this Jul 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants