Add an Enumerable.Match method #39443
Comments
Tagging subscribers to this area: @eiriktsarpalis |
This is an API request, moving to the correct repo. |
@333fred oh sorry, at some point it must have sent me to the runtime repo... I will close and re-open in the correct repo |
@simon-curtis sorry if that wasn't clear. I moved it already. |
Just a note, there was a big discussion about this on gitter earlier this year. Turns out it's tricky. One problem is that Linq wants to be lazy (where possible) and avoid iterating the source collection twice. It's hard to do a partition like this in a lazy way, because it depends on when you decide to enumerate Another problem is that Another is, what happens if you call (You can do something ugly like this to try and overcome those, but it has its own problems.) |
Oh I see, thanks! Sorry, still reasonably new to GitHub |
Basing it on IEnumerable is dubious as mentioned, but declaring this for concrete collection types like ImmutableArray and ImmutableList is very handy. I call my extension methods |
The F# equivalent is also called
public static class Enumerable
{
    public static (List<TSource> Satisfied, List<TSource> Falsified) Partition<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate);
} |
What about this approach?
P.S. It will buffer the full collection as far as I remember, which may not be the desired behavior. |
.ToLookup(...) is better than .GroupBy(...).ToDictionary(...). More efficient is:
foreach (var value in source)
{
    (predicate(value) ? trueBuilder : falseBuilder).Add(value);
} |
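The loop above leaves `trueBuilder` and `falseBuilder` undefined. A minimal self-contained sketch follows, assuming they are `ImmutableArray<T>.Builder` instances. The class name `PartitionSketch` and method name `PartitionToImmutable` are hypothetical and do not come from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;

public static class PartitionSketch
{
    // Hypothetical helper: splits the source in a single pass and buffers both halves.
    public static (ImmutableArray<T> Matched, ImmutableArray<T> Unmatched) PartitionToImmutable<T>(
        this IEnumerable<T> source, Func<T, bool> predicate)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));
        if (predicate is null) throw new ArgumentNullException(nameof(predicate));

        // Assumption: the builders are ImmutableArray builders; List<T> would work the same way.
        var trueBuilder = ImmutableArray.CreateBuilder<T>();
        var falseBuilder = ImmutableArray.CreateBuilder<T>();

        foreach (var value in source)
        {
            // The predicate runs exactly once per element.
            (predicate(value) ? trueBuilder : falseBuilder).Add(value);
        }

        return (trueBuilder.ToImmutable(), falseBuilder.ToImmutable());
    }
}
```

Because the loop runs eagerly, the source is enumerated once, at the cost of holding every element in memory.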
@jnm2 good point! To sum up, currently there are 2 solutions for the Match problem:
using System;
using System.Collections.Generic;
using System.Linq;

public static class MatchExtension
{
    // Buffered: ToLookup enumerates the source once, immediately, and stores both groups.
    public static (IEnumerable<T>, IEnumerable<T>) MatchBuffered<T>(this IEnumerable<T> c, Func<T, bool> cond)
    {
        var l = c.ToLookup(cond);
        return (l[true], l[false]);
    }

    // Non-buffered: two deferred queries; enumerating either half re-enumerates the source.
    public static (IEnumerable<T>, IEnumerable<T>) Match<T>(this IEnumerable<T> c, Func<T, bool> cond)
    {
        return (c.Where(cond), c.Where(v => !cond(v)));
    }
}

public class Program
{
    public static void Main(string[] args)
    {
        var c = new[] { "a", "b", "a" };
        {
            var (matched, unmatched) = c.Match(k => k == "a");
            Console.WriteLine(string.Join(",", matched));
            Console.WriteLine(string.Join(",", unmatched));
        }
        {
            var (matched, unmatched) = c.MatchBuffered(k => k == "a");
            Console.WriteLine(string.Join(",", matched));
            Console.WriteLine(string.Join(",", unmatched));
        }
    }
}
The non-buffered approach can be extended to IQueryable, but it would require a bit of code to negate the original Expression. |
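The Expression negation mentioned above could look roughly like the following. This is only a sketch: the names `QueryablePartitionSketch`, `Negate`, and `MatchQueryable` are hypothetical, and whether a given query provider can translate the negated expression is an assumption that depends on that provider.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class QueryablePartitionSketch
{
    // Builds v => !cond(v) by reusing the original parameter and wrapping the body in a logical NOT.
    public static Expression<Func<T, bool>> Negate<T>(this Expression<Func<T, bool>> cond) =>
        Expression.Lambda<Func<T, bool>>(Expression.Not(cond.Body), cond.Parameters);

    // Hypothetical IQueryable counterpart to the non-buffered Match:
    // each half is a separate deferred query against the source.
    public static (IQueryable<T> Matched, IQueryable<T> Unmatched) MatchQueryable<T>(
        this IQueryable<T> source, Expression<Func<T, bool>> cond) =>
        (source.Where(cond), source.Where(cond.Negate()));
}
```

As with the `IEnumerable` version, each half issues its own query when enumerated, so the source is still visited twice.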
I had this thought from a project I’m doing in Blazor with EF Core, where I was separating entities’ UI elements by their archived property.
I wouldn’t want to go back to the database on every render, so I would definitely want some form of buffering; in this instance I’m using DbContext. But I can see how in other cases you wouldn’t want buffering. I did end up using GroupBy().ToDictionary(), just not as a static method; I will do that on Monday to save repeating it every time. It looked messy, though, which is why I’m suggesting a one-liner. |
How would the non-buffered approach work when the matched or unmatched enumerables are enumerated more than once? Assume that
var (matched, unmatched) = source.Partition(cond);
var a = matched.Select(...).ToList(); // Fully enumerate matched enumerable once
var b = unmatched.Select(...).ToList(); // Fully enumerate unmatched enumerable once
var c = matched.Select(...).ToList(); // Fully enumerate matched enumerable a second time
var d = unmatched.Select(...).ToList(); // Fully enumerate unmatched enumerable a second time
You would assume that elements in How many times total is |
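One way to make the question concrete is to count the work done by the `Where`-based non-buffered pair from earlier in the thread. The sketch below is illustrative only: `EnumerationCountDemo`, its counters, and the four-element source are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationCountDemo
{
    public static int SourcePasses;   // how many times the source was enumerated from the start
    public static int PredicateCalls; // how many times the predicate ran

    // Iterator method: its body re-runs on every enumeration, so each pass is counted.
    private static IEnumerable<int> Source()
    {
        SourcePasses++;
        for (var i = 1; i <= 4; i++) yield return i;
    }

    public static void Run()
    {
        SourcePasses = 0;
        PredicateCalls = 0;
        Func<int, bool> cond = n => { PredicateCalls++; return n % 2 == 0; };

        var source = Source();
        // Same shape as the non-buffered Match: two independent deferred queries.
        var (matched, unmatched) = (source.Where(cond), source.Where(v => !cond(v)));

        _ = matched.ToList();   // first pass over matched
        _ = unmatched.ToList(); // first pass over unmatched
        _ = matched.ToList();   // second pass over matched
        _ = unmatched.ToList(); // second pass over unmatched

        Console.WriteLine($"source passes: {SourcePasses}, predicate calls: {PredicateCalls}");
    }
}
```

With four elements and four full enumerations, this sketch walks the source four times and calls the predicate sixteen times, whereas a buffered partition would do each only once per element.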
For the non-buffered approach I've personally assumed that the source sequence is stable, i.e. does not change over time. For files & databases this is easily achievable. For random generators, of course, it is not. |
For enumerables backed by databases, enumerating |
@jnm2, yes. Enumerating a database-backed enumerable twice will query the database twice. You can lock the database state with a transaction. But for databases one would need to implement the approach with IQueryable to stream data effectively, because conversion to IEnumerable will cause the full result set to be sent over the network. |
If the method accepts |
Usually it depends on the implementation. IEnumerable is allowed to be enumerated more than once. |
@slavanap My worry is, what should that implementation be? I'm not sure how to avoid buffering and be safe. And you can't get away from all buffering. If the first half doesn't match and the second half does, and you fully enumerate the matched enumerable first, you have to buffer half the data anyway. |
@jnm2 you dispose data while enumerating (and with IQueryable you query each half separately without loading any data from another half). If source IEnumerable sources data from file or network, then your IEnumerable size is not constrained by RAM capacity. While buffered IEnumerable is simpler it has its limitations, like RAM constraints. |
namespace System.Linq
{
public static partial class Enumerable
{
public static (List<TSource> Matched, List<TSource> Unmatched) Match<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate);
}
} |
I strongly believe you should return |
I don't see how this implementation would ever use anything other than |
We never know now what we'll know in the future. That's the entire point of defensive coding. The framework is littered with stuff that is now considered obsolete or bad practice that seemed like a great idea at the time.
No other LINQ methods return concrete types. Including GroupBy, where you are actually constructing a completely new collection but hiding it behind an interface.
Like what? The only benefit that I can think of is that the user will mutate it, which is not something the BCL should be encouraging. Returning Returning |
While I understand the point you are trying to make in principle, I don't agree that it applies to this particular case.
Note that the new method has more similarities with
Again, I understand the point you are making in principle, but I don't see how that translates to the particular method. What particular problems would you expect this cause for users? |
@MgSam This was discussed in the video at https://youtu.be/7bRCbwE9CYE?t=2772. Someone mentions:
Also, the design team saw it as a good thing for users to be able to own and modify the created instances. |
I think a better analog rather than
I totally agree with this principle- which is why I'm proposing
There are multiple problems:
The interface vs non-interface is the point I still really don't understand and which you haven't answered. Even if we all agreed it was a great idea to return a mutable list as the result of Partition- why wouldn't we use |
But since they explicitly wanted the semantics of this API to be that the collections being created do belong to you, this is a good reason to return a concrete collection type.
As Eirik said above, this is a matter of perspective. This is intended to diverge from GroupBy because it's intended to serve a slightly different purpose. This usage pattern is not perfectly served by GroupBy, thus the difference and the reason for this API request.
This was addressed in the video. It's really an interesting listen. |
@jnm2 I did listen, and if anything the video proves why this issue deserves more thought. They barely spent about two minutes discussing it where two of the participants' opinions are "I don't really care" and the third basically says "let's return List because that's what we're using internally". They do discuss the immutable vs mutable question but don't present a very compelling argument IMO. The question of why Public methods should not be exposing implementation details. .NET should not change framework patterns arbitrarily because the developers reviewing the code can't be bothered to form an opinion about it or consider it in the wider context of the rest of the framework. At the end of the day, it's a minor issue either way but I want to see .NET be the best platform it can be. And I certainly don't want to see LINQ proper become an everything-and-the-kitchen-sink mess of methods with inconsistent usage and documentation like Rx is. |
Simply put, this is an elementary helper method for which
I think you might be applying a liberal definition of "implementation detail".
What "framework patterns" are you referring to? Here's a quote from the framework design guidelines, also mentioned during the meeting:
Ultimately, would you be able to propose a potential future concern, particular to this method, that might justify abstracting the return type? Abstraction comes with tradeoffs, so we won't be adopting it for the sake of an ambiguous sense of futureproofing. |
I just want to make a small point as an observer -- This is Microsoft's house, and they've kindly invited us in to see how they do things. Remember that they are the owners and this is their job: we can make suggestions and give our personal opinions, but we do not have the right to tell them what to do. I've seen first-hand the annoyance that some stranger telling you how to do your job causes, and I'm keen that we don't risk ruining the great privilege we have here :) (I don't want to debate this message, just dropping past) |
We welcome your interaction and input here. At the same time, please assume good intent. This is called out in our code of conduct. |
I just want to elaborate on a few points here:
|
Thanks for the kind words. I just want to point out that this is a two way street for us: we equally feel very privileged of being lucky that so many contributors take their time and help us make .NET better. That is a big reason why I care about the conduct in our discussions: it's not just how people talk to us, it's how we all talk to each other. I don't want to lose contributors or potential contributors because they see a toxic thread and decide that this isn't the community they want to be a part of. |
(@terrajobst, just FYI, we've actually been returning something other than |
Don't ruin my brilliant tale with facts 😄 |
Thanks guys for the explanation. I agree there definitely can be unintended consequences even with an interface as the return type if you ever needed to change that, especially at the BCL level. That said, someone's reliance on internal implementation details is a much "easier" breaking change to swallow than changing a concrete type. |
Per #49819 (review) we have reached the conclusion that the proposed method, while useful for some applications, does not meet the bar for inclusion in System.Linq. I'm therefore going to close this issue. |
Background and Motivation
Simple proposal to return both matched and unmatched results from a where query in a one-liner.
For readability I would normally do the following, but it requires an extra pass:
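The original snippet did not survive here. A hedged reconstruction of the two-pass pattern being described might look like this, where the `items` list and its `Archived` flag are hypothetical stand-ins for the entities mentioned in the discussion:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; any sequence with a boolean condition would do.
var items = new List<(string Name, bool Archived)>
{
    ("a", false), ("b", true), ("c", false),
};

// Readable, but each Where(...).ToList() walks items separately: two passes in total.
var archived = items.Where(i => i.Archived).ToList();
var active = items.Where(i => !i.Archived).ToList();
```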
You can achieve this in one pass using a for loop, but that goes against the point of LINQ and is longer:
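The loop version was also lost from the page. A sketch of what a single-pass split could look like, again with hypothetical sample data and variable names:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample data; any sequence with a boolean condition would do.
var items = new List<(string Name, bool Archived)>
{
    ("a", false), ("b", true), ("c", false),
};

var archived = new List<(string Name, bool Archived)>();
var active = new List<(string Name, bool Archived)>();

// One pass: each item is inspected once and routed to exactly one list.
foreach (var item in items)
{
    if (item.Archived) archived.Add(item);
    else active.Add(item);
}
```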
Proposed API
The proposed method buffers the results. The returned type is a tuple of IReadOnlyLists to better communicate this fact.
namespace System.Linq
{
    public static class Enumerable
    {
+        public static (IReadOnlyList<TSource> Satisfied, IReadOnlyList<TSource> Falsified) Partition<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate);
    }
}
We should not be implementing an IQueryable overload.