-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ Implement SegmentedHashSet<T> #54574
Conversation
These files are based on dotnet/runtime@556582d9 (v5.0.7).
… and non-randomized string comparers
@dotnet/roslyn-compiler for reviews |
@dotnet/roslyn-compiler for reviews |
1 similar comment
@dotnet/roslyn-compiler for reviews |
@dotnet/roslyn-compiler for reviews |
|
||
public int GetHashCode(SegmentedHashSet<T>? obj) | ||
{ | ||
var hashCode = 0; // default to 0 for null/empty set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would System.HashCode be a good fit here here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
➡️ This implementation is taken from the link at the top of the file, and only changed where necessary to build in Roslyn. The original code for this method does not use System.HashCode
, so this question is probably better for dotnet/runtime.
{ | ||
if (collection == null) | ||
{ | ||
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this is not nameof(collection)
(applies to all ExceptionArgument
usages)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
➡️ This implementation is taken from the link at the top of the file, and only changed where necessary to build in Roslyn. The original code for this method does not use nameof(collection)
, so this question is probably better for dotnet/runtime.
} | ||
|
||
// Equals method for the comparer itself. | ||
public override bool Equals(object? obj) => obj is SegmentedHashSetEqualityComparer<T>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public override bool Equals(object? obj) => obj is SegmentedHashSetEqualityComparer<T>; | |
public override bool Equals([NotNullWhen(true)] object? obj) => obj is SegmentedHashSetEqualityComparer<T>; |
This was added in dotnet/runtime:
Note: requires using System.Diagnostics.CodeAnalysis;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
➡️ This implementation is taken from the link at the top of this file. The 5.0.7 tag does not contain this attribute; it will be added during the update step if/when a newer version is adopted.
} | ||
else if (ReferenceEquals(_comparer, StringComparer.OrdinalIgnoreCase)) | ||
{ | ||
_comparer = (IEqualityComparer<T>)NonRandomizedStringEqualityComparer.WrappedAroundStringComparerOrdinalIgnoreCase; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why were these special cases eliminated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They require porting a huge amount of platform-specific culture code. The same change was adopted for SegmentedDictionary<TKey, TValue>
.
It looks like the important commit here is c0c39b1. |
@dotnet/roslyn-compiler for second review |
return false; | ||
} | ||
|
||
var defaultComparer = EqualityComparer<T>.Default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could move this to line 33 to avoid .Default
on unnecessary code paths
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would typically avoid this primarily so we can keep the file as close to the reference source as possible (i.e. if the code changes in dotnet/runtime, we would adopt that change in the next update here per the link in the comment at the top of the file).
} | ||
} | ||
|
||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if x
has every element in y
and one additional element? That will pass the test above and fall through to return true
even though the x
has more elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what the expected behavior here. This is not documented on docs.microsoft.com for HashSet<T>.CreateSetComparer()
, and is the same behavior seen here:
https://github.com/dotnet/runtime/blob/6ca8c9bc0c4a5fc1082c690b6768ab3be8761b11/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/HashSetEqualityComparer.cs#L25-L51
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just seems like a bug in the original source code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reference that bug here? If they decide to fix that in their source we can follow up with a fix here.
This is a drop-in replacement for
HashSet<T>
, with equivalent semantics and algorithmic complexity but does not require the use of the Large Object Heap.