Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Implement SegmentedHashSet<T> #54574

Merged
merged 12 commits into from
May 12, 2022
Merged

Conversation

sharwell
Copy link
Member

@sharwell sharwell commented Jul 2, 2021

This is a drop-in replacement for HashSet<T>, with equivalent semantics and algorithmic complexity but does not require the use of the Large Object Heap.

@sharwell
Copy link
Member Author

@dotnet/roslyn-compiler for reviews

@sharwell
Copy link
Member Author

@dotnet/roslyn-compiler for reviews

1 similar comment
@sharwell
Copy link
Member Author

@dotnet/roslyn-compiler for reviews

@RikkiGibson RikkiGibson self-assigned this Aug 25, 2021
@sharwell
Copy link
Member Author

@dotnet/roslyn-compiler for reviews


public int GetHashCode(SegmentedHashSet<T>? obj)
{
var hashCode = 0; // default to 0 for null/empty set
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would System.HashCode be a good fit here here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

➡️ This implementation is taken from the link at the top of the file, and only changed where necessary to build in Roslyn. The original code for this method does not use System.HashCode, so this question is probably better for dotnet/runtime.

{
if (collection == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this is not nameof(collection) (applies to all ExceptionArgument usages)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

➡️ This implementation is taken from the link at the top of the file, and only changed where necessary to build in Roslyn. The original code for this method does not use nameof(collection), so this question is probably better for dotnet/runtime.

}

// Equals method for the comparer itself.
public override bool Equals(object? obj) => obj is SegmentedHashSetEqualityComparer<T>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public override bool Equals(object? obj) => obj is SegmentedHashSetEqualityComparer<T>;
public override bool Equals([NotNullWhen(true)] object? obj) => obj is SegmentedHashSetEqualityComparer<T>;

This was added in dotnet/runtime:

https://github.com/dotnet/runtime/blob/64245ab3966abde394767fbd6531e7edbea2802e/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/HashSetEqualityComparer.cs#L75

Note: requires using System.Diagnostics.CodeAnalysis;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

➡️ This implementation is taken from the link at the top of this file. The 5.0.7 tag does not contain this attribute; it will be added during the update step if/when a newer version is adopted.

}
else if (ReferenceEquals(_comparer, StringComparer.OrdinalIgnoreCase))
{
_comparer = (IEqualityComparer<T>)NonRandomizedStringEqualityComparer.WrappedAroundStringComparerOrdinalIgnoreCase;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why were these special cases eliminated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They require porting a huge amount of platform-specific culture code. The same change was adopted for SegmentedDictionary<TKey, TValue>.

@RikkiGibson
Copy link
Contributor

It looks like the important commit here is c0c39b1.

@sharwell
Copy link
Member Author

@dotnet/roslyn-compiler for second review

@dotnet dotnet deleted a comment from azure-pipelines bot May 11, 2022
@dotnet dotnet deleted a comment from azure-pipelines bot May 11, 2022
return false;
}

var defaultComparer = EqualityComparer<T>.Default;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could move this to line 33 to avoid .Default on unnecessary code paths

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would typically avoid this primarily so we can keep the file as close to the reference source as possible (i.e. if the code changes in dotnet/runtime, we would adopt that change in the next update here per the link in the comment at the top of the file).

}
}

return true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if x has every element in y and one additional element? That will pass the test above and fall through to return true even though the x has more elements.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the expected behavior here. This is not documented on docs.microsoft.com for HashSet<T>.CreateSetComparer(), and is the same behavior seen here:
https://github.com/dotnet/runtime/blob/6ca8c9bc0c4a5fc1082c690b6768ab3be8761b11/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/HashSetEqualityComparer.cs#L25-L51

Copy link
Member

@jaredpar jaredpar May 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just seems like a bug in the original source code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dotnet/runtime#69218

Can we reference that bug here? If they decide to fix that in their source we can follow up with a fix here.

@sharwell sharwell enabled auto-merge May 12, 2022 14:16
@sharwell sharwell merged commit ef54147 into dotnet:main May 12, 2022
@ghost ghost added this to the Next milestone May 12, 2022
@sharwell sharwell deleted the segmented-hashset branch May 12, 2022 18:43
@Cosifne Cosifne modified the milestones: Next, 17.3 P2 May 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants