Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable Microsoft.CodeAnalysis.Collections #6595

Merged
merged 4 commits into from
Aug 23, 2021

Conversation

sharwell
Copy link
Member

@sharwell sharwell commented Jun 18, 2021

Closes #6601

Review commit-by-commit highly recommended. Please use a normal merge commit (no squash merge), as this pull request was carefully constructed and validated as distinct changes.

@sharwell sharwell force-pushed the roslyn-collections branch from 75a485e to eaa984b Compare June 18, 2021 20:17
@dotnet dotnet deleted a comment from azure-pipelines bot Jul 30, 2021
@sharwell sharwell force-pushed the roslyn-collections branch from eaa984b to 81f7d70 Compare July 30, 2021 17:37
@sharwell sharwell force-pushed the roslyn-collections branch from 81f7d70 to a66a243 Compare July 30, 2021 18:10
@sharwell sharwell marked this pull request as ready for review July 30, 2021 21:07
<ClassName>Microsoft.CodeAnalysis.Collections.SR</ClassName>
</EmbeddedResource>
</ItemGroup>
</Target>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 I need to figure out why this is needed and add it to the documentation steps in dotnet/roslyn#49991

@@ -1,14 +1,27 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFrameworks>$(LibraryTargetFrameworks)</TargetFrameworks>
<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 I need to add this to the documentation steps in dotnet/roslyn#49991

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stupid question: If the code always requires this prop, can it be specified in the package (in Microsoft.CodeAnalysis.Collections.targets?)

an imported package. This suppression should be removed if/when the project is migrated to enable nullable
reference types.
-->
<NoWarn>$(NoWarn),CS8632</NoWarn>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 This is certainly not the way I'd hope this would go. Once dotnet/roslyn#52893 is implemented we can quickly deal with this.

@Forgind
Copy link
Member

Forgind commented Aug 9, 2021

PR Triage:
Do you have any performance measurements from before and after this PR?
Did you profile both command line builds and builds from within Visual Studio?
And is it correct to say this switches to using ImmutableSegmentList but doesn't use other data structures?

Because this is a new dependency from us on Roslyn, we need to make sure we have the right policies in place for keeping it up to date, especially around releases.

@sharwell
Copy link
Member Author

sharwell commented Aug 9, 2021

Do you have any performance measurements from before and after this PR?

I don't. Performance for these collections is much closer to List<T> than ImmutableList<T>, so I would not expect a measurable difference in practice, and certainly not a regression of the original issue that led to removal on ImmutableList<T> on this path.

Note that in an ideal world, #6598 would have directly converted from ImmutableList<T> to ImmutableSegmentedList<T>, but the collections were not implemented in time to complete this.

Did you profile both command line builds and builds from within Visual Studio?

I haven't done any profiling outside of some micro-benchmarking of the SegmentedList<T> indexer in https://github.com/dotnet/roslyn/blob/58c4cceb4aa4fe5212b59307603898fa3c6b8bee/src/Tools/IdeCoreBenchmarks/SegmentedArrayBenchmarks_Indexer.cs.

And is it correct to say this switches to using ImmutableSegmentList<T> but doesn't use other data structures?

ImmutableSegmentedList<T> has some dependencies on other collections defined in the same package, but it would be correct to say that not all collections in the package are used. It would not be surprising to see other collections change over time (uses that particularly benefit are immutable lists/dictionaries created from empty and then not mutated after creation).

Because this is a new dependency from us on Roslyn, we need to make sure we have the right policies in place for keeping it up to date, especially around releases.

This is strictly a source package, so the version of these collections shipping with Roslyn does not directly impact the code shipping in this project.

Copy link
Member

@Forgind Forgind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to see confirmation that this doesn't bring back AB#1344683, but ImmutableSegmentedList is fast, if I remember correctly.

This is strictly a source package, so the version of these collections shipping with Roslyn does not directly impact the code shipping in this project.

We would still need to ship a last-minute version bump if you found a major correctness issue, though, right?

@@ -104,6 +104,16 @@
Visible="false" Pack="false"/>
</ItemGroup>

<Target Name="SetResourceProperties" BeforeTargets="_GetEmbeddedResourcesWithSourceGeneration">
<ItemGroup>
<EmbeddedResource Update="@(EmbeddedResource)" Condition="'%(EmbeddedResource.NuGetPackageId)' == 'Microsoft.CodeAnalysis.Collections' AND '%(FileName)' == 'Strings'">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is checking for strings.resx files, but I noticed we also have a strings.shared.resx file.

image

Is strings.shared.resx relevant for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe Strings.shared.resx is part of this NuGet package.

an imported package. This suppression should be removed if/when the project is migrated to enable nullable
reference types.
-->
<NoWarn>$(NoWarn),CS8632</NoWarn>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NoWarn is a semicolon-delimited list.

Copy link
Member Author

@sharwell sharwell Aug 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure it's always supported both, but it's documented as comma-delimited.

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it's documented as comma-delimited

I had no idea 🤯 I judged based off of the value of NoWarn in a random binlog (which looked to be entirely semicolon-delimited)

Copy link
Member

@ladipro ladipro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you link an overview of the new collections, their intended use cases, perf characteristics? Genuinely interested, thank you!

@@ -1,14 +1,27 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFrameworks>$(LibraryTargetFrameworks)</TargetFrameworks>
<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stupid question: If the code always requires this prop, can it be specified in the package (in Microsoft.CodeAnalysis.Collections.targets?)

@sharwell
Copy link
Member Author

sharwell commented Aug 17, 2021

Can you link an overview of the new collections, their intended use cases, perf characteristics? Genuinely interested, thank you!

Most of the documentation is included directly as documentation comments on the types themselves.

BCL MS.CA.Collections Notes
T[] SegmentedArray<T> Negligible difference in scenarios where LOH needs to be avoided
List<T> SegmentedList<T> Negligible difference in scenarios where LOH needs to be avoided
Dictionary<TKey, TValue> SegmentedDictionary<TKey, TValue> dotnet/roslyn#54574 Negligible difference in scenarios where LOH needs to be avoided
HashSet<T> SegmentedHashSet<T> Negligible difference in scenarios where LOH needs to be avoided
ImmutableList<T> ImmutableSegmentedList<T> Reads O(log n)→O(1); mutations O(log n)→O(n)
ImmutableDictionary<TKey, TValue> ImmutableSegmentedDictionary<TKey, TValue> Reads O(log n)→O(1); mutations O(log n)→O(n)
ImmutableHashSet<T> ImmutableSegmentedHashSet<T> dotnet/roslyn#54719 Reads O(log n)→O(1); mutations O(log n)→O(n)
  • The mutable collections are primarily suited for cases where transient collections larger than the LOH threshold are created and then collected in Gen 0 or Gen 1 GC.
  • The immutable collections are primarily suited for cases where an immutable collection is created from empty and then stored in a location where it will be read but not mutated. The mutation performance is notably impacted for everything except small collections, but every other characteristic (storage density, locality, access times) are all substantially improved.

There are a few other subtleties, like the fact that the immutable wrappers are all value types which avoids allocations in a few edge cases.

@sharwell
Copy link
Member Author

sharwell commented Aug 17, 2021

@ladipro I'll also answer a couple of questions you didn't specifically ask:

  • What chance will we need bug fixes?

    We've needed bug fixes a few times. These almost always centered around differences in JIT behavior between .NET Framework and .NET Core with only a performance impact (#52218, #52757), but we did hit one case where an internal algorithm had a correctness bug (dotnet/roslyn@b42470f).

  • What about other fixes?

    There are several cases where we know optimizations are possible, but have not yet been implemented. Most of these center around improving mutation performance of the immutable collections, which is currently O(n) but could be improved in most cases to O(1) with a large constant (not to exceed O(n)). There are a bunch of TODO comments for known cases, e.g. https://github.com/dotnet/roslyn/blob/7d7bf0cc73e335390d73c9de6d7afd1e49605c9d/src/Dependencies/Collections/ImmutableSegmentedList%601.cs#L150-L151.

    Typically these would be implemented on an as-needed basis, which means MSBuild would probably be aware of the need to upgrade prior to that update existing.

@Forgind Forgind added the merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now. label Aug 19, 2021
@Forgind Forgind merged commit aac64bb into dotnet:main Aug 23, 2021
@sharwell sharwell deleted the roslyn-collections branch August 23, 2021 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use ImmutableSegmentedList<T> where appropriate
4 participants