Skip to content

Latest commit

 

History

History
117 lines (74 loc) · 7.06 KB

File metadata and controls

117 lines (74 loc) · 7.06 KB

Collection literals working group meeting for May 26, 2023

Agenda

  • Codegen patterns and APIs for efficient construction
  • Open questions on dictionary literals
  • Cut line for C# 12

Discussion

Capacity of constructed collections

If an exact length is able to be calculated for the constructed collection, the capacity could be set ahead of time to avoid resizing the backing storage of the collection. But if the constructed collection escapes outside the method body, adding a single item to the collection after that would trigger a resize to double the backing storage size.

People will be able to write better code by hand than the compiler can generate if they have outside knowledge of the eventual state of the collection during its life past the construction of the collection literal. This would be a caveat to a statement about the collection literals feature generating optimal code. This is fine; the compiler can strike a good balance, and people can drop down to manual construction of the collection if they need more control over the initial capacity.

When an exact count is known for a collection literal but the compiler can't prove that it is not potentially resized later, or when only a minimum count is known for a collection literal, the compiler could for example set the initial capacity to the best-fitting power of two. This would skip some early resizes while also amortizing resizes that could happen if the collection is modified after the collection literal is created.

AddRange and boxing

When a struct-typed collection is spread, codegen could call AddRange(IEnumerable<T>) on the collection being constructed. This would cause the collection being spread to be boxed. To avoid boxing, codegen could loop over the spread collection and call Add(T) repeatedly on the collection being constructed.

This would work fine when the constructed collection has a writeable Capacity property because codegen could set the capacity before calling Add(T) in a loop. However, if there's no writeable Capacity property, we're better off calling AddRange and boxing, because the AddRange implementation is the only way to intelligently avoid intermediate resizes. (In this situation, the thing being spread has a known length but the collection literal as a whole has an unknown length due to an earlier spread.)

Conclusion (AddRange and boxing)

Stick with building using Add and AddRange. This is good enough for 95% of customers. Allow the compiler to special-case codegen for widely-used collections to improve performance.

The compiler should prefer calling AddRange(ReadOnlySpan<T>) if such a method exists and the thing being spread is a span or exposes a span.

Construction API pattern

The ideal API for constructing collections with contiguous backing storage is to pass the capacity and receive a span to the backing storage. The collection could be passed back using an out parameter to enable overloading on collection type.

The type parameter to use when calling the creation method could be the iteration type of the collection being constructed.

[CollectionLiteralBuilder(
    typeof(CollectionsMarshal),
    nameof(CollectionsMarshal.CreateUnsafe))]
public class MyCollection<T> : IEnumerable<T> { }

public static class CollectionsMarshal
{
    public static void CreateUnsafe<T>(
        int capacity,
        out MyCollection<T> collection,
        out Span<T> storage);

    public static void CreateUnsafe<T>(
        int capacity,
        out List<T> collection,
        out Span<T> storage);
}

The reverse API pattern (passing a span to the collection rather than receiving one) would enable the construction of collections that have noncontiguous backing storage and do not have mutating Add/AddRange methods, such as ImmutableDictionary.

This would allow the use of existing Create methods in the BCL:

[CollectionLiteralBuilder(
    typeof(MyCollection),
    nameof(MyCollection.Create))]
public class MyCollection<T> : IEnumerable<T> { }

public static class MyCollection
{
    public static MyCollection<T> Create<T>(ReadOnlySpan<T> elements);
}

Conclusion (construction API pattern)

Use an attribute to point to a creation method similar to the code samples above. If this attribute is present, prefer it over generating Add/AddRange calls.

This will need to go through API review. If we don't ship this general facility in C# 12, special-case access to the backing spans of List and ImmutableArray anyway for the sake of performance.

Dictionary sigils

In a previous full LDM, some folks didn't like the idea that [..dict1, ..dict2] would create a dictionary (unless the target type is a dictionary type). A merged dictionary could change the count and enumerated order of the key-value pairs. Similar concerns apply to [kvp1, kvp2, kvp3].

The working group doesn't expect these syntaxes to be commonly desired for producing dictionaries. We expect at least one k: v element to be present in literals where the user wants a dictionary and the target type is not already a dictionary type. Merging two dictionaries to create a new dictionary, and doing nothing further, is not a scenario we think comes up that often.

So, is there a place for a sigil to override this, where syntax similar to [:, ..dict1, ..dict2] or [:, kvp1, kvp2] causes a dictionary to be created instead of a list of key-value pairs?

An existing way to override this would be [..dict1, ..dict2].ToDictionary(). This is something which users may do anyway to override the natural type in other situations, such as var x = [a, b].ToImmutableArray();.

There's an open question about whether the natural type is influenced by the collection types of spread collections. Depending on the resolution, [..dict1, ..dict2] may create a dictionary while [..listOfKvps1, ..listOfKvps2] does not. A dictionary sigil would then be addressing a rarer set of scenarios.

Conclusion (dictionary sigils)

The working group doesn't find the sigil use cases compelling, and it can be added later.

Cut line for C# 12

The working group feels good about shipping these features in C# 12:

  • Target-typing to core collections and collections which support classic collection initializers
  • An opt-in API construction pattern to enable 1️⃣ target-typing to other collections and 2️⃣ improving performance over classic collection initializers
  • Spreads
  • Dictionary literals

We'd like this in C# 12, but we'd rather let this slip to C# 13 than anything in the previous list:

  • Natural type for non-empty literals
    • We think performance concerns about List<T> have a great resolution after input from the runtime on the "optimize away List<T>" strategy.
    • Design discussion is still needed. For example, does the collection type of a spread collection affect the natural type of the collection literal?

Not interested in pursuing for C# 12:

  • Natural type for empty literals
    • This requires forward-looking type inference.

Next steps

We will present takeaways of recent working group meetings in a full language design meeting and get feedback on those and on some open questions.