C# LDM Collection Literals Agenda

meetings/working-groups/collection-literals/CL-LDM-2023-05-31.md

Core spec: https://github.com/dotnet/csharplang/blob/main/proposals/collection-literals.md

Core set of features for C# 12

  1. We are not discussing any 'natural typing' scenarios today. We're continuing to make progress in that space and would like to have a dedicated meeting on that topic.
  2. This is the set of features we believe has a very intuitive and solid design around them, and would hit a good 'sweet spot' of capabilities and user satisfaction for the initial release.
  3. We intend to have future meetings that go through the spec in detail to ensure LDM satisfaction with the exact changes being proposed (Chuck to drive).

Grammar:

diff
primary_no_array_creation_expression
  ...
+ | collection_literal_expression
  ;

+ collection_literal_expression
  : '[' ']'
  | '[' collection_literal_element ( ',' collection_literal_element )* ']'
  ;

+ collection_literal_element
  : expression_element
  | dictionary_element
  | spread_element
  ;

+ expression_element
  : expression
  ;

+ dictionary_element
  : expression ':' expression
  ;

+ spread_element
  : '..' expression
  ;

Common examples:

c#
int[] a = [1, 2, 3];
List<object> list = [start, .. middle, end];
IDictionary<string, int> d = [k1:v1, k2:v2, k3:v3];

Target-typing to core collections and collections with classic collection initializers

The following constructible collection types are the valid target types for collection literals.

The list is in priority order. If a collection type belongs in multiple categories, the first is used.

  1. Single dimensional arrays (e.g. T[])
c#
int[] a = [1, 2, 3]; // equivalent to:
int[] a = new int[] { 1, 2, 3 };
  2. Span types (Span<T> and ReadOnlySpan<T>)
c#
Span<int> values = [1, 2, 3]; // equivalent to either:

Span<int> values = new int[] { 1, 2, 3 }; // or
Span<int> values = stackalloc int[] { 1, 2, 3 };

Which one is picked depends on compiler heuristics around stack space (and is part of the 'params span' WG outcomes), and the ref-safety rules around values.
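
As a rough sketch of the two candidate lowerings, the snippet below writes each form by hand. The names `SpanLoweringSketch`, `SumOnStack`, `CreateOnHeap` and `Sum` are hypothetical and exist only for illustration; the compiler's actual heuristics are not specified here.

```csharp
using System;

// Hypothetical illustration; not the compiler's actual output.
static class SpanLoweringSketch
{
    // Stack form: no heap allocation, but the span must not outlive this method.
    public static int SumOnStack()
    {
        Span<int> values = stackalloc int[] { 1, 2, 3 };
        return Sum(values);
    }

    // Heap form: allocates an array, so the span may safely be returned.
    public static Span<int> CreateOnHeap()
    {
        Span<int> values = new int[] { 1, 2, 3 };
        return values;
    }

    public static int Sum(ReadOnlySpan<int> values)
    {
        int total = 0;
        foreach (int value in values)
            total += value;
        return total;
    }
}
```

The ref-safety point is visible in the shapes of the two methods: the heap-backed span can escape `CreateOnHeap`, while the stack-backed span in `SumOnStack` has to be consumed before the method returns.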

  3. Types with a suitable Construct method. This will be discussed below, but relates heavily to immutable collections, and how to efficiently construct them. It will also be the preferred way to create a List<T> as it is the most efficient form with lowest overhead.

  4. Concrete collection types that implement System.Collections.IDictionary

c#
Dictionary<string, int> nameToAge = [ "Dustin": 42, "Cyrus": 43 ]; // possibly equivalent to:

Dictionary<string, int> nameToAge = new Dictionary<string, int> { { "Dustin", 42 }, { "Cyrus", 43 } }; 

// Open question below about update vs throw semantics for duplicate keys.  Will cover when we discuss dictionaries.
  5. Interface types I<TKey, TValue> implemented by System.Collections.Generic.Dictionary<TKey, TValue> (e.g. IDictionary<TKey, TValue>, IReadOnlyDictionary<TKey, TValue>)
c#
IReadOnlyDictionary<string, int> nameToAge = [ "Dustin": 42, "Cyrus": 43 ]; // possibly equivalent to:
IReadOnlyDictionary<string, int> nameToAge = new Dictionary<string, int> { { "Dustin", 42 }, { "Cyrus", 43 } }; 

// Open questions on if there is a guarantee about concrete type instantiated.
  6. Types that implement System.Collections.IEnumerable, e.g. normal C# 3.0 collection initializer types
c#
HashSet<int> set = [1, 2, 3]; // equivalent to:

HashSet<int> set = new HashSet<int> { 1, 2, 3 };

// We pass the capacity to the constructor if it takes one.
  7. Interface types I<T> implemented by System.Collections.Generic.List<T> (e.g. IEnumerable<T>, IList<T>, IReadOnlyList<T>, etc.)
c#
IEnumerable<int> values = [1, 2, 3]; // could be:

IEnumerable<int> values = new List<int> { 1, 2, 3 }; // or could be:
IEnumerable<int> values = new <>UnspeakableName { 1, 2, 3 };

// Open question below about this.

Open Questions:

  1. Does LDM agree with the priority order and totality of the supported constructible collection types? Of note: this means that any type that is not in that list, and is not being updated (e.g. netfx 4.7.2 types), might not work with collection literals. Should we have some way (e.g. an extension-method pathway) to opt those legacy types in?
  2. What is the behavior when users attempt to initialize an IEnumerable (or any interface type)? Do you get a List<T> or an undefined type?

Target-typing with BCL attribute Create method

Main examples:

c#
List<int> list = [1, 2, 3]; // equivalent to:
CollectionsMarshal.Create<int>(capacity: 3, out List<int> list, out Span<int> __storage);
__storage[0] = 1;
__storage[1] = 2;
__storage[2] = 3;
    
// Works identically for ImmutableArray<T>.

// However, some types cannot give you the storage to write sequentially into.  For those, the pattern is:
ImmutableDictionary<string, int> namesToAge = [ "Dustin": 42, "Cyrus": 43 ]; // equivalent to:
    
// Storage is initialized (ideally on stack when safe), and passed to type to own creating its internal structure
ReadOnlySpan<KeyValuePair<string, int>> storage = [ new("Dustin", 42), new("Cyrus", 43) ]; // could be heap or stack.
CollectionsMarshal.CreateRange<string, int>(out ImmutableDictionary<string, int> namesToAge, storage);

The pattern is:

cs
// Attribute placed on collection target type, stating where to find the factory method.
// Specifies the type where it is found, and the factory method name.  Up to BCL to include.
    
[CollectionLiteralBuilder(
    typeof(CollectionsMarshal),
    nameof(CollectionsMarshal.Create))]
public class List<T> : IEnumerable<T> { } // or

[CollectionLiteralBuilder(
    typeof(CollectionsMarshal),
    nameof(CollectionsMarshal.CreateRange))] 
public sealed class ImmutableDictionary<TKey, TValue> : IDictionary<TKey, TValue> { }
    
// Factory method shape:
public static class CollectionsMarshal
{
    // For cases where the instance can share out its storage as a sequential buffer for caller to place the values in.
    // Storage span is passed out for caller to populate with no overhead.
    public static void Create<T>(
        int capacity,
        out List<T> collection,
        out Span<T> storage);
    
    public static void Create<T>(
        int capacity,
        out ImmutableArray<T> collection,
        out Span<T> storage);
    
    // For cases where the final instance has to manage non-sequential structure.  Storage is passed in and copied over to
    // internal structure.
    public static void CreateRange<T>(
        out ImmutableHashSet<T> collection,
        ReadOnlySpan<T> storage);
    
    // Example of the shape for things like dictionaries.
    public static void CreateRange<TKey, TValue>(
        out ImmutableDictionary<TKey, TValue> dictionary,
        ReadOnlySpan<KeyValuePair<TKey, TValue>> storage);
}
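
One way these two factory shapes could be implemented on top of existing APIs is sketched below. It assumes .NET 8 or later, where `CollectionsMarshal.SetCount` and `CollectionsMarshal.AsSpan` are available. The class `CollectionFactorySketch` and the helper `BuildOneTwoThree` are hypothetical names; the real factory, if adopted, may look different.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Runtime.InteropServices;

// Hypothetical stand-in for the proposed factory type.
static class CollectionFactorySketch
{
    // Sequential case: size the list up front and hand its backing storage to the caller.
    public static void Create<T>(int capacity, out List<T> collection, out Span<T> storage)
    {
        collection = new List<T>(capacity);
        CollectionsMarshal.SetCount(collection, capacity); // Count becomes capacity
        storage = CollectionsMarshal.AsSpan(collection);   // view over the list's array
    }

    // Non-sequential case: the caller supplies the values and the collection copies them.
    public static void CreateRange<TKey, TValue>(
        out ImmutableDictionary<TKey, TValue> dictionary,
        ReadOnlySpan<KeyValuePair<TKey, TValue>> storage)
        where TKey : notnull
    {
        var builder = ImmutableDictionary.CreateBuilder<TKey, TValue>();
        foreach (KeyValuePair<TKey, TValue> pair in storage)
            builder.Add(pair.Key, pair.Value);
        dictionary = builder.ToImmutable();
    }

    // What the lowered form of `List<int> list = [1, 2, 3];` might look like by hand.
    public static List<int> BuildOneTwoThree()
    {
        Create<int>(capacity: 3, out List<int> list, out Span<int> storage);
        storage[0] = 1;
        storage[1] = 2;
        storage[2] = 3;
        return list;
    }
}
```

Note that the `CreateRange` sketch uses `Add` on the builder, so it has add (throw) semantics for conflicting duplicate keys; that choice is an assumption made here, not something the agenda has settled.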

Open Questions:

  1. Does LDM agree with introducing a new pattern for collection construction?
  2. If this API can't come in time, is it OK for the compiler to temporarily special case List<T> (and potentially ImmutableArray<T>)?

Dictionaries

Created as a natural syntactic extension on collection literals:

c#
List<int> list = [1, 2, 3];
Dictionary<string, int> nameToAge = ["Dustin": 42, "Cyrus": 43];

Syntax picked for two reasons:

  1. Very brief. A core goal of this feature is to prevent excessive syntax for simple ideas.
  2. Very familiar to many languages that use colon here (JS, Python, Swift, Dart, Go, etc.). Want to put best foot forward.

Downside: the syntax does introduce a syntactic ambiguity:

csharp
Dictionary<K, V> d = [a ? [b] : c]; // [a ? ([b]) : c] or [(a?[b]) : c]?

The WG feels this ambiguity would be extremely rare in practice. Parsing the element as an 'expression' (a conditional expression, not k:v) seems totally fine. If you do want k:v, parenthesize the key.

Spreading is also supported for dictionaries, allowing for nice things like:

c#
Dictionary<string, int> nameToAge = ["Dustin": 42, "Cyrus": 43];
Dictionary<string, int> newNameToAge = [.. nameToAge, "Mads": 25];

Dictionaries (without 'Construct' methods) are rewritten as:

cs
Dictionary<string, int> nameToAge = ["Dustin": 42, "Cyrus": 43]; // rewritten as:

Dictionary<string, int> nameToAge = new Dictionary<string, int>(capacity: 2);
nameToAge["Dustin"] = 42;
nameToAge["Cyrus"] = 43;
    
// But it could have been (throw vs update semantics):
nameToAge.Add("Dustin", 42);
nameToAge.Add("Cyrus", 43);
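
The difference between the two candidate semantics can be observed with today's `Dictionary<TKey, TValue>`. The sketch below uses hypothetical names (`DuplicateKeySketch`, `LastValueWithIndexer`, `SecondAddThrows`) and only shows the existing library behavior that each lowering would inherit.

```csharp
using System;
using System.Collections.Generic;

static class DuplicateKeySketch
{
    // Update semantics: the indexer silently overwrites an existing key.
    public static int LastValueWithIndexer()
    {
        var nameToAge = new Dictionary<string, int>(capacity: 2);
        nameToAge["Dustin"] = 42;
        nameToAge["Dustin"] = 43;
        return nameToAge["Dustin"];
    }

    // Add semantics: Add throws ArgumentException when the key already exists.
    public static bool SecondAddThrows()
    {
        var nameToAge = new Dictionary<string, int>(capacity: 2);
        nameToAge.Add("Dustin", 42);
        try
        {
            nameToAge.Add("Dustin", 43);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```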

Open Questions:

  1. Does LDM agree with supporting dictionary literals for C# 12?
  2. When target-typed to interface, do we guarantee a particular concrete type?
  3. Does dictionary creation use update or add semantics? (Are duplicate keys allowed?)

Consider the CollectionsMarshal.CreateRange<TKey, TValue>() factory method above. The BCL already has similar construction methods for dictionaries, either as constructors or static helpers, that take an IEnumerable<KeyValuePair<K, V>>. Those existing methods typically don't allow duplicated keys with distinct values, and this method will probably need to work similarly. In short, the construct method may have add (and throw) semantics.

  4. Allow k:v expression inside a non-dictionary collection literal? e.g.:
c#
List<KeyValuePair<string, int>> pairs = ["Dustin": 42, "Cyrus": 43]; // Is this ok?  Or would we require you say:
    
List<KeyValuePair<string, int>> pairs = [new("Dustin", 42), new("Cyrus", 43)];

Proposal: dictionary_element is supported for non-dictionary collection literals if the element type of the collection is some KeyValuePair<,>.

  5. Allow expression_element and spread_element in dictionary? This effectively requires that dictionary construction understands how to incorporate a KVP value from an external source. For example:
c#
KeyValuePair<string, int> GetMads() => new("Mads", 24);
    
Dictionary<string, int> nameToAge = ["Dustin": 42, "Cyrus": 43, GetMads()]; // otherwise, you'd have to write:
    
var mads = GetMads();
Dictionary<string, int> nameToAge = ["Dustin": 42, "Cyrus": 43, mads.Key: mads.Value]; 

Note: supporting spreads for dictionaries strongly implies that this should "just work". But we want LDM to be ok here.

Proposal:

  • expression_element is supported in a dictionary if the element type is some KeyValuePair<,> or dynamic.
  • spread_element is supported in a dictionary if the enumerated element type is some KeyValuePair<,> or dynamic.
  6. Collection literal syntax does not allow for a custom comparer.

Is it sufficient to rely on extension methods such as the following? Will type inference infer TKey, TValue from k:v?

C#
d = ["Alice": 42, "Bob": 43].AsDictionary(comparer);

static class Extensions
{
    public static Dictionary<TKey, TValue> AsDictionary<TKey, TValue>(
        this List<KeyValuePair<TKey, TValue>> list,
        IEqualityComparer<TKey> comparer = null);
}
  7. Syntax ambiguity (shown above)
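
For the custom-comparer question, a minimal sketch of what such an extension method body might look like is given below. The class name `DictionaryExtensionsSketch` and the helper `LookupIgnoringCase` are hypothetical, and the choice of update semantics (indexer assignment) is an assumption, since the add-versus-update question is still open. The sketch builds the list explicitly, because whether a bare literal could be the receiver depends on natural typing, which is out of scope here.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

static class DictionaryExtensionsSketch
{
    public static Dictionary<TKey, TValue> AsDictionary<TKey, TValue>(
        this List<KeyValuePair<TKey, TValue>> list,
        IEqualityComparer<TKey>? comparer = null)
        where TKey : notnull
    {
        // Pre-size from the list and apply the caller's comparer (null means default).
        var result = new Dictionary<TKey, TValue>(list.Count, comparer);
        foreach (KeyValuePair<TKey, TValue> pair in list)
            result[pair.Key] = pair.Value; // assumption: later duplicates overwrite
        return result;
    }

    // Usage: a case-insensitive dictionary built from explicit pairs.
    public static int LookupIgnoringCase()
    {
        var pairs = new List<KeyValuePair<string, int>> { new("Alice", 42), new("Bob", 43) };
        Dictionary<string, int> d = pairs.AsDictionary(StringComparer.OrdinalIgnoreCase);
        return d["ALICE"];
    }
}
```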

Spreads

Corresponding 'construction' form of the 'slice' pattern.

C#
if (x is [var start, .. var middle, var end]) // pattern form:
    
List<string> values = ["start", .. middle, "end"]; // translates to potentially:
    
List<string> values = new List<string>(capacityIfKnown);
values.Add("start");
values.AddRange(middle);
values.Add("end");
    
// or potentially:
    
List<string> values = new List<string>(capacityIfKnown);
values.Add("start");
foreach (var v in middle)
    values.Add(v);
values.Add("end");
  1. Should we use AddRange if available? What if it boxes? We could leave it underspecified and use compiler heuristics. This fits into the overall discussion of the presumption that "construction" semantics are well-behaved (like pattern matching semantics).

Pros: This will commonly be quite fast for many collections passed to AddRange. AddRange often uses TryGetNonEnumeratedCount and CopyTo to update internal capacity, and then write directly into it.

Cons: There are definitely cases where this will allocate more than direct foreach (for example, if the AddRange impl cannot use CopyTo, and must instead create a heap allocated IEnumerator). This will box in the case where 'middle' is a struct. We could detect that and foreach in that case.
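
The two spread lowerings discussed above can be written by hand to compare them. This is only a sketch with hypothetical names (`SpreadLoweringSketch`, `WithAddRange`, `WithForeach`); it shows that both produce the same contents and where the enumeration strategy differs, not which one the compiler would pick.

```csharp
using System.Collections.Generic;

static class SpreadLoweringSketch
{
    // AddRange form: middle is passed as IEnumerable<string>. List<T>.AddRange
    // checks for ICollection<T> and copies in bulk when the source supports it.
    public static List<string> WithAddRange(IEnumerable<string> middle)
    {
        var values = new List<string>();
        values.Add("start");
        values.AddRange(middle);
        values.Add("end");
        return values;
    }

    // foreach form: with a statically typed List<string>, foreach binds to the
    // struct enumerator List<string>.Enumerator, so no enumerator is heap-allocated.
    public static List<string> WithForeach(List<string> middle)
    {
        var values = new List<string>();
        values.Add("start");
        foreach (string v in middle)
            values.Add(v);
        values.Add("end");
        return values;
    }
}
```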

  2. Should evaluation of spread elements be lazy? For instance:
C#
static IEnumerable<int> GetIntegers()
{
    for (int i = 0; ; i++)
        yield return i;
}

IEnumerable<int> e = [..GetIntegers()];
int first = e.First();
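
To make the stakes of that question concrete, the sketch below hand-writes a lazy and an eager strategy for `[..source]`. The names `SpreadEvaluationSketch`, `LazySpread` and `EagerSpread` are hypothetical, and neither method is claimed to be what the compiler would emit.

```csharp
using System.Collections.Generic;
using System.Linq;

static class SpreadEvaluationSketch
{
    // Unbounded sequence, as in the example above.
    public static IEnumerable<int> GetIntegers()
    {
        for (int i = 0; ; i++)
            yield return i;
    }

    // Lazy strategy: nothing is enumerated until the caller iterates the result.
    public static IEnumerable<int> LazySpread(IEnumerable<int> source)
    {
        foreach (int value in source)
            yield return value;
    }

    // Eager strategy: the whole source is copied into a list before returning.
    public static IEnumerable<int> EagerSpread(IEnumerable<int> source)
    {
        return new List<int>(source);
    }
}
```

With the lazy strategy, `SpreadEvaluationSketch.LazySpread(SpreadEvaluationSketch.GetIntegers()).First()` returns 0. With the eager strategy, the same call on `GetIntegers()` would not return normally, because copying an unbounded sequence cannot finish; it only works on a bounded source such as `GetIntegers().Take(3)`.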