Collection expressions: LDM proposals 2023-08-14

Overload resolution

Collection expressions can be implicitly converted to multiple target types, which increases the chance of overload resolution ambiguities.

We may not be able to choose between two arbitrary collection types when neither collection type is implicitly convertible to the other.

For performance though, we could consider choosing spans over arrays or interfaces. The rule could apply to collection expression arguments only, to avoid a breaking change to existing code.

SpanAndArray([1, 2, 3]);  // (proposed) uses Span<T> overload
SpanAndInterface([4, 5]); // (proposed) uses Span<T> overload

static void SpanAndArray<T>(Span<T> args) { }
static void SpanAndArray<T>(T[] args) { }
    
static void SpanAndInterface<T>(Span<T> args) { }
static void SpanAndInterface<T>(IEnumerable<T> args) { }

For instance, an additional rule could be added to better conversion from expression.

Given an implicit conversion C₁ that converts from an expression E to a type T₁, and an implicit conversion C₂ that converts from an expression E to a type T₂, C₁ is a better conversion than C₂ if one of the following holds:

...

C₁ and C₂ are collection expression conversions and the following hold:

T₁ is a span type Span<S> or ReadOnlySpan<S>.

T₂ is a single-dimensional array type E[] or an interface IEnumerable<E>, IReadOnlyCollection<E>, IReadOnlyList<E>, ICollection<E>, IList<E>.

S is implicitly convertible to E.

In practice though, pairs of overloads with spans and either arrays or interfaces may be uncommon, at least in the BCL, because overload resolution prefers the array and interface overloads when passing array arguments.

SpanAndArray(new[] { 1, 2, 3 });  // uses T[] overload
SpanAndInterface(new[] { 4, 5 }); // uses IEnumerable<T> overload

Type inference

See proposal: Type inference

Method type inference involving collection expressions allows inferring the iteration type of a collection type. There is no support for inferring the containing collection type.

The iteration type is defined by foreach — it is the type of the foreach-able item. If a collection initializer type implements IEnumerable only and does not otherwise implement the GetEnumerator() pattern, the iteration type is object.

Inferences are made to the iteration type, independently for each element.

AsArray([null]);             // error: cannot infer T
AsArray([1, 2, 3]);          // AsArray<int>(int[])

static T[] AsArray<T>(T[] arg) => arg;

byte b = 1;
int i = 2;

ArrayAndValue(new[] { b }, i); // error: cannot infer T
ArrayAndValue([b], i);         // ArrayAndValue<int>()

static void ArrayAndValue<T>(T[] x, T y) { }

The type inference change is handled with two rules — one rule for input type inference and a matching rule for output type inference: see proposal.

Input type inference is necessary for walking into the element expressions to infer from explicitly-typed lambda parameters, from tuple element types, or from nested collection expressions.

InputTypeInference([(int y) => { }]); // InputTypeInference<int>()

static void InputTypeInference<T>(List<Action<T>> x) { }

Output type inference is necessary for inferring from the return type of lambdas and method groups in particular.

OutputTypeInference([() => 1]);       // OutputTypeInference<int>()

static void OutputTypeInference<T>(List<Func<T>> x) { }

Conversions

See proposal: Conversions

There is a collection expression conversion from a collection expression to the following types. These represent the valid target types for a collection expression.

single-dimensional arrays
span types
types with a builder method
generic interfaces implemented by List<T>
collection initializer types

int[]               a = [1]; // array
ReadOnlySpan<int>   b = [2]; // span
ImmutableArray<int> c = [3]; // builder
IReadOnlyList<int>  d = [4]; // generic list interface
List<int>           e = [5]; // collection initializer

For all but collection initializer types, there must be an implicit conversion to the iteration type T for each expression element in the collection expression, and an implicit conversion to theT for the iteration type of each spread element.

string[] a = [x, y, z];
int[] b = [1, null];           // error: cannot convert 'null' to 'int'
ImmutableArray<int> c = [..a]; // error: cannot convert 'string' to 'int'

For collection initializer types, there must be an applicable instance or extension Add method for each expression element in the collection expression, and an applicable Add method for an argument of the iteration type of each spread element.

MyCollection m = [1, "2", (object)3]; // error: no 'Add(object)' method found

class MyCollection : IEnumerable
{
    public void Add(int i) { ... }
    public void Add(string s) { ... }
    public IEnumerator GetEnumerator() { ... }
}

Collection initializer types that implement IEnumerable only, are the one case of a target type without a strongly-typed iteration type.

Adding target types is a breaking change for overload resolution and type inference.

Question: Should "generic interfaces implemented by List<T>" be an explicit set instead?

Question: Support conversions to Memory<T> and ReadOnlyMemory<T>?

Question: Support conversions to inline array types? We'd need to validate the collection length at compile for known length collections.

Ref safety

See proposal: Ref safety

For collection expressions of ref struct target types, the compiler may allocate the storage for the collection on the callstack when the following hold:

The collection expression has local scope
The collection has a known length and is not empty
The runtime supports inline array types

An empty collection requires no allocation for storage and has global scope.

A ReadOnlySpan<T>, where T is one of several primitive types, and where the collection expression contains constant values only, is stored in the assembly data section and does not allocate at the use-site, and therefore has global scope.

The compiler will use heap allocation if the runtime does not support inline array types. The working group considered using stackalloc instead on older runtimes but managing stackalloc buffers would require unnecessary effort for what is otherwise an unsupported scenario.

Print([]);               // empty collection, no allocation
Print([1, 2, 3]);        // primitive constants, assembly data, global scope
Print([x, y, z]);        // stack allocation, local scope
Print((int[])[1, 2, 3]); // heap allocation, global scope

static void Print(ReadOnlySpan<object> values) { ... } // argument is implicitly scoped

The span argument to a builder method may be allocated on the callstack.

ImmutableArray<int> ia = [x, y, z]; // stack allocate Create() argument

[CollectionBuilder(typeof(ImmutableArray), "Create")]
public struct ImmutableArray<T> { ... }

public static class ImmutableArray
{
    public static ImmutableArray<T> Create(ReadOnlySpan<T> values) { ... }
}

What about other uses of collection expressions as ref struct instances where the scope is not clear from the immediate context? Should those collection expressions be considered local or global scope?

static ReadOnlySpan<T> AsSpan2<T>(T x, T y)
{
    Span<int> s = [x, y]; // local scope or global?
    return s;             // error if local scope
}

For that question, we considered three options - see meeting notes:

Use scope of target
Determine scope of target based on usage
Use local scope always

Option 1 may go against user intuition that a collection expression represented as a span can be allocated on the stack, and as a result, it may be a pit of failure. To mitigate that, the compiler should report a diagnostic if heap allocation is required.

Option 2 seems fragile since minor changes in the method (or external method signatures) may result in a collection expression being allocated on the heap. We'd need the diagnostic from option 1 to mitigate this. This option would also require the compiler to include flow analysis when checking ref safety.

Option 3 ensures collection expressions that directly target spans can be allocated on the stack. However, this means such spans are not returnable unless the user explicitly converts to a heap-allocated type.

static ReadOnlySpan<T> Option3_AsSpan2<T>(T x, T y)
{
    return [x, y];    // error: span may refer to stack data
}

static ReadOnlySpan<T> Option3_AsSpan3<T>(T x, T y, T z)
{
    return (T[])[x, y, z]; // ok: span refers to T[] on heap
}

The working group recommendation is option 3: treat the collection expression as local scope, unless empty or cached in assembly data section.

Question: When stack allocation cannot be used for a collection expression with span target type, should we fall back to a diagnostic and/or heap allocation?