Collection literals working group meeting for July 26, 2023

meetings/working-groups/collection-literals/CL-2023-07-26.md

Agenda

  • [CollectionBuilder] Create methods returning a derived/implementing type
  • Considering extension GetEnumerator() when determining element type
  • Inline array sizes to synthesize

Discussion

[CollectionBuilder] Create methods returning a derived/implementing type

For the BCL, this mainly concerns the IImmutableXyz<T> interfaces. The compiler will have its own rules for which concrete types to use when a collection expression targets certain interfaces such as IReadOnlyList<T>. If we don't allow [CollectionBuilder] to be placed on the IImmutableXyz<T> interfaces, collection expressions won't be able to target them.

We're lukewarm on the importance of the IImmutableXyz<T> interfaces themselves, but we don't feel great about blocking people from making their own interface type constructible using [CollectionBuilder]. It feels like a toe-stubbing situation, and the cost to the compiler to allow this is very low.

Then if we do allow [CollectionBuilder] to be placed on interface types, the runtime would go ahead with adding [CollectionBuilder] to the IImmutableXyz<T> interfaces so long as it would be valid for the attribute to point to the existing Create methods that return concrete types. For instance, [CollectionBuilder] on IImmutableList<T> would point to the Create method whose return type is the concrete ImmutableList<T> rather than the interface type. We would thus want to do both things together: allow [CollectionBuilder] on interface types, and allow the Create method to return a type which derives from or implements the attributed type.
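
A minimal sketch of the shape being discussed, assuming both relaxations are accepted. IBag<T>, ArrayBag<T>, and Bag are hypothetical names invented for illustration, not BCL types; whether the collection expression in Demo.Make compiles depends on the outcome of the language design decision.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical user-defined interface made constructible via [CollectionBuilder].
[CollectionBuilder(typeof(Bag), nameof(Bag.Create))]
public interface IBag<T> : IEnumerable<T>
{
    int Count { get; }
}

// Hypothetical concrete implementation of the interface.
public sealed class ArrayBag<T> : IBag<T>
{
    private readonly T[] _items;

    public ArrayBag(ReadOnlySpan<T> items) => _items = items.ToArray();

    public int Count => _items.Length;

    public IEnumerator<T> GetEnumerator() => ((IEnumerable<T>)_items).GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class Bag
{
    // The return type is the concrete ArrayBag<T>, not the attributed IBag<T>.
    // This is the second relaxation: the result only needs to implement the attributed type.
    public static ArrayBag<T> Create<T>(ReadOnlySpan<T> items) => new ArrayBag<T>(items);
}

public static class Demo
{
    // Assumption: the compiler accepts the interface as a collection expression target
    // and routes construction through Bag.Create.
    public static IBag<int> Make() => [1, 2, 3];
}
```

This mirrors the IImmutableList<T> case: the attribute sits on the interface while the existing factory keeps its concrete return type.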

Conclusion: Ask at a language design meeting whether to allow [CollectionBuilder] on interface types. If the answer is yes, the runtime is only willing to make use of this if the attribute can point to Create methods whose return types are concrete implementing types rather than interface types.

Considering extension GetEnumerator() when determining element type

The spec currently says that the element type of a collection literal is the iteration type of the collection type. However, there's a downside to considering extension GetEnumerator methods: you can no longer tell whether the collection type is a valid target for a collection expression by looking at the collection type declaration itself. The iteration type of that collection type is dependent on the using directives where the collection expression is written because the using directives may or may not bring in a GetEnumerator extension method.

This makes it impossible for the compiler to provide a diagnostic for an obvious mistake where [CollectionBuilder] is applied to a type that declares no GetEnumerator method of its own. After all, a usage site could just happen to import an extension method which makes collection expressions compile when targeted to this collection type. If we rule this out by disallowing extension GetEnumerator, the situation becomes unequivocally invalid, and all diagnostics can be provided on the collection type declaration rather than the usages of collection expressions with that collection type.
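
The dependence on using directives can be seen with foreach alone, which already honors extension GetEnumerator methods. The sketch below uses hypothetical types (Box, BoxExtensions) and deliberately leaves out [CollectionBuilder]; it only shows that whether Box has an iteration type at all depends on what the usage site imports.

```csharp
using System.Collections.Generic;

namespace Library
{
    // Hypothetical collection type that declares no GetEnumerator of its own.
    public sealed class Box
    {
        public int[] Items { get; }

        public Box(int[] items) => Items = items;
    }
}

namespace Extensions
{
    using Library;

    public static class BoxExtensions
    {
        // Extension GetEnumerator: visible only where this namespace is imported.
        public static IEnumerator<int> GetEnumerator(this Box box) =>
            ((IEnumerable<int>)box.Items).GetEnumerator();
    }
}

namespace App
{
    using Library;
    using Extensions; // Without this directive, the foreach below does not compile.

    public static class Demo
    {
        public static int Sum(Box box)
        {
            int total = 0;
            foreach (int item in box)
            {
                total += item;
            }
            return total;
        }
    }
}
```

Looking at the declaration of Box alone, nothing indicates whether it is enumerable, which is the property that makes a declaration-site diagnostic impossible if extension methods are considered.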

Conclusion: Do not consider extension GetEnumerator methods when determining the element type of a collection expression. Provide an error diagnostic if [CollectionBuilder] is applied to a type that doesn't declare its own GetEnumerator method.

Inline array sizes to synthesize

The current plan is to use inline arrays in the compiler-generated code. (stackalloc complicates codegen because the operation should be hoisted outside of loops. It also only works for unmanaged element types.)

An inline array requires a different type to be declared for each array length. If the BCL declares some well-known inline array types for certain array lengths, the compiler can use those, but array lengths not declared by the BCL require the compiler to generate types in the private implementation details of the assembly being compiled. A lot of inline array types could end up being generated per assembly which uses collection expressions.
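
For illustration, a hand-written inline array type for length 4 might look like the following. Buffer4<T> is a hypothetical name; the types the compiler or the BCL would actually declare are not specified in these notes. A collection with five elements would need a separate type marked [InlineArray(5)].

```csharp
using System;
using System.Runtime.CompilerServices;

// One type per length: the length is fixed by the attribute argument.
[InlineArray(4)]
public struct Buffer4<T>
{
    // An inline array type declares exactly one instance field; the runtime repeats it 4 times.
    private T _element0;
}

public static class Demo
{
    public static int SumOfFour(int a, int b, int c, int d)
    {
        Buffer4<int> buffer = default;   // lives on the stack, no heap allocation
        Span<int> span = buffer;         // an inline array variable converts implicitly to Span<T>
        span[0] = a;
        span[1] = b;
        span[2] = c;
        span[3] = d;

        int total = 0;
        foreach (int value in span)
        {
            total += value;
        }
        return total;
    }
}
```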

Do we want to try to cut down on the number of inline array types, for instance by only generating in powers of two and overallocating on the stack but slicing to the needed size? Also, should there be a cutoff where above a certain size, the heap is always used?
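
A rough sketch of the power-of-two idea raised here, written by hand rather than taken from any compiler output. Slots4<T> and BucketFor are hypothetical names: three elements are stored in a four-slot inline array, and the span is sliced down to the needed length.

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;

// Hypothetical four-slot inline array shared by collections of length 3 and 4.
[InlineArray(4)]
public struct Slots4<T>
{
    private T _element0;
}

public static class Demo
{
    // Rounds an element count up to the next power of two to pick a shared buffer size.
    public static int BucketFor(int count) =>
        (int)BitOperations.RoundUpToPowerOf2((uint)count);

    public static int SumOfThree(int a, int b, int c)
    {
        Slots4<int> storage = default;      // all four slots are zeroed, only three are used
        Span<int> all = storage;
        Span<int> span = all.Slice(0, 3);   // slice to the needed size
        span[0] = a;
        span[1] = b;
        span[2] = c;

        int total = 0;
        foreach (int value in span)
        {
            total += value;
        }
        return total;
    }
}
```

The unused fourth slot is still zero-initialized, which is the overallocation cost weighed in the discussion that follows.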

The performance downside of overallocating (using oversized inline arrays) is time spent zeroing memory. We don't want to leave performance on the table. The number of types being generated is not a concern. The compiler already generates large numbers of types of various fixed sizes for other features. If it turns out there are too many, the BCL will declare some of them.

We're not going to get it right on the first release. We can start simple, react to performance feedback from C# 12, and change the strategy as needed.

Conclusion: When the element count is known at compile time, always use an exactly-sized inline array and never use the heap. When the element count can vary at runtime, always allocate on the heap.