Nullability improvements working group meeting for November 22nd, 2022

We continued our discussion on nullable LINQ analysis, focusing particularly on introducing a "filtering" attribute which allows nullable state within predicate lambdas to flow out of the resulting collection.

We used the following attribute declaration as a starting point:

namespace System.Diagnostics.CodeAnalysis;

/// <summary>
/// Can be applied to a parameter of a delegate type such as <see cref="Func<object, bool>"/> which has a single parameter and returns <see langword="bool"/>.
///
/// When this attribute is used, the language assumes that the predicate returns true for all values of its input type which are returned by the containing method.
/// </summary>
[AttributeUsage(AttributeTargets.Parameter, Inherited = false)]
public sealed class NullableFilterAttribute : Attribute { }

Where usage would look like the following:

using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;

namespace System.Linq;

public static class Enumerable
{
    public static IEnumerable<TSource> Where<TSource>(
        this IEnumerable<TSource> source,
        [NullableFilter] Func<TSource, bool> predicate)
    {
        // ...
    }
}

When a [NullableFilter] parameter is used, it is associated with the parameter type of the delegate. In this case, that parameter is predicate, the delegate parameter's type is TSource, and the containing method is Where. All values of type TSource which flow out of Where are assumed to have been passed in to a call to predicate, and a true value was returned from each call.

One of the first bits of feedback was on the name. We think this attribute is not itself about nullability. It's just that if lambdas passed as arguments for the predicate parameter do null tests, then null state information about the collection elements can propagate through. Other forms of static analysis could easily take advantage of the same assumption, that the predicate returned true for all elements returned from the containing method. Therefore, we think perhaps the attribute name [Filter] or [Predicate] would be more suitable.

We want to ensure we consider the suitability of this attribute for other APIs in System.Linq.Enumerable that have overloads that apply a predicate to a sequence, such as:

First
Last
TakeWhile
...

Relating to the above, we'd like to consider what the impacts are with various signatures, especially with various return types. For example, for Collection<T, T> Method<T>([NullableFilter] Func<T, bool> predicate), do both of the Ts in the return type get the additional flow state from the predicate?

We're not sure whether we should have both "predicate returned true" and "predicate returned false" flavors of the attribute. If the implementation wanted to guarantee that a predicate returned false, they would need something like [Filter(predicateReturnValue: false)]. If we're not able to find compelling examples where people would want to adopt in this manner, we'll probably want to stick with just the "predicate returns true" flavor.

We also briefly considered whether this attribute can affect other outputs besides returns. For example, what if you had a method void Where(IEnumerable<T>, [NullableFilter] Func<T, bool>, out IEnumerable<T>)? At this point, we think there isn't a scenario involving out parameters, but we'll want to keep it in mind as we proceed with the design. It's worth noting that attributes like [NotNullIfNotNull] can be applied to both return values and out parameters (where the attributed thing is the output affected by some named input.)

We noted that some indirect usages which rely on a more "set theory" approach are unlikely to result in useful changes to flow analysis.

csharp

// Bad usage that would be a problem if the attribute is on the method, not on the lambda expression:
var allNullValues = someCollection.Where(i => i is null); // IEnumerable<Widget?>
allNullValues[0].ToString(); // warn because the predicate didn't contain a null check.
var dataToWrite = someCollection.Except(allNullValues); // lost track

We are tentatively thinking of the state-propagation which occurs for a filtering attribute as modifying/reinferring type arguments present in output positions.

record Widget(string? Prop) { }

var en = widgets.Where(widget => widget.Prop != null); // IEnumerable<Widget+__predicateReturnedTrue>
foreach (var item in en)
{
    item.Prop.ToString(); // ok
}

// lambda for 'lastChance' can't assume T passed the filter (it didn't in this case)
public static IEnumerable<T> Where(IEnumerable<T> source, [Filter] Func<T, bool> primary, Func<T, bool> lastChance)
{
    foreach (T item in source)
    {
        if (primary(item)) yield return item;
        else if (lastChance(item)) yield return item;
    }
}

We considered would happen when a filtered enumerable en is enumerated multiple times, potentially modifying its contents. Our conclusion is basically that we don't want to attempt to track changes to the collection element across statements. We think it is best to avoid any scheme where the initial state of elements in en depends on where en is used.

var en = widgets.Where(widget => widget.Prop != null);
foreach (var item in en)
{
    item.Prop = null;
}
foreach (var item in en)
{
    item.Prop.ToString(); // no warning and NRE at runtime
}

We think the NRE present above is similar to aliasing scenarios which already don't work:

var item = new Widget("a");
var item1 = item;
item1.Prop = null; // doesn't apply to `item`
item.Prop.ToString(); // NRE but no warning

We also note that the language assumes that member method calls don't change the state of fields, unless specifically annotated in the signature:

var en = widgets.Where(widget => widget.Prop != null);
// When does the compiler decide a mutation has taken place?
// mutation and repeated enumeration
foreach (var item in en)
{
    item.M();
}
foreach (var item in en)
{
    item.Prop.ToString(); // ?
}

// similar to the following method scenario:
local(item);
item.Prop.ToString(); // kaboom, no warning
void local(Widget i) => i.Prop = null;

There is still some ambiguity about when the collection element flow state disappears, particularly when the collection element state is contained in some mutable collection which we go on to mutate.

var en = widgets.Where(widget => widget.Prop != null);
var arr = en.ToArray(); // T[] ToArray<T>(this IEnumerable<T>)
arr[0].Prop.ToString(); // ok? Do we remember that we passed the `widget.Prop != null` test?
foreach (var item in arr)
{
    item.Prop.ToString(); // ok
}

// ToArray() followed by mutation
// what happens here? do we just accept it?
// Is this itself a warning--this is not a *Widget-with-non-null-Prop*, it's a *Widget-with-null-Prop*.
arr[0] = new Widget(null); 
foreach (var item in arr)
{
    item.Prop.ToString(); // do we still expect non-null-Props here?
}

// Concat()
var en1 = en.Concat(new Widget(Prop: null)); // "unify" into a standard Widget type
foreach (var item in en1)
{
    item.Prop.ToString(); // warning
}

// Null tests through methods with nullable postcondition attributes
var en1 = widgets.Where(widget => !string.IsNullOrWhitespace(widget.Prop));
foreach (var item in en1)
{
    item.Prop.ToString(); // ok (we expect the compiler to understand that widget.Prop is not-null when the predicate returns true)
}

We think that this scheme should be naturally composable, and for example, allow a sequence of Where() calls to just do the right thing.

record Widget(string? Prop) { }

var x = en.Where(w => w is not null) // IEnumerable<Widget>
          .Where(w => w.Prop is not null); // IEnumerable<Widget+element-flow-state>

There won't be checking of the implementation for correctness, so a Where method like the following will result in a misleading analysis at the call-site:

IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> input, Func<TSource, bool> predicate)
{
    foreach (var item in input)
    {
        if (predicate(item)) { } // ignore result of the predicate call
        yield return item;
    }
}

As mentioned earlier, we want to take care to avoid inventing flow-dependent typing here. So there is some question on a technical level about whether to associate the collection element slot information with variables and expressions or with types. Also, we don't think this collection element information should appear in the public API for types.

Review of https://github.com/dotnet/csharplang/blob/main/proposals/csharp-9.0/nullable-reference-types-specification.md could be helpful when we go deeper in attempting to formalize this.

Implementation brainstorming

A subset of the working group brainstormed briefly about how the implementation would work. This part is more speculative than the rest, and depends on the design leading to it being fully formed and approved by the language design team.

The NullableWalker in Roslyn has a structure called VisitResult which is produced whenever an expression is visited. It currently contains an rvalue and lvalue flow state, and we think with this feature we might augment it to also include a map of "filtered type" to "element flow state".

For example, en.Where(w => w.Prop is not null) would have an entry for the type of w which points to the final state-when-true after w.Prop is not null. Then, when we access a member off of en.Where(...) whose type is a filtered type, we can get the slot for that element flow state and use it as the initial state of variables of the filtered type.

In our next meeting we plan to dive deeper into the mechanics of the filtering attribute, and hopefully reach a point where we understand the solution we want well enough to write a proposal for it.