`ref` parameters, arguments, returns and `val` returns

Pull request

Abstract
Problem
Background
Proposal
- ref bindings
- ref and val returns
Details
Future work
Rationale
Alternatives considered

Abstract

A parameter binding can be marked ref instead of var or the default. It will bind to reference argument expressions in the caller and produces a reference expression in the callee.
- Unlike pointers, a ref binding can't be rebound to a different object.
- This replaces addr, and is not restricted to the self parameter.
- A ref binding, like a value binding, can't be used in fields of classes or structs.
- When calling functions, arguments to non-self ref parameters are also marked with ref.
The return of a function can optionally be marked ref, val, or var. These control the category of the call expression invoking the function, and how the return expression is returned.
- These may be mixed for functions returning tuple or struct forms.
Any parameters whose lifetime needs to contain the lifetime of the return must be marked bound.
The address of a ref binding is nocapture and noalias.
We mark parameters of a function that may be referenced by the return value with bound.

Problem

Reference bindings have come up multiple times:

as a better alternative to addr self: Self*;
for use in lambda captures;
to model different kinds of C++ references for interop and migration;
to support nested bindings within a destructured var, see issue #5250 and proposal #5164;
for forwarding arguments while preserving expression category;
to add a feature to pattern matching to modify things after they have been matched;
to support refactoring code without changing all the uses of a name, a problem we are already seeing with self and addr self, and would be a point of friction in local pattern matching in the future; and
to support breaking up an expression into pieces without altering the expression category of individual pieces.

Reference returns have also come up before, particularly to support operators such as indexing [...] and other functions that should produce a reference expression. It is desirable, though, that this not introduce new memory unsafety concerns, due to returning a reference to something with insufficient lifetime.

In addition, we have been interested in adding other return mechanisms that support returning values in registers in cases that our current convention won't.

Background

Carbon has reference expressions.
Using the addr keyword on mutating methods to get a self with a pointer type was introduced in proposal #722: "Nominal classes and methods".
Leads issue #5261: "We should add ref bindings to Carbon, paralleling reference expressions" supports adding ref bindings to Carbon.
LLVM's noalias attribute is used to mark a pointer as being aliased in only limited ways to enable optimization. Also see LLVM's pointer aliasing rules.
Marking a pointer as not captured, to allow optimizations, was originally done with LLVM's nocapture attribute, which has become captures(none) and captures(ret: address, provenance), which is governed by pointer capture rules.
Clang allows C++ code to use the clang::lifetimebound attribute to mark parameters that may be referenced by the return value, in order to detect some classes of use-after-free memory-safety bugs.
C++ has reference types.

Proposal

`ref` bindings

We introduce a new keyword ref. This may be added to a : binding to mark it as binding to a reference expression, as in:

carbon

fn F(ptr: i32*) {
  // A reference binding `x`.
  let ref x: i32 = *ptr;

  // Use of `x` is a reference expression that
  // refers to the same object as `*ptr`.
  Assert(&x == ptr);

  // Equivalent to `*ptr += 1;`.
  x += 1;
}

fn G() {
  var y: i32 = 2;
  F(&y);
  Assert(y == 3);
}

The use of the name (x in the example) of a ref binding forms a durable reference expression. We ensure that reference expressions formed by way of reference bindings do not dangle. A ref binding may only bind to a durable reference expression or an expression that can be converted to one. The bound durable reference expression must outlive the ref binding.

The address of a ref bound name gives the address of the bound object, so &x == ptr above. The reference itself does not have an address, and unlike a pointer can't be rebound to reference a different object.

We remove addr, and use instead use ref for the self parameter when an object is required. Note that the type will change from Self* to Self in this case. In the future, we might re-add addr back if needed.

carbon

class C {
  // ❌ No longer valid.
  fn OldMethod[addr self: Self*]() {
    // Previously would dereference `self` in
    // the body of the method.
    self->x += 3;
  }

  // ✅ Now valid.
  fn NewMethod[ref self: Self]() {
    // Now `self` is a reference expression,
    // and is not dereferenced.
    self.x += 3;
  }

  // ✅ Other uses are unchanged.
  fn Get[self: Self]() -> i32 {
    return self.x;
  }

  var x: i32;
}

Potentially abbreviating the syntax further (to allow ref self as a short form of ref self: Self) is left as future work.

The ref modifier is allowed on : bindings that are not:

inside a var pattern,
a field of a class type, or
a field of a struct type.

carbon

fn AddTwoToRef(ref x: i32) {
  x += 1;
  let ref y: i32 = x;
  y += 1;
}

// Equivalent to:
fn AddTwoToRef(ref x: i32) {
  x += 1;
  let y_ptr: i32* = &x;
  *y_ptr += 1;
}

We add support for ref and var in a struct pattern when using the shorthand a: T syntax for .a = a: T:

carbon

let {var a: i32, ref b: i32} = ...;

// Now equivalent to:
let {.a = var a: i32, .b = ref b: i32} = ...;

Note: This takes us one step closer to { ambiguity. Previously we could distinguish between a struct literal/pattern and a non-empty block with only up to two tokens of lookahead (the struct cases start with . or _ or identifier followed by :, and the block cases don't). Now we have things like:
carbon
fn F() -> X { var a: i32 = 0; }
... where we're getting incrementally closer to ambiguity. We've got a few more steps before we get there, though, since we don't have an X{...} expression yet, and var ... is only allowed in struct patterns rather than struct expressions. So we're still fine, but this is cutting down our options for future syntactic expansion a little.

The ref modifier is forbidden on the bindings in class or struct type fields.

var outer_size: i32 = 123;

class Invalid {
  // ❌ Invalid. We don't currently have runtime `let` bindings in classes,
  // or `ref` on `var`s, but the intent is to not have `ref` bindings as fields.
  let ref invalid_ref_field: i32 = outer_size;
}

// ❌ Invalid.
var invalid_struct_type_field:
    {ref .invalid: i32} = {.invalid = outer_size};

In a function argument list, arguments to non-self ref parameters are also marked with ref. Continuing the example:

carbon

var z: i32 = 3;
AddTwoToRef(ref z);
Assert(z == 5);

// No `ref` though on the `self` argument.
var c: C = {.x = 4};
c.NewMethod();
Assert(c.Get() == 7);

Note: It is important that this restriction is syntactic, not just semantic, because it means that ref is never the first token of a full expression, and so we know without lookahead that a ref in a pattern context must be the start of a binding pattern, not the start of an expression pattern.

Normally an argument to a non-ref parameter should not be marked ref, but it is allowed in a generic context where the parameter may sometimes be ref.

Expression operators will mostly not take ref parameters, with these exceptions:

the address-of operator &;
the first operand of the indexing operator [...]; and
the member access operator introduced in proposal #3720: "Member binding operators" ..

The statement operators now use ref instead of pointers:

the left-hand operand of assignment operators such as = and +=.
the ++ and -- operators.

Even in these cases, the arguments will not be marked with ref at the call site. (Generally the ref parameter is the self parameter, and so wouldn't be marked. The exception is BindToRef, but we don't want to mark its argument with ref.)

As an experiment, we are saying a pointer formed by taking the address of a ref bound name is LLVM-captures(none) and LLVM-noalias. This means that while a ref parameter could be passed into a function by address, the restrictions also allow a "move-in-move-out" approach (once we define the move operation), assuming it is not marked bound. The intent here is to leave the door open to a calling convention using registers and less indirection for small-enough objects.

This means that the following code is invalid:

carbon

fn F(ref a: i32, ref b: i32) -> bool;

fn G() -> bool {
  var v: i32 = 1;
  return F(ref v, ref v);
}

Enforcing this restriction will be part of the memory safety story. Until then, doing this is erroneous behavior. This means that the compiler won't use those LLVM attributes unless the compiler can itself prove that the restrictions hold.

`ref` and `val` returns

The return of a function can optionally be marked ref or val. These control the category of the call expression invoking the function, and how the return expression is returned.

carbon

var global: i32 = 2;
fn ReturnRef() -> ref i32 {
  // ❌ Invalid: return 2;

  // ✅ Valid: return a reference expression with
  // sufficient lifetime.
  return global;
}
// Call `ReturnRef` and use the resulting reference.
ReturnRef() += 3;
Assert(global == 5);

// Result of `ReturnRef` can be bound using a `ref`
// binding.
fn AddFive() {
  let ref r: i32 = ReturnRef();
  r += 5;
}
AddFive();
Assert(global == 10);

fn ReturnVal() -> val i32 {
  return 2;
}
// ReturnVal() is a value expression.
let l: i32 = ReturnVal();

// Returning an initializing expression is the
// default.
fn ReturnDefault() -> i32 {
  return 2;
}
// ReturnDefault() is an initializing expression.
var j: i32 = ReturnDefault();

// Use `var` to explicitly specify returning an
// initializing expression.
fn ReturnVar() -> var i32 {
  return 2;
}
// `ReturnVar()` is the same as `ReturnDefault()`.

A call to a function declared -> ref T is a durable reference expression. The generated code for that function will return the address of a T object.
A call to a function declared -> val T is a value expression. The function will return the value representation of T. Since values have no address, the value representation may be returned in registers.
The behavior of a call to a function declared -> T is unchanged. It is an initializing expression, returning in place or by copy depending on the initializing representation of T. This is the same behavior as -> var T.
The behavior of auto as the return type is unchanged, but now supports an optional ref, val, or var between the -> and auto. -> auto continues to return an initializing expression, as does -> var auto. -> val auto returns a value expression, and -> ref auto returns a durable reference expression.
Using => to specify a return continues to return an initializing expression, as before. See this relevant alternative considered.

A function may have multiple returns, each with their own marker, by using a tuple or struct compound return form.

carbon

fn TupleReturn() -> (val bool, ref i32, C) {
  return (true, global, {.x = 3});
}

fn StructReturn()
    -> {.a: val bool,
        .b: ref i32,
        .c: C} {
  return {.a = true,
          .b = global,
          .c = {.x = 3}};
}

If the return of a function may reference the storage of one or more parameters to the function, those parameters must be marked bound. This allows the compiler to diagnose if the function's return is used after the lifetime of the bound parameter ends. The semantics of bound are intended to match the clang::lifetimebound attribute.

carbon

fn Member(bound ref c: C) -> ref i32 {
  return c.x;
}

// Lifetime of a pointer includes the lifetime
// of what it points to.
fn Deref(bound p: i32*) -> ref i32 {
  return *p;
}

fn Both(bound pc: C*) -> ref i32 {
  return p->x;
}

fn Invalid1() -> ref i32 {
  var x: i32 = 4;
  // ❌ Error: returning reference to `x`
  // whose lifetime ends when this function
  // returns.
  return x;
}

fn Invalid2() -> ref i32 {
  var c: C = {.x = 1};
  // ❌ Error: returning reference bound to `c`
  // whose lifetime ends when this function
  // returns.
  return Member(c)
}

The address of a bound ref parameter is the LLVM attribute captures(ret: address, provenance) instead of captures(none).

Details

The intent is that we would encourage using references instead of pointers when possible. Their benefits are related to their limitations, so to get those benefits we should use them when a use is restricted enough to be within those limitations.

Compound return forms and patterns

Mirroring the tuple and struct pattern forms, we also support tuple and struct return forms.

carbon

// `->` begins a "return form"
fn F()     -> <return-form>;

// Within any return form, if the first token is
// `val`, `ref`, `var`, `(`, or `{`, it is not
// treated as type expression:

// Value return, with type as specified
fn Val()   -> val <type-expr>

// Reference return, with type as specified
fn Ref()   -> ref <type-expr>

// Initializing return, with type as specified
fn Var()   -> var <type-expr>

// Tuple compound return, with a list of
// return forms.
fn TupleCompound() -> ( <return-form>, ... )
// Tuple return, with a list of type
// expressions. Used if all members of the
// list are type expressions.
fn Tuple() -> ( <type-expr>, ... )

// Struct compound return, with a mapping from
// designators to return forms.
fn StructCompound()
    -> { .<id>: <return-form>, ... }
// Struct return, used if all of the members
// are type expressions.
fn Struct() -> { .<id>: <type-expr>, ... }

// Otherwise, implicit `var` means returns
// an initializing expression.
fn Other() -> <type-expr>

Note that in the absence of val, ref, and var keywords, the implicit var is placed in the outermost position, minimizing the number of primitive forms returned. So fn F() -> (i32, i32) means fn F() -> var (i32, i32) not fn F() -> (var i32, var i32). Generally the var is left off if not required, and so will be rare in return forms, to minimize confusion with val.

carbon

fn TupleReturn(...)
    -> (val bool, ref i32, C);
// Equivalent to:
//  -> (val bool, ref i32, var C);

let (a: bool, ref b: i32, var c: C)
    = TupleReturn(...);

fn StructReturn(...)
    -> {.a: val bool,
        .b: ref i32,
        .c: C};
// Equivalent to:
//  -> {.a: val bool,
//      .b: ref i32,
//      .c: var C};

// Binds to the names `x`, `y`, `z`:
let {.a = x: bool,
     .b = ref y: i32,
     .c = var z: C} = StructReturn(...);

// Binds to the names `a`, `b`, `c`:
let {a: bool,
     ref b: i32,
     var c: C} = StructReturn(...);

// Above two can be mixed, binding to
// names `a`, `y`, `z`.
let {a: bool,
     .b = ref y: i32
     .c = var z: C} = StructReturn(...);

Only types are allowed after a -> val, -> ref, or -> var, not a compound return form. Examples:

carbon

// Returns a tuple of type
// `(bool, f32, C, i32)`.
fn OneTupleReturn(...)
    -> (bool, f32, C, i32);

// Returns a compound tuple form
fn CompoundReturn(...)
    -> (bool, val f32);
// Equivalent to:
//  -> (var bool, val f32);

// ❌ Invalid, can't specify `ref` inside
// of `val`.
fn Invalid(...) -> val (bool, ref f32);

The compound return forms may be nested, as in:

carbon

fn CompoundInParens(...)
    -> ({.a: bool, .b: val f32}, C, ref i32);
// Equivalent to:
//  -> ({.a: var bool, .b: val f32}, var C, ref i32);

let ({.a = var x: bool, .b = val y: f32},
     var c: C, ref d: i32) = CompoundInParens(...);
// or without renaming:
let ({var a: bool, b: f32},
     var c: C, ref d: i32) = CompoundInParens(...);

// Contrast with a compound tuple form containing
// a struct type (not compound):
fn StructInParens(...)
    -> ({.a: bool, .b: f32}, C, ref i32);
// Equivalent to:
//  -> (var {.a: bool, .b: f32}, var C, ref i32);

fn CompoundInBraces(...)
    -> {.a: bool, .b: (val f32, C), .c: ref i32};
// Equivalent to:
    -> {.a: var bool, .b: (val f32, var C), .c: ref i32};

let {a: bool,
     .b = (x: f32, var y: C),
     ref c: i32} = ParensInBraces(...);

// Contrast with a compound struct form containing
// a tuple type (not compound):
fn TupleInBraces(...)
    -> {.a: bool, .b: (f32, C), .c: ref i32};
// Equivalent to:
    -> {.a: var bool, .b: var (f32, C), .c: ref i32};

This feature is intended to support cases like enumerate that will want to return a value for the index but a reference to the element of the sequence being enumerated.

Nested binding patterns

Since a ref binding may only bind to a durable reference expression, it can't be used to bind the result of a function returning an initializing expression. However, if the initializing expression is bound to a var, any nested patterns are reference binding patterns bound to the subobject, following proposal #5164: "Updates to pattern matching for objects".

For example:

carbon

fn F() -> (bool, (C, i32));
let (b: bool, var (c: C, i: i32)) = F();

is equivalent to:

carbon

fn F() -> (bool, (C, i32));
let (b: bool, var v: (C, i32)) = F();
let ref c: C = v.0;
let ref i: i32 = v.1;

Note that ref is disallowed inside var since that would be redundant.

Mututation restriction on objects bound to a value

Mutation of objects with a non-copy-value representation in an active value binding ("borrowed objects") is erroneous behavior.

Our plan is to prevent mutation of borrowed objects in Carbon's strict safe dialect.
- We should only relax our stance here and consider making such mutation allowed if we discover difficulty with this that we cannot overcome.
- But we should revisit the underlying idea of mutation being erroneous if enforcing it in the strict mode proves fundamentally untenable due to ergonomic costs.
There will always be the potential for unchecked code, either unsafe Carbon code or C++ code by way of interop, to mutate a borrowed object, hence the need to define it as erroneous behavior.
There is no need to ever make it anything more than erroneous behavior, see below.
If we can prove the mutation doesn't occur, then we can use that to optimize under "as-if", and we don't need anything else.
We are deferring the decision of whether strict enforcement is enabled in Carbon's current C++-friendly mode, when not explicitly marking the code as "unsafe."

This was discussed 2025-05-22 and then made the subject of leads issue #5524.

No optimization on erroneous behavior

The Carbon compiler should not optimize on erroneous behavior, ever, unless the compiler literally proves that it does not occur, ever. In which case, we don't need any license to start optimizing on this, as it falls under "as-if".

The fact that undefined behavior ("UB", cppreference, wikipedia) provides "as-if" without a proof is precisely the risk of using UB for any semantics, and why we don't use it here and elsewhere we use erroneous behavior.

`bound` parameters

It is erroneous behavior to return something that references a local object that won't live once the function returns, even if it is a parameter marked bound. Local objects include local variables, temporary objects, and var parameters.

carbon

fn Invalid1() -> i32* {
  var x: i32 = 4;
  // ❌ Invalid.
  return &x;
}

fn Invalid2(bound x: i32) -> ref i32 {
  var y: i32 = x;
  // ❌ Invalid.
  return y;
}

// ✅ Valid
fn Valid1(bound p: i32*) -> ref i32 {
  return *p;
}

fn Invalid3(bound var x: i32) -> ref i32 {
  // ❌ Invalid: lifetime of `var` parameter
  // ends when function returns.
  return x;
}

class ReturnMember {
  // ✅ Valid
  fn ValidRef[bound ref self: Self]() -> ref i32 {
    return self.m;
  }

  // ❌ Invalid: can't return reference to value.
  fn InvalidVal[bound self: Self]() -> ref i32 {
    return self.m;
  }

  // ❌ Invalid: `var self` lifetime ends.
  fn InvalidVar[bound var self: Self]()
      -> ref i32 { return self.m; }

  var m: i32;
}

class DerefPointerMember {
  // ✅ Valid
  fn ValidRef[bound ref self: Self]() -> ref i32 {
    return *self.pm;
  }

  // ✅ Valid
  fn ValidVal[bound self: Self]() -> ref i32 {
    return *self.pm;
  }

  // ✅ Valid
  fn ValidVar[bound var self: Self]()
      -> ref i32 { return *self.pm; }

  var pm: i32*;
}

Otherwise, bound parameters and global variables are the sources of storage that can be referenced by a return, but need not be referenced, particularly not on every code path.

carbon

// Result references `r` if `b` is true, and `p`
// otherwise. Valid as long as both are marked `bound`.
fn Conditional(b: bool, bound ref r: C, bound p: C*)
    -> ref C {
  if (b) {
    return r;
  } else {
    return *p;
  }
}

The parameters of functions defined in an interface may also be marked as bound. The impl of that interface for a type can omit occurrences of bound from the interface, but cannot add new ones.

carbon

interface I {
  fn F[bound ref self: Self]
      (a: Self, bound b: Self, bound c: Self*)
      -> ref Self;
}

impl C1 as I {
  // ✅ Valid: matches interface
  fn F[bound ref self: Self]
      (a: Self, bound b: Self, bound c: Self*)
      -> ref Self;
}

impl C2 as I {
  // ✅ Valid: proper subset of `bound` params
  fn F[ref self: Self]
      (a: Self, bound b: Self, c: Self*)
      -> ref Self;
}

impl C2 as I {
  // ❌ Invalid: `a` is not bound in `I.F`.
  fn F[ref self: Self]
      (bound a: Self, b: Self, c: Self*)
      -> ref Self;
}

Like [[clang::lifetimebound]] in C++, bound does not affect semantics or calling conventions, just what code is legal. This helps avoid mismatches between typechecking against the signatures in an interface when the impl functions are different. Exception: the question of whether bound affects the lifetime of temporaries is future work.

Note that all combinations of a val/ref/default return can be bound to a value/ref/var parameter. Examples:

carbon

fn RefToVal(bound ref x: C) -> val D { return x.d; }
fn ValToRef(bound y: C) -> ref D { return *y.ptr; }
fn VarToRef(bound var p: i32*) -> ref i32 { return *p; }
fn VarToDefault(bound var p: i32*) -> i32* { return p; }

For full safety, we need each bound parameter to be immutable for the duration of the lifetime of the returned result. However, the objective for now is only matching [[clang::lifetimebound]], which has the goal of preventing some classes of bugs, not full memory safety. We will reconsider this with the memory safety design.

Clang's lifetimebound attribute also only applies to the immediately pointed to objects (by pointers or reference parameters, or pointers or reference subobjects of an aggregate parameter). We suggest a simpler, transitive model here that is more restrictive but should be compatible. That said, pinning down the exact and firm semantics of bound, especially in these complex cases, is deferred to the full memory safety design as well.

How addresses interact with `ref`

The address of a ref binding is noalias and either captures(none) or captures(ret: address, provenance), depending on whether the binding is marked bound.

noalias means like C restrict; you can't observe mutations through aliases; mutation through a restricted pointer is not observable through another pointer
captures(none) means there is no transitive escape: you can pass a nocapture pointer to another nocapture function, but you can't store to memory or return
captures(ret: address, provenance): is like captures(none) but may be referenced by a return.

The combination of noalias and captures(none) semantics are the minimum for the "move-in-move-out" optimization. But this condition is hard to check, so safe code will use a stricter criteria. Unsafe code will be required to adhere to just the noalias restrictions, but will not be checked (except possibly by a sanitizer at runtime). The details here will be tackled as part of the memory safety design.

Optimizations will only be performed based on information that is enforced or checked by the compiler, so these attributes won't be passed to LLVM unless their requirements can be established. This avoids introducing undefined behavior, which we particularly don't want to do in situations where C++ doesn't.

The goal of these rules it to nudge us towards function boundaries that don't constructively create aliasing in their API boundary and don't capture pointers unnecessarily.

These restrictions are experimental, and we should keep track of everything we end up needing to do to work around these restrictions so any reconsideration can be properly informed.

Improved interop and migration with C++ references

We expect this to improve interop and migration by allowing significantly more interface similarity between Carbon and C++. Previously, many things in C++ that used references on interface boundaries would be forced to switch to pointers. This adds ergonomic friction both at a basic level because of the forced change but also a deeper level because it will make it significantly harder to see the parallel usage across the boundary between C++ and Carbon. With reference bindings, the vast majority of this dissonance will be removed.

This does create a migration concern, raised in open discussion on 2025-05-01, that the nocapture and noalias modifiers don't match C++ restrictions, particularly on the this parameter that we are going to require migrate to ref self. We may have to add back in addr to allow a different pointer type for those cases.

Future work: Addressing how we model the various kinds of C++ references that Carbon code may need to interact with is something we are actively considering and will be tackled in a future proposal.

Part of the expression type system, not object types

Much like value/val and var bindings, ref binding and the new return forms are are part of the type system, but only through expression categories, patterns (function parameters and so on), and returns. Specifically, we don't expect them to be part of the object types in Carbon. Like value bindings, we retain a great deal of implementation flexibility around layout, and the specifics of how they are lowered.

This specifically means we will need to incorporate ref bindings into the Call interface and we will be adding complexity there that will need to be handled by overloading. The changes to the Call interface is future work, and overloading, once we add support, will need to carry additional complexity to handle ref.

Interaction with `returned var`

The rule is: returned var may only be used when there is a single atomic return form, and it is the default var category.

carbon

// ✅ Allowed
fn F(...) -> V {
  returned var v: V = ...;
  // ...
  return var;
}

fn F(...) -> {var .a : T} {
  // ❌ Invalid: composite form
  returned var ret: T = ...;
  // ...
}

fn F(...) -> val T {
  // ❌ Invalid: value return
  returned var ret: T = ...;
  // ...
}

We can revisit and expand this later if this does not handle use cases we would like to support.

Use case: `Deref` interface

To support customization of the prefix-* dereferencing operator, we introduce the Deref interface.

carbon

interface Deref {
  let Result:! type;
  fn Op[bound ref self: Self]() -> ref Result;
}

final impl forall [T:! type] T* as Deref {
  where Result = T;
  fn Op[bound self: Self]() -> ref T
      = "builtin.deref";
}

Then *p is rewritten to p.(Deref.Op)(), and p->m is rewritten to p.(Deref.Op)().m. For example, this might be used by a smart pointer:

class SmartPtr(T:! type) {
  fn Make(p: T*) -> Self { return {.ptr = p}; }
  impl as Deref {
    where Result = T;
    fn Op[bound ref self: Self]() -> ref Result {
      return *self.ptr;
    }
  }
  private var ptr: T*;
}

Use case: indexing interfaces

Proposal #2274: "Subscript syntax and semantics" added the interfaces used to support indexing with the subscripting operator [...]. We change these in the following ways:

The addr self parameters are changed to bound ref self, to allow the result to reference the self object.
The At method returns by val.
The Addr methods are renamed Ref and return a reference instead of a pointer that is automatically dereferenced.

This proposal's PR makes those changes to the indexing design.

Use case: member binding interfaces

The member binding interface used for reference expressions from proposal #3720 can now be changed to use references instead of pointers.

Before:

carbon

// For a reference expression `x` with type `T`
// and an expression `y` of type `U`, `x.(y)` is
// `*y.((U as BindToRef(T)).Op)(&x)`
interface BindToRef(T:! type) {
  extend impl as Bind(T);
  fn Op[self: Self](p: T*) -> Result*;
}

After:

carbon

// For a reference expression `x` with type `T`
// and an expression `y` of type `U`, `x.(y)` is
// `y.((U as BindToRef(T)).Op)(x)`
interface BindToRef(T:! type) {
  extend impl as Bind(T);
  fn Op[self: Self](bound ref p: T) -> ref Result;
}

Similarly, the BindToValue interface is changed to use a val/value return, potentially avoiding a copy of large objects.

Before:

carbon

interface BindToValue(T:! type) {
  extend Bind(T);
  fn Op[self: Self](x: T) -> Result;
}

After:

carbon

interface BindToValue(T:! type) {
  extend Bind(T);
  fn Op[self: Self](bound x: T) -> val Result;
}

Use case: class accessors

A ref return can be used to expose the state of an object in a way that can be mutated:

carbon

class Four {
  fn Get[self: Self](i: i32) -> i32 {
    Assert(i >= 0 and i < 4);
    return self.m[i];
  }
  fn GetMut[bound ref self: Self](i: i32) -> ref i32 {
    Assert(i >= 0 and i < 4);
    return self.m[i];
  }
  private var m: array(i32, 4);
}

var x: HasMember = {.m = (0, 2, 4, 6)};
x.GetMut(2) += 1;
fn Check(y: Four) {
  Assert(y.Get(2) == 5);
}
Check(x);

Future work: this will in the future often be done with an overloaded method, as in:

carbon

class Four {
  overload Access {
    fn [bound ref self: Self](i: i32) -> ref i32 {
      Assert(i >= 0 and i < 4);
      return self.m[i];
    }
    fn [self: Self](i: i32) -> i32 {
      Assert(i >= 0 and i < 4);
      return self.m[i];
    }
  }
  private var m: array(i32, 4);
}

var x: HasMember = {.m = (0, 2, 4, 6)};
x.Access(2) += 1;
fn Check(y: Four) {
  Assert(y.Access(2) == 5);
}
Check(x);

This may be a common enough use case that we will want to introduce a dedicated syntax:

carbon

class HasMember {
  fn Access[bound ref? self: Self](i: i32) -> ref? i32 {
    Assert(i >= 0 and i < 4);
    return self.m[i];
  }
  private var m: array(i32, 4);
}

Type completeness

Not a change by this proposal, but note that our existing rules will require the type in a ref binding to be complete in situations where it would not need to be if you were using a value binding with a pointer type instead. We may need to change this in the future to match C++ which treats reference types more like pointer types for completeness purposes.

After this change, a ref binding to type T will require T to be complete in the same situations that other bindings to type T require T to be complete.

Pointer value representation

Purely as a change in syntax, the way to specify that the value representation of a class uses a pointer is changed from writing const Self * to const ref.

Future work

Temporary lifetimes

For safety, we need bindings and returns that reference storage to only be used while that storage remains valid. When the referenced storage is owned by a temporary, we have a choice to either control the lifetime of the temporary or diagnose when the lifetime of the temporary is insufficient. Deciding on our policy is future work.

Note that in many cases we can explicitly provide storage in a variable instead of referencing a temporary. For example, using var x: ... = ReturnsATemporary(); instead of let ref x: ... = ReturnsATemporary();. This won't apply in all situations, though, such as temporaries that are reachable transitively through pointers.

`ref` bindings in lambdas

We have already identified future work to support reference captures in lambdas as part of proposal #3848. This might be a reason to support ref bindings as fields of objects, with all the restrictions that comes with that.

Interaction with effects

We still need to determine how references and the other return types interact with effects, like Optional, errors, co-routines, and so on. For example, we don't want to give up the benefits of being able to directly return a reference when a function has an error path.

It is unclear if this will mean putting references into the object type system, but we may be able to handle this with additional types or the ability to customize return representations. For example, we might have an alternate version of the Optional type that holds a reference:

carbon

class OptionalRef(T:! type) {
  fn Make(bound ref r: T) -> Self {
    return {.p = &r};
  }
  fn MakeEmpty() -> Self {
    return {.p = Optional(T*).MakeEmpty()};
  }
  fn HasValue[self: Self]() -> bool {
    return p.HasValue();
  }
  fn Get[bound ref self: Self]() -> ref Result {
    Assert(self.HasValue());
    return *self.p.Get();
  }

  private var p: Optional(T*);
}

More precise lifetimes

More precise lifetime tracking will be considered with the memory safety design. For example, the bound approach does not distinguish different components of a compound return, or different parts of a parameter object that might have different lifetimes.

Combining with compile-time

We plan to support references to compile-time state when executing a function at compile time. That will be part of a future proposal.

Interaction with `Call` or other interfaces

For now, ref is not represented in the Call interface introduced in proposal #2875: Functions, function types, and function calls. This will be tackled together in a future proposal with other aspects of bindings not represented by the type, such as var and compile-time, along with being generic across these aspects of bindings.

Destructuring assignment

Having more support for multiple returns from a function opens the question of how to do different things with the different returns. We may want a syntax for saying some of the returns are bound to new names, and some are used in assignments to existing variables. One possibility would be to have some pattern syntax for re-initializing an existing object, as in:

carbon

fn F() -> (-> bool, ->T);
fn G() {
  var x: T = ...;
  Consume(~x);
  let (b: bool, init x) = F();
  // Continue to use `x`...
}

This was discussed in the #syntax channel on Discord on 05-19-2025.

Restore `addr`

There are two reasons we might restore addr as an alternative to ref for the implicit self parameter:

As a way to express uses of self that we want to disallow for ref bindings generally but need an escape hatch for migrated C++ code that relies on these patterns.
To allow the self parameter to use one of the pointer semantics we create as part of the upcoming memory safety design, that can't be achieved with ref.

Rationale

This proposal tries to advance these Carbon goals:

Performance-critical software
- Having a "move-in-move-out" option as a calling convention is a potential performance improvement for using ref parameters instead of pointers.
- Giving additional options for the return convention gives opportunities for improved performance. Having this set by explicit return markings is about giving control and predictability to the code author.
Code that is easy to read, understand, and write
- ref bindings and returns avoid the ceremony of round-tripping through pointers.
Practical safety and testing mechanisms
- Checking that reference bindings are not dangling is important for avoiding use-after-free bugs.
- bound markings on parameters to allow safety equivalent to Clang's [[clang::lifetimebound]] when returning a reference.
Interoperability with and migration from existing C++ code
- Including references in Carbon allows for less mismatch for C++ code using references.

Alternatives considered

These ideas were discussed in open discussion on:

They were also discussed in the #pointers-and-references channel in Discord starting 2025-05-05, and #syntax on 2025-05-14.

No `ref`, only pointers

The rationale to add ref instead of staying with pointers was discussed on 2025-05-01. In addition to the motivating problems given in the "Problem" section, that discussion included some additional depth to the reasons to add reference bindings.

There is a tension between wanting to have mutating expressions and only having pointers. You need some concept like a reference in order to mutate an object with an expression. The question is how small a box the references are restricted to, and where the line is drawn. C has lvalues, which contain references but are restricted to a quite small box. Reference bindings specifically are about keeping a small box around references while still adding enough expressivity to support our use cases. We have started with a model similar to C, but it fell down when it comes down to composition. Decomposing an expression into pieces loses the tools the expression provided to you. The missing tool for that was reference bindings.

We saw how much we were leaning on value bindings. The asymmetry between having value binding but not reference bindings when have value expressions and reference expressions was creating pressure. For example, when accessing members of an object, we had to escape to pointers in that operator.

One downside of this change is that before indirections were more visible in the code.

Also, this does fundamentally mean that we now have another kind of "pointer", potentially adding complexity to any memory-safety story. However, this ship already sailed to some extent with value bindings. Fundamentally, bindings are allowed to have pointer-like semantics from a lifetime perspective, and so will need to be considered as a pointer-like thing as we build out lifetime safety.

Remove pointers after adding references

If we removed pointers after adding references, we would need something rebindable for assignable objects. The viable path forward without separate pointers and references is to have something rebindable like pointers but automatically dereferenced like references, which is the approach Rust takes. See this comment on issue #5261.

One of the features of a reference is what it cannot do, and we would have to remove those restrictions to be able to satisfy the pointer use cases with references.

Allow `ref` bindings in the fields of classes

A type with reference binding fields would need a lot of restrictions since reference bindings are not assignable. We did not see enough motivation to put references into objects, given the complexity that it would introduce, so we are keeping references out of types for now. This could change to support lambda reference captures.

No call-site annotation

This question was discussed on 2025-05-07.

We decided that the marking is not about lifetime, but ability to mutate. A val may reference an object in a similar way to a ref, restricting operations on the original object, but we are not going to mark vals since those restrictions are enforced by the compiler. We thought the ability to mutate, though, was something important enough to highlight to readers of the code, even at the expense of extra work for the writer.

Swift inout parameters are marked at caller with an & before the argument. Jordon Rose has published a regret that they didn't use inout to mark the argument instead of &.

On the other hand, not marking is not known to be a source of bugs.

This is a "try it and see how well it works" sort of decision.

Top-level `ref` introducer

For now, we don't believe let ref to be so common as to need a shorter way to write, unlike what we do for var. This was considered in leads issue #5523, which provided this rationale:

I feel like this would often be used for non-local mutation due to it fundamentally deporting mutable value semantics and instead having reference semantics. Unlike the local mutation, that seems more worthwhile to have incentives around minimizing.

However, this seems the easiest of all to revisit later if we discover that the added verbosity in practice is costing more than is worth any improvements from explicitly flagging mutable reference semantics, or if we find code is reaching for antipatterns due to the incentive.

In addition, this comment on leads issue #5522 argued that it would be more consistent for ref and val to only apply to bindings, and not introduce patterns, like let and var.

Allow immutable value semantic bindings nested within variable patterns

This was considered in leads issue #5523, which provided this rationale:

While it may be an obvious point of orthogonality, I think it adds choice without sufficient motivation, and even having that choice does add some complexity to the language.

It also seems like we could add this later if there is sufficient demand when we have larger usage experience body to pull from with the rest of the Carbon language. Currently, the affordance that feels more natural to me is what we have.

I think we're happy to see motivating use cases and revisit this. At the moment, we've just not seen motivating use cases -- everything has seemed a bit too contrived.

Remove `var` as a top-level statement introducer

This was considered in leads issue #5523, which provided this rationale:

Locals are important, frequent, and frequently mutable. I don't think forcing varying locals to go through let var for orthogonality aids readability enough to offset the verbosity cost of added keywords on a reasonably common pattern.

I'm still currently in the position that locally owned objects being mutable should not be "discouraged" or "disincentivized" by the language. And I think adding artificial incentives to try and avoid needing a mutable local variable would either have no effect beyond verbosity, or if it did have effect, it wouldn't be a net positive effect due to code being written in a less straightforward manner in order to avoid mutation.

To be clear, this is based on intuition and judgement based on my experience, not in any way based on data or specific motivating examples. I can imagine data or evidence or even a new perspective changing my position here, but so far the discussion we've had haven't done that.

`ref` as a type qualifier

The big concern is that any effect that is represented by a type, like Optional or Result, will want to compose with reference returns. This could be done by allowing ref to create an object type that could be used as a parameter to those, as in Optional(ref T), but we are trying to avoid going down that path. We have future work to tackle this problem specifically.

There was also a concern that we might need ref types to represent argument lists with tuples, but tuples already can't represent var or compile-time parameters. We have other plans for this, instead of trying to stretch tuples to encompass these use cases.

We also noted that including references in the type system led to a number of inconsistencies in C++, such as no there not being references of references.

`bound` would change the default return to `val`

We considered saying that bound would change the default return to use the ->val return convention. This was discussed on 2025-05-01 and 2025-05-08. The idea is that val is expected to be efficient, so we should encourage using it, but we can't always use val, since some types have a reference value representation, but bound alleviates that concern.

Once we realized that bound is relevant for all return conventions, we reconsidered that approach, since has a number of concerns:

Changing defaults is action at a distance, changing the behavior without changing the code in the relevant location.
We don't want to have to make changes to the return category of copy-paste of a function from an interface to an impl of it when removing bound.
Lifetimes in Rust and Clang's [[clang::lifetimebound]] don't change calling conventions, only what code is valid.

Going with an approach where less depends on bound makes sense for now, since we are going to reconsider these issues as part of our upcoming memory safety work.

Other return conventions

We also considered other conventions for returning from functions, in a comment on #5434, on 2025-05-08 and on 2025-05-12, most notably:

in place: This convention was like ->, but always using the "in place" convention where the caller allocates storage and provides the callee with its address, the callee initializes the storage at that address, and the caller is responsible for destroying after the return.
var without storage: The callee returns a pointer to the storage of a subobject of a bound var parameter, that caller is then responsible for destroying. A call to this function is reference expression, but with additional responsibility to destroy.
hybrid: If the type has a copy value representation or trivial destructive move then return the object representation directly; otherwise caller passes a pointer and callee initializes it.

There were also some variations on what the conditions for returning in registers using the default return convention.

We seriously considered "var without storage", but the fact that it couldn't reliably be used to initialize a variable, particularly in the middle of an object, meant it did not seem valuable enough to include.

It seemed more valuable to support the "in place" return convention. That return form allows you to guarantee knowing the address of the object being constructed, and was a good match for returned var. However, we realized that var declarations shouldn't always be associated with in-memory storage, in particular for types that may be trivially moved. For example, a var parameter with the C++ type std::unique_ptr<T> should be passed in registers. A function returning a std::unique_ptr<T> in place would not be as efficient as returning it by moving it into registers.

`return var` with compound return forms

We considered various syntax options on 2025-05-12, but none of them seemed good enough to justify inclusion at this time:

carbon

fn F(...) -> (ref R, val L, V) {
  // No longer a `var` being returned. Ideally these
  // shouldn't have to be initialized together.
  returned ??? (ref r: R, val l: L, var v: V) = ...?
  return var;
}

// We could restrict to one `var` return component,
// but this is a lot of machinery for a small increase
// in expressiveness and applicability.
fn F(...) -> (ref R, val L, V) {
  returned var v: V = ...;
  let l: L = ...;
  return (*r, l, var);
}

fn F(...) -> (ref R, val L, V) {
  // These don't have the right category, and ideally
  // shouldn't have to be initialized together.
  returned var ret: (R, L, V) = ...?
  return var;
}

fn F(...) -> (ref R, val L, V) {
  returned var (_, _, var v: V) = <what goes here?>;
}

There was another approach we considered for returned var originally:

carbon

fn F(...) -> (ref R, val L, v1: V1, var v2: V2) {
  // ...
  // Must use the same names for the `var` (implicit or explicit) returns
  // with bound names.
  return (r, l, v1, v2);
}

But this had downsides that still apply:

Requires V1 and V2 to have unformed states. Otherwise, v1 and v2 would need be initialized when they are declared.
This does not support only having some branches use return var.

Our current approach handles our main use case for returned var: factory functions.

We could support an "only vars" approach in the future if we want:

carbon

fn F(...) -> (var V1, var V2, var V3) {
  returned (var v1: V1, var v2: V2, var v3: V3) = ...;
  // ...
  return var;
}

Other syntax for compound return forms

We considered other options for the syntax of compound return forms on 2025-05-13, #syntax in Discord on 2025-05-14, and 2025-05-14. The option of omitting the -> in each component did not distinguish tuples from tuple return forms sufficiently:

carbon

-> (ref i32, var i32)
-> (bool, ref i32)

// Is this a single return of a tuple, or a triple
// return using the default return convention?
-> (i32, i32, i32)

We also considered an approach where compound return forms would start with ->?, but this raised concerns about what the meaning of that syntax would be and whether we want to expose users to that in cases we might be able to avoid it.

The original proposed syntax used an arrow -> in each component of a compound form.

carbon

fn TupleReturn(...)
    -> (->val bool, ->ref i32, -> C);

fn StructReturn(...)
    -> {->val .a: bool,
        ->ref .b: i32,
        -> .c: C};

This avoided ambiguity, but was verbose and visually noisy. An alternative was suggested in discussion on 2025-06-13. This alternative used a default of compound returns for paren (...) and brace {...} expressions, which you could opt out of by using one of the three category keywords such as var to introduce a type expression that would not be considered a compound return.

This had the downside that -> T would be interpreted as -> var T even if T was a tuple type like (i32, i64). However, textually substituting in (i32, i64) in for T to get -> (i32, i64) would instead be interpreted as -> (var i32, var i64).

To overcome this problem, in discussion on 2025-06-16, we switched to the approach from this proposal.

`ref` parameters allow aliasing

If requiring ref parameters to be noalias ends up being too restrictive, we could instead have the "move-in-move-out" optimization be done only when the compiler can prove it safe. One strategy would be to generate an alternate version of the function that is only used in the cases where the noalias conditions can be shown to hold statically.

`let` to mark value returns instead of `val`

This proposal initially used let instead of val to mark immutable value returns. However, let is used, in Carbon and other languages, primarily to bind names. In Carbon, the default binding is a value binding, but that was not considered a close enough to connect the let keyword to value semantics.

There was also a concern about reusing let in multiple contexts to mean different things, and having separate keywords that were only used to mark the category of the binding was deemed better separation of concerns.

We considered making a parallel change to use init instead of var, but this had some problems:

By making initializing returns the default, there is little expected usage, so perhaps not worth spending another keyword on.
The init keyword would be particularly expensive, because C++ code commonly use that word in APIs.

This question was considered in leads issue #5522.

`=>` infers form, not just type

There was support for the idea that the => return syntax introduced in proposal #3848 should deduce the form of the return, not just its type. This was discussed in #lambdas on 2025-05-20.

However, trying to infer the expression category from the category of the expression after the => runs into the problem of this often requiring the parameters to be marked bound. A more complicated rule, for example using whether any parameter is marked bound, could be adopted in the future, if the simple rule proves inadequate.

Alternatively, the compiler could infer which parameters should be marked bound in this case. That is something to consider with the memory safety design.

`ref` parameters, arguments, returns and `val` returns

ref parameters, arguments, returns and val returns

Table of contents

Abstract

Problem

Background

Proposal

ref bindings

ref and val returns

Details

Compound return forms and patterns

Nested binding patterns

Mututation restriction on objects bound to a value

No optimization on erroneous behavior

bound parameters

How addresses interact with ref

Improved interop and migration with C++ references

Part of the expression type system, not object types

Interaction with returned var

Use case: Deref interface

Use case: indexing interfaces

Use case: member binding interfaces

Use case: class accessors

Type completeness

Pointer value representation

Future work

Temporary lifetimes

ref bindings in lambdas

Interaction with effects

More precise lifetimes

Combining with compile-time

Interaction with Call or other interfaces

Destructuring assignment

Restore addr

Rationale

Alternatives considered

No ref, only pointers

Remove pointers after adding references

Allow ref bindings in the fields of classes

No call-site annotation

Top-level ref introducer

Allow immutable value semantic bindings nested within variable patterns

Remove var as a top-level statement introducer

ref as a type qualifier

bound would change the default return to val

Other return conventions

return var with compound return forms

Other syntax for compound return forms

ref parameters allow aliasing

let to mark value returns instead of val

=> infers form, not just type

`ref` parameters, arguments, returns and `val` returns

`ref` bindings

`ref` and `val` returns

`bound` parameters

How addresses interact with `ref`

Interaction with `returned var`

Use case: `Deref` interface

`ref` bindings in lambdas

Interaction with `Call` or other interfaces

Restore `addr`

No `ref`, only pointers

Allow `ref` bindings in the fields of classes

Top-level `ref` introducer

Remove `var` as a top-level statement introducer

`ref` as a type qualifier

`bound` would change the default return to `val`

`return var` with compound return forms

`ref` parameters allow aliasing

`let` to mark value returns instead of `val`

`=>` infers form, not just type