folly/result/docs/future_small_value.md
result?result<T> is effectively Expected<storage_type, error_or_stopped>, where the
error type wraps rich_exception_ptr (REP). REP is a bit-discriminated variant
that already reserves SMALL_VALUE_eq for future small-value storage.
On 64-bit platformse, at least 61 bits are available:
Trivially copyable values with sizeof(T) <= 4 && alignof(T) <= 4 can be
stored in the top 4 bytes and transparently exposed by-reference.
Non-trivial objects <= 4 bytes could also be stored transparently, but there are few use-cases for lifetime-managed objects smaller than a pointer.
Aligned pointers (e.g., T*, std::unique_ptr<T> with alignof(T) >= 8)
fit in 61 bits, but cannot support standard by-reference accessors — this
requires specialization-driven access APIs.
The optimization reduces result<T> from 16 to 8 bytes for eligible types. No
immediate implementation plans exist, but the design preserves feasibility.
Later in this file, you will find a discarded implementation of small-value storage, with lifetime management done by rich_exception_ptr<SmallValue>.
I removed that approach since it conflicts with providing const reference
access to error_or_stopped from result<SV>.
const error_or_stopped& error_or_stopped() const& {
The result could store some kind of error_or_stopped_impl<SV> that manages
lifetime, and play a type-punning game to expose error_or_stopped with the
same bit layout, but incapable of small-value access. However, this would add
even more UB to the implementation, and doesn't seem like the right design.
So, the preferred design is instead to branch result on small-value vs not,
where it remains TBD whether result<int> etc should automatically opt into
the optimization, or we should use some kind of marker type.
When sizeof(T) <= 4 and alignof(T) <= 4, like uint32_t, the small-value
optimization can be fully transparent -- the mangled small value and "packed"
union bits can coexist in the same 8 bytes. So,by-reference accessors like
this, or the equivalent or_unwind, will just work:
const T& value_or_throw() const&;
Some 5-7 byte types also fit by shifting bits on LE systems (or no shift on BE):
struct A { char c [5]; };
struct alignas(2) B { char c[5]; };
Thanks to the low alignment requirements, and sizeof 5 & 6 respectively, we can simply shift them up by 1-2 bytes on LE systems, or leave them in place on BE systems, and not conflict with the union bits.
However, not all structs that look like they fit in 7 bytes can actually fit. One problem is that alignment will waste the first alignof(T) bytes on LE systems. Another problem is that not all padding is reusable padding. See: https://github.com/llvm/llvm-project/issues/125863
The above small value types can be termed "transparently accessible". But we
can also consider "non-transparent" storage, which lets us fit anything that
needs less than 61 bits. This is most compelling for unique_ptr<T> and
friends.
To implement small-value support for result<pointer-like>, the user-facing
API has to change. Specifically, to support thread-safe const value access,
when the low-bits of the pointer are occupied, we cannot use the 8 bytes owned
by the result to provide by-reference accessors. This means we can only
provide:
(1) By-value access for safely copyable stored types like T* or
std::reference_wrapper<T> (powering result<T&>), OR
(2) Use a specializable trait to define copyable "reference-like" types for
these. For example, result_small_value_ref_t<unique_ptr<T>> might be
prototyped as just T* (though a full version should reference the actual
result and demangle it to implement get/release/reset/etc).
This generalization of (1) is a bit laborious, but it's probably
worth it since only a handful of pointer-like types are ever likely
to opt into the small-value optimization.
(3) Only have "move-out access" for move-only types like unique_ptr<T>, OR
(4) (maybe not) Can specialize the awaiter (or_unwind) to provide ephemeral
storage for by-reference access. The reason this is a dubious idea is that
these refs would only work in the same full-expression as the co_await,
whereas the lifetime of normal refs is as long as the result being awaited.
This by-ref usage CAN be useful for pointers anyway, if you deref them
in that same full expression—after all, the contained pointer has sane
lifetime—but it's tricky to define semantics that aren't excessively
dangerous. It's worth investigating if any of the clang lifetime safety
attributes could make this usage robust.
(5) (maybe) A small_value_ref_guard-like (bleh) or with_value()-like
(safer) API that works uniformly even for unique_ptr.
My current preferred resolution for the above pointer troubles is to implement
(1), and (2) for unique_ptr and perhaps a few other highly useful types.
For whatever result instantiations are opted in, we need a separate
result_crtp that only stores error_or_stopped and manages small-value
lifetime on top, following a5f4e603c5151fae144437f99a9bea069745e8ea.
Careful: The above rev has some minor deviations from the desired end
state for when result_crtp / result are managing value lifetime.
Obviously, these need to be lifted -- and I started sketching this below:
Furthermore, REP itself needs changes. Specifically:
result impl:
Other things to tidy:
We of course reuse detail/rich_exception_ptr_small_value.h, but rename
everything to say "result" instead of "rich exception".
The logic there should be updated in view of the "what can we store
with transparent access" discussion above. Having result support
result_small_value_ref_t accessors for non-transparently accessible types
is probably also worthwhile.
It is possible for a type not to be trivially copyable, but to have a trivial move ctor or copy ctor. Need to consider if these are worthwhile automatically opting into small-value storage.
The discussion of bitwise relocatability needs to mention:
folly::IsRelocatable / FOLLY_ASSUME_RELOCATABLEis_replaceable, but it's worth
double-checking if that's relevant to our needs here.As noted above, this implementation was discarded because implementing
small-value lifetime management within REP directly conflicts with result
being able to expose its error_or_stopped() by reference, which is an important
use-case.
Nonetheless, these notes should be instructive in building similar lifetime
management into a specialization of result.
Below, SmallValue was a REP template param with the type being stored.
static_assert(
std::is_void_v<SmallValue> ||
rich_exception_ptr_valid_small_value<SmallValue>);
static constexpr bool MayHaveSmallValue = !std::is_void_v<SmallValue>;
// Explained on the ctors and in `copy_from_impl` / `move_from_impl`
static constexpr bool MayCopySmallValue =
!MayHaveSmallValue || std::is_nothrow_copy_constructible_v<SmallValue>;
static constexpr bool MayMoveSmallValue =
!MayHaveSmallValue || std::is_nothrow_move_constructible_v<SmallValue>;
Any implementationw would need a simple setter for the storage:
template <rich_exception_ptr_valid_small_value SV = SmallValue>
constexpr void set_small_value_and_bits(SV& sv, B::bits_t bits) {
B::apply_bits_after_setting_data_with(
[&sv](auto& d) {
d.uintptr_ =
rich_exception_ptr_small_value_traits<SmallValue>::to_mangled(sv);
},
bits);
}
The destructor destroy_owned_state_and_bits_caller_must_reset grew:
if constexpr (MayHaveSmallValue) {
if (B::SMALL_VALUE_eq == B::get_bits()) {
small_value_ref_guard<SmallValue> sv{B::get_uintptr()};
sv.ref().~SmallValue();
return;
}
}
The debug_assert in copy_unowned_pointer_sized_state got another clause:
// We use this code path to copy / move small values, if the corresponding ctor
// is trivial.
//
// Future: If benchmarks show that major compilers consistently optimizes away
// the combo of (unmangle, trivial ctor, mangle), then remove the optimizations
// and this `||` clause.
|| (B::SMALL_VALUE_eq == other.get_bits() &&
(std::is_trivially_copy_constructible_v<SmallValue> ||
std::is_trivially_move_constructible_v<SmallValue>))
In copy_from_impl, we had to run the copy ctor of the small value:
if constexpr (
MayHaveSmallValue &&
!std::is_trivially_copy_constructible_v<SmallValue>) {
// When a small value has a nontrivial copy ctor, we have to run it --
// just copying the bits may violate the type's internal invariants, even
// if it's bitwise-relocatable (silly example: it counts copy ops).
//
// Note that we implement both copy construction & assignment in terms of
// the `SmallValue` copy ctor. We cannot use `SmallValue::operator=`,
// since the destination might not even be in a valid `SmallValue` state.
if (B::SMALL_VALUE_eq == other.get_bits()) {
small_value_ref_guard<SmallValue> other_sv{other.get_uintptr()};
alignas(SmallValue) char buf[sizeof(SmallValue)];
// Future: We might want to relax this for copies, but then there's
// some work to do to ensure that throwing during copy-assignment
// doesn't cause the assigned-to object to become invalid.
static_assert(noexcept(SmallValue{std::as_const(other_sv.ref())}));
auto& sv = new (&buf) SmallValue{std::as_const(other_sv.ref())};
set_small_value_and_bits(sv, other.get_bits());
return;
}
// else: Fall back to safe & fast `copy_unowned_pointer_sized_state`
}
Similarly, move_from_impl has to run the move ctor:
if constexpr (
MayHaveSmallValue &&
!std::is_trivially_move_constructible_v<SmallValue>) {
// Run nontrivial move ctors, for the same reason as in `copy_from_impl`.
// The stronger reason is that we want to match the moved-from state
// semantics of `std::expected` / `folly::{Try,Expected,result}`, which
// preserves the discriminator & keeps the value. Here, that requires
// leaving `other` containing a moved-out `SmallValue`.
if (B::SMALL_VALUE_eq == other_bits) {
small_value_ref_guard<SmallValue> other_sv{other.get_uintptr()};
alignas(SmallValue) char buf[sizeof(SmallValue)];
// Why? (1) Move should generally not throw. (2) Throwing today
// leaves an invalid assigned-to object, same as `copy_from_impl`.
static_assert(noexcept(SmallValue{std::move(other_sv.ref())}));
auto& sv = new (&buf) SmallValue{std::move(other_sv.ref())};
set_small_value_and_bits(sv, other_bits);
// Leave `other` with the moved-out small value.
other.set_small_value_and_bits(other_sv, other_bits);
return;
}
// else: Fall back to safe & fast `copy_unowned_pointer_sized_state`
}
REP today has a small-value debug-fatal stub in compare_equal, but a
result-based implementation would of course need to actually handle
comparisons correctly, along these lines:
if (B::SMALL_VALUE_eq == lbits && B::SMALL_VALUE_eq == rbits) {
small_value_ref_guard<SmallValue> l{lp->get_uintptr()};
small_value_ref_guard<SmallValue> r{rp->get_uintptr()};
return std::as_const(l.ref()) == std::as_const(r.ref());
}
Storing a small-value with nontrivial standard ctors would correspondingly constrain move/copy/assignment of the REP that stored it.
/// Copy constructor & assignment
///
/// `SmallValue` requires **nothrow** copyability, since the current
// `operator=` impls would leave the assigned-to object in an invalid state
// if something threw. This could be fixed.
rich_exception_ptr_impl(const rich_exception_ptr_impl&)
requires(!MayCopySmallValue)
= delete;
constexpr rich_exception_ptr_impl(const rich_exception_ptr_impl& other)
requires MayCopySmallValue
{ ... }
void operator=(const rich_exception_ptr_impl&)
requires(!MayCopySmallValue)
= delete;
rich_exception_ptr_impl& operator=(const rich_exception_ptr_impl& other)
requires MayCopySmallValue
{ ... }
/// Move constructor & assignment
///
/// Only movable if `SmallValue` is **nothrow**-movable, same reason as for
/// the copy ctor. More generally, we're not keen on supporting throwing
/// move, since it's a footgun.
rich_exception_ptr_impl(rich_exception_ptr_impl&&)
requires(!MayMoveSmallValue)
= delete;
constexpr rich_exception_ptr_impl(rich_exception_ptr_impl&& other) noexcept
requires MayMoveSmallValue
{ ... }
void operator=(rich_exception_ptr_impl&&)
requires(!MayMoveSmallValue)
= delete;
rich_exception_ptr_impl& operator=(rich_exception_ptr_impl&& other) noexcept
requires MayMoveSmallValue
{ ... }
detail/rich_exception_ptr_small_value.hThe file below was geared towards bit-packing bigger values into the 61 bits we
have available. You would need something like this to store unique_ptr<T>.
Transparent, aligned, by-reference access to the small value is much more restrictive & simple.
#pragma once
#include <folly/Portability.h> // FOLLY_HAS_RESULT
#include <folly/Utility.h> // FOLLY_DECLVAL
#include <cstring>
#if FOLLY_HAS_RESULT
namespace folly {
namespace detail {
// IMPORTANT: the overall rule is that all the small-value conversions must
// live in this file, and must produce these bottom alignment bits as 0, even
// though the input representation might not have that property.
inline constexpr int min_alignment_for_packed_rich_exception_ptr = 8; // 3 bits
// DANGER: It is UB to use this, unless the memory layout of `T` is guaranteed
// to consist of a single pointer to an object of alignment of at least
// `min_alignment_for_packed_rich_exception_ptr` bytes.
template <typename T>
struct rich_exception_ptr_small_value_aligned_wrapped_pointer {
static constexpr bool value = true;
// Packed storage requires 0s in the bottom bits.
static_assert(
alignof(decltype(*FOLLY_DECLVAL(T))) >=
min_alignment_for_packed_rich_exception_ptr);
// These asserts aim to detect if `T` isn't just a pointer, or similar issues
// that would make the `reinterpret_cast` into risky UB.
static_assert(!std::is_polymorphic_v<T>);
static_assert(sizeof(T) == sizeof(void*));
static_assert(sizeof(T) == sizeof(uintptr_t));
static_assert(alignof(T) >= alignof(uintptr_t));
static uintptr_t to_mangled_uintptr(T& t) {
return *std::launder(reinterpret_cast<uintptr_t*>(&t));
}
static uintptr_t unmangle_uintptr(uintptr_t n) { return n; }
static T& aligned_unmangled_as_ref(uintptr_t* n) {
return *std::launder(reinterpret_cast<T*>(n));
}
};
// Collects the type invariants required across all the supported storage
// classes for `rich_exception_ptr`. In practice, these are the requirements
// of `rich_exception_ptr_packed_storage`. For now, we don't want to encourage
// usage of `rich_exception_ptr` with a fixed `Storage`, so there's no
// mechanism for relaxing the requirements for "separate bits" storage.
//
// In detail, since it's not reasonable to expect end users to know enough
// about the implementation of `rich_exception_ptr` to specialize this safely.
// Instead, see the note about P2786 / IsRelocatable.
template <typename>
// `false` means: types matching the primary template aren't valid small values
struct rich_exception_ptr_small_value_traits : std::false_type {};
template <typename T>
// Packed storage requires 0s in the bottom bits.
requires(alignof(T) >= min_alignment_for_packed_rich_exception_ptr)
struct rich_exception_ptr_small_value_traits<T*>
: rich_exception_ptr_small_value_aligned_wrapped_pointer<T*> {};
// Note: The C++ standard doesn't _require_ for `unique_ptr<T>` to have the
// memory layout of "just a pointer", but all our implementations of interest
// (clang, GCC, MSVC) implement it as-expected for the default deleter. This
// specialization could be further extended to allow empty deleters.
// Conversely, if we have to support lower-quality implementations, conditional
// compilation can be added later.
template <typename T>
// Packed storage requires 0s in the bottom bits.
requires(alignof(T) >= min_alignment_for_packed_rich_exception_ptr)
struct rich_exception_ptr_small_value_traits<std::unique_ptr<T>>
: rich_exception_ptr_small_value_aligned_wrapped_pointer<
std::unique_ptr<T>> {};
template <typename T>
// Future: We would be fine with any "bitwise-relocatable" type. That is, if
// we `memcpy` the `sizeof(T)` bytes from its current location to another
// `alignas(T)` location, then we can safely treat the destination bytes as the
// new location of the object, as-if it was move-constructed in the new
// location, and the old moved-out object was destructed. Compare:
// - `std::unique_ptr<T>` is bitwise-relocatable since the type's internal
// validity in no way depends on its address.
// - `std::list<T>` may NOT be bitwise-relocatable, since it is often
// implemented as a circular linked list.
//
// Reference: folly::IsRelocatable, P2786
requires(sizeof(T) <= 7 && std::is_trivially_copyable_v<T>)
struct rich_exception_ptr_small_value_traits<T> {
static constexpr bool value = true;
static uintptr_t to_mangled(T& t) {
uintptr_t n = 0;
std::memcpy(&n, &t, sizeof(T)); // widen
return n;
}
static void unmangle_to_uninitialized(uintptr_t mangled, T& t) {
uintptr_t n = mangled >> 8;
std::memcpy(&t, &n, sizeof(T)); // narrow
}
};
// Non-lifetime-owning view of the small `T` stored in `data_t::uintptr_` of
// `rich_exception_ptr`. Assumes `rich_exception_ptr_valid_small_value<T>`.
template <typename T>
class small_value_ref_guard {
private:
alignas(T) union {
uintptr_t unused_;
T t_;
};
public:
explicit small_value_ref_guard(uintptr_t mangled) {
rich_exception_ptr_small_value_traits<T>::unmangle_to_uninitialized(
mangled, t_);
}
T& ref() { return t_; }
};
} // namespace detail
// Check if `T` may be stored as a small value in `rich_exception_ptr`.
template <typename T>
concept rich_exception_ptr_valid_small_value =
detail::rich_exception_ptr_small_value_traits<T>::value;
} // namespace folly
#endif // FOLLY_HAS_RESULT