eng/StackExchange.Redis.Build/AsciiHash.md
Efficient matching of well-known short string tokens is a high-volume scenario, for example when matching RESP literals.
The purpose of this generator is to efficiently interpret input tokens such as bin, f32, etc., whether as byte or character data.
There are multiple ways of using this tool. The main distinction is whether you are confirming a single
token or choosing between multiple tokens (in which case an enum is more appropriate).
When using individual tokens, a static partial class can be used to generate helpers:
```csharp
[AsciiHash] public static partial class bin { }
[AsciiHash] public static partial class f32 { }
```
Usually the token is inferred from the name; [AsciiHash("real value")] can be used if the token is not a valid identifier.
Underscores are replaced with hyphens, so a name such as my_token has the default value "my-token".
The generator requires the full [AsciiHash] public static partial class declaration, and any containing types must
also be declared partial.
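The default naming rule described above can be shown with a minimal sketch. TokenNaming and DefaultToken are hypothetical names and are not part of the generator's API. The helper maps an identifier to its inferred token:

```csharp
// Hypothetical helper: mirrors the documented naming rule, not the generator's actual code.
public static class TokenNaming
{
    // Infer the default token from an identifier: every underscore becomes a hyphen.
    public static string DefaultToken(string identifier) => identifier.Replace('_', '-');
}
```

For example, DefaultToken("my_token") yields "my-token", and identifiers without underscores, such as bin, are used unchanged.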
The output is of the form:

```csharp
static partial class bin
{
    public const int Length = 3;
    public const long HashCS = ...
    public const long HashUC = ...
    public static ReadOnlySpan<byte> U8 => @"bin"u8;
    public static string Text => @"bin";
    public static bool IsCS(in ReadOnlySpan<byte> value, long cs) => ...
    public static bool IsCI(in RawResult value, long uc) => ...
}
```
The CS and UC suffixes denote the case-sensitive and case-insensitive (upper-cased) variants, respectively.
(This API is strictly an internal implementation detail and can change at any time.)
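This document does not say which hash function the generator uses. The sketch below is illustrative only. It substitutes 64-bit FNV-1a, and the names SketchHash, HashCS and HashUC are assumptions. It shows the idea behind the two variants: the UC hash upper-cases ASCII letters before mixing them in. As a result, inputs that differ only in case share a UC hash, while their CS hashes differ.

```csharp
using System;

// Illustrative stand-in only: the real generator's hash function is not documented here,
// so this sketch uses 64-bit FNV-1a to show the CS/UC split.
public static class SketchHash
{
    private const ulong OffsetBasis = 14695981039346656037UL;
    private const ulong Prime = 1099511628211UL;

    // Case-sensitive hash: mixes the raw bytes.
    public static long HashCS(ReadOnlySpan<byte> value)
    {
        ulong hash = OffsetBasis;
        foreach (byte b in value)
        {
            hash = unchecked((hash ^ b) * Prime);
        }
        return unchecked((long)hash);
    }

    // Case-insensitive hash: ASCII a-z are upper-cased before mixing,
    // so "bin", "BIN" and "Bin" all produce the same value.
    public static long HashUC(ReadOnlySpan<byte> value)
    {
        ulong hash = OffsetBasis;
        foreach (byte b in value)
        {
            hash = unchecked((hash ^ ToUpperAscii(b)) * Prime);
        }
        return unchecked((long)hash);
    }

    private static byte ToUpperAscii(byte b) =>
        b >= (byte)'a' && b <= (byte)'z' ? (byte)(b - 32) : b;
}
```

Hashes can collide, so a hash match alone never proves equality. This is why matching must also compare the bytes themselves.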
This generated code allows for fast, efficient, and safe matching of well-known tokens, for example:
```csharp
var key = ...
var hash = key.HashCS();
switch (key.Length)
{
    case bin.Length when bin.Is(key, hash):
        // handle bin
        break;
    case f32.Length when f32.Is(key, hash):
        // handle f32
        break;
}
```
The switch on Length is optional but recommended: these small values can often be compiled into a simple
jump table, which is very fast. Switching on the hash itself is also valid. Because different inputs can share a hash,
every hash match must also be confirmed with a sequence-equality check; the Is(value, hash) convenience method validates both the hash and the equality.
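The sketch below shows the shape of the length-switch pattern using hand-written stand-ins. BinToken, F32Token, Kind and Classifier are hypothetical names, not generated output. To keep the example short, it omits the hash pre-check and relies on the sequence-equality check alone. Two case labels can share the value 3 because each one has its own when guard.

```csharp
using System;

// Hypothetical hand-written stand-ins for the generated bin and f32 classes.
// Only the const Length is needed in a case label; equality runs in the when guard.
public static class BinToken
{
    public const int Length = 3;
    public static ReadOnlySpan<byte> U8 => "bin"u8;
}

public static class F32Token
{
    public const int Length = 3;
    public static ReadOnlySpan<byte> U8 => "f32"u8;
}

public enum Kind { Unknown, Bin, F32 }

public static class Classifier
{
    public static Kind Classify(ReadOnlySpan<byte> key)
    {
        switch (key.Length)
        {
            // Both tokens have length 3; the guards are tried in order until one matches.
            case BinToken.Length when key.SequenceEqual(BinToken.U8):
                return Kind.Bin;
            case F32Token.Length when key.SequenceEqual(F32Token.U8):
                return Kind.F32;
            default:
                return Kind.Unknown;
        }
    }
}
```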
Note that switch requires const values, which is why we use generated types rather than partial properties
that emit an instance with the known values. Also, the "..."u8 syntax emits a span, which is awkward to store but
easy to return via a property.
Sometimes the value to match is only known at runtime. For this, note that AsciiHash is also a struct
that you can create an instance of and supply to code; the best way to do this is inside your
partial class:
```csharp
[AsciiHash]
static partial class bin
{
    public static readonly AsciiHash Hash = new(U8);
}
```
Now, bin.Hash can be supplied to a caller that takes an AsciiHash instance (commonly with in semantics),
which then has instance methods for case-sensitive and case-insensitive matching; the instance already knows
the target hash and payload values.
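A runtime matcher of this kind can be sketched as follows. RuntimeToken is a hypothetical type, not the library's AsciiHash. Like the earlier illustrations, it uses FNV-1a as a stand-in hash. The target hashes and payload are computed once at construction, so each match only needs to hash and compare the candidate.

```csharp
using System;

// Hypothetical stand-in for a runtime-constructed matcher; not the library's AsciiHash.
public readonly struct RuntimeToken
{
    private readonly byte[] _payload;
    private readonly long _hashCS;
    private readonly long _hashUC;

    public RuntimeToken(ReadOnlySpan<byte> payload)
    {
        _payload = payload.ToArray(); // copies once, at construction time
        _hashCS = Hash(payload, upperCase: false);
        _hashUC = Hash(payload, upperCase: true);
    }

    // Case-sensitive: the hash comparison rejects most mismatches cheaply; equality confirms.
    public bool IsCS(ReadOnlySpan<byte> value) =>
        value.Length == _payload.Length
        && Hash(value, upperCase: false) == _hashCS
        && value.SequenceEqual(_payload);

    // Case-insensitive: compare upper-cased hashes, then compare bytes ASCII-insensitively.
    public bool IsCI(ReadOnlySpan<byte> value)
    {
        if (value.Length != _payload.Length || Hash(value, upperCase: true) != _hashUC) return false;
        for (int i = 0; i < value.Length; i++)
        {
            if (ToUpperAscii(value[i]) != ToUpperAscii(_payload[i])) return false;
        }
        return true;
    }

    private static long Hash(ReadOnlySpan<byte> value, bool upperCase)
    {
        // 64-bit FNV-1a, used here purely for illustration.
        ulong hash = 14695981039346656037UL;
        foreach (byte b in value)
        {
            hash = unchecked((hash ^ (upperCase ? ToUpperAscii(b) : b)) * 1099511628211UL);
        }
        return unchecked((long)hash);
    }

    private static byte ToUpperAscii(byte b) =>
        b >= (byte)'a' && b <= (byte)'z' ? (byte)(b - 32) : b;
}
```

A caller would typically store one instance, for example static readonly RuntimeToken Bin = new("bin"u8), and then call Bin.IsCS(input) or Bin.IsCI(input) per candidate.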
AsciiHash implements IEquatable<AsciiHash> with case-sensitive equality; independent case-sensitive
and case-insensitive comparers are also available via the static CaseSensitiveEqualityComparer and
CaseInsensitiveEqualityComparer properties, respectively.
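The key constraint for such comparers is that GetHashCode must agree with Equals. A case-insensitive comparer therefore has to hash a case-normalized form of its input. The hypothetical AsciiBytesComparer below works on byte[] rather than AsciiHash. It illustrates this rule with the standard IEqualityComparer<T> interface and System.HashCode.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparers over ASCII byte arrays (not the library's types).
// GetHashCode normalizes exactly as Equals does, so equal inputs always hash alike.
public sealed class AsciiBytesComparer : IEqualityComparer<byte[]>
{
    public static readonly AsciiBytesComparer CaseSensitive = new(ignoreCase: false);
    public static readonly AsciiBytesComparer CaseInsensitive = new(ignoreCase: true);

    private readonly bool _ignoreCase;
    private AsciiBytesComparer(bool ignoreCase) => _ignoreCase = ignoreCase;

    public bool Equals(byte[]? x, byte[]? y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null || x.Length != y.Length) return false;
        for (int i = 0; i < x.Length; i++)
        {
            if (Normalize(x[i]) != Normalize(y[i])) return false;
        }
        return true;
    }

    public int GetHashCode(byte[] obj)
    {
        var hash = new HashCode();
        foreach (byte b in obj) hash.Add(Normalize(b));
        return hash.ToHashCode();
    }

    // Upper-cases ASCII a-z only when ignoring case; all other bytes pass through unchanged.
    private byte Normalize(byte b) =>
        _ignoreCase && b >= (byte)'a' && b <= (byte)'z' ? (byte)(b - 32) : b;
}
```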
Comparison values can be constructed on the fly over transient buffers using the constructors that take arrays; note that the other constructors may allocate on each use.
When identifying multiple values, an enum may be more convenient. Consider:

```csharp
[AsciiHash]
public static partial bool TryParse(ReadOnlySpan<byte> value, out SomeEnum result);
```

This generates an efficient parser; inputs can be common byte or char types. Case sensitivity
is controlled by the optional CaseSensitive property on the attribute, or via a third (bool) parameter
on the method, e.g.

```csharp
[AsciiHash(CaseSensitive = false)]
public static partial bool TryParse(ReadOnlySpan<byte> value, out SomeEnum result);
```

or

```csharp
[AsciiHash]
public static partial bool TryParse(ReadOnlySpan<byte> value, out SomeEnum result, bool caseSensitive = true);
```
Individual enum members can also be marked with [AsciiHash("token value")] to override the token payload. If
an enum member declares an empty explicit value (i.e. [AsciiHash("")]), then that member is ignored by the
tool; this is useful for marking "unknown" or "invalid" enum values (commonly the first enum, which by
convention has the value 0):
```csharp
public enum SomeEnum
{
    [AsciiHash("")]
    Unknown,
    SomeRealValue,
    [AsciiHash("another-real-value")]
    AnotherRealValue,
    // ...
}
```
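For intuition, the parser generated for SomeEnum might behave like the hand-written sketch below. SomeEnumParser is hypothetical, and the real generated code is not shown in this document. This version switches on the length, compares the bytes, and skips the member whose token is empty. It also assumes that SomeRealValue's inferred token is simply its member name.

```csharp
using System;

public enum SomeEnum
{
    Unknown = 0,      // marked [AsciiHash("")] in the document, so never produced by the parser
    SomeRealValue,    // assumed token: "SomeRealValue" (13 bytes)
    AnotherRealValue, // explicit token: "another-real-value" (18 bytes)
}

// Hypothetical hand-written equivalent of a generated parser.
public static class SomeEnumParser
{
    public static bool TryParse(ReadOnlySpan<byte> value, out SomeEnum result, bool caseSensitive = true)
    {
        switch (value.Length)
        {
            case 13 when Matches(value, "SomeRealValue"u8, caseSensitive):
                result = SomeEnum.SomeRealValue;
                return true;
            case 18 when Matches(value, "another-real-value"u8, caseSensitive):
                result = SomeEnum.AnotherRealValue;
                return true;
            default:
                result = SomeEnum.Unknown; // the ignored member doubles as the "no match" value
                return false;
        }
    }

    private static bool Matches(ReadOnlySpan<byte> value, ReadOnlySpan<byte> token, bool caseSensitive)
    {
        if (caseSensitive) return value.SequenceEqual(token);
        if (value.Length != token.Length) return false;
        for (int i = 0; i < value.Length; i++)
        {
            if (ToUpperAscii(value[i]) != ToUpperAscii(token[i])) return false;
        }
        return true;
    }

    private static byte ToUpperAscii(byte b) =>
        b >= (byte)'a' && b <= (byte)'z' ? (byte)(b - 32) : b;
}
```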
The tool has an additional facility for enums. You generally don't want to hard-code values such as buffer lengths into your code, but when parsing an enum you need to know how many bytes to read.
The tool can therefore generate a static partial class containing the maximum length of any token in the enum, both
in characters and in bytes (when encoded as UTF-8). For example:
```csharp
[AsciiHash("SomeTypeName")]
public enum SomeEnum
{
    // ...
}
```
This generates a class like the following:
```csharp
static partial class SomeTypeName
{
    public const int EnumCount = 48;
    public const int MaxChars = 11;
    public const int MaxBytes = 11; // as UTF8
    public const int BufferBytes = 16;
}
```
The last of these is probably the most useful: it allows one additional byte (so that an over-long input is not mistaken for a valid token) and rounds up to a word size (here, 11 + 1 = 12 is rounded up to 16), allowing for convenient stack allocation, for example:
```csharp
var span = reader.TryGetSpan(out var tmp) ? tmp : reader.Buffer(stackalloc byte[SomeTypeName.BufferBytes]);
if (TryParse(span, out var value))
{
    // got a value
}
```

This allows for very efficient parsing of well-known tokens.
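The reason for the extra byte is easier to see in a fuller sketch of the read-then-parse pattern. All names here are hypothetical:

- ReadOnlySequence<byte> (System.Buffers) stands in for the unspecified reader.
- "eleven-char" stands in for the enum's longest token.
- SomeTypeName mirrors the example constants.

The BufferBytes rule (one extra byte, rounded up to a multiple of 8) is an assumption inferred from the numbers above. When an over-long input is truncated to BufferBytes, it still has more than MaxBytes bytes, so it is rejected. With a buffer of exactly MaxBytes, the truncated prefix "eleven-char" would match by accident.

```csharp
using System;
using System.Buffers;

// Hypothetical stand-in for the generated class, using the example values.
public static class SomeTypeName
{
    public const int MaxBytes = 11;
    // Assumed rule: one extra byte, rounded up to a multiple of 8: (11 + 1 + 7) & ~7 == 16.
    public const int BufferBytes = (MaxBytes + 1 + 7) & ~7;
}

public static class StackParsing
{
    // Hypothetical longest token: exactly MaxBytes (11) bytes.
    private static ReadOnlySpan<byte> LongestToken => "eleven-char"u8;

    public static bool TryMatchLongest(in ReadOnlySequence<byte> input)
    {
        // Contiguous input is used in place; otherwise a bounded prefix is copied onto the stack.
        ReadOnlySpan<byte> span = input.IsSingleSegment
            ? input.FirstSpan
            : CopyPrefix(input, stackalloc byte[SomeTypeName.BufferBytes]);

        // The buffer holds at least MaxBytes + 1 bytes, so a truncated over-long input
        // still exceeds MaxBytes and is rejected here instead of matching by accident.
        return span.Length <= SomeTypeName.MaxBytes && span.SequenceEqual(LongestToken);
    }

    private static ReadOnlySpan<byte> CopyPrefix(in ReadOnlySequence<byte> input, Span<byte> buffer)
    {
        int count = (int)Math.Min(input.Length, buffer.Length);
        input.Slice(0, count).CopyTo(buffer);
        return buffer.Slice(0, count);
    }
}

// Minimal segment type, used only to build multi-segment input for the copy path.
public sealed class Segment : ReadOnlySequenceSegment<byte>
{
    public Segment(ReadOnlyMemory<byte> memory) => Memory = memory;

    public static ReadOnlySequence<byte> TwoSegments(byte[] first, byte[] second)
    {
        var head = new Segment(first);
        var tail = new Segment(second) { RunningIndex = first.Length };
        head.Next = tail;
        return new ReadOnlySequence<byte>(head, 0, tail, second.Length);
    }
}
```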