stacker/Performance.md
extend_from_slice(&key) calls memcpy, which is relatively slow here because most keys are short. For now there's a specialized version to avoid memcpy calls.
Wild-copying 16 bytes at a time in a loop is faster, but would require the caller to guard against overflow (we can probably do that).

fastcmp and fastcpy both employ the same trick to handle slices of odd length: e.g. 2 unconditional operations on 4 bytes each, instead of 3 operations with conditionals (one 4-byte, one 2-byte, one 1-byte). For example, the 7-byte slice [1, 2, 3, 4, 5, 6, 7] is handled as two overlapping 4-byte windows, [1, 2, 3, 4] and [4, 5, 6, 7].
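The overlapping-window trick can be sketched as follows. This is a minimal illustration, not the actual fastcmp/fastcpy code: the names `eq_4_to_8` and `copy_4_to_8` are hypothetical, the sketch covers only lengths 4 to 8, and it stays in safe Rust (a real implementation would presumably use unaligned pointer loads and stores).

```rust
/// Hypothetical helper: reads 4 bytes starting at `start` as one u32.
fn load_u32(s: &[u8], start: usize) -> u32 {
    u32::from_ne_bytes([s[start], s[start + 1], s[start + 2], s[start + 3]])
}

/// Hypothetical helper: equality for slices of length 4..=8 using two
/// 4-byte comparisons instead of a 4 + 2 + 1 byte cascade.
fn eq_4_to_8(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let len = a.len();
    assert!((4..=8).contains(&len), "sketch only covers 4..=8 bytes");
    // First window covers bytes [0, 4), second covers [len - 4, len).
    // For len < 8 the windows overlap; comparing a byte twice is harmless.
    load_u32(a, 0) == load_u32(b, 0) && load_u32(a, len - 4) == load_u32(b, len - 4)
}

/// Hypothetical helper: copies a 4..=8 byte slice with two 4-byte stores.
/// Bytes in the overlap are written twice with the same value.
fn copy_4_to_8(src: &[u8], dst: &mut [u8]) {
    let len = src.len();
    assert_eq!(len, dst.len());
    assert!((4..=8).contains(&len), "sketch only covers 4..=8 bytes");
    let mut head = [0u8; 4];
    let mut tail = [0u8; 4];
    head.copy_from_slice(&src[..4]);
    tail.copy_from_slice(&src[len - 4..]);
    dst[..4].copy_from_slice(&head);
    dst[len - 4..].copy_from_slice(&tail);
}
```

The same idea should extend to other size classes (e.g. two 8-byte windows for lengths 8 to 16), so only the length range needs a branch, not each remaining byte count.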
Default impls.