3rdParty/highway/g3doc/op_wishlist.md
[TOC]
atan2, cbrt, cosh, erf, exp2, fmod, hypot, ilogb, lgamma, logb, modf, nextafter, nexttoward, pow, scalbn, tan, tgamma
Port https://github.com/richgel999/sserangecoding to Highway (~50 instructions).
= Not(FirstN()), replaces several instances. WHILEGE on SVE.
For crypto. Native on Icelake+.
Potentially useful for comparing neighbors e.g. for RLE.
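The entry above does not show which op it refers to, so the following is only a scalar sketch of the use case, assuming "comparing neighbors" means testing each element against its left neighbor to find where runs begin. `RunStarts` and `CountRuns` are hypothetical names used for illustration and are not Highway APIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model (not Highway code): flags[i] is true when in[i] begins a new
// run, i.e. it is the first element or differs from its left neighbor. A SIMD
// version would compare a vector against a copy of itself shifted by one lane.
std::vector<bool> RunStarts(const std::vector<uint8_t>& in) {
  std::vector<bool> flags(in.size());
  for (size_t i = 0; i < in.size(); ++i) {
    flags[i] = (i == 0) || (in[i] != in[i - 1]);
  }
  return flags;
}

// Hypothetical helper: the number of runs equals the number of run starts.
size_t CountRuns(const std::vector<uint8_t>& in) {
  size_t runs = 0;
  for (bool is_start : RunStarts(in)) {
    runs += is_start ? 1 : 0;
  }
  return runs;
}
```

An RLE encoder would emit one (value, length) pair per run start, so the comparison against the neighbor is the only data-dependent step.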
* rgather_vx for broadcasting redsum result?
* Broadcast, Interleave, LoadDup128: use 64-bit for initial shuffle. For TwoTablesLookupLanes, use 16-bit indices.
* RotateRight
* CombineShiftRightBytes: use TableLookupLanes instead?
* `Shuffle*`: use TableLookupLanes instead?
* `#pragma unroll(1)` in all loops to enable autovectorization

Reuse same wasm256 file, `#if` for wasm-specific parts. Use reserved avx slot.
MaxOfLanes, MinOfLanes returning scalar: avoids extra broadcast.
For orthogonality; already done for x86+NEON.
For iterating in hash table.
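As a scalar illustration only: one common way to iterate over hash table candidates is to turn a comparison mask into a bitfield and repeatedly clear its lowest set bit. The sketch below assumes that this is the pattern the entry has in mind. `SetBitIndices` is a hypothetical name, not a Highway function.

```cpp
#include <cstdint>
#include <vector>

// Scalar sketch (not Highway code): returns the indices of the set bits in
// `bits`, lowest first. Each bit stands for one matching slot in a group.
std::vector<int> SetBitIndices(uint64_t bits) {
  std::vector<int> indices;
  while (bits != 0) {
    // Isolate the lowest set bit (two's complement trick), then find its
    // position with a portable loop instead of a compiler intrinsic.
    const uint64_t lowest = bits & (~bits + 1);
    int idx = 0;
    while ((lowest >> idx) != 1) {
      ++idx;
    }
    indices.push_back(idx);
    bits &= bits - 1;  // Clear the lowest set bit and continue.
  }
  return indices;
}
```

The loop runs once per set bit, so probing cost scales with the number of matches, not with the group size.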
For hash tables. Use VPCONFLICT on ZEN4.
PromoteToEven: for WidenMul, MinOfLanes.
DupEven for 16-bit: use in MinOfLanes (helps NEON).
For tolower (subtract if in range) or hash table probing.
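The "subtract if in range" idea can be modeled in scalar code as follows. This is a sketch under assumptions: the op's name is not visible in the entry, and `ToLowerAsciiByte` and `ToLowerAscii` are hypothetical helpers. The subtraction is applied only where the range test is true, which is what a masked subtract would do per lane.

```cpp
#include <cstdint>
#include <string>

// Scalar model of a masked subtract: subtract `delta` only when the byte is
// in 'A'..'Z'. 'A' - 'a' is -32, so subtracting it (mod 256) adds 32.
uint8_t ToLowerAsciiByte(uint8_t c) {
  // Single unsigned comparison covers both range bounds.
  const bool in_range = static_cast<uint8_t>(c - 'A') < 26;
  const uint8_t delta = in_range ? static_cast<uint8_t>('A' - 'a') : 0;
  return static_cast<uint8_t>(c - delta);
}

// Applies the per-byte operation to a whole string (ASCII only).
std::string ToLowerAscii(std::string s) {
  for (char& ch : s) {
    ch = static_cast<char>(ToLowerAsciiByte(static_cast<uint8_t>(ch)));
  }
  return s;
}
```

In a vector version, `in_range` would become a mask and the conditional would become a single masked operation with no branch.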
Issue 633.
AddSub: interval arithmetic?
Dup128TableLookupBytes: avoids having to add offset on RVV. Table must come from LoadDup128.
LoadPromoteTo: for SVE (svld1sb_u32) + WASM? Compiler can probably already fuse.
* OddEven for <64bit lanes: use Set of wider constant 0_1
* Reverse2 16-bit
* Reverse2 for 8-bit
* TwoTablesLookupLanes
* TableLookupLanes
* FindLastTrue
* PromoteTo for all types (#915)