exp-simd-vectorization
SIMD Vectorization
Decision Gate
- Check Span<T> and MemoryExtensions first. If the operation can be expressed using built-in Span<T> methods (e.g., Contains, IndexOf, CopyTo, SequenceEqual) or MemoryExtensions, use them: no additional dependency is needed, and the runtime already vectorizes many of these internally.
- Check for TensorPrimitives next. If one or more TensorPrimitives methods cover the operation → use them. If the .csproj does NOT already reference System.Numerics.Tensors, add the package, for example: <PackageReference Include="System.Numerics.Tensors" /> (or use the versioning approach already used by your solution). Then replace the scalar loop with TensorPrimitives (TP) calls and stop. See the full API table below. Compose multiple TP calls when needed (e.g., finding both min and max → TensorPrimitives.Min(span) + TensorPrimitives.Max(span) as two calls). Do NOT write manual Vector128 code for operations TP already handles.
- Scalar loop over contiguous array/span of byte, sbyte, short, ushort, int, uint, long, ulong, nint, nuint, float, double (and char via reinterpretation as ushort)? → Implement with explicit Vector128<T> / Vector256<T> / Vector512<T> intrinsics using the patterns below.
- No contiguous numeric arrays to process (dictionary lookups, tree traversals, linked lists, state machines, string formatting, small collections, enum comparisons, recursive algorithms, decimal arithmetic)? → Report [NO SIMD OPPORTUNITY] and write a full paragraph explaining WHY, referencing the specific code characteristics that prevent vectorization (e.g., "State machines require sequential branching on enum values; there are no contiguous numeric arrays to process in parallel, and each transition depends on the previous state"). This explanation is graded.
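The first three gate steps can be sketched in one small program. This is a minimal illustration rather than the skill's own pattern set: the arrays ids and samples and the helper CountGreaterThan are hypothetical names, and the sketch assumes .NET 8 or later with the System.Numerics.Tensors package referenced. The manual Vector128 path is only used for an operation (a conditional count) that is assumed here to have no matching Span<T> or TensorPrimitives method.

```csharp
using System;
using System.Numerics;
using System.Numerics.Tensors;
using System.Runtime.Intrinsics;

// Hypothetical sample data for illustration only.
int[] ids = { 7, 3, 9, 3, 12, 5, 8, 1, 10 };
float[] samples = { 2.5f, -1.0f, 4.0f, 0.5f };

// Step 1: built-in span helpers replace hand-written search loops.
ReadOnlySpan<int> idSpan = ids;
bool hasNine = idSpan.Contains(9);    // true for this input
int firstThree = idSpan.IndexOf(3);   // index 1 for this input

// Step 2: two TensorPrimitives calls replace a combined min/max loop.
ReadOnlySpan<float> sampleSpan = samples;
float min = TensorPrimitives.Min(sampleSpan);   // -1 for this input
float max = TensorPrimitives.Max(sampleSpan);   // 4 for this input

// Step 3: explicit intrinsics for an operation assumed to have no
// built-in equivalent (counting elements above a threshold).
int aboveSix = CountGreaterThan(ids, 6);        // 5 for this input

Console.WriteLine($"{hasNine} {firstThree} {aboveSix}");

// Hypothetical helper: a Vector128 main loop plus a scalar tail.
static int CountGreaterThan(ReadOnlySpan<int> values, int threshold)
{
    int count = 0;
    int i = 0;

    if (Vector128.IsHardwareAccelerated && values.Length >= Vector128<int>.Count)
    {
        Vector128<int> limit = Vector128.Create(threshold);
        int lastBlockStart = values.Length - Vector128<int>.Count;

        for (; i <= lastBlockStart; i += Vector128<int>.Count)
        {
            // Load one block of lanes from the span.
            Vector128<int> block = Vector128.Create(values.Slice(i, Vector128<int>.Count));

            // Lanes that pass the comparison are all-ones; take one bit per lane.
            uint mask = Vector128.GreaterThan(block, limit).ExtractMostSignificantBits();
            count += BitOperations.PopCount(mask);
        }
    }

    // Scalar tail (and full fallback when SIMD is unavailable).
    for (; i < values.Length; i++)
    {
        if (values[i] > threshold)
        {
            count++;
        }
    }

    return count;
}
```

The scalar tail loop doubles as the fallback path, so the helper returns the same result whether or not Vector128 is hardware accelerated on the target machine.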
TensorPrimitives API Reference
TensorPrimitives APIs are generic and work for any primitive type that satisfies the method's generic constraints — not just float/double. For example, Sum requires IAdditionOperators<T,T,T> + IAdditiveIdentity<T,T> and works for all primitive numeric types, while CosineSimilarity requires IRootFunctions<T> and only works for float/double. If the project doesn't already reference System.Numerics.Tensors, add it to the .csproj. Replace the entire manual loop with one or more TensorPrimitives calls as needed (prefer a single call when possible):